Generator

class dataframely.random.Generator(seed: int | None = None)

Type for sampling primitive types using a random number generator.

All generator methods are called sample_<type> and, if applicable, allow specifying a lower (inclusive) and an upper (exclusive) bound for the type to be sampled.

These methods can be used to sample higher-level types. To this end, users may also directly access the underlying numpy_generator to reuse the generator’s seeding.

Parameters: seed – The seed to use for initializing the random number generator used for all sampling methods.

Methods:

`sample_binary`	Sample a list of binary values in the specified length range.
`sample_bool`	Sample a list of booleans.
`sample_choice`	Sample a list of elements from a list of choices with replacement.
`sample_date`	Sample a list of dates in the provided range.
`sample_datetime`	Sample a list of datetimes in the provided range.
`sample_duration`	Sample a list of durations in the provided range.
`sample_float`	Sample a list of floating point numbers in the specified range.
`sample_int`	Sample a list of integers in the specified range.
`sample_seed`	Sample a single integer that can be used as a seed for other RNGs.
`sample_string`	Sample a list of strings adhering to the provided regex.
`sample_time`	Sample a list of times in the provided range.

sample_binary(n: int = 1, *, min_bytes: int, max_bytes: int, null_probability: float = 0.0) → Series

Sample a list of binary values in the specified length range.

Parameters:

n – The number of binary values to sample.
min_bytes – The minimum number of bytes for each value.
max_bytes – The maximum number of bytes for each value.
null_probability – The probability of an element being null.

Returns:

A series with n elements of dtype Binary.

sample_bool(n: int = 1, *, null_probability: float = 0.0, p_true: float | None = None) → Series

Sample a list of booleans.

Parameters:

n – The number of booleans to sample.
null_probability – The probability of an element being null.
p_true – The probability of sampling True among non-null samples. Defaults to 0.5 (uniform sampling) when None.

Returns:

A series with n elements of dtype Boolean.

sample_choice(n: int = 1, *, choices: Sequence[T], null_probability: float = 0.0, weights: Sequence[float] | None = None) → Series

Sample a list of elements from a list of choices with replacement.

Parameters:

n – The number of elements to sample.
choices – The choices to sample from.
null_probability – The probability of an element being null.
weights – An ordered weight vector for the different choices, in the same order as choices.

Returns:

A series with n elements of auto-inferred dtype.

sample_date(n: int = 1, *, min: date, max: date | None, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of dates in the provided range.

Parameters:

n – The number of dates to sample.
min – The minimum date to sample (inclusive).
max – The maximum date to sample (exclusive). Defaults to ‘10000-01-01’ when None.
resolution – The resolution that dates in the column must have. This uses the formatting language of the polars datetime round method.
null_probability – The probability of an element being null.

Returns:

A series with n elements of dtype Date.

sample_datetime(n: int = 1, *, min: datetime, max: datetime | None, resolution: str | None = None, time_zone: str | tzinfo | None = None, time_unit: Literal['ns', 'us', 'ms'] = 'us', null_probability: float = 0.0) → Series

Sample a list of datetimes in the provided range.

Parameters:

n – The number of datetimes to sample.
min – The minimum datetime to sample (inclusive).
max – The maximum datetime to sample (exclusive). Defaults to ‘10000-01-01’ when None.
resolution – The resolution that datetimes in the column must have. This uses the formatting language of the polars datetime round method.
time_unit – The time unit of the datetime column. Defaults to us (microseconds).
time_zone – The time zone that datetimes in the column must have. The time zone must be a valid IANA time zone name, e.g. Etc/UTC or America/New_York.
null_probability – The probability of an element being null.

Returns:

A series with n elements of dtype Datetime.

sample_duration(n: int = 1, *, min: timedelta, max: timedelta, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of durations in the provided range.

Parameters:

n – The number of durations to sample.
min – The minimum duration to sample (inclusive).
max – The maximum duration to sample (exclusive).
resolution – The resolution that durations in the column must have. This uses the formatting language of the polars datetime round method.
null_probability – The probability of an element being null.

Returns:

A series with n elements of dtype Duration.

sample_float(n: int = 1, *, min: float, max: float, null_probability: float = 0.0, nan_probability: float = 0.0, inf_probability: float = 0.0) → Series

Sample a list of floating point numbers in the specified range.

Parameters:

n – The number of floats to sample.
min – The minimum float to sample (inclusive).
max – The maximum float to sample (exclusive).
null_probability – The probability of an element being null.
nan_probability – The probability of an element being nan.
inf_probability – The probability of an element being inf.

Returns:

A series with n elements of dtype Float64.

sample_int(n: int = 1, *, min: int, max: int, null_probability: float = 0.0) → Series

Sample a list of integers in the specified range.

Parameters:

n – The number of integers to sample.
min – The minimum integer to sample (inclusive).
max – The maximum integer to sample (exclusive).
null_probability – The probability of an element being null.

Returns:

A series with n elements of dtype Int64.

sample_seed() → int

Sample a single integer that can be used as a seed for other RNGs.

Returns: A seed of type uint32.

sample_string(n: int = 1, *, regex: str, null_probability: float = 0.0) → Series

Sample a list of strings adhering to the provided regex.

Parameters:

n – The number of strings to sample.
regex – The regex that all elements have to adhere to.
null_probability – The probability of an element being null.

Returns:

A series with n elements of dtype String.

sample_time(n: int = 1, *, min: time, max: time | None, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of times in the provided range.

Parameters:

n – The number of times to sample.
min – The minimum time to sample (inclusive).
max – The maximum time to sample (exclusive). Defaults to midnight when None.
resolution – The resolution that times in the column must have. This uses the formatting language of the polars datetime round method.
null_probability – The probability of an element being null.

Returns:

A series with n elements of dtype Time.