Generator
- class dataframely.random.Generator(seed: int | None = None)
Type that allows sampling primitive types using a random number generator.
All generator methods are called sample_<type> and, if applicable, allow specifying a lower (inclusive) and an upper (exclusive) bound for the type to be sampled.
These methods can be used to sample higher-level types. To this end, users may also directly access the underlying numpy_generator to reuse the generator’s seeding.
- Parameters:
seed – The seed to use for initializing the random number generator used for all sampling methods.
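The idea of composing higher-level types from primitive samplers can be illustrated with a minimal sketch. It does not use dataframely: `random.Random` from the standard library stands in for the generator, and the helpers `sample_int` and `sample_points` are hypothetical names chosen for this illustration only.

```python
from __future__ import annotations

import random


def sample_int(rng: random.Random, n: int, *, min: int, max: int) -> list[int]:
    """Sample n integers from the half-open range [min, max)."""
    # randrange excludes the upper bound, so min is inclusive and max is exclusive.
    return [rng.randrange(min, max) for _ in range(n)]


def sample_points(rng: random.Random, n: int) -> list[tuple[int, int]]:
    """Hypothetical higher-level sampler built from the primitive integer sampler."""
    xs = sample_int(rng, n, min=0, max=10)
    ys = sample_int(rng, n, min=-5, max=5)
    return list(zip(xs, ys))


# Two RNGs initialized with the same seed yield identical samples.
first = sample_points(random.Random(42), 5)
second = sample_points(random.Random(42), 5)
```

Because both coordinates are drawn from the same seeded RNG, the composite sampler is reproducible without managing any additional seeds.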
Methods:
sample_binary – Sample a list of binary values in the specified length range.
sample_bool – Sample a list of booleans in the specified range.
sample_choice – Sample a list of elements from a list of choices with replacement.
sample_date – Sample a list of dates in the provided range.
sample_datetime – Sample a list of datetimes in the provided range.
sample_duration – Sample a list of durations in the provided range.
sample_float – Sample a list of floating point numbers in the specified range.
sample_int – Sample a list of integers in the specified range.
sample_seed – Sample a single integer that can be used as a seed for other RNGs.
sample_string – Sample a list of strings adhering to the provided regex.
sample_time – Sample a list of times in the provided range.
- sample_binary( ) -> Series
Sample a list of binary values in the specified length range.
- Parameters:
n – The number of binary values to sample.
min_bytes – The minimum number of bytes for each value.
max_bytes – The maximum number of bytes for each value.
null_probability – The probability of an element being null.
- Returns:
A series with n elements of dtype Binary.
- sample_bool( ) -> Series
Sample a list of booleans in the specified range.
- Parameters:
n – The number of booleans to sample.
null_probability – The probability of an element being null.
p_true – Sampling probability for True within non-null samples. Default: 0.5 (uniform sampling).
- Returns:
A series with n elements of dtype Boolean.
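The interplay of null_probability and p_true described above can be sketched as follows. This is not dataframely's implementation: it uses the standard library's `random.Random`, returns a plain list instead of a series, and the function name `sample_bool_sketch` is hypothetical. The order of the two draws (null first, then the boolean) is an assumption made for illustration.

```python
from __future__ import annotations

import random


def sample_bool_sketch(
    rng: random.Random,
    n: int = 1,
    *,
    null_probability: float = 0.0,
    p_true: float = 0.5,
) -> list[bool | None]:
    """Sample n values: None with null_probability, otherwise True with p_true."""
    values: list[bool | None] = []
    for _ in range(n):
        if rng.random() < null_probability:
            # Assumption: nullness is decided before the boolean value.
            values.append(None)
        else:
            # p_true only applies to the non-null samples.
            values.append(rng.random() < p_true)
    return values


rng = random.Random(0)
all_null = sample_bool_sketch(rng, 4, null_probability=1.0)
all_true = sample_bool_sketch(rng, 4, p_true=1.0)
```

With null_probability=1.0 every element is None, and with p_true=1.0 every non-null element is True, since `rng.random()` always returns a value below 1.0.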
- sample_choice(n: int = 1, *, choices: Sequence[T], null_probability: float = 0.0, weights: Sequence[float] | None = None) -> Series
Sample a list of elements from a list of choices with replacement.
- Parameters:
n – The number of elements to sample.
choices – The choices to sample from.
null_probability – The probability of an element being null.
weights – An ordered weight vector for the different choices.
- Returns:
A series with n elements of auto-inferred dtype.
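Weighted sampling with replacement, as described for choices and weights, can be sketched with the standard library. The snippet below is a stand-in rather than dataframely's code: `sample_choice_sketch` is a hypothetical name, it returns a list instead of a series, and it assumes that weights are matched to choices by position.

```python
from __future__ import annotations

import random
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def sample_choice_sketch(
    rng: random.Random,
    n: int = 1,
    *,
    choices: Sequence[T],
    null_probability: float = 0.0,
    weights: Sequence[float] | None = None,
) -> list[T | None]:
    """Draw n elements with replacement; weights[i] belongs to choices[i]."""
    # random.Random.choices samples with replacement and accepts relative weights.
    picks = rng.choices(choices, weights=weights, k=n)
    # Each pick is independently replaced by None with null_probability.
    return [None if rng.random() < null_probability else pick for pick in picks]


only_b = sample_choice_sketch(
    random.Random(1), 6, choices=["a", "b"], weights=[0.0, 1.0]
)
```

A weight of zero removes a choice from consideration, so only "b" can be drawn in this call.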
- sample_date(n: int = 1, *, min: date, max: date | None, resolution: str | None = None, null_probability: float = 0.0) -> Series
Sample a list of dates in the provided range.
- Parameters:
n – The number of dates to sample.
min – The minimum date to sample (inclusive).
max – The maximum date to sample (exclusive). Defaults to ‘10000-01-01’ when None.
resolution – The resolution that dates in the column must have. This uses the formatting language used by the polars datetime round method.
null_probability – The probability of an element being null.
- Returns:
A series with n elements of dtype Date.
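The inclusive lower and exclusive upper bound for dates can be sketched with the standard library. This is an illustration under simplifying assumptions, not dataframely's implementation: `sample_date_sketch` is a hypothetical name, the resolution and null_probability parameters are left out, and a max of None is modeled as "one day past `date.max`", which corresponds to the exclusive bound ‘10000-01-01’.

```python
from __future__ import annotations

import random
from datetime import date, timedelta


def sample_date_sketch(
    rng: random.Random,
    n: int = 1,
    *,
    min: date,
    max: date | None = None,
) -> list[date]:
    """Sample n dates from the half-open range [min, max)."""
    if max is None:
        # Exclusive bound 10000-01-01 means date.max (9999-12-31) is still allowed.
        span_days = (date.max - min).days + 1
    else:
        span_days = (max - min).days
    # randrange(span_days) yields offsets 0 .. span_days - 1, so max is never hit.
    return [min + timedelta(days=rng.randrange(span_days)) for _ in range(n)]


week = sample_date_sketch(
    random.Random(7), 20, min=date(2020, 1, 1), max=date(2020, 1, 8)
)
open_ended = sample_date_sketch(random.Random(7), 3, min=date(9999, 12, 31))
```

In the open-ended call only a single date remains in the range, so every sample equals 9999-12-31.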
- sample_datetime(n: int = 1, *, min: datetime, max: datetime | None, resolution: str | None = None, time_zone: str | tzinfo | None = None, time_unit: Literal['ns', 'us', 'ms'] = 'us', null_probability: float = 0.0) -> Series
Sample a list of datetimes in the provided range.
- Parameters:
n – The number of datetimes to sample.
min – The minimum datetime to sample (inclusive).
max – The maximum datetime to sample (exclusive). Defaults to ‘10000-01-01’ when None.
resolution – The resolution that datetimes in the column must have. This uses the formatting language used by the polars datetime round method.
time_unit – The time unit of the datetime column. Defaults to us (microseconds).
time_zone – The time zone that datetimes in the column must have. The time zone must use a valid IANA time zone name identifier, e.g. Etc/UTC or America/New_York.
null_probability – The probability of an element being null.
- Returns:
A series with n elements of dtype Datetime.
- sample_duration(n: int = 1, *, min: timedelta, max: timedelta, resolution: str | None = None, null_probability: float = 0.0) -> Series
Sample a list of durations in the provided range.
- Parameters:
n – The number of durations to sample.
min – The minimum duration to sample (inclusive).
max – The maximum duration to sample (exclusive).
resolution – The resolution that durations in the column must have. This uses the formatting language used by the polars datetime round method.
null_probability – The probability of an element being null.
- Returns:
A series with n elements of dtype Duration.
- sample_float(n: int = 1, *, min: float, max: float, null_probability: float = 0.0, nan_probability: float = 0.0, inf_probability: float = 0.0) -> Series
Sample a list of floating point numbers in the specified range.
- Parameters:
n – The number of floats to sample.
min – The minimum float to sample (inclusive).
max – The maximum float to sample (exclusive).
null_probability – The probability of an element being null.
nan_probability – The probability of an element being nan.
inf_probability – The probability of an element being inf.
- Returns:
A series with n elements of dtype Float64.
- sample_int( ) -> Series
Sample a list of integers in the specified range.
- Parameters:
n – The number of integers to sample.
min – The minimum integer to sample (inclusive).
max – The maximum integer to sample (exclusive).
null_probability – The probability of an element being null.
- Returns:
A series with n elements of dtype Int64.
- sample_seed() -> int
Sample a single integer that can be used as a seed for other RNGs.
- Returns:
A seed of type uint32.
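Deriving a seed for a second RNG from an existing one can be sketched as follows. The snippet uses the standard library's `random.Random` as a stand-in and the hypothetical helper `sample_seed_sketch`. It only assumes that a uint32 seed is an integer in the range [0, 2**32).

```python
import random


def sample_seed_sketch(rng: random.Random) -> int:
    """Draw an integer that fits into an unsigned 32-bit value."""
    # randrange(2**32) returns a value in [0, 2**32), i.e. the uint32 range.
    return rng.randrange(2**32)


parent = random.Random(123)
child_seed = sample_seed_sketch(parent)

# A separate RNG whose stream is fully determined by the parent's seed.
child = random.Random(child_seed)
child_value = child.random()
```

Seeding downstream RNGs this way keeps a whole pipeline reproducible from one top-level seed.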
- sample_string( ) -> Series
Sample a list of strings adhering to the provided regex.
- Parameters:
n – The number of strings to sample.
regex – The regex that all elements have to adhere to.
null_probability – The probability of an element being null.
- Returns:
A series with n elements of dtype String.
- sample_time(n: int = 1, *, min: time, max: time | None, resolution: str | None = None, null_probability: float = 0.0) -> Series
Sample a list of times in the provided range.
- Parameters:
n – The number of times to sample.
min – The minimum time to sample (inclusive).
max – The maximum time to sample (exclusive). Defaults to midnight when None.
resolution – The resolution that times in the column must have. This uses the formatting language used by the polars datetime round method.
null_probability – The probability of an element being null.
- Returns:
A series with n elements of dtype Time.
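The meaning of an exclusive upper bound of midnight can be sketched with the standard library. This is a simplified illustration, not dataframely's implementation: `sample_time_sketch` is a hypothetical name, times are sampled at whole-second granularity (an assumption that replaces the resolution parameter), and null_probability is omitted.

```python
from __future__ import annotations

import random
from datetime import time

SECONDS_PER_DAY = 24 * 60 * 60


def _to_seconds(t: time) -> int:
    """Seconds elapsed since the start of the day."""
    return t.hour * 3600 + t.minute * 60 + t.second


def sample_time_sketch(
    rng: random.Random,
    n: int = 1,
    *,
    min: time,
    max: time | None = None,
) -> list[time]:
    """Sample n times from [min, max); max=None means midnight of the next day."""
    upper = SECONDS_PER_DAY if max is None else _to_seconds(max)
    samples = []
    for _ in range(n):
        # The upper bound is exclusive, so 23:59:59 is the latest possible value.
        seconds = rng.randrange(_to_seconds(min), upper)
        samples.append(time(seconds // 3600, (seconds % 3600) // 60, seconds % 60))
    return samples


late_evening = sample_time_sketch(random.Random(3), 10, min=time(22, 0))
```

Treating None as the end of the day avoids having to represent 24:00:00, which `datetime.time` cannot express.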