dataframely.rule
- dataframely.rule( ) → Callable[[Callable[[], Expr]], Rule]
Mark a function as a rule to evaluate during validation.
The name of the function is used as the name of the rule. The function should return an expression that evaluates to a boolean indicating whether a row is valid with respect to the rule. A value of true indicates validity.
Rules should be used only in the following two circumstances:
Validation requires accessing multiple columns (e.g. if valid values of column A depend on the value in column B).
Validation must be performed on groups of rows (e.g. if a column A must not contain any duplicate values among rows with the same value in column B).
In all other cases, column-level validation rules should be preferred, as they aid readability and improve error messages.
- Parameters:
group_by – An optional list of columns to group by for rules operating on groups of rows. If this list is provided, the returned expression must return a single boolean value per group, i.e. some kind of aggregation function must be used (e.g. sum, any, …).
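The grouping semantics can be modelled in plain Python. The sketch below does not use dataframely or polars; it only mimics, under stated assumptions, what a rule with group_by has to do: rows are partitioned by the group_by columns and the rule aggregates each group into one boolean. That the group's verdict is then applied to every row of the group is an assumption of this sketch. The names evaluate_group_rule and no_duplicate_a, and the columns "a" and "b", are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Row = dict[str, object]


def evaluate_group_rule(
    rows: list[Row],
    group_by: list[str],
    rule: Callable[[list[Row]], bool],
) -> list[bool]:
    """Return one validity flag per input row (hypothetical helper)."""
    # Partition row indices by the values of the group_by columns.
    groups: dict[tuple, list[int]] = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[column] for column in group_by)
        groups[key].append(index)

    valid = [True] * len(rows)
    for indices in groups.values():
        # The rule must aggregate the whole group into a single boolean.
        verdict = rule([rows[i] for i in indices])
        # Assumption: the group's verdict is applied to each of its rows.
        for i in indices:
            valid[i] = verdict
    return valid


def no_duplicate_a(group: list[Row]) -> bool:
    """Column "a" must not contain duplicates within one group."""
    values = [row["a"] for row in group]
    return len(values) == len(set(values))


rows = [
    {"a": 1, "b": "x"},
    {"a": 2, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 1, "b": "y"},  # duplicate "a" within the group b == "y"
]
flags = evaluate_group_rule(rows, group_by=["b"], rule=no_duplicate_a)
print(flags)  # [True, True, False, False]
```

A rule that returned one boolean per row instead of one per group would not fit this shape, which is why an aggregation is required when group_by is given.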
Note
You’ll need to explicitly handle null values in your columns when defining rules. By default, any rule that evaluates to null because one of the columns used in the rule is null is interpreted as true, i.e. the row is assumed to be valid.
Attention
The rule logic should return a static result. Other implementations using arbitrary Python logic work for filtering and validation, but may lead to wrong results in Schema comparisons and (de-)serialization.
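To make the note on null values concrete, here is a minimal plain-Python model of the described behaviour. It is not dataframely code: None stands in for null, less_or_equal mimics a null-propagating comparison, and interpret applies the documented default that a null result counts as valid. All function names and the start/end columns are hypothetical.

```python
from typing import Optional


def less_or_equal(left: Optional[int], right: Optional[int]) -> Optional[bool]:
    """Null-propagating comparison: any null operand yields null (None)."""
    if left is None or right is None:
        return None
    return left <= right


def interpret(result: Optional[bool]) -> bool:
    """Treat a null rule result as valid, as the note describes."""
    return True if result is None else result


def lenient_rule(start: Optional[int], end: Optional[int]) -> Optional[bool]:
    # A null in either column propagates, so the row ends up valid.
    return less_or_equal(start, end)


def strict_rule(start: Optional[int], end: Optional[int]) -> Optional[bool]:
    # Explicit null handling: a missing value makes the row invalid.
    if start is None or end is None:
        return False
    return less_or_equal(start, end)


rows = [(1, 5), (7, 3), (2, None)]
lenient = [interpret(lenient_rule(s, e)) for s, e in rows]
strict = [interpret(strict_rule(s, e)) for s, e in rows]
print(lenient)  # [True, False, True]
print(strict)   # [True, False, False]
```

The third row shows the difference: without explicit handling the missing end value silently passes. In an actual rule the same effect would presumably be achieved on the expression itself, for example by filling null results with false before returning them.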