Dataframely#
Dataframely is a Python package to validate the schema and content of polars data frames. Its purpose is to make data pipelines more robust by ensuring that data meet expectations and more readable by adding schema information to data frame type hints.
Features#
Declaratively define schemas as classes with arbitrary inheritance structure
Specify column-specific validation rules (e.g. nullability, minimum string length, …)
Specify cross-column and group validation rules with built-in support for checking the primary key property of a column set
Specify validation constraints across collections of interdependent data frames
Validate data frames softly by simply filtering out rows violating rules instead of failing hard
Introspect validation failure information for run-time failures
Enhanced type hints for validated data frames allowing users to clearly express expectations about inputs and outputs (i.e., contracts) in data pipelines
Integrate schemas with external tools (e.g.,
sqlalchemyorpyarrow)Generate test data that comply with a schema or collection of schemas and its validation rules