Schema.filter
- classmethod Schema.filter( ) → FilterResult[Self] | LazyFilterResult[Self]
Filter the data frame by the rules of this schema.
This method can be thought of as a “soft alternative” to validate(). While validate() raises an exception when a row does not adhere to the rules defined in the schema, this method simply filters out these rows and succeeds.
- Parameters:
df – The data frame to filter for valid rows. The data frame is collected within this method, regardless of whether a DataFrame or LazyFrame is passed.
cast – Whether columns with a wrong data type in the input data frame are cast to the schema’s defined data type if possible. Rows for which the cast fails for any column are filtered out.
eager – Whether the filter operation should be performed eagerly. If False, the returned lazy frame will fail to collect if the validation does not pass.
- Returns:
A tuple of the validated rows in the input data frame (potentially empty) and a simple dataclass carrying information about the rows of the data frame which could not be validated successfully. Just like in polars’ native filter(), the order of rows in the returned data frame is maintained.
- Raises:
ValidationError – If the columns of the input data frame are invalid. This happens only if the data frame is missing a column defined in the schema or a column has an invalid dtype while cast is set to False.
Note
This method preserves the ordering of the input data frame.