Schema.validate

classmethod Schema.validate(
df: DataFrame | LazyFrame,
/,
*,
cast: bool = False,
eager: bool = True,
) -> DataFrame[Self] | LazyFrame[Self]

Validate that a data frame satisfies the schema.

If an eager data frame is passed as input, validation is performed within this function. If a lazy frame is passed, the lazy frame is simply extended with the validation logic. The logic will only be executed (and potentially raise an error) once collect() is called on it.
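The eager-versus-lazy distinction described above can be illustrated with a dependency-free toy model. This is a sketch, not the library's implementation. A list of row dicts stands in for an eager data frame, and a chain of deferred steps stands in for a lazy query plan. `ToyLazyFrame`, `toy_validate`, `check_non_negative`, and `ToyValidationError` are hypothetical names. The model covers only the input-type distinction in this paragraph and ignores the `eager` parameter.

```python
from typing import Callable

# A list of row dicts stands in for an eager data frame (toy assumption).
Rows = list[dict[str, int]]


class ToyValidationError(Exception):
    """Hypothetical stand-in for a failed validation rule."""


class ToyLazyFrame:
    """Hypothetical lazy frame: a data source plus a chain of deferred steps."""

    def __init__(self, source: Callable[[], Rows], steps: tuple = ()) -> None:
        self._source = source
        self._steps = steps

    def pipe(self, step: Callable[[Rows], Rows]) -> "ToyLazyFrame":
        # Return a new plan with one more step; no data is touched here.
        return ToyLazyFrame(self._source, self._steps + (step,))

    def collect(self) -> Rows:
        # Only now are the source and all deferred steps executed, in order.
        rows = self._source()
        for step in self._steps:
            rows = step(rows)
        return rows


def check_non_negative(rows: Rows) -> Rows:
    # Toy rule: column "a" must be non-negative. Rows are returned unchanged.
    if any(row["a"] < 0 for row in rows):
        raise ToyValidationError("rule violated: a >= 0")
    return rows


def toy_validate(df: "Rows | ToyLazyFrame") -> "Rows | ToyLazyFrame":
    if isinstance(df, ToyLazyFrame):
        # Lazy input: only extend the plan; any error surfaces at collect().
        return df.pipe(check_non_negative)
    # Eager input: run the check immediately and raise on failure.
    return check_non_negative(df)


# Building the lazy plan succeeds even though the data is invalid.
deferred = toy_validate(ToyLazyFrame(lambda: [{"a": 1}, {"a": -1}]))
```

Deferring the check into the plan lets a lazy pipeline keep composing without touching the data. The trade-off is that a failure surfaces where `collect()` is called, not where validation was requested.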

Parameters:
  • df – The data frame to validate.

  • cast – Whether columns with a wrong data type in the input data frame are cast to the data type defined in the schema, if possible.

  • eager – Whether the validation should be performed eagerly and this method should raise upon failure. If False, the returned lazy frame will fail to collect if the validation does not pass.
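The `cast` semantics can be made concrete with another dependency-free toy model, again not the library's implementation. A frame is a dict of column lists, and the schema maps each column name to a Python type. Real column data types are data frame dtypes, not Python types. `toy_check_types`, `ToySchemaError`, and `ToyInvalidOperationError` are hypothetical names. They are chosen to mirror the SchemaError and InvalidOperationError cases listed under Raises.

```python
# Columnar toy frame: column name -> list of values (hypothetical stand-in).
Frame = dict[str, list]


class ToySchemaError(Exception):
    """Hypothetical stand-in: missing column, or type mismatch with cast=False."""


class ToyInvalidOperationError(Exception):
    """Hypothetical stand-in: cast=True, but some value cannot be converted."""


def toy_check_types(df: Frame, schema: dict[str, type], cast: bool = False) -> Frame:
    """Check each schema column's values against its Python type, casting if asked."""
    missing = [col for col in schema if col not in df]
    if missing:
        # Missing columns are an error regardless of `cast`.
        raise ToySchemaError(f"missing columns: {missing}")
    out = dict(df)  # shallow copy; the order of values is left untouched
    for col, typ in schema.items():
        values = df[col]
        if all(isinstance(v, typ) for v in values):
            continue  # column already matches the schema
        if not cast:
            raise ToySchemaError(f"column {col!r} is not of type {typ.__name__}")
        try:
            out[col] = [typ(v) for v in values]  # attempt the cast value by value
        except (TypeError, ValueError) as exc:
            raise ToyInvalidOperationError(f"cannot cast column {col!r}") from exc
    return out


schema = {"id": int}
casted = toy_check_types({"id": ["1", "2"]}, schema, cast=True)
```

With `cast=False`, a type mismatch is reported as an error rather than repaired. This keeps unexpected upstream type changes visible. Opting into `cast=True` trades that strictness for convenience, at the risk of a failed conversion.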

Returns:

The input eager or lazy frame, wrapped in a generic version of the input’s data frame type to reflect schema adherence. The order of the input rows is guaranteed to be preserved.
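The sketch below gives a rough illustration of how a generic wrapper type can record schema adherence, using Python's `typing.Generic`. `ToyFrame`, `UserSchema`, `toy_validate_users`, and `count_users` are hypothetical names. The library's own generic types are the `DataFrame[Self]` and `LazyFrame[Self]` shown in the signature.

```python
from typing import Generic, TypeVar

S = TypeVar("S")


class ToyFrame(Generic[S]):
    """Hypothetical generic wrapper; the type parameter names the schema."""

    def __init__(self, rows: list) -> None:
        self.rows = rows  # same row objects, same order as the input


class UserSchema:
    """Hypothetical schema class, used here only as a type parameter."""


def toy_validate_users(rows: list) -> ToyFrame[UserSchema]:
    # The actual checks are omitted in this sketch; only the wrapping is shown.
    return ToyFrame(rows)


def count_users(df: ToyFrame[UserSchema]) -> int:
    # The annotation tells readers and type checkers which schema df adheres to.
    return len(df.rows)


users = toy_validate_users([{"id": 2}, {"id": 1}])
```

The wrapper adds no runtime checks of its own. Its value is that a type checker can track the schema parameter through function signatures, so downstream code can require frames that came out of validation.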

Raises:
  • SchemaError – If eager=True and either the input data frame is missing columns or cast=False and the data type of any column does not match its definition in this schema. Only raised upon collection if eager=False.

  • ValidationError – If eager=True and any rule in the schema is violated, i.e., the data does not pass the validation. If eager=False, a ComputeError is raised upon collection instead.

  • InvalidOperationError – If eager=True, cast=True, and the cast fails for any value in the data. Only raised upon collection if eager=False.
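The error conditions listed under Raises can be condensed into a small lookup. It is sketched below as a hypothetical helper, `expected_error`, which only restates the documented conditions and does not call the library. The order in which the conditions are checked is an assumption. The list does not say which error is raised when several problems occur at once.

```python
from typing import Optional, Tuple


def expected_error(
    *,
    eager: bool,
    cast: bool,
    missing_columns: bool = False,
    dtype_mismatch: bool = False,
    cast_fails: bool = False,
    rule_violated: bool = False,
) -> Optional[Tuple[str, str]]:
    """Return (exception name, when it surfaces) per the Raises list, or None."""
    when = "during validate()" if eager else "on collect()"
    # Assumed precedence: schema problems, then cast failures, then rule violations.
    if missing_columns or (dtype_mismatch and not cast):
        return ("SchemaError", when)
    if cast and cast_fails:
        return ("InvalidOperationError", when)
    if rule_violated:
        # Rule violations surface as ComputeError when validation is deferred.
        return ("ValidationError" if eager else "ComputeError", when)
    return None
```

For example, `expected_error(eager=False, cast=False, rule_violated=True)` yields `("ComputeError", "on collect()")`.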