dfd.dataset package

Submodules

dfd.dataset.analyses module

Utilities for analyzing tabular data.

class dfd.dataset.analyses.TabularAnalysesStrategy

Bases: ABC, Generic[TabularDataType]

abstractmethod describe(data: TabularDataType) → list[TabularStatistics]

Return statistics for the given dataframe.
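The following is a minimal sketch of the strategy pattern that this abstract class suggests, not the package's actual implementation. It does not import dfd: ColumnStats, AnalysesStrategy, and DictOfListsStrategy are hypothetical stand-ins, and the dict-of-lists backend is an assumption chosen so the example runs with the standard library only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Generic, Optional, TypeVar

DataT = TypeVar("DataT")


@dataclass
class ColumnStats:
    """Hypothetical stand-in for TabularStatistics (subset of its fields)."""

    column_name: str
    count: Optional[float] = None
    max_val: Optional[float] = None
    min_val: Optional[float] = None
    mean_val: Optional[float] = None
    std_val: Optional[float] = None


class AnalysesStrategy(ABC, Generic[DataT]):
    """Stand-in for TabularAnalysesStrategy: one describe() per backend."""

    @abstractmethod
    def describe(self, data: DataT) -> list[ColumnStats]:
        """Return one ColumnStats per column."""


class DictOfListsStrategy(AnalysesStrategy[dict[str, list[float]]]):
    """Hypothetical backend for plain {column: values} mappings."""

    def describe(self, data: dict[str, list[float]]) -> list[ColumnStats]:
        results = []
        for name, values in data.items():
            results.append(
                ColumnStats(
                    column_name=name,
                    count=float(len(values)),
                    max_val=max(values) if values else None,
                    min_val=min(values) if values else None,
                    mean_val=mean(values) if values else None,
                    # Sample standard deviation needs at least two points.
                    std_val=stdev(values) if len(values) > 1 else None,
                )
            )
        return results


stats = DictOfListsStrategy().describe({"age": [20.0, 30.0, 40.0]})
```

A real subclass would presumably wrap a dataframe library instead of a dict, but the shape is the same: one describe() implementation per data type, each returning one statistics record per column.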

class dfd.dataset.analyses.TabularDataContext(strategy: TabularAnalysesStrategy | Literal['auto', 'pandas', 'polars'] | None = 'auto')

Bases: object

Resolve an analysis strategy for the provided tabular data.

calculate_tabular_statistics(data: Any) → list[TabularStatistics]

Calculate tabular statistics for the provided data.

Args:

data: The tabular data to analyze.

Returns:

A list of TabularStatistics instances.
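How the context resolves 'auto' is not documented here. The sketch below shows one plausible approach under stated assumptions: a registry keyed by backend name, with 'auto' choosing a backend from the type of the data, and None treated the same as 'auto'. DataContext, BACKENDS, and both backends are hypothetical; plain dicts and lists stand in for pandas and polars objects so the example has no third-party dependencies.

```python
from typing import Any, Callable, Union


def describe_columns(data: dict) -> dict:
    """Hypothetical backend for {column: values}: row count per column."""
    return {name: len(values) for name, values in data.items()}


def describe_rows(data: list) -> dict:
    """Hypothetical backend for a list of row dicts: row count per column."""
    counts: dict = {}
    for row in data:
        for name in row:
            counts[name] = counts.get(name, 0) + 1
    return counts


# Backend name -> (type used for 'auto' detection, backend callable).
BACKENDS = {
    "columns": (dict, describe_columns),
    "rows": (list, describe_rows),
}


class DataContext:
    """Stand-in for TabularDataContext; resolution rules are assumptions."""

    def __init__(self, strategy: Union[Callable, str, None] = "auto"):
        # Assumption: None behaves like 'auto'.
        self._strategy = "auto" if strategy is None else strategy

    def _resolve(self, data: Any) -> Callable:
        if callable(self._strategy):
            return self._strategy  # An explicit strategy always wins.
        if self._strategy == "auto":
            for expected_type, backend in BACKENDS.values():
                if isinstance(data, expected_type):
                    return backend
            raise TypeError(f"No backend for {type(data).__name__}")
        if self._strategy not in BACKENDS:
            raise ValueError(f"Unknown strategy: {self._strategy!r}")
        return BACKENDS[self._strategy][1]

    def calculate(self, data: Any) -> dict:
        return self._resolve(data)(data)


auto_result = DataContext().calculate({"a": [1, 2], "b": [3]})
named_result = DataContext("rows").calculate([{"a": 1}, {"a": 2, "b": 3}])
```

Deferring resolution until data arrives lets a single context instance serve inputs of different types, which is one reason a design like this might default to 'auto'.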

class dfd.dataset.analyses.TabularStatistics(*, column_name: str, count: float | None = None, highest_quantile: float | None = None, middle_quantile: float | None = None, lowest_quantile: float | None = None, max_val: float | None = None, min_val: float | None = None, mean_val: float | None = None, std_val: float | None = None)

Bases: BaseModel

Statistical analysis of a tabular data column.

column_name: str
count: float | None
static format_tabular_statistics_to_markdown(statistics: Sequence[TabularStatistics]) → str

Format a list of TabularStatistics to a markdown string.

Args:

statistics: The list of TabularStatistics to format.

Returns:

The formatted markdown string.

highest_quantile: float | None
lowest_quantile: float | None
property markdown: str

Return the statistics as a markdown string.

max_val: float | None
mean_val: float | None
middle_quantile: float | None
min_val: float | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

std_val: float | None
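The exact markdown produced by format_tabular_statistics_to_markdown and the markdown property is not shown in this reference. As a rough illustration only, the sketch below renders a sequence of statistics records as a markdown table with one row per column and empty cells for None. ColumnStats and to_markdown are hypothetical stand-ins, and the table layout is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional, Sequence


@dataclass
class ColumnStats:
    """Hypothetical stand-in for TabularStatistics (subset of its fields)."""

    column_name: str
    count: Optional[float] = None
    min_val: Optional[float] = None
    max_val: Optional[float] = None
    mean_val: Optional[float] = None


def to_markdown(statistics: Sequence[ColumnStats]) -> str:
    """Render the records as a markdown table (assumed layout)."""
    headers = [field.name for field in fields(ColumnStats)]
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for stat in statistics:
        cells = []
        for header in headers:
            value = getattr(stat, header)
            # Missing statistics become empty cells rather than "None".
            cells.append("" if value is None else str(value))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


table = to_markdown(
    [ColumnStats("age", count=3.0, min_val=20.0, max_val=40.0)]
)
```

Keeping the formatter static and accepting a Sequence, as the documented signature does, means it can format any collection of records without needing an instance.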

Module contents