dfd.dataset package
Submodules
dfd.dataset.analyses module
Tabular data analyses utilities.
- class dfd.dataset.analyses.TabularAnalysesStrategy
Bases: ABC, Generic[TabularDataType]
- abstractmethod describe(data: TabularDataType) → list[TabularStatistics]
Return statistics for the given dataframe.
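The abstract `describe` method suggests a strategy pattern: each backend subclasses the base class and returns one statistics record per column. The following is a minimal, standard-library-only sketch of that pattern. `ColumnStats`, `AnalysesStrategy`, and `DictOfListsStrategy` are hypothetical stand-ins written for illustration, not the classes shipped in `dfd.dataset.analyses`, and the dict-of-lists input format is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Generic, Optional, TypeVar

TabularDataType = TypeVar("TabularDataType")


@dataclass
class ColumnStats:
    """Stand-in for TabularStatistics (the documented class is a pydantic BaseModel)."""

    column_name: str
    count: Optional[float] = None
    max_val: Optional[float] = None
    min_val: Optional[float] = None
    mean_val: Optional[float] = None
    std_val: Optional[float] = None


class AnalysesStrategy(ABC, Generic[TabularDataType]):
    """Stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def describe(self, data: TabularDataType) -> list[ColumnStats]:
        """Return statistics for the given data."""


class DictOfListsStrategy(AnalysesStrategy[dict[str, list[float]]]):
    """Hypothetical strategy for data given as {column name: values}."""

    def describe(self, data: dict[str, list[float]]) -> list[ColumnStats]:
        results = []
        for name, values in data.items():
            results.append(
                ColumnStats(
                    column_name=name,
                    count=float(len(values)),
                    max_val=max(values) if values else None,
                    min_val=min(values) if values else None,
                    mean_val=mean(values) if values else None,
                    # Sample standard deviation needs at least two values.
                    std_val=stdev(values) if len(values) > 1 else None,
                )
            )
        return results


stats = DictOfListsStrategy().describe({"price": [1.0, 2.0, 3.0]})
```

Leaving a statistic as `None` when it cannot be computed matches the optional fields in the documented `TabularStatistics` signature.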
- class dfd.dataset.analyses.TabularDataContext(strategy: TabularAnalysesStrategy | Literal['auto', 'pandas', 'polars'] | None = 'auto')
Bases: object
Resolve an analysis strategy for the provided tabular data.
- calculate_tabular_statistics(data: Any) → list[TabularStatistics]
Calculate tabular statistics for the provided data.
- Args:
data: The tabular data to analyze.
- Returns:
A list of TabularStatistics instances.
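The constructor accepts a strategy instance, one of the names `'auto'`, `'pandas'`, or `'polars'`, or `None`. How `'auto'` is resolved is not documented here. One plausible approach, sketched below under that caveat, is to inspect the module that defines the data's type, which avoids importing either library. `STRATEGIES`, `detect_backend`, `DataContext`, and `FakeFrame` are hypothetical names, and treating `None` like `'auto'` is an assumption.

```python
from typing import Any, Callable, Optional

# Hypothetical registry: backend name -> function that describes the data.
# The placeholder functions only report which backend was picked.
STRATEGIES: dict[str, Callable[[Any], str]] = {
    "pandas": lambda data: "described with pandas strategy",
    "polars": lambda data: "described with polars strategy",
}


def detect_backend(data: Any) -> str:
    """Guess the backend from the top-level module that defines the data's type."""
    root = type(data).__module__.split(".")[0]
    if root not in STRATEGIES:
        raise TypeError(f"unsupported tabular data type: {type(data).__name__}")
    return root


class DataContext:
    """Stand-in for TabularDataContext, not the dfd implementation."""

    def __init__(self, strategy: Optional[str] = "auto") -> None:
        # Assumption: None is treated the same as "auto".
        self.strategy = strategy or "auto"

    def calculate(self, data: Any) -> str:
        name = detect_backend(data) if self.strategy == "auto" else self.strategy
        return STRATEGIES[name](data)


class FakeFrame:
    """Test double that pretends to be defined inside pandas."""


FakeFrame.__module__ = "pandas.core.frame"

auto_result = DataContext().calculate(FakeFrame())
forced_result = DataContext("polars").calculate(FakeFrame())
```

An explicit name bypasses detection entirely, which is why `forced_result` uses the polars placeholder even though the object looks like a pandas frame.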
- class dfd.dataset.analyses.TabularStatistics(*, column_name: str, count: float | None = None, highest_quantile: float | None = None, middle_quantile: float | None = None, lowest_quantile: float | None = None, max_val: float | None = None, min_val: float | None = None, mean_val: float | None = None, std_val: float | None = None)
Bases: BaseModel
Statistical analysis of a tabular data column.
- column_name: str
- count: float | None
- static format_tabular_statistics_to_markdown(statistics: Sequence[TabularStatistics]) → str
Format a list of TabularStatistics to a markdown string.
- Args:
statistics: The list of TabularStatistics to format.
- Returns:
The formatted markdown string.
- highest_quantile: float | None
- lowest_quantile: float | None
- property markdown: str
Return the statistics as a markdown string.
- max_val: float | None
- mean_val: float | None
- middle_quantile: float | None
- min_val: float | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- std_val: float | None
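The exact markdown layout produced by `format_tabular_statistics_to_markdown` is not specified above. As a rough illustration only, the sketch below renders one table row per column with empty cells for missing values. `ColumnStats` is a dataclass stand-in that copies the documented field names, and `to_markdown` is a hypothetical helper, not the library's formatter.

```python
from dataclasses import dataclass, fields
from typing import Optional, Sequence


@dataclass
class ColumnStats:
    """Stand-in with the same field names as TabularStatistics."""

    column_name: str
    count: Optional[float] = None
    highest_quantile: Optional[float] = None
    middle_quantile: Optional[float] = None
    lowest_quantile: Optional[float] = None
    max_val: Optional[float] = None
    min_val: Optional[float] = None
    mean_val: Optional[float] = None
    std_val: Optional[float] = None


def to_markdown(statistics: Sequence[ColumnStats]) -> str:
    """Render one table row per column; missing values become empty cells."""
    names = [f.name for f in fields(ColumnStats)]
    lines = [
        "| " + " | ".join(names) + " |",
        "| " + " | ".join("---" for _ in names) + " |",
    ]
    for stat in statistics:
        cells = [
            "" if getattr(stat, name) is None else str(getattr(stat, name))
            for name in names
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


table = to_markdown([ColumnStats(column_name="price", count=3.0, mean_val=2.0)])
print(table)
```

Deriving the header from the field list keeps the table in sync if fields are added or reordered in the stand-in.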