EdaValidator

Calculates and displays exploratory data analysis (EDA) statistics for the numeric columns of a DataFrame. Uses Rich for formatted console output.

Basic Usage

from epftoolbox2.data.validators import EdaValidator

validator = EdaValidator()
result = validator.validate(df)

Console Output

┏━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Column      ┃ Min      ┃ Max      ┃ Mean     ┃ Std      ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ load_actual │ 8500.12  │ 25000.45 │ 15234.67 │ 3456.78  │
│ price       │ -5.23    │ 450.00   │ 85.45    │ 45.23    │
└─────────────┴──────────┴──────────┴──────────┴──────────┘

Statistics Included

Statistic   Description
Min         Minimum value
Max         Maximum value
Mean        Average value
Std         Standard deviation
Q25         25th percentile
Q50         Median (50th percentile)
Q75         75th percentile
Nulls       Count of null values
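
To make the definitions above concrete, here is a minimal sketch that computes the same eight statistics with plain pandas. It is not EdaValidator's implementation: `summarize_numeric` is a hypothetical helper, and details such as sample versus population standard deviation or the percentile interpolation method are assumptions that may differ from what the validator does.

```python
import pandas as pd


def summarize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per numeric column with the statistics listed above.

    Hypothetical helper for illustration; not part of epftoolbox2.
    """
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame(
        {
            "Min": numeric.min(),
            "Max": numeric.max(),
            "Mean": numeric.mean(),
            "Std": numeric.std(),  # pandas default: sample std (ddof=1)
            "Q25": numeric.quantile(0.25),
            "Q50": numeric.quantile(0.50),
            "Q75": numeric.quantile(0.75),
            "Nulls": numeric.isna().sum(),
        }
    )


# Small made-up frame: one null in "price", one non-numeric column.
sample = pd.DataFrame(
    {
        "price": [10.0, 20.0, None, 40.0],
        "load_actual": [100.0, 200.0, 300.0, 400.0],
        "zone": ["A", "B", "A", "B"],  # non-numeric, so it is skipped
    }
)
summary = summarize_numeric(sample)
print(summary)
```

pandas skips nulls when computing Min, Max, Mean, Std, and the percentiles, so the Nulls column is the only place where missing values show up.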

Accessing Statistics Programmatically

result = validator.validate(df)
# Get stats as DataFrame
stats_df = result.stats
print(stats_df)

When to Use

  • Initial data exploration
  • Checking for outliers
  • Verifying data ranges are reasonable
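
As a follow-up to the outlier check, one common heuristic is the 1.5 × IQR rule, which builds on the Q25 and Q75 statistics. The sketch below applies it to a raw pandas Series. `flag_iqr_outliers` is a hypothetical helper and the price values are made up; EdaValidator itself is not assumed to flag outliers.

```python
import pandas as pd


def flag_iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask for values outside [Q25 - k*IQR, Q75 + k*IQR].

    Hypothetical helper for illustration; not part of epftoolbox2.
    """
    q25 = series.quantile(0.25)
    q75 = series.quantile(0.75)
    iqr = q75 - q25
    return (series < q25 - k * iqr) | (series > q75 + k * iqr)


# Made-up prices with one obvious spike.
prices = pd.Series([80.0, 82.0, 85.0, 87.0, 90.0, 450.0], name="price")
mask = flag_iqr_outliers(prices)
print(prices[mask])  # rows the rule considers outliers
```

Whether a flagged value is an error or a genuine extreme still needs domain judgment, since the rule only marks values that sit far from the middle half of the data.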