ResampleTransformer
Resamples data to a specified frequency, applying aggregation (when downsampling) or interpolation (when upsampling).
Basic Usage
```python
from epftoolbox2.data.transformers import ResampleTransformer

transformer = ResampleTransformer(freq="1h")
df = transformer.transform(df)
```

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| freq | str | "1h" | Pandas frequency string |
| method | str | "linear" | Method specifying aggregation and interpolation behavior |
| columns | list[str] \| str \| None | None | Columns to interpolate. If None, all columns are interpolated. |
Frequency Strings
| String | Meaning |
|---|---|
| "1h" | 1 hour |
| "30min" | 30 minutes |
| "15min" | 15 minutes |
| "1D" | 1 day |
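
Because `freq` is described as a pandas frequency string, the values in the table can be checked directly in pandas before passing them to the transformer. The snippet below is a minimal sketch using only pandas; it does not call epftoolbox2, and the variable names are illustrative.

```python
import pandas as pd

# Parse each frequency string from the table with plain pandas.
freqs = ["1h", "30min", "15min", "1D"]
steps = {freq: pd.Timedelta(freq) for freq in freqs}

for freq, step in steps.items():
    # Build a short index at this frequency to see the spacing it produces.
    index = pd.date_range("2024-01-01", periods=3, freq=freq)
    print(f"{freq:>6}: step={step}, last timestamp={index[-1]}")
```

A string that pandas rejects here is unlikely to be a valid `freq` value.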
Methods
The method parameter controls behavior for both directions:
- Downsampling (e.g. 15 min → 1 h) – selects the aggregation function.
- Upsampling (e.g. 1 h → 15 min) – selects how NaN gaps are filled.
| Method | Aggregation (↓) | Fill Method (↑) |
|---|---|---|
| "linear" | mean | Linear interpolation |
| "ffill" | first | Forward-fill |
| "bfill" | last | Backward-fill |
| "sum" | sum | Forward-fill |
| "first" | first | Forward-fill |
| "last" | last | Backward-fill |
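
To make the two directions concrete, the sketch below reproduces a few rows of the table with plain pandas. It assumes the transformer behaves like `DataFrame.resample` followed by an aggregation or a fill; that is an assumption for illustration, not the library's confirmed implementation, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical 15-minute series covering two full hours.
idx = pd.date_range("2024-01-01 00:00", periods=8, freq="15min")
quarter_hourly = pd.DataFrame(
    {"load": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}, index=idx
)

# Downsampling 15 min -> 1 h: the method selects the aggregation.
hourly_mean = quarter_hourly.resample("1h").mean()    # like method="linear": 2.5, 6.5
hourly_sum = quarter_hourly.resample("1h").sum()      # like method="sum": 10.0, 26.0
hourly_first = quarter_hourly.resample("1h").first()  # like method="ffill"/"first": 1.0, 5.0

# Upsampling 1 h -> 15 min: new rows start as NaN, then the method fills them.
upsampled = hourly_mean.resample("15min").asfreq()
filled_linear = upsampled.interpolate(method="linear")  # like method="linear"
filled_ffill = upsampled.ffill()                        # like method="ffill"

print(filled_linear)
print(filled_ffill)
```

Note that `"sum"` changes the scale of the data when downsampling, so it suits quantities that accumulate over an interval (such as energy) rather than rates or prices.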
Selecting Columns
By default, the resampling method is applied to all columns. You can restrict it to specific columns with the columns parameter. Columns that are not listed are still moved to the new frequency, so upsampling introduces NaNs in them and downsampling drops points, but the specified method is not applied to them.
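
The effect of restricting the method to some columns can be sketched in plain pandas. The example assumes the behavior described above (every column is moved to the new frequency, only the selected ones are filled); the frame and the `selected` list are hypothetical.

```python
import pandas as pd

# Hypothetical hourly frame with two columns.
idx = pd.date_range("2024-01-01 00:00", periods=3, freq="1h")
hourly = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "load": [100.0, 200.0, 300.0]}, index=idx
)

# Upsample every column to 15 minutes; the new rows are NaN everywhere.
upsampled = hourly.resample("15min").asfreq()

# Apply the fill method to the selected column only.
selected = ["price"]
upsampled[selected] = upsampled[selected].interpolate(method="linear")

# 'price' is now gap-free, 'load' still has NaNs between the hourly values.
print(upsampled)
```

With the transformer, the same restriction is expressed through columns: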
```python
# Interpolate only 'price', leave 'load' with NaNs where gaps occur
transformer = ResampleTransformer(freq="15min", columns=["price"])
df = transformer.transform(df)
```

Example: Handling DST Gaps
When converting to a local timezone with daylight saving time, a gap can appear in the local timestamps (e.g., 2:00 AM does not exist on the spring-forward day). Placing a ResampleTransformer after the timezone conversion fills this gap.
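
The gap itself can be reproduced with plain pandas. The sketch below assumes that the timezone step leaves naive local timestamps (the source does not state this), and it uses hypothetical load values around the 2024 spring-forward transition in Poland on 31 March.

```python
import pandas as pd

# Hourly UTC data spanning the transition (01:00 UTC on 2024-03-31).
utc_index = pd.date_range("2024-03-31 00:00", periods=4, freq="1h", tz="UTC")
df = pd.DataFrame({"load": [10.0, 20.0, 30.0, 40.0]}, index=utc_index)

# Convert to local wall-clock time and drop the timezone
# (assumed behaviour of the timezone conversion step).
local = df.tz_convert("Europe/Warsaw").tz_localize(None)
print(local.index)  # 01:00, 03:00, 04:00, 05:00 -> 02:00 is missing

# Resampling to 1 h recreates the missing row as NaN; interpolation fills it.
filled = local.resample("1h").mean().interpolate(method="linear")
print(filled)
```

With epftoolbox2, the same two steps are chained in a pipeline: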
```python
from epftoolbox2.pipelines import DataPipeline
from epftoolbox2.data.sources import EntsoeSource
from epftoolbox2.data.transformers import TimezoneTransformer, ResampleTransformer

pipeline = (
    DataPipeline()
    .add_source(EntsoeSource(country_code="PL", api_key="...", type=["load"]))
    # First convert to local timezone (creates DST gap)
    .add_transformer(TimezoneTransformer(target_tz="Europe/Warsaw"))
    # Then resample to fill the gap
    .add_transformer(ResampleTransformer(freq="1h", method="linear"))
)

df = pipeline.run(start="2024-03-30", end="2024-04-01")
# The missing 2:00 AM hour during DST transition is now filled
```

Notes
- Input DataFrame must have a DatetimeIndex
- Output values are rounded to 3 decimal places
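
Both notes can be illustrated with plain pandas. The sketch below shows one way to give a frame a DatetimeIndex before transforming it, and what rounding to 3 decimal places does to the values; the column names and data are hypothetical, and `DataFrame.round(3)` is used here only as a stand-in for the rounding the transformer applies.

```python
import pandas as pd

# Hypothetical raw frame with timestamps stored as strings in a column.
raw = pd.DataFrame(
    {
        "timestamp": ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"],
        "price": [10.12345, 11.98765, 12.5],
    }
)

# Parse the strings and move them into the index to obtain a DatetimeIndex.
df = raw.assign(timestamp=pd.to_datetime(raw["timestamp"])).set_index("timestamp")
print(type(df.index).__name__)  # DatetimeIndex

# Rounding to 3 decimal places, as described in the notes.
rounded = df.round(3)
print(rounded)
```

If downstream steps need more than 3 decimal places of precision, it may be worth scaling the values before resampling.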