ResampleTransformer

Resamples data to a specified frequency, applying aggregation (when downsampling) or interpolation (when upsampling).

Basic Usage

from epftoolbox2.data.transformers import ResampleTransformer

# Resample an existing DataFrame (with a DatetimeIndex) to hourly frequency
transformer = ResampleTransformer(freq="1h")
df = transformer.transform(df)

Parameters

Parameter   Type                      Default     Description
freq        str                       "1h"        Pandas frequency string
method      str                       "linear"    Method specifying aggregation and interpolation behavior
columns     list[str] | str | None    None        Columns to interpolate. If None, all columns are interpolated.

Frequency Strings

String     Meaning
"1h"       1 hour
"30min"    30 minutes
"15min"    15 minutes
"1D"       1 day
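
As a quick check of what these strings mean, the sketch below passes each one to plain pandas and records the spacing it produces. It uses only `pd.date_range` and assumes the transformer hands `freq` to pandas unchanged, which the parameter table suggests but does not spell out.

```python
import pandas as pd

# Map each frequency string from the table to the step pandas generates for it.
steps = {}
for freq in ["1h", "30min", "15min", "1D"]:
    idx = pd.date_range("2024-01-01", periods=3, freq=freq)
    steps[freq] = idx[1] - idx[0]  # spacing between consecutive timestamps

for freq, step in steps.items():
    print(freq, "->", step)
```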

Methods

The method parameter controls behavior for both directions:

  • Downsampling (e.g. 15 min → 1 h) – selects the aggregation function.
  • Upsampling (e.g. 1 h → 15 min) – selects how NaN gaps are filled.

Method      Aggregation (↓)    Fill Method (↑)
"linear"    mean               Linear interpolation
"ffill"     first              Forward-fill
"bfill"     last               Backward-fill
"sum"       sum                Forward-fill
"first"     first              Forward-fill
"last"      last               Backward-fill
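
To make the two directions concrete, here is a minimal sketch in plain pandas of what the "linear" row of the table describes: mean aggregation when downsampling and linear interpolation when upsampling. It illustrates the behavior under the assumption that the transformer follows standard pandas resampling semantics; it is not the library's own implementation, and the sample values are made up.

```python
import pandas as pd

# Downsampling: four 15-minute readings collapse into one hourly value.
idx_15 = pd.date_range("2024-01-01 00:00", periods=4, freq="15min")
quarter_hourly = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]}, index=idx_15)
hourly = quarter_hourly.resample("1h").mean()  # aggregation: mean

# Upsampling: hourly readings gain new 30-minute rows, which start as NaN
# and are then filled by linear interpolation.
idx_h = pd.date_range("2024-01-01 00:00", periods=3, freq="1h")
hourly_in = pd.DataFrame({"price": [10.0, 20.0, 40.0]}, index=idx_h)
half_hourly = hourly_in.resample("30min").asfreq().interpolate(method="linear")

print(hourly)
print(half_hourly)
```

The other rows of the table would swap `mean()` for `first()`, `last()`, or `sum()`, and `interpolate()` for `ffill()` or `bfill()`.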

Selecting Columns

By default, the resampling method is applied to all columns. Use the columns parameter to restrict it to specific columns. The remaining columns are still moved to the new frequency, so they gain NaNs when upsampling or lose points when downsampling, but they are not processed by the specified method.

# Interpolate only 'price', leave 'load' with NaNs where gaps occur
transformer = ResampleTransformer(freq="15min", columns=["price"])
df = transformer.transform(df)
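
The effect of columns can be sketched in plain pandas: every column is moved to the new index, but only the selected ones are filled. This is an illustration under the assumption that unselected columns simply keep the NaNs introduced by upsampling, as described above; the `selected` variable and the sample data are hypothetical.

```python
import pandas as pd

idx = pd.date_range("2024-01-01 00:00", periods=2, freq="1h")
df = pd.DataFrame({"price": [10.0, 30.0], "load": [100.0, 200.0]}, index=idx)

# Upsample all columns to 30 minutes; the new 00:30 row is NaN everywhere.
upsampled = df.resample("30min").asfreq()

# Fill only the selected column, mirroring columns=["price"].
selected = ["price"]
upsampled[selected] = upsampled[selected].interpolate(method="linear")

print(upsampled)  # 'price' is filled at 00:30, 'load' stays NaN there
```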

Example: Handling DST Gaps

When converting to a local timezone with daylight saving time, a gap can occur (e.g., 2:00 AM doesn’t exist during spring-forward). Use ResampleTransformer to fill these gaps:

from epftoolbox2.pipelines import DataPipeline
from epftoolbox2.data.sources import EntsoeSource
from epftoolbox2.data.transformers import TimezoneTransformer, ResampleTransformer

pipeline = (
    DataPipeline()
    .add_source(EntsoeSource(country_code="PL", api_key="...", type=["load"]))
    # First convert to local timezone (creates DST gap)
    .add_transformer(TimezoneTransformer(target_tz="Europe/Warsaw"))
    # Then resample to fill the gap
    .add_transformer(ResampleTransformer(freq="1h", method="linear"))
)
df = pipeline.run(start="2024-03-30", end="2024-04-01")
# The missing 2:00 AM hour during DST transition is now filled
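
The gap itself can be reproduced without the pipeline. The sketch below uses plain pandas and assumes that the timezone step yields naive local wall-clock timestamps (an assumption about TimezoneTransformer, not something this page confirms); the load values are made up.

```python
import pandas as pd

# Hourly UTC data spanning the 2024-03-31 spring-forward in Poland.
utc_idx = pd.date_range("2024-03-30 23:00", periods=4, freq="1h", tz="UTC")
df = pd.DataFrame({"load": [100.0, 110.0, 130.0, 140.0]}, index=utc_idx)

# Assumption: local wall-clock time with the timezone information dropped.
# The index becomes 00:00, 01:00, 03:00, 04:00, so 02:00 is missing.
local = df.tz_convert("Europe/Warsaw").tz_localize(None)

# Resampling to 1 h reinserts 02:00 as NaN; linear interpolation fills it.
filled = local.resample("1h").asfreq().interpolate(method="linear")

print(filled)
```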

Notes

  • Input DataFrame must have a DatetimeIndex
  • Output values are rounded to 3 decimal places
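
Both notes can be illustrated with plain pandas. The sketch below shows one way to give a DataFrame a DatetimeIndex before transforming it, and what rounding to 3 decimal places looks like; the column name `timestamp` and the values are hypothetical, and the rounding call is only an illustration of the stated behavior, not the transformer's code.

```python
import pandas as pd

# A frame whose timestamps arrive as strings in an ordinary column.
raw = pd.DataFrame(
    {"timestamp": ["2024-01-01 00:00", "2024-01-01 01:00"], "price": [10.0, 11.0]}
)

# Parse the strings and promote them to the index to get a DatetimeIndex.
df = raw.assign(timestamp=pd.to_datetime(raw["timestamp"])).set_index("timestamp")
has_datetime_index = isinstance(df.index, pd.DatetimeIndex)

# Rounding to 3 decimal places, as the output is described to be.
rounded = (df / 3).round(3)

print(has_datetime_index)
print(rounded)
```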