

Model Pipeline

The ModelPipeline class trains and evaluates forecasting models with rolling-window validation.

Basic Usage

from epftoolbox2.pipelines import ModelPipeline
from epftoolbox2.models import OLSModel, LassoCVModel
from epftoolbox2.evaluators import MAEEvaluator
from epftoolbox2.exporters import ExcelExporter, TerminalExporter

# df: a pandas DataFrame with a DatetimeIndex containing the target and predictor columns
predictors = ["load_actual", "load_actual_d-1", "price_d-1"]  # see Predictor Specification

pipeline = (
    ModelPipeline()
    .add_model(OLSModel(predictors=predictors, name="OLS"))
    .add_model(LassoCVModel(predictors=predictors, name="LassoCV"))
    .add_evaluator(MAEEvaluator())
    .add_exporter(TerminalExporter())
    .add_exporter(ExcelExporter("results.xlsx"))
)
report = pipeline.run(
    data=df,
    test_start="2024-02-01",
    test_end="2024-03-01",
    target="price",
    horizon=7,
    freq="1h",  # Optional; sub-hourly frequencies such as "15min" are also supported
    save_dir="results",
)

Pipeline Components

Models

Evaluators

Exporters


Predictor Specification

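Predictors can be specified in four ways. The simplest is a list of column names, where a suffix such as "_d-1" refers to that column lagged by one day: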

predictors = [
"load_actual",
"load_actual_d-1",
"price_d-1",
]
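The string form above reads as either "column" or "column_d-N". As a minimal sketch, assuming "_d-N" means the value at the same timestamp N days earlier, such strings could be turned into feature columns with pandas. The build_features helper and its regular expression are hypothetical; epftoolbox2 may parse predictors differently, and the other specification forms are not covered here.

```python
import re

import pandas as pd

# Hypothetical rule: "<column>_d-<N>" means <column> lagged by N days.
_LAG_PATTERN = re.compile(r"^(?P<col>.+)_d-(?P<days>\d+)$")


def build_features(df: pd.DataFrame, predictors: list[str]) -> pd.DataFrame:
    """Return one column per predictor string, under the assumed suffix semantics."""
    features = {}
    for name in predictors:
        match = _LAG_PATTERN.match(name)
        if match is None:
            # Plain column name: use the column unchanged.
            features[name] = df[name]
        else:
            col, days = match["col"], int(match["days"])
            # Shift the index N days forward, then align to the original timestamps;
            # the first N days have no lagged value and become NaN.
            shifted = df[col].shift(freq=pd.Timedelta(days=days))
            features[name] = shifted.reindex(df.index)
    return pd.DataFrame(features, index=df.index)
```

Using a time-based shift (freq=...) rather than a row-count shift keeps the lag correct for any sampling frequency, including "15min".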

Parallelism

Models use a process pool with inner thread pools. Each worker process handles one forecast day (24 hours × horizon at the default freq="1h"), which keeps results consistent and progress tracking stable.
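The process-pool/thread-pool layout can be sketched with Python's concurrent.futures: one process task per forecast day, and inside it a thread pool over every (hour, horizon) pair. This is an illustration under assumptions, not the library's scheduler; fit_one, forecast_day, and run_days are hypothetical names, and fit_one returns a dummy number where a real model would be fitted.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

Task = tuple[str, int, int]  # (forecast day, hour, horizon in days)


def fit_one(task: Task) -> tuple[str, int, int, float]:
    """Placeholder for fitting one model and predicting one hour (hypothetical)."""
    day, hour, horizon = task
    return day, hour, horizon, float(hour * horizon)  # dummy "forecast"


def forecast_day(day: str, horizon: int, threads: int) -> list[tuple[str, int, int, float]]:
    """Work for one process: all 24 hours x horizon tasks of a single day, on a thread pool."""
    tasks = [(day, hour, h) for h in range(1, horizon + 1) for hour in range(24)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in task order, so the output order is deterministic.
        return list(pool.map(fit_one, tasks))


def run_days(days: list[str], horizon: int, processes: int, threads: int) -> dict[str, list]:
    """Submit one task per forecast day to the process pool and collect results by day."""
    with ProcessPoolExecutor(max_workers=processes) as pool:
        futures = {day: pool.submit(forecast_day, day, horizon, threads) for day in days}
        return {day: fut.result() for day, fut in futures.items()}
```

run_days should be called under an if __name__ == "__main__": guard, because ProcessPoolExecutor may start workers by re-importing the main module (the spawn and forkserver start methods). Keying results by day keeps the output independent of which worker finishes first.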

| Variable | Default | Description |
|---|---|---|
| MAX_PROCESSES | cpu_count // THREADS_PER_PROCESS | Number of worker processes |
| THREADS_PER_PROCESS | 16 | Threads per worker process (alias: MAX_THREADS) |

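import os

# Set these before importing numpy, epftoolbox2, or any library that starts BLAS/OpenMP threads.
os.environ["MAX_PROCESSES"] = "4"
os.environ["THREADS_PER_PROCESS"] = "16"  # 4 processes × 16 threads = 64 threads in total
os.environ["OMP_NUM_THREADS"] = "1"  # Prevent BLAS oversubscription
# Free-threaded builds only (Python 3.13t+). PYTHON_GIL is read at interpreter startup,
# so setting it here cannot change the running interpreter; it only reaches workers started
# later as new interpreters. For the main process, launch with PYTHON_GIL=0 or python -X gil=0.
os.environ["PYTHON_GIL"] = "0"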

See Installation for full setup including platform notes.


Feature Scaling

Models automatically apply StandardScaler:

  1. Numeric features are standardized (mean=0, std=1)

  2. Binary features (0/1) are auto-detected and skipped

  3. Target variable is scaled; predictions are inverse-scaled back to the original units
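The three steps can be sketched in numpy, paired with an ordinary least-squares fit so the inverse scaling is visible end to end. The binary-detection rule (a column counts as binary if every value is 0 or 1), the constant-column guard, and the fit_scaler/fit_predict helpers are assumptions; the library's StandardScaler-based code may differ in detail. As with StandardScaler, the standard deviation is the population one (ddof=0), and all statistics come from the training data only.

```python
import numpy as np


def fit_scaler(X):
    """Column means/stds; binary (0/1) columns get mean 0, std 1 so they pass through."""
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)  # population std (ddof=0)
    is_binary = np.isin(X, (0.0, 1.0)).all(axis=0)  # assumed detection rule
    mean[is_binary], std[is_binary] = 0.0, 1.0
    std[std == 0.0] = 1.0  # guard: constant columns would otherwise divide by zero
    return mean, std


def fit_predict(X_train, y_train, X_test):
    """Scale X and y, fit least squares with an intercept, inverse-scale the predictions."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_mean, x_std = fit_scaler(X_train)  # fitted on training data only
    y_mean, y_std = y_train.mean(), y_train.std() or 1.0
    design = np.column_stack([np.ones(len(X_train)), (X_train - x_mean) / x_std])
    coef, *_ = np.linalg.lstsq(design, (y_train - y_mean) / y_std, rcond=None)
    X_test_s = (np.asarray(X_test, dtype=float) - x_mean) / x_std
    y_scaled = np.column_stack([np.ones(len(X_test_s)), X_test_s]) @ coef
    return y_scaled * y_std + y_mean  # back to the target's original units
```

Leaving 0/1 columns untouched keeps dummy variables (for example, weekday or holiday flags) interpretable as on/off switches instead of turning them into fractional values.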


Run Parameters

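| Parameter | Type | Default | Description |
|---|---|---|---|
| data | DataFrame | Required | Input data with a DatetimeIndex |
| test_start | str | Required (optional with forecast_only=True) | Test period start |
| test_end | str | Required (optional with forecast_only=True) | Test period end |
| target | str | "price" | Target column name |
| horizon | int | 7 | Maximum forecast horizon, in days |
| freq | str | "1h" | Forecasting frequency (e.g. "15min") |
| save_dir | str | None | Directory where results are saved incrementally |
| forecast_only | bool | False | Skip evaluators and exporters; if omitted, test_start and test_end default to "today" |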
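Rolling-window validation over test_start and test_end can be pictured as one split per test day: the model trains on a fixed-length window of history ending just before that day, then forecasts the day's periods at the chosen freq. This pandas sketch is illustrative only: window_days is a hypothetical parameter, test_end is treated as exclusive, and the library's actual window length and boundary handling may differ.

```python
import pandas as pd


def rolling_splits(index, test_start, test_end, window_days=364, freq="1h"):
    """Yield (train_index, test_index) for each test day (hypothetical sketch)."""
    index = pd.DatetimeIndex(index)
    periods_per_day = pd.Timedelta(days=1) // pd.Timedelta(freq)  # 24 for "1h", 96 for "15min"
    # Assumption: test_end is exclusive, so "2024-02-01".."2024-03-01" covers February.
    for day in pd.date_range(test_start, test_end, freq="D", inclusive="left"):
        window_start = day - pd.Timedelta(days=window_days)
        # Training data: the window_days immediately before the forecast day.
        train_index = index[(index >= window_start) & (index < day)]
        test_index = pd.date_range(day, periods=periods_per_day, freq=freq)
        yield train_index, test_index
```

Because the window start moves forward with each test day, every model sees the same amount of history, which is what distinguishes a rolling window from an expanding one.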

EvaluationReport

The pipeline returns an EvaluationReport:

report.summary() # Overall metrics
report.by_hour() # Breakdown by hour
report.by_horizon() # Breakdown by horizon
report.by_hour_horizon() # Combined breakdown
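Each breakdown can be pictured as a grouped mean of absolute errors, which is what an MAE evaluator measures. This pandas sketch assumes a hypothetical long-format results table with model, hour, horizon, actual, and forecast columns; EvaluationReport's real storage and method internals may differ, and mae_table is not part of the library.

```python
import pandas as pd


def mae_table(results: pd.DataFrame, by: tuple[str, ...] = ()):
    """Mean absolute error per model, optionally broken down by extra columns."""
    scored = results.assign(abs_error=(results["actual"] - results["forecast"]).abs())
    mae = scored.groupby([*by, "model"])["abs_error"].mean()
    # With extra keys, pivot models into columns: one row per hour/horizon combination.
    return mae.unstack("model") if by else mae


# Rough correspondence to the report methods (assumed, for illustration):
#   summary()         ~ mae_table(results)
#   by_hour()         ~ mae_table(results, by=("hour",))
#   by_horizon()      ~ mae_table(results, by=("horizon",))
#   by_hour_horizon() ~ mae_table(results, by=("hour", "horizon"))
```

Placing models in columns makes it easy to compare them side by side for each hour or horizon.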

YAML Serialization

Save and reload a complete pipeline configuration:

pipeline.save("model_pipeline.yaml")
pipeline = ModelPipeline.load("model_pipeline.yaml")

See ModelPipeline Serialization for the full YAML format and serialization rules.


Learn More