Installation
1. Install the package

   With pip:

       pip install epftoolbox2

   With uv:

       uv add epftoolbox2

   From source:

       git clone https://github.com/dawidlinek/epftoolbox2.git
       cd epftoolbox2
       pip install -e .

2. Verify installation

       import epftoolbox2
       epftoolbox2.verify()

3. Configure for parallel processing (optional but recommended)

   See the section below for GIL-free Python setup.
Requirements
- Python 3.10+ (Python 3.14t recommended for parallel processing)
- pandas, numpy, scikit-learn
- requests (for API sources)
- openpyxl (for Excel export)
- rich (for terminal output)
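The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names `missing_modules` and `python_ok` are hypothetical and not part of epftoolbox2, and the module list simply mirrors the packages named above.

```python
import importlib.util
import sys

# Import names for the packages listed above; note that scikit-learn
# is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "sklearn", "requests", "openpyxl", "rich"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

Installing with pip or uv normally pulls these dependencies in automatically, so a check like this is mainly useful when debugging a broken environment.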
Parallel Model Training (Python 3.13t+)
Models run in parallel using a process pool with inner thread pools: one worker process per group of cores, each using multiple threads internally. Splitting the work this way avoids GIL contention between processes and, with BLAS pinned to one thread per process, avoids BLAS oversubscription regardless of how many cores you have.
┌─ Process 1 ─────────────┐ ┌─ Process 2 ─────────────┐
│ 16 threads              │ │ 16 threads              │
│ processes days 0..N     │ │ processes days N..M     │
│ BLAS threads = 1        │ │ BLAS threads = 1        │
└─────────────────────────┘ └─────────────────────────┘

For best results use Python 3.14t (free-threading build), which removes the GIL and allows true parallel numpy execution within each process.
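The layout described above (an outer process pool, each worker running its own thread pool over a contiguous range of days) can be sketched with the standard library. This is an illustration of the general pattern, not epftoolbox2's actual implementation; `split_days`, `fit_one_day`, `process_chunk`, and `run_all` are hypothetical names, and `fit_one_day` is a placeholder for real model training.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Pin BLAS before any numeric library is imported (here or in workers).
os.environ.setdefault("OMP_NUM_THREADS", "1")


def split_days(n_days, n_chunks):
    """Split range(n_days) into up to n_chunks contiguous (start, stop) ranges."""
    size, extra = divmod(n_days, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:  # skip empty chunks when n_chunks > n_days
            bounds.append((start, stop))
        start = stop
    return bounds


def fit_one_day(day):
    """Hypothetical stand-in for training and predicting a single day."""
    return day * day


def process_chunk(bounds, n_threads=4):
    """Run inside one worker process: fan the chunk's days out to threads."""
    start, stop = bounds
    with ThreadPoolExecutor(max_workers=n_threads) as threads:
        return list(threads.map(fit_one_day, range(start, stop)))


def run_all(n_days, n_processes=2):
    """Outer process pool: one contiguous chunk of days per worker process."""
    chunks = split_days(n_days, n_processes)
    with ProcessPoolExecutor(max_workers=n_processes) as procs:
        results = procs.map(process_chunk, chunks)
        return [value for chunk in results for value in chunk]


if __name__ == "__main__":
    print(run_all(10))
```

On a standard (GIL) build the inner threads only run in parallel while they are inside code that releases the GIL; on a free-threaded build they can also run pure-Python code concurrently.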
Installing Python 3.14t
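Before installing a new interpreter, it can be worth checking whether the one you already have is a free-threaded build. A minimal sketch using only the standard library follows; the helper name `gil_status` is hypothetical, and `sys._is_gil_enabled()` is only available on Python 3.13 and later, so the code falls back to assuming the GIL is active on older versions.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this is a free-threaded build and whether the GIL is active."""
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


if __name__ == "__main__":
    print(sys.version)
    print(gil_status())
```

The same check is useful after installation to confirm that the virtual environment really points at the `t` build.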
With uv:

uv python install 3.14t
uv venv --python 3.14t

With pyenv:

pyenv install 3.14.0t
pyenv local 3.14.0t

Script Setup
Place this block at the very top of your script:
import os

# Enable free-threading (Python 3.13t+)
os.environ["PYTHON_GIL"] = "0"

# Pin BLAS to 1 thread per process to prevent oversubscription
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

# Parallelism configuration
# MAX_PROCESSES × THREADS_PER_PROCESS should equal your physical core count
os.environ["MAX_PROCESSES"] = "4"  # worker processes
os.environ["THREADS_PER_PROCESS"] = "16"  # threads per process

# Then import everything else
from epftoolbox2.pipelines import DataPipeline, ModelPipeline
from epftoolbox2.models import OLSModel, LassoCVModel

Environment Variables
| Variable | Default | Description |
|---|---|---|
| MAX_PROCESSES | cpu_count // THREADS_PER_PROCESS | Number of worker processes |
| THREADS_PER_PROCESS | 16 | Threads per worker process (also accepts MAX_THREADS) |
| OMP_NUM_THREADS | system default | OpenMP threads per process (set to 1) |
| MKL_NUM_THREADS | system default | MKL threads per process (set to 1) |
| OPENBLAS_NUM_THREADS | system default | OpenBLAS threads per process (set to 1) |
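The defaults in the table can be read as a small resolution rule. The sketch below is one plausible reading, not the library's actual code: `resolve_parallelism` is a hypothetical helper, and both the precedence of THREADS_PER_PROCESS over MAX_THREADS and the clamp to at least one process are assumptions.

```python
import os


def resolve_parallelism(env=None, cpu_count=None):
    """Return (processes, threads_per_process) following the table above."""
    env = os.environ if env is None else env
    cpu_count = cpu_count or os.cpu_count() or 1

    # Assumption: THREADS_PER_PROCESS takes precedence over the MAX_THREADS alias.
    threads = int(env.get("THREADS_PER_PROCESS") or env.get("MAX_THREADS") or 16)

    # Assumption: clamp to at least one process on machines with few cores.
    default_processes = max(1, cpu_count // threads)
    processes = int(env.get("MAX_PROCESSES") or default_processes)
    return processes, threads


if __name__ == "__main__":
    processes, threads = resolve_parallelism()
    print(f"{processes} processes x {threads} threads")
```

With no variables set, a 64-core machine resolves to 4 processes of 16 threads, which matches the values used in the script setup block.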
Platform Notes
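On platforms where worker processes are started with fork, they inherit the parent's memory directly. No if __name__ == '__main__': guard is needed in your scripts.
On platforms where worker processes are started with spawn (the only start method available on Windows), each worker starts a fresh Python interpreter and re-imports your script. Scripts that call model.run() at top level must be wrapped: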
if __name__ == '__main__':
    report = model.run(...)
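If it is unclear which case applies on a given machine, the interpreter's default start method can be inspected. This is a minimal sketch; `needs_main_guard` is a hypothetical helper, and the interpreter's default may differ from the method epftoolbox2 actually selects, so treat the result as a hint only.

```python
import multiprocessing as mp


def needs_main_guard(start_method):
    """Only fork lets workers skip re-importing the main script."""
    return start_method != "fork"


if __name__ == "__main__":
    method = mp.get_start_method()
    print("default start method:", method)
    print("main guard required:", needs_main_guard(method))
```

Adding the guard is harmless under fork as well, so wrapping top-level calls unconditionally keeps a script portable across platforms.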