# Data Caching
epftoolbox2 caches downloaded data to avoid redundant API calls and speed up subsequent runs.
## Cache Flow
1. **Check cache** - The pipeline checks if data exists in the cache
2. **Validate metadata** - Verify the source configuration matches
3. **Fetch missing data** - Only download data not in the cache
4. **Merge and store** - Combine with cached data and save
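The four steps above can be sketched as a single function. This is an illustrative outline, not the library's actual implementation: `fetch_fn` stands in for a source's download call, and the single-file storage and hashing scheme are assumptions for the sketch.

```python
import hashlib
import json
from pathlib import Path

import pandas as pd

def fetch_with_cache(fetch_fn, config, start, end, cache_dir=".cache/sources"):
    """Sketch of the four-step cache flow (check, validate, fetch, merge)."""
    # Cache key: a stable hash of the source configuration
    key = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:12]
    src_dir = Path(cache_dir) / key
    meta_path, data_path = src_dir / "metadata.json", src_dir / "data.csv"

    cached = None
    # 1. Check cache + 2. Validate metadata: only reuse data written under the same config
    if meta_path.exists() and json.loads(meta_path.read_text()) == config:
        cached = pd.read_csv(data_path, index_col=0, parse_dates=True)

    # 3. Fetch missing data: refetch whenever the cached range does not cover [start, end]
    if (cached is None
            or cached.index.min() > pd.Timestamp(start)
            or cached.index.max() < pd.Timestamp(end)):
        fresh = fetch_fn(start, end)
        # 4. Merge and store: combine, drop duplicate timestamps, persist
        merged = fresh if cached is None else pd.concat([cached, fresh])
        cached = merged[~merged.index.duplicated(keep="last")].sort_index()
        src_dir.mkdir(parents=True, exist_ok=True)
        cached.to_csv(data_path)
        meta_path.write_text(json.dumps(config, sort_keys=True))

    return cached.loc[start:end]
```

On a second run with an unchanged configuration, the function never calls `fetch_fn` at all, which is the point of step 3: the network is touched only for data that is not already on disk.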
## Enabling Caching
```python
# Enable source-level caching (recommended)
df = pipeline.run(start="2024-01-01", end="2024-06-01", cache=True)

# Or cache entire pipeline output to a single file
df = pipeline.run(start="2024-01-01", end="2024-06-01", cache="my_data.csv")
```

## Cache Directory Structure
```
.cache/
└── sources/
    ├── a1b2c3d4e5f6/              # EntsoeSource hash
    │   ├── metadata.json
    │   ├── data_20230101_20240101.csv
    │   └── data_20240101_20240601.csv
    └── f6e5d4c3b2a1/              # OpenMeteoSource hash
        ├── metadata.json
        └── data_20230101_20240601.csv
```
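To inspect what is currently cached, a small helper can walk this directory layout. The helper name and logic are illustrative, not part of the library:

```python
from pathlib import Path

def list_cache(cache_dir=".cache/sources"):
    """Map each cached source hash to its cached data files."""
    return {
        d.name: sorted(p.name for p in d.glob("data_*.csv"))
        for d in Path(cache_dir).iterdir()
        if d.is_dir()
    }
```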
## How It Works
### Cache Key Generation
Each source generates a unique cache key based on its configuration:
```python
# These create different cache directories:
EntsoeSource(country_code="PL", type=["load"])
EntsoeSource(country_code="PL", type=["load", "price"])
EntsoeSource(country_code="DE", type=["load"])
```

## Supported Sources
| Source | Caching |
|---|---|
| `EntsoeSource` | ✅ Yes |
| `OpenMeteoSource` | ✅ Yes |
| `CalendarSource` | ❌ No (generated locally) |
| `CsvSource` | ❌ No (reads from file) |
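The library's exact hashing scheme for cache keys is internal, but the idea from the Cache Key Generation section can be sketched by hashing a canonical form of the configuration. Everything below (function name, SHA-256, 12-character truncation) is an illustrative assumption:

```python
import hashlib
import json

def cache_key(config: dict) -> str:
    """Derive a short, stable key from a source configuration.
    Illustrative only; the library's actual scheme may differ."""
    payload = json.dumps(config, sort_keys=True)  # canonical form: key order cannot change the hash
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# Each distinct configuration maps to its own cache directory:
k1 = cache_key({"source": "EntsoeSource", "country_code": "PL", "type": ["load"]})
k2 = cache_key({"source": "EntsoeSource", "country_code": "PL", "type": ["load", "price"]})
k3 = cache_key({"source": "EntsoeSource", "country_code": "DE", "type": ["load"]})
```

Sorting the keys before hashing matters: two configurations that differ only in dictionary key order must land in the same cache directory.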
## Clearing Cache
```python
import shutil

# Clear all cache
shutil.rmtree(".cache")

# Or clear a specific source's cache
shutil.rmtree(".cache/sources/a1b2c3d4e5f6")
```
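Note that `shutil.rmtree` raises `FileNotFoundError` when the target does not exist (e.g. on a fresh checkout where nothing has been cached yet). A small guard makes clearing idempotent; the helper name and signature here are illustrative, not part of the library:

```python
import shutil
from pathlib import Path

def clear_cache(source_hash=None, cache_dir=".cache"):
    """Remove the whole cache, or only one source's subdirectory if a hash is given."""
    target = (Path(cache_dir) if source_hash is None
              else Path(cache_dir) / "sources" / source_hash)
    if target.exists():  # skip silently when there is nothing to clear
        shutil.rmtree(target)
```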