
Data Caching

epftoolbox2 caches downloaded data to avoid redundant API calls and speed up subsequent runs.

Cache Flow

  1. Check cache - The pipeline checks whether the requested data already exists in the cache

  2. Validate metadata - The cached source configuration is verified against the current one

  3. Fetch missing data - Only data not already in the cache is downloaded

  4. Merge and store - Newly fetched data is merged with the cached data and saved
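The four steps above can be sketched end to end. This is an illustrative, self-contained toy (the `TinyCache` class, `fetch` stub, and `run` function are hypothetical names, not epftoolbox2's actual internals):

```python
import pandas as pd

class TinyCache:
    """Toy in-memory cache used only for this sketch."""
    def __init__(self):
        self.df = None    # cached DataFrame
        self.meta = None  # source configuration it was built with

    def store(self, meta, df):
        self.meta, self.df = meta, df

def fetch(start, end):
    """Stand-in for a network call: one row per day, counts invocations."""
    fetch.calls += 1
    idx = pd.date_range(start, end, freq="D")
    return pd.DataFrame({"load": range(len(idx))}, index=idx)
fetch.calls = 0

def run(start, end, cache, meta):
    if cache.df is None or cache.meta != meta:            # steps 1-2: no cache,
        df = fetch(start, end)                            # or config changed
    else:
        want = pd.date_range(start, end, freq="D")
        missing = want.difference(cache.df.index)         # step 3: missing dates
        if len(missing):
            fresh = fetch(missing.min(), missing.max())
            df = pd.concat([cache.df, fresh])
            df = df[~df.index.duplicated()].sort_index()  # step 4: merge
        else:
            df = cache.df                                 # fully covered
    cache.store(meta, df)
    return df.loc[start:end]
```

Running the same request twice triggers no second fetch; widening the date range fetches only the missing tail.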

Enabling Caching

# Enable source-level caching (recommended)
df = pipeline.run(start="2024-01-01", end="2024-06-01", cache=True)
# Or cache entire pipeline output to a single file
df = pipeline.run(start="2024-01-01", end="2024-06-01", cache="my_data.csv")

Cache Directory Structure

  • .cache/
    • sources/
      • a1b2c3d4e5f6/ (EntsoeSource hash)
        • metadata.json
        • data_20230101_20240101.csv
        • data_20240101_20240601.csv
      • f6e5d4c3b2a1/ (OpenMeteoSource hash)
        • metadata.json
        • data_20230101_20240601.csv
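Given that layout, you can inspect the cache directly. The helper below is a sketch; the keys inside metadata.json are epftoolbox2 internals, so it just reports whatever it finds:

```python
import json
from pathlib import Path

def list_cache(root=".cache/sources"):
    """Return (directory name, metadata dict, data files) per cached source."""
    entries = []
    for src_dir in sorted(Path(root).iterdir()):
        if not src_dir.is_dir():
            continue
        meta_file = src_dir / "metadata.json"
        meta = json.loads(meta_file.read_text()) if meta_file.exists() else {}
        csvs = sorted(p.name for p in src_dir.glob("data_*.csv"))
        entries.append((src_dir.name, meta, csvs))
    return entries
```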

How It Works

  1. Pipeline run - Check whether a cache exists; if not, fetch all data

  2. Config check - If the cached metadata does not match the current source configuration, refetch all data

  3. Coverage check - If the requested date range is fully covered, load from cache; if only partially covered, fetch the missing dates and merge them with the cached data

  4. Finish - Save the result to cache and return the DataFrame

Cache Key Generation

Each source generates a unique cache key based on its configuration:

# These create different cache directories:
EntsoeSource(country_code="PL", type=["load"])
EntsoeSource(country_code="PL", type=["load", "price"])
EntsoeSource(country_code="DE", type=["load"])
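A common way to derive such a key is to hash a canonical form of the source's configuration. epftoolbox2's actual hashing scheme may differ; the idea can be sketched as:

```python
import hashlib
import json

def cache_key(config: dict, length: int = 12) -> str:
    """Hash a canonical JSON form of the config, so equal configurations
    always map to the same directory name (scheme is illustrative)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:length]

# Different configs yield different keys; key order does not matter:
k1 = cache_key({"country_code": "PL", "type": ["load"]})
k2 = cache_key({"country_code": "PL", "type": ["load", "price"]})
k3 = cache_key({"type": ["load"], "country_code": "PL"})
```

Sorting the keys before hashing is what makes the key stable: two dicts with the same entries in different order produce identical JSON and therefore the same hash.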

Supported Sources

Source            Caching
EntsoeSource      ✅ Yes
OpenMeteoSource   ✅ Yes
CalendarSource    ❌ No (generated locally)
CsvSource         ❌ No (reads from file)

Clearing Cache

import shutil
# Clear all cache
shutil.rmtree(".cache")
# Or clear specific source cache
shutil.rmtree(".cache/sources/a1b2c3d4e5f6")
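If you prefer not to delete blindly, a small guard (using only the standard library) avoids an error when the directory does not exist:

```python
import shutil
from pathlib import Path

def clear_cache(path=".cache"):
    """Remove the cache directory if present; report whether anything was deleted."""
    p = Path(path)
    if p.is_dir():
        shutil.rmtree(p)
        return True
    return False
```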