
Data Caching

epftoolbox2 caches downloaded data to avoid redundant API calls and speed up subsequent runs.

Cache Flow

  1. Check cache - The pipeline checks whether the requested data already exists in the cache

  2. Validate metadata - The cached source configuration is verified against the current one

  3. Fetch missing data - Only data not already in the cache is downloaded

  4. Merge and store - Newly fetched data is merged with the cached data and saved
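The four steps above can be sketched end to end. This is an illustrative, self-contained toy (the `TinyCache` class, `fetch` stub, and `run` function are hypothetical names, not epftoolbox2's actual internals):

```python
import pandas as pd

class TinyCache:
    """Toy in-memory cache used only for this sketch."""
    def __init__(self):
        self.df = None    # cached DataFrame
        self.meta = None  # source configuration it was built with

    def store(self, meta, df):
        self.meta, self.df = meta, df

def fetch(start, end):
    """Stand-in for a network call: one row per day, counts invocations."""
    fetch.calls += 1
    idx = pd.date_range(start, end, freq="D")
    return pd.DataFrame({"load": range(len(idx))}, index=idx)
fetch.calls = 0

def run(start, end, cache, meta):
    if cache.df is None or cache.meta != meta:            # steps 1-2: no cache,
        df = fetch(start, end)                            # or config changed
    else:
        want = pd.date_range(start, end, freq="D")
        missing = want.difference(cache.df.index)         # step 3: missing dates
        if len(missing):
            fresh = fetch(missing.min(), missing.max())
            df = pd.concat([cache.df, fresh])
            df = df[~df.index.duplicated()].sort_index()  # step 4: merge
        else:
            df = cache.df                                 # fully covered
    cache.store(meta, df)
    return df.loc[start:end]
```

Running the same request twice triggers no second fetch; widening the date range fetches only the missing tail.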

Enabling Caching

# Enable source-level caching (recommended)
df = pipeline.run(start="2024-01-01", end="2024-06-01", cache=True)
# Or cache entire pipeline output to a single file
df = pipeline.run(start="2024-01-01", end="2024-06-01", cache="my_data.csv")

Cache Directory Structure

  • .cache/
    • sources/
      • a1b2c3d4e5f6/ (EntsoeSource hash)
        • metadata.json
        • data_20230101_20240101.csv
        • data_20240101_20240601.csv
      • f6e5d4c3b2a1/ (OpenMeteoSource hash)
        • metadata.json
        • data_20230101_20240601.csv
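Given that layout, you can inspect the cache directly. The helper below is a sketch; the keys inside metadata.json are epftoolbox2 internals, so it just reports whatever it finds:

```python
import json
from pathlib import Path

def list_cache(root=".cache/sources"):
    """Return (directory name, metadata dict, data files) per cached source."""
    entries = []
    for src_dir in sorted(Path(root).iterdir()):
        if not src_dir.is_dir():
            continue
        meta_file = src_dir / "metadata.json"
        meta = json.loads(meta_file.read_text()) if meta_file.exists() else {}
        csvs = sorted(p.name for p in src_dir.glob("data_*.csv"))
        entries.append((src_dir.name, meta, csvs))
    return entries
```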

How It Works

  1. Pipeline run - Check whether a cache exists; if not, fetch all data

  2. Config check - If the cached metadata does not match the current source configuration, refetch all data

  3. Coverage check - If the requested date range is fully covered, load from cache; if only partially covered, fetch the missing dates and merge them with the cached data

  4. Finish - Save the result to cache and return the DataFrame

Cache Key Generation

Each source generates a unique cache key based on its configuration:

# These create different cache directories:
EntsoeSource(country_code="PL", type=["load"])
EntsoeSource(country_code="PL", type=["load", "price"])
EntsoeSource(country_code="DE", type=["load"])
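A common way to derive such a key is to hash a canonical form of the source's configuration. epftoolbox2's actual hashing scheme may differ; the idea can be sketched as:

```python
import hashlib
import json

def cache_key(config: dict, length: int = 12) -> str:
    """Hash a canonical JSON form of the config, so equal configurations
    always map to the same directory name (scheme is illustrative)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:length]

# Different configs yield different keys; key order does not matter:
k1 = cache_key({"country_code": "PL", "type": ["load"]})
k2 = cache_key({"country_code": "PL", "type": ["load", "price"]})
k3 = cache_key({"type": ["load"], "country_code": "PL"})
```

Sorting the keys before hashing is what makes the key stable: two dicts with the same entries in different order produce identical JSON and therefore the same hash.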

Supported Sources

Source            Caching
EntsoeSource      ✅ Yes
OpenMeteoSource   ✅ Yes
CalendarSource    ❌ No (generated locally)
CsvSource         ❌ No (reads from file)

Clearing Cache

import shutil
# Clear all cache
shutil.rmtree(".cache")
# Or clear specific source cache
shutil.rmtree(".cache/sources/a1b2c3d4e5f6")
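If you prefer not to delete blindly, a small guard (using only the standard library) avoids an error when the directory does not exist:

```python
import shutil
from pathlib import Path

def clear_cache(path=".cache"):
    """Remove the cache directory if present; report whether anything was deleted."""
    p = Path(path)
    if p.is_dir():
        shutil.rmtree(p)
        return True
    return False
```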