cache
src.cache
Content-hash skip cache for idempotent stages.
A stage is skipped only when all of the following hold:
1. Every declared input file exists and has the same SHA-256 as the last
successful run.
2. The stage's module source file has the same SHA-256 as last time.
3. Every declared output exists.
4. If the stage declares a ttl (seconds), the cache entry must be
younger than that. This forces periodic re-runs for stages that
fetch live data (e.g. GitHub stats) whose per-URL caches expire.
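Conditions 1 to 3 above can be sketched roughly as follows. This is an illustration only, not the code in src/cache.py: the helper names (sha256_of, inputs_unchanged, outputs_present) and the shape of the recorded mapping are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large inputs stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inputs_unchanged(recorded: dict[str, str]) -> bool:
    """Conditions 1 and 2: every recorded file exists with the same hash.

    `recorded` (hypothetical shape) maps a file path, covering the declared
    inputs plus the stage's module source, to the SHA-256 stored after the
    last successful run.
    """
    return all(
        Path(name).is_file() and sha256_of(Path(name)) == old
        for name, old in recorded.items()
    )


def outputs_present(outputs: list[Path]) -> bool:
    """Condition 3: every declared output exists."""
    return all(p.exists() for p in outputs)
```

A stage would be a skip candidate only when both checks return True; a changed byte in any input, or a deleted output, is enough to force a re-run.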
Cache entries live in <output_dir>/_build/.cache/<stage>.hash as plain
text — one key=value line each — so they survive cleanly across pipeline
runs but are easy to inspect or delete by hand.
Only stages that opt in (by declaring inputs on src.stages.Stage)
are eligible for skipping.
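To make the on-disk layout concrete, here is a minimal sketch of reading and writing such a key=value file. The function names and the example keys (key, timestamp) are assumptions for illustration; the actual field names used by src.cache are not stated here.

```python
from pathlib import Path


def cache_path(output_dir: Path, stage_name: str) -> Path:
    """Location described above: <output_dir>/_build/.cache/<stage>.hash."""
    return output_dir / "_build" / ".cache" / f"{stage_name}.hash"


def read_entry(path: Path) -> dict[str, str]:
    """Parse key=value lines; a missing file is treated as an empty entry."""
    if not path.is_file():
        return {}
    entry = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore blank or malformed lines
            entry[key.strip()] = value.strip()
    return entry


def write_entry(path: Path, entry: dict[str, str]) -> None:
    """Write one key=value line per item, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    text = "".join(f"{k}={v}\n" for k, v in entry.items())
    path.write_text(text, encoding="utf-8")
```

Because the format is plain text, the same file can be opened in an editor or removed with a shell command without any tooling.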
compute_key(stage: Stage, output_dir: Path) -> str | None
Return a content-hash key for stage or None if any input is missing.
Source code in src/cache.py, lines 63-82.
should_skip(stage: Stage, output_dir: Path) -> bool
Return True if a previous run with identical inputs already produced every declared output file.
When the stage declares a ttl, the cache entry must also be younger
than that many seconds — otherwise the stage re-runs even if inputs are
unchanged (so that live-data stages periodically refresh).
Source code in src/cache.py, lines 89-125.
mark_done(stage: Stage, output_dir: Path) -> None
Record a successful run so a subsequent identical run can be skipped.
Source code in src/cache.py, lines 128-137.
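Since only successful runs are recorded, the call order in a driver matters. The sketch below shows one way a pipeline might sequence the three steps; run_with_cache and its callable parameters are hypothetical stand-ins, and how the real pipeline wires these functions together is not documented here.

```python
from typing import Callable


def run_with_cache(
    should_skip: Callable[[], bool],
    run: Callable[[], None],
    mark_done: Callable[[], None],
) -> str:
    """Skip when possible; record success only after the stage finishes."""
    if should_skip():
        return "skipped"
    run()  # if this raises, mark_done is never reached
    mark_done()
    return "ran"
```

Placing mark_done after run means a stage that fails partway leaves no cache entry behind, so the next invocation retries it.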
invalidate(stage: Stage, output_dir: Path) -> None
Remove the cache entry for stage, forcing a re-run next time.
Source code in src/cache.py, lines 140-144.
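Given the one-file-per-stage layout, invalidation amounts to deleting that file. A minimal sketch, assuming the <output_dir>/_build/.cache/<stage>.hash path and taking a stage name instead of a Stage object (sketch_invalidate is a hypothetical name):

```python
from pathlib import Path


def sketch_invalidate(output_dir: Path, stage_name: str) -> None:
    """Delete the stage's cache file; a missing file is not an error."""
    entry = output_dir / "_build" / ".cache" / f"{stage_name}.hash"
    entry.unlink(missing_ok=True)
```

Using missing_ok=True keeps the operation idempotent, so invalidating a stage that was never cached does nothing.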