
Implement Caching Strategy #5


Issue: Unified Caching Strategy API

Summary

The TUI implements page caching to avoid redundant parsing. This should be generalized into a configurable caching API that all render-engine tools can use.

Current State

TUI Implementation

class ContentManager:
    def __init__(self):
        self._posts_cache: Dict[str, List[Page]] = {}

    def get_all_posts(self, use_cache: bool = True) -> List[Page]:
        if use_cache and self.current_collection in self._posts_cache:
            return self._posts_cache[self.current_collection]
        # ... fetch from backend
        self._posts_cache[self.current_collection] = posts
        return posts

    def invalidate_posts_cache(self) -> None:
        self._posts_cache.pop(self.current_collection, None)

CLI Implementation

  • No caching (content is re-parsed on every command)
  • Could benefit from caching in the serve command with --reload

Site Loader

class SiteLoader:
    def __init__(self):
        self._site: Optional[Site] = None  # Simple cache

    def reload_site(self) -> None:
        self._site = None  # Cache invalidation

Proposed API

Create render_engine_api.cache with:

from enum import Enum
from pathlib import Path
from typing import Any, Optional, Callable, Dict, Generic, TypeVar

T = TypeVar('T')

class CacheStrategy(Enum):
    """Available caching strategies."""
    NONE = "none"  # No caching
    MEMORY = "memory"  # In-memory dict
    LRU = "lru"  # Least Recently Used with size limit
    TTL = "ttl"  # Time-To-Live expiration
    DISK = "disk"  # Persistent disk cache

class Cache(Generic[T]):
    """Generic cache implementation."""

    def __init__(
        self,
        strategy: CacheStrategy = CacheStrategy.MEMORY,
        max_size: Optional[int] = None,
        ttl_seconds: Optional[int] = None,
        cache_dir: Optional[Path] = None
    ):
        """Initialize cache with strategy and options."""

    def get(self, key: str) -> Optional[T]:
        """Retrieve value from cache."""

    def set(self, key: str, value: T) -> None:
        """Store value in cache."""

    def delete(self, key: str) -> None:
        """Remove specific key from cache."""

    def clear(self) -> None:
        """Clear entire cache."""

    def has(self, key: str) -> bool:
        """Check if key exists in cache."""

    def get_or_compute(
        self,
        key: str,
        compute_fn: Callable[[], T]
    ) -> T:
        """Get from cache or compute and store."""

class CacheManager:
    """Manages multiple caches for different data types."""

    def __init__(self, config: Optional[Dict[str, Any]] = None):
        """Initialize with optional configuration."""

    def get_cache(self, namespace: str) -> Cache:
        """Get or create cache for a namespace."""

    def clear_namespace(self, namespace: str) -> None:
        """Clear specific namespace cache."""

    def clear_all(self) -> None:
        """Clear all caches."""

Cache Strategies

Memory (Default)

cache = Cache(strategy=CacheStrategy.MEMORY)
pages = cache.get_or_compute(
    "blog:all",
    lambda: collection.sorted_pages
)

LRU (Size-Limited)

cache = Cache(
    strategy=CacheStrategy.LRU,
    max_size=100  # Keep only 100 most recent items
)
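
One way the LRU strategy could be implemented internally is with collections.OrderedDict (a sketch, not part of the proposed public API):

from collections import OrderedDict
from typing import Any

class _LRUStore:
    """Sketch: evicts the least recently used entry once max_size is exceeded."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Any:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used item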

TTL (Time-Based)

cache = Cache(
    strategy=CacheStrategy.TTL,
    ttl_seconds=300  # 5 minute expiration
)
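
A minimal sketch of TTL expiration, storing an expiry timestamp alongside each value:

import time
from typing import Any, Dict, Optional, Tuple

class _TTLStore:
    """Sketch: entries expire ttl_seconds after they are written."""

    def __init__(self, ttl_seconds: int) -> None:
        self.ttl_seconds = ttl_seconds
        self._data: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (time.monotonic() + self.ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value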

Disk (Persistent)

cache = Cache(
    strategy=CacheStrategy.DISK,
    cache_dir=Path(".render-engine/cache")
)
# Survives across program runs
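
The DISK strategy could be as simple as one pickle file per key under cache_dir (a sketch; the optional diskcache library is another route, see Dependencies):

import hashlib
import pickle
from pathlib import Path
from typing import Any, Optional

class _DiskStore:
    """Sketch: one pickle file per key under cache_dir."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so arbitrary strings map to safe filenames.
        return self.cache_dir / f"{hashlib.sha256(key.encode()).hexdigest()}.pkl"

    def get(self, key: str) -> Optional[Any]:
        path = self._path(key)
        return pickle.loads(path.read_bytes()) if path.exists() else None

    def set(self, key: str, value: Any) -> None:
        self._path(key).write_bytes(pickle.dumps(value))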

Cache Namespaces

Different data types should use different cache namespaces:

manager = CacheManager()

# Pages cache
pages_cache = manager.get_cache("pages")
pages = pages_cache.get_or_compute(
    "blog:all",
    lambda: collection.sorted_pages
)

# Site cache
site_cache = manager.get_cache("site")
site = site_cache.get_or_compute(
    "main",
    lambda: load_site()
)

# Search results cache
search_cache = manager.get_cache("search")
results = search_cache.get_or_compute(
    "blog:python",
    lambda: search_engine.search(pages, "python")
)

Benefits

  1. Performance: Avoid redundant parsing and computation
  2. Flexibility: Multiple strategies for different use cases
  3. Configurability: Users can tune caching behavior
  4. Consistency: Same caching logic across all tools
  5. Development: Fast rebuilds during development
  6. Testing: Can disable caching in tests

Use Cases

CLI Development Server

# Cache pages during serve --reload
cache = Cache(strategy=CacheStrategy.MEMORY)

# Only re-parse changed files
for page in pages:
    cached = cache.get(page.slug)
    if cached and not file_modified(page.content_path):
        yield cached
    else:
        parsed = parse_page(page)
        cache.set(page.slug, parsed)
        yield parsed
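
file_modified is left undefined above; a hypothetical implementation could compare the file's mtime against the one recorded the last time the page was parsed:

import os
from typing import Dict

_seen_mtimes: Dict[str, float] = {}

def file_modified(path) -> bool:
    """Hypothetical helper: True if the file changed since it was last parsed."""
    mtime = os.path.getmtime(path)
    changed = _seen_mtimes.get(str(path)) != mtime
    _seen_mtimes[str(path)] = mtime
    return changed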

TUI Page Browsing

# Keep recently viewed collections in cache
cache = Cache(
    strategy=CacheStrategy.LRU,
    max_size=10  # Last 10 collections
)

def switch_collection(name: str):
    pages = cache.get_or_compute(
        name,
        lambda: load_collection_pages(name)
    )
    return pages

Build Process

# Only rebuild changed pages
cache = Cache(
    strategy=CacheStrategy.DISK,
    cache_dir=Path(".render-engine/cache")
)

for page in site.pages:
    content_hash = hash_file(page.content_path)
    cached_hash = cache.get(f"{page.slug}:hash")

    if cached_hash == content_hash:
        # Use cached rendered output
        output = cache.get(f"{page.slug}:output")
    else:
        # Render and cache
        output = page.render()
        cache.set(f"{page.slug}:hash", content_hash)
        cache.set(f"{page.slug}:output", output)
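
hash_file is not defined above; a hypothetical version could fingerprint the file contents with hashlib:

import hashlib
from pathlib import Path

def hash_file(path: Path) -> str:
    """Hypothetical helper: content fingerprint used to detect changed pages."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()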

Configuration

Support configuration via pyproject.toml:

[tool.render-engine.cache]
strategy = "lru"  # or "memory", "ttl", "disk", "none"
max_size = 100  # for LRU
ttl_seconds = 300  # for TTL
cache_dir = ".render-engine/cache"  # for DISK

[tool.render-engine.cache.namespaces.pages]
strategy = "memory"  # Override for specific namespace

[tool.render-engine.cache.namespaces.site]
strategy = "disk"  # Persist site configuration

Migration Path

  1. Create render_engine_api.cache module
  2. Implement basic Cache class with MEMORY strategy
  3. Add LRU, TTL, and DISK strategies
  4. Create CacheManager for namespace management
  5. Update TUI to use Cache for pages
  6. Update CLI serve command to use Cache
  7. Add cache configuration to RenderEngineConfig
  8. Add tests for all strategies

Dependencies

  • Core: No external dependencies for MEMORY, LRU, TTL
  • DISK: pickle or json for serialization
  • Optional: diskcache for more robust disk caching
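
If the optional diskcache dependency is adopted, the DISK strategy could delegate to it rather than hand-rolling pickle files (a sketch):

import diskcache

class _DiskCacheBackend:
    """Sketch: delegate the DISK strategy to the diskcache library."""

    def __init__(self, cache_dir) -> None:
        self._dc = diskcache.Cache(str(cache_dir))

    def get(self, key: str):
        return self._dc.get(key)

    def set(self, key: str, value) -> None:
        self._dc.set(key, value)

    def clear(self) -> None:
        self._dc.clear()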
