ADD: Search Operations #4

@kjaymiller

Issue: Unified Search Operations API

Summary

The TUI currently implements its own page search. This logic should be generalized into a shared API that supports multiple search strategies and can be reused by the CLI, the TUI, and other tools.

Current State

TUI Implementation (render_engine_tui.render_engine_integration)

SEARCHABLE_FIELDS = ['title', 'slug', 'content', 'description']

def search_posts(self, pages: List[Page], search_term: str) -> List[Page]:
    """Search pages by common fields."""
    # Case-insensitive substring matching
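
For reference, the case-insensitive substring matching described above could look roughly like the sketch below. The `Page` dataclass is a stand-in for illustration only, not render_engine's actual class, and the function is written as a plain function rather than a method; the real TUI body may differ.

```python
from dataclasses import dataclass
from typing import List

SEARCHABLE_FIELDS = ["title", "slug", "content", "description"]


@dataclass
class Page:
    """Hypothetical stand-in for a render_engine Page."""
    title: str = ""
    slug: str = ""
    content: str = ""
    description: str = ""


def search_posts(pages: List[Page], search_term: str) -> List[Page]:
    """Return pages where any searchable field contains search_term."""
    needle = search_term.casefold()
    return [
        page
        for page in pages
        if any(
            # Missing or None fields are treated as empty strings.
            needle in str(getattr(page, field, "") or "").casefold()
            for field in SEARCHABLE_FIELDS
        )
    ]


pages = [
    Page(title="Intro to Python", slug="intro-python"),
    Page(title="Cooking", slug="cooking", content="Nothing about programming."),
]
matches = search_posts(pages, "PYTHON")
```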

CLI Implementation

  • No search functionality currently

Proposed API

Create render_engine_api.search with:

from enum import Enum
from typing import Any, Callable, List, Optional, Tuple

class SearchStrategy(Enum):
    """Available search strategies."""
    SUBSTRING = "substring"  # Case-insensitive substring match
    EXACT = "exact"  # Exact match
    REGEX = "regex"  # Regular expression
    FUZZY = "fuzzy"  # Fuzzy matching (requires rapidfuzz or thefuzz)
    FULL_TEXT = "full_text"  # Backend native full-text (PostgreSQL, etc)

class SearchEngine:
    """Unified search across Page collections."""

    def __init__(
        self,
        strategy: SearchStrategy = SearchStrategy.SUBSTRING,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = False,
        backend: Optional[Any] = None,  # e.g. a ContentManager, used by FULL_TEXT
    ):
        """Initialize search engine with strategy."""

    def search(
        self,
        pages: List[Page],
        query: str
    ) -> List[Page]:
        """Search pages using configured strategy."""

    def search_with_scores(
        self,
        pages: List[Page],
        query: str
    ) -> List[Tuple[Page, float]]:
        """Search and return results with relevance scores."""

    def filter_by_field(
        self,
        pages: List[Page],
        field: str,
        value: Any,
        matcher: Optional[Callable] = None
    ) -> List[Page]:
        """Filter pages by specific field value."""

    def get_searchable_fields(self, page: Page) -> List[str]:
        """Get all searchable fields from a Page."""
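
To make the proposal concrete, here is a minimal sketch of how `search` might dispatch on the strategy, covering only SUBSTRING, EXACT, and REGEX. The `Page` dataclass, `DEFAULT_FIELDS`, and the `_matcher` helper are assumptions for illustration and are not part of the proposed interface.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class SearchStrategy(Enum):
    SUBSTRING = "substring"
    EXACT = "exact"
    REGEX = "regex"


@dataclass
class Page:
    """Hypothetical stand-in for a render_engine Page."""
    title: str = ""
    slug: str = ""
    content: str = ""
    description: str = ""


# Assumed default, mirroring the TUI's SEARCHABLE_FIELDS.
DEFAULT_FIELDS = ["title", "slug", "content", "description"]


class SearchEngine:
    def __init__(
        self,
        strategy: SearchStrategy = SearchStrategy.SUBSTRING,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = False,
    ):
        self.strategy = strategy
        self.fields = fields or DEFAULT_FIELDS
        self.case_sensitive = case_sensitive

    def _matcher(self, query: str) -> Callable[[str], bool]:
        """Build a predicate for one field value (hypothetical helper)."""
        if self.strategy is SearchStrategy.REGEX:
            flags = 0 if self.case_sensitive else re.IGNORECASE
            pattern = re.compile(query, flags)
            return lambda text: pattern.search(text) is not None
        norm = (lambda s: s) if self.case_sensitive else str.casefold
        needle = norm(query)
        if self.strategy is SearchStrategy.EXACT:
            return lambda text: norm(text) == needle
        return lambda text: needle in norm(text)

    def search(self, pages: List[Page], query: str) -> List[Page]:
        matches = self._matcher(query)
        return [
            page
            for page in pages
            if any(
                matches(str(getattr(page, field, "") or ""))
                for field in self.fields
            )
        ]


pages = [
    Page(title="Python 3.12 released", slug="py312"),
    Page(title="Django tips", slug="django-tips", content="Works on python  3.11"),
]
substring_hits = SearchEngine().search(pages, "python")
regex_hits = SearchEngine(SearchStrategy.REGEX).search(pages, r"python\s+3\.\d+")
exact_hits = SearchEngine(SearchStrategy.EXACT, fields=["title"]).search(pages, "django tips")
strict_hits = SearchEngine(case_sensitive=True).search(pages, "python")
```

Building the predicate once per query keeps the regex compiled a single time instead of once per field.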

Search Strategies

Substring (Default)

engine = SearchEngine(strategy=SearchStrategy.SUBSTRING)
results = engine.search(pages, "python")
# Matches any page with "python" in title, content, etc.

Regex

engine = SearchEngine(strategy=SearchStrategy.REGEX)
results = engine.search(pages, r"python\s+3\.\d+")
# Matches "python 3.10", "python 3.11", etc.

Fuzzy

engine = SearchEngine(strategy=SearchStrategy.FUZZY)
results = engine.search_with_scores(pages, "pytohn")
# [(page, 0.95), ...] - finds "python" despite typo
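
A rough sketch of how fuzzy scoring could work, using the standard library's `difflib.SequenceMatcher` as a stand-in for rapidfuzz or thefuzz. The word-by-word comparison, the `threshold` parameter, and the `Page` dataclass are assumptions for illustration; scores from a real fuzzy library will differ, and multi-word queries are not handled here.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for a render_engine Page."""
    title: str = ""
    content: str = ""


def fuzzy_score(query: str, text: str) -> float:
    """Best similarity (0.0 to 1.0) between the query and any word in text."""
    needle = query.casefold()
    words = re.findall(r"\w+", text.casefold())
    return max(
        (SequenceMatcher(None, needle, word).ratio() for word in words),
        default=0.0,
    )


def search_with_scores(
    pages: List[Page],
    query: str,
    fields: Sequence[str] = ("title", "content"),
    threshold: float = 0.75,  # assumed cutoff, tune per use case
) -> List[Tuple[Page, float]]:
    """Return (page, score) pairs at or above threshold, best first."""
    scored = []
    for page in pages:
        score = max(
            fuzzy_score(query, str(getattr(page, field, "") or ""))
            for field in fields
        )
        if score >= threshold:
            scored.append((page, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


pages = [Page(title="Learning Python"), Page(title="Gardening basics")]
results = search_with_scores(pages, "pytohn")
```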

Full-Text (Backend Native)

# For PostgreSQL or other databases with FTS
engine = SearchEngine(
    strategy=SearchStrategy.FULL_TEXT,
    backend=collection.content_manager
)
results = engine.search(pages, "python AND django")
# Uses the database's native full-text search; query syntax (e.g. AND) depends on the backend

Benefits

  1. Flexibility: Multiple search strategies for different use cases
  2. Performance: Can use backend-native search when available
  3. Consistency: Same search logic across CLI, TUI, and other tools
  4. Extensibility: Easy to add new search strategies
  5. Testing: Isolated search logic is easier to test
  6. Scoring: Relevance scores for better result ranking

Features to Include

Field-Based Search

# Search only in titles
engine = SearchEngine(fields=["title"])
results = engine.search(pages, "Introduction")

Combined Filters

from datetime import date

# Search with multiple criteria: text match first, then a date filter
results = engine.search(pages, "python")
results = engine.filter_by_field(results, "date", date(2025, 1, 1),
                                  matcher=lambda val, query: val >= query)
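
One possible shape for `filter_by_field`, shown as a standalone function. It assumes the matcher is called as `matcher(field_value, value)`, as in the lambda above, and that pages lacking the field are skipped; both are assumptions, as is the `Page` dataclass.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a render_engine Page."""
    title: str
    date: dt.date


_MISSING = object()  # sentinel so that a field set to None is still compared


def filter_by_field(
    pages: List[Page],
    field: str,
    value: Any,
    matcher: Optional[Callable[[Any, Any], bool]] = None,
) -> List[Page]:
    """Keep pages whose field satisfies matcher(field_value, value)."""
    compare = matcher or (lambda val, query: val == query)  # default: equality
    kept = []
    for page in pages:
        field_value = getattr(page, field, _MISSING)
        if field_value is not _MISSING and compare(field_value, value):
            kept.append(page)
    return kept


pages = [
    Page(title="Old post", date=dt.date(2024, 12, 31)),
    Page(title="New post", date=dt.date(2025, 3, 1)),
]
recent = filter_by_field(
    pages, "date", dt.date(2025, 1, 1), matcher=lambda val, query: val >= query
)
by_title = filter_by_field(pages, "title", "Old post")
```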

Scoring and Ranking

# Get results with relevance scores
scored_results = engine.search_with_scores(pages, "machine learning")
# [(page1, 0.95), (page2, 0.87), ...]

# Sort by score
sorted_results = sorted(scored_results, key=lambda x: x[1], reverse=True)

Highlighting

# Find and highlight matches
# (get_match_positions and highlight_matches would be additional SearchEngine methods)
matches = engine.get_match_positions(page, "python")
highlighted = engine.highlight_matches(page.content, matches)
# Returns content with <mark> tags around matches
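
A minimal sketch of the two highlighting helpers, assuming literal (non-regex) matching and non-overlapping `(start, end)` spans. Unlike the call above, these take the text directly rather than a Page, and they do not HTML-escape the content; both choices are simplifications for illustration.

```python
import re
from typing import List, Tuple


def get_match_positions(
    text: str, query: str, case_sensitive: bool = False
) -> List[Tuple[int, int]]:
    """Return (start, end) spans of every literal occurrence of query."""
    if not query:
        return []
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.span() for m in re.finditer(re.escape(query), text, flags)]


def highlight_matches(text: str, positions: List[Tuple[int, int]]) -> str:
    """Wrap each span in <mark> tags, leaving the rest of the text as-is."""
    parts = []
    cursor = 0
    for start, end in sorted(positions):
        parts.append(text[cursor:start])
        parts.append(f"<mark>{text[start:end]}</mark>")
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


content = "Python is fun. I like python."
positions = get_match_positions(content, "python")
highlighted = highlight_matches(content, positions)
```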

Migration Path

  1. Create render_engine_api.search.SearchEngine
  2. Implement basic substring search (migrate from TUI)
  3. Add regex support
  4. Add fuzzy search (optional dependency)
  5. Add full-text search integration
  6. Update TUI to use SearchEngine
  7. Add search command to CLI
  8. Add comprehensive tests

Example CLI Usage

# Search within a single collection
render-engine search "python" --collection blog

# Case-sensitive search
render-engine search "Python" --case-sensitive

# Search specific fields
render-engine search "tutorial" --fields title,description

# Regex search
render-engine search 'python\s+3\.\d+' --regex

# Write search results to a JSON file
render-engine search "django" --output results.json
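
The flags above could be parsed roughly as follows. This sketch uses `argparse` from the standard library purely as a stand-in; it is not how the render-engine CLI is actually built, and the `build_parser` name and the defaults are assumptions.

```python
import argparse
from typing import List


def _comma_list(raw: str) -> List[str]:
    """Split 'title,description' into ['title', 'description']."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="render-engine")
    subparsers = parser.add_subparsers(dest="command", required=True)

    search = subparsers.add_parser("search", help="Search pages")
    search.add_argument("query", help="Search term or pattern")
    search.add_argument("--collection", default=None,
                        help="Limit the search to one collection")
    search.add_argument("--case-sensitive", action="store_true")
    search.add_argument("--fields", type=_comma_list, default=None,
                        help="Comma-separated list of fields to search")
    search.add_argument("--regex", action="store_true",
                        help="Treat the query as a regular expression")
    search.add_argument("--output", default=None,
                        help="Write results to this file")
    return parser


args = build_parser().parse_args(
    ["search", "tutorial", "--fields", "title,description"]
)
```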

Dependencies

  • Core: No external dependencies for substring/regex
  • Fuzzy: rapidfuzz or thefuzz (optional)
  • Full-text: Depends on ContentManager backend
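
One way to keep the fuzzy dependency optional is to pick a scorer at import time, as sketched below. It prefers rapidfuzz, then thefuzz, and falls back to the standard library's `difflib`; the fallback and the `pick_fuzzy_scorer` name are assumptions, not part of the proposal. Both libraries expose `fuzz.ratio`, which returns a value from 0 to 100.

```python
from difflib import SequenceMatcher
from typing import Callable


def pick_fuzzy_scorer() -> Callable[[str, str], float]:
    """Return a similarity function with scores normalized to 0.0 to 1.0."""
    try:
        from rapidfuzz import fuzz  # optional dependency
    except ImportError:
        try:
            from thefuzz import fuzz  # optional dependency
        except ImportError:
            # Neither library is installed: fall back to the stdlib.
            return lambda a, b: SequenceMatcher(None, a, b).ratio()
    # fuzz.ratio returns 0 to 100 in both libraries.
    return lambda a, b: fuzz.ratio(a, b) / 100.0


score = pick_fuzzy_scorer()
similarity = score("pytohn", "python")
```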

Related Issues
