Issue: Unified Search Operations API
Summary
The TUI currently implements its own search for pages. This should be generalized into a shared API that supports multiple search strategies and can be used by all tools.
Current State
TUI Implementation (render_engine_tui.render_engine_integration)
SEARCHABLE_FIELDS = ['title', 'slug', 'content', 'description']
def search_posts(self, pages: List[Page], search_term: str) -> List[Page]:
    """Search pages by common fields."""
    # Case-insensitive substring matching
CLI Implementation
- No search functionality currently
Proposed API
Create render_engine_api.search with:
from enum import Enum
from typing import Any, Callable, List, Optional, Tuple

class SearchStrategy(Enum):
    """Available search strategies."""
    SUBSTRING = "substring"  # Case-insensitive substring match
    EXACT = "exact"  # Exact match
    REGEX = "regex"  # Regular expression
    FUZZY = "fuzzy"  # Fuzzy matching (requires rapidfuzz or thefuzz)
    FULL_TEXT = "full_text"  # Backend native full-text (PostgreSQL, etc)

class SearchEngine:
    """Unified search across Page collections."""
    def __init__(
        self,
        strategy: SearchStrategy = SearchStrategy.SUBSTRING,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = False,
        backend: Optional[Any] = None,  # Only used by FULL_TEXT (see the Full-Text example)
    ):
        """Initialize search engine with strategy."""
    def search(
        self,
        pages: List[Page],
        query: str
    ) -> List[Page]:
        """Search pages using configured strategy."""
    def search_with_scores(
        self,
        pages: List[Page],
        query: str
    ) -> List[Tuple[Page, float]]:
        """Search and return results with relevance scores."""
    def filter_by_field(
        self,
        pages: List[Page],
        field: str,
        value: Any,
        matcher: Optional[Callable] = None
    ) -> List[Page]:
        """Filter pages by specific field value."""
    def get_searchable_fields(self, page: Page) -> List[str]:
        """Get all searchable fields from a Page."""
Search Strategies
Substring (Default)
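As a rough sketch of how the default substring strategy could be implemented alongside its two dependency-free siblings (exact and regex), the three can share one matcher factory. The names `build_matcher` and the free function `search` are hypothetical, the trimmed `SearchStrategy` enum only covers these three members, and pages are read with `getattr`, which assumes searchable fields are plain attributes.

```python
import re
from enum import Enum
from typing import Any, Callable, Iterable, List, Sequence


class SearchStrategy(Enum):
    # Trimmed copy of the proposed enum: dependency-free strategies only.
    SUBSTRING = "substring"
    EXACT = "exact"
    REGEX = "regex"


def build_matcher(
    strategy: SearchStrategy, query: str, case_sensitive: bool = False
) -> Callable[[str], bool]:
    """Return a predicate that tests one field value against the query."""
    if strategy is SearchStrategy.REGEX:
        flags = 0 if case_sensitive else re.IGNORECASE
        pattern = re.compile(query, flags)
        return lambda text: pattern.search(text) is not None

    # SUBSTRING and EXACT compare (optionally case-folded) strings.
    needle = query if case_sensitive else query.casefold()

    def normalize(text: str) -> str:
        return text if case_sensitive else text.casefold()

    if strategy is SearchStrategy.EXACT:
        return lambda text: normalize(text) == needle
    return lambda text: needle in normalize(text)


def search(
    pages: Iterable[Any],
    query: str,
    strategy: SearchStrategy = SearchStrategy.SUBSTRING,
    fields: Sequence[str] = ("title", "slug", "content", "description"),
    case_sensitive: bool = False,
) -> List[Any]:
    """Keep pages where any listed field matches; missing fields are skipped."""
    matches = build_matcher(strategy, query, case_sensitive)
    results = []
    for page in pages:
        values = (getattr(page, name, None) for name in fields)
        if any(matches(str(value)) for value in values if value is not None):
            results.append(page)
    return results
```

Compiling the regex once per query rather than once per field keeps the per-page cost low; an invalid pattern raises `re.error` up front, which a CLI or TUI caller would probably want to catch and report.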
engine = SearchEngine(strategy=SearchStrategy.SUBSTRING)
results = engine.search(pages, "python")
# Matches any page with "python" in title, content, etc.
Regex
engine = SearchEngine(strategy=SearchStrategy.REGEX)
results = engine.search(pages, r"python\s+3\.\d+")
# Matches "python 3.10", "python 3.11", etc.
Fuzzy
engine = SearchEngine(strategy=SearchStrategy.FUZZY)
results = engine.search_with_scores(pages, "pytohn")
# [(page, 0.95), ...] - finds "python" despite typo
Full-Text (Backend Native)
# For PostgreSQL or other databases with FTS
engine = SearchEngine(
    strategy=SearchStrategy.FULL_TEXT,
    backend=collection.content_manager
)
results = engine.search(pages, "python AND django")
# Uses database's native full-text search
Benefits
- Flexibility: Multiple search strategies for different use cases
- Performance: Can use backend-native search when available
- Consistency: Same search logic across CLI, TUI, and other tools
- Extensibility: Easy to add new search strategies
- Testing: Isolated search logic is easier to test
- Scoring: Relevance scores for better result ranking
Features to Include
Field-Based Search
# Search only in titles
engine = SearchEngine(fields=["title"])
results = engine.search(pages, "Introduction")
Combined Filters
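A minimal sketch of what `filter_by_field` might do, written as a free function so it runs on its own. The default of plain equality when no `matcher` is given and the choice to skip pages that lack the field are both assumptions, and `SimpleNamespace` only stands in for `Page`.

```python
import operator
from datetime import date
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List, Optional


def filter_by_field(
    pages: Iterable[Any],
    field: str,
    value: Any,
    matcher: Optional[Callable[[Any, Any], bool]] = None,
) -> List[Any]:
    """Keep pages whose `field` satisfies matcher(page_value, value)."""
    compare = matcher or operator.eq  # assumed default: plain equality
    missing = object()  # sentinel, so a stored None can still be compared
    kept = []
    for page in pages:
        page_value = getattr(page, field, missing)
        if page_value is not missing and compare(page_value, value):
            kept.append(page)
    return kept


# SimpleNamespace stands in for render_engine's Page in this demo.
pages = [
    SimpleNamespace(title="Old post", date=date(2024, 6, 1)),
    SimpleNamespace(title="New post", date=date(2025, 3, 15)),
    SimpleNamespace(title="Undated page"),
]
recent = filter_by_field(pages, "date", date(2025, 1, 1), matcher=operator.ge)
print([p.title for p in recent])  # ['New post']
```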
from datetime import date
# Search with multiple criteria
results = engine.search(pages, "python")
results = engine.filter_by_field(results, "date", date(2025, 1, 1),
                                 matcher=lambda val, query: val >= query)
Scoring and Ranking
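The issue does not define how relevance scores are computed. One simple possibility, shown here purely as an illustration, is the share of query terms that occur anywhere in a page's searched fields, which yields a value in [0, 1]. The free function, its `fields` default, and the `SimpleNamespace` stand-in for `Page` are all assumptions.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Sequence, Tuple


def search_with_scores(
    pages: Iterable[Any],
    query: str,
    fields: Sequence[str] = ("title", "content"),
) -> List[Tuple[Any, float]]:
    """Score each page by the share of query terms found in its fields."""
    terms = query.casefold().split()
    if not terms:
        return []
    scored = []
    for page in pages:
        text = " ".join(str(getattr(page, f, "")) for f in fields).casefold()
        hits = sum(1 for term in terms if term in text)
        if hits:
            scored.append((page, hits / len(terms)))
    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


pages = [
    SimpleNamespace(title="Learning Rust", content="ownership"),
    SimpleNamespace(title="Intro", content="Machine learning basics"),
    SimpleNamespace(title="Cooking", content="pasta"),
]
ranked = search_with_scores(pages, "machine learning")
print([(p.title, score) for p, score in ranked])
# [('Intro', 1.0), ('Learning Rust', 0.5)]
```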
# Get results with relevance scores
scored_results = engine.search_with_scores(pages, "machine learning")
# [(page1, 0.95), (page2, 0.87), ...]
# Sort by score
sorted_results = sorted(scored_results, key=lambda x: x[1], reverse=True)
Highlighting
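`get_match_positions` and `highlight_matches` are not part of the class outline in the Proposed API section, so their shape is open. The sketch below assumes they work on a plain-text string (not a `Page`), that matching is literal and case-insensitive by default, and that the output is HTML, so everything outside the `<mark>` tags is escaped.

```python
import html
import re
from typing import List, Tuple


def get_match_positions(
    text: str, query: str, case_sensitive: bool = False
) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every non-overlapping literal match."""
    if not query:
        return []
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.span() for m in re.finditer(re.escape(query), text, flags)]


def highlight_matches(text: str, matches: List[Tuple[int, int]]) -> str:
    """Wrap each matched span in <mark> tags, HTML-escaping all text."""
    parts = []
    cursor = 0
    for start, end in sorted(matches):
        parts.append(html.escape(text[cursor:start]))
        parts.append(f"<mark>{html.escape(text[start:end])}</mark>")
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


content = "Python tips: use python <3.12>"
positions = get_match_positions(content, "python")
print(positions)  # [(0, 6), (17, 23)]
print(highlight_matches(content, positions))
# <mark>Python</mark> tips: use <mark>python</mark> &lt;3.12&gt;
```

If page content is already rendered HTML, escaping would be wrong and offsets would need to skip tags; a terminal front end such as the TUI would likely want a different wrapper than `<mark>`.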
# Find and highlight matches
matches = engine.get_match_positions(page, "python")
highlighted = engine.highlight_matches(page.content, matches)
# Returns content with <mark> tags around matches
Migration Path
- Create render_engine_api.search.SearchEngine
- Implement basic substring search (migrate from TUI)
- Add regex support
- Add fuzzy search (optional dependency)
- Add full-text search integration
- Update TUI to use SearchEngine
- Add search command to CLI
- Add comprehensive tests
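For the "Add search command to CLI" step, the argument surface could look roughly like the following. This uses the standard library's `argparse` purely as a stand-in (the real CLI may be built on a different framework), and the flag names are taken from the example usage in this issue rather than from any existing command.

```python
import argparse
from typing import List


def parse_fields(raw: str) -> List[str]:
    """Split a comma-separated field list, dropping empty entries."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser for `render-engine search`."""
    parser = argparse.ArgumentParser(prog="render-engine")
    commands = parser.add_subparsers(dest="command", required=True)
    search = commands.add_parser("search", help="Search pages")
    search.add_argument("query", help="Text or pattern to search for")
    search.add_argument("--collection", help="Limit search to one collection")
    search.add_argument("--case-sensitive", action="store_true")
    search.add_argument("--fields", type=parse_fields, help="e.g. title,description")
    search.add_argument("--regex", action="store_true")
    search.add_argument("--output", help="Write results to this file")
    return parser


args = build_parser().parse_args(
    ["search", "tutorial", "--fields", "title,description"]
)
print(args.query, args.fields)  # tutorial ['title', 'description']
```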
Example CLI Usage
# Search within a specific collection
render-engine search "python" --collection blog
# Case-sensitive search
render-engine search "Python" --case-sensitive
# Search specific fields
render-engine search "tutorial" --fields title,description
# Regex search
render-engine search "python\s+3\.\d+" --regex
# Output search results
render-engine search "django" --output results.json
Dependencies
- Core: No external dependencies for substring/regex
- Fuzzy: rapidfuzz or thefuzz (optional)
- Full-text: Depends on ContentManager backend
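One way to keep the fuzzy dependency optional is a guarded import. The sketch below tries `rapidfuzz` and, as an assumption not stated in this issue, falls back to the standard library's `difflib` when it is not installed; `similarity` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

try:
    # Optional dependency: pip install rapidfuzz
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        """Case-insensitive similarity in [0, 1] using rapidfuzz."""
        return fuzz.ratio(a.casefold(), b.casefold()) / 100.0

except ImportError:

    def similarity(a: str, b: str) -> float:
        """Case-insensitive similarity in [0, 1] using only the stdlib."""
        return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# A transposed typo still scores well above unrelated words.
print(similarity("pytohn", "python") > similarity("pytohn", "django"))  # True
```

The two backends do not produce identical scores in general, so any threshold used by the FUZZY strategy would need to be chosen with that in mind, or the fallback could be dropped in favor of raising a clear error.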
Related Issues
- ADD: Search Operations #4: Content Manager Integration API
- CREATE: Collection Adapter #3: Collection Operations API
- #008: Result Formatting API