
feat: Add WikipediaTopViewsSource and enhance the query source priority system #12

Open
Disaster-Terminator wants to merge 1 commit into main from feature/query-sources

Conversation


@Disaster-Terminator Disaster-Terminator commented Feb 26, 2026

Change Overview

This PR adds the WikipediaTopViewsSource query source and enhances the query source priority system and review-comment management features.

Main Changes

  1. WikipediaTopViewsSource query source

    • Fetches trending articles from the Wikipedia Pageviews API
    • 6-hour caching mechanism
    • Automatically filters excluded articles (e.g. Main_Page, Special:)
  2. Query source priority system

    • Adds a default get_priority() implementation to the QuerySource base class
    • Supports sorting by priority: LocalFile(50) → WikipediaTopViews(80) → DuckDuckGo(90) → Wikipedia(100) → BingSuggestions(110)
  3. DashboardClient API client

    • Retrieves points via the Dashboard API
    • Automatically falls back to HTML parsing when the API fails
  4. verify-context command

    • Verifies that local review-comment data belongs to the current branch
    • Supports PR context validation
  5. ReviewMetadata enhancements

    • Adds branch and head_sha fields to track where review comments came from
  6. fetch command enhancements

    • Auto-detects the PR number for the current branch via the gh CLI
    • Improved error handling (gh not installed, timeouts, insufficient permissions)

Testing

  • All 483 unit tests pass
  • Covers WikipediaTopViewsSource, points detection, CLI commands, and more
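The priority scheme above can be sketched as follows (a minimal illustration, not the PR's actual code; only the method name get_priority() and the priority values are taken from the description, everything else is assumed):

```python
class QuerySource:
    """Base-class sketch: a default priority that subclasses may override."""

    DEFAULT_PRIORITY = 100

    def get_priority(self) -> int:
        # Lower value = consulted earlier by the engine.
        return self.DEFAULT_PRIORITY


class LocalFileSource(QuerySource):
    def get_priority(self) -> int:
        return 50


class WikipediaTopViewsSource(QuerySource):
    def get_priority(self) -> int:
        return 80


class DuckDuckGoSource(QuerySource):
    def get_priority(self) -> int:
        return 90


# The engine sorts its registered sources once, at the end of initialization.
sources = [DuckDuckGoSource(), WikipediaTopViewsSource(), LocalFileSource()]
sources.sort(key=lambda s: s.get_priority())
print([type(s).__name__ for s in sources])
# → ['LocalFileSource', 'WikipediaTopViewsSource', 'DuckDuckGoSource']
```

A base-class default keeps new sources working without an explicit priority, while explicit overrides make the consultation order deterministic.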


Summary by Sourcery

Add a new Wikipedia top-views query source, improve query-source prioritization and availability handling, enhance rewards points detection via a dashboard API client, and extend the review-management CLI with PR auto-detection and context verification.

New Features:

  • Introduce WikipediaTopViewsSource to generate queries from Wikipedia Pageviews trending articles with caching and filtering.
  • Add a DashboardClient API client to retrieve Microsoft Rewards points and dashboard data from internal endpoints.
  • Add a verify-context CLI subcommand to validate that local review comments match the current Git branch.

Enhancements:

  • Add a default priority mechanism to QuerySource and assign explicit priorities to all query sources, sorting sources by priority in the query engine.
  • Ensure online query sources are only registered when available and avoid adding duplicate queries across sources and Bing suggestions.
  • Extend review metadata with branch and HEAD SHA information when fetching threads.
  • Improve PointsDetector to prefer the dashboard API for points, fall back more robustly to HTML parsing, and harden parsing and logging behavior.

Build:

  • Constrain the playwright-stealth dependency to <2.0 and adjust dev dependencies.

Tests:

  • Add unit tests for WikipediaTopViewsSource behavior, caching, and availability, as well as for source-priority ordering and skipping unavailable sources.
  • Add tests for the review CLI fetch and verify-context logic, including gh CLI and error handling scenarios.
  • Add tests for DashboardClient API behavior and fallbacks, PointsDetector integration with the API client and parsing edge cases, and ReviewMetadata branch/head tracking.
  • Add tests for git branch/HEAD helper functions in non-git and error scenarios.

Copilot AI review requested due to automatic review settings February 26, 2026 07:17
@qodo-code-review

Review Summary by Qodo

Add WikipediaTopViewsSource and enhance query source priority system

✨ Enhancement 🧪 Tests


Walkthroughs

Description
• Add WikipediaTopViewsSource with 6-hour caching and article filtering
• Implement query source priority system with sorting by priority value
• Create DashboardClient for API-first points detection with HTML fallback
• Add verify-context CLI command for PR branch validation
• Enhance fetch command with automatic PR detection via gh CLI
• Track review context with branch and HEAD SHA in metadata
Diagram
flowchart LR
  A["WikipediaTopViewsSource<br/>Priority: 80"] --> B["QuerySource<br/>Priority System"]
  C["DashboardClient<br/>API + Fallback"] --> D["PointsDetector<br/>Enhanced"]
  E["ReviewMetadata<br/>branch + head_sha"] --> F["ReviewResolver<br/>Context Tracking"]
  G["fetch Command<br/>Auto PR Detection"] --> H["gh CLI Integration"]
  I["verify-context<br/>Command"] --> J["Branch Validation"]
  B --> K["Source Sorting<br/>by Priority"]


File Changes

1. src/search/query_sources/wikipedia_top_views_source.py ✨ Enhancement +211/-0 — New Wikipedia top views query source implementation
2. src/api/dashboard_client.py ✨ Enhancement +164/-0 — New Dashboard API client with fallback mechanism
3. src/account/points_detector.py ✨ Enhancement +37/-13 — Integrate DashboardClient with API-first approach
4. src/search/query_sources/query_source.py ✨ Enhancement +12/-0 — Add default get_priority method to base class
5. src/search/query_sources/local_file_source.py ✨ Enhancement +4/-0 — Implement priority 50 for local file source
6. src/search/query_sources/duckduckgo_source.py ✨ Enhancement +4/-0 — Implement priority 90 for DuckDuckGo source
7. src/search/query_sources/wikipedia_source.py ✨ Enhancement +4/-0 — Implement priority 100 for Wikipedia source
8. src/search/query_sources/bing_suggestions_source.py ✨ Enhancement +4/-0 — Implement priority 110 for Bing suggestions source
9. src/search/query_sources/__init__.py ✨ Enhancement +2/-1 — Export WikipediaTopViewsSource in module
10. src/search/query_engine.py ✨ Enhancement +29/-4 — Add WikipediaTopViewsSource initialization and priority sorting
11. src/review/models.py ✨ Enhancement +3/-1 — Add branch and head_sha fields to ReviewMetadata
12. src/review/resolver.py ✨ Enhancement +39/-1 — Add git branch and SHA detection functions
13. src/api/__init__.py ✨ Enhancement +5/-0 — New API clients module with DashboardClient export
14. tools/manage_reviews.py ✨ Enhancement +130/-3 — Add verify-context command and auto PR detection
15. config.example.yaml ⚙️ Configuration changes +5/-0 — Add WikipediaTopViewsSource configuration section
16. pyproject.toml Dependencies +2/-0 — Add httpx and respx test dependencies
17. tests/unit/test_dashboard_client.py 🧪 Tests +139/-0 — Comprehensive tests for DashboardClient functionality
18. tests/unit/test_manage_reviews_cli.py 🧪 Tests +435/-0 — Tests for verify-context and fetch command logic
19. tests/unit/test_online_query_sources.py 🧪 Tests +290/-0 — Tests for WikipediaTopViewsSource and priority system
20. tests/unit/test_review_context.py 🧪 Tests +82/-0 — Tests for git branch and SHA detection functions
21. tests/unit/test_review_parsers.py 🧪 Tests +1/-1 — Update metadata version to 2.3



qodo-code-review bot commented Feb 26, 2026

Code Review by Qodo

🐞 Bugs (6) 📘 Rule violations (4) 📎 Requirement gaps (0)



Action required

1. Priorities inconsistent with tests 🐞 Bug ✓ Correctness ⭐ New
Description
QueryEngine now sorts sources by get_priority(), but the get_priority() values implemented by the individual QuerySources do not match the priority scheme the unit tests expect, so the sort order is wrong and the tests are guaranteed to fail. This directly breaks the core behavior of the priority system.
Code

src/search/query_engine.py[R152-153]

+        self.sources.sort(key=lambda s: s.get_priority())
+
Evidence
QueryEngine explicitly sorts by get_priority() at the end of initialization, but the get_priority() return values of the sources (local_file=100, duckduckgo=50, wikipedia=60, bing_suggestions=70, wikipedia_top_views=120) are completely inconsistent with the test assertions (local_file=50, wikipedia_top_views=80, duckduckgo=90, wikipedia=100, bing_suggestions=110), so both the ordering and the assertions fail.

src/search/query_engine.py[132-153]
src/search/query_sources/local_file_source.py[181-188]
src/search/query_sources/duckduckgo_source.py[119-126]
src/search/query_sources/wikipedia_source.py[170-177]
src/search/query_sources/bing_suggestions_source.py[101-108]
src/search/query_sources/wikipedia_top_views_source.py[77-84]
tests/unit/test_online_query_sources.py[607-658]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
QueryEngine sorts sources by `get_priority()`, but the `get_priority()` implementations of the individual QuerySources do not match the priority scheme asserted by the unit tests, so both the sort behavior and the tests are wrong.

### Issue Context
- QueryEngine runs `self.sources.sort(key=lambda s: s.get_priority())` at the end of initialization.
- The priority values currently returned by the sources do not match the assertions in `tests/unit/test_online_query_sources.py::TestQuerySourcePriority`.

### Fix Focus Areas
- src/search/query_engine.py[132-153]
- src/search/query_sources/local_file_source.py[181-188]
- src/search/query_sources/duckduckgo_source.py[119-126]
- src/search/query_sources/wikipedia_source.py[170-177]
- src/search/query_sources/bing_suggestions_source.py[101-108]
- src/search/query_sources/wikipedia_top_views_source.py[77-84]
- tests/unit/test_online_query_sources.py[607-658]

### Suggested approach
1. Pick a single priority specification (preferably the one from the tests/requirements: local_file=50, wikipedia_top_views=80, duckduckgo=90, wikipedia=100, bing_suggestions=110).
2. Change each source's `get_priority()` return value to match that specification (or adjust the tests in the other direction).
3. Make sure the ordering assertion in `test_sources_sort_by_priority` agrees with the implementation.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Unit test may hit the real network 🐞 Bug ⛯ Reliability ⭐ New
Description
The new unit test test_source_returns_empty_when_unavailable does not inject a mock session; it calls DuckDuckGoSource.fetch_queries() directly, and that implementation creates an aiohttp.ClientSession and performs a real HTTP GET. The test's assertion only holds reliably when the external request fails, so the test is sensitive to the network environment and can trigger timeouts.
Code

tests/unit/test_online_query_sources.py[R586-594]

+    def test_source_returns_empty_when_unavailable(self, mock_config):
+        """Test that fetch_queries returns empty list when unavailable"""
+        source = DuckDuckGoSource(mock_config)
+        source._available = False
+
+        import asyncio
+
+        result = asyncio.get_event_loop().run_until_complete(source.fetch_queries(5))
+        assert result == []
Evidence
Unlike the other cases in the same file, this test does not inject an AsyncMock into source._session; it runs fetch_queries() directly. DuckDuckGoSource.fetch_queries always calls _get_session() to create an aiohttp.ClientSession and issues a real request via session.get() in _fetch_suggestions(), so the test carries a network dependency and a timeout risk.

tests/unit/test_online_query_sources.py[586-594]
src/search/query_sources/duckduckgo_source.py[46-80]
src/search/query_sources/duckduckgo_source.py[92-108]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`test_source_returns_empty_when_unavailable` calls `DuckDuckGoSource.fetch_queries()` without mocking the network layer, which produces a real HTTP request, making the test sensitive to network availability and potentially flaky due to timeouts.

### Issue Context
The earlier DuckDuckGoSource/WikipediaSource tests in this file avoid real network access by injecting `source._session = AsyncMock()`; this new test does not.

### Fix Focus Areas
- tests/unit/test_online_query_sources.py[586-594]
- src/search/query_sources/duckduckgo_source.py[46-90]
- src/search/query_sources/duckduckgo_source.py[92-116]

### Suggested approach
Pick one (A recommended):
A) Fix the test: patch `DuckDuckGoSource._get_session` to return an AsyncMock exposing `get` (or inject `source._session` directly), and have `session.get` raise an exception or return a non-200 response so the `[]` assertion is stable.
B) Delete the test (its coverage overlaps cases such as `test_fetch_queries_handles_error` in the same file), avoiding the network dependency.
C) If "fetch_queries returns [] immediately when unavailable" should become a contract, add `if not self._available: return []` at the start of `fetch_queries()`, and evaluate whether that introduces a permanent no-retry problem.

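Option (A) can be sketched like this. A hypothetical stand-in class is used so the pattern is runnable; the real DuckDuckGoSource differs, and only the `_get_session`/`fetch_queries` method names come from the review above:

```python
import asyncio
from unittest.mock import AsyncMock, patch

# Hypothetical stand-in mirroring the shape of the PR's DuckDuckGoSource;
# only enough structure to demonstrate the mocking pattern.
class DuckDuckGoSource:
    def __init__(self) -> None:
        self._available = False

    async def _get_session(self):
        # In the real class this would create an aiohttp.ClientSession.
        raise AssertionError("the network layer must be stubbed in unit tests")

    async def fetch_queries(self, count: int) -> list[str]:
        try:
            session = await self._get_session()
            resp = await session.get("https://duckduckgo.com/ac/")
            return resp or []
        except Exception:
            return []


def test_source_returns_empty_when_unavailable() -> None:
    source = DuckDuckGoSource()
    fake_session = AsyncMock()
    fake_session.get.side_effect = ConnectionError("offline")
    # Patch the session factory so no real HTTP client is ever created.
    with patch.object(DuckDuckGoSource, "_get_session",
                      AsyncMock(return_value=fake_session)):
        assert asyncio.run(source.fetch_queries(5)) == []


test_source_returns_empty_when_unavailable()
```

Because the stubbed `session.get` raises deterministically, the `[]` assertion no longer depends on the network environment.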


3. cmd_fetch prints raw exception 📘 Rule violation ⛨ Security
Description
cmd_fetch returns a user-facing JSON message containing the raw exception string, which can leak
internal implementation details. This violates secure error handling expectations for user-visible
outputs.
Code

tools/manage_reviews.py[R226-228]

+        except Exception as e:
+            print(json.dumps({"success": False, "message": f"获取 PR 编号失败: {e}"}))
+            return
Evidence
The compliance rule requires user-facing errors to be generic; the code directly embeds {e} into
the message returned to the caller.

Rule 4: Generic: Secure Error Handling
tools/manage_reviews.py[226-228]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The CLI prints raw exception text in a user-facing JSON response, potentially leaking internal details.
## Issue Context
Compliance requires generic user-facing errors; detailed errors should go to internal logs only.
## Fix Focus Areas
- tools/manage_reviews.py[226-228]

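One way to satisfy the rule is to split the detail between logs and the user-facing payload. This is a sketch with illustrative names (`pr_lookup_failed_payload` is not from the PR); only the JSON shape `{"success": ..., "message": ...}` follows the code above:

```python
import json
import logging

logger = logging.getLogger("manage_reviews")


def pr_lookup_failed_payload(exc: Exception) -> str:
    """Return a user-facing JSON error without leaking exception details."""
    # Full details go to the internal log only.
    logger.warning("PR number detection failed: %s: %s", type(exc).__name__, exc)
    # The user sees a generic, actionable message with no exception text.
    return json.dumps(
        {"success": False,
         "message": "Could not determine the PR number; pass --pr explicitly."}
    )
```

The caller would `print(pr_lookup_failed_payload(e))` instead of interpolating `{e}` into the message.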


View more (3)
4. TopViews permanently unavailable 🐞 Bug ⛯ Reliability
Description
Once WikipediaTopViewsSource hits a non-200 response or an exception, it sets _available to False and never restores it to True on success, so a single transient failure leaves the source permanently returning empty lists. In addition, the API date is computed with local time (datetime.now), so in some time zones the request may target data the API has not generated yet, triggering that permanent disablement.
Code

src/search/query_sources/wikipedia_top_views_source.py[R116-210]

+    def _get_api_date(self) -> tuple[str, str, str]:
+        """Get yesterday's date for API call"""
+        yesterday = datetime.now() - timedelta(days=1)
+        return (str(yesterday.year), f"{yesterday.month:02d}", f"{yesterday.day:02d}")
+
+    def _build_api_url(self) -> str:
+        """Build API URL using constants"""
+        base_url = QUERY_SOURCE_URLS["wikipedia_top_views"]
+        yyyy, mm, dd = self._get_api_date()
+        return f"{base_url}/{self.lang}.wikipedia/all-access/{yyyy}/{mm}/{dd}"
+
+    async def _fetch_top_articles(self, session: aiohttp.ClientSession) -> list[dict[str, Any]]:
+        """
+        Fetch top articles from Wikipedia API
+
+        Args:
+            session: aiohttp session
+
+        Returns:
+            List of article objects
+        """
+        try:
+            url = self._build_api_url()
+            self.logger.debug(f"Fetching top articles from: {url}")
+
+            async with session.get(
+                url, timeout=aiohttp.ClientTimeout(total=self.timeout)
+            ) as response:
+                if response.status == 200:
+                    data = await response.json()
+                    if "items" in data and data["items"]:
+                        articles = data["items"][0].get("articles", [])
+                        return list(articles) if articles else []
+                else:
+                    self.logger.warning(f"Wikipedia API returned status {response.status}")
+                    self._available = False
+        except Exception as e:
+            self.logger.error(f"Error fetching top articles: {e}")
+            self._available = False
+        return []
+
+    def _filter_articles(self, articles: list[dict]) -> list[str]:
+        """
+        Filter out non-article entries
+
+        Args:
+            articles: List of article objects
+
+        Returns:
+            List of filtered article titles
+        """
+        filtered = []
+        for article in articles:
+            title = article.get("article", "")
+            if not any(title.startswith(prefix) for prefix in self.EXCLUDED_PREFIXES):
+                filtered.append(title.replace("_", " "))
+        return filtered
+
+    async def fetch_queries(self, count: int) -> list[str]:
+        """
+        Fetch queries from Wikipedia Pageviews API
+
+        Args:
+            count: Number of queries to fetch
+
+        Returns:
+            List of query strings
+        """
+        if self._is_cache_valid():
+            self.logger.debug("Cache hit for Wikipedia top views")
+            return self._get_from_cache(count)
+
+        self._cache_misses += 1
+        queries = []
+
+        try:
+            session = await self._get_session()
+            articles = await self._fetch_top_articles(session)
+
+            if not self._available:
+                return []
+
+            filtered_articles = self._filter_articles(articles)
+            queries = filtered_articles[:count]
+
+            if filtered_articles:
+                self._cache_articles(filtered_articles)
+                self.logger.debug(f"Cached {len(filtered_articles)} top articles")
+
+            self.logger.debug(f"Fetched {len(queries)} queries from Wikipedia top views")
+
+        except Exception as e:
+            self.logger.error(f"Failed to fetch queries from Wikipedia top views: {e}")
+            self._available = False
+
Evidence
The source sets _available to False on request failure, and fetch_queries returns empty results based on that flag; there is no logic on the success path that sets _available back to True, so the source is unrecoverable. Also, the API date is computed from local time, so the request may target a date for which Wikipedia Pageviews data is not yet ready, causing a non-200 response and hence permanent disablement.

src/search/query_sources/wikipedia_top_views_source.py[45-46]
src/search/query_sources/wikipedia_top_views_source.py[116-120]
src/search/query_sources/wikipedia_top_views_source.py[141-155]
src/search/query_sources/wikipedia_top_views_source.py[191-200]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
WikipediaTopViewsSource sets `_available` to `False` after any API exception or non-200 response, but the success path never sets it back to `True`, so the source becomes permanently unavailable (`fetch_queries()` just returns `[]`). In addition, the API date is based on `datetime.now()` (local time zone), which can request a date whose data has not been generated yet, producing a non-200 response and amplifying the permanent-disable problem.
### Issue Context
QueryEngine keeps calling this source's `fetch_queries()`; with the current implementation, after `_available=False` the source keeps making network requests yet still returns empty results, hurting functionality and wasting resources.
### Fix Focus Areas
- src/search/query_sources/wikipedia_top_views_source.py[116-210]



5. DashboardClient misparses a 0-point balance 🐞 Bug ✓ Correctness
Description
DashboardClient reads points with availablePoints or pointsBalance; if availablePoints is 0 it is treated as falsy and the code incorrectly falls back to pointsBalance, possibly returning None or the wrong field. Also, base_url comes from REWARDS_URLS (with a trailing /), so the f"{base_url}/api/..." concatenation produces a double-slash URL, a compatibility risk.
URL,存在兼容性风险。
Code

src/api/dashboard_client.py[R28-97]

+        self.page = page
+        self._cached_points: int | None = None
+        self._base_url = REWARDS_URLS.get("dashboard", "https://rewards.bing.com")
+
+    async def get_current_points(self) -> int | None:
+        """
+        Get current points from Dashboard API
+
+        Attempts to fetch points via API call first, falls back to
+        parsing page content if API fails.
+
+        Returns:
+            Points balance or None if unable to determine
+        """
+        try:
+            points = await self._fetch_points_via_api()
+            if points is not None and points >= 0:
+                self._cached_points = points
+                return points
+        except TimeoutError as e:
+            logger.warning(f"API request timeout: {e}")
+        except ConnectionError as e:
+            logger.warning(f"API connection error: {e}")
+        except Exception as e:
+            logger.warning(f"API call failed: {e}")
+
+        try:
+            points = await self._fetch_points_via_page_content()
+            if points is not None and points >= 0:
+                self._cached_points = points
+                return points
+        except Exception as e:
+            logger.debug(f"Page content parsing failed: {e}")
+
+        return self._cached_points
+
+    async def _fetch_points_via_api(self) -> int | None:
+        """
+        Fetch points via internal API endpoint
+
+        Returns:
+            Points balance or None
+        """
+        try:
+            api_url = f"{self._base_url}/api/getuserbalance"
+            response = await self.page.evaluate(
+                f"""
+                async () => {{
+                    try {{
+                        const resp = await fetch('{api_url}', {{
+                            method: 'GET',
+                            credentials: 'include'
+                        }});
+                        if (!resp.ok) return null;
+                        return await resp.json();
+                    }} catch {{
+                        return null;
+                    }}
+                }}
+                """
+            )
+
+            if response and isinstance(response, dict):
+                points = response.get("availablePoints") or response.get("pointsBalance")
+                if isinstance(points, (int, str)):
+                    return int(points) if isinstance(points, str) else points
+
+        except Exception as e:
+            logger.debug(f"API fetch error: {e}")
+
Evidence
When availablePoints in the returned JSON is 0, the or expression falls through to pointsBalance; if that field is absent, parsing fails. Also, REWARDS_URLS['dashboard'] is defined with a trailing slash, so the current concatenation produces ...com//api/...

src/api/dashboard_client.py[28-31]
src/api/dashboard_client.py[71-94]
src/constants/urls.py[20-25]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When parsing points, DashboardClient uses `response.get("availablePoints") or response.get("pointsBalance")`, which treats `0` as falsy and can return the wrong field or None. In addition, `REWARDS_URLS["dashboard"]` ends with `/`, so the current `f"{base_url}/api/..."` produces a double-slash URL.
### Issue Context
PointsDetector prefers DashboardClient; this bug makes points retrieval fail or behave inconsistently for some accounts/responses.
### Fix Focus Areas
- src/api/dashboard_client.py[28-98]
- src/constants/urls.py[20-25]



6. fetch auto-detection uses the wrong repository 🐞 Bug ✓ Correctness
Description
When --pr is not provided, fetch in manage_reviews.py auto-detects the number via gh pr view --json number, but it ignores the user-supplied --owner/--repo, so the result depends on whichever repository the current working directory belongs to, and comments from the wrong PR may be fetched and written to the local store.
Code

tools/manage_reviews.py[R172-186]

+    pr_number = args.pr
+    if pr_number is None:
+        try:
+            result = subprocess.run(
+                ["gh", "pr", "view", "--json", "number"],
+                capture_output=True,
+                text=True,
+                timeout=30,
+            )
+            if result.returncode == 0:
+                import json as json_module
+
+                pr_data = json_module.loads(result.stdout)
+                pr_number = pr_data.get("number")
+            else:
Evidence
The CLI explicitly requires --owner/--repo, but the gh auto-detection command is not bound to a repo, so its context is determined by the current directory; the resolver then uses args.owner/args.repo to request and save data, risking a mismatch between the PR number and the target repository.

tools/manage_reviews.py[172-186]
tools/manage_reviews.py[234-237]
tools/manage_reviews.py[486-491]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When `--pr` is not supplied, the `fetch` subcommand calls `gh pr view --json number` to obtain the PR number, but the command ignores the `--owner/--repo` arguments, so the detected number depends on the repository of the current working directory and the wrong PR may be fetched for the target repository.
### Issue Context
The tool writes the fetched comments into a local database; once the PR number is wrong, subsequent resolve/verify-context operations are all affected.
### Fix Focus Areas
- tools/manage_reviews.py[165-238]




Remediation recommended

7. get_git_branch() swallows errors 📘 Rule violation ⛯ Reliability
Description
get_git_branch() and get_git_head_sha() catch all exceptions and return an empty string without
any logging, which creates silent failures. This reduces diagnosability and makes it hard to
understand why PR context metadata is missing.
Code

src/review/resolver.py[R19-44]

+def get_git_branch() -> str:
+    """获取当前 git 分支名称"""
+    try:
+        result = subprocess.run(
+            ["git", "branch", "--show-current"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+        return result.stdout.strip() if result.returncode == 0 else ""
+    except Exception:
+        return ""
+
+
+def get_git_head_sha() -> str:
+    """获取当前 HEAD commit SHA(前7位)"""
+    try:
+        result = subprocess.run(
+            ["git", "rev-parse", "--short", "HEAD"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+        return result.stdout.strip() if result.returncode == 0 else ""
+    except Exception:
+        return ""
Evidence
Compliance requires meaningful error handling and avoiding silent failures; these helpers swallow
all exceptions and provide no context/logging when git is unavailable or fails.

Rule 3: Generic: Robust Error Handling and Edge Case Management
src/review/resolver.py[19-44]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`get_git_branch()` / `get_git_head_sha()` swallow exceptions and return `""` without logging, creating silent failures and making debugging difficult.
## Issue Context
These helpers are used to populate `ReviewMetadata.branch` and `ReviewMetadata.head_sha`; when they fail, users get no clue why context validation is missing.
## Fix Focus Areas
- src/review/resolver.py[19-44]

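The helper can keep its "empty string on failure" contract while still surfacing the cause. A sketch for one of the two functions (the logger name is assumed; the git command is the one quoted above):

```python
import logging
import subprocess

logger = logging.getLogger("review.resolver")


def get_git_branch() -> str:
    """Return the current branch name, logging failures instead of hiding them."""
    try:
        result = subprocess.run(
            ["git", "branch", "--show-current"],
            capture_output=True,
            text=True,
            timeout=10,
        )
        if result.returncode == 0:
            return result.stdout.strip()
        logger.warning("git branch lookup failed (rc=%d): %s",
                       result.returncode, result.stderr.strip())
    except FileNotFoundError:
        logger.warning("git executable not found; branch metadata unavailable")
    except Exception as exc:
        logger.warning("git branch lookup failed: %s: %s", type(exc).__name__, exc)
    return ""
```

`get_git_head_sha()` would follow the same pattern with `git rev-parse --short HEAD`.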


8. PointsDetector logs points 📘 Rule violation ⛨ Security
Description
PointsDetector.get_current_points() logs the user's points balance at info level, which may be
sensitive account information. This increases the risk of leaking user/account data via logs.
Code

src/account/points_detector.py[R93-111]

+            # 优先使用 Dashboard API
+            try:
+                logger.debug("尝试使用 Dashboard API 获取积分...")
+                client = DashboardClient(page)
+                api_points: int | None = await client.get_current_points()
+                if api_points is not None and api_points >= 0:
+                    logger.info(f"✓ 从 API 获取积分: {api_points:,}")
+                    return int(api_points)
+            except TimeoutError as e:
+                logger.warning(f"API 请求超时: {e},使用 HTML 解析作为备用")
+            except ConnectionError as e:
+                logger.warning(f"API 连接异常: {e},使用 HTML 解析作为备用")
+            except Exception as e:
+                logger.warning(f"API 调用异常: {type(e).__name__}: {e},使用 HTML 解析作为备用")
+
+            # 备用:HTML 解析
           logger.debug("尝试从页面源码提取积分...")
           points = await self._extract_points_from_source(page)
Evidence
The secure logging rule prohibits sensitive/account data in logs; the new code logs points balances
in multiple info messages.

Rule 5: Generic: Secure Logging Practices
src/account/points_detector.py[93-114]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Points balances are logged in plaintext at `info` level, which may be considered sensitive account data and can leak through log aggregation.
## Issue Context
Secure logging practices require keeping sensitive/user data out of logs.
## Fix Focus Areas
- src/account/points_detector.py[93-114]
- src/account/points_detector.py[129-131]

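One common remediation is to demote the exact balance to debug level while keeping an info-level success signal. A sketch with illustrative names (the logger name and function are not from the PR):

```python
import logging

logger = logging.getLogger("points_detector")


def log_points_retrieved(points: int) -> None:
    """Confirm success without writing the balance at info level; the exact
    value stays at debug, for local troubleshooting only."""
    logger.info("points retrieved via dashboard API")
    logger.debug("points balance: %d", points)
```

With the default level at INFO, aggregated logs record that retrieval succeeded but never the account's balance.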


9. lang not validated 📘 Rule violation ⛨ Security
Description
WikipediaTopViewsSource reads lang from config and uses it to construct a request URL without
validating its format. This can lead to unexpected outbound requests or malformed URLs if the config
is untrusted or misconfigured.
Code

src/search/query_sources/wikipedia_top_views_source.py[R42-45]

+        self.timeout = config.get("query_engine.sources.wikipedia_top_views.timeout", 30)
+        self.lang = config.get("query_engine.sources.wikipedia_top_views.lang", "en")
+        self.cache_ttl = config.get("query_engine.sources.wikipedia_top_views.ttl", 6 * 3600)
+        self._available: bool = True
Evidence
The input validation rule requires sanitizing external inputs; lang is taken from config and
interpolated into a URL path without validation.

Rule 6: Generic: Security-First Input Validation and Data Handling
src/search/query_sources/wikipedia_top_views_source.py[42-45]
src/search/query_sources/wikipedia_top_views_source.py[121-126]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`lang` is sourced from config and interpolated into the outbound request URL without validation.
## Issue Context
Even if config is usually trusted, validation prevents malformed requests and reduces risk if config becomes user-controlled or corrupted.
## Fix Focus Areas
- src/search/query_sources/wikipedia_top_views_source.py[42-45]
- src/search/query_sources/wikipedia_top_views_source.py[121-126]

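Item 9's fix can be sketched as a small gate applied before the URL is built. A minimal sketch, assuming a hypothetical `validate_lang` helper and a plausible pattern for Wikipedia language codes (the project may prefer an explicit allowlist instead):

```python
import re

# Plausible shape of a Wikipedia subdomain language code: 2-3 lowercase
# letters with an optional hyphenated variant suffix (e.g. "en", "pt-br").
# The pattern is an assumption for illustration, not an official spec.
_LANG_RE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,8})?$")

def validate_lang(lang: object, default: str = "en") -> str:
    """Return lang if it looks like a valid language code, else the default."""
    if isinstance(lang, str) and _LANG_RE.match(lang):
        return lang
    return default
```

Validating once in `__init__` (e.g. `self.lang = validate_lang(config.get(...))`) keeps `_build_api_url()` free of unvalidated input.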


10. Source attribution wrong after prioritization 🐞 Bug ✓ Correctness
Description
After QueryEngine's new priority-based sorting, deduplication keeps the first occurrence of each query (from the higher-priority source), but `_query_sources` is overwritten by later, lower-priority sources during aggregation, so `get_query_source()` returns an attribution inconsistent with the query actually kept. SearchEngine uses this attribution to record the origin of search terms, so statistics and logs are misleading.
Code

src/search/query_engine.py[R152-153]

+        self.sources.sort(key=lambda s: s.get_priority())
+
Evidence
Once sources are sorted by priority, the concatenation order of all_queries decides which duplicate survives (the first occurrence wins). But _fetch_from_sources reassigns the mapping for the same normalized query repeatedly, so a later (lower-priority) source overwrites it; the deduplication stage only filters keys and never corrects the values. As a result, SearchEngine may get the wrong origin when calling get_query_source.

src/search/query_engine.py[152-153]
src/search/query_engine.py[210-225]
src/search/query_engine.py[290-303]
src/search/search_engine.py[119-123]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
After sources are sorted by priority, QueryEngine's deduplication keeps the first occurrence of a query (usually from a higher-priority source), but the `_query_sources` assignments are overwritten by later sources, so the `get_query_source()` attribution no longer matches the query that was actually kept.
### Issue Context
SearchEngine calls `query_engine.get_query_source(term)` to record and classify the origin of search terms; incorrect attribution corrupts logs and statistics and contradicts the semantics of the priority system.
### Fix Focus Areas
- src/search/query_engine.py[152-225]
- src/search/query_engine.py[280-334]
- src/search/search_engine.py[119-123]

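With first-wins deduplication, attribution stays correct if the query-to-source mapping is simply never overwritten. A standalone sketch of the idea (the function name and the `.strip().lower()` normalization are assumptions about this codebase):

```python
def attribute_queries(
    results: list[tuple[str, list[str]]],
) -> tuple[list[str], dict[str, str]]:
    """Deduplicate queries from priority-ordered sources.

    results: (source_name, queries) pairs, already sorted by priority.
    Returns the kept queries plus a mapping attributing each kept query
    to the source it actually came from.
    """
    sources: dict[str, str] = {}
    kept: list[str] = []
    for source_name, queries in results:
        for query in queries:
            key = query.strip().lower()  # assumed normalization
            if key not in sources:  # later (lower-priority) sources never overwrite
                sources[key] = source_name
                kept.append(query)
    return kept, sources
```

Because the mapping and the kept list are updated in the same `if`, `get_query_source()` and the deduplicated output can no longer disagree.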



Previous review results

Review updated until commit b70faa4

Results up to commit 29c3e9c


🐞 Bugs (4) 📘 Rule violations (4) 📎 Requirement gaps (0)

Action required
1. cmd_fetch prints raw exception 📘 Rule violation ⛨ Security
Description
cmd_fetch returns a user-facing JSON message containing the raw exception string, which can leak
internal implementation details. This violates secure error handling expectations for user-visible
outputs.
Code

tools/manage_reviews.py[R226-228]

+        except Exception as e:
+            print(json.dumps({"success": False, "message": f"获取 PR 编号失败: {e}"}))
+            return
Evidence
The compliance rule requires user-facing errors to be generic; the code directly embeds {e} into
the message returned to the caller.

Rule 4: Generic: Secure Error Handling
tools/manage_reviews.py[226-228]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The CLI prints raw exception text in a user-facing JSON response, potentially leaking internal details.

## Issue Context
Compliance requires generic user-facing errors; detailed errors should go to internal logs only.

## Fix Focus Areas
- tools/manage_reviews.py[226-228]

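The usual remedy is to split the two audiences: full details go to the internal log, and the user-facing JSON carries only a generic message. A minimal sketch with illustrative names:

```python
import json
import logging

logger = logging.getLogger(__name__)

def pr_lookup_error_response(exc: Exception) -> str:
    """Log the exception internally, return a generic user-facing JSON string."""
    logger.error("PR number lookup failed: %s: %s", type(exc).__name__, exc)
    return json.dumps(
        {"success": False, "message": "Failed to determine PR number; see logs for details."}
    )
```

The exception type and text survive in the log for debugging, while nothing implementation-specific reaches the caller.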


2. TopViews permanently unavailable 🐞 Bug ⛯ Reliability
Description
WikipediaTopViewsSource sets _available to False on any non-200 response or exception but never restores it to
True on success, so a single transient failure leaves the source permanently returning an empty list. In addition, the date is computed in local time (datetime.now), which in some timezones requests data the API has not generated yet, triggering the permanent disablement described above.
Code

src/search/query_sources/wikipedia_top_views_source.py[R116-210]

+    def _get_api_date(self) -> tuple[str, str, str]:
+        """Get yesterday's date for API call"""
+        yesterday = datetime.now() - timedelta(days=1)
+        return (str(yesterday.year), f"{yesterday.month:02d}", f"{yesterday.day:02d}")
+
+    def _build_api_url(self) -> str:
+        """Build API URL using constants"""
+        base_url = QUERY_SOURCE_URLS["wikipedia_top_views"]
+        yyyy, mm, dd = self._get_api_date()
+        return f"{base_url}/{self.lang}.wikipedia/all-access/{yyyy}/{mm}/{dd}"
+
+    async def _fetch_top_articles(self, session: aiohttp.ClientSession) -> list[dict[str, Any]]:
+        """
+        Fetch top articles from Wikipedia API
+
+        Args:
+            session: aiohttp session
+
+        Returns:
+            List of article objects
+        """
+        try:
+            url = self._build_api_url()
+            self.logger.debug(f"Fetching top articles from: {url}")
+
+            async with session.get(
+                url, timeout=aiohttp.ClientTimeout(total=self.timeout)
+            ) as response:
+                if response.status == 200:
+                    data = await response.json()
+                    if "items" in data and data["items"]:
+                        articles = data["items"][0].get("articles", [])
+                        return list(articles) if articles else []
+                else:
+                    self.logger.warning(f"Wikipedia API returned status {response.status}")
+                    self._available = False
+        except Exception as e:
+            self.logger.error(f"Error fetching top articles: {e}")
+            self._available = False
+        return []
+
+    def _filter_articles(self, articles: list[dict]) -> list[str]:
+        """
+        Filter out non-article entries
+
+        Args:
+            articles: List of article objects
+
+        Returns:
+            List of filtered article titles
+        """
+        filtered = []
+        for article in articles:
+            title = article.get("article", "")
+            if not any(title.startswith(prefix) for prefix in self.EXCLUDED_PREFIXES):
+                filtered.append(title.replace("_", " "))
+        return filtered
+
+    async def fetch_queries(self, count: int) -> list[str]:
+        """
+        Fetch queries from Wikipedia Pageviews API
+
+        Args:
+            count: Number of queries to fetch
+
+        Returns:
+            List of query strings
+        """
+        if self._is_cache_valid():
+            self.logger.debug("Cache hit for Wikipedia top views")
+            return self._get_from_cache(count)
+
+        self._cache_misses += 1
+        queries = []
+
+        try:
+            session = await self._get_session()
+            articles = await self._fetch_top_articles(session)
+
+            if not self._available:
+                return []
+
+            filtered_articles = self._filter_articles(articles)
+            queries = filtered_articles[:count]
+
+            if filtered_articles:
+                self._cache_articles(filtered_articles)
+                self.logger.debug(f"Cached {len(filtered_articles)} top articles")
+
+            self.logger.debug(f"Fetched {len(queries)} queries from Wikipedia top views")
+
+        except Exception as e:
+            self.logger.error(f"Failed to fetch queries from Wikipedia top views: {e}")
+            self._available = False
+
Evidence
The source sets _available to False on request failure, and fetch_queries returns empty results based solely on that flag; nothing in the code sets _available back to True
on the success path, so the state is unrecoverable. Meanwhile the API date is computed from local time, which can request a date Wikipedia Pageviews has not published yet, producing a non-200 response and the permanent disablement.

src/search/query_sources/wikipedia_top_views_source.py[45-46]
src/search/query_sources/wikipedia_top_views_source.py[116-120]
src/search/query_sources/wikipedia_top_views_source.py[141-155]
src/search/query_sources/wikipedia_top_views_source.py[191-200]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
After any single API exception or non-200 response, WikipediaTopViewsSource sets `_available` to `False` but never sets it back to `True` on the success path, leaving the source permanently unavailable (`fetch_queries()` returns `[]`). The API date is also based on `datetime.now()` (local timezone), which can request a date whose data has not been generated yet, triggering a non-200 response and amplifying the permanent-disable problem.

### Issue Context
QueryEngine keeps calling the source's `fetch_queries()`; with `_available=False` the current implementation continues making network requests yet still returns nothing, hurting functionality and wasting resources.

### Fix Focus Areas
- src/search/query_sources/wikipedia_top_views_source.py[116-210]



3. DashboardClient misparses a 0-point balance 🐞 Bug ✓ Correctness
Description
DashboardClient reads the value with `availablePoints or pointsBalance`; when availablePoints is 0 it is treated as falsy and the code wrongly falls
back to pointsBalance, potentially returning None or the wrong field. In addition, base_url comes from REWARDS_URLS (with a trailing /), so concatenating f"{base_url}/api/..." produces a double-slash
URL, a compatibility risk.
Code

src/api/dashboard_client.py[R28-97]

+        self.page = page
+        self._cached_points: int | None = None
+        self._base_url = REWARDS_URLS.get("dashboard", "https://rewards.bing.com")
+
+    async def get_current_points(self) -> int | None:
+        """
+        Get current points from Dashboard API
+
+        Attempts to fetch points via API call first, falls back to
+        parsing page content if API fails.
+
+        Returns:
+            Points balance or None if unable to determine
+        """
+        try:
+            points = await self._fetch_points_via_api()
+            if points is not None and points >= 0:
+                self._cached_points = points
+                return points
+        except TimeoutError as e:
+            logger.warning(f"API request timeout: {e}")
+        except ConnectionError as e:
+            logger.warning(f"API connection error: {e}")
+        except Exception as e:
+            logger.warning(f"API call failed: {e}")
+
+        try:
+            points = await self._fetch_points_via_page_content()
+            if points is not None and points >= 0:
+                self._cached_points = points
+                return points
+        except Exception as e:
+            logger.debug(f"Page content parsing failed: {e}")
+
+        return self._cached_points
+
+    async def _fetch_points_via_api(self) -> int | None:
+        """
+        Fetch points via internal API endpoint
+
+        Returns:
+            Points balance or None
+        """
+        try:
+            api_url = f"{self._base_url}/api/getuserbalance"
+            response = await self.page.evaluate(
+                f"""
+                async () => {{
+                    try {{
+                        const resp = await fetch('{api_url}', {{
+                            method: 'GET',
+                            credentials: 'include'
+                        }});
+                        if (!resp.ok) return null;
+                        return await resp.json();
+                    }} catch {{
+                        return null;
+                    }}
+                }}
+                """
+            )
+
+            if response and isinstance(response, dict):
+                points = response.get("availablePoints") or response.get("pointsBalance")
+                if isinstance(points, (int, str)):
+                    return int(points) if isinstance(points, str) else points
+
+        except Exception as e:
+            logger.debug(f"API fetch error: {e}")
+
Evidence
When availablePoints in the returned JSON is 0, the `or` expression falls through to pointsBalance; if that field is absent, parsing fails. Also,
REWARDS_URLS['dashboard'] is defined with a trailing slash, so the current concatenation produces ...com//api/...

src/api/dashboard_client.py[28-31]
src/api/dashboard_client.py[71-94]
src/constants/urls.py[20-25]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
When parsing points, DashboardClient uses `response.get("availablePoints") or response.get("pointsBalance")`, which treats `0` as falsy and returns the wrong field or None. Also, `REWARDS_URLS["dashboard"]` carries a trailing `/`, so the current `f"{base_url}/api/..."` produces a double-slash URL.

### Issue Context
PointsDetector prefers DashboardClient; this bug makes points retrieval fail or behave unreliably for certain accounts and responses.

### Fix Focus Areas
- src/api/dashboard_client.py[28-98]
- src/constants/urls.py[20-25]



4. fetch auto-detects the PR from the wrong repository 🐞 Bug ✓ Correctness
Description
When --pr is not provided, the fetch command in manage_reviews.py auto-detects the PR via gh pr view --json number, but it ignores the user-supplied
--owner/--repo, so the result depends on whichever repository the current working directory belongs to, and comments from the wrong PR may be fetched and written to the local database.
Code

tools/manage_reviews.py[R172-186]

+    pr_number = args.pr
+    if pr_number is None:
+        try:
+            result = subprocess.run(
+                ["gh", "pr", "view", "--json", "number"],
+                capture_output=True,
+                text=True,
+                timeout=30,
+            )
+            if result.returncode == 0:
+                import json as json_module
+
+                pr_data = json_module.loads(result.stdout)
+                pr_number = pr_data.get("number")
+            else:
Evidence
The CLI explicitly requires --owner/--repo, but the gh auto-detection command is not bound to that repo, so the context is determined by the current directory; the resolver then uses args.owner/args.repo
to request and persist data, risking a PR number that does not match the target repository.

tools/manage_reviews.py[172-186]
tools/manage_reviews.py[234-237]
tools/manage_reviews.py[486-491]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
When `--pr` is not given explicitly, the `fetch` subcommand calls `gh pr view --json number` to auto-detect the PR number, but the command does not use the `--owner/--repo` arguments, so the PR number depends on the current working directory's repository and the wrong PR may be fetched for the target repo.

### Issue Context
The tool writes fetched comments to a local database; once the PR number is wrong, subsequent resolve/verify-context operations are all affected.

### Fix Focus Areas
- tools/manage_reviews.py[165-238]




Remediation recommended
5. get_git_branch() swallows errors 📘 Rule violation ⛯ Reliability
Description
get_git_branch() and get_git_head_sha() catch all exceptions and return an empty string without
any logging, which creates silent failures. This reduces diagnosability and makes it hard to
understand why PR context metadata is missing.
Code

src/review/resolver.py[R19-44]

+def get_git_branch() -> str:
+    """获取当前 git 分支名称"""
+    try:
+        result = subprocess.run(
+            ["git", "branch", "--show-current"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+        return result.stdout.strip() if result.returncode == 0 else ""
+    except Exception:
+        return ""
+
+
+def get_git_head_sha() -> str:
+    """获取当前 HEAD commit SHA(前7位)"""
+    try:
+        result = subprocess.run(
+            ["git", "rev-parse", "--short", "HEAD"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+        return result.stdout.strip() if result.returncode == 0 else ""
+    except Exception:
+        return ""
Evidence
Compliance requires meaningful error handling and avoiding silent failures; these helpers swallow
all exceptions and provide no context/logging when git is unavailable or fails.

Rule 3: Generic: Robust Error Handling and Edge Case Management
src/review/resolver.py[19-44]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`get_git_branch()` / `get_git_head_sha()` swallow exceptions and return `""` without logging, creating silent failures and making debugging difficult.

## Issue Context
These helpers are used to populate `ReviewMetadata.branch` and `ReviewMetadata.head_sha`; when they fail, users get no clue why context validation is missing.

## Fix Focus Areas
- src/review/resolver.py[19-44]

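A sketch of `get_git_branch()` with each silent path replaced by a warning; the log wording is an assumption, and `get_git_head_sha()` would follow the same shape:

```python
import logging
import subprocess

logger = logging.getLogger(__name__)

def get_git_branch() -> str:
    """Current git branch name, or "" with a logged reason on failure."""
    try:
        result = subprocess.run(
            ["git", "branch", "--show-current"],
            capture_output=True, text=True, timeout=10,
        )
    except FileNotFoundError:
        logger.warning("git executable not found; branch metadata unavailable")
        return ""
    except subprocess.TimeoutExpired:
        logger.warning("git branch lookup timed out after 10s")
        return ""
    if result.returncode != 0:
        logger.warning("git branch failed (rc=%s): %s",
                       result.returncode, result.stderr.strip())
        return ""
    return result.stdout.strip()
```

Narrowing `except Exception` to the expected failure modes also surfaces genuinely unexpected errors instead of masking them.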


6. PointsDetector logs points 📘 Rule violation ⛨ Security
Description
PointsDetector.get_current_points() logs the user's points balance at info level, which may be
sensitive account information. This increases the risk of leaking user/account data via logs.
Code

src/account/points_detector.py[R93-111]

+            # 优先使用 Dashboard API
+            try:
+                logger.debug("尝试使用 Dashboard API 获取积分...")
+                client = DashboardClient(page)
+                api_points: int | None = await client.get_current_points()
+                if api_points is not None and api_points >= 0:
+                    logger.info(f"✓ 从 API 获取积分: {api_points:,}")
+                    return int(api_points)
+            except TimeoutError as e:
+                logger.warning(f"API 请求超时: {e},使用 HTML 解析作为备用")
+            except ConnectionError as e:
+                logger.warning(f"API 连接异常: {e},使用 HTML 解析作为备用")
+            except Exception as e:
+                logger.warning(f"API 调用异常: {type(e).__name__}: {e},使用 HTML 解析作为备用")
+
+            # 备用:HTML 解析
            logger.debug("尝试从页面源码提取积分...")
            points = await self._extract_points_from_source(page)
Evidence
The secure logging rule prohibits sensitive/account data in logs; the new code logs points balances
in multiple info messages.

Rule 5: Generic: Secure Logging Practices
src/account/points_detector.py[93-114]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Points balances are logged in plaintext at `info` level, which may be considered sensitive account data and can leak through log aggregation.

## Issue Context
Secure logging practices require keeping sensitive/user data out of logs.

## Fix Focus Areas
- src/account/points_detector.py[93-114]
- src/account/points_detector.py[129-131]

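One hedged approach: log the exact balance only at DEBUG and reduce the INFO message to an order of magnitude, so operators can confirm retrieval succeeded without the precise figure landing in aggregated logs. The masking scheme below is illustrative, not a project convention:

```python
import logging

logger = logging.getLogger(__name__)

def mask_points(points: int) -> str:
    """Reduce a balance to its order of magnitude for log output."""
    if points <= 0:
        return "0+"
    return f"~{10 ** (len(str(points)) - 1):,}+"

def log_points_retrieved(points: int) -> None:
    logger.debug("points retrieved: %d", points)  # exact value, DEBUG only
    logger.info("points retrieved via API (balance %s)", mask_points(points))
```

A simpler variant is to log only that retrieval succeeded and keep the value out of logs entirely.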


7. lang not validated 📘 Rule violation ⛨ Security
Description
WikipediaTopViewsSource reads lang from config and uses it to construct a request URL without
validating its format. This can lead to unexpected outbound requests or malformed URLs if the config
is untrusted or misconfigured.
Code

src/search/query_sources/wikipedia_top_views_source.py[R42-45]

+        self.timeout = config.get("query_engine.sources.wikipedia_top_views.timeout", 30)
+        self.lang = config.get("query_engine.sources.wikipedia_top_views.lang", "en")
+        self.cache_ttl = config.get("query_engine.sources.wikipedia_top_views.ttl", 6 * 3600)
+        self._available: bool = True
Evidence
The input validation rule requires sanitizing external inputs; lang is taken from config and
interpolated into a URL path without validation.

Rule 6: Generic: Security-First Input Validation and Data Handling
src/search/query_sources/wikipedia_top_views_source.py[42-45]
src/search/query_sources/wikipedia_top_views_source.py[121-126]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`lang` is sourced from config and interpolated into the outbound request URL without validation.

## Issue Context
Even if config is usually trusted, validation prevents malformed requests and reduces risk if config becomes user-controlled or corrupted.

## Fix Focus Areas
- src/search/query_sources/wikipedia_top_views_source.py[42-45]
- src/search/query_sources/wikipedia_top_views_source.py[121-126]



8. Source attribution wrong after prioritization 🐞 Bug ✓ Correctness
Description
After QueryEngine's new priority-based sorting, deduplication keeps the first occurrence of each query (from the higher-priority source), but `_query_sources` is overwritten by later, lower-priority sources during aggregation, so `get_query_source()` returns an attribution inconsistent with the query actually kept. SearchEngine uses this attribution to record the origin of search terms, so statistics and logs are misleading.
Code

src/search/query_engine.py[R152-153]

+        self.sources.sort(key=lambda s: s.get_priority())
+
Evidence
Once sources are sorted by priority, the concatenation order of all_queries decides which duplicate survives (the first occurrence wins). But _fetch_from_sources reassigns the mapping for the same normalized query repeatedly, so a later (lower-priority) source overwrites it; the deduplication stage only filters keys and never corrects the values. As a result, SearchEngine may get the wrong origin when calling get_query_source.

src/search/query_engine.py[152-153]
src/search/query_engine.py[210-225]
src/search/query_engine.py[290-303]
src/search/search_engine.py[119-123]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
After sources are sorted by priority, QueryEngine's deduplication keeps the first occurrence of a query (usually from a higher-priority source), but the `_query_sources` assignments are overwritten by later sources, so the `get_query_source()` attribution no longer matches the query that was actually kept.

### Issue Context
SearchEngine calls `query_engine.get_query_source(term)` to record and classify the origin of search terms; incorrect attribution corrupts logs and statistics and contradicts the semantics of the priority system.

### Fix Focus Areas
- src/search/query_engine.py[152-225]
- src/search/query_engine.py[280-334]
- src/search/search_engine.py[119-123]





@sourcery-ai

sourcery-ai bot commented Feb 26, 2026

Reviewer's Guide

Implements a new WikipediaTopViewsSource query source with caching and exclusion rules; wires it into the query engine with per-source availability control and explicit priorities; introduces a DashboardClient API wrapper used by PointsDetector to prefer API-based points retrieval with safer parsing; and enhances the review management CLI and metadata with branch/head awareness and more robust PR/Git context handling.

Sequence diagram for PointsDetector using DashboardClient with API fallback

sequenceDiagram
    actor User
    participant PointsDetector
    participant DashboardClient
    participant BrowserPage as Page
    participant RewardsAPI as MicrosoftRewardsAPI

    User->>PointsDetector: get_current_points(page, skip_navigation)
    PointsDetector->>BrowserPage: optional navigate to DASHBOARD_URL
    PointsDetector->>DashboardClient: create with page
    PointsDetector->>DashboardClient: get_current_points()

    DashboardClient->>DashboardClient: _fetch_points_via_api()
    DashboardClient->>BrowserPage: evaluate fetch(dashboard_balance)
    BrowserPage->>RewardsAPI: HTTP GET /api/getuserbalance
    RewardsAPI-->>BrowserPage: JSON availablePoints / pointsBalance
    BrowserPage-->>DashboardClient: JSON data
    DashboardClient->>DashboardClient: parse points
    alt API points ok
        DashboardClient-->>PointsDetector: points
        PointsDetector-->>User: points
    else API failed or invalid
        DashboardClient->>DashboardClient: _fetch_points_via_page_content()
        DashboardClient->>BrowserPage: content()
        BrowserPage-->>DashboardClient: HTML
        DashboardClient->>DashboardClient: regex search for points
        alt parsed from page
            DashboardClient-->>PointsDetector: fallback points
            PointsDetector-->>User: points
        else page parsing failed
            DashboardClient-->>PointsDetector: cached points or None
            PointsDetector-->>User: cached points or None
        end
    end

Class diagram for updated query sources and QueryEngine prioritisation

classDiagram
    class QuerySource {
        <<abstract>>
        -config
        -logger
        +fetch_queries(count int) list~str~
        +get_source_name() str
        +is_available() bool
        +get_priority() int
    }

    class LocalFileSource {
        -base_terms list~str~
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +fetch_queries(count int) list~str~
    }

    class DuckDuckGoSource {
        -_available bool
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +fetch_queries(count int) list~str~
    }

    class WikipediaSource {
        -_available bool
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +fetch_queries(count int) list~str~
    }

    class BingSuggestionsSource {
        -_available bool
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +fetch_queries(count int) list~str~
    }

    class WikipediaTopViewsSource {
        -timeout int
        -lang str
        -cache_ttl int
        -_available bool
        -_session aiohttp.ClientSession
        -_cache_data list~str~
        -_cache_time float
        -_cache_hits int
        -_cache_misses int
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +get_cache_stats() dict
        +fetch_queries(count int) list~str~
        +close() None
        -_get_session() aiohttp.ClientSession
        -_is_cache_valid() bool
        -_get_from_cache(count int) list~str~
        -_cache_articles(articles list~str~) None
        -_get_api_date() tuple~str,str,str~
        -_build_api_url() str
        -_fetch_top_articles(session aiohttp.ClientSession) list~dict~
        -_filter_articles(articles list~dict~) list~str~
    }

    class QueryEngine {
        -config
        -logger
        -sources list~QuerySource~
        -_query_sources dict~str,str~
        +_init_sources() None
        +generate_queries(count int, expand bool) list~str~
        -_fetch_from_sources(count int) list~str~
        -_expand_queries(queries list~str~) list~str~
    }

    QuerySource <|-- LocalFileSource
    QuerySource <|-- DuckDuckGoSource
    QuerySource <|-- WikipediaSource
    QuerySource <|-- BingSuggestionsSource
    QuerySource <|-- WikipediaTopViewsSource

    QueryEngine "*" --> "many" QuerySource : manages

Class diagram for DashboardClient and PointsDetector integration

classDiagram
    class DashboardClient {
        -page Page
        -_cached_points int
        -_base_url str
        +DashboardClient(page Page)
        +get_current_points() int
        +get_dashboard_data() dict~str,Any~
        -_fetch_points_via_api() int
        -_fetch_points_via_page_content() int
    }

    class PointsDetector {
        -logger
        +DASHBOARD_URL str
        +POINTS_SELECTORS list~str~
        +PointsDetector()
        +get_current_points(page Page, skip_navigation bool) int
        -_extract_points_from_source(page Page) int
        -_parse_points(text str) int
        -_check_task_status(page Page, selectors list, task_name str) dict~str,bool,int~
    }

    PointsDetector ..> DashboardClient : uses for API points

Class diagram for enhanced review metadata and resolver

classDiagram
    class ReviewMetadata {
        +pr_number int
        +owner str
        +repo str
        +branch str
        +head_sha str
        +last_updated str
        +version str
        +etag_comments str
        +etag_reviews str
    }

    class ReviewManager {
        +db_path str
        +get_metadata() ReviewMetadata
        +save_threads(threads list, metadata ReviewMetadata) None
        +save_overviews(overviews list, metadata ReviewMetadata) None
    }

    class ReviewResolver {
        +owner str
        +repo str
        -manager ReviewManager
        +ReviewResolver(token str, owner str, repo str, db_path str)
        +fetch_threads(pr_number int) dict
    }

    class GitHelpers {
        +get_git_branch() str
        +get_git_head_sha() str
    }

    ReviewResolver --> ReviewManager : persists
    ReviewResolver ..> ReviewMetadata : creates
    ReviewResolver ..> GitHelpers : reads branch/head_sha

File-Level Changes

Change Details Files
Add WikipediaTopViewsSource and integrate it into the query engine with caching, availability, and priority handling.
  • Implement WikipediaTopViewsSource backed by the Wikipedia Pageviews API, with language validation, prefix exclusion rules, and a 6-hour cache with cache statistics
  • Export WikipediaTopViewsSource from the query_sources package and add config switches for enable/timeout/lang/TTL
  • Update the QuerySource base class to define a default get_priority, override get_priority in existing sources, and sort all sources by priority while skipping sources with is_available()=False
  • Use the shared _query_sources mapping to deduplicate queries across query sources and Bing suggestion results
src/search/query_sources/wikipedia_top_views_source.py
src/search/query_sources/__init__.py
src/search/query_sources/query_source.py
src/search/query_sources/local_file_source.py
src/search/query_sources/duckduckgo_source.py
src/search/query_sources/wikipedia_source.py
src/search/query_sources/bing_suggestions_source.py
src/search/query_engine.py
config.example.yaml
tests/unit/test_online_query_sources.py
Introduce DashboardClient for Microsoft Rewards dashboard APIs and refactor PointsDetector to prefer API-based points retrieval with safer parsing and status handling.
  • Create DashboardClient, calling the internal dashboard balance and data endpoints via page.evaluate, with caching and a fallback to parsing page content when needed
  • Extend API_ENDPOINTS with dashboard_balance and dashboard_data URLs, and slightly loosen the use of REWARDS_URLS in PointsDetector
  • Refactor PointsDetector.get_current_points to try DashboardClient first before HTML/selector parsing, with improved logging, null/blank handling, and minimum-value filtering
  • Tighten _parse_points to handle empty/blank/invalid or out-of-range values, and make _check_task_status use typed status fields and safer progress comparisons
src/api/dashboard_client.py
src/api/__init__.py
src/account/points_detector.py
src/constants/urls.py
tests/unit/test_dashboard_client.py
tests/unit/test_manage_reviews_cli.py
Enhance review metadata and CLI tooling to track git branch/head and validate local review context, plus auto-detect PR numbers via gh.
  • Extend ReviewMetadata with new branch and head_sha fields, bump the version from 2.2 to 2.3, and update tests accordingly
  • Add helper functions get_git_branch and get_git_head_sha that shell out to git, with timeouts and safe fallbacks for non-git environments
  • Update ReviewResolver.fetch_threads to populate ReviewMetadata with branch/head when saving threads and overviews
  • Add a verify-context subcommand to manage_reviews that compares the stored metadata's branch/PR with the current git branch and emits structured JSON with warnings and recovery suggestions
  • Change manage_reviews fetch to accept an optional --pr argument, falling back to gh pr view when it is missing, with robust error messages when the CLI is absent, times out, or authentication fails, and initialize logging in main
  • Add pure function-level tests for the git helpers' behavior in non-git and error scenarios, and for the verify-context/fetch logic
src/review/models.py
src/review/resolver.py
tools/manage_reviews.py
tests/unit/test_review_parsers.py
tests/unit/test_review_context.py
tests/unit/test_manage_reviews_cli.py
Adjust project dependencies and add testing support for HTTP and CLI-related behavior.
  • Constrain playwright-stealth to <2.0 to avoid future incompatible changes
  • Merge the former test extra into dev and add respx as a dev dependency for HTTP client mocking
  • Add unit tests covering WikipediaTopViewsSource behavior, caching, availability, and priority ordering, plus the CLI, DashboardClient, PointsDetector parsing logic, and non-git environments
pyproject.toml
tests/unit/test_online_query_sources.py
tests/unit/test_dashboard_client.py
tests/unit/test_manage_reviews_cli.py
tests/unit/test_review_context.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue the discussion: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an issue by replying to a review comment. You can also reply to a review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull request title to generate a title at any time. You can also comment @sourcery-ai title on the pull request to (re-)generate the title.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in the pull request body to generate a PR summary at that location at any time. You can also comment @sourcery-ai summary on the pull request to (re-)generate the summary.
  • Generate a reviewer's guide: Comment @sourcery-ai guide on the pull request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the pull request to mark all Sourcery comments as resolved. This is useful once you have addressed all the comments and no longer want to see them.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull request to dismiss all existing Sourcery reviews. This is especially useful if you want to start fresh with a new review; don't forget to comment @sourcery-ai review to trigger a new one!

Customizing Your Experience

Visit your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request summary and the reviewer's guide.
  • Change the review language.
  • Add, remove, or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Original review guide in English

Reviewer's Guide

Implements a new WikipediaTopViewsSource query source with caching and exclusion rules, wires it into the query engine with per‑source availability and explicit priorities, introduces a DashboardClient API wrapper used by PointsDetector to prefer API-based points retrieval with safer parsing, and enhances the review management CLI and metadata with branch/head awareness and robust PR/ Git context handling.

Sequence diagram for PointsDetector using DashboardClient with API fallback

sequenceDiagram
    actor User
    participant PointsDetector
    participant DashboardClient
    participant BrowserPage as Page
    participant RewardsAPI as MicrosoftRewardsAPI

    User->>PointsDetector: get_current_points(page, skip_navigation)
    PointsDetector->>BrowserPage: optional navigate to DASHBOARD_URL
    PointsDetector->>DashboardClient: create with page
    PointsDetector->>DashboardClient: get_current_points()

    DashboardClient->>DashboardClient: _fetch_points_via_api()
    DashboardClient->>BrowserPage: evaluate fetch(dashboard_balance)
    BrowserPage->>RewardsAPI: HTTP GET /api/getuserbalance
    RewardsAPI-->>BrowserPage: JSON availablePoints / pointsBalance
    BrowserPage-->>DashboardClient: JSON data
    DashboardClient->>DashboardClient: parse points
    alt API points ok
        DashboardClient-->>PointsDetector: points
        PointsDetector-->>User: points
    else API failed or invalid
        DashboardClient->>DashboardClient: _fetch_points_via_page_content()
        DashboardClient->>BrowserPage: content()
        BrowserPage-->>DashboardClient: HTML
        DashboardClient->>DashboardClient: regex search for points
        alt parsed from page
            DashboardClient-->>PointsDetector: fallback points
            PointsDetector-->>User: points
        else page parsing failed
            DashboardClient-->>PointsDetector: cached points or None
            PointsDetector-->>User: cached points or None
        end
    end

Class diagram for updated query sources and QueryEngine prioritisation

classDiagram
    class QuerySource {
        <<abstract>>
        -config
        -logger
        +fetch_queries(count int) list~str~
        +get_source_name() str
        +is_available() bool
        +get_priority() int
    }

    class LocalFileSource {
        -base_terms list~str~
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +fetch_queries(count int) list~str~
    }

    class DuckDuckGoSource {
        -_available bool
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +fetch_queries(count int) list~str~
    }

    class WikipediaSource {
        -_available bool
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +fetch_queries(count int) list~str~
    }

    class BingSuggestionsSource {
        -_available bool
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +fetch_queries(count int) list~str~
    }

    class WikipediaTopViewsSource {
        -timeout int
        -lang str
        -cache_ttl int
        -_available bool
        -_session aiohttp.ClientSession
        -_cache_data list~str~
        -_cache_time float
        -_cache_hits int
        -_cache_misses int
        +get_source_name() str
        +get_priority() int
        +is_available() bool
        +get_cache_stats() dict
        +fetch_queries(count int) list~str~
        +close() None
        -_get_session() aiohttp.ClientSession
        -_is_cache_valid() bool
        -_get_from_cache(count int) list~str~
        -_cache_articles(articles list~str~) None
        -_get_api_date() tuple~str,str,str~
        -_build_api_url() str
        -_fetch_top_articles(session aiohttp.ClientSession) list~dict~
        -_filter_articles(articles list~dict~) list~str~
    }

    class QueryEngine {
        -config
        -logger
        -sources list~QuerySource~
        -_query_sources dict~str,str~
        +_init_sources() None
        +generate_queries(count int, expand bool) list~str~
        -_fetch_from_sources(count int) list~str~
        -_expand_queries(queries list~str~) list~str~
    }

    QuerySource <|-- LocalFileSource
    QuerySource <|-- DuckDuckGoSource
    QuerySource <|-- WikipediaSource
    QuerySource <|-- BingSuggestionsSource
    QuerySource <|-- WikipediaTopViewsSource

    QueryEngine "1" --> "*" QuerySource : manages
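The priority mechanism in the diagram can be sketched as follows. The numeric values match the PR description (LocalFile=50, TopViews=80, default=100), but the class bodies are illustrative assumptions, not the PR's actual code:

```python
class QuerySource:
    """Minimal base class; lower priority values are tried first."""

    DEFAULT_PRIORITY = 100

    def get_priority(self) -> int:
        # Default implementation added to the base class in this PR
        return self.DEFAULT_PRIORITY

    def is_available(self) -> bool:
        return True


class LocalFileSource(QuerySource):
    def get_priority(self) -> int:
        return 50


class WikipediaTopViewsSource(QuerySource):
    def get_priority(self) -> int:
        return 80


def order_sources(sources: list[QuerySource]) -> list[QuerySource]:
    """Drop unavailable sources, then sort the rest by ascending priority."""
    return sorted(
        (s for s in sources if s.is_available()),
        key=lambda s: s.get_priority(),
    )
```

A source whose `is_available()` returns False is filtered out before sorting, which is how the engine skips sources that failed their availability check at init time.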

Class diagram for DashboardClient and PointsDetector integration

classDiagram
    class DashboardClient {
        -page Page
        -_cached_points int
        -_base_url str
        +DashboardClient(page Page)
        +get_current_points() int
        +get_dashboard_data() dict~str,Any~
        -_fetch_points_via_api() int
        -_fetch_points_via_page_content() int
    }

    class PointsDetector {
        -logger
        +DASHBOARD_URL str
        +POINTS_SELECTORS list~str~
        +PointsDetector()
        +get_current_points(page Page, skip_navigation bool) int
        -_extract_points_from_source(page Page) int
        -_parse_points(text str) int
        -_check_task_status(page Page, selectors list, task_name str) dict~str,bool,int~
    }

    PointsDetector ..> DashboardClient : uses for API points

Class diagram for enhanced review metadata and resolver

classDiagram
    class ReviewMetadata {
        +pr_number int
        +owner str
        +repo str
        +branch str
        +head_sha str
        +last_updated str
        +version str
        +etag_comments str
        +etag_reviews str
    }

    class ReviewManager {
        +db_path str
        +get_metadata() ReviewMetadata
        +save_threads(threads list, metadata ReviewMetadata) None
        +save_overviews(overviews list, metadata ReviewMetadata) None
    }

    class ReviewResolver {
        +owner str
        +repo str
        -manager ReviewManager
        +ReviewResolver(token str, owner str, repo str, db_path str)
        +fetch_threads(pr_number int) dict
    }

    class GitHelpers {
        +get_git_branch() str
        +get_git_head_sha() str
    }

    ReviewResolver --> ReviewManager : persists
    ReviewResolver ..> ReviewMetadata : creates
    ReviewResolver ..> GitHelpers : reads branch/head_sha

File-Level Changes

Change Details Files
Add WikipediaTopViewsSource and integrate it into the query engine with caching, availability, and priority handling.
  • Implement WikipediaTopViewsSource backed by Wikipedia Pageviews API with language validation, exclusion prefixes, and a 6‑hour cache plus cache stats
  • Expose WikipediaTopViewsSource from the query_sources package and add corresponding configuration knobs for enable/timeout/lang/TTL
  • Update QuerySource base class to define a default get_priority, implement get_priority overrides in existing sources, and sort all sources by priority while skipping those with is_available()=False
  • Ensure deduplication of queries across sources and Bing suggestions using the shared _query_sources map
src/search/query_sources/wikipedia_top_views_source.py
src/search/query_sources/__init__.py
src/search/query_sources/query_source.py
src/search/query_sources/local_file_source.py
src/search/query_sources/duckduckgo_source.py
src/search/query_sources/wikipedia_source.py
src/search/query_sources/bing_suggestions_source.py
src/search/query_engine.py
config.example.yaml
tests/unit/test_online_query_sources.py
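The cross-source deduplication bullet above can be sketched as a first-seen merge; `merge_queries` and its case-insensitive key are illustrative assumptions, not the actual `_query_sources` implementation:

```python
def merge_queries(fetched: dict[str, list[str]], count: int) -> list[str]:
    """Merge per-source query lists in priority order, keeping first-seen queries.

    `fetched` maps source name -> queries, already ordered by source priority,
    so higher-priority sources win ties for duplicate queries.
    """
    seen: dict[str, str] = {}  # normalized query -> source that first produced it
    merged: list[str] = []
    for source_name, queries in fetched.items():
        for q in queries:
            key = q.strip().lower()
            if key and key not in seen:
                seen[key] = source_name
                merged.append(q)
            if len(merged) >= count:
                return merged
    return merged
```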
Introduce DashboardClient for Microsoft Rewards dashboard APIs and refactor PointsDetector to prefer API-based points retrieval with safer parsing and status handling.
  • Create DashboardClient to call internal dashboard balance and data endpoints via page.evaluate, with caching and fallbacks to parsing page content
  • Extend API_ENDPOINTS with dashboard_balance and dashboard_data URLs and slightly relax REWARDS_URLS usage in PointsDetector
  • Refactor PointsDetector.get_current_points to try DashboardClient first, then HTML/selector parsing with improved logging, null/whitespace handling, and minimum-value filtering
  • Tighten _parse_points to handle empty/whitespace/invalid or out-of-range values and make _check_task_status use typed status fields and safer progress comparison
src/api/dashboard_client.py
src/api/__init__.py
src/account/points_detector.py
src/constants/urls.py
tests/unit/test_dashboard_client.py
tests/unit/test_manage_reviews_cli.py
Enhance review metadata and CLI tooling to track git branch/head and validate local review context, plus auto-detect PR numbers via gh.
  • Extend ReviewMetadata with branch and head_sha fields and bump version from 2.2 to 2.3, updating tests accordingly
  • Add helper functions get_git_branch and get_git_head_sha that shell out to git with timeouts and safe fallbacks for non‑git environments
  • Update ReviewResolver.fetch_threads to populate ReviewMetadata with branch/head when saving threads and overviews
  • Add manage_reviews verify-context subcommand to compare stored metadata branch/PR against the current git branch, emitting structured JSON with warnings and recovery guidance
  • Modify manage_reviews fetch to accept optional --pr, falling back to gh pr view with robust error messaging for missing CLI, timeouts, or auth failures, and initialise logging in main
  • Add tests for git helper behavior in non-git or error conditions and for verify-context/fetch logic at a pure-function level
src/review/models.py
src/review/resolver.py
tools/manage_reviews.py
tests/unit/test_review_parsers.py
tests/unit/test_review_context.py
tests/unit/test_manage_reviews_cli.py
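The git helper bullet above can be sketched like this; the exact timeout value is an assumption, and the empty-string fallback mirrors the "safe fallbacks for non-git environments" behavior described (callers must treat `""` as "unknown", which is the ambiguity a review comment below flags):

```python
import subprocess


def get_git_branch(timeout: float = 5.0) -> str:
    """Return the current branch name, or "" if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "branch", "--show-current"],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout.strip() if result.returncode == 0 else ""
    except (OSError, subprocess.TimeoutExpired):
        # git missing, not a repo, or the command hung
        return ""


def get_git_head_sha(timeout: float = 5.0) -> str:
    """Return the current HEAD commit SHA, or "" on failure."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout.strip() if result.returncode == 0 else ""
    except (OSError, subprocess.TimeoutExpired):
        return ""
```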
Adjust project dependencies and add testing support for HTTP and CLI-related behavior.
  • Constrain playwright-stealth to <2.0 to avoid future breaking changes
  • Fold the previous test extra into dev and add respx as a dev dependency for HTTP client mocking
  • Add new unit tests for WikipediaTopViewsSource behavior, cache, availability, and priority ordering, as well as CLI, DashboardClient, PointsDetector parsing, and non‑git environments
pyproject.toml
tests/unit/test_online_query_sources.py
tests/unit/test_dashboard_client.py
tests/unit/test_manage_reviews_cli.py
tests/unit/test_review_context.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 4 issues, and left some high level feedback:

  • In WikipediaTopViewsSource._fetch_top_articles and fetch_queries, _available is flipped to False on any non-200 response or exception, which will permanently disable this source after a transient error; consider distinguishing transient vs. fatal errors (e.g., retry/backoff, thresholding, or not mutating _available on first failure).
  • The verify-context and fetch PR auto-detection logic is partially reimplemented in tests instead of calling shared functions, and the CLI code contains fairly large inline subprocess/error-handling blocks; consider extracting these into small pure helpers that are called both from the CLI and tests to reduce duplication and keep behavior in sync.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `WikipediaTopViewsSource._fetch_top_articles` and `fetch_queries`, `_available` is flipped to `False` on any non-200 response or exception, which will permanently disable this source after a transient error; consider distinguishing transient vs. fatal errors (e.g., retry/backoff, thresholding, or not mutating `_available` on first failure).
- The `verify-context` and `fetch` PR auto-detection logic is partially reimplemented in tests instead of calling shared functions, and the CLI code contains fairly large inline subprocess/error-handling blocks; consider extracting these into small pure helpers that are called both from the CLI and tests to reduce duplication and keep behavior in sync.

## Individual Comments

### Comment 1
<location path="src/account/points_detector.py" line_range="93-102" />
<code_context>
                 logger.debug("跳过导航,使用当前页面")
                 await page.wait_for_timeout(1000)

+            # 优先使用 Dashboard API
+            try:
+                logger.debug("尝试使用 Dashboard API 获取积分...")
+                client = DashboardClient(page)
+                api_points: int | None = await client.get_current_points()
+                if api_points is not None and api_points >= 0:
+                    logger.info(f"✓ 从 API 获取积分: {api_points:,}")
+                    return int(api_points)
+            except TimeoutError as e:
+                logger.warning(f"API 请求超时: {e},使用 HTML 解析作为备用")
+            except ConnectionError as e:
+                logger.warning(f"API 连接异常: {e},使用 HTML 解析作为备用")
+            except Exception as e:
+                logger.warning(f"API 调用异常: {type(e).__name__}: {e},使用 HTML 解析作为备用")
+
</code_context>
<issue_to_address>
**suggestion:** DashboardClient already catches exceptions internally, so the fine-grained exception branches in PointsDetector are effectively unreachable

`DashboardClient`'s `_fetch_points_via_api` / `_fetch_points_via_page_content` already swallow everything with a blanket `except Exception` and log it, so the `TimeoutError` / `ConnectionError` branches here will almost never fire; control only ever reaches the final `except Exception`. Either raise more specific exceptions from `DashboardClient`, or use a single `except Exception` fallback here, to avoid the misleading appearance of per-error-type handling.

Suggested implementation:

```python
            # 优先使用 Dashboard API
            try:
                logger.debug("尝试使用 Dashboard API 获取积分...")
                client = DashboardClient(page)
                api_points: int | None = await client.get_current_points()
                if api_points is not None and api_points >= 0:
                    logger.info(f"✓ 从 API 获取积分: {api_points:,}")
                    return int(api_points)
            except Exception as e:
                # DashboardClient already logs detailed exception info; this is a catch-all before falling back to HTML parsing
                logger.warning(f"API 获取积分失败({type(e).__name__}: {e}),使用 HTML 解析作为备用")

```

If you would rather have `DashboardClient` raise finer-grained exceptions (e.g. custom `DashboardTimeoutError` / `DashboardConnectionError`), you would need to split the broad `except Exception` in `api/dashboard_client.py` into specific exceptions and re-raise them; `PointsDetector` could then branch per exception type again. Without changing `DashboardClient`, the merged catch-all above is sufficient to address this comment.
</issue_to_address>

### Comment 2
<location path="src/search/query_sources/wikipedia_top_views_source.py" line_range="116-118" />
<code_context>
+        self._cache_data = articles
+        self._cache_time = time.monotonic()
+
+    def _get_api_date(self) -> tuple[str, str, str]:
+        """Get yesterday's date for API call"""
+        yesterday = datetime.now() - timedelta(days=1)
+        return (str(yesterday.year), f"{yesterday.month:02d}", f"{yesterday.day:02d}")
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Computing "yesterday" from local time can disagree with the Wikipedia API's date window

The Pageviews API buckets data by UTC calendar day; `datetime.now()` in a local timezone (e.g. UTC+8) can straddle the day boundary, requesting a "yesterday" whose data has not been generated yet and getting a 404 or an empty result. Switch to `datetime.utcnow()`, or consistently derive the request date from an explicitly timezone-aware timestamp, so the behavior does not drift across timezones.
</issue_to_address>
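A minimal UTC-safe variant of `_get_api_date`, shown as an illustration of the suggested fix (an explicitly timezone-aware timestamp) rather than the repository's code:

```python
from datetime import datetime, timedelta, timezone


def get_api_date() -> tuple[str, str, str]:
    """Yesterday in UTC, matching the Pageviews API's calendar-day window."""
    yesterday = datetime.now(timezone.utc) - timedelta(days=1)
    return (str(yesterday.year), f"{yesterday.month:02d}", f"{yesterday.day:02d}")
```

Unlike `datetime.utcnow()`, `datetime.now(timezone.utc)` yields an aware object, so any later arithmetic or comparison cannot silently mix naive local times back in.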

### Comment 3
<location path="tools/manage_reviews.py" line_range="174-183" />
<code_context>
                 await page.wait_for_timeout(1000)

+            # 优先使用 Dashboard API
+            try:
+                logger.debug("尝试使用 Dashboard API 获取积分...")
+                client = DashboardClient(page)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Branch lookup failures silently degrade to an empty string with no diagnostic in the output

In `cmd_verify_context`, when `git branch --show-current` fails, `current_branch` is simply set to `""`, and downstream logic treats it as an ordinary mismatch or missing branch; there is no way to tell "genuinely no branch" apart from "lookup failed". At minimum, log in the `except` block, or add a `warning` field to the final `output` indicating that Git branch detection failed, to make environment problems diagnosable.
</issue_to_address>

### Comment 4
<location path="tests/unit/test_online_query_sources.py" line_range="411-420" />
<code_context>
+        assert dashboard_client.page is mock_page
+        assert dashboard_client._cached_points is None
+
+    @pytest.mark.asyncio
+    async def test_get_current_points_api_success(self, dashboard_client, mock_page):
+        """Test get_current_points returns points via API"""
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting that WikipediaTopViewsSource becomes unavailable after a hard HTTP error

The production implementation sets `self._available = False` when `_fetch_top_articles` raises. In `test_fetch_queries_handles_error`, you only assert that an empty list is returned. Please also assert that `wikipedia_top_views_source.is_available() is False` afterward to verify the availability flag and ensure `QueryEngine` will skip this source subsequently.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.


Copilot AI left a comment


Pull request overview

This PR adds a Wikipedia Top Views query source to the search query engine, extends the local review/comment management tooling with context tracking, and introduces a Dashboard API path for points retrieval to improve reliability.

Changes:

  • Add WikipediaTopViewsSource (Pageviews API + 6-hour cache) and sort query sources by priority in QueryEngine
  • Extend review metadata with branch/head_sha, and add a verify-context subcommand plus automatic PR-number detection for fetch in manage_reviews.py
  • Add DashboardClient; PointsDetector now prefers the Dashboard API and falls back to HTML parsing on failure; add matching unit tests and dependencies

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.

File Description
tools/manage_reviews.py fetch supports auto-detecting the PR; adds the verify-context subcommand
tests/unit/test_review_parsers.py Updates the ReviewMetadata version assertion to 2.3
tests/unit/test_review_context.py Covers git branch/SHA retrieval and the new metadata fields
tests/unit/test_online_query_sources.py Adds WikipediaTopViewsSource and priority-ordering tests
tests/unit/test_manage_reviews_cli.py Adds CLI/points-detection tests (currently mostly reimplemented-logic assertions)
tests/unit/test_dashboard_client.py Adds DashboardClient unit tests
src/search/query_sources/wikipedia_top_views_source.py New Wikipedia Top Views query source with caching
src/search/query_sources/wikipedia_source.py Adds get_priority()
src/search/query_sources/query_source.py Adds a default get_priority() to the base class
src/search/query_sources/local_file_source.py Adds get_priority()
src/search/query_sources/duckduckgo_source.py Adds get_priority()
src/search/query_sources/bing_suggestions_source.py Adds get_priority()
src/search/query_sources/__init__.py Exports WikipediaTopViewsSource
src/search/query_engine.py Skips unavailable sources at init, adds WikipediaTopViewsSource, sorts by priority
src/review/resolver.py Writes branch/head_sha into metadata on save
src/review/models.py ReviewMetadata gains branch/head_sha; version bumped to 2.3
src/api/dashboard_client.py New Dashboard API client (API-first, HTML fallback)
src/api/__init__.py Exports DashboardClient
src/account/points_detector.py Points retrieval is now API-first with HTML fallback; parsing tweaks
pyproject.toml Adds httpx/respx to the test extras
config.example.yaml Adds a wikipedia_top_views configuration example


@Disaster-Terminator Disaster-Terminator force-pushed the feature/query-sources branch 6 times, most recently from 474cb33 to b70faa4 on February 28, 2026 at 03:12
@Disaster-Terminator
Copy link
Owner Author

@sourcery-ai review

@Disaster-Terminator
Copy link
Owner Author

/agentic_review

@qodo-code-review
Copy link

qodo-code-review bot commented Feb 28, 2026

Persistent review updated to latest commit b70faa4


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • The get_priority() values for the various query sources (LocalFileSource, DuckDuckGoSource, WikipediaSource, BingSuggestionsSource, WikipediaTopViewsSource) don’t match the priorities described in the PR (LocalFile=50 → TopViews=80 → DuckDuckGo=90 → Wikipedia=100 → Bing=110) and the new tests’ expectations; consider aligning the numeric values so the sort order and documentation/tests are consistent.
  • In WikipediaTopViewsSource, get_priority() currently returns 120 while the tests and description expect 80, which will change its relative ordering vs other sources; update either the priority or the tests/docs so the intended ranking is unambiguous.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `get_priority()` values for the various query sources (LocalFileSource, DuckDuckGoSource, WikipediaSource, BingSuggestionsSource, WikipediaTopViewsSource) don’t match the priorities described in the PR (LocalFile=50 → TopViews=80 → DuckDuckGo=90 → Wikipedia=100 → Bing=110) and the new tests’ expectations; consider aligning the numeric values so the sort order and documentation/tests are consistent.
- In `WikipediaTopViewsSource`, `get_priority()` currently returns 120 while the tests and description expect 80, which will change its relative ordering vs other sources; update either the priority or the tests/docs so the intended ranking is unambiguous.

## Individual Comments

### Comment 1
<location path="src/api/dashboard_client.py" line_range="126" />
<code_context>
+                match = re.search(pattern, content)
+                if match:
+                    points = int(match.group(1))
+                    if 0 <= points <= 1000000:
+                        return points
+
</code_context>
<issue_to_address>
**issue:** The hard upper bound of 1,000,000 points may incorrectly filter valid high balances.

This constraint assumes balances never exceed one million, which may not hold for long‑lived or high‑volume accounts and would cause larger but valid values to be dropped. Consider relaxing the upper bound, removing it, or deriving it from configuration instead of hard‑coding this limit.
</issue_to_address>

### Comment 2
<location path="tests/unit/test_online_query_sources.py" line_range="404-411" />
<code_context>
+        assert "Main_Page" in wikipedia_top_views_source.EXCLUDED_PREFIXES
+        assert "Special:" in wikipedia_top_views_source.EXCLUDED_PREFIXES
+
+    def test_cache_stats_initial(self, wikipedia_top_views_source):
+        """Test initial cache stats"""
+        stats = wikipedia_top_views_source.get_cache_stats()
+        assert stats["hits"] == 0
+        assert stats["misses"] == 0
+        assert stats["hit_rate"] == 0
+
+    @pytest.mark.asyncio
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for WikipediaTopViewsSource cache expiration behavior

Current tests cover initial stats and cache hits, but not TTL-based invalidation. Please add a test that forces `_is_cache_valid()` to return `False` (e.g., by adjusting `_cache_time` or patching `time.monotonic`) and then asserts that a subsequent `fetch_queries` call incurs a cache miss and triggers a new HTTP request. This will directly exercise the TTL logic and guard against stale data being served indefinitely.

```suggestion
    def test_cache_stats_initial(self, wikipedia_top_views_source):
        """Test initial cache stats"""
        stats = wikipedia_top_views_source.get_cache_stats()
        assert stats["hits"] == 0
        assert stats["misses"] == 0
        assert stats["hit_rate"] == 0

    @pytest.mark.asyncio
    async def test_cache_expiration_triggers_new_fetch(self, wikipedia_top_views_source):
        """Test that expired cache forces a new HTTP request and records a cache miss."""
        mock_session = AsyncMock()
        mock_response = AsyncMock()
        mock_response.status = 200
        mock_response.json = AsyncMock(
            return_value={
                "items": [
                    {
                        "articles": [
                            {"article": "Article1", "views": 100},
                            {"article": "Article2", "views": 90},
                        ]
                    }
                ]
            }
        )
        mock_session.get = AsyncMock(return_value=mock_response)

        # Initial fetch should populate the cache and record a miss
        initial_stats = wikipedia_top_views_source.get_cache_stats()
        await wikipedia_top_views_source.fetch_queries(mock_session)
        stats_after_first_fetch = wikipedia_top_views_source.get_cache_stats()
        assert mock_session.get.call_count == 1
        assert stats_after_first_fetch["misses"] == initial_stats["misses"] + 1

        # Force cache expiration by moving the cache timestamp sufficiently into the past
        wikipedia_top_views_source._cache_time -= (
            wikipedia_top_views_source._cache_ttl + 1
        )

        # Second fetch should detect expired cache, perform a new HTTP request, and record another miss
        await wikipedia_top_views_source.fetch_queries(mock_session)
        stats_after_second_fetch = wikipedia_top_views_source.get_cache_stats()
        assert mock_session.get.call_count == 2
        assert stats_after_second_fetch["misses"] == stats_after_first_fetch["misses"] + 1

    @pytest.mark.asyncio
```
</issue_to_address>


Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.



@Disaster-Terminator Disaster-Terminator force-pushed the feature/query-sources branch 2 times, most recently from d084d12 to 822f321 Compare February 28, 2026 06:19
- Add WikipediaTopViewsSource with a 6-hour cache and article filtering
- Add a default get_priority() implementation (100) to the QuerySource base class
- Create DashboardClient for fetching points via the API, with fallback to HTML parsing
- Add a verify-context CLI command for PR context verification
- Add branch and head_sha fields to ReviewMetadata for context tracking
- Enhance the fetch command to auto-detect the PR via the gh CLI
- Add full gh CLI error handling (FileNotFoundError, Timeout, PermissionError)
- Add ImportError handling for the WikipediaTopViewsSource import
- Harden _parse_points against whitespace-only strings
- Replace hard-coded DashboardClient API endpoints with constant configuration
- Improve WikipediaTopViewsSource aiohttp session management with connector configuration
- Fix WikipediaTopViewsSource local-time issue by switching to UTC
- Improve verify-context Git branch detection with diagnostic output
- Simplify PointsDetector exception handling with a unified fallback
- Add error logging to cmd_fetch and cmd_verify_context
- Fix points-logging security issue by downgrading the log to debug level
- Fix security issue where cmd_fetch printed raw exceptions
- Fix security issue where the lang parameter was not validated
- Fix bug where TopViews stayed permanently unavailable (availability is now restored on success)
- Fix bug where a balance of 0 points failed to parse
- Fix gh invocation to pass -R for specifying the repository
- Fix double-slash URL issue

Review comment fixes:
- PR#12: fix multiple security issues (points logging, exception leakage, parameter validation)
- PR#12: fix multiple bugs (permanent unavailability, zero points, PR repository)

Tests: 483 unit tests pass