feat: Add WikipediaTopViewsSource and enhance query source priority system #12
Disaster-Terminator wants to merge 1 commit into main from
Conversation
Review Summary by Qodo
Add WikipediaTopViewsSource and enhance query source priority system
Walkthrough

Description:
- Add WikipediaTopViewsSource with 6-hour caching and article filtering
- Implement query source priority system with sorting by priority value
- Create DashboardClient for API-first points detection with HTML fallback
- Add verify-context CLI command for PR branch validation
- Enhance fetch command with automatic PR detection via gh CLI
- Track review context with branch and HEAD SHA in metadata

Diagram:
flowchart LR
A["WikipediaTopViewsSource<br/>Priority: 80"] --> B["QuerySource<br/>Priority System"]
C["DashboardClient<br/>API + Fallback"] --> D["PointsDetector<br/>Enhanced"]
E["ReviewMetadata<br/>branch + head_sha"] --> F["ReviewResolver<br/>Context Tracking"]
G["fetch Command<br/>Auto PR Detection"] --> H["gh CLI Integration"]
I["verify-context<br/>Command"] --> J["Branch Validation"]
B --> K["Source Sorting<br/>by Priority"]
File Changes
1. src/search/query_sources/wikipedia_top_views_source.py

Code Review by Qodo
1. Priority value inconsistent with tests
Reviewer's Guide
Implements a new WikipediaTopViewsSource query source with caching and exclusion rules, wires it into the query engine with per‑source availability and explicit priorities, introduces a DashboardClient API wrapper used by PointsDetector to prefer API-based points retrieval with safer parsing, and enhances the review management CLI and metadata with branch/head awareness and robust PR/Git context handling.

Sequence diagram for PointsDetector using DashboardClient with API fallback
sequenceDiagram
actor User
participant PointsDetector
participant DashboardClient
participant BrowserPage as Page
participant RewardsAPI as MicrosoftRewardsAPI
User->>PointsDetector: get_current_points(page, skip_navigation)
PointsDetector->>BrowserPage: optional navigate to DASHBOARD_URL
PointsDetector->>DashboardClient: create with page
PointsDetector->>DashboardClient: get_current_points()
DashboardClient->>DashboardClient: _fetch_points_via_api()
DashboardClient->>BrowserPage: evaluate fetch(dashboard_balance)
BrowserPage->>RewardsAPI: HTTP GET /api/getuserbalance
RewardsAPI-->>BrowserPage: JSON availablePoints / pointsBalance
BrowserPage-->>DashboardClient: JSON data
DashboardClient->>DashboardClient: parse points
alt API points ok
DashboardClient-->>PointsDetector: points
PointsDetector-->>User: points
else API failed or invalid
DashboardClient->>DashboardClient: _fetch_points_via_page_content()
DashboardClient->>BrowserPage: content()
BrowserPage-->>DashboardClient: HTML
DashboardClient->>DashboardClient: regex search for points
alt parsed from page
DashboardClient-->>PointsDetector: fallback points
PointsDetector-->>User: points
else page parsing failed
DashboardClient-->>PointsDetector: cached points or None
PointsDetector-->>User: cached points or None
end
end
Class diagram for updated query sources and QueryEngine prioritisation
classDiagram
class QuerySource {
<<abstract>>
-config
-logger
+fetch_queries(count int) list~str~
+get_source_name() str
+is_available() bool
+get_priority() int
}
class LocalFileSource {
-base_terms list~str~
+get_source_name() str
+get_priority() int
+is_available() bool
+fetch_queries(count int) list~str~
}
class DuckDuckGoSource {
-_available bool
+get_source_name() str
+get_priority() int
+is_available() bool
+fetch_queries(count int) list~str~
}
class WikipediaSource {
-_available bool
+get_source_name() str
+get_priority() int
+is_available() bool
+fetch_queries(count int) list~str~
}
class BingSuggestionsSource {
-_available bool
+get_source_name() str
+get_priority() int
+is_available() bool
+fetch_queries(count int) list~str~
}
class WikipediaTopViewsSource {
-timeout int
-lang str
-cache_ttl int
-_available bool
-_session aiohttp.ClientSession
-_cache_data list~str~
-_cache_time float
-_cache_hits int
-_cache_misses int
+get_source_name() str
+get_priority() int
+is_available() bool
+get_cache_stats() dict
+fetch_queries(count int) list~str~
+close() None
-_get_session() aiohttp.ClientSession
-_is_cache_valid() bool
-_get_from_cache(count int) list~str~
-_cache_articles(articles list~str~) None
-_get_api_date() tuple~str,str,str~
-_build_api_url() str
-_fetch_top_articles(session aiohttp.ClientSession) list~dict~
-_filter_articles(articles list~dict~) list~str~
}
class QueryEngine {
-config
-logger
-sources list~QuerySource~
-_query_sources dict~str,str~
+_init_sources() None
+generate_queries(count int, expand bool) list~str~
-_fetch_from_sources(count int) list~str~
-_expand_queries(queries list~str~) list~str~
}
QuerySource <|-- LocalFileSource
QuerySource <|-- DuckDuckGoSource
QuerySource <|-- WikipediaSource
QuerySource <|-- BingSuggestionsSource
QuerySource <|-- WikipediaTopViewsSource
QueryEngine "*" --> "many" QuerySource : manages
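The availability filtering and priority sorting these diagrams describe can be sketched in a few lines. Note the numeric priorities below are illustrative placeholders, not the PR's final constants — the reviews further down point out that the actual values were still inconsistent with the tests:

```python
# Minimal sketch of the priority mechanism described above.
# The concrete priority numbers are placeholders for illustration only.

class QuerySource:
    """Base class: a lower priority value means the source is tried earlier."""

    def get_priority(self) -> int:
        return 100  # default added to the base class in this PR

    def is_available(self) -> bool:
        return True


class LocalFileSource(QuerySource):
    def get_priority(self) -> int:
        return 50  # placeholder


class WikipediaTopViewsSource(QuerySource):
    def get_priority(self) -> int:
        return 80  # placeholder


def sort_sources(sources: list[QuerySource]) -> list[QuerySource]:
    """Skip unavailable sources, then sort ascending by priority value."""
    return sorted(
        (s for s in sources if s.is_available()),
        key=lambda s: s.get_priority(),
    )
```

With these placeholder values, `sort_sources` would place the local file source before the top-views source, and both before any source using the base-class default.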
Class diagram for DashboardClient and PointsDetector integration
classDiagram
class DashboardClient {
-page Page
-_cached_points int
-_base_url str
+DashboardClient(page Page)
+get_current_points() int
+get_dashboard_data() dict~str,Any~
-_fetch_points_via_api() int
-_fetch_points_via_page_content() int
}
class PointsDetector {
-logger
+DASHBOARD_URL str
+POINTS_SELECTORS list~str~
+PointsDetector()
+get_current_points(page Page, skip_navigation bool) int
-_extract_points_from_source(page Page) int
-_parse_points(text str) int
-_check_task_status(page Page, selectors list, task_name str) dict~str,bool,int~
}
PointsDetector ..> DashboardClient : uses for API points
Class diagram for enhanced review metadata and resolver
classDiagram
class ReviewMetadata {
+pr_number int
+owner str
+repo str
+branch str
+head_sha str
+last_updated str
+version str
+etag_comments str
+etag_reviews str
}
class ReviewManager {
+db_path str
+get_metadata() ReviewMetadata
+save_threads(threads list, metadata ReviewMetadata) None
+save_overviews(overviews list, metadata ReviewMetadata) None
}
class ReviewResolver {
+owner str
+repo str
-manager ReviewManager
+ReviewResolver(token str, owner str, repo str, db_path str)
+fetch_threads(pr_number int) dict
}
class GitHelpers {
+get_git_branch() str
+get_git_head_sha() str
}
ReviewResolver --> ReviewManager : persists
ReviewResolver ..> ReviewMetadata : creates
ReviewResolver ..> GitHelpers : reads branch/head_sha
File-Level Changes
Hey - I've found 4 issues, and left some high level feedback:
- In `WikipediaTopViewsSource._fetch_top_articles` and `fetch_queries`, `_available` is flipped to `False` on any non-200 response or exception, which will permanently disable this source after a transient error; consider distinguishing transient vs. fatal errors (e.g., retry/backoff, thresholding, or not mutating `_available` on first failure).
- The `verify-context` and `fetch` PR auto-detection logic is partially reimplemented in tests instead of calling shared functions, and the CLI code contains fairly large inline subprocess/error-handling blocks; consider extracting these into small pure helpers that are called both from the CLI and tests to reduce duplication and keep behavior in sync.
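One way to implement the transient/fatal distinction suggested in the first point is a consecutive-failure threshold. This is a sketch, not the PR's code — the class name, attribute names, and threshold value are all invented for illustration:

```python
# Hypothetical sketch: disable a source only after several consecutive
# failures, and restore availability on the next success.

MAX_CONSECUTIVE_FAILURES = 3  # illustrative threshold


class FlakySource:
    def __init__(self) -> None:
        self._available = True
        self._failures = 0

    def record_success(self) -> None:
        # A success resets the counter and re-enables the source.
        self._failures = 0
        self._available = True

    def record_failure(self) -> None:
        # Count the failure; disable only once the threshold is reached.
        self._failures += 1
        if self._failures >= MAX_CONSECUTIVE_FAILURES:
            self._available = False

    def is_available(self) -> bool:
        return self._available
```

A `fetch_queries` implementation would call `record_failure()` in its error paths and `record_success()` after a good response, instead of assigning `_available` directly.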
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `WikipediaTopViewsSource._fetch_top_articles` and `fetch_queries`, `_available` is flipped to `False` on any non-200 response or exception, which will permanently disable this source after a transient error; consider distinguishing transient vs. fatal errors (e.g., retry/backoff, thresholding, or not mutating `_available` on first failure).
- The `verify-context` and `fetch` PR auto-detection logic is partially reimplemented in tests instead of calling shared functions, and the CLI code contains fairly large inline subprocess/error-handling blocks; consider extracting these into small pure helpers that are called both from the CLI and tests to reduce duplication and keep behavior in sync.
## Individual Comments
### Comment 1
<location path="src/account/points_detector.py" line_range="93-102" />
<code_context>
logger.debug("跳过导航,使用当前页面")
await page.wait_for_timeout(1000)
+ # 优先使用 Dashboard API
+ try:
+ logger.debug("尝试使用 Dashboard API 获取积分...")
+ client = DashboardClient(page)
+ api_points: int | None = await client.get_current_points()
+ if api_points is not None and api_points >= 0:
+ logger.info(f"✓ 从 API 获取积分: {api_points:,}")
+ return int(api_points)
+ except TimeoutError as e:
+ logger.warning(f"API 请求超时: {e},使用 HTML 解析作为备用")
+ except ConnectionError as e:
+ logger.warning(f"API 连接异常: {e},使用 HTML 解析作为备用")
+ except Exception as e:
+ logger.warning(f"API 调用异常: {type(e).__name__}: {e},使用 HTML 解析作为备用")
+
</code_context>
<issue_to_address>
**suggestion:** DashboardClient already catches exceptions internally, so the fine-grained exception branches in PointsDetector are essentially unreachable
`DashboardClient`'s `_fetch_points_via_api` / `_fetch_points_via_page_content` already swallow everything with a blanket `except Exception` and log it, so the `TimeoutError` / `ConnectionError` branches here will almost never fire; control only ever reaches the final `except Exception`. Either have `DashboardClient` raise more specific exceptions, or collapse this into a single `except Exception` fallback, to avoid the misleading impression of per-error-type handling.
Suggested implementation:
```python
# 优先使用 Dashboard API
try:
logger.debug("尝试使用 Dashboard API 获取积分...")
client = DashboardClient(page)
api_points: int | None = await client.get_current_points()
if api_points is not None and api_points >= 0:
logger.info(f"✓ 从 API 获取积分: {api_points:,}")
return int(api_points)
except Exception as e:
    # DashboardClient already logs detailed error info internally; this is only a fallback to HTML parsing
logger.warning(f"API 获取积分失败({type(e).__name__}: {e}),使用 HTML 解析作为备用")
```
If you would rather have `DashboardClient` raise finer-grained exceptions (for example custom `DashboardTimeoutError` / `DashboardConnectionError` classes), then the broad `except Exception` blocks in `api/dashboard_client.py` need to be split into specific exceptions and re-raised, after which `PointsDetector` could branch on exception type again. Without changing `DashboardClient`, however, the merged fallback above is enough to address this comment.
</issue_to_address>
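If the first alternative is chosen, the finer-grained hierarchy mentioned above might look like the following sketch. These class names do not exist in the PR; they are hypothetical illustrations of the reviewer's suggestion:

```python
# Hypothetical exception hierarchy so callers can branch on failure type.
# None of these names are taken from the PR.

class DashboardError(Exception):
    """Base class for dashboard points-retrieval failures."""


class DashboardTimeoutError(DashboardError):
    """Raised when the balance API call times out."""


class DashboardConnectionError(DashboardError):
    """Raised when the balance API is unreachable."""


def classify(exc: Exception) -> DashboardError:
    """Wrap a low-level exception in the dashboard-specific hierarchy."""
    if isinstance(exc, TimeoutError):
        return DashboardTimeoutError(str(exc))
    if isinstance(exc, ConnectionError):
        return DashboardConnectionError(str(exc))
    return DashboardError(str(exc))
```

`DashboardClient` would then `raise classify(e)` instead of swallowing errors, and `PointsDetector` could catch each subclass separately.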
### Comment 2
<location path="src/search/query_sources/wikipedia_top_views_source.py" line_range="116-118" />
<code_context>
+ self._cache_data = articles
+ self._cache_time = time.monotonic()
+
+ def _get_api_date(self) -> tuple[str, str, str]:
+ """Get yesterday's date for API call"""
+ yesterday = datetime.now() - timedelta(days=1)
+ return (str(yesterday.year), f"{yesterday.month:02d}", f"{yesterday.day:02d}")
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Computing "yesterday" from local time can disagree with the Wikipedia API's date window
The Pageviews API buckets by UTC calendar day; using `datetime.now()` in a local timezone such as UTC+8 can cross the day boundary and request a "yesterday" whose data has not been generated yet, yielding 404s or empty results. Prefer `datetime.utcnow()`, or better an explicitly timezone-aware UTC time, when deriving the request date so behavior does not drift between timezones.
</issue_to_address>
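A timezone-safe version of `_get_api_date` along these lines might look like the following sketch; `datetime.now(timezone.utc)` is used rather than `datetime.utcnow()`, which is deprecated in recent Python versions:

```python
from datetime import datetime, timedelta, timezone


def get_api_date_utc() -> tuple[str, str, str]:
    """Return yesterday's (year, month, day) in UTC, matching the
    Pageviews API's UTC calendar-day buckets regardless of local timezone."""
    yesterday = datetime.now(timezone.utc) - timedelta(days=1)
    return (str(yesterday.year), f"{yesterday.month:02d}", f"{yesterday.day:02d}")
```

This keeps the same `(str, str, str)` return shape as the original method, so the URL-building code would not need to change.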
### Comment 3
<location path="tools/manage_reviews.py" line_range="174-183" />
<code_context>
await page.wait_for_timeout(1000)
+ # 优先使用 Dashboard API
+ try:
+ logger.debug("尝试使用 Dashboard API 获取积分...")
+ client = DashboardClient(page)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Silently falling back to an empty string when branch detection fails leaves no diagnostics in the output
In `cmd_verify_context`, when `git branch --show-current` fails, `current_branch` is set to `""` and the subsequent logic treats it as an ordinary mismatch or no-branch case, making "actually no branch" indistinguishable from "detection failed". At minimum, log in the `except` block, or add a `warning` field to the final `output` flagging that Git branch detection failed, so environment problems can be diagnosed.
</issue_to_address>
### Comment 4
<location path="tests/unit/test_online_query_sources.py" line_range="411-420" />
<code_context>
+ assert dashboard_client.page is mock_page
+ assert dashboard_client._cached_points is None
+
+ @pytest.mark.asyncio
+ async def test_get_current_points_api_success(self, dashboard_client, mock_page):
+ """Test get_current_points returns points via API"""
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting that WikipediaTopViewsSource becomes unavailable after a hard HTTP error
The production implementation sets `self._available = False` when `_fetch_top_articles` raises. In `test_fetch_queries_handles_error`, you only assert that an empty list is returned. Please also assert that `wikipedia_top_views_source.is_available() is False` afterward to verify the availability flag and ensure `QueryEngine` will skip this source subsequently.
</issue_to_address>
Pull request overview
This PR adds a Wikipedia Top Views query source to the search query engine, extends the local review/comment management tooling with context tracking, and introduces a Dashboard API path for points retrieval to improve stability.
Changes:
- Add WikipediaTopViewsSource (Pageviews API + 6-hour cache) and sort query sources by priority in QueryEngine
- Add branch/head_sha to the review metadata, and add a verify-context subcommand plus automatic PR-number detection for fetch in manage_reviews.py
- Add DashboardClient; PointsDetector now prefers the Dashboard API and falls back to HTML parsing on failure; corresponding unit tests and dependencies included
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tools/manage_reviews.py | fetch supports auto-detecting the PR; adds a verify-context subcommand |
| tests/unit/test_review_parsers.py | Update ReviewMetadata version assertion to 2.3 |
| tests/unit/test_review_context.py | Cover git branch/SHA retrieval and the new metadata fields |
| tests/unit/test_online_query_sources.py | Add tests for WikipediaTopViewsSource and priority sorting |
| tests/unit/test_manage_reviews_cli.py | Add CLI/points-detection tests (currently mostly "reimplemented logic" assertions) |
| tests/unit/test_dashboard_client.py | New DashboardClient unit tests |
| src/search/query_sources/wikipedia_top_views_source.py | New Wikipedia Top Views query source implementation with caching |
| src/search/query_sources/wikipedia_source.py | Add get_priority() |
| src/search/query_sources/query_source.py | Add default get_priority() to the base class |
| src/search/query_sources/local_file_source.py | Add get_priority() |
| src/search/query_sources/duckduckgo_source.py | Add get_priority() |
| src/search/query_sources/bing_suggestions_source.py | Add get_priority() |
| src/search/query_sources/__init__.py | Export WikipediaTopViewsSource |
| src/search/query_engine.py | Skip unavailable sources at init, add WikipediaTopViewsSource, sort by priority |
| src/review/resolver.py | Write branch/head_sha when saving metadata |
| src/review/models.py | Add branch/head_sha to ReviewMetadata; bump version to 2.3 |
| src/api/dashboard_client.py | New Dashboard API client (API-first, HTML fallback) |
| src/api/__init__.py | Export DashboardClient |
| src/account/points_detector.py | Points retrieval is now API-first with HTML fallback; parsing tweaks |
| pyproject.toml | Add httpx/respx to the test extras |
| config.example.yaml | Add wikipedia_top_views configuration example |
Force-pushed from 474cb33 to b70faa4
@sourcery-ai review

/agentic_review

Persistent review updated to latest commit b70faa4
Hey - I've found 2 issues, and left some high level feedback:
- The `get_priority()` values for the various query sources (LocalFileSource, DuckDuckGoSource, WikipediaSource, BingSuggestionsSource, WikipediaTopViewsSource) don't match the priorities described in the PR (LocalFile=50 → TopViews=80 → DuckDuckGo=90 → Wikipedia=100 → Bing=110) and the new tests' expectations; consider aligning the numeric values so the sort order and documentation/tests are consistent.
- In `WikipediaTopViewsSource`, `get_priority()` currently returns 120 while the tests and description expect 80, which will change its relative ordering vs other sources; update either the priority or the tests/docs so the intended ranking is unambiguous.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `get_priority()` values for the various query sources (LocalFileSource, DuckDuckGoSource, WikipediaSource, BingSuggestionsSource, WikipediaTopViewsSource) don’t match the priorities described in the PR (LocalFile=50 → TopViews=80 → DuckDuckGo=90 → Wikipedia=100 → Bing=110) and the new tests’ expectations; consider aligning the numeric values so the sort order and documentation/tests are consistent.
- In `WikipediaTopViewsSource`, `get_priority()` currently returns 120 while the tests and description expect 80, which will change its relative ordering vs other sources; update either the priority or the tests/docs so the intended ranking is unambiguous.
## Individual Comments
### Comment 1
<location path="src/api/dashboard_client.py" line_range="126" />
<code_context>
+ match = re.search(pattern, content)
+ if match:
+ points = int(match.group(1))
+ if 0 <= points <= 1000000:
+ return points
+
</code_context>
<issue_to_address>
**issue:** The hard upper bound of 1,000,000 points may incorrectly filter valid high balances.
This constraint assumes balances never exceed one million, which may not hold for long‑lived or high‑volume accounts and would cause larger but valid values to be dropped. Consider relaxing the upper bound, removing it, or deriving it from configuration instead of hard‑coding this limit.
</issue_to_address>
### Comment 2
<location path="tests/unit/test_online_query_sources.py" line_range="404-411" />
<code_context>
+ assert "Main_Page" in wikipedia_top_views_source.EXCLUDED_PREFIXES
+ assert "Special:" in wikipedia_top_views_source.EXCLUDED_PREFIXES
+
+ def test_cache_stats_initial(self, wikipedia_top_views_source):
+ """Test initial cache stats"""
+ stats = wikipedia_top_views_source.get_cache_stats()
+ assert stats["hits"] == 0
+ assert stats["misses"] == 0
+ assert stats["hit_rate"] == 0
+
+ @pytest.mark.asyncio
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for WikipediaTopViewsSource cache expiration behavior
Current tests cover initial stats and cache hits, but not TTL-based invalidation. Please add a test that forces `_is_cache_valid()` to return `False` (e.g., by adjusting `_cache_time` or patching `time.monotonic`) and then asserts that a subsequent `fetch_queries` call incurs a cache miss and triggers a new HTTP request. This will directly exercise the TTL logic and guard against stale data being served indefinitely.
```suggestion
def test_cache_stats_initial(self, wikipedia_top_views_source):
"""Test initial cache stats"""
stats = wikipedia_top_views_source.get_cache_stats()
assert stats["hits"] == 0
assert stats["misses"] == 0
assert stats["hit_rate"] == 0
@pytest.mark.asyncio
async def test_cache_expiration_triggers_new_fetch(self, wikipedia_top_views_source):
"""Test that expired cache forces a new HTTP request and records a cache miss."""
mock_session = AsyncMock()
mock_response = AsyncMock()
mock_response.status = 200
mock_response.json = AsyncMock(
return_value={
"items": [
{
"articles": [
{"article": "Article1", "views": 100},
{"article": "Article2", "views": 90},
]
}
]
}
)
mock_session.get = AsyncMock(return_value=mock_response)
# Initial fetch should populate the cache and record a miss
initial_stats = wikipedia_top_views_source.get_cache_stats()
await wikipedia_top_views_source.fetch_queries(mock_session)
stats_after_first_fetch = wikipedia_top_views_source.get_cache_stats()
assert mock_session.get.call_count == 1
assert stats_after_first_fetch["misses"] == initial_stats["misses"] + 1
# Force cache expiration by moving the cache timestamp sufficiently into the past
wikipedia_top_views_source._cache_time -= (
wikipedia_top_views_source._cache_ttl + 1
)
# Second fetch should detect expired cache, perform a new HTTP request, and record another miss
await wikipedia_top_views_source.fetch_queries(mock_session)
stats_after_second_fetch = wikipedia_top_views_source.get_cache_stats()
assert mock_session.get.call_count == 2
assert stats_after_second_fetch["misses"] == stats_after_first_fetch["misses"] + 1
@pytest.mark.asyncio
```
</issue_to_address>
b70faa4 to 01e5a6a
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.
d084d12 to 822f321
- Add WikipediaTopViewsSource with 6-hour caching and article filtering
- Add a get_priority() default implementation (100) to the QuerySource base class
- Create DashboardClient for fetching points via the API, with fallback to HTML parsing
- Add a verify-context CLI command for PR context verification
- Add branch and head_sha fields to ReviewMetadata for context tracking
- Enhance the fetch command to auto-detect the PR via the gh CLI
- Add complete gh CLI error handling (FileNotFoundError, Timeout, PermissionError)
- Add ImportError handling for the WikipediaTopViewsSource import
- Harden _parse_points against whitespace-only strings
- Replace the hard-coded DashboardClient API endpoint with constant-based configuration
- Improve WikipediaTopViewsSource aiohttp session management and add connector configuration
- Fix WikipediaTopViewsSource local-time issue by switching to UTC
- Improve verify-context Git branch detection with diagnostic information
- Simplify PointsDetector exception handling with a unified fallback
- Add error logging to cmd_fetch and cmd_verify_context
- Fix points logging exposure by switching to debug level
- Fix cmd_fetch leaking raw exceptions
- Fix unvalidated lang parameter
- Fix bug where Topviews became permanently unavailable (restore availability on success)
- Fix zero-points parsing bug
- Fix gh requiring the -R flag to specify the repository
- Fix double-slash URL issue

Addressed review comments:
- PR#12: fix several security issues (points logging, exception leakage, parameter validation)
- PR#12: fix several bugs (permanent disablement, zero points, PR repository)

Tests: 483 unit tests pass
Overview

This PR adds the WikipediaTopViewsSource query source and enhances the query source priority system and review management features.

Main changes

WikipediaTopViewsSource query source
Query source priority system
- get_priority() default implementation
DashboardClient API client
verify-context command
ReviewMetadata enhancements
- branch and head_sha fields for tracking where review comments came from
fetch command enhancements
Tests
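The priority mechanism above can be sketched as a default `get_priority()` on the base class plus a sort in the query engine. The sketch assumes that lower values run first (matching the diagram's WikipediaTopViewsSource at 80 versus a default of 100); the actual convention, the `is_available()` hook, and `order_sources` are illustrative, not the project's real API:

```python
class QuerySource:
    """Base query source; subclasses override get_priority() as needed."""

    def get_priority(self) -> int:
        # Default priority for sources that do not declare one.
        return 100

    def is_available(self) -> bool:
        # Sources can temporarily disable themselves (e.g. after HTTP errors).
        return True


class WikipediaTopViewsSource(QuerySource):
    def get_priority(self) -> int:
        return 80  # runs before default-priority sources under this convention


def order_sources(sources: list[QuerySource]) -> list[QuerySource]:
    """Drop unavailable sources, then sort the rest by priority."""
    return sorted(
        (s for s in sources if s.is_available()),
        key=lambda s: s.get_priority(),
    )
```

With this shape, adding a new source only requires overriding `get_priority()` when the default of 100 is not wanted; the engine's sort handles the rest.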
Summary by Sourcery
Add a new Wikipedia top-views query source, improve query source priority and availability handling, enhance rewards points detection via a dashboard API client, and extend the review-management CLI with PR auto-detection and context verification.
New Features:
- WikipediaTopViewsSource: generates queries from Wikipedia Pageviews top articles, with caching and filtering.
- DashboardClient API client: fetches Microsoft Rewards points and dashboard data from internal endpoints.
- verify-context CLI subcommand: verifies that local review comments match the current Git branch.
Enhancements:
- QuerySource: add a default priority mechanism, assign explicit priorities to all query sources, and sort sources by priority in the query engine.
- PointsDetector: prefer the dashboard API for fetching points, make the fallback to HTML parsing more robust, and harden parsing and logging behavior.
Build:
- Constrain the playwright-stealth dependency to <2.0 and adjust dev dependencies.
Tests:
- Add unit tests for WikipediaTopViewsSource behavior, caching, and availability, plus tests for source priority ordering and skipping unavailable sources.
- Add tests for verify-context logic, including gh CLI and error-handling scenarios.
- Add tests for DashboardClient API behavior and fallback logic, PointsDetector integration with the API client, parsing edge cases, and ReviewMetadata branch/HEAD tracking.