
Post a breakdown to a PR comment when data_validator detects inconsistencies #1406

Merged
TinyKitten merged 5 commits into dev from
claude/add-pr-validation-comments-OWqjc
Mar 5, 2026
Conversation

@TinyKitten (Member) commented Mar 5, 2026

Summary

  • When data_validator detects inconsistencies, it now collects all inconsistent records and posts them to the PR as a Markdown comment report
  • Adopts the same pattern as visualize_stopping_patterns.yml (peter-evans/find-comment + create-or-update-comment)
  • When validation succeeds, any previous inconsistency comment is deleted automatically

Test plan

  • Add data with an invalid station_cd to 5!station_station_types.csv, open a PR, and confirm that the breakdown of inconsistencies is posted as a comment
  • Fix the invalid data, push again, and confirm that the comment is deleted
  • Confirm that cargo check --bin data_validator and cargo clippy --bin data_validator pass

https://claude.ai/code/session_01Lr3k5y8UYcH26a8hndo8ek

Summary by CodeRabbit

Release notes

  • New Features

    • Added automatic posting, updating, and deletion of data-validation results on pull requests. On validation failure a detailed report is posted; on success the existing report is deleted.
    • Introduced a step that explicitly marks the job as failed when validation fails.
  • Chores

    • Strengthened per-record validation and improved report generation to produce a collapsible Markdown report containing details of invalid records.

- Improve data_validator to collect all inconsistent records instead of only the first one
- Generate a Markdown report file when inconsistencies are detected
- Add PR-comment posting to the verify_data_Integrity.yml workflow
  - On failure: post the breakdown of inconsistencies as a PR comment
  - On success: delete any previous inconsistency comment
- Adopt the same pattern as visualize_stopping_patterns (peter-evans/find-comment, create-or-update-comment)

https://claude.ai/code/session_01Lr3k5y8UYcH26a8hndo8ek
@github-actions github-actions bot added the ci/cd label Mar 5, 2026
coderabbitai bot (Contributor) commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ad623b68-86da-456e-8710-65248224ba83

📥 Commits

Reviewing files that changed from the base of the PR and between 1f35759 and 5a99b32.

📒 Files selected for processing (1)
  • data_validator/src/main.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • data_validator/src/main.rs

📝 Walkthrough

The GitHub Actions workflow replaces the single cargo run invocation with a multi-step validation flow whose result is exposed via step outputs. On PRs, the workflow looks up any existing comment, posts or updates it on failure, and deletes it on success. The data validator now accumulates invalid rows and writes a Markdown report when validation fails.

Changes

Cohort / File(s) | Summary
GitHub Actions workflow
.github/workflows/verify_data_Integrity.yml
Replaces the single cargo run invocation with a multi-step validation flow and sets the validation result in GitHub Outputs. For PRs, adds lookup of an existing comment, posting/updating on failure, and deletion on success. A dedicated step fails the job when validation fails.
Data validator logic
data_validator/src/main.rs
Switches from a single flag to multiple accumulators (invalid_station_ids, invalid_type_ids), validating each row individually and accumulating invalid rows. On failure, generates a Markdown report via build_markdown_report(), writes it to a path configurable through an environment variable, and then terminates with a panic. Adds the new function fn build_markdown_report(&[String], &[String]) -> String.

Sequence Diagram(s)

sequenceDiagram
    participant Workflow as "GitHub Workflow"
    participant Validator as "Data Validator"
    participant API as "GitHub API"
    participant FS as "Filesystem"

    Workflow->>Validator: Run data validator
    Validator->>Validator: Validate records\n(accumulate invalid_station_ids, invalid_type_ids)

    alt Validation Fails
        Validator->>FS: Write markdown report (env path)
        Validator-->>Workflow: set output = failure
        Workflow->>API: Find existing comment (PR only)
        API-->>Workflow: Return comment ID / none
        Workflow->>API: Post or update failure comment
        Workflow->>Workflow: Fail job (failure step)
    else Validation Succeeds
        Validator-->>Workflow: set output = success
        Workflow->>API: Find existing comment (PR only)
        API-->>Workflow: Return comment ID / none
        Workflow->>API: Delete comment (if exists)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰✨
Nibbling through rows with a hop and a glance,
Lining up each invalid ID in a row,
Wrapped up in Markdown and sent on its way,
A comment to tell you, then deleted — and off I go.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The pull request title accurately summarizes the change (posting a breakdown to a PR comment when data_validator detects inconsistencies) and fully matches the changeset's main purpose.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
data_validator/src/main.rs (1)

28-44: Performance improvement: consider using a HashSet.

Vec::contains is O(n), so with many records the overall processing becomes O(n*m). Changing station_ids and type_ids to HashSet makes each lookup O(1) and is more efficient.

Merging the two loops into one also reduces the iteration over records to a single pass.

♻️ Suggested improvement
+use std::collections::HashSet;
 use core::panic;
 use std::path::Path;

 use csv::{ReaderBuilder, StringRecord};

 fn main() -> Result<(), Box<dyn std::error::Error>> {
     let mut invalid_station_ids: Vec<String> = Vec::new();
     let mut invalid_type_ids: Vec<String> = Vec::new();

     let data_path: &Path = Path::new("data");
     let mut rdr = ReaderBuilder::new().from_path(data_path.join("3!stations.csv"))?;
     let records: Vec<StringRecord> = rdr.records().filter_map(|row| row.ok()).collect();
-    let station_ids: Vec<u32> = records
+    let station_ids: HashSet<u32> = records
         .iter()
         .map(|row| row.get(0).unwrap().parse::<u32>().unwrap())
         .collect();

     let mut rdr = ReaderBuilder::new().from_path(data_path.join("4!types.csv"))?;
     let records: Vec<StringRecord> = rdr.records().filter_map(|row| row.ok()).collect();
-    let type_ids: Vec<u32> = records
+    let type_ids: HashSet<u32> = records
         .iter()
         .map(|row| row.get(1).unwrap().parse::<u32>().unwrap())
         .collect();

     let mut rdr = ReaderBuilder::new().from_path(data_path.join("5!station_station_types.csv"))?;
     let records: Vec<StringRecord> = rdr.records().filter_map(|row| row.ok()).collect();

-    for record in &records {
-        let station_cd = record.get(1).unwrap();
-        if !station_ids.contains(&station_cd.parse::<u32>().unwrap()) {
-            let line = record.iter().collect::<Vec<&str>>().join(",");
-            println!("[INVALID] Unrecognized Station ID {:?} Found!", station_cd);
-            invalid_station_ids.push(line);
-        }
-    }
-
-    for record in &records {
-        let type_cd = record.get(2).unwrap();
-        if !type_ids.contains(&type_cd.parse::<u32>().unwrap()) {
-            let line = record.iter().collect::<Vec<&str>>().join(",");
-            println!("[INVALID] Unrecognized Type ID {:?} Found!", type_cd);
-            invalid_type_ids.push(line);
+    for record in &records {
+        let station_cd = record.get(1).unwrap();
+        let type_cd = record.get(2).unwrap();
+        let line = record.iter().collect::<Vec<&str>>().join(",");
+
+        if !station_ids.contains(&station_cd.parse::<u32>().unwrap()) {
+            println!("[INVALID] Unrecognized Station ID {:?} Found!", station_cd);
+            invalid_station_ids.push(line.clone());
+        }
+        if !type_ids.contains(&type_cd.parse::<u32>().unwrap()) {
+            println!("[INVALID] Unrecognized Type ID {:?} Found!", type_cd);
+            invalid_type_ids.push(line);
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data_validator/src/main.rs` around lines 28 - 44, Convert station_ids and
type_ids to HashSet<u32> and change the membership checks from Vec::contains to
HashSet::contains for O(1) lookups, then merge the two separate loops over
records into a single loop that checks both station and type per record and
pushes into invalid_station_ids and invalid_type_ids as needed; reference the
existing symbols records, station_ids, type_ids, invalid_station_ids,
invalid_type_ids and ensure you parse station_cd/type_cd to u32 once per record
(handling parse errors as appropriate) to avoid duplicate parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@data_validator/src/main.rs`:
- Around line 28-44: Convert station_ids and type_ids to HashSet<u32> and change
the membership checks from Vec::contains to HashSet::contains for O(1) lookups,
then merge the two separate loops over records into a single loop that checks
both station and type per record and pushes into invalid_station_ids and
invalid_type_ids as needed; reference the existing symbols records, station_ids,
type_ids, invalid_station_ids, invalid_type_ids and ensure you parse
station_cd/type_cd to u32 once per record (handling parse errors as appropriate)
to avoid duplicate parsing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aa3ad3ff-4703-4a80-8c24-f48718b4d8a2

📥 Commits

Reviewing files that changed from the base of the PR and between 11802dd and 36a763a.

📒 Files selected for processing (2)
  • .github/workflows/verify_data_Integrity.yml
  • data_validator/src/main.rs

- Change station_ids/type_ids from Vec<u32> to HashSet<u32> for O(1) lookups
- Merge the two separate loops into one, parsing station_cd/type_cd only once per record

https://claude.ai/code/session_01Lr3k5y8UYcH26a8hndo8ek
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data_validator/src/main.rs`:
- Around line 74-76: The Markdown table is broken because raw CSV values (e.g.,
in the loop over invalid_station_ids that calls md.push_str(&format!("| `{}`
|\n", line))) are not escaped; create a small helper (e.g.,
escape_markdown_cell) and use it wherever you format CSV values into the
Markdown cells (the loop over invalid_station_ids and the similar loop at lines
88-90) to replace backticks and pipe characters (at minimum replace '`' →
"&#96;" and '|' → "&#124;") before inserting into md.push_str so table cells
never break the Markdown layout.
- Around line 29-44: The loop over records currently accumulates invalid entries
in invalid_station_ids/invalid_type_ids and defers failure; instead modify the
validation within the for record in &records loop (where station_cd and type_cd
are parsed and line() is constructed) to panic immediately upon the first
detection of an unrecognized station or type (i.e., replace pushing to
invalid_station_ids/invalid_type_ids with a panic! that includes a clear message
and the offending line() or IDs). Remove the post-loop has_err check/collection
logic so the program fails fast as soon as station_cd or type_cd validation
fails.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 29337ef6-6396-40fc-b222-564ce70ba688

📥 Commits

Reviewing files that changed from the base of the PR and between 36a763a and 6c31f02.

📒 Files selected for processing (1)
  • data_validator/src/main.rs

Raw CSV values containing backticks or pipe characters would break
the Markdown table layout in the validation report. Add an
escape_markdown_cell helper that replaces ` → &#96; and | → &#124;.

https://claude.ai/code/session_01Lr3k5y8UYcH26a8hndo8ek
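The commit above describes an escape_markdown_cell helper without showing its body. A straightforward sketch matching the replacements it names (` → &#96;, | → &#124;) would be:

```rust
// Sketch of the escape_markdown_cell helper described in the commit above;
// the real function in data_validator/src/main.rs may handle more characters.
fn escape_markdown_cell(raw: &str) -> String {
    // A raw pipe would terminate the table cell early, and a raw backtick
    // would break the inline-code span wrapping each cell value.
    raw.replace('`', "&#96;").replace('|', "&#124;")
}

fn main() {
    println!("{}", escape_markdown_cell("a|b`c")); // → a&#124;b&#96;c
}
```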
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data_validator/src/main.rs`:
- Around line 13-14: Fix the sites that silently discard CSV parse errors:
replace rdr.records().filter_map(|row| row.ok()).collect() with
rdr.records().collect::<Result<Vec<_>, _>>()? (or collect::<Result<Vec<_>,
csv::Error>>()?) so parse errors propagate, and apply this to all three
affected sites (the construction of records: Vec<StringRecord>, the
construction of station_ids, and the remaining filter_map site), returning
the error with ?.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f918c39d-f38a-42e6-b559-dc4f28f76281

📥 Commits

Reviewing files that changed from the base of the PR and between 6c31f02 and 587f58b.

📒 Files selected for processing (1)
  • data_validator/src/main.rs

…ng them

Replace filter_map(|row| row.ok()) with collect::<Result<Vec<_>, _>>()?
at all three CSV-reading sites so malformed rows surface as errors
rather than being silently skipped.

https://claude.ai/code/session_01Lr3k5y8UYcH26a8hndo8ek
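The fix above uses the standard collect-into-Result idiom. Since csv::Reader comes from the external csv crate, here is a std-only analogue of the same pattern (parsing strings in place of CSV rows) showing how the first error propagates instead of the bad row being silently skipped:

```rust
// Std-only analogue of replacing filter_map(|row| row.ok()) with
// collect::<Result<Vec<_>, _>>(): the first Err aborts the collection
// and propagates, instead of the malformed row being dropped.
fn parse_all(rows: &[&str]) -> Result<Vec<u32>, std::num::ParseIntError> {
    rows.iter().map(|s| s.parse::<u32>()).collect()
}

fn main() {
    // All rows valid: Ok with every value.
    assert_eq!(parse_all(&["1", "2"]), Ok(vec![1, 2]));
    // One malformed row: the whole result is Err, not a shorter Vec.
    assert!(parse_all(&["1", "oops"]).is_err());
    println!("ok");
}
```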
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data_validator/src/main.rs`:
- Around line 29-41: The loop over records currently uses unwrap() in
record.get(...).unwrap().parse().unwrap() (used to produce station_cd and
type_cd), which can panic on missing columns or parse errors; change the logic
in the for record in &records loop to validate record.get(1) and record.get(2)
and their parse results via pattern matching (e.g., match or if let) instead of
unwrap, and when a get or parse fails push the line() into invalid_station_ids
or invalid_type_ids (as appropriate) and continue processing; keep the existing
line closure and println! messages but replace the unwrap-based assignments for
station_cd and type_cd with safe parsing that records invalid lines rather than
panicking.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 061c20c5-e170-4ce7-b4d8-73f825f1d6a1

📥 Commits

Reviewing files that changed from the base of the PR and between 587f58b and 1f35759.

📒 Files selected for processing (1)
  • data_validator/src/main.rs

Use match with record.get().and_then(parse) instead of chained
unwrap() calls for station_cd and type_cd. Missing or unparseable
values now log the row and push it into the appropriate invalid list
instead of panicking.

https://claude.ai/code/session_01Lr3k5y8UYcH26a8hndo8ek
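The commit above replaces chained unwrap() calls with record.get(...) plus a fallible parse. A std-only sketch of that shape, using a &[&str] row in place of csv::StringRecord (both offer index-based access returning an Option):

```rust
// Sketch of safe per-field parsing; a &[&str] row stands in for
// csv::StringRecord so the example runs without external crates.
fn parse_field(row: &[&str], idx: usize) -> Option<u32> {
    // None if the column is missing OR the value is not a valid u32,
    // so the caller can record the row as invalid instead of panicking.
    row.get(idx).and_then(|s| s.parse::<u32>().ok())
}

fn main() {
    let row = ["100", "abc"];
    match parse_field(&row, 0) {
        Some(id) => println!("station_cd = {id}"),
        None => println!("invalid row"),
    }
    assert_eq!(parse_field(&row, 1), None); // unparseable value
    assert_eq!(parse_field(&row, 9), None); // missing column
}
```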
@TinyKitten TinyKitten merged commit 1fb3ef4 into dev Mar 5, 2026
10 checks passed
@TinyKitten TinyKitten deleted the claude/add-pr-validation-comments-OWqjc branch March 5, 2026 09:58
