fix: Change the エ-row vowel from ɛ to e in IPA generation to fix Google TTS pronunciation #1418

Merged
TinyKitten merged 4 commits into dev from claude/fix-seibu-ipa-pronunciation-kpjNL
Mar 6, 2026
Conversation

@TinyKitten (Member) commented Mar 6, 2026

Summary

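  • Changed the open-mid front vowel ɛ used in IPA generation to the mid front vowel e
  • Fixed an issue where Google TTS interpreted ɛi like English /aɪ/, so that 西武 (セイブ, Seibu) was pronounced "さいぶ" (saibu)
  • Added an ei → eː long-vowel rule (the same pattern as the existing oɯ → oː rule)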
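The contraction rule in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in `ipa.rs`: the function name `contract_long_vowels` is hypothetical, and only the two digraphs named in the summary are handled.

```rust
/// Contract vowel digraphs into long vowels: oɯ → oː and ei → eː.
/// Hypothetical stand-in for the real vowel-length pass in ipa.rs.
fn contract_long_vowels(ipa: &str) -> String {
    let chars: Vec<char> = ipa.chars().collect();
    let mut result = String::with_capacity(ipa.len());
    let mut i = 0;
    while i < chars.len() {
        // Look at the current character together with the next one, if any.
        let pair = (chars[i], chars.get(i + 1).copied());
        match pair {
            ('o', Some('ɯ')) | ('e', Some('i')) => {
                // Keep the first vowel and replace the second with a length mark.
                result.push(chars[i]);
                result.push('ː');
                i += 2;
            }
            _ => {
                result.push(chars[i]);
                i += 1;
            }
        }
    }
    result
}
```

With the old mapping the input would have been `sɛibɯ`, which this rule leaves untouched; with the new mapping `seibɯ` becomes `seːbɯ`.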

Test plan

  • Confirm that all 49 IPA unit tests pass
  • Confirm that Google TTS correctly pronounces line names containing the エイ sound, such as 西武 (Seibu), 京成 (Keisei), and 名鉄 (Meitetsu)

https://claude.ai/code/session_0199yco9qEy3UF4UoBegNEzN

Summary by CodeRabbit

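Release Notes

  • Bug Fixes

    • Unified the single-vowel notation from "ɛ" to "e" and corrected the IPA transcriptions of several station names
    • Improved vowel detection and pronunciation-split decisions to make joining more accurate
    • Correctly handles lengthening of sequences such as "えい" and prevents duplicate insertion of the long-vowel mark
    • Improved output so that whitespace between multiple words is preserved
  • Tests

    • Added and updated cases for expected IPA output, duplicate long-vowel prevention, and related behavior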

@coderabbitai (Contributor, bot) commented Mar 6, 2026

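📝 Walkthrough

Revises the internal katakana-to-IPA conversion logic, unifying ɛ → e across multiple vowel mappings. Updates onset detection, last-vowel detection, and long-vowel handling (ei → eː, avoiding duplicate long-vowel marks), and adjusts the related test expectations. No changes to public API signatures.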

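Changes

| Cohort / File(s) | Summary |
|---|---|
| IPA mapping & vowel logic: stationapi/src/domain/ipa.rs | Replaces ɛ → e in the single-character vowel mappings. Removes ɛ from split_onset, narrowing onset detection. Updates last_vowel to recognize 'e'. Adds ei → eː handling to apply_vowel_length and skips an immediately following long-vowel mark (ː/ー) to prevent duplication. |
| Tests / DTO expectations: stationapi/src/use_case/dto/line.rs | Updates test expectations for the line DTO's name_ipa and related fields to match the new IPA output (e.g. sɛi... → seː...). |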
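The PR page does not show the bodies of `split_onset` or `last_vowel`, so the following is only a guess at their shape, written to illustrate why the vowel set has to agree with the mapping. The `VOWELS` constant, both signatures, and the behavior are assumptions.

```rust
/// Vowels the helpers recognize after the change: 'e' is present, 'ɛ' is not.
/// Hypothetical constant; the real ipa.rs may organize this differently.
const VOWELS: [char; 5] = ['a', 'i', 'ɯ', 'e', 'o'];

/// Return the last recognized vowel in an IPA string, if any.
fn last_vowel(ipa: &str) -> Option<char> {
    ipa.chars().rev().find(|c| VOWELS.contains(c))
}

/// Split a syllable into (onset, rest) at the first recognized vowel.
/// If no vowel is found, the whole input is treated as onset.
fn split_onset(syllable: &str) -> (&str, &str) {
    match syllable.char_indices().find(|(_, c)| VOWELS.contains(c)) {
        // char_indices yields byte offsets, so split_at lands on a char boundary.
        Some((idx, _)) => syllable.split_at(idx),
        None => (syllable, ""),
    }
}
```

Under these assumptions, a string that still contained ɛ would no longer register as having a vowel, which is why the mapping and the helpers need to change together.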

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

fix

Poem

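🐰 Swapping one "e" for another with a single hop,
like a rainbow slipping down a lane of sound,
the long "e" stretches out and rings with a pop,
leaving tracks across the meadow of tests,
saying hello with a new pronunciation.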

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
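| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title, "Change the エ-row vowel from ɛ to e in IPA generation to fix Google TTS pronunciation", accurately reflects the key points of the change. It clearly conveys the two main improvements: the vowel mapping change and the Google TTS pronunciation fix. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |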


…pronunciation

The open-mid front vowel ɛ was causing Google TTS to misinterpret
Japanese エ-row sounds. For example, セイブ (Seibu/西武) generated
IPA "sɛibɯ" which TTS read as "さいぶ" instead of "せいぶ".

Changes:
- Replace all ɛ mappings with e (mid front vowel) for エ-row kana
- Add ei → eː vowel contraction rule (parallel to existing oɯ → oː)
  so エイ sequences produce long [eː] as in natural Japanese speech

https://claude.ai/code/session_0199yco9qEy3UF4UoBegNEzN
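To make the first change concrete, here is a small sketch of an エ-row mapping that emits `e` rather than `ɛ`. The table covers only a handful of kana needed for the examples in this PR, and `kana_to_ipa` and `transcribe` are hypothetical names, not functions from `ipa.rs`.

```rust
/// Map a single katakana to an IPA fragment, using `e` for the エ row.
/// Only a few kana are covered; this is an illustrative subset.
fn kana_to_ipa(kana: char) -> Option<&'static str> {
    match kana {
        'イ' => Some("i"),
        'エ' => Some("e"),
        'ケ' => Some("ke"),
        'セ' => Some("se"),
        'テ' => Some("te"),
        'メ' => Some("me"),
        'ブ' => Some("bɯ"),
        'ツ' => Some("tsɯ"),
        _ => None,
    }
}

/// Transcribe a katakana string kana by kana.
/// Kana missing from the table are silently dropped in this sketch.
fn transcribe(katakana: &str) -> String {
    katakana.chars().filter_map(kana_to_ipa).collect()
}
```

The output of this stage still contains the raw `ei` sequence (for example `keisei`); the contraction to `eː` described in the second change would run as a later pass.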

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 382-388: Unit tests (test_seibu, test_keisei, test_meitetsu)
verify the ei → eː IPA mapping in ipa.rs, but you must validate the real Google
TTS output before merging: run the IPA conversion for those three station names
using the conversion function in ipa.rs, synthesize audio via Google TTS with
the same locale/voice settings used in production, save the generated audio, and
compare the perceived pronunciations against the expected IPA (note any
mismatches such as whether Google renders "ei" as a long e or as separate
vowels); then attach the audio samples and a short result summary to the PR and,
if Google differs, update the ipa.rs mapping or add a normalization/SSML
workaround so runtime TTS matches the intended pronunciation.
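For the manual check the reviewer asks for, one option is to feed the generated IPA to the synthesizer through an SSML `<phoneme>` element. The PR does not say how the IPA string reaches Google TTS in production, so this is an assumption; `ssml_phoneme` and `escape_xml` are hypothetical helpers, and the actual API request is left out.

```rust
/// Escape the five XML special characters so the strings are safe inside SSML.
fn escape_xml(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Build an SSML document that asks the synthesizer to read `text`
/// using the given IPA transcription (assumed delivery mechanism).
fn ssml_phoneme(text: &str, ipa: &str) -> String {
    format!(
        "<speak><phoneme alphabet=\"ipa\" ph=\"{}\">{}</phoneme></speak>",
        escape_xml(ipa),
        escape_xml(text)
    )
}
```

Generating one SSML string per station name with the old and new transcriptions (`sɛibɯ` versus `seːbɯ`) gives a like-for-like pair of audio samples to attach to the PR.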

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dccb0518-6a80-42bb-a2b7-a4a56380c35c

📥 Commits

Reviewing files that changed from the base of the PR and between d602572 and 0c09cfe.

📒 Files selected for processing (1)
  • stationapi/src/domain/ipa.rs

@TinyKitten TinyKitten force-pushed the claude/fix-seibu-ipa-pronunciation-kpjNL branch from 0c09cfe to 2d1a027 on March 6, 2026 12:46

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 382-388: The ei→eː branch can produce duplicate long-tone marks
(e.g. input "...eiー" -> "eːː"); modify the block in ipa.rs that handles the `if
chars[i] == 'e' && chars[i + 1] == 'i'` case so that after you intend to append
'ː' you first check and avoid duplicating it — e.g. skip pushing 'ː' if the last
char in `result` is already 'ː' or if the next input char (chars[i+2] if
present) is a chōon mark like 'ː' or 'ー'; update the `result.push('e');
result.push('ː'); i += 2;` logic in that branch accordingly.

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 0c09cfe and 2d1a027.

📒 Files selected for processing (2)
  • stationapi/src/domain/ipa.rs
  • stationapi/src/use_case/dto/line.rs

When input like "eiː" is processed by apply_vowel_length, the ei→eː
rule already appends ː, then the trailing ː would be pushed again.
Skip a following long-vowel mark after the contraction to avoid "eːː".

https://claude.ai/code/session_0199yco9qEy3UF4UoBegNEzN
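The guard described in this commit can be isolated as below. The branch body mirrors the snippet quoted in the review comments; the surrounding function `contract_ei` is a hypothetical wrapper that handles only the ei case so the effect is easy to test.

```rust
/// Contract ei → eː and swallow one following long-vowel mark, if present.
/// Hypothetical wrapper around the ei branch quoted in the review.
fn contract_ei(ipa: &str) -> String {
    let chars: Vec<char> = ipa.chars().collect();
    let len = chars.len();
    let mut result = String::with_capacity(ipa.len());
    let mut i = 0;
    while i < len {
        if i + 1 < len && chars[i] == 'e' && chars[i + 1] == 'i' {
            result.push('e');
            result.push('ː');
            i += 2;
            // Skip a following long-vowel mark to avoid emitting "eːː".
            if i < len && (chars[i] == 'ː' || chars[i] == 'ー') {
                i += 1;
            }
            continue;
        }
        result.push(chars[i]);
        i += 1;
    }
    result
}
```

Only a single trailing mark is consumed, so an input that already contains two marks after `ei` would still keep one extra.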

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
stationapi/src/domain/ipa.rs (1)

368-392: ⚠️ Potential issue | 🟡 Minor

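Long-vowel contractions other than ei can still produce a duplicate ː.

The ei branch absorbs a following long-vowel mark, but the oɯ -> oː and oo -> oː branches do not, so oːː can remain depending on the input. It would be safer to handle all contraction branches the same way.

🔧 Proposed diff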
     while i < len {
         if i + 1 < len && chars[i] == 'o' && chars[i + 1] == 'ɯ' {
             // oɯ → oː (おう/こう pattern)
             result.push('o');
             result.push('ː');
             i += 2;
+            if i < len && chars[i] == 'ː' {
+                i += 1;
+            }
             continue;
         }
         if i + 1 < len && chars[i] == 'o' && chars[i + 1] == 'o' {
             // oo → oː (おお pattern)
             result.push('o');
             result.push('ː');
             i += 2;
+            if i < len && chars[i] == 'ː' {
+                i += 1;
+            }
             continue;
         }
         if i + 1 < len && chars[i] == 'e' && chars[i + 1] == 'i' {
             // ei → eː (えい/けい pattern — 京成 keisei → keːseː)
             result.push('e');
             result.push('ː');
             i += 2;
             // Skip a following long-vowel mark to avoid duplicate 'ː'
-            if i < len && (chars[i] == 'ː' || chars[i] == 'ー') {
+            if i < len && chars[i] == 'ː' {
                 i += 1;
             }
             continue;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/domain/ipa.rs` around lines 368 - 392, The oɯ and oo
long-vowel branches can leave a duplicate 'ː' because they don't skip a
following long-vowel mark like the ei branch does; update the if blocks that
check chars[i]=='o' && chars[i+1]=='ɯ' and chars[i]=='o' && chars[i+1]=='o' so
after pushing 'o' and 'ː' and advancing i by 2 you also check if i < len and
(chars[i] == 'ː' || chars[i] == 'ー') and, if so, increment i by 1 to consume the
existing long-vowel marker (use the same pattern as the ei branch; refer to
variables chars, i, len, and result in ipa.rs).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 554f4243-3713-4cd1-a187-d6e1c4ed9c8c

📥 Commits

Reviewing files that changed from the base of the PR and between 2d1a027 and fec36de.

📒 Files selected for processing (1)
  • stationapi/src/domain/ipa.rs

Same issue as the ei→eː branch: if the input already has a trailing
long-vowel mark after the digraph, the contraction would produce "oːː".
Add the same skip-next-ː guard to both branches.

https://claude.ai/code/session_0199yco9qEy3UF4UoBegNEzN
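Since all three branches now share the same shape, one way to keep them from drifting apart again is a table-driven loop. This is an alternative sketch, not what the PR merged (the merged code keeps three separate `if` blocks); `CONTRACTIONS` and `apply_contractions` are hypothetical names.

```rust
/// Vowel digraphs that contract to "first vowel + ː".
const CONTRACTIONS: [(char, char); 3] = [('o', 'ɯ'), ('o', 'o'), ('e', 'i')];

/// Apply every contraction with one shared skip-next-ː guard.
fn apply_contractions(ipa: &str) -> String {
    let chars: Vec<char> = ipa.chars().collect();
    let len = chars.len();
    let mut result = String::with_capacity(ipa.len());
    let mut i = 0;
    while i < len {
        let is_digraph = i + 1 < len
            && CONTRACTIONS
                .iter()
                .any(|&(a, b)| chars[i] == a && chars[i + 1] == b);
        if is_digraph {
            result.push(chars[i]);
            result.push('ː');
            i += 2;
            // Shared guard: consume one existing length mark after the digraph.
            if i < len && chars[i] == 'ː' {
                i += 1;
            }
        } else {
            result.push(chars[i]);
            i += 1;
        }
    }
    result
}
```

Adding another digraph then means adding one tuple to the table, and the duplicate-mark guard applies to it automatically.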

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
stationapi/src/domain/ipa.rs (1)

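373-398: The 'ー' check may be unreachable code.

By the time apply_vowel_length is called, every 'ー' has already been converted to 'ː' by the Phoneme::LongVowel handling in apply_phonological_rules. The condition chars[i] == 'ー' is therefore always false.

Whether to keep it as defensive programming or remove it is up to you.

♻️ Simplification (optional)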
             // Skip a following long-vowel mark to avoid duplicate 'ː'
-            if i < len && (chars[i] == 'ː' || chars[i] == 'ー') {
+            if i < len && chars[i] == 'ː' {
                 i += 1;
             }

Apply the same change in three places (lines 374, 385, 396).
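The reachability argument can be illustrated with a toy two-stage pipeline. The review names `Phoneme::LongVowel` and `apply_phonological_rules`, but their definitions are not shown on this page, so the enum layout, `tokenize`, and `render` below are assumptions made for illustration only.

```rust
/// Hypothetical phoneme type: plain IPA segments plus a long-vowel marker.
enum Phoneme {
    Segment(&'static str),
    LongVowel,
}

/// Stage 1 (assumed): turn katakana into phonemes; 'ー' becomes LongVowel.
/// Only the kana needed for the example are covered.
fn tokenize(katakana: &str) -> Vec<Phoneme> {
    katakana
        .chars()
        .filter_map(|c| match c {
            'ー' => Some(Phoneme::LongVowel),
            'セ' => Some(Phoneme::Segment("se")),
            'ル' => Some(Phoneme::Segment("ɾɯ")),
            _ => None,
        })
        .collect()
}

/// Stage 2 (assumed): render phonemes to an IPA string; LongVowel becomes 'ː'.
fn render(phonemes: &[Phoneme]) -> String {
    let mut out = String::new();
    for p in phonemes {
        match p {
            Phoneme::Segment(s) => out.push_str(s),
            Phoneme::LongVowel => out.push('ː'),
        }
    }
    out
}
```

If the real pipeline has this shape, any later pass that operates on the rendered string can only ever see 'ː', which is the reviewer's point about the `chars[i] == 'ー'` condition.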

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/domain/ipa.rs` around lines 373 - 398, apply_vowel_length
contains redundant checks for 'ー' because apply_phonological_rules already
converts Phoneme::LongVowel to 'ː'; remove the unreachable chars[i] == 'ー'
conditions in the three places inside apply_vowel_length (the blocks handling
"oɯ", "oo", and "ei") so the skip-after-long-vowel checks only test for 'ː'
(optionally leaving a single comment that explains why 'ー' cannot occur at
this point); locate these checks by searching for the comments "Skip a
following long-vowel mark to avoid duplicate 'ː'" within apply_vowel_length and
update the conditions accordingly.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74aec398-184f-4833-be3e-720025b1adf4

📥 Commits

Reviewing files that changed from the base of the PR and between fec36de and 4b073c1.

📒 Files selected for processing (1)
  • stationapi/src/domain/ipa.rs

@TinyKitten TinyKitten merged commit 4de3289 into dev Mar 6, 2026
11 checks passed
@TinyKitten TinyKitten deleted the claude/fix-seibu-ipa-pronunciation-kpjNL branch March 6, 2026 13:05