@csbobby csbobby commented Feb 10, 2026

[mellea decomp] Fix ConstraintExtractor parsing failures and improve robustness

Type of PR

  • Bug Fix

Description

This PR improves the parsing logic in ConstraintExtractor (mellea decomp) to make constraint extraction more robust, less brittle to formatting variance, and safer under partial / malformed model outputs.

ConstraintExtractor currently relies on naive patterns (e.g., line formatting). In practice, LLM outputs vary across:
• bullet styles (-, *, 1.)
• inline constraints mixed with prose
• missing sections or merged sections
• multi-line constraints with continuation indentation

This leads to:
• false negatives (constraints missed)
• false positives (non-constraints treated as constraints)
• unstable behavior across models / temperatures

Key Changes:
1. Stricter N/A handling
• The extractor now returns an empty list only when the input, after trimming and uppercasing, exactly matches known N/A variants ("N/A", "N / A", "N/ A", "N /A").
• This avoids false positives where N/A appears as a substring within a longer, meaningful constraint description.
2. Line-based parsing with empty-line filtering
• Constraints are processed line by line with leading/trailing whitespace removed.
• Empty lines are skipped to prevent generating empty constraints.
3. Bullet and numbering removal
• Common bullet and numbering prefixes (e.g. -, *, •, 1., 1)) are stripped from each line to normalize constraint text.
4. Inline multi-constraint splitting
• Single lines containing multiple constraints are split using common separators such as ;, -, –, and —.
• Each extracted constraint is individually trimmed and validated before being added to the results.
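
The four changes above can be sketched as a single parsing function. This is a minimal illustration based only on the description, not the merged implementation: the name `extract_constraints`, the constants, and the exact regular expressions are hypothetical. One assumption deserves a flag: dash separators are only honoured when surrounded by whitespace, so that hyphenated words are not split.

```python
import re

# N/A spellings listed in the PR description (compared after trim + uppercase).
NA_VARIANTS = {"N/A", "N / A", "N/ A", "N /A"}

# Leading bullet or numbering prefix: -, *, •, "1." or "1)".
BULLET_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")

# Inline separators: a semicolon, or a dash with whitespace on both sides.
# The whitespace requirement is an assumption made for this sketch.
INLINE_SEPARATOR = re.compile(r"\s*;\s*|\s+[-–—]\s+")


def extract_constraints(text: str) -> list[str]:
    """Hypothetical stand-in for the extractor's parsing step."""
    # 1. Strict N/A handling: only an exact match yields an empty list.
    if text.strip().upper() in NA_VARIANTS:
        return []

    constraints = []
    for raw_line in text.splitlines():
        # 2. Trim each line and skip empty ones.
        line = raw_line.strip()
        if not line:
            continue
        # 3. Remove a leading bullet or number.
        line = BULLET_PREFIX.sub("", line)
        # 4. Split inline constraints, keeping only non-empty parts.
        for part in INLINE_SEPARATOR.split(line):
            part = part.strip()
            if part:
                constraints.append(part)
    return constraints
```

Under these assumptions, `extract_constraints(" n / a ")` returns an empty list, while a line such as `2) Avoid N/A fields` is kept as the single constraint `Avoid N/A fields`, because N/A appears there only as a substring.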

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

@github-actions

The PR description has been updated. Please fill out the template for your PR to be reviewed.


mergify bot commented Feb 10, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
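
As a rough illustration of what this rule checks, the pattern can be tried against a PR title locally. The helper name `follows_conventional_commit` is hypothetical, and the sketch assumes the `~=` operator behaves like a regular-expression search, which is equivalent to a match here because the pattern is anchored with `^`.

```python
import re

# Pattern copied from the merge protection rule above.
TITLE_RULE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:"
)


def follows_conventional_commit(title: str) -> bool:
    """Return True when the title starts with a conventional-commit prefix."""
    return TITLE_RULE.search(title) is not None
```

With this sketch, a title beginning with `fix(mellea decomp):` satisfies the pattern, whereas one beginning with `[mellea decomp]` does not.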

@csbobby csbobby changed the title from "Upgrade ConstraintExtractor parsing with a robust extraction in mellea decomposition" to "[mellea decomp] Upgrade ConstraintExtractor parsing with a robust extraction" Feb 10, 2026
@csbobby csbobby changed the title from "[mellea decomp] Upgrade ConstraintExtractor parsing with a robust extraction" to "[mellea decomp] Solve ConstraintExtractor parsing fails and improve robustness" Feb 10, 2026
@jakelorocco jakelorocco changed the title from "[mellea decomp] Solve ConstraintExtractor parsing fails and improve robustness" to "fix(mellea decomp): Solve ConstraintExtractor parsing fails and improve robustness" Feb 10, 2026
auto-merge was automatically disabled February 12, 2026 16:34

Head branch was pushed to by a user without write access

@jakelorocco jakelorocco added this pull request to the merge queue Feb 12, 2026
Merged via the queue into generative-computing:main with commit ca3a7f2 Feb 12, 2026
4 checks passed