
Commit af14ae3

jahooma and claude committed
Fix model ID and clean up parallelism comments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 6a7de4c commit af14ae3


3 files changed (+4 lines, -10 lines)

evalbuff/src/docs-optimizer.ts

Lines changed: 2 additions & 7 deletions
@@ -340,13 +340,8 @@ export function revertDocEdit(
 /**
  * Compare scores to determine if a doc edit improved things.
  *
- * With parallelism=1, score variance is very high (often 3+ points on
- * the same task). To avoid rejecting good docs due to noise:
- * - Require only small improvement to accept (0.3 threshold)
- * - Require large decline to reject (1.5 threshold) — benefit of the doubt
- *
- * With higher parallelism, averages are more stable so we can use
- * tighter thresholds.
+ * With parallelism=5, averages are reasonably stable. A 0.3 threshold
+ * catches real improvements without being too sensitive to noise.
  */
 export function compareScores(
   oldScore: number,
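The new comment relies on averaging over five parallel runs to keep noise below a single 0.3 threshold. A minimal sketch of that idea, assuming each run yields one numeric score, that scores are averaged before comparison, and that the threshold is applied symmetrically. The names `mean`, `standardError`, `compareScoresSketch`, and the `'worse'` result are hypothetical illustrations, not evalbuff's actual `compareScores` signature.

```typescript
// 'improved' and 'same' appear in run-evalbuff.ts; 'worse' is an assumed
// (hypothetical) name for the remaining case.
type Comparison = 'improved' | 'same' | 'worse'

// Arithmetic mean of the scores collected from parallel runs.
function mean(scores: number[]): number {
  if (scores.length === 0) throw new Error('no scores')
  return scores.reduce((sum, s) => sum + s, 0) / scores.length
}

// Standard error of the mean: sample standard deviation divided by sqrt(n).
// It shrinks as the number of parallel runs grows, which is why a larger
// batch can tolerate a tighter threshold than a single run.
function standardError(scores: number[]): number {
  const n = scores.length
  if (n < 2) return Infinity // one run gives no estimate of spread
  const m = mean(scores)
  const variance = scores.reduce((sum, s) => sum + (s - m) ** 2, 0) / (n - 1)
  return Math.sqrt(variance / n)
}

// Hypothetical comparison: a change in the mean smaller than `threshold`
// in either direction is treated as noise ('same').
function compareScoresSketch(
  oldScores: number[],
  newScores: number[],
  threshold = 0.3,
): Comparison {
  const delta = mean(newScores) - mean(oldScores)
  if (delta >= threshold) return 'improved'
  if (delta <= -threshold) return 'worse'
  return 'same'
}
```

For example, `compareScoresSketch([6, 7, 6.5, 7, 6.5], [7, 7.5, 7, 7, 7.5])` compares means of 6.6 and 7.2, a difference of about 0.6, and returns `'improved'`. The old comment's asymmetric scheme (0.3 to accept, 1.5 to reject) would correspond to passing two different thresholds instead of one.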

evalbuff/src/llm.ts

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ import { generateText } from 'ai'
 
 const anthropic = createAnthropic()
 
-const DEFAULT_MODEL = 'claude-sonnet-4-6-20250415'
+const DEFAULT_MODEL = 'claude-sonnet-4-6'
 
 /**
  * Generate a task prompt from a commit diff using the LLM API directly.

evalbuff/src/run-evalbuff.ts

Lines changed: 1 addition & 2 deletions
@@ -473,8 +473,7 @@ async function improveDocs(opts: {
 
     if (comparison === 'improved' || comparison === 'same') {
       // 'improved' = clear signal the doc helps
-      // 'same' = within noise range — keep it (benefit of the doubt,
-      // especially at low parallelism where variance is high)
+      // 'same' = within noise range — keep it (benefit of the doubt)
       const reason = comparison === 'improved' ? 'score improved' : 'within noise range, keeping'
       console.log(` Keeping doc: ${docSuggestion.suggestedDocPath} (${reason})`)
       docsKept.push({
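In `improveDocs`, both `'improved'` and `'same'` keep the doc, so only a clear decline leads to a revert. A minimal sketch of that branch in isolation, assuming a hypothetical `decideDoc` helper and `DocDecision` shape (neither is part of evalbuff), and an assumed `'worse'` result with a hypothetical `'score declined, reverting'` reason. The two keep reasons are copied from the diff.

```typescript
// 'worse' is an assumed name for the non-keeping comparison result.
type Comparison = 'improved' | 'same' | 'worse'

// Hypothetical decision record, not evalbuff's docsKept entry shape.
interface DocDecision {
  keep: boolean
  reason: string
}

function decideDoc(comparison: Comparison): DocDecision {
  // Mirrors the diff: a clear improvement or a result within the noise
  // range both keep the doc (benefit of the doubt).
  if (comparison === 'improved' || comparison === 'same') {
    const reason = comparison === 'improved' ? 'score improved' : 'within noise range, keeping'
    return { keep: true, reason }
  }
  // Assumed: any other result reverts the doc edit.
  return { keep: false, reason: 'score declined, reverting' }
}
```

Under a symmetric threshold, this rule reverts a doc only when its averaged score drops by at least that threshold.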
