perf: reduce search batch test from 10k to 100 documents (#2526)
BrendanWalsh wants to merge 1 commit into master.
Conversation
Hey @BrendanWalsh 👋! We use semantic commit messages to streamline the release process. Examples of commit messages with semantic prefixes:
To test your commit locally, please follow our guide on building from source.
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Snapshot warnings: ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.
Scanned files: none
Pull request overview
Reduces the scale of the Azure Cognitive Search batch-write test inputs in services.search.split1 to shorten CI runtime for the search test matrix job.
Changes:
- Reduce the bigDF test dataset size from 10,000 to 100 documents.
- Reduce the custom batchSize from 2,000 to 20 to keep a 5:1 doc-to-batch ratio.
- Update the corresponding index-size assertion from 10,000 to 100.
```diff
  lazy val in3 = generateIndexName()
- dependsOn(3, writeHelper(bigDF, in3, isVectorField=false, Map("batchSize" -> "2000")))
+ dependsOn(3, writeHelper(bigDF, in3, isVectorField=false, Map("batchSize" -> "20")))

  dependsOn(1, retryWithBackoff(assertSize(in1, 4)))
  dependsOn(2, retryWithBackoff(assertSize(in2, 10)))
- dependsOn(3, retryWithBackoff(assertSize(in3, 10000)))
+ dependsOn(3, retryWithBackoff(assertSize(in3, 100)))
```
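For context, the retryWithBackoff(assertSize(...)) pattern retries an eventually-consistent index-size check with growing delays. A minimal, self-contained sketch of that pattern (the real SynapseML helper's signature and behavior may differ):

```scala
// Sketch of a retry-with-backoff helper: Azure Search index writes are
// eventually consistent, so a size assertion may fail until the service
// catches up. Retries the body, doubling the delay on each attempt.
object RetrySketch extends App {
  def retryWithBackoff[T](retries: Int, delayMs: Long)(body: => T): T =
    try body
    catch {
      case _: Throwable if retries > 0 =>
        Thread.sleep(delayMs)
        retryWithBackoff(retries - 1, delayMs * 2)(body)
    }

  var attempts = 0
  // Simulated assertSize: fails twice, then succeeds on the third attempt.
  val seen = retryWithBackoff(retries = 5, delayMs = 1) {
    attempts += 1
    if (attempts < 3) sys.error("index not yet at expected size")
    attempts
  }
  println(seen) // prints 3
}
```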
The batch-size scenario (dependsOn(3, ...)) is currently gated by testsToRun = Set(1, 2) //, 3), so this code path (and the bigDF indexing) does not run. As written, reducing bigDF and the batchSize here is unlikely to affect CI runtime unless test 3 is re-enabled or the gating logic is removed/made configurable; otherwise the PR description’s motivation doesn’t match what executes.
Reduce the document count in the batch-size search test (test 3) from 10,000 to 100 documents and re-enable it. Test 3 was previously gated off because 10k documents made the search1 job a ~27-minute long-pole. At 100 documents (batch size 20, preserving the 5:1 ratio), the test exercises the same batch-write code paths without the excessive runtime.
Changes:
- bigDF: createTestData(10000) -> createTestData(100)
- batchSize: 2000 -> 20
- assertSize: 10000 -> 100
- testsToRun: Set(1, 2) -> Set(1, 2, 3) [re-enabled]
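To illustrate the preserved 5:1 doc-to-batch ratio, a small Scala sketch (BatchSketch and the document values are illustrative, not SynapseML code) of splitting the shrunken dataset into fixed-size batches:

```scala
// Minimal sketch (not SynapseML code): how a batch writer would split
// the shrunken test dataset into fixed-size batches before upload.
object BatchSketch extends App {
  val docs = (1 to 100).map(i => s"doc-$i") // bigDF now holds 100 documents
  val batchSize = 20                        // the custom batchSize under test
  val batches = docs.grouped(batchSize).toSeq
  // 100 docs / 20 per batch = 5 batches, same 5:1 ratio as 10000 / 2000
  assert(batches.size == 5)
  assert(batches.forall(_.size == batchSize))
  println(s"${batches.size} batches of $batchSize")
}
```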
Force-pushed from 975b73b to 3b00897.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           master    #2526   +/- ##
=======================================
  Coverage   84.61%   84.61%
=======================================
  Files         335      335
  Lines       17708    17708
  Branches     1612     1612
=======================================
  Hits        14984    14984
  Misses       2724     2724

☔ View full report in Codecov by Sentry.
Summary
Reduces the Azure Search batch write test from 10,000 documents to 100 documents, and re-enables the previously disabled test. This test was a 27-minute long-pole job in CI.
Changes
- Re-enabled test 3 (testsToRun = Set(1, 2, 3)), previously disabled because 10k documents was too slow

Motivation
The search1 test job was consistently the longest-running unit test at ~27 minutes, primarily due to indexing 10,000 documents in Azure Search. The test validates batch-writing correctness, which doesn't require 10k documents. Test 3 (custom batch size) was gated off because of this excessive runtime. With 100 documents and proportional batch sizes, all three tests can run quickly while exercising the same code paths.

Testing
Split from #2506 for independent review.