perf: reduce search batch test from 10k to 100 documents by BrendanWalsh · Pull Request #2526 · microsoft/SynapseML

BrendanWalsh · 2026-03-27T05:27:27Z

Summary

Reduces the Azure Search batch write test from 10,000 documents to 100 documents, and re-enables the previously disabled test. This test was a 27-minute long-pole job in CI.

Changes

SearchWriterSuitePart1.scala:
- Document count: 10,000 → 100
- Batch size: 2,000 → 20 (preserves the 5:1 document-to-batch ratio)
- Assertion count: 10,000 → 100
- Re-enable test 3 (testsToRun = Set(1, 2, 3)), which was previously disabled because indexing 10k documents was too slow
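The 5:1 ratio can be checked with some quick arithmetic. The Scala sketch below assumes the writer splits its input into consecutive chunks of at most batchSize documents. That is an illustrative assumption; the actual SynapseML writer logic is not part of this diff, and `BatchMathSketch`/`batchCount` are hypothetical names.

```scala
object BatchMathSketch {
  // Hypothetical helper: number of batch requests when `numDocs` documents are
  // sent in chunks of at most `batchSize` (ceiling division; last chunk may be partial).
  def batchCount(numDocs: Int, batchSize: Int): Int = {
    require(numDocs >= 0, "numDocs must be non-negative")
    require(batchSize > 0, "batchSize must be positive")
    (numDocs + batchSize - 1) / batchSize
  }

  def main(args: Array[String]): Unit = {
    println(batchCount(10000, 2000)) // old settings: prints 5
    println(batchCount(100, 20))     // new settings: prints 5
  }
}
```

Under that assumption, both configurations issue five batch writes. Only the payload per request shrinks.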

Motivation

The search1 test job was consistently the longest-running unit test at ~27 minutes, primarily due to indexing 10,000 documents in Azure Search. The test validates batch writing correctness, which doesn't require 10k documents. Test 3 (custom batch size) was gated off because of this excessive runtime. With 100 documents and proportional batch sizes, all 3 tests can run quickly while exercising the same code paths.

Testing

- Same test assertions, same batch ratio, same code paths exercised
- Test 3 is now re-enabled and will run in CI

Split from #2506 for independent review.

github-actions · 2026-03-27T05:27:35Z

Hey @BrendanWalsh 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

fix: Fix LightGBM crashes with empty partitions
feat: Make HTTP on Spark back-offs configurable
docs: Update Spark Serving usage
build: Add codecov support
perf: improve LightGBM memory usage
refactor: make python code generation rely on classes
style: Remove nulls from CNTKModel
test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

github-actions · 2026-03-27T05:27:37Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 3b00897.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copilot

Pull request overview

Reduces the scale of the Azure Cognitive Search batch-write test inputs in services.search.split1 to shorten CI runtime for the search test matrix job.

Changes:

- Reduce bigDF test dataset size from 10,000 to 100 documents.
- Reduce custom batchSize from 2,000 to 20 to keep a 5:1 doc-to-batch ratio.
- Update the corresponding index-size assertion from 10,000 to 100.

Copilot · 2026-03-27T05:31:03Z

...est/scala/com/microsoft/azure/synapse/ml/services/search/split1/SearchWriterSuitePart1.scala

    lazy val in3 = generateIndexName()
-    dependsOn(3, writeHelper(bigDF, in3, isVectorField=false, Map("batchSize" -> "2000")))
+    dependsOn(3, writeHelper(bigDF, in3, isVectorField=false, Map("batchSize" -> "20")))

    dependsOn(1, retryWithBackoff(assertSize(in1, 4)))
    dependsOn(2, retryWithBackoff(assertSize(in2, 10)))
-    dependsOn(3, retryWithBackoff(assertSize(in3, 10000)))
+    dependsOn(3, retryWithBackoff(assertSize(in3, 100)))


The batch-size scenario (dependsOn(3, ...)) is currently gated by testsToRun = Set(1, 2) //, 3), so this code path (and the bigDF indexing) does not run. As written, reducing bigDF and the batchSize here is unlikely to affect CI runtime unless test 3 is re-enabled or the gating logic is removed/made configurable; otherwise the PR description’s motivation doesn’t match what executes.
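Copilot's point can be illustrated with a minimal, hypothetical Scala sketch. It assumes the gate works like a helper whose test body is a by-name parameter, so the body (and anything lazily built inside it, such as bigDF) is evaluated only when the test number is in testsToRun. `GatedSuiteSketch` and this `dependsOn` signature are illustrative stand-ins, not SynapseML's actual test harness.

```scala
// Hypothetical stand-in for a suite gated by `testsToRun`; not SynapseML's harness.
class GatedSuiteSketch(testsToRun: Set[Int]) {
  // `block` is by-name (`=> T`), so it is evaluated only if the test is enabled.
  // A disabled test therefore never builds its data, whatever its size.
  def dependsOn[T](testNumber: Int, block: => T): Option[T] =
    if (testsToRun.contains(testNumber)) Some(block) else None
}

object GatedSuiteSketchDemo {
  def main(args: Array[String]): Unit = {
    // Gate before this PR: test 3 disabled, so its block never runs.
    val before = new GatedSuiteSketch(Set(1, 2))
    println(before.dependsOn[String](3, sys.error("test 3 body should not run"))) // prints None

    // Gate after this PR: test 3 enabled, so its block is evaluated.
    val after = new GatedSuiteSketch(Set(1, 2, 3))
    println(after.dependsOn(3, "indexed 100 docs")) // prints Some(indexed 100 docs)
  }
}
```

Under this model, shrinking bigDF alone changes nothing while test 3 is gated off. That is why the follow-up commit also re-enables it.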

Reduce the document count in the batch-size search test (test 3) from 10,000 to 100 documents and re-enable it. Test 3 was previously gated off because 10k documents made the search1 job a ~27-minute long-pole. At 100 documents (batch size 20, preserving the 5:1 ratio), the test exercises the same batch-write code paths without the excessive runtime.

Changes:
- bigDF: createTestData(10000) -> createTestData(100)
- batchSize: 2000 -> 20
- assertSize: 10000 -> 100
- testsToRun: Set(1, 2) -> Set(1, 2, 3) [re-enabled]

BrendanWalsh · 2026-03-27T19:25:48Z

/azp run

azure-pipelines · 2026-03-27T19:26:00Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov-commenter · 2026-03-27T19:46:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.61%. Comparing base (895752c) to head (3b00897).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2526   +/-   ##
=======================================
  Coverage   84.61%   84.61%           
=======================================
  Files         335      335           
  Lines       17708    17708           
  Branches     1612     1612           
=======================================
  Hits        14984    14984           
  Misses       2724     2724
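The headline percentage can be derived from the table. Codecov defines coverage as hits / (hits + partials + misses). This sketch assumes zero partials, since none are listed above, and Codecov's default two-decimal precision rounded down. `CoverageSketch` is a hypothetical name. Because the PR only edits test code and the line counts are unchanged, the ratio is identical on both branches.

```scala
import scala.math.BigDecimal.RoundingMode

object CoverageSketch {
  // Coverage % = hits / (hits + partials + misses) * 100, truncated to
  // `precision` decimals (assumed to match Codecov's default `round: down`).
  def coveragePercent(hits: Int, misses: Int, partials: Int = 0, precision: Int = 2): BigDecimal = {
    val tracked = hits + misses + partials
    require(tracked > 0, "no tracked lines")
    (BigDecimal(hits) * BigDecimal(100) / BigDecimal(tracked))
      .setScale(precision, RoundingMode.DOWN)
  }

  def main(args: Array[String]): Unit =
    println(coveragePercent(hits = 14984, misses = 2724)) // prints 84.61
}
```

The unrounded value is about 84.617%. Truncation yields the reported 84.61%, whereas half-up rounding would give 84.62%.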


Copilot AI review requested due to automatic review settings March 27, 2026 05:27

BrendanWalsh mentioned this pull request Mar 27, 2026

ci: containerize CI pipeline with pre-built Docker image #2529 (Open)

Copilot started reviewing on behalf of BrendanWalsh March 27, 2026 05:28

Copilot AI reviewed Mar 27, 2026

BrendanWalsh force-pushed the brwals/reduce-search-batch branch from 975b73b to 3b00897 March 27, 2026 19:09

BrendanWalsh wants to merge 1 commit into master from brwals/reduce-search-batch

3 participants
