
fix(ci): make dead links workflow robust against silent curl failures#2534

Merged
BrendanWalsh merged 1 commit into master from fix/dead-links-ci-sitemap-fetch
Mar 31, 2026

Conversation

@BrendanWalsh
Collaborator

Problem

The Scan Website for Dead Links CI job has been consistently failing with:

Total: 0 | Successful: 0 | Errors: 0
No links were found. This usually indicates a configuration error.

The workflow fetches sitemap URLs from GitHub Pages, but curl -s was silently failing — producing no output and no error — so the grep pipeline generated an empty urls.txt, which caused lychee to report 0 links.

Root Cause

Multiple issues compounded:

  1. curl -s hides errors — network failures, DNS issues, or encoding mismatches produced no diagnostic output
  2. No pipefail — grep failures in the pipeline were silently swallowed
  3. grep -oP (PCRE) — less portable than POSIX extended regex; may behave differently across runner environments
  4. No validation — an empty urls.txt was passed to lychee, which correctly rejected it
  5. Quoted --accept value — extra single quotes around the range value in the YAML block scalar could cause lychee argument parsing issues
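The pipefail issue (point 2) is easy to reproduce: a shell pipeline's exit status is that of its last command, so an upstream failure is invisible unless `pipefail` is set. A minimal, self-contained demonstration (not taken from the workflow itself):

```shell
#!/usr/bin/env bash
# A pipeline reports the exit status of its LAST command, so a failing
# grep in the middle is swallowed when pipefail is off.

true | grep -oE 'no-match' | sort > /dev/null
echo "without pipefail: $?"   # prints 0 even though grep found nothing

set -o pipefail
true | grep -oE 'no-match' | sort > /dev/null || echo "with pipefail: $?"
```

The same mechanism hid the failing `curl -s` at the head of the sitemap pipeline.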

Fix

  • Add set -euo pipefail for strict error handling
  • Download sitemap to a file (not pipe) with --compressed and --retry 3 for reliability
  • Check HTTP status code and fail explicitly on non-200
  • Switch from grep -oP (PCRE) to grep -oE (POSIX ERE) for portability
  • Validate URL count before proceeding to lychee scan
  • Log sitemap size and dump content on failure for debuggability
  • Remove extra single quotes around --accept value
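Taken together, the hardened step looks roughly like the sketch below. This is an illustrative reconstruction, not the exact workflow contents; the curl invocation is shown in comments, and a tiny inline sitemap stands in for the real download so the extraction and validation logic can run offline:

```shell
#!/usr/bin/env bash
set -euo pipefail

# In CI the sitemap is downloaded to a file first (not piped), e.g.:
#   http_code=$(curl -sS --compressed --retry 3 \
#     -o sitemap.xml -w '%{http_code}' "$SITEMAP_URL")
#   [ "$http_code" = "200" ] || { echo "HTTP $http_code" >&2; exit 1; }
# For this offline sketch, a two-entry sitemap stands in for the fetch:
cat > sitemap.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset>
  <url><loc>https://microsoft.github.io/SynapseML/</loc></url>
  <url><loc>https://microsoft.github.io/SynapseML/docs/about</loc></url>
</urlset>
EOF

echo "Sitemap size: $(wc -c < sitemap.xml) bytes"

# POSIX ERE (grep -oE) instead of PCRE (grep -oP) for portability.
grep -oE '<loc>[^<]+</loc>' sitemap.xml \
  | sed -e 's|<loc>||' -e 's|</loc>||' > urls.txt

# Fail explicitly if extraction produced nothing, rather than handing
# lychee an empty urls.txt that it will reject as "No links were found".
url_count=$(wc -l < urls.txt)
if [ "$url_count" -eq 0 ]; then
  echo "ERROR: no URLs extracted from sitemap; content follows:" >&2
  cat sitemap.xml >&2
  exit 1
fi
echo "Extracted $url_count URLs"
```

With `set -euo pipefail` at the top, any curl, grep, or sed failure now stops the step immediately instead of producing a quietly empty `urls.txt`.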

Testing

Verified locally: the sitemap at https://microsoft.github.io/SynapseML/sitemap.xml returns 344KB / 1821 URLs, and the fixed extraction pipeline handles it correctly.

The 'Scan Website for Dead Links' CI job was failing because it found
0 URLs in the sitemap. Root cause: curl -s was silently failing (no
error output) and the pipeline had no error checking, so the step
appeared to succeed with an empty urls.txt.

Changes:
- Add 'set -euo pipefail' for strict error handling
- Download sitemap to a file with --compressed and --retry for
  reliability (fixes potential gzip encoding issues)
- Check HTTP status code and fail explicitly on non-200
- Switch from grep -oP (PCRE) to grep -oE (POSIX ERE) for
  portability across runner environments
- Validate URL count before proceeding to lychee scan
- Remove extra single quotes around --accept value in lychee args
  that could cause parsing issues in YAML block scalars
- Log sitemap size and dump content on failure for debuggability
Copilot AI review requested due to automatic review settings March 31, 2026 06:01
@github-actions

Hey @BrendanWalsh 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

@github-actions

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA b39d374.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

@BrendanWalsh BrendanWalsh merged commit 1e21fd1 into master Mar 31, 2026
14 of 15 checks passed
@BrendanWalsh BrendanWalsh deleted the fix/dead-links-ci-sitemap-fetch branch March 31, 2026 06:21
