Add support for rotating multiple proxies & improving timing in dynamic catalogues link harvesting #180

Merged
rsaksida merged 3 commits into main from feat/multiple-proxies
Mar 2, 2026

@alexculealt
Collaborator

This PR aims to fix reliability issues that occur when proxies arbitrarily stop working on some page loads during an extraction. It also detects the same loading failures in the AJAX dependencies of pages. The proposed solution is to rotate through multiple proxy URLs until one works. In testing, this appears sufficiently effective, and it could reduce proxy usage costs when websites are accessible without a proxy. Should resolve #176 and #177.

  • 34c44aa — Fix rejection called to undefined function (Alex Culea, 2026-02-23)
  • 37bdaaa — Add a log entry for each page attempt failure and surface them in the UI
    • Adds crawl page ID to the extraction_logs table so we can log information relating to a certain page
    • Adjusts crawl page detail UI to surface the associated logs
    • Adds a catch-all log entry for failed page-fetching jobs
  • a48d4af — Add support for multiple proxies rotation and no-proxy first attempt
    • Adds retry mechanism to page fetching logic, starting with no proxy at first, then moving to the next configured proxy URL
    • Adjusts the proxy settings UI to support list editing, so proxy URLs can be added or removed
    • Fixes a number of timing issues when dynamically harvesting URLs contained in the page
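
For readers who want a concrete picture of the retry behaviour described in the commits above, here is a minimal sketch in Python. It is not the code from this PR: the names `fetch_with_rotation`, `AttemptLog`, and `AllAttemptsFailed`, the `fetch` callback signature, and the log entry shape are all assumptions made for illustration. It tries the page without a proxy first, then moves through the configured proxy URLs in order, and records one log entry per failed attempt keyed by the crawl page ID.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class AttemptLog:
    """One entry per failed attempt (hypothetical shape, not the real table)."""
    crawl_page_id: int
    proxy: Optional[str]  # None means the attempt was made without a proxy
    error: str


class AllAttemptsFailed(Exception):
    """Raised when the direct attempt and every configured proxy have failed."""


def fetch_with_rotation(
    url: str,
    crawl_page_id: int,
    proxies: Sequence[str],
    fetch: Callable[[str, Optional[str]], str],
    logs: List[AttemptLog],
) -> str:
    # Attempt order: no proxy first, then each configured proxy URL in turn.
    candidates: List[Optional[str]] = [None, *proxies]
    for proxy in candidates:
        try:
            return fetch(url, proxy)
        except Exception as exc:  # broad on purpose: any failure triggers rotation
            # Record the failure so it can be surfaced next to the page later.
            logs.append(AttemptLog(crawl_page_id, proxy, str(exc)))
    raise AllAttemptsFailed(f"{url}: all {len(candidates)} attempts failed")
```

The `fetch` callback is passed in so the rotation logic stays independent of whichever HTTP client or headless browser performs the actual page load. A real implementation would likely also need to treat a page whose AJAX dependencies failed to load as a failed attempt, which this sketch leaves to the callback.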

@alexculealt alexculealt requested a review from rsaksida February 23, 2026 12:21
@rsaksida rsaksida merged commit 5e0d60b into main Mar 2, 2026
@rsaksida rsaksida deleted the feat/multiple-proxies branch March 2, 2026 09:32

Development

Successfully merging this pull request may close these issues.

Norco (Dynamic variant of Curriqunet) type catalogues load pages partially
