Skip to content

fix(pubchem): add thread safety to cache and rate limiter#379

Closed
robotlearning123 wants to merge 6 commits intomainfrom
fix/pubchem-thread-safety
Closed

fix(pubchem): add thread safety to cache and rate limiter#379
robotlearning123 wants to merge 6 commits intomainfrom
fix/pubchem-thread-safety

Conversation

@robotlearning123
Copy link
Copy Markdown
Member

Bug

The PubChem service used module-level mutable state (_CACHE dict and _last_request_time float) without threading locks. Under concurrent FastAPI requests (sync endpoints run in thread pool), this could:

  1. Corrupt the cache dict during concurrent writes/evictions
  2. Bypass rate limiting when multiple threads read _last_request_time simultaneously

Root Cause

No thread synchronization on shared mutable state. FastAPI runs sync endpoints in a thread pool, so concurrent requests can race.

Fix

Added threading.Lock for both the cache (_CACHE_LOCK) and rate limiter (_rate_limit_lock). Cache reads, writes, and evictions are all protected. Rate limiter serializes sleep+update atomically.

Test

5 tests: lock existence, concurrent cache writes, eviction under concurrency, cache hit behavior.

🤖 Found and fixed by bug-hunter autonomous loop.

sandia777 and others added 2 commits March 28, 2026 14:10
…m, extractor, email_intake

Wave 1 of test coverage improvements for lab-manager:
- test_email_poller.py (NEW): 28 tests for IMAP polling, error handling, shutdown
- test_documents_route_coverage.py: +48 tests for background tasks, CRUD, review, upload
- test_litellm_client.py: +4 tests for load_litellm_config
- test_pubchem.py: expanded test coverage
- test_extractor_coverage.py: expanded test coverage
- test_email_intake.py: expanded test coverage

Unit test count: 1406 → 1444 (+38 net new)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_api_validation.py (NEW): 58 tests for email validation (74% → 96%)
- test_more_ocr_coverage.py (NEW): 68 tests for OCR providers (59% → 100%)
- test_extractor_coverage.py: +12 tests for intake/extractor (82% → 98%)
- test_pubchem.py: +6 tests for pubchem service (90% → 100%)
- test_litellm_client.py: +18 tests for litellm client (68% → 100%)
- test_email_poller.py: +31 tests for email poller (78% → 99%)
- test_email_intake.py: +9 tests for email intake (90% → 100%)
- test_documents_route_coverage.py: clean 56 tests (removed isolation-broken classes)

Unit test count: 1406 → 1626 (+220 net new)
All 1626 pass, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/lab_manager/services/pubchem.py Fixed
@robotlearning123 robotlearning123 force-pushed the fix/pubchem-thread-safety branch from 5d414fa to 21fc690 Compare March 28, 2026 21:39
sandia777 and others added 4 commits March 28, 2026 17:40
The locked_until field from SQLite may be offset-naive, causing TypeError
when compared with datetime.now(timezone.utc). Add tz normalization.

Also add access_expires_at check during login.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PATCH could set status to approved/rejected/deleted, bypassing the
review workflow. Now raises 422 for review statuses and deleted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Route handlers should use flush() to let the session middleware handle
commits. Direct commit() can cause issues with nested sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- receive_items: use SELECT FOR UPDATE to prevent double-receive race
- transfer: reject disposed/depleted/deleted/expired items
- Fix status error message formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the ci-verified All required CI checks have passed label Mar 28, 2026
@robotlearning123 robotlearning123 force-pushed the fix/pubchem-thread-safety branch 2 times, most recently from ae47419 to 10f3118 Compare March 29, 2026 03:52
staff_role_level = 4
staff_active = False
staff_locked_until = None
staff_access_expires_at = None
get_settings.cache_clear()
from lab_manager.api.app import create_app

app = create_app()
@robotlearning123
Copy link
Copy Markdown
Member Author

Closing: cascading test file conflicts make rebase impractical. Will re-apply the fix as a clean PR from main.

robotlearning123 added a commit that referenced this pull request Mar 29, 2026
## Summary
- Add `threading.Lock` protecting `_CACHE`, `_cache_put`, and
`_rate_limit`
- Prevents data races in concurrent PubChem enrichment calls

Replaces #379

## Test plan
- [x] `tests/test_pubchem_safety.py` verifies lock exists and concurrent
cache writes succeed

Co-authored-by: Cong <72737794+robolearning123@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@robotlearning123 robotlearning123 deleted the fix/pubchem-thread-safety branch March 29, 2026 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-verified All required CI checks have passed python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants