Description
Context
- From discussion here: feat(db): improve GET /annotate/all performance #601 (review)
- The materialized view which drives the ordering of annotations to GET refreshes periodically
- As it works now, if we have a labeling event, everyone at the event would be labeling the same URLs until the next refresh, even after those URLs meet the acceptance threshold
- We don't have a ton of volunteers, but we do have labeling events / training sessions where a group will work together
We are therefore in a situation where we may have thousands of URLs to annotate, but everyone gets the one on top of the pile. Even if the view refreshed every minute, annotations often take under a minute, so we would still get pileups at the top of the queue.
The goal
- Each URL gets just enough annotations, so that we don't have duplication of effort.
- It's not the end of the world if 2 or 3 people hit the endpoint at about the same time and get the same task; we shouldn't make this way slower or use an iron fist, just mitigate.
Requirements
- Once a URL meets the acceptance threshold, it should be removed from the pile proactively rather than waiting for a refresh
- Mitigate pileups
- option a: randomize which URL from near the top of the pile comes back when you GET the next annotation; chaotic but maybe simple and effective
- option b: create some kind of task_claims table; when an annotation task is claimed, add a timestamp; when 5 minutes have passed, the claim expires. Or, manually expire the claim when an annotation is submitted.
- option c: some other strategy cleverer than my first two ideas
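
To make options a and b easier to compare, here is a rough in-memory sketch in Python. It is not the app's implementation: `TaskQueue`, `TOP_N`, `next_random`, `next_claimed`, and `submit` are all hypothetical names, and a real version would presumably live in the database next to the materialized view. It also assumes one claim per URL at a time, which may be too strict if a URL needs several annotations to meet the acceptance threshold.

```python
import random
import time

CLAIM_TTL_SECONDS = 5 * 60  # claim lifetime suggested in option b
TOP_N = 10  # hypothetical window size for option a


class TaskQueue:
    """In-memory stand-in for the ordered pile of URLs (hypothetical)."""

    def __init__(self, ordered_url_ids, clock=time.monotonic, rng=None):
        self.pending = list(ordered_url_ids)  # highest priority first
        self.claims = {}  # url_id -> time the claim was made
        self.clock = clock
        self.rng = rng or random.Random()

    def _unclaimed(self):
        now = self.clock()
        # Drop claims older than the TTL so abandoned tasks return to the pile.
        self.claims = {
            u: t for u, t in self.claims.items() if now - t < CLAIM_TTL_SECONDS
        }
        return [u for u in self.pending if u not in self.claims]

    def next_random(self):
        """Option a: pick at random from the first TOP_N pending URLs."""
        window = self.pending[:TOP_N]
        return self.rng.choice(window) if window else None

    def next_claimed(self):
        """Option b: hand out the top unclaimed URL and record a claim."""
        candidates = self._unclaimed()
        if not candidates:
            return None
        url_id = candidates[0]
        self.claims[url_id] = self.clock()
        return url_id

    def submit(self, url_id, meets_threshold):
        """Release the claim; remove the URL right away if it meets the threshold."""
        self.claims.pop(url_id, None)
        if meets_threshold and url_id in self.pending:
            self.pending.remove(url_id)
```

In this sketch, `submit` also covers the first requirement: a URL that meets the threshold leaves the pile immediately instead of waiting for a refresh. Option a needs no extra state but can still hand two people the same URL; option b avoids that at the cost of tracking claims.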