GET annotations: mitigate volunteer pileup on the first annotation tasks #604

@josh-chamberlain

Context

  • From discussion here: feat(db): improve GET /annotate/all performance #601 (review)
  • The materialized view that drives the ordering of annotations returned by GET refreshes only periodically
  • As it works now, if we have a labeling event, everyone at the event would be labeling the same URLs between refreshes, even after those URLs have already met the acceptance threshold
  • We don't have a ton of volunteers, but we do have labeling events / training sessions where a group will work together

We are therefore in a situation where we may have thousands of URLs to annotate, but everyone gets the one on top of the pile. Even if the view refreshed every minute, annotations often take under a minute, so we would still get pileups at the top of the queue.

The goal

  • Each URL gets just enough annotations, so that we don't have duplication of effort.
  • It's not the end of the world if 2 or 3 people hit the endpoint at about the same time and get the same task; we shouldn't make this way slower or use an iron fist, just mitigate.

Requirements

  • Once a URL meets the acceptance threshold, it should be removed from the pile proactively rather than waiting for a refresh
  • Mitigate pileups
    • option a: randomize which URL from near the top of the pile comes back when you GET the next annotation; chaotic but maybe simple and effective
    • option b: create some kind of task_claims table; when an annotation task is claimed, record a timestamp; the claim expires after 5 minutes, or is expired manually when the annotation is submitted.
    • option c: some other strategy cleverer than my first two ideas
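
To make options a and b concrete, here is a rough in-memory sketch in Python. It is not the project's implementation: `pick_random_near_top`, `AnnotationQueue`, `TOP_N`, and the threshold of 2 are hypothetical names and values chosen for illustration, and the dictionaries stand in for what would be a `task_claims` table and the existing annotation records. Only the 5-minute expiry comes from the description above.

```python
import random
import time
from dataclasses import dataclass, field

CLAIM_TTL_SECONDS = 5 * 60  # claim lifetime suggested in option b
TOP_N = 10                  # hypothetical window size for option a


def pick_random_near_top(ordered_urls, top_n=TOP_N, rng=random):
    """Option a: pick uniformly among the first top_n URLs of the queue."""
    window = ordered_urls[:top_n]
    return rng.choice(window) if window else None


@dataclass
class AnnotationQueue:
    """Option b: in-memory stand-in for a task_claims table."""
    threshold: int = 2                              # assumed acceptance threshold
    ttl: float = CLAIM_TTL_SECONDS
    claims: dict = field(default_factory=dict)      # url -> {user: claim time}
    annotators: dict = field(default_factory=dict)  # url -> set of users done

    def next_task(self, ordered_urls, user, now=None):
        """Return the first URL that still needs work from this user."""
        now = time.time() if now is None else now
        for url in ordered_urls:
            done = self.annotators.get(url, set())
            if user in done:
                continue  # this user already annotated the URL
            # Keep only claims that have not expired yet.
            live = {u: t for u, t in self.claims.get(url, {}).items()
                    if now - t < self.ttl}
            # Finished annotations plus live claims must stay under the
            # threshold; this also skips URLs that already met it,
            # without waiting for the ordering to refresh.
            if user in live or len(done) + len(live) < self.threshold:
                live[user] = now
                self.claims[url] = live
                return url
        return None

    def submit(self, url, user):
        """Record the annotation and expire the user's claim manually."""
        self.annotators.setdefault(url, set()).add(user)
        self.claims.get(url, {}).pop(user, None)
```

With a threshold of 2, the first two volunteers who ask get the top URL and the third is handed the next one, so a small overlap is tolerated but the pile does not form. The two options could also be combined, for example by applying the claim check to a randomly chosen URL from the window rather than scanning strictly in order.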
