Implement Annotation Queue (CQRS) #602

@maxachis

#601 provided substantive increases in annotation performance, at the cost of stale data.

However, our esteemed @labradorite-dev has proposed a solution which could resolve this issue in turn!

I would be remiss not to include the other, much larger refactor I considered, which is to implement proper CQRS. The core of the issue here is that our data is structured for writes, NOT reads, so at query time we pay the price and have to stitch all the data together. Materialized views shift that data stitching slightly left (once a day in a separate job), but a potentially more "complete" fix would be to add a new, fully denormalized table which looks like this:

-- read model, updated on every write command
annotation_queue (
    url_id        PRIMARY KEY,
    priority_rank INT,  -- pre-computed: manual > followed > count > id
    total_count   INT,
    is_manual     BOOL,
    followed_by_user BOOL,
    ...
)

Then, each time we write a new annotation to be consumed by annotators, we also update this table accordingly. At query time, the read is a single-table query with no joins, which is extremely cheap given an index on priority_rank: SELECT * FROM annotation_queue ORDER BY priority_rank LIMIT 1.

This seems nifty, and so I propose we implement it!
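
To make the idea concrete for discussion, here is a minimal sketch of the write path and the read path, using Python's built-in sqlite3 so it runs anywhere. Everything beyond the annotation_queue columns quoted above is an assumption for illustration: the `annotation` write table, the function names, the score encoding inside `priority_rank`, and the tie-break on lower `url_id`. The real schema and ranking rules may well differ.

```python
import sqlite3

# Hypothetical cap so a large count can never outweigh the manual/followed flags.
COUNT_CAP = 999_999

SCHEMA = """
CREATE TABLE annotation (           -- hypothetical write model
    id     INTEGER PRIMARY KEY,
    url_id INTEGER NOT NULL
);
CREATE TABLE annotation_queue (     -- read model from the proposal
    url_id           INTEGER PRIMARY KEY,
    priority_rank    INTEGER NOT NULL,
    total_count      INTEGER NOT NULL,
    is_manual        INTEGER NOT NULL,
    followed_by_user INTEGER NOT NULL
);
CREATE INDEX idx_queue_rank ON annotation_queue (priority_rank, url_id);
"""


def priority_rank(is_manual, followed_by_user, total_count):
    """Assumed encoding of 'manual > followed > count': lower rank is served first."""
    flags = int(is_manual) * 2 + int(followed_by_user)
    score = flags * (COUNT_CAP + 1) + min(total_count, COUNT_CAP)
    return -score


def record_annotation(conn, url_id, is_manual=False, followed_by_user=False):
    """Write command: store the annotation and refresh the read model atomically."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO annotation (url_id) VALUES (?)", (url_id,))
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM annotation WHERE url_id = ?", (url_id,)
        ).fetchone()
        # The flags passed on the latest write overwrite the stored ones.
        conn.execute(
            "INSERT OR REPLACE INTO annotation_queue "
            "(url_id, priority_rank, total_count, is_manual, followed_by_user) "
            "VALUES (?, ?, ?, ?, ?)",
            (
                url_id,
                priority_rank(is_manual, followed_by_user, count),
                count,
                int(is_manual),
                int(followed_by_user),
            ),
        )


def next_url(conn):
    """Read query: one indexed, join-free lookup. Ties go to the lower url_id."""
    row = conn.execute(
        "SELECT url_id FROM annotation_queue "
        "ORDER BY priority_rank, url_id LIMIT 1"
    ).fetchone()
    return row[0] if row else None
```

The tradeoff is that every write command now has to keep the read model consistent inside the same transaction, so the stitching cost moves from query time to write time.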

Metadata

Assignees: No one assigned
Status: Awaiting Dev
Milestone: No milestone