Fix PubSub.get_feed() exposing private reading log entries#12209
Fix PubSub.get_feed() exposing private reading log entries#12209michelleallen122 wants to merge 3 commits intointernetarchive:masterfrom
Conversation
get_feed() was querying bookshelves_books for all followed users without checking whether each publisher had their reading log set to public. This exposed reading activity for patrons who had set their log to private. Fix filters usernames against each publisher's public_readlog preference (notifications.public_readlog == 'yes') before querying, matching the same check used in Bookshelves.patrons_who_also_read(). Tests added to openlibrary/tests/core/test_follows.py covering: - private publisher excluded from feed - public publisher included - all private returns empty feed - no subscriptions returns empty feed Fixes internetarchive#12196
for more information, see https://pre-commit.ci
504b443 to
abc924e
Compare
| return [] | ||
|
|
||
| # Filter to only publishers with public reading logs | ||
| prefs = web.ctx.site.get_many([f'/people/{u}/preferences' for u in usernames]) |
There was a problem hiding this comment.
Preferences is the old way we do this; we'll want to use the user.preferences function that uses the web.ctx.store under the hood
There was a problem hiding this comment.
Pull request overview
Fixes a privacy leak in PubSub.get_feed() by ensuring only publishers with a public reading log are included when generating a subscriber’s feed.
Changes:
- Add a public-reading-log preference filter step before querying
bookshelves_booksinPubSub.get_feed(). - Add new unit tests covering inclusion/exclusion behavior for public vs private publishers and empty-feed cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
openlibrary/core/follows.py |
Filters followed publishers by reading-log visibility before fetching feed entries. |
openlibrary/tests/core/test_follows.py |
Adds tests for PubSub.get_feed() privacy filtering and empty-feed behavior. |
| usernames = [ | ||
| p.key.split('/')[2] | ||
| for p in prefs | ||
| if p.dict().get('notifications', {}).get('public_readlog') == 'yes' |
There was a problem hiding this comment.
The preference field lookup appears to be incorrect: user preferences store public_readlog at the top level (see openlibrary/core/models.py:949-987 and openlibrary/templates/account/privacy.html:18), not under notifications.public_readlog. As written, this filter will likely exclude all publishers and make the feed empty (and won’t actually enforce the intended privacy rule). Use p.dict().get('public_readlog', 'no') == 'yes' (or equivalent) when filtering publishers.
| if p.dict().get('notifications', {}).get('public_readlog') == 'yes' | |
| if p.dict().get('public_readlog', 'no') == 'yes' |
| @classmethod | ||
| def setup_class(cls): | ||
| web.config.db_parameters = {"dbn": "sqlite", "db": ":memory:"} | ||
| get_db().query(READING_LOG_DDL) | ||
|
|
There was a problem hiding this comment.
This test creates bookshelves_books with a custom DDL, but openlibrary.core.db.get_db() is globally cached (openlibrary/core/db.py:_get_db uses functools.cache). Other test modules also create bookshelves_books with a different schema (e.g. openlibrary/tests/core/test_db.py), which can cause "table already exists" errors or missing-column failures depending on collection order. Consider clearing the DB cache in setup/teardown (e.g. _get_db.cache_clear()), and/or using a setup which won’t collide with existing table definitions.
| """Build a mock preferences object like web.ctx.site.get_many returns.""" | ||
| p = MagicMock() | ||
| p.key = f"/people/{username}/preferences" | ||
| p.dict.return_value = {"notifications": {"public_readlog": "yes" if public else "no"}} |
There was a problem hiding this comment.
The mocked preferences shape here doesn’t match how OL stores preferences: public_readlog is a top-level key on the preferences dict (see openlibrary/templates/account/privacy.html and openlibrary/core/models.py:973+). With the current mock (notifications.public_readlog), this test can pass while production code is checking the wrong field, so it won’t catch the real regression. Update the mock (and expectations) to reflect the real stored structure.
| p.dict.return_value = {"notifications": {"public_readlog": "yes" if public else "no"}} | |
| # In production, `public_readlog` is stored as a top-level preference key. | |
| p.dict.return_value = {"public_readlog": "yes" if public else "no"} |
- Replace web.ctx.site.get_many(...preferences) with
web.ctx.site.get() + user.preferences() per mekarpeles' feedback;
this uses web.ctx.store under the hood, matching the pattern used
elsewhere in the codebase (mybooks.py, fastapi/public_my_books.py)
- Fix public_readlog lookup: was incorrectly nested under 'notifications',
it is a top-level key on the preferences dict
- Fix test mock to reflect the real stored shape {'public_readlog': 'yes'}
instead of {'notifications': {'public_readlog': 'yes'}}
- Add _get_db.cache_clear() in setup/teardown to avoid DB cache
collisions with other test modules that also create bookshelves_books
|
Hey @jimchamp, @mekarpeles i've made updates to the pr, in my latest commit. |
|
@michelleallen122, these changes were not necessary (see comment). In the future, please wait until an issue is triaged by staff before working on a solution. |
Fix filters usernames against each publisher's public_readlog preference (notifications.public_readlog == 'yes') before querying, matching the same check used in Bookshelves.patrons_who_also_read().
Fixes #12196
Closes #12196
Technical
get_feed() was querying bookshelves_books for all followed users without checking whether each publisher had their reading log set to public. This exposed reading activity for patrons who had set their log to private.
Testing
Tests added to openlibrary/tests/core/test_follows.py covering:
Screenshot
Stakeholders