Skip to content

Fix PubSub.get_feed() exposing private reading log entries#12209

Closed
michelleallen122 wants to merge 3 commits intointernetarchive:masterfrom
michelleallen122:12196/hotfix/fix-get-feed-private-readlog-privacy
Closed

Fix PubSub.get_feed() exposing private reading log entries#12209
michelleallen122 wants to merge 3 commits intointernetarchive:masterfrom
michelleallen122:12196/hotfix/fix-get-feed-private-readlog-privacy

Conversation

@michelleallen122
Copy link
Copy Markdown

Fix filters usernames against each publisher's public_readlog preference (notifications.public_readlog == 'yes') before querying, matching the same check used in Bookshelves.patrons_who_also_read().

Fixes #12196

Closes #12196

Technical

get_feed() was querying bookshelves_books for all followed users without checking whether each publisher had their reading log set to public. This exposed reading activity for patrons who had set their log to private.

Testing

Tests added to openlibrary/tests/core/test_follows.py covering:

  • private publisher excluded from feed
  • public publisher included
  • all private returns empty feed
  • no subscriptions returns empty feed

Screenshot

Stakeholders

michelleallen122 and others added 2 commits March 29, 2026 07:14
get_feed() was querying bookshelves_books for all followed users
without checking whether each publisher had their reading log set
to public. This exposed reading activity for patrons who had set
their log to private.

Fix filters usernames against each publisher's public_readlog
preference (notifications.public_readlog == 'yes') before querying,
matching the same check used in Bookshelves.patrons_who_also_read().

Tests added to openlibrary/tests/core/test_follows.py covering:
- private publisher excluded from feed
- public publisher included
- all private returns empty feed
- no subscriptions returns empty feed

Fixes internetarchive#12196
@michelleallen122 michelleallen122 force-pushed the 12196/hotfix/fix-get-feed-private-readlog-privacy branch from 504b443 to abc924e Compare March 29, 2026 11:14
@mekarpeles mekarpeles requested a review from jimchamp March 30, 2026 20:06
@mekarpeles mekarpeles added the Priority: 1 Do this week, receiving emails, time sensitive, . [managed] label Mar 30, 2026
@mekarpeles mekarpeles requested review from Copilot and removed request for jimchamp March 30, 2026 20:07
Comment thread openlibrary/core/follows.py Outdated
return []

# Filter to only publishers with public reading logs
prefs = web.ctx.site.get_many([f'/people/{u}/preferences' for u in usernames])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferences is the old way we do this; we'll want to use the user.preferences function that uses the web.ctx.store under the hood

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a privacy leak in PubSub.get_feed() by ensuring only publishers with a public reading log are included when generating a subscriber’s feed.

Changes:

  • Add a public-reading-log preference filter step before querying bookshelves_books in PubSub.get_feed().
  • Add new unit tests covering inclusion/exclusion behavior for public vs private publishers and empty-feed cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
openlibrary/core/follows.py Filters followed publishers by reading-log visibility before fetching feed entries.
openlibrary/tests/core/test_follows.py Adds tests for PubSub.get_feed() privacy filtering and empty-feed behavior.

Comment thread openlibrary/core/follows.py Outdated
usernames = [
p.key.split('/')[2]
for p in prefs
if p.dict().get('notifications', {}).get('public_readlog') == 'yes'
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preference field lookup appears to be incorrect: user preferences store public_readlog at the top level (see openlibrary/core/models.py:949-987 and openlibrary/templates/account/privacy.html:18), not under notifications.public_readlog. As written, this filter will likely exclude all publishers and make the feed empty (and won’t actually enforce the intended privacy rule). Use p.dict().get('public_readlog', 'no') == 'yes' (or equivalent) when filtering publishers.

Suggested change
if p.dict().get('notifications', {}).get('public_readlog') == 'yes'
if p.dict().get('public_readlog', 'no') == 'yes'

Copilot uses AI. Check for mistakes.
Comment on lines +34 to +38
@classmethod
def setup_class(cls):
web.config.db_parameters = {"dbn": "sqlite", "db": ":memory:"}
get_db().query(READING_LOG_DDL)

Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test creates bookshelves_books with a custom DDL, but openlibrary.core.db.get_db() is globally cached (openlibrary/core/db.py:_get_db uses functools.cache). Other test modules also create bookshelves_books with a different schema (e.g. openlibrary/tests/core/test_db.py), which can cause "table already exists" errors or missing-column failures depending on collection order. Consider clearing the DB cache in setup/teardown (e.g. _get_db.cache_clear()), and/or using a setup which won’t collide with existing table definitions.

Copilot uses AI. Check for mistakes.
Comment thread openlibrary/tests/core/test_follows.py Outdated
"""Build a mock preferences object like web.ctx.site.get_many returns."""
p = MagicMock()
p.key = f"/people/{username}/preferences"
p.dict.return_value = {"notifications": {"public_readlog": "yes" if public else "no"}}
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mocked preferences shape here doesn’t match how OL stores preferences: public_readlog is a top-level key on the preferences dict (see openlibrary/templates/account/privacy.html and openlibrary/core/models.py:973+). With the current mock (notifications.public_readlog), this test can pass while production code is checking the wrong field, so it won’t catch the real regression. Update the mock (and expectations) to reflect the real stored structure.

Suggested change
p.dict.return_value = {"notifications": {"public_readlog": "yes" if public else "no"}}
# In production, `public_readlog` is stored as a top-level preference key.
p.dict.return_value = {"public_readlog": "yes" if public else "no"}

Copilot uses AI. Check for mistakes.
- Replace web.ctx.site.get_many(...preferences) with
  web.ctx.site.get() + user.preferences() per mekarpeles' feedback;
  this uses web.ctx.store under the hood, matching the pattern used
  elsewhere in the codebase (mybooks.py, fastapi/public_my_books.py)
- Fix public_readlog lookup: was incorrectly nested under 'notifications',
  it is a top-level key on the preferences dict
- Fix test mock to reflect the real stored shape {'public_readlog': 'yes'}
  instead of {'notifications': {'public_readlog': 'yes'}}
- Add _get_db.cache_clear() in setup/teardown to avoid DB cache
  collisions with other test modules that also create bookshelves_books
@michelleallen122
Copy link
Copy Markdown
Author

michelleallen122 commented Mar 31, 2026

Hey @jimchamp, @mekarpeles i've made updates to the pr, in my latest commit.

@jimchamp
Copy link
Copy Markdown
Collaborator

@michelleallen122, these changes were not necessary (see comment). In the future, please wait until an issue is triaged by staff before working on a solution.

@jimchamp jimchamp closed this Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Priority: 1 Do this week, receiving emails, time sensitive, . [managed]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PubSub.get_feed()` exposes reading log entries from patrons who have set their log to private

4 participants