Skip to content

Fix Goodreads importer#12185

Merged
mekarpeles merged 2 commits intointernetarchive:masterfrom
jimchamp:goodreads-importer
Apr 13, 2026
Merged

Fix Goodreads importer#12185
mekarpeles merged 2 commits intointernetarchive:masterfrom
jimchamp:goodreads-importer

Conversation

@jimchamp
Copy link
Copy Markdown
Collaborator

@jimchamp jimchamp commented Mar 24, 2026

Closes #12253

The Goodreads (sp?) import handler has been throwing errors at this line for all uploaded files:

header = next(csv_file)

This branch initializes the csv value that is POSTed as a multipart file, and changes how that value is read downstream. This seemed to work in my local testing environment. I am unable to log into our testing environment for further testing.

It's unclear to me why this regression occurred. Test data that worked several months ago now trigger this error.

Technical

Testing

Screenshot

Stakeholders

Copilot AI review requested due to automatic review settings March 24, 2026 19:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the Goodreads CSV import flow in openlibrary/plugins/upstream/account.py by adjusting how the uploaded multipart CSV payload is retrieved and parsed.

Changes:

  • Passes a default csv field into web.input() for the Goodreads import POST handler.
  • Switches Goodreads CSV payload extraction from decode() to reading .value for non-str inputs.

Comment on lines +811 to 814
books, books_wo_isbns = process_goodreads_csv(web.input(csv={}))
return render['account/import'](books, books_wo_isbns)


Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

web.input(csv={}) sets the default for csv to an empty dict when the field is missing, but process_goodreads_csv() doesn't handle that shape and will raise 'dict' object has no attribute value' (or similar). Either don’t pass a dict default here, or update process_goodreads_csv() to explicitly handle the “no file uploaded” case (e.g., return a user-facing validation error / 400) before accessing .value/.file.

Suggested change
books, books_wo_isbns = process_goodreads_csv(web.input(csv={}))
return render['account/import'](books, books_wo_isbns)
i = web.input(csv=None)
csv_file = getattr(i, "csv", None)
if not csv_file or not hasattr(csv_file, "file"):
add_flash_message(
"error",
_("Please upload a valid Goodreads CSV export file before importing."),
)
raise web.seeother("/account/import")
books, books_wo_isbns = process_goodreads_csv(i)
return render["account/import"](books, books_wo_isbns)

Copilot uses AI. Check for mistakes.
Comment on lines +1332 to 1333
csv_payload = i.csv if isinstance(i.csv, str) else i.csv.value
csv_file = csv.reader(csv_payload.splitlines(), delimiter=',', quotechar='"')
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process_goodreads_csv() now assumes non-str inputs have a .value attribute. This breaks existing call sites/tests that pass bytes directly (see test_process_goodreads_csv_with_bytes) and may still yield bytes when i.csv is a multipart FieldStorage (so csv.reader(...) will error because it expects str, not bytes). Please handle all expected input types explicitly (FieldStorage/file-like, bytes, and str) and ensure the CSV payload is decoded to text before splitting/feeding into csv.reader.

Copilot uses AI. Check for mistakes.
@jimchamp
Copy link
Copy Markdown
Collaborator Author

jimchamp commented Mar 26, 2026

Unit test TestGoodReadsImport.test_process_goodreads_csv_with_bytes is failing. I believe that this is a remnant from when OpenLibrary and WebPy were on Python 2, but if I'm wrong, then this is likely a regression. Somebody with more insight can let me know what should be done about the failing test.

@mekarpeles
Copy link
Copy Markdown
Member

Copilot says:

The failure is in openlibrary/plugins/upstream/tests/test_account.py::TestGoodReadsImport::test_process_goodreads_csv_with_bytes:

AttributeError: 'bytes' object has no attribute 'value'

Root cause

process_goodreads_csv assumes anything that isn’t a str has a .value attribute:

def process_goodreads_csv(i):
    csv_payload = i.csv if isinstance(i.csv, str) else i.csv.value

But the test passes raw bytes:

  • Test: openlibrary/plugins/upstream/tests/test_account.py at lines 152–156
    def test_process_goodreads_csv_with_bytes(self):
    # Note: In Python2, reading data as bytes returns a string, which should
    # also be supported by account.process_goodreads_csv()
    books, books_wo_isbns = account.process_goodreads_csv(web.storage({"csv": self.csv_data}))
    assert books == self.expected_books
books, books_wo_isbns = account.process_goodreads_csv(web.storage({"csv": self.csv_data}))

So i.csv is bytes, not a web.py upload object, and has no .value.

Fix (make process_goodreads_csv accept bytes, str, and upload objects)

Update process_goodreads_csv to normalize the input:

diff --git a/openlibrary/plugins/upstream/account.py b/openlibrary/plugins/upstream/account.py
--- a/openlibrary/plugins/upstream/account.py
+++ b/openlibrary/plugins/upstream/account.py
@@ -1331,8 +1331,21 @@ def process_goodreads_csv(i):
-    csv_payload = i.csv if isinstance(i.csv, str) else i.csv.value
-    csv_file = csv.reader(csv_payload.splitlines(), delimiter=',', quotechar='"')
+    csv_obj = i.csv
+    # web.py file uploads may provide a .value (and sometimes .file) attribute.
+    if hasattr(csv_obj, "value"):
+        csv_obj = csv_obj.value
+    if isinstance(csv_obj, bytes):
+        # Goodreads exports are typically UTF-8; decode with replacement to be robust.
+        csv_payload = csv_obj.decode("utf-8", errors="replace")
+    elif isinstance(csv_obj, str):
+        csv_payload = csv_obj
+    else:
+        raise TypeError(f"Unsupported CSV payload type: {type(csv_obj)!r}")
+
+    csv_file = csv.reader(csv_payload.splitlines(), delimiter=",", quotechar='"')
     header = next(csv_file)
     books = {}
     books_wo_isbns = {}

Why this resolves the CI failure

  • test_process_goodreads_csv_with_bytes will pass because bytes are decoded to text before .splitlines().
  • test_process_goodreads_csv_with_utf8 will still pass because it already supplies str.
  • Real form uploads (which often provide .value) remain supported.

No workflow changes are needed; the python_tests job is correctly running make test-py and is failing purely due to this unit test/runtime type handling.

@mekarpeles mekarpeles self-assigned this Mar 31, 2026
@mekarpeles mekarpeles added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Mar 31, 2026
@jimchamp
Copy link
Copy Markdown
Collaborator Author

jimchamp commented Apr 7, 2026

I understand why the test is failing. I still don't know if it makes sense for the POST handler to handler bytes.

I'll update the code to handle bytes, just understand that I do not know how to test this.

@github-actions github-actions Bot removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Apr 7, 2026
@jimchamp jimchamp added the Needs: Review This issue/PR needs to be reviewed in order to be closed or merged (see comments). [managed] label Apr 7, 2026
@jimchamp
Copy link
Copy Markdown
Collaborator Author

jimchamp commented Apr 7, 2026

Have marked the failing test as xfail. If this handler does need to process bytes, we should see errors in Sentry. Otherwise, the test can be removed altogether.

@mekarpeles mekarpeles merged commit dee460b into internetarchive:master Apr 13, 2026
3 checks passed
jack-wines pushed a commit to jack-wines/openlibrary that referenced this pull request Apr 18, 2026
* Update Goodreads importer to handle multipart file uploads
* Mark test as expected failure
@jimchamp jimchamp deleted the goodreads-importer branch April 23, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs: Review This issue/PR needs to be reviewed in order to be closed or merged (see comments). [managed]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Goodreads imports failing for all

3 participants