Fix Goodreads importer by jimchamp · Pull Request #12185 · internetarchive/openlibrary

jimchamp · 2026-03-24T19:05:36Z

The Goodreads (sp?) import handler has been throwing errors at this line for all uploaded files:

openlibrary/openlibrary/plugins/upstream/account.py

Line 1335 in 1b97104

header = next(csv_file)

This branch initializes the csv value that is POSTed as a multipart file, and changes how that value is read downstream. This seemed to work in my local testing environment. I am unable to log into our testing environment for further testing.

It's unclear to me why this regression occurred. Test data that worked several months ago now trigger this error.

Technical

Testing

Screenshot

Stakeholders

Copilot

Pull request overview

Fixes the Goodreads CSV import flow in openlibrary/plugins/upstream/account.py by adjusting how the uploaded multipart CSV payload is retrieved and parsed.

Changes:

Passes a default csv field into web.input() for the Goodreads import POST handler.
Switches Goodreads CSV payload extraction from decode() to reading .value for non-str inputs.

Copilot · 2026-03-24T19:08:18Z

+        books, books_wo_isbns = process_goodreads_csv(web.input(csv={}))
        return render['account/import'](books, books_wo_isbns)




web.input(csv={}) sets the default for csv to an empty dict when the field is missing, but process_goodreads_csv() doesn't handle that shape and will raise "'dict' object has no attribute 'value'" (or similar). Either don’t pass a dict default here, or update process_goodreads_csv() to explicitly handle the “no file uploaded” case (e.g., return a user-facing validation error / 400) before accessing .value/.file.

Suggested change

-        books, books_wo_isbns = process_goodreads_csv(web.input(csv={}))
-        return render['account/import'](books, books_wo_isbns)
+        i = web.input(csv=None)
+        csv_file = getattr(i, "csv", None)
+        if not csv_file or not hasattr(csv_file, "file"):
+            add_flash_message(
+                "error",
+                _("Please upload a valid Goodreads CSV export file before importing."),
+            )
+            raise web.seeother("/account/import")
+        books, books_wo_isbns = process_goodreads_csv(i)
+        return render["account/import"](books, books_wo_isbns)

Copilot · 2026-03-24T19:08:19Z

+    csv_payload = i.csv if isinstance(i.csv, str) else i.csv.value
    csv_file = csv.reader(csv_payload.splitlines(), delimiter=',', quotechar='"')


process_goodreads_csv() now assumes non-str inputs have a .value attribute. This breaks existing call sites/tests that pass bytes directly (see test_process_goodreads_csv_with_bytes) and may still yield bytes when i.csv is a multipart FieldStorage (so csv.reader(...) will error because it expects str, not bytes). Please handle all expected input types explicitly (FieldStorage/file-like, bytes, and str) and ensure the CSV payload is decoded to text before splitting/feeding into csv.reader.
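
The bytes-versus-str concern can be reproduced in isolation. The sketch below uses only the standard library; the sample CSV content is made up for illustration and is not a real Goodreads export. It shows that, in Python 3, csv.reader accepts an iterable of bytes lines without complaint and only raises once the first row is requested. That would be consistent with an error surfacing at header = next(csv_file), though it does not prove this is what caused the regression.

```python
import csv

# Hypothetical sample payload; a real Goodreads export has many more columns.
raw = b"Book Id,Title\n1,Example Book\n"

# Building the reader over bytes lines succeeds; nothing is parsed yet.
bytes_reader = csv.reader(raw.splitlines(), delimiter=",", quotechar='"')

try:
    next(bytes_reader)
    bytes_error = None
except csv.Error as exc:
    # Python 3's csv module only parses str rows, so the first read fails.
    bytes_error = exc

# Decoding to text first lets the same call succeed.
text_reader = csv.reader(
    raw.decode("utf-8").splitlines(), delimiter=",", quotechar='"'
)
header = next(text_reader)
```

Because the failure is deferred until the first next() call, a payload that is still bytes can pass through several lines of setup code before anything goes wrong.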

jimchamp · 2026-03-26T23:03:27Z

Unit test TestGoodReadsImport.test_process_goodreads_csv_with_bytes is failing. I believe that this is a remnant from when OpenLibrary and WebPy were on Python 2, but if I'm wrong, then this is likely a regression. Somebody with more insight can let me know what should be done about the failing test.

mekarpeles · 2026-03-31T00:47:11Z

Copilot says:

The failure is in openlibrary/plugins/upstream/tests/test_account.py::TestGoodReadsImport::test_process_goodreads_csv_with_bytes:

AttributeError: 'bytes' object has no attribute 'value'

Root cause

process_goodreads_csv assumes anything that isn’t a str has a .value attribute:

Failing code: openlibrary/plugins/upstream/account.py at line 1332

openlibrary/openlibrary/plugins/upstream/account.py

Line 1332 in 39334d1

csv_payload = i.csv if isinstance(i.csv, str) else i.csv.value

def process_goodreads_csv(i):
    csv_payload = i.csv if isinstance(i.csv, str) else i.csv.value

But the test passes raw bytes:

Test: openlibrary/plugins/upstream/tests/test_account.py at lines 152–156

openlibrary/openlibrary/plugins/upstream/tests/test_account.py

Lines 152 to 156 in 39334d1

    
    def test_process_goodreads_csv_with_bytes(self):
        # Note: In Python2, reading data as bytes returns a string, which should
        # also be supported by account.process_goodreads_csv()
        books, books_wo_isbns = account.process_goodreads_csv(web.storage({"csv": self.csv_data}))
        assert books == self.expected_books

books, books_wo_isbns = account.process_goodreads_csv(web.storage({"csv": self.csv_data}))

So i.csv is bytes, not a web.py upload object, and has no .value.

Fix (make `process_goodreads_csv` accept bytes, str, and upload objects)

Update process_goodreads_csv to normalize the input:

diff --git a/openlibrary/plugins/upstream/account.py b/openlibrary/plugins/upstream/account.py
--- a/openlibrary/plugins/upstream/account.py
+++ b/openlibrary/plugins/upstream/account.py
@@ -1331,8 +1331,21 @@ def process_goodreads_csv(i):
-    csv_payload = i.csv if isinstance(i.csv, str) else i.csv.value
-    csv_file = csv.reader(csv_payload.splitlines(), delimiter=',', quotechar='"')
+    csv_obj = i.csv
+    # web.py file uploads may provide a .value (and sometimes .file) attribute.
+    if hasattr(csv_obj, "value"):
+        csv_obj = csv_obj.value
+    if isinstance(csv_obj, bytes):
+        # Goodreads exports are typically UTF-8; decode with replacement to be robust.
+        csv_payload = csv_obj.decode("utf-8", errors="replace")
+    elif isinstance(csv_obj, str):
+        csv_payload = csv_obj
+    else:
+        raise TypeError(f"Unsupported CSV payload type: {type(csv_obj)!r}")
+
+    csv_file = csv.reader(csv_payload.splitlines(), delimiter=",", quotechar='"')
     header = next(csv_file)
     books = {}
     books_wo_isbns = {}

Why this resolves the CI failure

test_process_goodreads_csv_with_bytes will pass because bytes are decoded to text before .splitlines().
test_process_goodreads_csv_with_utf8 will still pass because it already supplies str.
Real form uploads (which often provide .value) remain supported.
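
As a standalone illustration of the normalization in the diff above, the following sketch pulls the same type checks into a helper. The names normalize_csv_payload, read_header, and FakeUpload are hypothetical and not part of Open Library or web.py; FakeUpload only stands in for an upload object that exposes .value, and the UTF-8 assumption is carried over from the diff's own comment.

```python
import csv


class FakeUpload:
    """Hypothetical stand-in for an upload object that exposes `.value`."""

    def __init__(self, value):
        self.value = value


def normalize_csv_payload(csv_obj):
    """Return the CSV payload as text, whatever shape it arrived in."""
    # Upload objects carry their content on `.value`; unwrap it first.
    if hasattr(csv_obj, "value"):
        csv_obj = csv_obj.value
    if isinstance(csv_obj, bytes):
        # Assumes UTF-8; undecodable bytes become U+FFFD instead of raising.
        return csv_obj.decode("utf-8", errors="replace")
    if isinstance(csv_obj, str):
        return csv_obj
    raise TypeError(f"Unsupported CSV payload type: {type(csv_obj)!r}")


def read_header(csv_obj):
    """Parse the first row of the normalized payload."""
    text = normalize_csv_payload(csv_obj)
    reader = csv.reader(text.splitlines(), delimiter=",", quotechar='"')
    return next(reader)
```

With this helper, bytes, str, and a FakeUpload wrapping either one all yield the same header row, while an unsupported value such as an empty dict raises TypeError instead of failing later inside the CSV parsing.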

No workflow changes are needed; the python_tests job is correctly running make test-py and is failing purely due to this unit test/runtime type handling.

jimchamp · 2026-04-07T20:07:21Z

I understand why the test is failing. I still don't know if it makes sense for the POST handler to handle bytes.

I'll update the code to handle bytes; just understand that I do not know how to test this.

jimchamp · 2026-04-07T22:44:52Z

Have marked the failing test as xfail. If this handler does need to process bytes, we should see errors in Sentry. Otherwise, the test can be removed altogether.

* Update Goodreads importer to handle multipart file uploads
* Mark test as expected failure

Update Goodreads importer to handle multipart file uploads

39334d1

Copilot AI review requested due to automatic review settings March 24, 2026 19:05

Copilot started reviewing on behalf of jimchamp March 24, 2026 19:06

Copilot AI reviewed Mar 24, 2026


mekarpeles self-assigned this Mar 31, 2026

mekarpeles added the "Needs: Submitter Input" label (Waiting on input from the creator of the issue/pr [managed]) Mar 31, 2026

Mark test as expected failure

197859a

github-actions Bot removed the "Needs: Submitter Input" label (Waiting on input from the creator of the issue/pr [managed]) Apr 7, 2026

jimchamp added the "Needs: Review" label (This issue/PR needs to be reviewed in order to be closed or merged (see comments). [managed]) Apr 7, 2026

mekarpeles merged commit dee460b into internetarchive:master Apr 13, 2026
3 checks passed

jack-wines pushed a commit to jack-wines/openlibrary that referenced this pull request Apr 18, 2026

Fix Goodreads importer (internetarchive#12185)

f22018d

* Update Goodreads importer to handle multipart file uploads
* Mark test as expected failure

jimchamp deleted the goodreads-importer branch April 23, 2026 17:01

mekarpeles merged 2 commits into internetarchive:master from jimchamp:goodreads-importer


