Skip to content

Add ability to import PDF#408

Draft
albertski wants to merge 1 commit intobasecamp:mainfrom
albertski:add-pdf-import
Draft

Add ability to import PDF#408
albertski wants to merge 1 commit intobasecamp:mainfrom
albertski:add-pdf-import

Conversation

@albertski
Copy link

Users can now import a PDF into a book directly from the table of contents toolbar. Each page of the PDF is imported as a separate textcpage, using the first line as the title.

Implementation:

  • PdfImporter model wrapping the pdf-reader gem
  • Books::ImportsController (new/create, editor-only)
  • Import button in the create buttons toolbar
  • Unit tests for PdfImporter and controller integration tests

Totally understand if this is something you don’t want to support.

Copilot AI review requested due to automatic review settings March 22, 2026 18:42
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a PDF import workflow for books so editors can upload a PDF from the TOC toolbar and have each PDF page imported as a separate text Page/Leaf.

Changes:

  • Introduces PdfImporter to parse PDFs (via pdf-reader) and create leaves/pages.
  • Adds Books::ImportsController + route + books/imports/new form to upload a PDF (editor-only).
  • Adds an “Import PDF” button to the existing create-buttons toolbar; adds unit/integration tests and PDF fixtures.

Reviewed changes

Copilot reviewed 8 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
app/models/pdf_importer.rb New importer that reads PDF pages and creates Leaf/Page records.
app/controllers/books/imports_controller.rb New editor-only endpoints to render the form and run the import.
app/views/books/imports/new.html.erb Upload form UI for selecting and importing a PDF.
app/views/books/_create_buttons.html.erb Adds a new toolbar button to access the import flow.
config/routes.rb Adds nested book import routes (new/create).
test/models/pdf_importer_test.rb Unit tests for importer behavior (titles, body content, skipping blank pages, malformed PDFs).
test/controllers/books/imports_controller_test.rb Integration tests for import endpoints + authorization.
test/fixtures/files/sample.pdf PDF fixture with extractable text across 2 pages.
test/fixtures/files/blank.pdf PDF fixture intended to have no extractable text.
Gemfile / Gemfile.lock Adds pdf-reader and its transitive dependencies.

[!TIP]
If you aren't ready for review, convert to a draft PR.
Click "Convert to draft" or run gh pr ready --undo.
Click "Ready for review" or run gh pr ready to reengage.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +23 to +30
<%= link_to new_book_import_path(book), class: "btn txt-medium", title: "Import PDF", aria: { label: "Import PDF" } do %>
<svg viewBox="0 0 20 24" xmlns="http://www.w3.org/2000/svg" fill="var(--color-ink)">
<path d="m15.8 21.7c0 .3-.2.4-.4.4h-13.5-.2v-16.9c0-.3.2-.4.4-.4h6.3c0-.6.1-1.2.3-1.8h-6.9c-1 0-1.8.8-1.8 1.8v17.5c0 1 .8 1.8 1.8 1.8h14c.9 0 1.6-.6 1.8-1.4v-11.6c-.5.2-1.1.4-1.8.5v10.3z"/>
<path fill="var(--color-positive)" d="m15 0c-2.8 0-5 2.2-5 5s2.2 5 5 5 5-2.2 5-5-2.2-5-5-5zm.7 7.5-2 2-2-2 1-1 .4.4v-2.9h1.1v2.9l.5-.4z"/>
<path d="m4.5 14h9v1.5h-9zm0 3h6v1.5h-6z"/>
</svg>
<span class="for-screen-reader">Import PDF</span>
<% end %>
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new "Import PDF" toolbar button is missing the disable-when-arranging / disable-when-deleting classes used by the other create buttons in this toolbar, so it will remain clickable during rearrange/delete modes while other actions are intentionally disabled. Consider reusing the same button classes/disable hooks for consistent behavior and styling.

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +18
pages = PdfImporter.new(@book, params[:pdf].tempfile).import

if pages.any?
redirect_to book_slug_url(@book), notice: "Successfully imported #{pages.size} #{"page".pluralize(pages.size)} from PDF."
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PdfImporter#import returns Leaf records (as also asserted in PdfImporterTest), but this action stores the result in a variable named pages. Renaming this to something like imported_leaves (and adjusting the notice string accordingly) would make the code clearer and reduce confusion between Page (leafable) vs Leaf (wrapper).

Suggested change
pages = PdfImporter.new(@book, params[:pdf].tempfile).import
if pages.any?
redirect_to book_slug_url(@book), notice: "Successfully imported #{pages.size} #{"page".pluralize(pages.size)} from PDF."
imported_leaves = PdfImporter.new(@book, params[:pdf].tempfile).import
if imported_leaves.any?
redirect_to book_slug_url(@book), notice: "Successfully imported #{imported_leaves.size} #{"leaf".pluralize(imported_leaves.size)} from PDF."

Copilot uses AI. Check for mistakes.
render :new, status: :unprocessable_entity
end
rescue PdfImporter::InvalidPdfError => e
flash.now[:alert] = "Could not read PDF: #{e.message}"
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user-facing alert message includes the raw exception message from pdf-reader (Could not read PDF: #{e.message}). These messages can be noisy/inconsistent and may reveal implementation details; consider logging the exception details and showing a stable, user-friendly message instead (optionally with guidance like “Please verify the file is a valid, unencrypted PDF”).

Suggested change
flash.now[:alert] = "Could not read PDF: #{e.message}"
Rails.logger.warn("PDF import failed for book #{ @book&.id || 'unknown' }: #{e.class}: #{e.message}")
flash.now[:alert] = "Could not read PDF. Please verify the file is a valid, unencrypted PDF."

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +34
def extract_title(text, page_number)
first_line = text.lines.first&.strip
if first_line.present? && first_line.length.between?(3, 100)
first_line
else
"Page #{page_number}"
end
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_title uses hard-coded title length limits (between?(3, 100)). To make the intent clearer and future changes safer, consider extracting these numbers into named constants (e.g., MIN_TITLE_LENGTH, MAX_TITLE_LENGTH) or documenting why those bounds were chosen.

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +25
def import
imported = []

@reader.pages.each_with_index do |page, index|
text = page.text.strip
next if text.blank?

title = extract_title(text, index + 1)
body = normalize_body(text)

imported << @book.press(Page.new(body: body), title: title)
end

imported
end
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import creates multiple records (one per PDF page) but does not run inside a transaction. If an exception occurs midway (e.g., validation/database error), the book can end up partially imported. Wrapping the per-page creation loop in a single transaction (or otherwise ensuring all-or-nothing behavior) would keep imports consistent.

Copilot uses AI. Check for mistakes.
@albertski albertski marked this pull request as draft March 23, 2026 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants