title	Getting started
description	Parse your first PDF with Parxy, explore the unified Document model, and learn how to extract plain text and Markdown from any supported driver.

Getting Started with Parxy

Parxy is a unified Python interface for document parsing.

This tutorial will guide you step-by-step through:

Parsing your first document with Parxy
Loading a PDF and extracting its text
Understanding the unified Document model returned by all parsers

What You'll Learn

By the end of this tutorial, you'll be able to:

Install and use Parxy as a Python library
Parse documents with a single function call
Access structured text through Parxy's unified data model
Convert parsed documents to plain text or Markdown

Installation

Install Parxy from PyPI (or your development package):

pip install parxy

You can also install optional parser backends depending on your needs (e.g. PyMuPDF, Unstructured, LlamaParse):

pip install "parxy[llama]"

Step 1 — Parse Your First Document

Let's start by parsing a simple PDF.

The easiest way is to use the Parxy.parse() method, which automatically selects the default parser (usually pymupdf).

from parxy_core.facade.parxy import Parxy

# Parse a document from a local file path
doc = Parxy.parse("samples/example.pdf")

# Print basic information
print(f"Pages: {len(doc.pages)}")
print(f"Title: {doc.metadata.title}")

You can also specify a parser explicitly:

doc = Parxy.parse("samples/example.pdf", driver_name=Parxy.PYMUPDF)

Or even pass an in-memory file:

import io

with open("samples/example.pdf", "rb") as f:
    pdf_bytes = io.BytesIO(f.read())

doc = Parxy.parse(pdf_bytes)

Each parser requires its own configuration, which can be specified through environment variables. Refer to config.py for details.

Step 2 — Extract Text

Parxy.parse() returns a Document model: a structured representation of your file.

You can access its text content in different ways:

Get all text as a single string

text = doc.text()
print(text[:500])  # print first 500 characters

Convert the document to Markdown

markdown = doc.markdown()
print(markdown[:500])

This method preserves headings, paragraphs, and lists (when identified by the parser).
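
To keep the extracted content, you can write both representations to disk. The sketch below relies only on the two methods shown in this step, doc.text() and doc.markdown(). The helper name save_outputs, the TextSource protocol, and the file naming are illustrative assumptions and are not part of Parxy.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol


class TextSource(Protocol):
    """Anything exposing the two methods used in this step."""

    def text(self) -> str: ...
    def markdown(self) -> str: ...


def save_outputs(doc: TextSource, out_dir: str, stem: str = "example") -> tuple[Path, Path]:
    """Write doc.text() and doc.markdown() to <out_dir>/<stem>.txt and <stem>.md."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    txt_path = target / f"{stem}.txt"
    md_path = target / f"{stem}.md"

    # UTF-8 keeps non-ASCII characters from the source document intact.
    txt_path.write_text(doc.text(), encoding="utf-8")
    md_path.write_text(doc.markdown(), encoding="utf-8")
    return txt_path, md_path
```

With a parsed document in hand, a call such as save_outputs(doc, "out") would produce out/example.txt and out/example.md.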

Step 3 — Explore the Unified Document Model

Every parser in Parxy returns the same structure, built with Pydantic:

Document
 ├── Metadata
 ├── Page[]
 │   ├── TextBlock[]
 │   │   ├── Line[]
 │   │   │   ├── Span[]
 │   │   │   │   ├── Character[]
 │   │   │   │   └── ...
 │   ├── ImageBlock[]
 │   └── TableBlock[]
 └── Outline[]

Example:

page = doc.pages[0]
first_block = page.blocks[0]

print(first_block.text)
print(first_block.bbox)
print(first_block.category)
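
The same attributes can be used to walk an entire document. The sketch below groups block text by category using only the attributes shown above (doc.pages, page.blocks, block.text, block.category). The helper name text_by_category is illustrative, and the handling of missing categories or empty text is an assumption about what some blocks may contain.

```python
from __future__ import annotations

from collections import defaultdict


def text_by_category(doc) -> dict[str, list[str]]:
    """Group the text of every block in the document by its category."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for page in doc.pages:
        for block in page.blocks:
            # Assumption: blocks without a category are filed under "uncategorized".
            category = getattr(block, "category", None) or "uncategorized"
            # Assumption: some blocks (for example images) may have no text; skip them.
            text = getattr(block, "text", None)
            if text:
                grouped[str(category)].append(text)
    return dict(grouped)
```

Calling text_by_category(doc) on a parsed document returns a plain dictionary, which makes it easy to check which categories a given driver actually identifies.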

What Happens Under the Hood

When you call:

doc = Parxy.parse("file.pdf")

Parxy performs the following steps:

Initializes a singleton DriverFactory
Selects the appropriate driver (e.g. PyMuPDF)
Invokes the driver's .parse() method
Returns a normalized Document object with consistent structure

This means you can switch parsers (e.g., from PyMuPDF to LlamaParse) without changing how you handle the output.
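
The flow above can be pictured with a toy facade and factory. This is a simplified sketch of the pattern, not Parxy's actual code: ToyDocument, ToyDriver, ToyDriverFactory, ToyParxy, and the two fake backends are hypothetical names invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyDocument:
    """Stand-in for the normalized document every driver returns."""
    source: str
    driver: str
    content: str


class ToyDriver:
    """Hypothetical driver: wraps a backend function behind .parse()."""

    def __init__(self, name: str, backend: Callable[[str], str]) -> None:
        self.name = name
        self._backend = backend

    def parse(self, path: str) -> ToyDocument:
        # Normalize the backend-specific result into the shared model.
        return ToyDocument(source=path, driver=self.name, content=self._backend(path))


class ToyDriverFactory:
    """Singleton registry that hands out drivers by name."""

    _instance: Optional[ToyDriverFactory] = None

    def __init__(self) -> None:
        self._drivers: dict[str, ToyDriver] = {}

    @classmethod
    def instance(cls) -> ToyDriverFactory:
        # Create the factory on first use, then reuse the same object.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def register(self, driver: ToyDriver) -> None:
        self._drivers[driver.name] = driver

    def driver(self, name: str) -> ToyDriver:
        return self._drivers[name]


class ToyParxy:
    """Facade: one entry point regardless of the driver in use."""

    DEFAULT = "fast"

    @classmethod
    def parse(cls, path: str, driver_name: Optional[str] = None) -> ToyDocument:
        factory = ToyDriverFactory.instance()
        driver = factory.driver(driver_name or cls.DEFAULT)
        return driver.parse(path)


# Register two fake backends that return different text for the same file.
factory = ToyDriverFactory.instance()
factory.register(ToyDriver("fast", lambda path: f"fast text of {path}"))
factory.register(ToyDriver("cloud", lambda path: f"cloud text of {path}"))

doc_a = ToyParxy.parse("file.pdf")
doc_b = ToyParxy.parse("file.pdf", driver_name="cloud")
print(doc_a.driver, doc_b.driver)  # prints: fast cloud
```

Both calls return the same ToyDocument type, so code that consumes the result does not need to know which driver produced it.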

Summary

In this tutorial you:

Installed and imported Parxy
Parsed a document with a single line of code
Extracted text and Markdown
Explored the unified document model

You're now ready to try more advanced use cases, such as:

Using Parxy from the command line
Processing multiple documents in parallel
Extending Parxy with a custom driver
Monitoring document processing with OpenTelemetry
Comparing different parsers on the same document

Tip

If your parsed text seems incomplete or misaligned, try a different driver:

doc = Parxy.parse("file.pdf", driver_name=Parxy.UNSTRUCTURED_LIBRARY)

Each backend may specialize in different document types.
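
One way to automate this is to try several drivers in order and keep the first result with enough text. The sketch below is deliberately independent of Parxy and takes the parse function as an argument. The helper name parse_with_fallback, the min_chars threshold, and the idea that a failing driver raises an exception are assumptions, not documented Parxy behavior.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence


def parse_with_fallback(
    parse: Callable[..., Any],
    path: str,
    drivers: Sequence[str],
    min_chars: int = 1,
) -> tuple[str, Any]:
    """Return (driver_name, document) for the first driver yielding enough text."""
    errors: dict[str, Exception] = {}
    for name in drivers:
        try:
            doc = parse(path, driver_name=name)
        except Exception as exc:  # Assumption: a failing driver raises.
            errors[name] = exc
            continue
        # Accept the first document whose stripped text meets the threshold.
        if len(doc.text().strip()) >= min_chars:
            return name, doc
    raise RuntimeError(
        f"No driver produced at least {min_chars} characters; errors: {errors}"
    )
```

With Parxy this could be called as parse_with_fallback(Parxy.parse, "file.pdf", [Parxy.PYMUPDF, Parxy.UNSTRUCTURED_LIBRARY], min_chars=50), using the driver constants shown earlier in this tutorial.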
