tc-lib-pdf-parser

Parser library for reading and extracting PDF document structures.

If this library helps your analysis pipeline, please consider supporting development via PayPal.

Overview

tc-lib-pdf-parser parses raw PDF data into structured PHP arrays suitable for extraction, analysis, and downstream processing.

The parser is designed for tooling scenarios such as content inspection, metadata extraction, validation, and migration pipelines. It favors clear structured output so applications can build higher-level analysis features without depending on fragile regular-expression parsing.


Namespace	`\Com\Tecnick\Pdf\Parser`
Author	Nicola Asuni info@tecnick.com
License	GNU LGPL v3 - see LICENSE
API docs	https://tcpdf.org/docs/srcdoc/tc-lib-pdf-parser
Packagist	https://packagist.org/packages/tecnickcom/tc-lib-pdf-parser

Features

Parsing Capabilities

Cross-reference and object stream parsing
Filter-aware stream decoding integration
Structured output suitable for custom extractors
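
Because the parser returns nested PHP arrays, a full var_dump of a large document can be hard to read. The sketch below shows one way to outline such a structure before writing a custom extractor. The helper name outlineArray and the sample data are hypothetical and not part of this library; the code assumes only that its input is an array and makes no claim about the parser's actual output layout.

```php
<?php

// Hypothetical helper (not part of tc-lib-pdf-parser): outline a nested
// array by listing keys and value types instead of dumping every value.
function outlineArray(array $data, int $maxDepth = 2, int $depth = 0): array
{
    $lines = [];
    foreach ($data as $key => $value) {
        // Arrays are shown with their element count, scalars with their type.
        $type = is_array($value)
            ? 'array(' . count($value) . ')'
            : get_debug_type($value);
        $lines[] = str_repeat('  ', $depth) . $key . ': ' . $type;

        // Recurse into nested arrays until the depth limit is reached.
        if (is_array($value) && $depth + 1 < $maxDepth) {
            $lines = array_merge($lines, outlineArray($value, $maxDepth, $depth + 1));
        }
    }

    return $lines;
}

// Stand-in data for illustration only; real parser output will differ.
$sample = ['a' => ['x' => 1, 'y' => 'text'], 'b' => 3.5];
echo implode(PHP_EOL, outlineArray($sample)), PHP_EOL;
```

Raising $maxDepth shows more of the tree; keeping it small gives a quick overview of the top-level layout.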

Runtime Design

Configuration options for tolerant parsing modes
Pure-PHP parser with no external service dependency
Typed exceptions for error handling

Requirements

PHP 8.1 or later
Extension: pcre
Composer

Installation

composer require tecnickcom/tc-lib-pdf-parser

Quick Start

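<?php

require_once __DIR__ . '/vendor/autoload.php';

// Read the raw PDF bytes; file_get_contents() returns false on failure.
$raw = file_get_contents('/path/to/document.pdf');
if ($raw === false) {
    exit('Unable to read the PDF file.' . PHP_EOL);
}

// ignore_filter_errors: keep parsing when a stream filter fails to decode.
$parser = new \Com\Tecnick\Pdf\Parser\Parser(['ignore_filter_errors' => true]);
$data = $parser->parse($raw);

var_dump($data);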
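
In a pipeline it can help to reject unreadable or non-PDF input before calling the parser. The following is a minimal sketch of such a guard. The function name readPdfBytes is hypothetical and not part of this library, and the check is deliberately strict: it accepts only files that begin with the `%PDF-` header marker, which may reject files a tolerant parser could still handle.

```php
<?php

// Hypothetical helper (not part of tc-lib-pdf-parser): read a file and
// confirm it looks like a PDF before handing the bytes to the parser.
function readPdfBytes(string $path): string
{
    if (!is_file($path) || !is_readable($path)) {
        throw new \RuntimeException("Cannot read file: {$path}");
    }

    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new \RuntimeException("Cannot read file: {$path}");
    }

    // Assumption of this sketch: the "%PDF-" marker sits at offset 0.
    if (strncmp($raw, '%PDF-', 5) !== 0) {
        throw new \RuntimeException("Missing %PDF- header: {$path}");
    }

    return $raw;
}

// Example wiring (requires the Composer autoloader from Quick Start):
// try {
//     $parser = new \Com\Tecnick\Pdf\Parser\Parser(['ignore_filter_errors' => true]);
//     $data = $parser->parse(readPdfBytes('/path/to/document.pdf'));
// } catch (\Throwable $e) {
//     fwrite(STDERR, $e->getMessage() . PHP_EOL);
// }
```

Catching \Throwable is a generic fallback; the library's own typed exceptions can be caught more specifically once their class names are confirmed from the API docs.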

Development

make deps
make help
make qa

Packaging

make rpm
make deb

When the library is installed from a system package (RPM or DEB), load its autoloader with:

require_once '/usr/share/php/Com/Tecnick/Pdf/Parser/autoload.php';

Contributing

Contributions are welcome. Please review CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.

Contact

Nicola Asuni - info@tecnick.com
