Parser library for reading and extracting PDF document structures.
If this library helps your analysis pipeline, please consider supporting development via PayPal.
tc-lib-pdf-parser parses raw PDF data into structured PHP arrays suitable for extraction, analysis, and downstream processing.
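The result is ordinary nested PHP arrays, so downstream code can query it with plain array functions. Here is a minimal sketch. `findValuesByKey` is a hypothetical helper and is not part of the library. It collects every value stored under a given key at any depth, and it assumes nothing about the exact layout that `parse()` returns.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of tc-lib-pdf-parser): recursively collect
 * every value stored under $key anywhere inside a nested array.
 *
 * @param array<mixed> $data
 * @return list<mixed>
 */
function findValuesByKey(array $data, string|int $key): array
{
    $found = [];
    foreach ($data as $k => $value) {
        // Strict comparison, so int key 0 never matches a string key.
        if ($k === $key) {
            $found[] = $value;
        }
        if (is_array($value)) {
            // Descend into nested arrays and append their matches in order.
            $found = array_merge($found, findValuesByKey($value, $key));
        }
    }
    return $found;
}
```

Consider the invented array `['Type' => 'Catalog', 'Pages' => ['Type' => 'Pages']]`. Calling `findValuesByKey($sample, 'Type')` on it returns `['Catalog', 'Pages']`. PHP converts numeric-string array keys to integers, so pass an integer when searching for numeric indices.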
The parser is designed for tooling scenarios such as content inspection, metadata extraction, validation, and migration pipelines. It favors clear structured output so applications can build higher-level analysis features without depending on fragile regular-expression parsing.
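In a validation pipeline, a cheap first step is to confirm that the input looks like a PDF before handing it to the parser. The sketch below assumes that a `%PDF-x.y` header check is enough for this pre-filter. `detectPdfVersion` is a hypothetical name and is not a library function. Many readers, Acrobat among them, accept the header anywhere in the first 1024 bytes, so the sketch searches that window rather than only offset 0.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of tc-lib-pdf-parser): return the version
 * declared by a "%PDF-x.y" header found in the first 1024 bytes, or null.
 */
function detectPdfVersion(string $raw): ?string
{
    // Only inspect the leading window where readers tolerate the header.
    $head = substr($raw, 0, 1024);
    if (preg_match('/%PDF-(\d\.\d)/', $head, $m) === 1) {
        return $m[1];
    }
    return null;
}
```

A validation step could call `detectPdfVersion($raw)` and skip the file when the result is `null`, before constructing the parser at all. This check only screens out obviously wrong input. It does not prove that the rest of the file is well formed.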
| Field | Value |
|:------|:------|
| Namespace | `\Com\Tecnick\Pdf\Parser` |
| Author | Nicola Asuni info@tecnick.com |
| License | GNU LGPL v3 - see LICENSE |
| API docs | https://tcpdf.org/docs/srcdoc/tc-lib-pdf-parser |
| Packagist | https://packagist.org/packages/tecnickcom/tc-lib-pdf-parser |
Features:
- Cross-reference and object stream parsing
- Filter-aware stream decoding integration
- Structured output suitable for custom extractors
- Configuration options for tolerant parsing modes
- Pure-PHP parser with no external service dependency
- Typed exceptions for error handling
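The sketch below shows what cross-reference parsing involves. It is deliberately simplified and reads a classic `xref` table as the PDF specification lays it out. Each subsection starts with a header that gives the first object number and the entry count. The fixed-width entries that follow each hold a 10-digit byte offset, a 5-digit generation number, and an `n` (in use) or `f` (free) flag. `parseXrefTable` is a hypothetical name, and this is not the library's implementation. It ignores cross-reference streams, incremental updates, and the damaged files that a tolerant parser must handle.

```php
<?php
declare(strict_types=1);

/**
 * Simplified sketch (not the library's implementation): parse a classic
 * "xref" table into [objectNumber => ['offset' => int, 'generation' => int]]
 * for in-use ("n") entries. Free ("f") entries are skipped.
 * Cross-reference streams (PDF 1.5+) are not handled.
 *
 * @return array<int, array{offset: int, generation: int}>
 */
function parseXrefTable(string $section): array
{
    $lines = preg_split('/\r\n|\r|\n/', trim($section));
    if ($lines === false || trim($lines[0]) !== 'xref') {
        throw new InvalidArgumentException('Section must start with the "xref" keyword');
    }
    $entries = [];
    $objNum = 0;
    foreach (array_slice($lines, 1) as $line) {
        $line = trim($line);
        if (preg_match('/^(\d+) (\d+)$/', $line, $m) === 1) {
            // Subsection header: first object number and entry count.
            $objNum = (int) $m[1];
        } elseif (preg_match('/^(\d{10}) (\d{5}) ([nf])$/', $line, $m) === 1) {
            // Entry: byte offset, generation number, in-use/free flag.
            if ($m[3] === 'n') {
                $entries[$objNum] = ['offset' => (int) $m[1], 'generation' => (int) $m[2]];
            }
            $objNum++;
        } elseif ($line === 'trailer') {
            break; // The trailer dictionary follows the table.
        }
    }
    return $entries;
}
```

Mapping each in-use object number to its byte offset lets a parser seek straight to an `N G obj` definition instead of scanning the whole file.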
Requirements:
- PHP 8.1 or later
- Extension: pcre
- Composer
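Deployment scripts, especially system-package installs, may want to check these requirements before loading the library. Here is a minimal sketch. `checkParserRequirements` is a hypothetical pre-flight helper and is not part of tc-lib-pdf-parser. It uses only core PHP functions to check the interpreter version and the pcre extension.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-flight check (not shipped with the library): report which
 * of the stated requirements (PHP 8.1+, pcre extension) are not met.
 *
 * @return list<string> human-readable problems; empty when all checks pass
 */
function checkParserRequirements(string $phpVersion = PHP_VERSION): array
{
    $problems = [];
    if (version_compare($phpVersion, '8.1.0', '<')) {
        $problems[] = "PHP 8.1 or later is required, found {$phpVersion}";
    }
    if (!extension_loaded('pcre')) {
        $problems[] = 'The pcre extension is not loaded';
    }
    return $problems;
}
```

Called with no argument, it checks the running interpreter, and an empty array means both checks passed. The optional version parameter exists only to make the version check easy to exercise.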
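Installation:

```bash
composer require tecnickcom/tc-lib-pdf-parser
```

Quick start:

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

// Read the raw PDF bytes; file_get_contents() returns false on failure.
$raw = file_get_contents('/path/to/document.pdf');
if ($raw === false) {
    throw new RuntimeException('Unable to read the PDF file');
}
// ignore_filter_errors: tolerate stream-filter decoding errors instead of failing.
$parser = new \Com\Tecnick\Pdf\Parser\Parser(['ignore_filter_errors' => true]);
$data = $parser->parse($raw);
var_dump($data);
```

Development:

```bash
make deps   # install the development dependencies
make help   # list the available make targets
make qa     # run the quality-assurance checks
```

Packaging:

```bash
make rpm    # build an RPM package
make deb    # build a Debian package
```

For system packages, bootstrap with:

```php
require_once '/usr/share/php/Com/Tecnick/Pdf/Parser/autoload.php';
```

Contributions are welcome. Please review CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.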
Contact: Nicola Asuni - info@tecnick.com