This tool is designed for people who spend a significant amount of time proofreading PDF files. It compares two PDF files and generates comparison results that make pixel-level and text-level discrepancies between them quick to identify.
Sample Comparison Results:
This tool generates a three-row comparison sheet for each matched page pair. The top row highlights text differences on the left and right pages, the middle row overlays pixel-level image differences on both pages, and the bottom row shows the grayscale absolute difference together with its inverted view for easier inspection.
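The bottom row can be illustrated with a minimal sketch. It assumes both pages have already been rendered to 8-bit grayscale buffers of equal size; the function name `grayscale_difference` is hypothetical and is not taken from this project's source code.

```python
from typing import Tuple


def grayscale_difference(left: bytes, right: bytes) -> Tuple[bytes, bytes]:
    """Return (absolute difference, inverted difference) for two grayscale pages.

    Assumption: each buffer holds one byte per pixel (0 = black, 255 = white)
    and both pages were rendered at the same width and height.
    """
    if len(left) != len(right):
        raise ValueError("pages must be rendered at the same size")
    # Identical pixels give 0 (black); the larger the change, the brighter the pixel.
    diff = bytes(abs(a - b) for a, b in zip(left, right))
    # Inverting turns unchanged areas white, so changes show up as dark marks.
    inverted = bytes(255 - d for d in diff)
    return diff, inverted


if __name__ == "__main__":
    left_page = bytes([0, 100, 255, 30])
    right_page = bytes([0, 90, 0, 40])
    diff, inverted = grayscale_difference(left_page, right_page)
    print(list(diff), list(inverted))
```

The inverted view is often easier to inspect on screen or in print, because small differences appear as dark marks on a white background rather than faint marks on black.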
The tool marks all recognizable text in the PDF with colored masks. Each color has a specific meaning:
- Green: The word remains unchanged.
- Orange: The text matches, but its size and/or color changed.
- Red: The word is unmatched, added, or modified.
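The color legend above can be sketched as a small classification routine. This is an illustration only, not the tool's actual matching algorithm: it assumes each word has already been extracted with its text, font size, and color, and it uses `difflib.SequenceMatcher` from the standard library as a stand-in matcher. The `Word` class and `classify_words` function are hypothetical names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    text: str
    size: float
    color: str


def classify_words(left: List[Word], right: List[Word]) -> Tuple[List[str], List[str]]:
    """Return one color label per word for the left and right pages."""
    # Every word starts as red (unmatched, added, or modified).
    left_labels = ["red"] * len(left)
    right_labels = ["red"] * len(right)
    matcher = SequenceMatcher(
        a=[w.text for w in left], b=[w.text for w in right], autojunk=False
    )
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            continue  # replaced, inserted, and deleted words stay red
        for i, j in zip(range(i1, i2), range(j1, j2)):
            same_style = left[i].size == right[j].size and left[i].color == right[j].color
            # Green: unchanged. Orange: same text, different size and/or color.
            label = "green" if same_style else "orange"
            left_labels[i] = label
            right_labels[j] = label
    return left_labels, right_labels


if __name__ == "__main__":
    old = [Word("Total", 10, "black"), Word("price", 10, "black"), Word("100", 10, "black")]
    new = [Word("Total", 10, "black"), Word("price", 12, "black"),
           Word("120", 10, "black"), Word("USD", 10, "black")]
    print(classify_words(old, new))
```

In this example "Total" is labeled green on both pages, "price" is orange because only its size changed, and "100", "120", and "USD" are red.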
Please follow the steps below:
- Clone the GitHub Repository: Clone the repository using the following command:
git clone https://github.com/VintLin/pdf-comparator.git
- Set up Python Environment: Open the "pdf-comparator" project directory and ensure you have Python 3.8 or higher. You can create a virtual environment using the following commands, replacing "venv" with your preferred environment name. Activate the environment before installing dependencies, for example with "source venv/bin/activate" on Linux or macOS:
cd pdf-comparator
python3 -m venv venv
- Install Dependencies: Install the required dependencies by running the following command:
pip3 install -r requirements.txt
Page rendering now uses pypdfium2, so no separate poppler installation is required.
- Run the Code Directly: Compare PDF files by running the following command:
python3 -m pdfcomparator "/compare_file_1.pdf" "/compare_file_2.pdf" "/result_folder/"
- Build an Executable: You can also build an executable using cx-Freeze as needed (the executable can be found in "/build/" after a successful build):
python3 setup.py build
- Run the Executable: Compare PDF files by running the following command with the executable:
./pdfcomparator.exe "/compare_file_1.pdf" "/compare_file_2.pdf" "/result_folder/"
This program accepts the following command line arguments:
- file1 (required): Path to input file 1. Please provide the path to the first file you want to compare.
- file2 (required): Path to input file 2. Please provide the path to the second file you want to compare.
- output_folder (required): Path to the output folder. Comparison results will be saved in this folder.
- --log-dir or --cache or -c: Optional directory used to write app.log. --cache is kept as a legacy alias for compatibility.
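For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that mirrors the arguments listed above. It is an assumption about how the interface could be expressed, not the project's actual parser; `build_parser` and the `log_dir` destination name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="pdfcomparator", description="Compare two PDF files."
    )
    parser.add_argument("file1", help="Path to input file 1")
    parser.add_argument("file2", help="Path to input file 2")
    parser.add_argument("output_folder", help="Folder where results are saved")
    # All three spellings store into the same destination, so --cache keeps
    # working as a legacy alias for --log-dir.
    parser.add_argument(
        "--log-dir", "--cache", "-c",
        dest="log_dir",
        default=None,
        help="Optional directory used to write app.log",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.file1, args.file2, args.output_folder, args.log_dir)
```

Registering the alias on the same `add_argument` call keeps a single source of truth for the option, so older scripts that pass `--cache` behave identically to ones that pass `--log-dir`.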
Here are some usage examples:
# Perform comparison
python3 -m pdfcomparator file1.pdf file2.pdf output_folder/
# Perform comparison and write logs to a custom directory
python3 -m pdfcomparator file1.pdf file2.pdf output_folder/ --log-dir /path/to/logs
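The same commands can also be driven from a Python script, which is convenient when comparing many file pairs. The sketch below only assembles and runs the command line documented above; the helper names `build_command` and `compare` are hypothetical, and it assumes `pdfcomparator` is importable by the interpreter running the script.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(
    file1: str, file2: str, output_folder: str, log_dir: Optional[str] = None
) -> List[str]:
    """Assemble the documented command line as an argument list."""
    # sys.executable reuses the current interpreter (and its virtual environment).
    command = [sys.executable, "-m", "pdfcomparator", file1, file2, output_folder]
    if log_dir is not None:
        command += ["--log-dir", log_dir]
    return command


def compare(
    file1: str, file2: str, output_folder: str, log_dir: Optional[str] = None
) -> None:
    """Run one comparison; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(build_command(file1, file2, output_folder, log_dir), check=True)


if __name__ == "__main__":
    compare("file1.pdf", "file2.pdf", "output_folder/", log_dir="/path/to/logs")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.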
- Source Code Licensing: Our project's source code is licensed under the MIT License. This license permits the use, modification, and distribution of the code, subject to certain conditions outlined in the MIT License.
- Project Open-Source Status: The project is open source, but this designation is primarily intended for non-commercial purposes. We encourage collaboration and contributions from the community for research and non-commercial applications; any use of the project's components for commercial purposes requires a separate licensing agreement.
If you have any questions or feedback, or would like to get in touch, please reach out to us via email at vintonlin@gmail.com.



