
pdf-comparator


📖 Overview

This tool is designed for people who spend a lot of time proofreading PDF files. It compares two PDF files and generates comparison sheets that make pixel-level and text-level discrepancies between them quick to identify.

Sample Comparison Results:

❓ What Can PDF Comparator Do?

1. Image Difference Comparison

This tool generates a three-row comparison sheet for each matched page pair. The top row highlights text differences on the left and right pages, the middle row overlays pixel-level image differences on both pages, and the bottom row shows the grayscale absolute difference together with its inverted view for easier inspection.
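
The bottom row of the sheet can be illustrated with a minimal sketch. It assumes both pages have already been rendered to 8-bit grayscale images of identical size, represented here as nested lists of integers. `abs_diff` and `invert` are hypothetical helper names used only for illustration, not functions from pdf-comparator, and the tool itself may compute the difference in another way.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grayscale pixels (0 = black, 255 = white)


def abs_diff(left: Gray, right: Gray) -> Gray:
    """Per-pixel absolute difference of two same-sized grayscale images."""
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("images must have the same dimensions")
    return [[abs(a - b) for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


def invert(image: Gray) -> Gray:
    """Invert an 8-bit grayscale image so differences appear dark on white."""
    return [[255 - p for p in row] for row in image]


# Two tiny 2x3 "pages" that differ in a single pixel (hypothetical data).
left_page = [[255, 255, 0], [255, 0, 255]]
right_page = [[255, 255, 0], [255, 200, 255]]

diff = abs_diff(left_page, right_page)  # identical pixels become 0 (black)
inverted = invert(diff)                 # identical pixels become 255 (white)
```

In the difference image, unchanged regions are black and changed pixels are bright. The inverted view carries the same information as dark marks on a white background, which tends to be easier to read against a page layout.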

2. Text Difference Comparison

The tool marks all recognizable text in each PDF with a colored mask. The color indicates how the text compares with the other file:

  • Green: The word remains unchanged.
  • Orange: The text matches, but its size and/or color changed.
  • Red: The word is unmatched, added, or modified.
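
The color rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the tool's actual matching code: the `Word` type, its fields, and `classify` are hypothetical, and the sketch assumes words from the two pages have already been paired up (with `None` standing for a word that has no counterpart).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    """Hypothetical record for one recognized word."""
    text: str
    size: float  # font size in points
    color: str   # e.g. "#000000"


def classify(left: Optional[Word], right: Optional[Word]) -> str:
    """Return the mask color for a pair of words from the two PDFs."""
    if left is None or right is None or left.text != right.text:
        return "red"     # unmatched, added, or modified
    if left.size != right.size or left.color != right.color:
        return "orange"  # same text, but size and/or color changed
    return "green"       # unchanged


same = classify(Word("Total", 12.0, "#000000"), Word("Total", 12.0, "#000000"))
restyled = classify(Word("Total", 12.0, "#000000"), Word("Total", 14.0, "#000000"))
added = classify(None, Word("Draft", 12.0, "#000000"))
print(same, restyled, added)  # green orange red
```

Checking the text first means a word whose content changed is always red, even if its styling also changed.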

🖥️ Quick Start

Please follow the steps below:

  1. Clone the GitHub Repository: Clone the repository using the following command:
git clone https://github.com/VintLin/pdf-comparator.git
  2. Set up Python Environment: Open the "pdf-comparator" project directory and make sure you have Python 3.8 or higher. Create and activate a virtual environment using the following commands, replacing "venv" with your preferred environment name:
cd pdf-comparator
python3 -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
  3. Install Dependencies: Install the required dependencies by running the following command:
pip3 install -r requirements.txt

Page rendering now uses pypdfium2, so no separate Poppler installation is required.

  4. Run the Code Directly: Compare PDF files by running the following command:
python3 -m pdfcomparator "/compare_file_1.pdf" "/compare_file_2.pdf" "/result_folder/"
  5. Build an Executable: You can also build an executable using cx-Freeze as needed (the executable can be found in "/build/" after a successful build):
python3 setup.py build
  6. Run the Executable: Compare PDF files by running the following command with the executable:
./pdfcomparator.exe "/compare_file_1.pdf" "/compare_file_2.pdf" "/result_folder/"

Command Line Argument Usage

This program accepts the following command line arguments:

  • file1 (required): Path to the first PDF file to compare.

  • file2 (required): Path to the second PDF file to compare.

  • output_folder (required): Path to the output folder. Comparison results are saved in this folder.

  • --log-dir, --cache, or -c (optional): Directory in which app.log is written. --cache is kept as a legacy alias for compatibility.
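
For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that mirrors the arguments listed above. It is an assumption about the shape of the parser, not a copy of the project's code: `build_parser` and the `log_dir` destination name are hypothetical, and the real implementation may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare three required positionals and one optional log directory."""
    parser = argparse.ArgumentParser(prog="pdfcomparator")
    parser.add_argument("file1", help="path to the first PDF file to compare")
    parser.add_argument("file2", help="path to the second PDF file to compare")
    parser.add_argument("output_folder", help="folder where results are saved")
    # All three spellings store into the same attribute, so the legacy
    # --cache alias keeps working alongside --log-dir and -c.
    parser.add_argument("--log-dir", "--cache", "-c", dest="log_dir", default=None,
                        help="optional directory in which app.log is written")
    return parser


args = build_parser().parse_args(["old.pdf", "new.pdf", "out/", "--cache", "logs/"])
print(args.file1, args.file2, args.output_folder, args.log_dir)
```

Registering the alias on the same `add_argument` call, rather than as a separate option, keeps a single source of truth for the value.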

Examples

Here are some usage examples:

# Perform comparison
python3 -m pdfcomparator file1.pdf file2.pdf output_folder/

# Perform comparison and write logs to a custom directory
python3 -m pdfcomparator file1.pdf file2.pdf output_folder/ --log-dir /path/to/logs
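
The same commands can also be driven from a Python script, for example to compare several file pairs in a row. The sketch below only wraps the documented command line with `subprocess`. `build_command` and `compare` are hypothetical helper names, and the sketch assumes it runs from the project directory with the dependencies installed in the current environment.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(file1: str, file2: str, output_folder: str,
                  log_dir: Optional[str] = None) -> List[str]:
    """Assemble the documented command line as an argument list."""
    # sys.executable runs the module with the current environment's Python.
    cmd = [sys.executable, "-m", "pdfcomparator", file1, file2, output_folder]
    if log_dir is not None:
        cmd += ["--log-dir", log_dir]
    return cmd


def compare(file1: str, file2: str, output_folder: str,
            log_dir: Optional[str] = None) -> None:
    """Run one comparison; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(file1, file2, output_folder, log_dir), check=True)


# Show the command without running it.
print(build_command("file1.pdf", "file2.pdf", "output_folder/", "/path/to/logs")[1:])
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.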


⚖️ License

  • Source Code Licensing: The project's source code is licensed under the MIT License, which permits use, modification, and distribution of the code subject to the conditions stated in that license.
  • Project Open-Source Status: The project is open source, but this designation is intended primarily for non-commercial use. We encourage community collaboration and contributions for research and non-commercial applications; any use of the project's components for commercial purposes requires a separate licensing agreement.


📬 Contact

If you have questions or feedback, or would like to get in touch, please email us at vintonlin@gmail.com.
