This project demonstrates a text classification pipeline using BERT, PyTorch, and ONNX. It includes data preprocessing, model training, evaluation, and exporting the trained model to ONNX format.
- Text classification using BERT
- Data preprocessing and cleaning
- Model training and evaluation
- Exporting the trained model to ONNX format
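The README does not say which cleaning steps the pipeline applies. As an illustration only, here is a minimal sketch of a cleaning pass that could run before BERT tokenization. The function names `clean_text` and `clean_dataset` and the `{"text": ..., "label": ...}` record layout are assumptions, not taken from `sample_script.py`. The sketch does not lowercase, because an uncased BERT tokenizer already does that.

```python
import html
import re
import unicodedata

# Hypothetical cleaning step; sample_script.py may do something different.
_TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like an HTML/XML tag
_WS_RE = re.compile(r"\s+")        # runs of spaces, tabs and newlines


def clean_text(text: str) -> str:
    """Normalize one raw string before it is passed to the BERT tokenizer."""
    text = html.unescape(text)                  # "&amp;" -> "&", "&lt;b&gt;" -> "<b>"
    text = _TAG_RE.sub(" ", text)               # replace markup with a space
    text = unicodedata.normalize("NFKC", text)  # fold full-width / compatibility forms
    text = _WS_RE.sub(" ", text)                # collapse whitespace to single spaces
    return text.strip()


def clean_dataset(records: list[dict]) -> list[dict]:
    """Clean each record's "text" field and drop records that end up empty."""
    cleaned = []
    for record in records:
        text = clean_text(record["text"])
        if text:
            cleaned.append({**record, "text": text})
    return cleaned
```

Dropping empty records matters because a blank string still produces a `[CLS] [SEP]` input that carries a label but no signal.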
- Python 3.9 or later
- Install the dependencies listed in the provided `requirements.txt` file:
  `py -m pip install -r requirements.txt`
  (`py` is the Python launcher on Windows; on macOS and Linux, use `python3` instead.)

- `sample_script.py`: Main script for training and evaluating the model.
- `create_json.py`: Script that generates a sample JSON dataset (`L1Files.json`).
- `L1Files.json`: Sample dataset for training and testing.
- `requirements.txt`: List of required Python packages.
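The format of `L1Files.json` is not documented here. Assuming it is a JSON array of objects with `text` and `label` fields (an assumption, as are the example sentences and the `write_dataset` name), a generator in the spirit of `create_json.py` might look like this:

```python
import json
from pathlib import Path

# Hypothetical examples; the real contents of L1Files.json may differ.
SAMPLES = [
    ("The battery lasts all day and charges quickly.", "positive"),
    ("The screen cracked after one week.", "negative"),
    ("Setup was simple and the manual was clear.", "positive"),
    ("Support never answered my emails.", "negative"),
]


def write_dataset(path, samples=SAMPLES) -> int:
    """Write samples as a JSON array of {"text", "label"} objects; return the record count."""
    records = [{"text": text, "label": label} for text, label in samples]
    Path(path).write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return len(records)
```

Calling `write_dataset("L1Files.json")` writes the file to the current directory. With string labels like these, the training code has to map each label to an integer id before passing it to a BERT classification head.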
- Generate the sample dataset: run the `create_json.py` script to create the `L1Files.json` file:
  `py create_json.py`
- Train and evaluate the model: run the `sample_script.py` script:
  `py sample_script.py`
  Follow the prompts to load a saved model or train a new one. The script then prints the model's accuracy, classification report, and confusion matrix.
- Export the model to ONNX: if you choose to save the trained model, the script also exports it to ONNX format as `bert_sequence_classification.onnx`.
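The accuracy, classification report, and confusion matrix printed during training are standard metrics; scikit-learn provides them as `accuracy_score`, `classification_report`, and `confusion_matrix` in `sklearn.metrics`. To show what those numbers mean, here is a dependency-free sketch (the function names are hypothetical, not from `sample_script.py`) that computes them from lists of true and predicted labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_report(y_true, y_pred, labels):
    """Precision, recall, F1 and support for each label."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # label that was missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report
```

The row/column orientation of the confusion matrix (true labels down, predictions across) matches scikit-learn's convention, so the diagonal counts correct predictions for each class.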
- The sample dataset (`L1Files.json`) is small and intended only for demonstration. Replace it with a larger, real-world dataset for better results.
- Ensure that the installed `transformers` and `torch` versions are compatible with your Python version.
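When replacing `L1Files.json` with a larger dataset, it helps to validate the records and hold out a fixed test split before training. A minimal sketch, assuming a hypothetical layout of JSON objects with `text` and `label` fields; `load_records` and `split_records` are illustrative names, not functions from this project:

```python
import json
import random
from pathlib import Path


def load_records(path) -> list[dict]:
    """Load a JSON array of {"text": ..., "label": ...} objects and check each one."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for i, record in enumerate(records):
        if not isinstance(record, dict) or "text" not in record or "label" not in record:
            raise ValueError(f"record {i} is missing a 'text' or 'label' field")
    return records


def split_records(records, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed and hold out test_fraction of the records for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # private RNG: global random state is untouched
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

The fixed seed makes the split reproducible across runs. For imbalanced labels, a stratified split such as scikit-learn's `train_test_split(..., stratify=labels)` keeps class proportions equal in both halves.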
This project is licensed under the MIT License.