Commit dff0f43

Initial commit
0 parents  commit dff0f43

39 files changed

Lines changed: 3659 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
```yaml
name: CI Pipeline

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  pipeline:
    name: Test Python ${{ matrix.python-version }}
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]
    steps:
      #----------------------------------------------
      # check-out repo and set-up python
      #----------------------------------------------
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      #----------------------------------------------
      # ----- install & configure poetry -----
      #----------------------------------------------
      - name: Install Poetry
        uses: abatilo/actions-poetry@v4
      - name: Configure Poetry virtual environments
        run: |
          poetry config virtualenvs.create true --local
          poetry config virtualenvs.in-project true --local

      #----------------------------------------------
      # load cached venv if cache exists
      #----------------------------------------------
      - name: Cache Poetry virtual environment
        uses: actions/cache@v4
        id: cached-poetry-dependencies
        with:
          path: |
            .venv
            ~/.cache/pypoetry
          key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('poetry.lock', 'pyproject.toml') }}
          restore-keys: |
            poetry-${{ runner.os }}-py${{ matrix.python-version }}-

      #----------------------------------------------
      # install dependencies if cache does not exist
      #----------------------------------------------
      - name: Install Poetry dependencies
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
        run: poetry install --no-interaction --no-root

      - name: Install project
        run: poetry install --no-interaction

      #----------------------------------------------
      # run test suite
      #----------------------------------------------
      - name: Run test suite with coverage
        run: poetry run pytest tests/ -v --cov=anyparser_core --cov-report=term-missing --cov-fail-under=100

      #----------------------------------------------
      # format checking
      #----------------------------------------------
      - name: Check code formatting with Black
        run: poetry run black ./ --check
```
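The cache step keys the saved `.venv` and Poetry cache on the OS, the Python version, and a hash of `poetry.lock` and `pyproject.toml`. On restore, `actions/cache` tries the exact `key` first and only then the `restore-keys` prefix, picking the most recently created partial match; its `cache-hit` output is `'true'` only for an exact match, so the `--no-root` install step is skipped only when an entry for the exact same lock and project files already exists. The Python sketch below models that lookup under stated assumptions: `CacheEntry`, `file_digest`, `cache_key`, and `restore` are hypothetical names, the cache is an in-memory list, and `file_digest` only approximates `hashFiles()` (it will not reproduce GitHub's hash values).

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CacheEntry:
    """Hypothetical stand-in for one saved cache entry."""

    key: str
    created_at: float  # creation time, e.g. a Unix timestamp


def file_digest(paths: list[Path]) -> str:
    """Approximate hashFiles(): hash each existing file, then hash the digests.

    The values will not match GitHub's; what matters is that the digest
    changes whenever poetry.lock or pyproject.toml changes.
    """
    combined = hashlib.sha256()
    for path in paths:
        if path.is_file():
            combined.update(hashlib.sha256(path.read_bytes()).digest())
    return combined.hexdigest()


def cache_key(runner_os: str, python_version: str, digest: str) -> str:
    """Mirror the workflow's key template: poetry-<os>-py<version>-<digest>."""
    return f"poetry-{runner_os}-py{python_version}-{digest}"


def restore(
    entries: list[CacheEntry], key: str, restore_keys: list[str]
) -> tuple[CacheEntry | None, bool]:
    """Return (entry, cache_hit) using exact-key-then-prefix lookup."""
    for entry in entries:
        if entry.key == key:
            return entry, True  # exact match: cache-hit == 'true'
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            # Newest partial match is restored, but cache-hit stays false.
            return max(matches, key=lambda e: e.created_at), False
    return None, False
```

A partial match still restores an older `.venv`; because `cache-hit` is false in that case, the `--no-root` install step runs on top of it and brings it in line with the current lockfile.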

.gitignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
```gitignore
__pycache__
.*
!.gitignore
!.vscode
!.github
dist
htmlcov
!.github
```
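The `.gitignore` above ignores every dotfile with `.*` and then re-includes three of them with `!` patterns. That works because of two Git rules: a pattern without a slash matches a file or directory name at any depth, and when several patterns match the same name, the last one wins. The Python sketch below models only those rules for slash-free patterns like these; the helpers `last_match` and `is_ignored` are hypothetical, the model is not a substitute for `git check-ignore`, and it does not handle anchored paths, trailing-slash directory patterns, or `**`.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# The patterns from the .gitignore above, in file order.
PATTERNS = [
    "__pycache__",
    ".*",
    "!.gitignore",
    "!.vscode",
    "!.github",
    "dist",
    "htmlcov",
    "!.github",
]


def last_match(name: str, patterns: list[str]) -> bool | None:
    """Return True (ignore), False (re-include), or None (no pattern matched)."""
    verdict = None
    for raw in patterns:
        negated = raw.startswith("!")
        pattern = raw[1:] if negated else raw
        if fnmatchcase(name, pattern):
            verdict = not negated  # later matches override earlier ones
    return verdict


def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Simplified check for slash-free patterns like the ones above.

    Each path component is matched by name; once a directory is ignored,
    everything beneath it stays ignored, because Git does not descend into it.
    """
    return any(last_match(part, patterns) for part in path.strip("/").split("/"))
```

Under this model, as in Git, the second `!.github` line has no effect because the first one already re-includes `.github`. Note also that `.vscode/settings.json` is tracked, while a dotfile inside it, such as `.vscode/.cache`, would still be ignored by `.*`.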

.vscode/extensions.json

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
```json
{
    "recommendations": [
        "ms-python.isort",
        "ms-python.black-formatter",
        "ms-vscode.cmake-tools",
        "twxs.cmake",
        "anysphere.pyright",
        "ms-python.vscode-pylance",
        "ms-python.python",
        "ms-python.debugpy"
    ]
}
```

.vscode/settings.json

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
```jsonc
{
    "editor.tabSize": 4,
    "editor.foldingStrategy": "indentation",
    "editor.insertSpaces": true,
    "editor.detectIndentation": false,
    "editor.autoIndent": "none",
    "editor.formatOnSave": true,
    "editor.wordWrap": "bounded",
    "search.exclude": {
        "**/__pycache__": true,
        "**/.pytest_cache": true,
        "**/.mypy_cache": true
    },
    "files.exclude": {
        "**/__pycache__": true,
        "**/.pytest_cache": true,
        "**/.mypy_cache": true
    },
    "files.associations": {
        "*.toml": "ini"
    },
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter",
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": "explicit"
        },
    },
    "isort.args": [
        "--profile",
        "black"
    ],
    "editor.padding.top": 12,
    "terminal.integrated.scrollback": 100000,
    "terminal.integrated.fontSize": 14,
    "problems.defaultViewMode": "table",
    "editor.wordWrapColumn": 150,
    "workbench.sideBar.location": "right",
    "editor.mouseWheelZoom": true,
    "explorer.compactFolders": false
}
```

CONTRIBUTING.md

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
````markdown
# Contributing to Anyparser Core

First off, thank you for considering contributing to Anyparser Core! It's people like you that make Anyparser Core such a great tool for AI data preparation.

## Code of Conduct

By participating in this project, you are expected to uphold our Code of Conduct:

- Use welcoming and inclusive language
- Be respectful of differing viewpoints and experiences
- Gracefully accept constructive criticism
- Focus on what is best for the community
- Show empathy towards other community members

## How Can I Contribute?

### Reporting Bugs

Before creating a bug report, please check the existing issues; the problem may already have been reported. When you do create a bug report, please include as many details as possible:

* **Use a clear and descriptive title**
* **Describe the exact steps which reproduce the problem**
* **Provide specific examples to demonstrate the steps**
* **Describe the behavior you observed after following the steps**
* **Explain which behavior you expected to see instead and why**
* **Include any error messages or stack traces**

> **Note:** When reporting bugs, do not include any sensitive information or API keys.

### Suggesting Enhancements

Enhancement suggestions are tracked as GitHub issues. When creating an enhancement suggestion, please:

* **Use a clear and descriptive title**
* **Provide a step-by-step description of the suggested enhancement**
* **Provide specific examples to demonstrate the steps**
* **Describe the current behavior and explain the behavior you expected to see**
* **Explain why this enhancement would be useful for AI data preparation**

### Pull Requests

* Fork the repo and create your branch from `master`
* If you've added code that should be tested, add tests
* If you've changed APIs, update the documentation
* Ensure the test suite passes
* Make sure your code is formatted with Black
* Open your pull request!

## Development Process

1. Fork the repository
2. Create a new branch for your feature or bugfix: `git checkout -b feature-name`
3. Make your changes
4. Write or update tests as needed
5. Run the test suite
6. Push to your fork and submit a pull request

### Setting Up Development Environment

```bash
# Clone your fork
git clone https://github.com/your-username/anyparser_core.git
cd anyparser_core

# Prerequisites
# Make sure you have Poetry installed on your system
# Visit https://python-poetry.org/docs for installation instructions

# Install dependencies (including dev dependencies)
make install-dev

# Or alternatively using Poetry directly:
poetry install --with dev
```

### Running Tests

```bash
# Run tests with verbose output
make test

# Run tests and write an HTML coverage report to htmlcov/
make coverage

# Then open htmlcov/index.html in your browser to view the report
```

CI also runs the suite with `--cov-fail-under=100`, so a pull request fails if total coverage drops below 100%.

### Code Style

We use the following tools to maintain code quality:

* **Black** for code formatting

Please make sure your code passes the formatting check; CI runs `black ./ --check` and fails on any file Black would reformat:

```bash
# Format code with Black
make lint
```

## Documentation

* Keep docstrings up to date
* Follow the Google docstring style
* Update README.md if needed
* Add examples for new features

## Core Focus Areas

We especially welcome contributions in these areas:

1. **AI Data Preparation Enhancements**
   - Improvements to RAG-focused features
   - Better support for AI model training data extraction
   - Enhanced structured data extraction

2. **Performance Optimizations**
   - Speed improvements for large document processing
   - Memory usage optimizations
   - Batch processing enhancements

3. **New Model Support**
   - Integration with new OCR models
   - Support for additional document types
   - Enhanced language support

4. **Documentation and Examples**
   - Better examples for AI/ML use cases
   - Improved API documentation
   - Tutorial content

## Community

* Join our [Community Discussions](https://github.com/anyparser/anyparser_core/discussions)
* Follow our [GitHub repository](https://github.com/anyparser/anyparser_core)
* Check out our [Documentation](https://docs.anyparser.com)

## License

By contributing to Anyparser Core, you agree that your contributions will be licensed under the project's Apache-2.0 license.
````
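The contributing guide asks for Google-style docstrings, and the CI workflow enforces 100% coverage. As an illustration only, here is a hypothetical helper, `split_into_chunks`, which is not part of the anyparser_core API, documented with the Google `Args:`, `Returns:`, `Raises:`, and `Examples:` sections, plus a test that exercises both its error path and its normal path.

```python
from __future__ import annotations


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Split text into consecutive chunks of at most ``max_chars`` characters.

    Hypothetical helper, used only to illustrate the docstring format.

    Args:
        text: The input string to split.
        max_chars: Maximum length of each chunk. Must be positive.

    Returns:
        A list of chunks whose concatenation equals ``text``. An empty
        ``text`` yields an empty list.

    Raises:
        ValueError: If ``max_chars`` is not positive.

    Examples:
        >>> split_into_chunks("abcdef", 4)
        ['abcd', 'ef']
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


def test_split_into_chunks() -> None:
    """Cover both branches so the function stays at 100% coverage."""
    assert split_into_chunks("abcdef", 4) == ["abcd", "ef"]
    assert split_into_chunks("", 3) == []
    try:
        split_into_chunks("abc", 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for max_chars=0")
```

In the project's pytest suite, the error check would more idiomatically be written as `with pytest.raises(ValueError):`; the plain `try`/`except` form is used here only so the snippet runs without pytest.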

LICENSE.md

Whitespace-only changes.

Makefile

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
```makefile
install:
	poetry install

install-dev:
	poetry install --with "dev"

clean:
	rm -rf dist
	rm -rf .coverage
	rm -rf htmlcov
	rm -rf .pytest_cache
	find . -type d -name "__pycache__" -exec rm -rf {} +

build:
	python -m build

compile:
	$(MAKE) build

publish-test:
	$(MAKE) clean
	$(MAKE) build
	python -m twine upload --repository testpypi dist/* --verbose -u __token__ -p ${TEST_PYPI_TOKEN}

publish-prod:
	$(MAKE) clean
	$(MAKE) build
	python -m twine upload dist/* --verbose -u __token__ -p ${PYPI_TOKEN}

test:
	poetry run pytest tests/ -v

coverage:
	poetry run pytest tests/ --cov=anyparser_core --cov-report=html

lint:
	poetry run black ./

all: clean build
```
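The `clean` target depends on `rm -rf` and `find`, so it assumes a POSIX shell. If a portable equivalent were ever wanted, for example for contributors on Windows, a small Python function could do the same work. The sketch below is hypothetical (nothing like it exists in the repository; `clean` and `_remove` are illustrative names) and removes the same paths as the recipe: `dist`, `.coverage` (a file rather than a directory), `htmlcov`, `.pytest_cache`, and every `__pycache__` directory below the root.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Top-level artifacts removed by the Makefile's `clean` recipe.
TOP_LEVEL = ["dist", ".coverage", "htmlcov", ".pytest_cache"]


def _remove(path: Path) -> bool:
    """Delete a file or a directory tree; return True if anything was removed."""
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
        return True
    if path.exists() or path.is_symlink():
        path.unlink()
        return True
    return False


def clean(root: Path | str = ".") -> list[Path]:
    """Remove build and test artifacts under root, like `make clean`."""
    root = Path(root)
    removed = [root / name for name in TOP_LEVEL if _remove(root / name)]
    # Equivalent of: find . -type d -name "__pycache__" -exec rm -rf {} +
    # Collect matches first and handle shallow paths first, so nested
    # matches that vanished with a parent are simply skipped.
    for cache in sorted(root.rglob("__pycache__"), key=lambda p: len(p.parts)):
        if cache.is_dir() and _remove(cache):
            removed.append(cache)
    return removed
```

Like `find .`, the `rglob` walk also descends into an in-project `.venv` (the layout the CI workflow configures with `virtualenvs.in-project true`), so it deletes bytecode caches there too; that is harmless, since Python regenerates them, but it can be slow on a large environment.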
