A playful RSE tool for gender annotation in Digital Humanities projects.
genderTagger is a web-based application for gender classification in TEI-XML person registers. It combines multiple automated classification methods with a gamified web interface for human annotation. It is developed by TELOTA's KI-Lab and the Gender & Data initiative at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW).
Status: The current codebase contains a Streamlit-based proof of concept and standalone implementations of all classification methods used for initial testing and benchmarking. The project is now being refactored into a production architecture with a FastAPI backend and a Vue.js frontend.
Person registers are the backbone of digital scholarly editions — they link actors, create context, and open perspectives on social and intellectual networks. Yet gender information is often missing, making the Gender Data Gap invisible. genderTagger addresses this by providing:
- 9 classification methods ranging from authority file lookups to transformer models and local LLMs
- A gamified web UI with browse/annotate mode, a timed game mode with scoring and leaderboards, and statistics dashboards
- TEI-XML import/export so enriched data flows back into existing edition workflows
- Benchmark tooling to evaluate and compare classifier performance
The primary development case study is schleiermacher digital, whose person register contains ~8,000 entries.
- Install dependencies:
  pip install -r requirements.txt
- Launch the prototype app:
  streamlit run src/gendertagger/prototype/app.py
- Import your data: go to Settings → Import Person Register (default path: data/raw/schleiermacher/Register/Personen).
- Start annotating: use Browse & Annotate for systematic work or Game Mode for speed annotation with scoring.
See docs/QUICK_START.md for the full quick-start guide and docs/UI_GUIDE.md for detailed usage instructions.
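To illustrate what importing a person register and writing enriched data back could involve, here is a minimal sketch using only the Python standard library. It assumes a conventional TEI P5 layout (`<listPerson>` containing `<person>` entries with `<persName>` and an optional `<sex value="…">`). The sample entries, the `M`/`F` value codes, and the function names `read_persons`, `set_sex`, and `to_xml` are invented for illustration; the actual register files and genderTagger's own import code may differ.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI_NS}

# Invented sample data; a real register may be structured differently.
SAMPLE_REGISTER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><listPerson>
    <person xml:id="P0001">
      <persName><forename>Anna</forename> <surname>Muster</surname></persName>
    </person>
    <person xml:id="P0002">
      <persName><forename>Karl</forename> <surname>Muster</surname></persName>
      <sex value="M"/>
    </person>
  </listPerson></body></text>
</TEI>"""


def read_persons(root):
    """Collect id, display name, forename and existing sex value per <person>."""
    persons = []
    for person in root.iter(f"{{{TEI_NS}}}person"):
        name = person.find("tei:persName", NS)
        sex = person.find("tei:sex", NS)
        persons.append({
            "id": person.get(XML_ID),
            # Join all text inside <persName> and collapse whitespace.
            "name": " ".join("".join(name.itertext()).split()) if name is not None else "",
            "forename": name.findtext("tei:forename", default="", namespaces=NS)
            if name is not None else "",
            "sex": sex.get("value") if sex is not None else None,
        })
    return persons


def set_sex(root, person_id, value):
    """Create or update <sex value="..."> on one person; return True if found."""
    for person in root.iter(f"{{{TEI_NS}}}person"):
        if person.get(XML_ID) == person_id:
            sex = person.find("tei:sex", NS)
            if sex is None:
                sex = ET.SubElement(person, f"{{{TEI_NS}}}sex")
            sex.set("value", value)
            return True
    return False


def to_xml(root):
    """Serialise with TEI as the default namespace (avoids ns0: prefixes)."""
    ET.register_namespace("", TEI_NS)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_REGISTER)
    set_sex(root, "P0001", "F")
    for entry in read_persons(root):
        print(entry)
```

Keeping the edit on the parsed tree and serialising it again leaves all untouched elements in place, which is one way to let annotations flow back into an existing edition workflow.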
genderTagger ships with nine independent classification approaches:
| Method | Type | Source / Model |
|---|---|---|
| GND API | Authority file lookup | lobid-gnd (Gemeinsame Normdatei) |
| GND Local Lookup | Authority file (offline) | Pre-built SQLite DB from GND MARC XML |
| gender-guesser | Rule-based heuristic | gender-guesser library |
| nomquamgender | Statistical ML | nomquamgender library |
| HF Gender-Classification | DistilBERT transformer | padmajabfrl/Gender-Classification |
| HF Genderize | BERT transformer | imranali291/genderize |
| HIVOTO | Historical name lookup | HIVOTO v1.0.0 (Historisches Vornamentool) |
| HIVOTO-XGBoost | XGBoost + char n-gram TF-IDF | Trained on HIVOTO dataset |
| LLM (Ollama) | Local large language model | Configurable model via Ollama REST API |
Five of these classifiers (gender-guesser, nomquamgender, HF Gender-Classification, GND API, and optionally Ollama LLM) are integrated directly into the web UI to assist human annotators.
All nine methods were first developed and tested as standalone scripts (in methods/) to validate their accuracy and coverage independently before integration.
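Because the methods return different label vocabularies, their outputs have to be mapped onto one shared scheme before they can be shown side by side or compared. The sketch below maps the six labels that the gender-guesser library returns onto the Male / Female / Uncertain categories and then takes a simple majority vote over several methods. The mapping of `mostly_*` and `andy`, the `min_share` threshold, and the voting step itself are assumptions made for illustration; the README does not say that genderTagger combines classifiers this way.

```python
from collections import Counter

# gender-guesser's six output labels mapped onto the shared scheme.
# Folding "mostly_*" into Male/Female and "andy" into Uncertain is an assumption.
GENDER_GUESSER_MAP = {
    "male": "Male",
    "mostly_male": "Male",
    "female": "Female",
    "mostly_female": "Female",
    "andy": "Uncertain",
    "unknown": "Uncertain",
}


def normalise(raw_label):
    """Map a raw gender-guesser label to the shared scheme."""
    return GENDER_GUESSER_MAP.get(raw_label.strip().lower(), "Uncertain")


def majority_vote(predictions, min_share=0.5):
    """Combine per-method labels (hypothetical helper).

    predictions maps a method name to a label in the shared scheme.
    "Uncertain" votes are ignored; the winning label must hold more than
    min_share of the remaining votes, otherwise the result is "Uncertain".
    """
    votes = Counter(label for label in predictions.values() if label != "Uncertain")
    if not votes:
        return "Uncertain"
    label, count = votes.most_common(1)[0]
    return label if count / sum(votes.values()) > min_share else "Uncertain"


if __name__ == "__main__":
    example = {
        "gender-guesser": normalise("mostly_female"),
        "method_b": "Female",   # placeholder method names
        "method_c": "Male",
    }
    print(majority_vote(example))
```

A strict threshold means a tie between two methods falls back to Uncertain, which keeps ambiguous names in front of a human annotator rather than resolving them silently.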
A Streamlit proof-of-concept (src/gendertagger/prototype/) was built to explore the annotation workflow and gamification features. It offers five modes:
- Browse & Annotate — filter by status, gender, or name; view classifier predictions per person; annotate with one click (Male / Female / Institution / Uncertain)
- Game Mode — timed annotation with score tracking, streak bonuses, and three difficulty levels (Easy: 10, Medium: 20, Hard: 50 questions)
- Statistics — annotation progress and gender distribution charts
- Leaderboard — top 10 scores for team competition
- Settings — TEI-XML import, CSV/JSON export, classifier management
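The Game Mode mechanics described above (question counts per difficulty, score tracking, streak bonuses, a top-10 leaderboard) could be modelled roughly as follows. Only the question counts and the leaderboard size come from this README. The point values, the bonus cap, the rule that an answer counts when it arrives in time, and all class and function names are assumptions for illustration, not the prototype's actual scoring rules.

```python
from dataclasses import dataclass

# Question counts per difficulty as listed in the README.
QUESTIONS_PER_LEVEL = {"easy": 10, "medium": 20, "hard": 50}

# Assumed scoring constants (not taken from the project).
BASE_POINTS = 10
STREAK_BONUS = 2        # extra points per preceding consecutive answer
MAX_STREAK_BONUS = 10   # cap on the bonus


@dataclass
class GameRound:
    difficulty: str
    score: int = 0
    streak: int = 0
    answered: int = 0

    @property
    def finished(self):
        return self.answered >= QUESTIONS_PER_LEVEL[self.difficulty]

    def record(self, answered_in_time):
        """Register one question and return the points it earned."""
        if self.finished:
            raise ValueError("round is already complete")
        self.answered += 1
        if not answered_in_time:
            self.streak = 0          # a timeout breaks the streak
            return 0
        bonus = min(self.streak * STREAK_BONUS, MAX_STREAK_BONUS)
        self.streak += 1
        self.score += BASE_POINTS + bonus
        return BASE_POINTS + bonus


def leaderboard(scores, size=10):
    """Return the top entries from (player, score) pairs, highest first."""
    return sorted(scores, key=lambda entry: entry[1], reverse=True)[:size]


if __name__ == "__main__":
    game = GameRound("easy")
    for in_time in (True, True, False, True):
        game.record(in_time)
    print(game.score, game.finished)
```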
The project is being refactored into a decoupled client-server architecture:
- Backend (src/gendertagger/backend/): a FastAPI REST API exposing classification endpoints, TEI-XML import/export, annotation management, and user/session handling. All nine classifiers will be accessible through a unified API.
- Frontend (src/gendertagger/frontend/): a Vue.js single-page application providing the gamified annotation UI, statistics dashboards, and team leaderboards.
This separation enables independent scaling, easier testing, and a richer interactive frontend.
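One way to make several classifiers reachable through a unified API is to give them a common interface and a registry that an endpoint handler can delegate to. The sketch below shows that idea in plain Python, without FastAPI, so that it stays self-contained. `Prediction`, `Classifier`, `LookupClassifier`, and `ClassifierRegistry` are hypothetical names and do not describe the planned backend's real design.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Prediction:
    method: str
    label: str                       # Male / Female / Institution / Uncertain
    confidence: Optional[float] = None


class Classifier(Protocol):
    """Anything with a name and a classify() method fits the interface."""
    name: str

    def classify(self, forename: str) -> Prediction: ...


class LookupClassifier:
    """Toy dictionary-backed classifier standing in for a real method."""

    def __init__(self, name, table):
        self.name = name
        self._table = {key.lower(): value for key, value in table.items()}

    def classify(self, forename):
        label = self._table.get(forename.strip().lower(), "Uncertain")
        return Prediction(self.name, label)


class ClassifierRegistry:
    """Holds classifiers by name so a single entry point can run all of them."""

    def __init__(self):
        self._classifiers = {}

    def register(self, classifier: Classifier):
        self._classifiers[classifier.name] = classifier

    def classify_all(self, forename):
        return [c.classify(forename) for c in self._classifiers.values()]


if __name__ == "__main__":
    registry = ClassifierRegistry()
    registry.register(LookupClassifier("toy-lookup", {"Anna": "Female"}))
    print(registry.classify_all("anna"))
```

With this shape, a REST handler would only need to call `classify_all` and serialise the resulting predictions, and adding a method would mean registering one more object rather than changing the endpoint.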
gendertagger/
├── data/
│ ├── output/ # Outputs and annotation database
│ │ ├── annotations.db # SQLite annotation database
│ │ └── results/ # Classification results (9 CSVs)
│ ├── processed/ # Processed datasets (e.g. persons CSV)
│ └── raw/ # Raw source data
│ ├── GND/ # GND authority MARC XML
│ ├── HIVOTO/ # HIVOTO v1.0.0 dataset
│ ├── schleiermacher/ # TEI-XML edition data
│ │ ├── Briefe/ # Letters (1774–1834)
│ │ ├── Register/ # Person/place/work registers
│ │ └── Thesaurus/ # SKOS/RDF thesaurus
│ └── wiki_gendersort/ # Wiki-Gendersort name database
├── docs/ # Documentation
│ └── references/ # Reference docs for tools and APIs
├── methods/ # 9 standalone classification scripts (test implementations)
├── notebooks/ # Jupyter notebooks for analysis
├── scripts/ # Benchmark, agreement and poster scripts
├── src/
│ └── gendertagger/ # Main Python package
│ ├── backend/ # Backend logic
│ │ ├── classification/ # Classifier implementations
│ │ ├── config/ # Configuration
│ │ ├── data/ # TEI-XML parsing and enrichment
│ │ ├── llm/ # LLM integration (prompts, schemas)
│ │ ├── transformations/ # XSLT transformations
│ │ └── utils/ # Utility functions
│ ├── frontend/ # Vue.js frontend (planned)
│ └── prototype/ # Streamlit proof-of-concept app
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
├── LICENSE # MIT
└── README.md
The scripts/ directory provides tools for systematic evaluation:
- benchmark.py — compares all nine methods against GND as ground truth; computes coverage, classification distributions, and accuracy metrics
- agreements.py — inter-annotator agreement: pairwise Cohen's κ, Fleiss' κ, Krippendorff's α
- coverage.py — coverage and classification distribution report across three tiers (binary, substantive, overall)
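For orientation, the kinds of metrics named above can be computed in a few lines of plain Python. This is a minimal sketch with simple textbook definitions: coverage as the share of Male/Female predictions, accuracy over the covered items that have a known reference label, and pairwise Cohen's κ as (p_o − p_e) / (1 − p_e). The function names and these exact definitions are assumptions; the project's scripts may define their tiers and metrics differently.

```python
from collections import Counter

SUBSTANTIVE = ("Male", "Female")


def coverage(predictions):
    """Share of items that received a Male/Female prediction."""
    if not predictions:
        return 0.0
    return sum(p in SUBSTANTIVE for p in predictions) / len(predictions)


def accuracy(predictions, truth):
    """Share of correct predictions among covered items with a known truth."""
    pairs = [(p, t) for p, t in zip(predictions, truth)
             if p in SUBSTANTIVE and t is not None]
    if not pairs:
        return 0.0
    return sum(p == t for p, t in pairs) / len(pairs)


def cohen_kappa(labels_a, labels_b):
    """Pairwise Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    method_a = ["Male", "Male", "Female", "Female"]
    method_b = ["Male", "Male", "Female", "Male"]
    print(coverage(method_a), cohen_kappa(method_a, method_b))
```

In the usage example the two sequences agree on three of four items (p_o = 0.75) with an expected agreement of 0.5, which gives κ = 0.5.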
Poster visualisation scripts (poster_benchmark_chart.py, poster_pies.py, poster_example_badges.py) generate figures for presentations.
- Python 3.10+
- Core packages: streamlit, pandas, plotly, gender-guesser, nomquamgender, transformers, torch, requests
- Clone the repository:
  git clone https://github.com/telota/gendertagger.git
  cd gendertagger
- Install dependencies:
  pip install -r requirements.txt
- Launch the Streamlit prototype:
  streamlit run src/gendertagger/prototype/app.py
For the benchmark scripts, additional packages may be required: xgboost, scikit-learn, krippendorff, statsmodels, seaborn, matplotlib.
- Gender classification is based on historical names and uses a binary scheme (Male / Female) plus the categories Uncertain and Institution. Non-binary, trans*, or agender categories are not currently supported.
- Name-based classification has inherent limitations and cultural biases.
- Classifications represent a statistical estimate and do not reflect individual gender identity.
- Results require validation, especially for ambiguous or culturally diverse names.
This work was presented at deRSE26 (German Conference on Research Software Engineering, University of Stuttgart, March 3–5, 2026). The submitted abstract is available here.
This project is part of ongoing research. For questions or collaboration inquiries, please open an issue or contact the project team.
This project is licensed under the MIT License.
Copyright (c) 2026 TELOTA – The Electronic Life Of The Academy.
Developed by the KI-Lab and Gender & Data initiative at TELOTA, Berlin-Brandenburg Academy of Sciences and Humanities (BBAW).
- docs/QUICK_START.md — 30-second setup guide
- docs/UI_GUIDE.md — full user guide for the web application
- docs/INTEGRATION.md — technical integration details
- docs/references/ — reference documentation for tools and APIs