Open-Book-Genome-Project/tags

Open Library Canonical Tags

A community-maintained, controlled vocabulary of typed tags for Open Library and the Open Book Genome Project.

This repository is the canonical source of truth for the tag taxonomy used across Open Library's work records. It replaces the historical "subjects" free-for-all with a structured, typed system that supports taxonomy, hierarchy, i18n, and advanced search.


Background

Open Library has historically lumped everything (genres, places, people, moods, reading levels, catalog codes) into a single flat subjects list. The result is noise: "new-york", "newyork", and "place:new_york" all coexist, and none of them reliably identifies the place.

This project defines a clean set of typed tag categories, each with its own controlled vocabulary, rules for inclusion, and a mapping from the old subject strings to the new canonical tags.


Tag Types

Each type lives in its own directory with rules, criteria, and (where applicable) a controlled vocabulary list. All controlled types also have a machine-readable vocabulary.json consumed by the Tags API.

| Type | Directory | Controlled? | Description |
| --- | --- | --- | --- |
| literary_form | literary_form/ | Strict | Fiction or Nonfiction: the top-level binary |
| genres | genres/ | Managed | Broad stylistic/market category; the emotional promise |
| subgenres | subgenres/ | Managed | Structural framework within a genre; includes parent_genres links |
| content_formats | content_formats/ | Managed | Physical/structural form of the work |
| moods | moods/ | Managed | Emotional resonance felt by the reader |
| audience | audience/ | Managed | Intended readership / age group |
| content_warnings | content_warnings/ | Strict | Flags for potentially distressing content |
| content_features | content_features/ | Managed | Verifiable structural features (maps, index, multiple narrators, etc.) |
| literary_themes | literary_themes/ | Open/growing | Abstract philosophical or emotional drivers |
| literary_tropes | literary_tropes/ | Open/growing | Recognizable narrative devices or archetypes |
| main_topics | main_topics/ | Open/growing | Primary concepts or academic subject matter |
| sub_topics | sub_topics/ | Open/growing | Secondary/supporting concepts |
| things | things/ | Open/growing | Tangible objects physically present in the text |
| people | people/ | Open/growing | Characters and people mentioned |
| places | places/ | Open/growing | Geographic locations and settings |
| times | times/ | Open/growing | Eras, periods, dates |

Example: Wuthering Heights

Before (raw OL subjects — 80+ mixed strings):

form:novel, genre:tragedy, genre:gothic, British and irish fiction, Children's fiction,
Classic Literature, Country homes, Death, Drama, Foundlings, Historical Fiction,
Juvenile fiction, love, orphans, Psychological fiction, Reading Level-Grade 7,
revenge, romance, Yorkshire (England), Pr4172 .w7 2009c, 823/.8, ...

After (typed canonical tags):

literary_form:    [Fiction]
content_formats:  [Novel]
genres:           [Tragedy, Gothic, Classic Literature, Drama, Romance]
subgenres:        [Psychological, Historical, Domestic, English Gothic]
literary_tropes:  [Foundlings, Orphans, Love Triangles, Inheritance and succession]
literary_themes:  [Love, Revenge, Death, Rejection, Man-woman relationships]
main_topics:      [Interpersonal relations, Family life, Social Conditions, Class]
sub_topics:       [Country life, Cousins, Rural families, Landscape, Manners and customs]
people:           [Heathcliff, Catherine Earnshaw]
places:           [Yorkshire (England), England, Country homes]
moods:            [Atmospheric, Melancholic, Foreboding, Bleak, Intense]
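
The before/after above can be read as a lookup from legacy subject strings to (type, tag) pairs. The following is a minimal sketch of that idea, not the repository's migration code: the table LEGACY_TO_CANONICAL and the helpers normalize and migrate are hypothetical names, and the handful of entries are taken from the Wuthering Heights example. Strings with no mapping, such as catalog codes, are returned separately rather than guessed at.

```python
from collections import defaultdict

# Hypothetical mapping table (illustrative only):
# normalized legacy subject -> (tag type, canonical tag).
LEGACY_TO_CANONICAL = {
    "form:novel": ("content_formats", "Novel"),
    "genre:tragedy": ("genres", "Tragedy"),
    "genre:gothic": ("genres", "Gothic"),
    "classic literature": ("genres", "Classic Literature"),
    "love": ("literary_themes", "Love"),
    "revenge": ("literary_themes", "Revenge"),
    "orphans": ("literary_tropes", "Orphans"),
    "yorkshire (england)": ("places", "Yorkshire (England)"),
}


def normalize(subject: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(subject.strip().lower().split())


def migrate(subjects):
    """Split legacy subjects into typed tags and unmapped leftovers."""
    typed = defaultdict(list)
    unmapped = []
    for raw in subjects:
        hit = LEGACY_TO_CANONICAL.get(normalize(raw))
        if hit is None:
            unmapped.append(raw)  # e.g. catalog codes such as "823/.8"
            continue
        tag_type, tag = hit
        if tag not in typed[tag_type]:  # skip duplicates, keep first-seen order
            typed[tag_type].append(tag)
    return dict(typed), unmapped


if __name__ == "__main__":
    typed, unmapped = migrate(["form:novel", "genre:gothic", "love", "823/.8"])
    print(typed)
    print(unmapped)
```

Keeping the unmapped strings visible, instead of dropping them, makes it easier to decide later whether each one deserves a canonical tag or should be discarded.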

API

The api/ directory contains a FastAPI microservice for autocomplete tag search. It powers the OL editing UI and patron-facing tag lookups.

pip install -r api/requirements.txt
uvicorn api.main:app --reload
# → http://localhost:8000/docs

Key endpoints:

  • GET /v1/tags?q=hor&type=genres — autocomplete
  • GET /v1/types — list all types
  • GET /v1/types/subgenres/gothic — single tag lookup
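
As a rough illustration of calling these endpoints from Python, the sketch below builds the request URLs with the standard library only. The helper names (autocomplete_url, tag_url, fetch_json) are hypothetical, the base URL assumes the local uvicorn server started above, and no assumption is made about the shape of the JSON the service returns.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the service is running locally via the uvicorn command above.
BASE_URL = "http://localhost:8000"


def autocomplete_url(q, tag_type=None, base=BASE_URL):
    """Build the URL for GET /v1/tags, optionally filtered by tag type."""
    params = {"q": q}
    if tag_type is not None:
        params["type"] = tag_type
    return f"{base}/v1/tags?{urlencode(params)}"


def tag_url(tag_type, slug, base=BASE_URL):
    """Build the URL for GET /v1/types/<type>/<slug>."""
    return f"{base}/v1/types/{quote(tag_type, safe='')}/{quote(slug, safe='')}"


def fetch_json(url, timeout=5.0):
    """Fetch a URL and decode the body as JSON (response schema not assumed)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(autocomplete_url("hor", "genres"))
    print(tag_url("subgenres", "gothic"))
```

With the server running, fetch_json(autocomplete_url("hor", "genres")) would return whatever the autocomplete endpoint responds with; the interactive docs at /docs describe the actual response fields.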

Scripts

The scripts/ directory contains tooling for migrating legacy OL subjects to canonical typed tags and for generating the combined tags.json snapshot.

# Generate tags.json from all vocabulary.json files
python scripts/dump_tags.py

# Migrate a single OL work's subjects
python scripts/migrate_subjects.py --work OL82563W --dry-run
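
To give a feel for what a combined snapshot involves, here is a minimal sketch of merging per-type vocabulary.json files into one tags.json. It is not the code in scripts/dump_tags.py: the functions build_snapshot and write_snapshot are hypothetical, and the sketch assumes only that each type directory may hold a vocabulary.json containing valid JSON. The contents are copied through unchanged, keyed by directory name, so nothing is assumed about the fields described in SCHEMA.md.

```python
import json
from pathlib import Path


def build_snapshot(root):
    """Collect every <type>/vocabulary.json under root, keyed by type directory."""
    snapshot = {}
    for vocab_path in sorted(Path(root).glob("*/vocabulary.json")):
        with vocab_path.open(encoding="utf-8") as fh:
            # Pass the parsed content through as-is; no schema is assumed here.
            snapshot[vocab_path.parent.name] = json.load(fh)
    return snapshot


def write_snapshot(root, out_name="tags.json"):
    """Write the combined snapshot to <root>/<out_name> and return its path."""
    out_path = Path(root) / out_name
    text = json.dumps(build_snapshot(root), indent=2, ensure_ascii=False, sort_keys=True)
    out_path.write_text(text + "\n", encoding="utf-8")
    return out_path


if __name__ == "__main__":
    print(write_snapshot("."))
```

Directories without a vocabulary.json are simply skipped, which matches the note above that only controlled types ship a machine-readable vocabulary.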

Contributing

See CONTRIBUTING.md for how to propose new tags, correct mappings, or update controlled vocabularies.

Each type directory has a proposals.md file listing pending proposals, accepted additions, rejected terms (with reasoning), and deferred ideas. Check there before opening an issue.

See AGENTS.md for the taxonomy's core decision rules — useful both for human contributors and for AI-assisted proposal review.

See SCHEMA.md for the vocabulary.json field reference.

See ROADMAP.md for the project plan.
