This guide explains the best strategies for updating your Firestore documentation index when docs change.
For most documentation updates, use incremental updates because they are:
- ⚡ Faster (only processes changed files)
- 💰 Cost-effective (fewer Firestore API calls and OpenAI embedding requests)
- 🔄 Efficient (maintains existing records)
- 🛡️ Safer (less risk of data loss)
python update_docs.py --mode incremental

- Only processes files that have changed
- Deletes records for removed files
- Updates records for modified files
- Keeps unchanged records intact
python update_docs.py --mode full

- Deletes ALL existing records
- Processes ALL documentation files
- Use when you need a complete rebuild
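
The two commands above differ only in the value passed to `--mode`. As a minimal sketch of how such a flag might be parsed (the function name `parse_args` and the `incremental` default are assumptions, not confirmed details of `update_docs.py`):

```python
import argparse


def parse_args(argv=None):
    """Parse the --mode flag. The 'incremental' default is an assumption."""
    parser = argparse.ArgumentParser(description="Update the documentation index")
    parser.add_argument(
        "--mode",
        choices=["incremental", "full"],
        default="incremental",  # assumed default, not confirmed by the script
        help="incremental: only changed files; full: delete and rebuild everything",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of the real command line.
args = parse_args(["--mode", "full"])
print(args.mode)  # full
```

Restricting the flag with `choices` means a typo such as `--mode ful` fails immediately instead of silently falling back to another mode.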
The script tracks file changes using:
- MD5 hashes of file content
- Modification timestamps
- File sizes
- State persistence in docs_state.json
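
To make the tracking signals above concrete, here is a minimal sketch of hash-based change detection. The function names, the `*.md` file pattern, and the layout of the state file are assumptions for illustration; the actual script may store its state differently.

```python
import hashlib
import json
from pathlib import Path


def file_fingerprint(path: Path) -> dict:
    """Return the three signals listed above: MD5 hash, mtime, and size."""
    stat = path.stat()
    return {
        "md5": hashlib.md5(path.read_bytes()).hexdigest(),
        "mtime": stat.st_mtime,
        "size": stat.st_size,
    }


def detect_changes(docs_dir: Path, state_file: Path):
    """Compare files on disk with the saved state.

    Returns (added, modified, removed, new_state).
    """
    old_state = json.loads(state_file.read_text()) if state_file.exists() else {}
    new_state = {
        p.relative_to(docs_dir).as_posix(): file_fingerprint(p)
        for p in sorted(docs_dir.rglob("*.md"))  # assumed file pattern
    }
    added = [f for f in new_state if f not in old_state]
    removed = [f for f in old_state if f not in new_state]
    # The content hash decides; mtime can change without any real edit.
    modified = [
        f for f in new_state
        if f in old_state and new_state[f]["md5"] != old_state[f]["md5"]
    ]
    return added, modified, removed, new_state
```

After a successful run, the caller would write `new_state` back to the state file with `json.dumps` so the next run compares against it. Files in `removed` map to record deletions, and files in `added` or `modified` map to re-embedding and upload.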
- ✅ Regular documentation updates
- ✅ Adding new pages
- ✅ Editing existing content
- ✅ Minor structural changes
- ✅ Daily/weekly updates
- 🔄 Major restructuring
- 🔄 Changing chunking logic
- 🔄 Schema changes in metadata
- 🔄 After long periods without updates
- 🔄 Troubleshooting index issues
| Mode | Speed | Firestore Calls | OpenAI Calls | Risk | Use Case |
|---|---|---|---|---|---|
| Incremental | ⚡ Fast | 🟢 Low | 🟢 Low | 🟢 Safe | Regular updates |
| Full Refresh | 🐌 Slow | 🔴 High | 🔴 High | 🟡 Medium | Major changes |
python update_docs.py --mode full

rm docs_state.json
python update_docs.py --mode incremental

# Check the state file
cat docs_state.json

- Run incremental updates frequently (daily/weekly)
- Use full refresh sparingly (monthly/quarterly)
- Monitor the state file for consistency
- Test changes in a staging environment first
- Keep backups of your Firestore data
You can modify these settings in the script:
- REPO_FOLDER: Documentation directory
- STATE_FILE: Change tracking file
- BATCH_SIZE: Firestore upload batch size
- MAX_CHUNK_SIZE: Maximum chunk size
- FIRESTORE_COLLECTION: Firestore collection name (default: "knowledge_base")
- OPENAI_EMBEDDING_MODEL: OpenAI embedding model (default: "text-embedding-3-small")
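
To illustrate what a setting like BATCH_SIZE controls, the sketch below splits a list of chunk records into upload batches. The helper name `batched` and the value 100 are hypothetical; the script's real default is not stated in this guide.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 100  # hypothetical value; the script's actual default is not stated


def batched(items: Sequence[dict], batch_size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


chunks = [{"id": f"chunk-{i}"} for i in range(250)]
sizes = [len(batch) for batch in batched(chunks)]
print(sizes)  # [100, 100, 50]
```

Each yielded batch would then be committed to Firestore in one request, so a larger batch size means fewer round trips per update.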
The script requires the following environment variables:
- FIRESTORE_PROJECT_ID: Your Google Cloud Firestore project ID
- FIRESTORE_CREDENTIALS_JSON: Service account credentials as JSON string
- OPENAI_API_KEY: Your OpenAI API key for generating embeddings
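
Because the credentials are passed as a JSON string, a malformed value is a common source of startup failures. The following sketch shows one way the three variables listed above could be read and validated before any API call is made. The function name `load_settings` and the returned dictionary keys are assumptions, not the script's actual code.

```python
import json
import os

REQUIRED_VARS = ("FIRESTORE_PROJECT_ID", "FIRESTORE_CREDENTIALS_JSON", "OPENAI_API_KEY")


def load_settings(env=os.environ) -> dict:
    """Read the required variables and fail early with a clear message."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    try:
        # The credentials arrive as a JSON string, so parse them up front.
        credentials = json.loads(env["FIRESTORE_CREDENTIALS_JSON"])
    except json.JSONDecodeError as exc:
        raise RuntimeError("FIRESTORE_CREDENTIALS_JSON is not valid JSON") from exc
    return {
        "project_id": env["FIRESTORE_PROJECT_ID"],
        "credentials": credentials,
        "openai_api_key": env["OPENAI_API_KEY"],
    }
```

Accepting `env` as a parameter keeps the function easy to test with a plain dictionary instead of the real environment.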
export FIRESTORE_PROJECT_ID="your-project-id"
export FIRESTORE_CREDENTIALS_JSON='{"type": "service_account", "project_id": "your-project-id", ...}'
export OPENAI_API_KEY="sk-your-openai-key"

The script creates documents in the knowledge_base collection with this structure:
{
"id": "filename-header-0",
"text": "Documentation content...",
"embedding": [0.1, 0.2, ...],
"chunk_index": 0,
"source_file": "get-started/quickstart.md",
"header": "Getting Started",
"docs_url": "https://docs.dreamflow.com/get-started/quickstart#getting-started",
"category": "Dreamflow Documentation",
"created_at": "2024-01-01T00:00:00Z"
}

The script provides detailed logging:
- Files detected as changed
- Number of chunks processed
- OpenAI embedding generation progress
- Firestore upload progress
- Error handling
Monitor these logs to ensure updates are working correctly.
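
When checking logged uploads against Firestore, it helps to know how a record's fields relate to its source file. The sketch below builds a record shaped like the document structure shown earlier. The ID scheme and the slug rules are assumptions inferred from the example values (`source_file`, `header`, and `docs_url`), not the script's confirmed logic.

```python
import re
from datetime import datetime, timezone

DOCS_BASE_URL = "https://docs.dreamflow.com"  # taken from the example document


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_record(source_file: str, header: str, chunk_index: int,
                 text: str, embedding: list) -> dict:
    """Assemble one Firestore document for a single chunk."""
    page = source_file[:-3] if source_file.endswith(".md") else source_file
    anchor = slugify(header)
    return {
        "id": f"{slugify(page)}-{anchor}-{chunk_index}",  # assumed ID scheme
        "text": text,
        "embedding": embedding,
        "chunk_index": chunk_index,
        "source_file": source_file,
        "header": header,
        "docs_url": f"{DOCS_BASE_URL}/{page}#{anchor}",
        "category": "Dreamflow Documentation",
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


record = build_record("get-started/quickstart.md", "Getting Started", 0,
                      "Documentation content...", [0.1, 0.2])
print(record["docs_url"])  # https://docs.dreamflow.com/get-started/quickstart#getting-started
```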
The repository includes automated deployment workflows:
- Trigger: Push to main or develop branches
- Workflow: .github/workflows/update-docs-dev.yml
- Environment: Dev Firestore project
- Secrets Required:
  - DEV_FIRESTORE_PROJECT_ID
  - DEV_FIRESTORE_CREDENTIALS_JSON
  - OPENAI_API_KEY

- Trigger: Push to main branch
- Workflow: .github/workflows/update-docs-prod.yml
- Environment: Prod Firestore project
- Secrets Required:
  - PROD_FIRESTORE_PROJECT_ID
  - PROD_FIRESTORE_CREDENTIALS_JSON
  - OPENAI_API_KEY
- Go to your repository Settings → Secrets and variables → Actions
- Add the following secrets:
  - DEV_FIRESTORE_PROJECT_ID: Your dev Firestore project ID
  - DEV_FIRESTORE_CREDENTIALS_JSON: Complete service account JSON
  - PROD_FIRESTORE_PROJECT_ID: Your prod Firestore project ID
  - PROD_FIRESTORE_CREDENTIALS_JSON: Complete service account JSON
  - OPENAI_API_KEY: Your OpenAI API key
- Document writes: ~$0.18 per 100K operations
- Document reads: ~$0.06 per 100K operations
- Storage: ~$0.18 per GB per month
- text-embedding-3-small: ~$0.02 per 1M tokens
- Typical documentation: ~100-500 tokens per chunk
- Use incremental updates to minimize API calls
- Batch operations to reduce Firestore write costs
- Monitor usage in Google Cloud Console
- Set up billing alerts for cost control
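
The approximate rates listed above can be turned into a rough per-run estimate. The sketch below compares a full refresh with an incremental update; the chunk counts are made-up inputs, and the model assumes one Firestore write per chunk while ignoring deletes, reads, and storage.

```python
# Approximate rates quoted in this guide (USD); check current pricing before relying on them.
FIRESTORE_WRITE_PER_100K = 0.18
EMBEDDING_PER_1M_TOKENS = 0.02


def estimate_update_cost(num_chunks: int, avg_tokens_per_chunk: int) -> dict:
    """Rough cost of embedding and writing num_chunks chunks (one write per chunk)."""
    embedding = num_chunks * avg_tokens_per_chunk / 1_000_000 * EMBEDDING_PER_1M_TOKENS
    writes = num_chunks / 100_000 * FIRESTORE_WRITE_PER_100K
    return {
        "embedding_usd": embedding,
        "firestore_write_usd": writes,
        "total_usd": embedding + writes,
    }


# Hypothetical sizes: 2,000 chunks in the whole index, 50 chunks touched by one update.
full = estimate_update_cost(num_chunks=2000, avg_tokens_per_chunk=500)
incremental = estimate_update_cost(num_chunks=50, avg_tokens_per_chunk=500)
print(f"full refresh: ${full['total_usd']:.4f}")
print(f"incremental:  ${incremental['total_usd']:.5f}")
```

With these inputs both modes cost well under a dollar per run, but the incremental run is about 40 times cheaper, and that gap grows with the size of the index and the frequency of updates.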