CLI tool for ingesting twitterapi.io tweets and media into a local SQLite database. Built with oclif.
- Node.js >= 18
- macOS (primary target)
- A twitterapi.io API key with active account credits (pay-as-you-go: $0.15/1k tweets, $0.18/1k profiles)
- Network access to `api.twitterapi.io` (outbound HTTPS on port 443)
```sh
brew tap 0xsend/backscraxer
brew install backscraxer
```
The formula installs a prebuilt Bun executable from GitHub Releases (no npm link/global install required).
Homebrew verifies SHA-256 checksums for each release asset.
Tap repository: https://github.com/0xsend/homebrew-backscraxer.
Note: the Homebrew install works only after a tagged release has uploaded the three platform binaries.
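If you want to double-check an asset yourself, checksum verification can be sketched like this (the file here is a placeholder created on the fly; in practice `asset` is the downloaded release binary and `published_sha` comes from the GitHub Release page):

```sh
# Demo: verify a downloaded asset against a published SHA-256 checksum.
asset=$(mktemp)
printf 'example asset contents\n' > "$asset"

# Use whichever SHA-256 tool the platform provides
if command -v sha256sum >/dev/null 2>&1; then
  actual_sha=$(sha256sum "$asset" | awk '{print $1}')
else
  actual_sha=$(shasum -a 256 "$asset" | awk '{print $1}')
fi

published_sha="$actual_sha"   # substitute the checksum published with the release
[ "$actual_sha" = "$published_sha" ] && echo "checksum OK"
rm -f "$asset"
```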
```sh
git clone https://github.com/0xsend/backscraXer.git backscraxer
cd backscraxer
npm install && npm run build
npm link
```
`pkg` is archived and no longer maintained. Use Bun compile for single-file binaries.
```sh
# Build host-target executable at ./dist/backscraxer
npm run build

# Run directly
./dist/backscraxer docs:get-endpoints --format json
```
Cross-target examples:
```sh
bun run ./scripts/build-bun-executable.ts --target bun-linux-x64 --outfile ./dist/backscraxer-linux-x64
bun run ./scripts/build-bun-executable.ts --target bun-darwin-arm64 --outfile ./dist/backscraxer-darwin-arm64
```
Install the agent skill (Codex/Claude/Cursor):
```sh
backscraxer install
# or target explicitly:
backscraxer install --target codex
```
If you are using only the standalone Bun binary, use:
```sh
./dist/backscraxer install
```
When using `bash ./scripts/install-skill.sh`, if `backscraxer` is not in PATH and `./dist/backscraxer` exists, the installer can optionally create `~/.local/bin/backscraxer` and append a PATH block to your shell profile.
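That optional shim step can be sketched as follows. This is an illustration, not the installer's actual code; it runs against a temporary directory, whereas the real installer targets `~/.local/bin` and your shell profile:

```sh
# Sketch of the optional PATH-shim step, run against a temp dir for safety.
fake_home=$(mktemp -d)
bin_dir="$fake_home/.local/bin"
profile="$fake_home/.zprofile"
dist_binary=$(mktemp)            # stands in for ./dist/backscraxer
chmod +x "$dist_binary"

mkdir -p "$bin_dir"
ln -sf "$dist_binary" "$bin_dir/backscraxer"

# Append a PATH block only if one is not already present
if ! grep -q 'local/bin' "$profile" 2>/dev/null; then
  printf '\n# backscraxer\nexport PATH="$HOME/.local/bin:$PATH"\n' >> "$profile"
fi

[ -x "$bin_dir/backscraxer" ] && echo "shim installed"
```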
Recommended one-time setup:
```sh
# Works for npm-installed CLI or Bun standalone executable
backscraxer setup
# or:
./dist/backscraxer setup
```
`setup` initializes local defaults:
- DB: `~/.backscraxer/data.db`
- Media dir: `~/.backscraxer/media`
- Optional saved key file: `~/.backscraxer/session_env.sh`
It prompts for TWITTERAPI_IO_KEY (hidden input) and can save it for future runs.
You can also set the key manually in your shell:
```sh
export TWITTERAPI_IO_KEY=your-api-key-here
```
API commands resolve keys in this order:
1. `TWITTERAPI_IO_KEY` from the environment
2. Saved session file (`~/.backscraxer/session_env.sh`)
3. Saved agent skill files (`~/.codex/skills/backscraxer/session_env.sh`, etc.)
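The resolution order above can be sketched as a small shell function (a toy stand-in for the CLI's internals, using a temp file in place of the real session file):

```sh
# Sketch of the documented key resolution order, using a temp session file.
session_file=$(mktemp)
printf 'export TWITTERAPI_IO_KEY=key-from-session-file\n' > "$session_file"

resolve_key() {
  # 1. Environment wins
  if [ -n "${TWITTERAPI_IO_KEY:-}" ]; then
    echo "$TWITTERAPI_IO_KEY"
    return 0
  fi
  # 2. Fall back to the saved session file
  if [ -f "$session_file" ]; then
    . "$session_file"
    echo "$TWITTERAPI_IO_KEY"
    return 0
  fi
  # 3. (Agent skill files would be checked here the same way.)
  return 1
}

unset TWITTERAPI_IO_KEY
resolved=$(resolve_key)
echo "resolved: $resolved"
rm -f "$session_file"
```

With the environment unset, the function falls through to the saved file, matching the documented precedence.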
Run preflight checks:
```sh
bash ./scripts/check-env.sh --mode api
# Machine-readable output:
bash ./scripts/check-env.sh --mode api --format json --checks cli,key,network
```
If you run a standalone binary without adding it to PATH, pass it explicitly:
```sh
BACKSCRAXER_BIN=./dist/backscraxer bash ./scripts/check-env.sh --mode api --format json --checks cli,key,network
```
All data is stored locally in `~/.backscraxer/data.db` by default. Override with `--db /path/to/file.db`.
In sandboxed environments, prefer a writable local path like --db ./tmp/backscraxer.db or --db /tmp/backscraxer.db.
Enable namespaced debug logs with the DEBUG environment variable:
```sh
# Enable all debug logs
DEBUG=backscraxer:* backscraxer db:stats

# Enable only API + ingest logs
DEBUG=backscraxer:api,backscraxer:ingest backscraxer ingest:user --user-name nasa

# Exclude a noisy namespace
DEBUG=backscraxer:*,-backscraxer:media backscraxer ingest:user --user-name nasa --with-media
```
Debug logs are emitted to stderr only, so stdout output contracts (JSON/table summaries) remain stable for scripts and pipelines.
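The stderr/stdout split is what keeps pipelines safe. A minimal demo (the `emit` function below is a toy stand-in, not the CLI itself):

```sh
# Demo of the stderr/stdout split: debug chatter on stderr, data on stdout.
emit() {
  echo "backscraxer:api fetching page 1" >&2   # debug log -> stderr
  echo '{"tweets": 5}'                          # JSON contract -> stdout
}

# Capture only stdout; stderr is discarded and the JSON survives intact
json=$(emit 2>/dev/null)
echo "$json"
```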
Single-user-first default for agent prompts:
```
/backscraxer fetch the most recent 5 tweets from @send2vic
```
All fetch commands follow the DB-first workflow: API fetch -> DB persist -> DB query -> output. Returned results are always derived from the local database, never direct API passthrough.
Setup-intent routing for agents: `/backscraxer install`, `/backscraxer setup`, `/backscraxer configure`.
These should run the installer workflow (`backscraxer install` or `bash ./scripts/install-skill.sh`), then run `bash ./scripts/check-env.sh --mode api`.
Mode split for agents:
- Live fetch mode (`fetch:*`, `ingest:*`): requires CLI + key + network.
- Offline analysis mode (`report:users`, `db:*`, `docs:get-endpoints`): works without key/network.
Before live fetch commands, run:
```sh
bash ./scripts/check-env.sh --mode api --format json --checks cli,key,network
```
If the key is missing, prefer this recovery path first:
```sh
source ./scripts/source-session-env.sh
```
For setting environment variables, initializing the DB, installing the skill into Claude/Codex/Cursor, and invoking this CLI through agent skills, see: `docs/agent-setup-guide.md`
Initialize local storage and optional API key persistence for standalone and npm installs.
```sh
backscraxer setup
backscraxer setup --db ./data/twitter.db --media-dir ./data/media
backscraxer setup --key your-api-key --no-prompt-key
```
Install the backscraxer skill files for Codex/Claude/Cursor.
```sh
backscraxer install
backscraxer install --target codex
backscraxer install --target claude --target cursor
```
When `--target` is omitted in an interactive terminal, the CLI prompts for target selection.
List available twitterapi.io GET endpoints and their metadata.
```sh
backscraxer docs:get-endpoints
backscraxer docs:get-endpoints --format json
```
Ingest tweets from a user's timeline.
```sh
# Ingest recent tweets for a user
backscraxer ingest:user --user-name nasa

# With date range and limit
backscraxer ingest:user --user-name elonmusk \
  --from 2025-01-01T00:00:00Z --to 2025-06-01T00:00:00Z \
  --limit 500

# With media downloads
backscraxer ingest:user --user-name nasa --with-media

# Using user ID instead
backscraxer ingest:user --user-id 11348282
```
Fetch tweets from a user timeline: ingest from API, persist to DB, return DB-derived rows.
Default limit is 5 tweets when --limit is omitted.
```sh
# Fetch the most recent 5 tweets from a user (default)
backscraxer fetch:user --user-name send2vic

# With @ prefix (auto-stripped)
backscraxer fetch:user --user-name @send2vic

# JSON output with custom limit
backscraxer fetch:user --user-name send2vic --limit 10 --format json

# With date range
backscraxer fetch:user --user-name send2vic \
  --from 2025-01-01T00:00:00Z --to 2025-06-01T00:00:00Z
```
Workflow: API fetch -> DB persist -> DB query -> output. Output rows are always derived from the local database, never direct API passthrough.
Fetch tweets for multiple users: resolve profiles, ingest per-user, return DB-derived rows.
Multi-user is explicit opt-in; --max-users defaults to 1.
```sh
# Single user by handle (default max-users=1)
backscraxer fetch:users --user-name send2vic

# Multiple users with explicit opt-in
backscraxer fetch:users --user-name send2vic --user-name nasa --max-users 5

# By user IDs (mutually exclusive with --user-name)
backscraxer fetch:users --user-id 123456 --user-id 789012 --max-users 2 --format json
```
Flags `--user-name` and `--user-id` are mutually exclusive. Handles are normalized (leading @ stripped, deduplicated).
Per-user tweet ingestion runs sequentially to control API quota usage. Partial failures are reported
in the output summary; at least one successful user yields exit code 0.
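The sequential loop with partial-failure accounting can be sketched as follows (`ingest_one` is a hypothetical stand-in for the real per-user ingest call, with hard-coded outcomes for illustration):

```sh
# Sketch of the sequential multi-user loop with partial-failure accounting.
ingest_one() {
  case "$1" in
    send2vic|nasa) return 0 ;;   # simulate success
    *)             return 1 ;;   # simulate failure
  esac
}

ok=0; failed=0
for user in send2vic nasa suspended_user; do
  if ingest_one "$user"; then
    ok=$((ok + 1))
  else
    failed=$((failed + 1))
    echo "warn: ingest failed for $user" >&2
  fi
done

echo "ok=$ok failed=$failed"
# At least one success -> exit 0; otherwise propagate an error status
[ "$ok" -gt 0 ]
```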
Workflow: API fetch -> DB persist -> DB query -> output (same DB-first contract as fetch:user).
Ingest tweets from advanced search.
# Search for tweets
backscraxer ingest:search --query "from:nasa"
# With date range
backscraxer ingest:search --query "climate change" \
--from 2025-01-01T00:00:00Z --to 2025-03-01T00:00:00Z
# Top results instead of latest
backscraxer ingest:search --query "AI safety" --query-type TopReport per-user metrics from persisted DB data. No API key required.
```sh
# Table output (default)
backscraxer report:users

# JSON output
backscraxer report:users --format json

# CSV export
backscraxer report:users --format csv

# With date range for posts_in_range_count
backscraxer report:users --format csv \
  --from 2025-01-01T00:00:00Z --to 2025-06-30T23:59:59Z
```
Output fields: `user_name`, `followers`, `following`, `last_post_1_at`, `last_post_2_at`, `last_post_3_at`, `posts_in_range_count`, `join_month`, `join_year`.
`posts_in_range_count` semantics:
- When `--from` and/or `--to` are provided: inclusive count of tweets in the specified range.
- When neither is provided: `null` (not computed).
- Users with no tweets in range show `0` when a range is provided.
`last_post_1_at` through `last_post_3_at` are the three most recent post timestamps (newest first) from persisted tweets.
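The inclusive range count can be reproduced directly against the database. This sketch uses a toy schema (table and column names are illustrative, not necessarily the CLI's real schema):

```sh
# Reproduce posts_in_range_count with an inclusive BETWEEN (toy schema).
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE tweets (user_name TEXT, created_at TEXT);
INSERT INTO tweets VALUES
  ('send2vic', '2025-01-15T12:00:00Z'),
  ('send2vic', '2025-05-01T08:30:00Z'),
  ('send2vic', '2025-07-04T00:00:00Z');  -- outside the range below
SQL

count=$(sqlite3 "$db" "SELECT COUNT(*) FROM tweets
  WHERE user_name = 'send2vic'
    AND created_at BETWEEN '2025-01-01T00:00:00Z' AND '2025-06-30T23:59:59Z';")
echo "posts_in_range_count=$count"
rm -f "$db"
```

ISO 8601 UTC timestamps sort lexically, so `BETWEEN` on the text columns gives the inclusive date-range semantics described above.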
Show aggregate statistics for the local database.
```sh
backscraxer db:stats
backscraxer db:stats --format json
backscraxer db:stats --db /path/to/custom.db
```
Delete tweets and associated data before a given date. Safe by default (dry-run).
```sh
# Preview what would be deleted (no changes)
backscraxer db:prune --before 2024-01-01T00:00:00Z

# Actually delete
backscraxer db:prune --before 2024-01-01T00:00:00Z --apply

# Also remove media files from disk
backscraxer db:prune --before 2024-01-01T00:00:00Z --apply --delete-media-files
```

| Flag | Description | Default |
|---|---|---|
| `--db`, `-d` | SQLite database path | `~/.backscraxer/data.db` |
| `--from` | Start date (inclusive, ISO 8601) | none |
| `--to` | End date (inclusive, ISO 8601) | none |
| `--limit`, `-l` | Max in-range tweets to ingest | unlimited (fetch:user defaults to 5) |
| `--with-media` | Download media files | false |
| `--out-media-dir` | Media download directory | `~/.backscraxer/media` |
| `--resume` / `--no-resume` | Resume from checkpoint | true |
| `--format`, `-f` | Output format (table or json; report:users also supports csv) | table |
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Unexpected internal error |
| 2 | Usage/flag validation error |
| 3 | Configuration/environment error (e.g. missing API key) |
| 4 | Authentication error (401/403) |
| 5 | Rate limit exhausted (429) |
| 6 | Network failure |
| 7 | Upstream API error (including 402 insufficient credits) |
| 8 | Database/filesystem error |
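For scripting, the HTTP-status-to-exit-code relationship in the table can be encoded as a lookup. This is an illustrative helper, not the CLI's internal implementation:

```sh
# Illustrative mapping from upstream HTTP status to the documented exit codes.
exit_code_for_status() {
  case "$1" in
    401|403) echo 4 ;;   # authentication error
    429)     echo 5 ;;   # rate limit exhausted
    402|5*)  echo 7 ;;   # upstream API error (incl. insufficient credits)
    2*)      echo 0 ;;   # success
    *)       echo 1 ;;   # unexpected internal error
  esac
}

echo "401 -> $(exit_code_for_status 401)"
echo "429 -> $(exit_code_for_status 429)"
echo "402 -> $(exit_code_for_status 402)"
```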
All `ingest:*` and `fetch:*` commands require outbound HTTPS access to `api.twitterapi.io:443`.
Test connectivity:
```sh
curl -I https://api.twitterapi.io
```
Verify your API key and account status:
```sh
# Should return user info JSON, not a 401/402/403 error
curl -H "X-API-Key: $TWITTERAPI_IO_KEY" \
  "https://api.twitterapi.io/twitter/user/info?userName=nasa"
```
Common errors:
- HTTP 401 Unauthorized: API key invalid or missing
- HTTP 402 Payment Required: Account out of credits — recharge at twitterapi.io
- HTTP 403 Forbidden: API key disabled or account suspended
- HTTP 429 Too Many Requests: rate limit exceeded (default allowance: 1000+ req/sec)
Codex and Claude sandboxes often have restricted outbound network access. If outbound HTTPS to api.twitterapi.io is blocked, API commands will fail.
1. Ingest data on your local machine (where network access works):
   ```sh
   backscraxer ingest:user --user-name nasa \
     --from 2025-01-01T00:00:00Z --to 2025-06-30T23:59:59Z \
     --db ~/data/twitter-analysis.db
   ```
2. Copy the SQLite database into the sandbox:
   - Codex: place the `.db` file in your project directory
   - Claude: upload the `.db` file as a project resource
3. Analyze locally in the sandbox:
   ```sh
   # These work without API access
   backscraxer db:stats --db ./twitter-analysis.db
   sqlite3 ./twitter-analysis.db < src/db/views.sql
   sqlite3 -header -column ./twitter-analysis.db < examples/analysis_queries.sql
   ```
✅ docs:get-endpoints, db:stats, db:prune, report:users (no API calls)
✅ SQLite queries on local .db files
✅ Analytics views and reports
❌ fetch:user, fetch:users, ingest:user, ingest:search (require live API access)
If network access to twitterapi.io is unavailable, you can still:
- Run `db:stats`, `db:prune`, `docs:get-endpoints` (no API calls required)
- Query already-ingested data via SQLite:
  ```sh
  sqlite3 ~/.backscraxer/data.db "SELECT * FROM tweets LIMIT 5;"
  ```
All ingest:* and fetch:* commands require live API access and will fail without network connectivity.
- Date range is authoritative: when `--from`/`--to` and `--limit` are both provided, the date range is the hard boundary and the limit caps only in-range records.
- Checkpoint resume: long-running ingests automatically checkpoint progress. Re-running the same command resumes from where it left off. Use `--no-resume` to start fresh.
- Idempotent writes: re-ingesting the same data window updates existing records without duplication.
- Dry-run pruning: `db:prune` defaults to preview mode. Pass `--apply` to actually delete data.
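The idempotent-write guarantee maps naturally onto a SQLite UPSERT. A minimal sketch with a toy schema (the real schema and conflict targets may differ):

```sh
# Sketch of idempotent tweet writes via SQLite UPSERT (toy schema).
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE tweets (id TEXT PRIMARY KEY, text TEXT);
INSERT INTO tweets (id, text) VALUES ('100', 'first version')
  ON CONFLICT(id) DO UPDATE SET text = excluded.text;
-- Re-ingesting the same tweet updates in place, no duplicate row:
INSERT INTO tweets (id, text) VALUES ('100', 'edited version')
  ON CONFLICT(id) DO UPDATE SET text = excluded.text;
SQL

rows=$(sqlite3 "$db" "SELECT COUNT(*) FROM tweets;")
text=$(sqlite3 "$db" "SELECT text FROM tweets WHERE id = '100';")
echo "rows=$rows text=$text"
rm -f "$db"
```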
```sh
npm install
npm run build
npm run typecheck
npm test
```
- Create the tap repository:
  ```sh
  gh repo create 0xsend/homebrew-backscraxer --public --description "Homebrew tap for backscraxer" --add-readme
  ```
- Add `HOMEBREW_TAP_TOKEN` to the main repo secrets (a PAT with `contents: write` to `0xsend/homebrew-backscraxer`).
- Optional: set the repo variable `HOMEBREW_TAP_REPO` if your tap lives somewhere other than `0xsend/homebrew-backscraxer`.
- Bump version and sync formula metadata:
  ```sh
  npm run version:bump -- patch
  ```
- Optional changelog generation:
  ```sh
  npm run release:changelog -- --release-as <new-version>
  ```
- Commit and create a semver tag:
  ```sh
  git add package.json Formula/backscraxer.rb
  git commit -m "release: v<new-version>"
  git tag v<new-version>
  git push origin main --tags
  ```
- `release.yml` runs on the tag and automatically:
  - Builds `backscraxer-darwin-arm64`, `backscraxer-darwin-x64`, `backscraxer-linux-x64`
  - Uploads the binaries to the GitHub Release
  - Computes SHA-256 checksums
  - Updates `Formula/backscraxer.rb` in the tap repository
```sh
brew update
brew tap 0xsend/backscraxer
brew install backscraxer
backscraxer --help
```

UNLICENSED