backscraxer

CLI tool for ingesting twitterapi.io tweets and media into a local SQLite database. Built with oclif.

Prerequisites

  • Node.js >= 18
  • macOS (primary target)
  • A twitterapi.io API key with active account credits (pay-as-you-go: $0.15/1k tweets, $0.18/1k profiles)
  • Network access to api.twitterapi.io (outbound HTTPS on port 443)

Install

Homebrew (recommended)

brew tap 0xsend/backscraxer
brew install backscraxer

The formula installs a prebuilt Bun executable from GitHub Releases; no npm link or global install is required. Homebrew verifies the SHA-256 checksum of each release asset. Tap repository: https://github.com/0xsend/homebrew-backscraxer. Note: the Homebrew install works only once a tagged release has uploaded all three platform binaries.

npm (alternative)

git clone https://github.com/0xsend/backscraXer.git backscraxer
cd backscraxer
npm install && npm run build
npm link

Standalone Executable (Bun --compile)

pkg is archived and no longer maintained; use Bun's --compile for single-file binaries.

# Build host-target executable at ./dist/backscraxer
npm run build

# Run directly
./dist/backscraxer docs:get-endpoints --format json

Cross-target examples:

bun run ./scripts/build-bun-executable.ts --target bun-linux-x64 --outfile ./dist/backscraxer-linux-x64
bun run ./scripts/build-bun-executable.ts --target bun-darwin-arm64 --outfile ./dist/backscraxer-darwin-arm64

Install the agent skill (Codex/Claude/Cursor):

backscraxer install
# or target explicitly:
backscraxer install --target codex

If you are using only the standalone Bun binary, use:

./dist/backscraxer install

When installing via bash ./scripts/install-skill.sh, if backscraxer is not on your PATH but ./dist/backscraxer exists, the installer can optionally create ~/.local/bin/backscraxer and append a PATH block to your shell profile.

Configuration

Recommended one-time setup:

# Works for npm-installed CLI or Bun standalone executable
backscraxer setup
# or:
./dist/backscraxer setup

setup initializes local defaults:

  • DB: ~/.backscraxer/data.db
  • Media dir: ~/.backscraxer/media
  • Optional saved key file: ~/.backscraxer/session_env.sh

It prompts for TWITTERAPI_IO_KEY (hidden input) and can save it for future runs.

You can also set the key manually in your shell:

export TWITTERAPI_IO_KEY=your-api-key-here

API commands resolve keys in this order:

  1. TWITTERAPI_IO_KEY from environment
  2. Saved session file (~/.backscraxer/session_env.sh)
  3. Saved agent skill files (~/.codex/skills/backscraxer/session_env.sh, etc.)
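The resolution order above can be sketched as follows. This is a minimal illustration, not the CLI's actual implementation; the `read_key` reader and the returned source labels are hypothetical stand-ins.

```python
# Illustrative key resolution: the environment variable wins, then saved
# session files are tried in order. `read_key` is a hypothetical reader
# that returns the key stored at a path, or None.
def resolve_key(env, session_files, read_key):
    if env.get("TWITTERAPI_IO_KEY"):
        return env["TWITTERAPI_IO_KEY"], "environment"
    for path in session_files:
        key = read_key(path)
        if key:
            return key, path
    return None, None

saved = {"~/.backscraxer/session_env.sh": "saved-key"}
print(resolve_key({"TWITTERAPI_IO_KEY": "env-key"}, [], saved.get))
print(resolve_key({}, ["~/.backscraxer/session_env.sh"], saved.get))
```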

Run preflight checks:

bash ./scripts/check-env.sh --mode api
# Machine-readable output:
bash ./scripts/check-env.sh --mode api --format json --checks cli,key,network

If you run a standalone binary without adding it to PATH, pass it explicitly:

BACKSCRAXER_BIN=./dist/backscraxer bash ./scripts/check-env.sh --mode api --format json --checks cli,key,network

All data is stored locally in ~/.backscraxer/data.db by default. Override with --db /path/to/file.db. In sandboxed environments, prefer a writable local path like --db ./tmp/backscraxer.db or --db /tmp/backscraxer.db.

Debug Logging

Enable namespaced debug logs with the DEBUG environment variable:

# Enable all debug logs
DEBUG=backscraxer:* backscraxer db:stats

# Enable only API + ingest logs
DEBUG=backscraxer:api,backscraxer:ingest backscraxer ingest:user --user-name nasa

# Exclude a noisy namespace
DEBUG=backscraxer:*,-backscraxer:media backscraxer ingest:user --user-name nasa --with-media

Debug logs are emitted to stderr only, so stdout output contracts (JSON/table summaries) remain stable for scripts and pipelines.
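The namespace patterns follow the familiar DEBUG-variable conventions (comma-separated patterns, `*` wildcards, `-` exclusions). A rough sketch of the matching rules, illustrative only and not the actual implementation:

```python
import fnmatch

def debug_enabled(namespace: str, debug_env: str) -> bool:
    """Return True if `namespace` is enabled by a DEBUG-style pattern list:
    comma-separated patterns, '*' wildcards, '-' prefix for exclusions."""
    includes, excludes = [], []
    for pat in filter(None, (p.strip() for p in debug_env.split(","))):
        (excludes if pat.startswith("-") else includes).append(pat.lstrip("-"))
    # Exclusions take priority over any matching include pattern
    if any(fnmatch.fnmatch(namespace, pat) for pat in excludes):
        return False
    return any(fnmatch.fnmatch(namespace, pat) for pat in includes)

print(debug_enabled("backscraxer:api", "backscraxer:*"))
print(debug_enabled("backscraxer:media", "backscraxer:*,-backscraxer:media"))
```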

Agent Setup

Single-user-first default for agent prompts:

/backscraxer fetch the most recent 5 tweets from @send2vic

All fetch commands follow the DB-first workflow: API fetch -> DB persist -> DB query -> output. Returned results are always derived from the local database, never direct API passthrough.
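A minimal sketch of that DB-first contract, using an in-memory SQLite database and a hypothetical tweets schema (not the CLI's actual schema):

```python
import sqlite3

# Illustrative DB-first flow: persist fetched rows idempotently,
# then answer the query from the database, not the API payload.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id TEXT PRIMARY KEY, user_name TEXT, "
    "text TEXT, created_at TEXT)"
)

fetched = [("1", "send2vic", "hello", "2025-01-02T00:00:00Z"),
           ("2", "send2vic", "world", "2025-01-03T00:00:00Z")]
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?) "
    "ON CONFLICT(id) DO UPDATE SET text=excluded.text",
    fetched,
)

# Output rows derive from the DB query, newest first
rows = conn.execute(
    "SELECT id, text FROM tweets WHERE user_name=? "
    "ORDER BY created_at DESC LIMIT 5",
    ("send2vic",),
).fetchall()
print(rows)
```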

Setup-intent routing for agents:

  • /backscraxer install
  • /backscraxer setup
  • /backscraxer configure

These should run the installer workflow (backscraxer install or bash ./scripts/install-skill.sh), then run bash ./scripts/check-env.sh --mode api.

Mode split for agents:

  • Live fetch mode (fetch:*, ingest:*): requires CLI + key + network.
  • Offline analysis mode (report:users, db:*, docs:get-endpoints): works without key/network.

Before live fetch commands, run:

bash ./scripts/check-env.sh --mode api --format json --checks cli,key,network

If key is missing, prefer this recovery path first:

source ./scripts/source-session-env.sh

For setting environment variables, initializing DB, installing the skill into Claude/Codex/Cursor, and invoking this CLI through agent skills, see:

  • docs/agent-setup-guide.md

Commands

setup

Initialize local storage and optional API key persistence for standalone and npm installs.

backscraxer setup
backscraxer setup --db ./data/twitter.db --media-dir ./data/media
backscraxer setup --key your-api-key --no-prompt-key

install

Install the backscraxer skill files for Codex/Claude/Cursor.

backscraxer install
backscraxer install --target codex
backscraxer install --target claude --target cursor

When --target is omitted in an interactive terminal, the CLI prompts for target selection.

docs:get-endpoints

List available twitterapi.io GET endpoints and their metadata.

backscraxer docs:get-endpoints
backscraxer docs:get-endpoints --format json

ingest:user

Ingest tweets from a user's timeline.

# Ingest recent tweets for a user
backscraxer ingest:user --user-name nasa

# With date range and limit
backscraxer ingest:user --user-name elonmusk \
  --from 2025-01-01T00:00:00Z --to 2025-06-01T00:00:00Z \
  --limit 500

# With media downloads
backscraxer ingest:user --user-name nasa --with-media

# Using user ID instead
backscraxer ingest:user --user-id 11348282

fetch:user

Fetch tweets from a user timeline: ingest from API, persist to DB, return DB-derived rows. Default limit is 5 tweets when --limit is omitted.

# Fetch the most recent 5 tweets from a user (default)
backscraxer fetch:user --user-name send2vic

# With @ prefix (auto-stripped)
backscraxer fetch:user --user-name @send2vic

# JSON output with custom limit
backscraxer fetch:user --user-name send2vic --limit 10 --format json

# With date range
backscraxer fetch:user --user-name send2vic \
  --from 2025-01-01T00:00:00Z --to 2025-06-01T00:00:00Z

Workflow: API fetch -> DB persist -> DB query -> output. Output rows are always derived from the local database, never direct API passthrough.

fetch:users

Fetch tweets for multiple users: resolve profiles, ingest per-user, return DB-derived rows. Multi-user is explicit opt-in; --max-users defaults to 1.

# Single user by handle (default max-users=1)
backscraxer fetch:users --user-name send2vic

# Multiple users with explicit opt-in
backscraxer fetch:users --user-name send2vic --user-name nasa --max-users 5

# By user IDs (mutually exclusive with --user-name)
backscraxer fetch:users --user-id 123456 --user-id 789012 --max-users 2 --format json

Flags --user-name and --user-id are mutually exclusive. Handles are normalized (leading @ stripped, deduplicated). Per-user tweet ingestion runs sequentially to control API quota usage. Partial failures are reported in the output summary; at least one successful user yields exit code 0.

Workflow: API fetch -> DB persist -> DB query -> output (same DB-first contract as fetch:user).
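The handle normalization described above (leading @ stripped, duplicates dropped) can be sketched as follows; this is an illustration, not the CLI's source:

```python
def normalize_handles(handles):
    """Strip a leading '@' and drop duplicates while preserving order."""
    seen, out = set(), []
    for h in handles:
        h = h.lstrip("@")
        if h and h not in seen:
            seen.add(h)
            out.append(h)
    return out

print(normalize_handles(["@send2vic", "nasa", "send2vic"]))  # ['send2vic', 'nasa']
```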

ingest:search

Ingest tweets from advanced search.

# Search for tweets
backscraxer ingest:search --query "from:nasa"

# With date range
backscraxer ingest:search --query "climate change" \
  --from 2025-01-01T00:00:00Z --to 2025-03-01T00:00:00Z

# Top results instead of latest
backscraxer ingest:search --query "AI safety" --query-type Top

report:users

Report per-user metrics from persisted DB data. No API key required.

# Table output (default)
backscraxer report:users

# JSON output
backscraxer report:users --format json

# CSV export
backscraxer report:users --format csv

# With date range for posts_in_range_count
backscraxer report:users --format csv \
  --from 2025-01-01T00:00:00Z --to 2025-06-30T23:59:59Z

Output fields: user_name, followers, following, last_post_1_at, last_post_2_at, last_post_3_at, posts_in_range_count, join_month, join_year.

posts_in_range_count semantics:

  • When --from and/or --to are provided: inclusive count of tweets in the specified range.
  • When neither is provided: null (not computed).
  • Users with no tweets in range show 0 when a range is provided.

last_post_1_at through last_post_3_at are the newest-to-older post timestamps from persisted tweets.
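These semantics can be sketched against an in-memory SQLite database; the schema and helper below are hypothetical illustrations of the documented behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (user_name TEXT, created_at TEXT)")
conn.executemany("INSERT INTO tweets VALUES (?, ?)", [
    ("nasa", "2025-01-15T00:00:00Z"),
    ("nasa", "2025-02-01T00:00:00Z"),
    ("nasa", "2025-07-01T00:00:00Z"),
])

def posts_in_range(user, start=None, end=None):
    """Inclusive count within [start, end]; None when no range is given."""
    if start is None and end is None:
        return None  # not computed without a range
    count, = conn.execute(
        "SELECT COUNT(*) FROM tweets WHERE user_name=? "
        "AND created_at >= ? AND created_at <= ?",
        (user, start or "", end or "9999"),
    ).fetchone()
    return count

print(posts_in_range("nasa", "2025-01-01T00:00:00Z", "2025-06-30T23:59:59Z"))
print(posts_in_range("nasa"))

# last_post_1_at..last_post_3_at: newest-to-older timestamps
last_posts = [r[0] for r in conn.execute(
    "SELECT created_at FROM tweets WHERE user_name=? "
    "ORDER BY created_at DESC LIMIT 3", ("nasa",),
)]
print(last_posts)
```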

db:stats

Show aggregate statistics for the local database.

backscraxer db:stats
backscraxer db:stats --format json
backscraxer db:stats --db /path/to/custom.db

db:prune

Delete tweets and associated data before a given date. Safe by default (dry-run).

# Preview what would be deleted (no changes)
backscraxer db:prune --before 2024-01-01T00:00:00Z

# Actually delete
backscraxer db:prune --before 2024-01-01T00:00:00Z --apply

# Also remove media files from disk
backscraxer db:prune --before 2024-01-01T00:00:00Z --apply --delete-media-files

Shared Flags

  • --db, -d: SQLite database path (default: ~/.backscraxer/data.db)
  • --from: start date, inclusive, ISO 8601 (default: none)
  • --to: end date, inclusive, ISO 8601 (default: none)
  • --limit, -l: max in-range tweets to ingest (default: unlimited; fetch:user defaults to 5)
  • --with-media: download media files (default: false)
  • --out-media-dir: media download directory (default: ~/.backscraxer/media)
  • --resume / --no-resume: resume from checkpoint (default: true)
  • --format, -f: output format, table or json; report:users also supports csv (default: table)

Exit Codes

  • 0: Success
  • 1: Unexpected internal error
  • 2: Usage/flag validation error
  • 3: Configuration/environment error (e.g. missing API key)
  • 4: Authentication error (401/403)
  • 5: Rate limit exhausted (429)
  • 6: Network failure
  • 7: Upstream API error (including 402 insufficient credits)
  • 8: Database/filesystem error
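The HTTP-derived codes can be summarized as a mapping; this is a sketch of the documented behavior, not the CLI's source:

```python
# Illustrative mapping from upstream HTTP status to the documented exit codes.
def exit_code_for_status(status: int) -> int:
    if status in (401, 403):
        return 4  # authentication error
    if status == 429:
        return 5  # rate limit exhausted
    if status == 402 or status >= 500:
        return 7  # upstream API error (incl. 402 insufficient credits)
    return 0      # success for other 2xx/3xx outcomes

print(exit_code_for_status(402))  # 7
```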

Troubleshooting

Network Access

All ingest:* and fetch:* commands require outbound HTTPS access to api.twitterapi.io:443.

Test connectivity:

curl -I https://api.twitterapi.io

API Key and Credits

Verify your API key and account status:

# Should return user info JSON, not 401/402/403 error
curl -H "X-API-Key: $TWITTERAPI_IO_KEY" \
  "https://api.twitterapi.io/twitter/user/info?userName=nasa"

Common errors:

  • HTTP 401 Unauthorized: API key invalid or missing
  • HTTP 402 Payment Required: Account out of credits — recharge at twitterapi.io
  • HTTP 403 Forbidden: API key disabled or account suspended
  • HTTP 429 Too Many Requests: request rate exceeded (the default limit is 1000+ req/sec)
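On a 429, retrying with exponential backoff is one way to recover on the client side. A generic sketch (not part of the CLI; the call shape is a stand-in):

```python
import time

# Hypothetical retry helper: back off exponentially while the call returns 429.
def with_backoff(call, max_attempts=4, base_delay=0.01):
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ...
    return status, body

attempts = []
def fake_call():
    attempts.append(1)
    return (429, None) if len(attempts) < 3 else (200, "ok")

status, body = with_backoff(fake_call)
print(status, body)
```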

Sandbox Environments (Codex/Claude)

Codex and Claude sandboxes often have restricted outbound network access. If outbound HTTPS to api.twitterapi.io is blocked, API commands will fail.

Workaround Strategy

  1. Ingest data on your local machine (where network access works):

    backscraxer ingest:user --user-name nasa \
      --from 2025-01-01T00:00:00Z --to 2025-06-30T23:59:59Z \
      --db ~/data/twitter-analysis.db
  2. Copy the SQLite database into the sandbox:

    • Codex: Place the .db file in your project directory
    • Claude: Upload the .db file as a project resource
  3. Analyze locally in the sandbox:

    # These work without API access
    backscraxer db:stats --db ./twitter-analysis.db
    sqlite3 ./twitter-analysis.db < src/db/views.sql
    sqlite3 -header -column ./twitter-analysis.db < examples/analysis_queries.sql

What Works in Sandboxes

  • ✅ docs:get-endpoints, db:stats, db:prune, report:users (no API calls)
  • ✅ SQLite queries on local .db files
  • ✅ Analytics views and reports
  • ❌ fetch:user, fetch:users, ingest:user, ingest:search (require live API access)

Offline/Local-Only Usage

If network access to twitterapi.io is unavailable, you can still:

  • Run db:stats, db:prune, docs:get-endpoints (no API calls required)
  • Query already-ingested data via SQLite:
    sqlite3 ~/.backscraxer/data.db "SELECT * FROM tweets LIMIT 5;"

All ingest:* and fetch:* commands require live API access and will fail without network connectivity.

Key Behaviors

  • Date range is authoritative: when --from/--to and --limit are both provided, date range is the hard boundary and limit caps only in-range records.
  • Checkpoint resume: long-running ingests automatically checkpoint progress. Re-running the same command resumes from where it left off. Use --no-resume to start fresh.
  • Idempotent writes: re-ingesting the same data window updates existing records without duplication.
  • Dry-run pruning: db:prune defaults to preview mode. Pass --apply to actually delete data.
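The first behavior (date range as the hard boundary, limit capping only in-range records) can be sketched as follows; an illustration, not the actual selection code:

```python
# Illustrative selection: the date range is the hard boundary,
# and the limit caps only records already inside that range.
def select_in_range(timestamps, start, end, limit=None):
    in_range = [t for t in timestamps if start <= t <= end]  # inclusive bounds
    return in_range[:limit] if limit is not None else in_range

timestamps = ["2024-12-31", "2025-01-02", "2025-01-03", "2025-01-04", "2025-07-01"]
picked = select_in_range(timestamps, "2025-01-01", "2025-06-30", limit=2)
print(picked)  # ['2025-01-02', '2025-01-03']
```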

Development

npm install
npm run build
npm run typecheck
npm test

Release Process (GitHub Actions + Homebrew Tap)

One-time setup

  1. Create the tap repository:
    gh repo create 0xsend/homebrew-backscraxer --public --description "Homebrew tap for backscraxer" --add-readme
  2. Add HOMEBREW_TAP_TOKEN to the main repo secrets (PAT with contents:write to 0xsend/homebrew-backscraxer).
  3. Optional: set repo variable HOMEBREW_TAP_REPO if your tap lives somewhere other than 0xsend/homebrew-backscraxer.

Per release

  1. Bump version and sync formula metadata:
    npm run version:bump -- patch
  2. Optional changelog generation:
    npm run release:changelog -- --release-as <new-version>
  3. Commit and create a semver tag:
    git add package.json Formula/backscraxer.rb
    git commit -m "release: v<new-version>"
    git tag v<new-version>
    git push origin main --tags
  4. release.yml runs on the tag and automatically:
    • Builds backscraxer-darwin-arm64, backscraxer-darwin-x64, backscraxer-linux-x64
    • Uploads binaries to the GitHub Release
    • Computes SHA-256 checksums
    • Updates Formula/backscraxer.rb in the tap repository

Verify install

brew update
brew tap 0xsend/backscraxer
brew install backscraxer
backscraxer --help

License

UNLICENSED
