A professional-grade Google Maps scraper that extracts business metadata from list-view results. Features bandwidth optimization for free proxy tiers, job resume capability, deduplication, and a real-time monitoring dashboard.
- Category 1 Extraction - Scrapes list-view data without clicking into individual listings
- Bandwidth Optimized - Blocks images/fonts/CSS for minimal data usage (perfect for WebShare.io 1GB free tier)
- Job Resume - Interrupt and resume scrapes at any time
- Multi-Query Batching - Run multiple search queries in a single session
- Cross-Scrape Deduplication - Exclude businesses from previous scrapes
- Rich Terminal UI - Real-time progress with live business feed monitor
- SQLite Storage - WAL mode for concurrent read/write with MD5 deduplication
- Parallel Scraping - Multi-process scraping for faster results
- Website Analysis - Analyze business websites for quality scoring
- Health Scoring - Composite scoring system for lead qualification
- Report Generation - Auto-generate HTML/JSON opportunity reports
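The health-scoring feature combines several list-view signals into a single lead-qualification number. As a rough illustration only — the field names and weights below are hypothetical, not the scraper's actual formula:

```python
def health_score(business: dict) -> float:
    """Composite lead-qualification score in [0, 100].

    Hypothetical weights -- the project's real formula may differ.
    """
    score = 0.0
    if business.get("phone"):        # reachable by phone
        score += 30
    if business.get("website"):      # has a site to analyze
        score += 20
    rating = business.get("rating", 0.0)       # 0.0 - 5.0 stars
    reviews = business.get("review_count", 0)
    score += (rating / 5.0) * 30               # rating contributes up to 30
    score += min(reviews, 100) / 100 * 20      # review volume caps out at 20
    return round(score, 1)
```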
```shell
# 1. Clone the repository
git clone https://github.com/yourusername/gMapsFullPurposeScraper.git
cd gMapsFullPurposeScraper

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt
playwright install chromium

# 4. Seed the cities database
python setup.py

# 5. Run the interactive wizard (recommended)
python main.py --wizard

# Or run directly:
python main.py --query "Pet Cremation" --preset top_10 --no-proxy
```

Requirements:

- Python 3.9+
- pip
- Install Python dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Install Playwright browsers:

  ```shell
  playwright install chromium
  ```
- Seed city database:

  ```shell
  python setup.py
  ```
- Configure proxy (optional): Edit `config.yaml` and set your proxy credentials:

  ```yaml
  proxy:
    enabled: true
    server: "http://user:pass@proxy.webshare.io:8080"
  ```
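Playwright expects a proxy as a dict with separate `server`, `username`, and `password` keys, so a credentialed URL of the form above has to be split apart before launch. A minimal sketch, assuming stdlib URL parsing (the helper name is ours, not part of the project):

```python
from urllib.parse import urlparse

def playwright_proxy(server_url: str) -> dict:
    """Split 'http://user:pass@host:port' into Playwright-style proxy settings."""
    u = urlparse(server_url)
    proxy = {"server": f"{u.scheme}://{u.hostname}:{u.port}"}
    if u.username:
        proxy["username"] = u.username
    if u.password:
        proxy["password"] = u.password
    return proxy

# Later passed as e.g. p.chromium.launch(proxy=playwright_proxy(cfg_server))
```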
The wizard guides you through configuration:
```shell
python main.py --wizard
```

```shell
# Single query with city preset
python main.py --query "Pet Cremation" --preset top_100

# Multiple queries (semicolon-separated)
python main.py --queries "Pet Cremation; Pet Cemetery; Animal Hospital" --preset top_10

# Custom cities
python main.py --query "Veterinarian" --custom "Dallas, TX; Miami, FL; Austin, TX"

# With live monitor dashboard
python main.py --query "Pet Cremation" --preset top_10 --live

# Without proxy (for testing)
python main.py --query "Pet Cremation" --preset top_10 --no-proxy
```
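Both `--queries` and `--custom` take semicolon-separated values, so splitting them is a one-liner. A sketch (the helper name is illustrative, not the project's actual function):

```python
def split_semicolons(raw: str) -> list[str]:
    """Split a semicolon-separated CLI value, trimming stray whitespace."""
    return [part.strip() for part in raw.split(";") if part.strip()]

queries = split_semicolons("Pet Cremation; Pet Cemetery; Animal Hospital")
# -> ['Pet Cremation', 'Pet Cemetery', 'Animal Hospital']
cities = split_semicolons("Dallas, TX; Miami, FL; Austin, TX")
# -> ['Dallas, TX', 'Miami, FL', 'Austin, TX']
```

Note that city entries keep their internal commas (`"Dallas, TX"`), which is why a semicolon rather than a comma separates entries.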
```shell
# List resumable jobs
python main.py --list-jobs

# Resume an interrupted job
python main.py --resume <JOB_ID>
```
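Resume works by persisting which (query, city) pairs have already completed so a restarted job can skip them. A minimal sketch of that idea, assuming a JSON state file per job (the real `job_manager.py` may store state differently):

```python
import json
from pathlib import Path

def save_progress(job_file: Path, done: set) -> None:
    """Persist completed (query, city) pairs; call after each city finishes."""
    job_file.write_text(json.dumps(sorted(done)))

def load_progress(job_file: Path) -> set:
    """Load completed pairs, or start fresh if the job file is new."""
    if not job_file.exists():
        return set()
    return {tuple(pair) for pair in json.loads(job_file.read_text())}

# On --resume: skip any (query, city) already present in load_progress(job_file)
```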
```shell
# List all result tables
python main.py --list-tables

# Export specific table to CSV
python main.py --export results_PetCremation_top100_20251215

# Export all tables as JSON
python main.py --export-all --format json
```
```shell
# Search all fields
python main.py --search "cremation"

# Search by specific field
python main.py --search-name "Pet Heaven"
python main.py --search-phone "555-1234"
python main.py --search-city "Dallas"

# Export search results
python main.py --search "dallas" --export-search results
```

Exclude businesses already found in previous scrapes:
```shell
python main.py --query "Pet Cremation" --preset top_100 --exclude-tables "results_Pet*"
```

| Preset | Cities | Description |
|---|---|---|
| `top_10` | 10 | Largest US metros |
| `top_100` | 100 | Major cities |
| `top_1000` | 1,000 | Medium+ cities |
| `top_2500` | 2,500 | All tracked cities |
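Under the hood, a preset amounts to a row limit over the seeded cities table, ordered by size. Assuming `setup.py` creates a `cities` table with `name` and `population` columns (our assumption, not confirmed by the project), preset resolution could look like:

```python
import sqlite3

# Preset names mapped to the number of cities they cover
PRESET_LIMITS = {"top_10": 10, "top_100": 100, "top_1000": 1000, "top_2500": 2500}

def cities_for_preset(conn: sqlite3.Connection, preset: str) -> list:
    """Return the N most populous cities for a preset name."""
    limit = PRESET_LIMITS[preset]
    rows = conn.execute(
        "SELECT name FROM cities ORDER BY population DESC LIMIT ?", (limit,)
    )
    return [name for (name,) in rows]
```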
| File | Purpose |
|---|---|
| `main.py` | CLI entry point and Rich UI orchestration |
| `wizard.py` | Interactive CLI wizard for guided setup |
| `scraper.py` | Playwright automation and listing extraction |
| `db.py` | SQLite database with WAL mode |
| `job_manager.py` | Job persistence and resume capability |
| `monitor.py` | Live dashboard with sparklines and business feed |
| `setup.py` | City database seeding |
| `config.yaml` | Configuration file |
- CLI parses arguments and loads config
- DatabaseManager creates dynamic result table
- Scraper iterates cities and extracts listings
- Deduplication via MD5 hash of (name + phone)
- Job state persisted for resume capability
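The MD5 deduplication step above can be sketched as: normalize the name and phone, hash the pair, and use the digest as the unique key. The normalization details here are our guess, not the project's exact code:

```python
import hashlib

def dedup_key(name: str, phone: str) -> str:
    """MD5 over normalized (name + phone); duplicate businesses collide."""
    normalized = f"{name.strip().lower()}|{phone.strip()}"
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same business scraped from two different city searches yields the same key,
# so a UNIQUE constraint (or a set lookup) on the digest drops the duplicate.
```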
Key settings in `config.yaml`:
| Setting | Description |
|---|---|
| `proxy.server` | Proxy URL with credentials |
| `scrape.scroll_limit` | Scrolls per city (10-20 results each) |
| `bandwidth.budget_mb` | Auto-pause threshold |
| `optimization.block_*` | Toggle resource blocking |
For WebShare.io's 1GB free tier, the default settings block images, fonts, CSS, and media to minimize bandwidth usage. The scraper auto-pauses when approaching the budget limit.
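The auto-pause logic boils down to summing response sizes and stopping before the budget is exhausted. A sketch of that bookkeeping, assuming a simple counter (the 90% threshold below is illustrative):

```python
class BandwidthBudget:
    """Track bytes consumed; signal a pause when nearing the configured budget."""

    def __init__(self, budget_mb: float, pause_at: float = 0.9):
        self.budget_bytes = budget_mb * 1024 * 1024
        self.pause_threshold = self.budget_bytes * pause_at  # pause early, not at 100%
        self.used = 0

    def record(self, response_bytes: int) -> None:
        """Call with the byte size of each proxied response."""
        self.used += response_bytes

    def should_pause(self) -> bool:
        return self.used >= self.pause_threshold
```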
Results are stored in SQLite (`maps_data.db`) with tables named:

```
results_{query}_{preset}_{timestamp}
```
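Since raw query text contains spaces, it presumably gets sanitized before going into the table name. One plausible scheme that reproduces names like `results_PetCremation_top100_20251215` (the exact sanitization rules are our assumption):

```python
import re
from datetime import datetime

def result_table_name(query: str, preset: str) -> str:
    """Build a results_{query}_{preset}_{timestamp} table name safe for SQLite."""
    safe_query = re.sub(r"[^A-Za-z0-9]", "", query)   # 'Pet Cremation' -> 'PetCremation'
    safe_preset = preset.replace("_", "")             # 'top_100' -> 'top100'
    stamp = datetime.now().strftime("%Y%m%d")         # e.g. '20251215'
    return f"results_{safe_query}_{safe_preset}_{stamp}"
```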
Export formats: CSV (default), JSON, Excel
MIT