Ibrahim009-Devloper/ScrapeMaster

🛒 Multi-Website Data Scraping & Automation Pipeline

A modular data scraping and processing framework built with Python and Playwright. It automates the extraction of product data from Amazon, cleans it, and exports it to structured formats (CSV/Excel).
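
To make the "clean, then export" steps concrete, here is a minimal sketch that uses only the Python standard library. The field names (`title`, `price`, `url`) and the helpers `clean_price`, `clean_record`, and `export_csv` are hypothetical; they are not taken from `amazon_product_data_cleaner.py`, whose actual logic may differ.

```python
import csv
import io
import re


def clean_price(raw):
    """Turn a scraped price string such as '$1,299.00' into a float, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw or "")
    if not match:
        return None
    return float(match.group().replace(",", ""))


def clean_record(record):
    """Normalize whitespace and types for one scraped product (assumed fields)."""
    return {
        "title": " ".join((record.get("title") or "").split()),
        "price": clean_price(record.get("price")),
        "url": (record.get("url") or "").strip(),
    }


def export_csv(records, stream):
    """Write cleaned records as CSV to any text stream (file or StringIO)."""
    writer = csv.DictWriter(stream, fieldnames=["title", "price", "url"])
    writer.writeheader()
    writer.writerows(records)


# Made-up rows standing in for scraper output.
raw_rows = [
    {"title": "  Example   Widget ", "price": "$1,299.00", "url": " https://example.com/p/1 "},
    {"title": "No Price Item", "price": "Currently unavailable", "url": "https://example.com/p/2"},
]

cleaned = [clean_record(row) for row in raw_rows]
buffer = io.StringIO()
export_csv(cleaned, buffer)
print(buffer.getvalue())
```

Writing to a stream rather than a hard-coded path keeps the export step easy to test; in a real run the stream would presumably be a file opened under `Data/export_data/`.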


🏗 Project Structure

├── config/
│   └── setting.py             # Global configurations & constants
├── Data/
│   ├── Row_data/              # Raw extracted data (Unprocessed)
│   ├── proceced_data/         # Cleaned and structured data
│   └── export_data/           # Final CSV/Excel files for delivery
├── Data_cleaner/
│   └── amazon_product_data_cleaner.py  # Data cleaning logic
├── logs/                      # Runtime execution logs
├── pipline/
│   └── run_amazon.py          # Main Execution Script (Entry Point)
├── scrapers/
│   └── amazon/
│       ├── product_link_scraper.py # Stage 1: Collect product URLs
│       └── product_info_scraper.py # Stage 2: Extract details from URLs
├── utils/
│   ├── browser.py             # Browser driver factory
│   ├── parser.py              # HTML Parsing logic
│   ├── server.py              # Proxy/Server configurations
│   └── logger.py              # Custom logging setup
├── requirements.txt           # Project dependencies
└── README.md
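
The layout above suggests a two-stage flow: collect product URLs, extract details for each URL, then clean the results. The sketch below shows one way an entry point such as `pipline/run_amazon.py` could chain those stages. It is an illustration under assumptions, not the repository's code: `run_pipeline`, the stand-in stage functions, and the `products.json` file name are all hypothetical, and the stages are passed in as plain callables so the example runs without a browser.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(collect_links, extract_info, clean, data_root):
    """Chain the stages and persist raw and cleaned output (hypothetical layout)."""
    data_root = Path(data_root)
    raw_dir = data_root / "Row_data"              # directory names follow the tree above
    processed_dir = data_root / "proceced_data"
    for directory in (raw_dir, processed_dir):
        directory.mkdir(parents=True, exist_ok=True)

    # Stage 1: collect product URLs.
    links = collect_links()

    # Stage 2: extract details for each URL and keep the unprocessed result.
    raw = [extract_info(url) for url in links]
    (raw_dir / "products.json").write_text(json.dumps(raw, indent=2), encoding="utf-8")

    # Cleaning: normalize each record and store the structured result.
    processed = [clean(record) for record in raw]
    (processed_dir / "products.json").write_text(
        json.dumps(processed, indent=2), encoding="utf-8"
    )
    return processed


# Stand-ins for the real scrapers and cleaner, which would use Playwright.
def fake_collect_links():
    return ["https://example.com/p/1", "https://example.com/p/2"]


def fake_extract_info(url):
    return {"url": url, "title": f"  Product at {url}  "}


def strip_title(record):
    return {**record, "title": record["title"].strip()}


with tempfile.TemporaryDirectory() as tmp:
    result = run_pipeline(fake_collect_links, fake_extract_info, strip_title, tmp)
    raw_saved = json.loads(
        (Path(tmp) / "Row_data" / "products.json").read_text(encoding="utf-8")
    )
    print(f"{len(result)} products processed")
```

Saving the raw output before cleaning means the cleaning step can be rerun later without scraping again.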
