Crawler

The crawler for WaveGenAI

Setup

  1. Install the required packages:
pip install -r requirements.txt
  2. Install Docker.

Usage

Run the rotating Tor proxy in Docker:

docker run -d --rm -it -p 3128:3128 -p 4444:4444 -e "TOR_INSTANCES=40" zhaowde/rotating-tor-http-proxy
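
Before starting the crawler, it can help to confirm that the proxy container is accepting connections. The following is a minimal sketch, not part of this repository: it assumes the container runs on the local machine, that port 3128 (published by the command above) is the HTTP proxy endpoint, and the helper names `proxy_settings` and `is_port_open` are hypothetical.

```python
import socket

PROXY_HOST = "127.0.0.1"  # assumption: the container runs on this machine
PROXY_PORT = 3128         # host port published by `-p 3128:3128`


def proxy_settings(host: str = PROXY_HOST, port: int = PROXY_PORT) -> dict:
    """Build a proxies mapping in the shape accepted by HTTP clients such as requests."""
    url = f"http://{host}:{port}"
    return {"http": url, "https": url}


def is_port_open(host: str = PROXY_HOST, port: int = PROXY_PORT,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Proxy port is reachable:", proxy_settings()["http"])
    else:
        print("Nothing is listening on port 3128 yet; the container may still be starting.")
```

An open port only shows that the container is listening; it does not prove that every Tor instance has finished bootstrapping, so the first requests may still fail shortly after startup.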

Run the crawler:

python main.py --csv --input src_data.txt --overwrite --file_name FILE.csv --num_processes 40
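
To make the flags in the command above easier to read, here is a hypothetical `argparse` sketch that accepts the same options. It is not the code in `main.py`: the flag types (boolean switches for `--csv` and `--overwrite`, an integer for `--num_processes`) and the help texts are assumptions inferred only from how the command is written.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the crawler's command-line interface."""
    parser = argparse.ArgumentParser(description="Illustrative parser, not the real main.py")
    # Assumed to be on/off switches, since the command passes them without values.
    parser.add_argument("--csv", action="store_true", help="assumed: write results as CSV")
    parser.add_argument("--overwrite", action="store_true",
                        help="assumed: replace an existing output file")
    # Flags that take a value in the command above.
    parser.add_argument("--input", help="input file, e.g. src_data.txt")
    parser.add_argument("--file_name", help="output file, e.g. FILE.csv")
    parser.add_argument("--num_processes", type=int, default=1,
                        help="assumed: number of worker processes")
    return parser


# Parse the same arguments that appear in the usage command.
args = build_parser().parse_args([
    "--csv", "--input", "src_data.txt", "--overwrite",
    "--file_name", "FILE.csv", "--num_processes", "40",
])
print(args.input, args.file_name, args.num_processes)
```

Note that the example uses `--num_processes 40` alongside `TOR_INSTANCES=40` in the proxy command; keeping the two values equal is presumably intentional, though the README does not state this.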

License

This project is licensed under the MIT License; see the LICENSE file for details.