Different ways to scrape data from a website without getting caught

  1. Remember that the site you scrape exists primarily to serve its customers. When you scrape at a high rate, the server spends its capacity answering your requests instead of theirs. To prevent this, sites such as Amazon run bot-detection algorithms, and an IP address that trips them is temporarily blocked.
     1. It is advisable to scrape at a lower rate, for example by pausing between requests.
  2. Always try to behave like a real user. Log in partway through a session where the site supports it, and add headers to your requests. Headers carry information about the session, the browser in use, and more, so rotate them, particularly the browser identity (Chrome, Firefox, and so on).
  3. Mix more than one scraping method. For example, use request-based code in Python (or any other language) for a while, and switch to a Selenium-based scraper at random intervals. If possible, buy a VPN or try a free one for scraping.
  4. Always try not to expose your own IP address.
     1. Scrape from cloud servers rather than your local machine. For example, on AWS EC2, stop the instance and start it again whenever you get blocked. Doing so assigns the instance a new public IP address (unless an Elastic IP is attached).
     2. You can also scrape from AWS Lambda. IP addresses in Lambda change too, though not as frequently. For more details, see my previous story: https://medium.com/@reena.m4444/web-scraping-with-aws-lambda-961de86d8433
     3. Configure a free VPN in Selenium and scrape through it so that your IP is not exposed.
     4. If none of the above work, use proxy servers. Many VPN providers support web scraping.
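
To make the first two tips concrete, here is a minimal sketch of scraping at a lower rate while rotating the browser header. It uses only the Python standard library. The User-Agent strings, the 2 to 6 second pause, and the function names (build_request, next_delay, fetch_all) are my own illustrative assumptions, not values any particular site requires.

```python
import random
import time
import urllib.request

# Hypothetical pool of User-Agent strings; swap in current, real browser strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def build_request(url, rng=random):
    """Return a Request carrying a randomly chosen User-Agent header."""
    headers = {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)


def next_delay(min_seconds=2.0, max_seconds=6.0, rng=random):
    """Pick a random pause so requests are not evenly spaced."""
    return rng.uniform(min_seconds, max_seconds)


def fetch_all(urls, fetch=None, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests."""
    if fetch is None:
        # Default fetcher: a plain HTTP GET with a timeout.
        def fetch(request):
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.read()

    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(next_delay())  # no pause needed before the first request
        pages.append(fetch(build_request(url)))
    return pages
```

Calling fetch_all(["https://example.com/a", "https://example.com/b"]) would fetch the two pages with one random pause in between. The fetch and sleep parameters exist only so the loop can be tried out without touching the network.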

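For the proxy-server tip, a rough sketch of rotating requests across several proxies could look like the following. It again sticks to the standard library. The proxy addresses are placeholders from a documentation-only IP range, and the helper names (proxy_openers, fetch_via_rotation) are hypothetical, so substitute the endpoints your own provider gives you.

```python
import itertools
import urllib.request

# Hypothetical proxy endpoints (203.0.113.0/24 is reserved for documentation).
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
]


def proxy_openers(proxies):
    """Return an endless rotation of (proxy, opener) pairs, one opener per proxy."""
    openers = [
        (
            proxy,
            urllib.request.build_opener(
                urllib.request.ProxyHandler({"http": proxy, "https": proxy})
            ),
        )
        for proxy in proxies
    ]
    return itertools.cycle(openers)


def fetch_via_rotation(urls, rotation, timeout=30):
    """Fetch each URL through the next proxy in the rotation."""
    results = {}
    for url in urls:
        proxy, opener = next(rotation)
        try:
            with opener.open(url, timeout=timeout) as response:
                results[url] = response.read()
        except OSError as error:  # urllib's URLError is a subclass of OSError
            results[url] = None
            print(f"{url} failed via {proxy}: {error}")
    return results
```

Each request leaves through a different proxy, so the target site sees the proxy's address rather than yours. A failed proxy is only logged here; in practice you would probably retry the URL through the next proxy in the rotation.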



reena .m
