Proxies for scraping: which type to choose?

Datacenter, residential, mobile, shared, exclusive, rotating, static… let’s understand the different types of proxies

stabler.tech
8 min read · Jan 24, 2023
Photo by Ilya Pavlov on Unsplash

As soon as you start doing web scraping, you’ll realize that you need to buy proxies to plug into your scraping bots.

At Stabler, we build and sell a low-code SaaS that allows you to configure bots for web scraping. You will have to buy your own proxies and plug them into your Stabler account.

In this article, we will see what the different types of proxies are and how to select the right one depending on your use case and budget!

Stabler can be plugged into any proxy provider on the market!

We offer a FREE trial if you want to test and learn web scraping, with dedicated tutorials and learning materials:
https://app.stabler.io/login?signup=open

stablerSOLO website

3 reasons why you should use a proxy

1. Geo-targeting 🌎

Select the geographic location you want to scrape!

If you want to scrape a website where your geolocation matters and impacts the data you want to extract, you will need to use a proxy that supports geo-targeting.

Imagine you want to scrape a UK-based website or the Italian version of Google Shopping 🛍: having an IP address from these countries is almost mandatory to present a credible fingerprint while you scrape the website.

In fact, some websites automatically redirect you to your local e-store depending on your IP address location and do not even allow you to change it 🥵

Photo by henry perks on Unsplash

2. IP rotation ♾️

To bypass IP rate limiting

Many websites enforce per-IP rate limiting as part of their DDoS protection. It means you cannot send as many requests as you want from a given IP address during a given time period.

Companies use this protection to avoid being DDoSed and to protect themselves from any kind of abusive behavior.

When you exceed the limit, you will generally receive an HTTP 429 Too Many Requests error.

An efficient way to avoid triggering this limit is to use a rotating proxy where each request / session / connection uses a new IP address.

Your ability to avoid rate limiting depends directly on the number of IPs in the proxy pool. The more IPs you have, the fewer requests each IP has to carry and the lower your probability of being blocked by the website.
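To make the rotation idea concrete, here is a minimal Python sketch, not a production implementation. It assumes you pass in your own `fetch(url, proxy)` function that returns an HTTP status code (a stand-in for a real HTTP client call); the function names and the proxy URLs in the usage note are hypothetical.

```python
import itertools
from typing import Callable, Iterable, Tuple


def fetch_with_rotation(
    url: str,
    proxies: Iterable[str],
    fetch: Callable[[str, str], int],
    max_attempts: int = 5,
) -> Tuple[int, str]:
    """Try `url` through successive proxies until the status is not 429.

    `fetch(url, proxy)` is a caller-supplied stand-in for a real HTTP call.
    `proxies` must contain at least one entry and `max_attempts` must be >= 1.
    Returns the last status code and the proxy that produced it.
    """
    pool = itertools.cycle(proxies)  # round-robin over the pool
    status, proxy = 429, ""
    for _ in range(max_attempts):
        proxy = next(pool)
        status = fetch(url, proxy)
        if status != 429:  # not rate limited: stop rotating
            break
    return status, proxy


def requests_per_ip(total_requests_per_minute: float, pool_size: int) -> float:
    """Average load each IP carries when requests are spread evenly."""
    return total_requests_per_minute / pool_size
```

For example, sending 600 requests per minute through a pool of 100 IPs means each IP carries about 6 requests per minute, which is why a bigger pool keeps you further below a per-IP limit.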

3. Appear like a real user 🧑‍💻 and not like a bot 🤖

If I do not want to be a bot, I will not be a bot !

If your scraping bot is using a proxy that is hosted in a datacenter, the website will easily identify it as a bot.

Real users do not (generally) use datacenter IPs 😉

We call this IP quality!

That’s why, today, many proxy providers sell Residential and Mobile proxies. Residential proxies channel traffic through “normal” home IPs, and Mobile proxies use the phone network to access the internet.

Photo by Kirill Sh on Unsplash

Mainstream: Datacenter proxies

Cheap, medium IP pool, easily blocked

Datacenter proxies are hosted in different data centers all over the world. They are pretty easy to build: you subscribe to a cloud provider (AWS, GCP, OVH, …), spin up a virtual machine and install an open source proxy like Squid.

Here is a docker image for Squid with username and password auth: https://github.com/anisgandoura/squid-proxy-basic-auth

You plug a load balancer in front of this fleet and you have a scalable and distributed rotating proxy 🦾.

Of course, maintaining and upgrading the fleet can be a pain, but some companies build their own datacenter proxy fleet to reduce operating costs and to ensure they are not dependent on a given provider.

If you want to go down this road, I recommend checking out this open source project, which allows you to build your own multi-cloud datacenter proxy fleet: https://scrapoxy.io/

Scrapoxy mechanics

Geo-location can be achieved here by hosting your proxies in different AWS, GCP or OVH regions, for example.

These proxies are also fast as they are hosted in professional and performant data centers.

Pros:

  • Cheap to build / to buy 💸
  • Medium sized IP pools (from hundreds to tens of thousands for some providers) ♾️ ♾️
  • Fast 🚀🚀🚀

Nevertheless, these proxies can be easily blocked. IP ranges allocated to cloud providers are public and static. Anti-bot protection companies can block these IP ranges without much risk to the website’s legitimate traffic.

Cons:

  • Can be blocked easily ✋
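To illustrate why range-based blocking is so cheap for an anti-bot system, here is a small Python sketch using the standard `ipaddress` module. The ranges below are reserved documentation ranges used as placeholders, not real cloud provider ranges, and `is_datacenter_ip` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical blocklist: reserved documentation ranges (RFC 5737) standing in
# for the published IP ranges of a cloud provider.
DATACENTER_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_datacenter_ip(ip: str) -> bool:
    """Return True if `ip` falls inside one of the listed ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in DATACENTER_RANGES)
```

A single membership check like this is enough to flag every proxy hosted in the listed ranges, which is why a whole datacenter fleet can be blocked at once.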

Premium: Residential proxies

Expensive, very large IP pool, rarely blocked

Multiple proxy providers sell residential proxies. These proxies are made of the IP addresses of real users who share their home internet connection.

These IP addresses are identified as residential because they belong to the IP ranges allocated to Internet Service Providers to connect their customers to the internet.

Thus, when you scrape a website using a residential proxy, your scraping bot presents a legitimate IP address to the website and appears more like a human user than a bot, which reduces its chances of being blocked.

Photo by Tierra Mallorca on Unsplash

The second big advantage of residential proxies is their large, worldwide IP pools. The pools can be so large that every request can be channeled through a unique IP address, which is the perfect scenario to avoid rate limiting!

Pros:

  • Rarely blocked (good fingerprint)
  • Very large IP pools ♾️ ♾️ ♾️ (up to hundreds of thousands of IPs for some providers)
  • Credible geo-targeting 🌎

Nevertheless, these proxies are very expensive! They are generally priced per GB of bandwidth, and using an automated Chromium to scrape thousands of pages can ruin you 💸

In addition, they are significantly slower than datacenter proxies as they rely on a home / non-professional internet connection.

Cons:

  • Very expensive 💸 💸
  • Slower connection
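Since residential proxies are generally billed per GB, a quick back-of-the-envelope estimate helps before launching a job. The sketch below is only an illustration: the page weight and price in the usage note are made-up figures, and it assumes 1 GB = 1024 MB (your provider may count differently).

```python
def bandwidth_cost(pages: int, avg_page_mb: float, price_per_gb: float) -> float:
    """Estimated cost of scraping `pages` pages through a per-GB proxy.

    Assumes every page transfers `avg_page_mb` megabytes on average and
    that 1 GB = 1024 MB.
    """
    total_gb = pages * avg_page_mb / 1024
    return total_gb * price_per_gb
```

With hypothetical numbers, `bandwidth_cost(10_000, 2.0, 10.0)` estimates the bill for 10,000 pages of 2 MB each at 10 (in your currency) per GB. A headless browser downloads images, scripts and fonts, so the average page weight is usually much higher than for a plain HTTP request.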

Premium ^ 2: Mobile proxies

Even more expensive, large IP pool, nearly never blocked

Mobile proxies are like residential proxies, but instead of sharing their home internet connection, users share their mobile internet connection! 🤳

As with residential proxies, the IP address you present to the scraped website looks very legitimate, as not that many bots use the cellular network to scrape websites!

But the main advantage of Mobile proxies is that the scraped website will rarely block your IP address.

Photo by Francesco on Unsplash

In fact, mobile carriers typically put many subscribers behind a small pool of public IP addresses (carrier-grade NAT). This means that in dense areas, where many people are connected to the same cellular relay, they can all share the same public IP address.

Thus, if the website blocks your IP, it also blocks all the real users sharing that IP address, which can seriously hurt its legitimate traffic.

Pros:

  • Nearly never blocked (ok ! never say never ^^)
  • Large IP pools ♾️ ♾️
  • Credible geo-targeting 🌎

Nevertheless, these proxies are extremely expensive! Using an automated Chromium to scrape thousands of pages is not a viable option 💸 💸 💸

Cons:

  • Extremely expensive 💸 💸 💸
  • Slow connection

Fast & Premium: ISP Proxies

Residential proxies hosted in data centers!

Besides serving their customers, Internet Service Providers can own data centers and host their own servers.

These servers then fall in the same IP ranges as the ones the ISPs allocate to their customers.

Proxies running on these servers are called ISP (Internet Service Provider) proxies. Like datacenter proxies, they are hosted on professional servers and thus benefit from a fast and stable internet connection.

Thus, ISP proxies combine the best of both worlds: the legitimacy of residential IPs and the speed of datacenter proxies!

Pros:

  • Fast connections
  • Rarely blocked (good fingerprint)
  • Large IP pools ♾️ ♾️
  • Credible geo-targeting 🌎

Cons:

  • Very expensive 💸 💸 (same as residential)

Shared vs dedicated proxies

Decrease the probability of being blocked with dedicated proxies!

Shared proxies are proxies that are used by all the customers of a given proxy provider.

This means that if you are scraping Amazon, for example, other customers may be scraping Amazon through the same proxies as you at the same time.

Photo by Morgane Perraud on Unsplash

This inevitably leads to a higher probability of being blocked, as the website observes a high load coming from these IPs.

Dedicated proxies ensure that the website you are scraping is not being scraped by other customers through the same IPs at the same time. That’s why, when you buy dedicated proxies, you will be asked to provide the list of domains you want to scrape.

Generally, the pricing of these proxies depends on the number of websites you want dedicated access to.

In summary:

  • Shared proxies are cheaper than dedicated ones 💸
  • Dedicated proxies decrease your probability of being blocked ✋

Rotating vs static proxies

Rotate the proxy for each request or keep the same IP?

Rotating proxies change your IP address for each request, whereas static proxies keep the same pool of proxies or even the same IP address.

If you are doing “stateless” scraping, where the content of the scraped page does not depend on your browsing history, rotating proxies are preferable: they enlarge your effective IP pool and decrease the probability of being blocked.

On the other hand, if you are doing “stateful” scraping, like browsing within a logged-in user session, static proxies are recommended so that the whole session comes from the same IP address.

Some providers offer a kind of mixed offer where you have a pool of static proxies that you can renew when you want or on a daily (or weekly, …) basis.
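The difference between the two modes can be sketched in a few lines of Python. This is an illustrative sketch only: `ProxySelector` and its methods are hypothetical names, and real providers usually handle rotation and sticky sessions on their side.

```python
import itertools
from typing import Dict, List, Optional


class ProxySelector:
    """Pick a proxy per request (rotating) or per session (static / sticky)."""

    def __init__(self, proxies: List[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._sticky: Dict[str, str] = {}  # session id -> pinned proxy

    def pick(self, session_id: Optional[str] = None) -> str:
        """Without a session id, rotate; with one, always return the same proxy."""
        if session_id is None:
            return next(self._cycle)  # stateless: new proxy each call
        if session_id not in self._sticky:
            self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]

    def renew(self, session_id: str) -> str:
        """Pin the session to the next proxy in the pool (the 'mixed' offer)."""
        self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]
```

Calling `pick()` with no argument suits stateless scraping, while `pick("user-1")` keeps a logged-in session on one IP until you decide to call `renew("user-1")`.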

To summarize !

As you can see, selecting the right proxy type depends on the website you are scraping and on your use case.

These are the questions to answer to make the right choice:

  • Does the website have rate limiting / anti-bot protection?
  • What is your budget?
  • Do you need geo-targeting or not?
  • Do you need a fast connection?

Stabler can use any kind of proxy !

Choose it wisely :)

You can add any proxy provider to your Stabler account. You will need to provide the following information:

  • Hostname
  • Port
  • Username
  • Password
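These four fields are also what most HTTP clients expect when you configure a proxy yourself. As a rough sketch (not how Stabler does it internally), they can be combined into a single proxy URL in Python; the function names and the example values are hypothetical.

```python
from typing import Dict
from urllib.parse import quote


def build_proxy_url(
    hostname: str, port: int, username: str, password: str, scheme: str = "http"
) -> str:
    """Combine the four fields into a proxy URL.

    Username and password are percent-encoded so that characters such as
    '@' or ':' do not break the URL.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{hostname}:{port}"


def proxies_mapping(proxy_url: str) -> Dict[str, str]:
    """Mapping in the shape accepted by the `proxies` argument of `requests`."""
    return {"http": proxy_url, "https": proxy_url}
```

For instance, `build_proxy_url("proxy.example.com", 8080, "alice", "p@ss:word")` yields a URL whose password part is safely encoded, and the resulting mapping can be passed to an HTTP client such as `requests`.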
stabler.tech Proxy farm

Check our Pricing for the SOLO product line

We offer a FREE trial if you want to test:
https://app.stabler.io/login?signup=open

Author:
Anis Gandoura — CEO & VP of Engineering of Stabler

Twitter: https://twitter.com/anis_gandoura
