How To Use A Proxy In Python

How-tos, Proxies, Python, 02-02-2022, 5 min read

The term "proxy" comes up often in computing. When connected to the Internet, every computer gets an Internet Protocol (IP) address that identifies it and reveals its approximate geographic location. Whenever your computer needs information from the Internet, it sends a request to a target server, which checks what is being asked for and returns the information if it is allowed to serve your IP address. Sometimes you want to retrieve information without being identified, or the information is blocked for your IP address. In both cases you can use a proxy, which acts as an intermediary between the client and the server.

Clients typically use a proxy server to browse web pages and request resources anonymously, because it acts as a shield between the client computer and the Internet.

Proxy servers have become popular amid growing concern about online security and data theft. How is a proxy server connected to the security of our system? It adds an extra layer of security between our machine and the outside world, which helps protect the system from a breach.

How To Use A Proxy In Python?

To use proxies with Python requests, follow the steps below.

Import Requests

Import the requests package, a simple HTTP library. With it you can send requests easily without manually adding query strings to your URLs. You can import requests with the statement below.

import requests

Create a Dictionary

Create a proxies dictionary that defines the HTTP and HTTPS connections. You can give the dictionary variable a name such as "proxies"; it maps each protocol to a proxy URL. You also need to set the url variable to the website you want to scrape.

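# Map the scheme of the target URL to the proxy that should handle it.
proxies = {
    "http": "http://203.190.46.62:8080",
    "https": "https://111.68.26.237:8080",
}
# httpbin.org/ip echoes back the IP address the request came from.
url = "https://httpbin.org/ip"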

Here the dictionary defines the proxy URL for two separate protocols, HTTP and HTTPS.
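To make the mapping concrete, the short sketch below asks which proxy would be chosen for a given target URL. It uses select_proxy from requests.utils, a helper inside the library, purely for illustration. The example.com URLs are placeholders and not part of the original tutorial.

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://203.190.46.62:8080",
    "https": "https://111.68.26.237:8080",
}

# The dictionary key is matched against the scheme of the URL being requested.
http_choice = select_proxy("http://example.com/page", proxies)
https_choice = select_proxy("https://example.com/page", proxies)

print(http_choice)   # proxy selected for plain HTTP targets
print(https_choice)  # proxy selected for HTTPS targets
```

Nothing is sent over the network here; the sketch only shows the lookup.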

Create a Response Variable

Create a response variable that uses one of the requests methods. The method takes two arguments:

  • The URL you created
  • The dictionary you defined

response = requests.get(url, proxies=proxies)
print(response.json())

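The output is a JSON object whose "origin" field holds the IP address the server saw, which should be the proxy's address rather than your own.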
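Public proxies are often slow or offline, so in practice the call may need a timeout and some error handling. The following is a minimal sketch under that assumption; the function name fetch_origin and the default timeout of 10 seconds are illustrative choices, not part of the original tutorial.

```python
import requests


def fetch_origin(url, proxies, timeout=10):
    """Return the IP reported by the server, or None if the request fails."""
    try:
        response = requests.get(url, proxies=proxies, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx answers into exceptions
        return response.json().get("origin")
    except (requests.exceptions.RequestException, ValueError) as error:
        # RequestException covers proxy errors, timeouts, refused connections
        # and bad status codes; ValueError covers a body that is not valid JSON.
        print(f"Request through proxy failed: {error}")
        return None
```

Calling fetch_origin(url, proxies) with the url and proxies defined earlier would return the proxy's IP address when the proxy works, and None otherwise.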

Requests Methods

The requests library offers several methods, including:

  • GET – It retrieves information from a given server using a given URL.
  • POST – It asks the web server to accept and store the data enclosed in the body of the request message.
  • PUT – It requests that the enclosed data be stored under the given URL.
  • DELETE – It sends a DELETE request to the given URL.
  • PATCH – It makes partial changes to an existing resource.
  • HEAD – It sends a HEAD request to the given URL when you need only the HTTP headers or the status code, not the file content.
  • OPTIONS – It asks the server which communication options, such as allowed methods, are available for the given URL.

You can use the syntax below for these methods once the URL is specified. Here, the URL is the same one used in the code above, i.e., https://httpbin.org/ip.

response = requests.get(url)
response = requests.post(url, data={"a": 1, "b": 2})
response = requests.put(url)
response = requests.delete(url)
response = requests.patch(url)
response = requests.head(url)
response = requests.options(url)
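Each of these methods accepts the same proxies argument as requests.get. As a rough sketch, the code below builds a GET and a POST request without sending them, to show what differs between the two, and then wraps requests.request in a helper that routes any method through a proxy. The helper name send_via_proxy and its default timeout are assumptions made for illustration.

```python
import requests

url = "https://httpbin.org/ip"

# Build the requests without sending them, so nothing leaves the machine.
get_request = requests.Request("GET", url).prepare()
post_request = requests.Request("POST", url, data={"a": 1, "b": 2}).prepare()

print(get_request.method, get_request.body)    # GET carries no body
print(post_request.method, post_request.body)  # POST carries form-encoded data


def send_via_proxy(method, url, proxies, **kwargs):
    """Send any HTTP method through the given proxies using requests.request."""
    kwargs.setdefault("timeout", 10)  # avoid hanging on an unresponsive proxy
    return requests.request(method, url, proxies=proxies, **kwargs)
```

For example, send_via_proxy("HEAD", url, proxies) would fetch only the headers through the proxy.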

Proxy Sessions

If you want to scrape the data from websites that utilize sessions, you can follow the steps given below.

Step#01

Import the requests library.

import requests

Step#02

Create a session object by assigning requests.Session() to a session variable, then set the proxies on the session and define the URL you want to request.

session = requests.Session()

session.proxies = {
   'http': 'http://10.10.10.10:8000',
   'https': 'http://10.10.10.10:8000',
}

url = 'http://mywebsite.com/example'

Step#03

Send the request through the session, passing the URL as an argument. The session applies its proxies to the request automatically.

response = session.get(url)
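The three steps can be folded into one small helper. This is only a sketch: the function name make_proxy_session is hypothetical, and turning off trust_env is an optional choice that stops requests from reading proxy settings from environment variables, which could otherwise take precedence over the session's own proxies.

```python
import requests


def make_proxy_session(proxy_url):
    """Create a session that routes both HTTP and HTTPS traffic via proxy_url."""
    session = requests.Session()
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    # Do not read proxy settings (or other settings such as .netrc
    # credentials) from the environment, so only the proxies above apply.
    session.trust_env = False
    return session


session = make_proxy_session("http://10.10.10.10:8000")
print(session.proxies)
```

Every later call such as session.get(url) reuses these proxies, together with any cookies the website sets during the session.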

Main Types Of Proxies

Let’s discuss the two essential types of proxies:

  1. Static Proxies
  2. Rotating Proxies

Static Proxies

We can define static proxies as datacenter IP addresses assigned through an Internet Service Provider (ISP) contract. They are designed to stay connected to one proxy server for a set amount of time. The name “static” implies that we can operate as a residential user with the same IP address for as long as required.

In short, static proxies give us the speed of datacenter proxies and the high anonymity of residential proxies. A static proxy also avoids IP address rotation, which makes it significantly simpler to use.

Unlike regular datacenter proxies, static IP services are not created using virtual machines. These proxies, also known as sticky IP addresses, look like genuine consumers to almost all websites.

Rotating Proxies

We can define proxy rotation as a feature that changes our IP address with every new request we send.

When we visit a website, we send a request that reveals a lot of data to the destination server, including our IP address. When we gather data with a scraper (for example, to generate leads), we send many such requests. If most of them come from the same IP address, the destination server gets suspicious and bans that address.

Therefore, we need a way to change our IP address with each request, and that is what a rotating proxy provides. To avoid the hassle of building IP rotation into the scraper ourselves, we can get rotating proxies and let our provider take care of the rotation.
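If you manage a pool of proxies yourself instead of relying on a provider, the idea of rotation can be sketched as follows. The addresses in PROXY_POOL are hypothetical placeholders from a documentation-only IP range, and the function names, retry count, and timeout are illustrative assumptions rather than a recommended setup.

```python
import itertools

import requests

# Hypothetical proxy addresses from a documentation-only IP range;
# replace them with proxies you actually control or rent.
PROXY_POOL = [
    "http://192.0.2.10:8080",
    "http://192.0.2.11:8080",
    "http://192.0.2.12:8080",
]


def proxy_cycle(pool):
    """Yield a proxies dictionary for each address in the pool, forever."""
    for address in itertools.cycle(pool):
        yield {"http": address, "https": address}


def fetch_with_rotation(url, rotation, attempts=3, timeout=5):
    """Try the request through successive proxies; return None if all fail."""
    for _ in range(attempts):
        proxies = next(rotation)  # a different proxy for every attempt
        try:
            response = requests.get(url, proxies=proxies, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException:
            continue  # this proxy failed, move on to the next one
    return None


rotation = proxy_cycle(PROXY_POOL)
first, second = next(rotation), next(rotation)
print(first["http"], second["http"])
```

Because the generator cycles through the pool, consecutive requests leave from different IP addresses, and the pool wraps around once every address has been used.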

Why Should You Use Proxies?

The following are reasons to use various types of proxies.

  • Social media managers appreciate proxies for letting them stick to a single server. If users constantly log in to their accounts from changing IP addresses, the social media platform will get suspicious and block their profiles.
  • E-commerce sites might show different data to users from different locations and to returning visitors. The server also becomes alert if a buyer logs in to an account multiple times from various IP addresses. So, proxies are useful for online shopping.
  • We need proxies for manual market research when a specialist wants to view the required data through the eyes of a user in one particular location.
  • Ad verification lets advertisers check whether their ads are displayed on the right websites and seen by the right audiences. Constantly changing IP addresses makes it possible to access many different websites and verify ads without IP blocks.
  • The same content can look different, or be unavailable, when accessed from specific locations. Proxies allow us to access the necessary data regardless of our geolocation.
  • We can use proxies to speed up browsing, because proxy servers can cache content.

Conclusion

So far, we have discussed that a proxy acts as a relay between the client and the server machine. When you request information, your computer sends the request to the proxy, which then forwards it to the target computer from a different IP address. Your IP address therefore stays confidential. You can also use proxies with the requests module in Python and perform different actions depending on your needs. If you need a static IP with the speed of datacenter proxies and the high anonymity of residential proxies, static proxies are the right choice because the IP address stays unchanged with every new request. Rotating proxies, on the other hand, offer advantages for testing and scraping.