Puppeteer is a Node.js library that controls Chrome or Chromium. It offers a high-level API that is widely used for web scraping and browser automation.
This article will explain the steps needed to use a proxy in Puppeteer. But, first, let’s understand more about Puppeteer and Proxies.
Here are some of the things that you can do with Puppeteer:
A proxy server acts as a gateway between the client browser and the internet.
In simple terms, you browse the internet via a proxy instead of requesting resources directly from the websites you visit. Your browser forwards a request to the proxy server, and the proxy sends that request to the website. The website returns the information to the proxy, and the proxy passes it back to you.
The primary benefit of using a proxy server to browse the web is that your IP address remains hidden: websites see the proxy's address instead of yours, so they can't easily trace the request's origin.
You might need to use a proxy with Puppeteer for any of the following reasons:
Puppeteer Proxy handles proxy requests and offers HTTPS support, error handling, and cookie handling. You only need to specify your main proxy to start testing different URLs, without using other proxies for different URLs.
Puppeteer supports the use of external proxies. Here is the step-by-step process to use a proxy in Puppeteer:
The first step is telling Chrome where your proxy is located by specifying its address. Here is the command-line flag that does this:
--proxy-server=https://YOUR-PROXY-SERVER-DOMAIN:PORT
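As a minimal sketch, the flag can be passed to Chrome through the args option of puppeteer.launch. The helper names buildProxyArg and launchWithProxy, and the host and port in the usage comment, are illustrative assumptions rather than part of Puppeteer. The puppeteer module is passed in as a parameter so the helper works however you load it.

```javascript
// Build the Chromium flag that points the browser at a proxy.
// host, port, and scheme are placeholders supplied by the caller.
function buildProxyArg(host, port, scheme = 'http') {
  return `--proxy-server=${scheme}://${host}:${port}`;
}

// Launch a browser whose traffic is routed through the given proxy.
// `puppeteer` is the module itself, e.g. const puppeteer = require('puppeteer').
async function launchWithProxy(puppeteer, host, port, scheme = 'http') {
  return puppeteer.launch({
    args: [buildProxyArg(host, port, scheme)],
  });
}

// Hypothetical usage:
//   const browser = await launchWithProxy(require('puppeteer'), 'proxy.example.com', 8080);
```

Keeping the flag construction in its own function makes it easy to check the resulting string before a browser is ever started.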
If your proxy doesn’t require a password, you don’t need to proceed to step two. However, if your proxy requires a password for login, then you need to enable automatic login using the second step, as discussed below.
The next step is authenticating with your proxy, because most proxies require you to log in with a username and password.
There are three ways to achieve this.
First method: The page.authenticate method for network requests
With this method, you pass the credentials directly so that Puppeteer logs in to the proxy automatically during network requests.
await page.authenticate({
  username: 'mike',
  password: 'puppeteer-demo'
});
Similarly, you can supply the same credentials through an authenticate field directly in the POST JSON body:
{
  authenticate: {
    username: 'mike',
    password: 'puppeteer-demo'
  }
}
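To show how the launch flag from step one and page.authenticate fit together, here is a small sketch of a complete flow. The function name openThroughProxy and its option names (proxyUrl, targetUrl) are assumptions made for illustration. The puppeteer module is passed in as the first argument.

```javascript
// Open a page through an authenticated proxy and return the page title.
// `puppeteer` is the module itself, e.g. const puppeteer = require('puppeteer').
async function openThroughProxy(puppeteer, { proxyUrl, username, password, targetUrl }) {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyUrl}`],
  });
  try {
    const page = await browser.newPage();
    // Set the credentials before navigating so the proxy's
    // authentication challenge can be answered automatically.
    await page.authenticate({ username, password });
    await page.goto(targetUrl);
    return await page.title();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

// Hypothetical usage:
//   const title = await openThroughProxy(require('puppeteer'), {
//     proxyUrl: 'http://proxy.example.com:8080',
//     username: 'mike',
//     password: 'puppeteer-demo',
//     targetUrl: 'http://demo.com',
//   });
```

The try/finally block is a design choice that keeps stray Chrome processes from piling up when a proxy is misconfigured and navigation throws.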
Second method: The page.setExtraHTTPHeaders method for sending extra authentication information
You can use the code below to send additional HTTP headers with page.setExtraHTTPHeaders:
await page.setExtraHTTPHeaders({
  // 'username:password' is a placeholder for the encoded credentials
  'Proxy-Authorization': 'Basic username:password',
  // OR
  Authorization: 'Basic username:password',
});
Please note: Remember to base64 encode your username:password!
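The encoding step can be sketched with Node's built-in Buffer. The helper names basicAuthHeader and applyProxyAuthHeader are assumptions for illustration; only page.setExtraHTTPHeaders comes from Puppeteer itself.

```javascript
// Build a Basic auth header value from plain credentials.
// The "username:password" pair is base64-encoded, as the note above requires.
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`).toString('base64');
  return `Basic ${token}`;
}

// Attach the encoded credentials to every request made by the page.
// `page` is a Puppeteer Page instance.
async function applyProxyAuthHeader(page, username, password) {
  await page.setExtraHTTPHeaders({
    'Proxy-Authorization': basicAuthHeader(username, password),
  });
}

// Hypothetical usage:
//   await applyProxyAuthHeader(page, 'mike', 'puppeteer-demo');
```

Whether the proxy expects Proxy-Authorization or Authorization depends on the proxy, so the header name may need to change for your setup.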
Additionally, you might also require HTTPS support to handle the proxy. The Puppeteer Proxy package will help you to do that.
Here is the code to use it:
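import puppeteer from 'puppeteer';
import { createPageProxy } from 'puppeteer-proxy';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const pageProxy = createPageProxy({ page });

  // Intercept requests so that they can be forwarded through the proxy.
  await page.setRequestInterception(true);
  // Use page.on rather than page.once: with interception enabled, every
  // request must be handled, otherwise later requests stay paused.
  page.on('request', async (request) => {
    await pageProxy.proxyRequest({
      request,
      proxyUrl: 'http://127.0.0.1:3000', // local proxy address; replace with your own
    });
  });

  await page.goto('http://demo.com');
  await browser.close();
})();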
Using a proxy with Puppeteer is easy when you know the exact steps. Follow the steps and use the code discussed in this article to start using a proxy of your choice in Puppeteer. Remember to test run the settings before you start your work because this will save you time and allow you to complete your dev work hassle-free.
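One possible way to do such a test run is to load a page that echoes the caller's IP address and confirm it shows the proxy's address rather than your own. This is only a sketch: the helper name readExitIp is hypothetical, and ipEchoUrl stands for whichever IP echo service you trust to return the address as plain text.

```javascript
// Load an IP echo page through the configured proxy and return the text it shows.
// `page` is a Puppeteer Page; `ipEchoUrl` is an assumed plain-text IP echo endpoint.
async function readExitIp(page, ipEchoUrl) {
  await page.goto(ipEchoUrl);
  // Read the visible text of the response body inside the browser context.
  const body = await page.evaluate(() => document.body.innerText);
  return body.trim();
}

// Hypothetical usage:
//   const ip = await readExitIp(page, 'https://ip-echo.example.com');
//   console.log(`Requests appear to come from ${ip}`);
```

If the reported address matches your own rather than the proxy's, the proxy flag or credentials are probably not being applied.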