Puppeteer proxy - how to use a proxy in Puppeteer | GeoSurf

How to use a proxy in Puppeteer

Posted at September 23, 2020 in Proxy 101, Web scraping

Puppeteer is a Node.js library that controls Chrome or Chromium. It offers a high-level API for browser automation, which makes it a popular choice for web scraping in Node.js.

This article explains the steps needed to use a proxy in Puppeteer. But first, let’s look more closely at Puppeteer and proxies.

WHAT CAN YOU DO WITH PUPPETEER?

Here are some of the things that you can do with Puppeteer:

  • Produce screenshots and PDFs of pages
  • Generate pre-rendered content by crawling a SPA (Single-Page Application)
  • Perform form submission automation, UI testing, keyboard input, and a variety of similar tasks
  • Create an up-to-date, automated testing environment that runs your tests in a recent version of Chrome, using the latest JavaScript and browser features
  • Perform runtime analysis for detecting performance issues
  • Test a variety of Chrome Extensions

WHAT IS A PROXY SERVER?

A proxy server acts as a gateway between the client browser and the internet.

In simple words, you browse the internet via a proxy instead of requesting resources directly from the websites you visit. Your browser forwards a request to the proxy server, and the proxy sends that request to the website. The website returns the information to the proxy, and the proxy sends the information back to you.

The primary benefit of using a proxy server to browse the web is that your IP address stays hidden from the websites you visit: they see the proxy’s address, which makes it much harder to trace a request back to you.

WHY DO YOU NEED A PROXY SERVER WITH PUPPETEER?

You might need to use a proxy with Puppeteer for any of the following reasons:

  • To hide the location your requests originate from
  • To open a website that has geographical restrictions
  • To carry out specific tasks anonymously
  • To speed up common requests, when the proxy caches responses

WHAT IS A PUPPETEER PROXY?

A Puppeteer proxy handles proxied requests for you and offers HTTPS support, error handling, and cookie handling. You only need to specify your main proxy to start testing different URLs, without configuring a separate proxy for each URL.

HOW TO USE A 3RD PARTY PROXY IN PUPPETEER?

Puppeteer supports the use of external proxies. Here is the step-by-step process for using a proxy in Puppeteer:

STEP 1: SPECIFY YOUR PROXY

The first step is telling Chrome where your proxy is located. You do this by passing the proxy address in the --proxy-server command-line flag when the browser launches:

--proxy-server=https://YOUR-PROXY-SERVER-DOMAIN:PORT
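
In Puppeteer, command-line flags for Chrome are passed through the args option of puppeteer.launch. The following is a minimal sketch rather than a definitive setup: the helper names buildLaunchOptions and openThroughProxy and the proxy.example.com address are illustrative assumptions, not part of Puppeteer.

```javascript
// Hypothetical helper: builds launch options that route Chrome through a proxy.
function buildLaunchOptions(proxyServer) {
  return {
    // Chrome reads the proxy address from the --proxy-server flag.
    args: [`--proxy-server=${proxyServer}`],
  };
}

// Hypothetical helper: opens targetUrl through the proxy and returns the page HTML.
async function openThroughProxy(proxyServer, targetUrl) {
  // Loaded lazily so the helper above can be used without launching a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions(proxyServer));
  try {
    const page = await browser.newPage();
    await page.goto(targetUrl);
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

// Placeholder address (an assumption); replace it with your own proxy.
console.log(buildLaunchOptions('http://proxy.example.com:8080').args);
```

Keeping the option building separate from the launch makes it easy to check the flag you are about to pass before a browser is started.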

If your proxy doesn’t require a password, you don’t need to proceed to step two. However, if your proxy requires a password for login, then you need to enable automatic login using the second step, as discussed below.

STEP 2: AUTHENTICATE YOUR PROXY TO ALLOW PUPPETEER TO USE IT

The next step is authenticating with your proxy, because most proxies require you to log in with a username and password.

There are two ways to achieve this.

First method: The page.authenticate method for network requests

With this method, you pass the credentials directly, and Puppeteer uses them to log in to the proxy automatically during network requests.

await page.authenticate({
    username: 'mike',
    password: 'puppeteer-demo'
});
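
Proxy providers often hand out a single URL with the credentials embedded, while Chrome's --proxy-server flag does not accept a username and password. One way to combine the two steps, sketched here under those assumptions, is to split the URL and give the address to the flag and the credentials to page.authenticate. The helper names splitProxyUrl and fetchTitleViaProxy and the proxy.example.com address are hypothetical.

```javascript
// Hypothetical helper: separates a proxy URL into its address and credentials.
function splitProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    // Scheme, host, and port only; this part goes into --proxy-server.
    server: `${url.protocol}//${url.host}`,
    // URL stores credentials percent-encoded, so decode them.
    username: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
  };
}

// Hypothetical helper: loads targetUrl through an authenticated proxy.
async function fetchTitleViaProxy(proxyUrl, targetUrl) {
  // Loaded lazily so splitProxyUrl can be used without launching a browser.
  const puppeteer = require('puppeteer');
  const { server, username, password } = splitProxyUrl(proxyUrl);
  const browser = await puppeteer.launch({ args: [`--proxy-server=${server}`] });
  try {
    const page = await browser.newPage();
    // Answer the proxy's authentication challenge with these credentials.
    await page.authenticate({ username, password });
    await page.goto(targetUrl);
    return await page.title();
  } finally {
    await browser.close();
  }
}

// Placeholder URL (an assumption) reusing the demo credentials from above.
console.log(splitProxyUrl('http://mike:puppeteer-demo@proxy.example.com:8080'));
```

Note that page.authenticate must be called before page.goto, so the credentials are available when the first request goes out.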

Similarly, when a service exposes page.authenticate through an HTTP endpoint, you can pass the same fields directly in the POST JSON body.

{
    authenticate: {
        username: 'mike',
        password: 'puppeteer-demo'
    }
}

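Second method: The page.setExtraHTTPHeaders method for sending extra authentication information

You can use the code below to send the credentials as an extra HTTP header with page.setExtraHTTPHeaders:

// The Basic scheme expects "username:password" encoded as Base64
const credentials = Buffer.from('username:password').toString('base64');

await page.setExtraHTTPHeaders({
    // Credentials for the proxy
    'Proxy-Authorization': `Basic ${credentials}`,

    // OR, to authenticate against the target website instead:
    // Authorization: `Basic ${credentials}`,
});

Please note: The value after Basic must be the Base64 encoding of username:password, not the plain text.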
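
The Base64 step from the note can be wrapped in a small helper so it is never forgotten. This is a sketch, not part of Puppeteer: the function names basicAuthHeader and setProxyAuthHeader are hypothetical, and the example credentials are placeholders.

```javascript
// Hypothetical helper: builds the value of a Basic authentication header.
function basicAuthHeader(username, password) {
  // Basic authentication sends "username:password" encoded as Base64.
  const encoded = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${encoded}`;
}

// Hypothetical helper: attaches the proxy credentials to every request
// made by the given Puppeteer page.
async function setProxyAuthHeader(page, username, password) {
  await page.setExtraHTTPHeaders({
    'Proxy-Authorization': basicAuthHeader(username, password),
  });
}

console.log(basicAuthHeader('user', 'pass')); // prints "Basic dXNlcjpwYXNz"
```

Base64 is an encoding, not encryption, so anyone who can read the header can recover the credentials.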

THE PUPPETEER-PROXY PACKAGE FOR HTTPS PROXY HANDLING

You might also need HTTPS support when routing requests through the proxy. The puppeteer-proxy package helps with that.

Here is the code to use it:

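import puppeteer from 'puppeteer';
import { createPageProxy } from 'puppeteer-proxy';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const pageProxy = createPageProxy({ page });

  // Intercept requests so they can be forwarded through the proxy
  await page.setRequestInterception(true);
  // Use page.on rather than page.once: with interception enabled, every
  // request must be handled, or the unhandled ones will stall
  page.on('request', async (request) => {
    await pageProxy.proxyRequest({
      request,
      proxyUrl: 'http://127.0.0.1:3000', // replace with your proxy address
    });
  });

  await page.goto('http://demo.com');
  await browser.close();
})();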

FINAL THOUGHTS

Using a proxy with Puppeteer is easy when you know the exact steps. Follow the steps and use the code discussed in this article to start using a proxy of your choice in Puppeteer. Remember to test run the settings before you start your work because this will save you time and allow you to complete your dev work hassle-free.