[Part 2] - Understanding/Evaluating Web Scraping, Crawling, and Automation with NodeJS Libraries

This article is part of Open-Source Bolster Engineering and Research, aimed at evaluating the performance and behavior of various NodeJS libraries for Web Scraping, Crawling, and Automation. In part 2 of this series, we look at how to use Puppeteer for web automation. Stay tuned for further articles on this topic.

Prerequisite

Please read part 1 of this blog series, Understanding Web Scraping, Crawling, and Automation with NodeJS.

What is Web Automation?

Web Automation is the process of using code to automate certain web functions. It can automate tasks such as checking that links on a page work, that pages load properly, and that forms receive data.


Simple Use Case on Web Automation

Here are two use cases to help us better understand Web Automation:

Case 1: Let’s say you have a website where customer support is highly sensitive. With each new build, you want to validate that the customer support page, chat, and form are working properly. You can attach test cases to your development pipeline so that every production deployment is validated beforehand.

Case 2: You run a website that sells products and accepts payments, possibly through multiple payment methods. You want to make sure that payment works at all times. You can schedule a test run at regular intervals (as required) so that your customers never face a problem during checkout with any of these payment methods.

Puppeteer for Web Automation

Puppeteer is a NodeJS library that controls Chrome or Chromium through a high-level API, which makes it a good fit for Web Automation. Among other things, it can be used to:

  • Generate screenshots and PDFs of pages.
  • Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e., "SSR" (Server-Side Rendering)).
  • Automate form submission, UI testing, keyboard input, etc.
  • Create an up-to-date, automated testing environment. Run your tests directly in the latest version of Chrome using the latest JavaScript and browser features.
  • Capture a timeline trace of your site to help diagnose performance issues.
  • Test Chrome Extensions.
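
As a rough illustration of the first item in the list, a minimal sketch of capturing a screenshot and a PDF might look like the following. It assumes the `puppeteer` package is installed (`npm install puppeteer`). The helper name `capturePage`, the output file names, and the `--live` command-line switch are hypothetical choices made for this sketch and are not part of Puppeteer.

```javascript
// Sketch: save a full-page screenshot and a PDF of a page.
// Assumption: `page` is a Puppeteer Page object; capturePage and the
// file names are illustrative only.
async function capturePage(page, url, basename) {
  // Wait until the network is mostly idle so the page has finished rendering.
  await page.goto(url, { waitUntil: 'networkidle2' });
  const files = { screenshot: `${basename}.png`, pdf: `${basename}.pdf` };
  await page.screenshot({ path: files.screenshot, fullPage: true });
  await page.pdf({ path: files.pdf, format: 'A4' });
  return files;
}

async function main() {
  // Loaded lazily so the helper above can be reused without a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    console.log(await capturePage(page, 'https://bolster.ai', 'bolster-home'));
  } finally {
    await browser.close();
  }
}

// Hypothetical switch for this sketch: `node capture.js --live` drives a real browser.
if (process.argv.includes('--live')) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Passing the `page` object into the helper keeps the browser setup in one place, so the same function can be called from a test pipeline or a scheduled job.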

What are we going to do today with Puppeteer?

In this part of the article, we are going to see how to use Puppeteer to:

  • validate links and page(s)
  • fill out and submit a form

Our scripts need CSS selectors to identify the elements they interact with, so let’s first learn how to find a selector easily.

  1. Open any URL in your browser and let the page load. Here we have loaded https://bolster.ai.
  2. Right-click on the element whose selector you want to find. Here we have right-clicked on Blog.
  3. Click on Inspect Element (the developer tools panel will open in your browser).
  4. Right-click on the highlighted element in the Elements tab.
  5. Under Copy, click on Copy Selector.
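
Once copied, the selector can be handed to Puppeteer. The sketch below waits for an element matching the selector and reads its text, which is a quick way to confirm the selector points at the element you expect. It assumes the `puppeteer` package is installed. The helper name `readText`, the `SELECTOR` environment variable, the fallback selector `'a'` (the first link on the page), and the `--live` switch are hypothetical choices for this sketch.

```javascript
// Sketch: confirm that a copied selector matches an element and read its text.
// Assumption: `page` is a Puppeteer Page object; readText is an illustrative helper.
async function readText(page, selector, timeoutMs = 5000) {
  // waitForSelector rejects if nothing matches within the timeout.
  await page.waitForSelector(selector, { timeout: timeoutMs });
  // $eval runs the callback inside the page against the first matching element.
  return page.$eval(selector, (el) => el.textContent.trim());
}

async function main() {
  const puppeteer = require('puppeteer'); // needs `npm install puppeteer`
  // Hypothetical: paste the copied selector into the SELECTOR environment variable.
  const selector = process.env.SELECTOR || 'a';
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('https://bolster.ai', { waitUntil: 'networkidle2' });
    console.log(`Text for "${selector}":`, await readText(page, selector));
  } finally {
    await browser.close();
  }
}

// Hypothetical switch for this sketch: run with `--live` to open a real browser.
if (process.argv.includes('--live')) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Selectors copied from the browser can be long and tied to the current page layout, so a shorter, hand-written selector may be more robust when the page changes.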

Starting with Puppeteer


a) Click or Navigate: Visit the Bolster website and navigate to the resource center by clicking its link.
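
A minimal sketch of this step, under some assumptions, could look like the following. It opens the start page, checks that it loaded, clicks a link, and reports where the browser ended up. The helper name `clickAndNavigate`, the `LINK_SELECTOR` environment variable, and the `--live` switch are hypothetical. The selector for the resource center link is not given here, so it has to be copied from the Elements tab as described earlier. The sketch also assumes the click triggers a navigation in the same tab.

```javascript
// Sketch: open a page, click a link, and check that both pages loaded.
// Assumption: `page` is a Puppeteer Page object; clickAndNavigate is illustrative.
async function clickAndNavigate(page, startUrl, linkSelector) {
  const first = await page.goto(startUrl, { waitUntil: 'networkidle2' });
  if (!first || !first.ok()) {
    throw new Error(`Failed to load ${startUrl}`);
  }
  await page.waitForSelector(linkSelector);
  // Start waiting for the navigation before clicking so the event is not missed.
  const [second] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(linkSelector),
  ]);
  return {
    url: page.url(),
    // The response is null when the page navigates via the History API.
    status: second ? second.status() : null,
    title: await page.title(),
  };
}

async function main() {
  const puppeteer = require('puppeteer'); // needs `npm install puppeteer`
  // Hypothetical: the selector copied from the browser is supplied by the user.
  const linkSelector = process.env.LINK_SELECTOR;
  if (!linkSelector) {
    throw new Error('Set LINK_SELECTOR to the selector copied from the Elements tab');
  }
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    console.log(await clickAndNavigate(page, 'https://bolster.ai', linkSelector));
  } finally {
    await browser.close();
  }
}

// Hypothetical switch for this sketch: run with `--live` to open a real browser.
if (process.argv.includes('--live')) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

If the link opens in a new tab or only updates part of the page, `waitForNavigation` will time out, and a different wait (for example, waiting for a selector on the target page) would be needed.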

b) Fill out the form: Fill out and submit a form on any website. Make sure the website details (the URL and the field selectors) are entered correctly.
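
A minimal sketch of this step might look like the following. Everything site-specific in it is a hypothetical placeholder: `FORM_URL`, the selectors and values in `FIELDS`, and `SUBMIT_SELECTOR` must be replaced with the details of the site you are testing. The helper name `fillAndSubmit` and the `--live` switch are also illustrative. The sketch assumes that submitting the form triggers a full page navigation.

```javascript
// Hypothetical placeholders: replace with the details of your own website.
const FORM_URL = 'https://example.com/contact';
const FIELDS = {
  '#name': 'Jane Doe',
  '#email': 'jane@example.com',
};
const SUBMIT_SELECTOR = 'button[type="submit"]';

// Sketch: type a value into each field, then submit the form.
// Assumption: `page` is a Puppeteer Page object; fillAndSubmit is illustrative.
async function fillAndSubmit(page, formUrl, fields, submitSelector) {
  await page.goto(formUrl, { waitUntil: 'networkidle2' });
  for (const [selector, value] of Object.entries(fields)) {
    await page.waitForSelector(selector);
    // A small delay between keystrokes mimics a person typing.
    await page.type(selector, value, { delay: 20 });
  }
  // Assumes the submit button causes a page navigation.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(submitSelector),
  ]);
  return page.url();
}

async function main() {
  const puppeteer = require('puppeteer'); // needs `npm install puppeteer`
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const landedOn = await fillAndSubmit(page, FORM_URL, FIELDS, SUBMIT_SELECTOR);
    console.log('Form submitted, now at:', landedOn);
  } finally {
    await browser.close();
  }
}

// Hypothetical switch for this sketch: run with `--live` to open a real browser.
if (process.argv.includes('--live')) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

For forms that submit in the background without reloading the page, waiting for a confirmation element with `page.waitForSelector` is likely a better fit than `waitForNavigation`.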

As these two examples show, click events and form filling can both be automated with pre-written scripts.

At Bolster, we use such tools to find illegitimate websites and take them down.

About Us

Bolster is the only automated digital risk protection platform in the world that detects, analyses, and takes down fraudulent sites and content across the web, social media, app stores, marketplaces, and the dark web.
