Deciphering Webpage DOMs: 5 Steps to Protect Businesses


The damage done by typosquat domains, and the frequency with which bad actors mimic legitimate brands, is a well-known problem for security leaders. Security and IT teams need to scan and monitor for brand threats across the internet, using machine learning to identify domains whose webpage DOMs resemble those of the legitimate brand, Natural Language Processing to compare rendered text, and many other techniques.

When you use domain monitoring technology to scan web pages at this scale, you sometimes find things you never anticipated. In many investigations at Bolster, we saw webpage DOMs containing API keys and email credentials. The API keys ranged from high severity (Twitter consumer and access tokens, AWS access and secret tokens) to medium severity (Slack webhook URLs, Google Maps API keys), along with many low-severity API tokens.

These mishaps can happen when frontend HTML isn’t reviewed for secrets before publishing. In some instances, such as the Google Maps API, the key is meant to be embedded in the HTML by design, but it needs to be restricted by referrer and origin from the admin dashboard.

In this piece, we will dive into the scale of this problem and how understanding webpage DOMs can help arm your security team against internet threats.

What Our Research Reveals About Webpage DOMs

We took 1.5 million random DOMs and scanned them for popular API key regexes such as Google Cloud API, AWS API & Access, Mailchimp, Mailgun, Telegram, Stripe, Twilio, and many more.
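To make the scanning step concrete, here is a minimal sketch of regex-based key detection over a single DOM string. The three patterns are commonly cited public formats and are assumptions for illustration, not the rule set used in this research; `KEY_PATTERNS` and `scan_dom` are hypothetical names.

```python
import re

# Illustrative patterns only (assumed, commonly cited formats).
# A production scanner would carry many more rules and validate matches.
KEY_PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "slack_webhook": re.compile(
        r"https://hooks\.slack\.com/services/"
        r"T[A-Za-z0-9]+/B[A-Za-z0-9]+/[A-Za-z0-9]+"
    ),
}


def scan_dom(html: str) -> list[tuple[str, str]]:
    """Return (key_type, matched_text) pairs found in one DOM string."""
    findings = []
    for key_type, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(html):
            findings.append((key_type, match.group(0)))
    return findings
```

A pattern match only shows that a string looks like a key. It says nothing about whether the key is live or restricted, so a match is a lead to investigate rather than a confirmed exposure.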

The results are shown below:

Found key types from 1.5 million webpage DOM scans

A significant share of the Google Maps API keys we found had no restriction policies set up, so anybody can make queries using those keys. Scanning a found Google Maps key with gmapsapiscanner returns a list of the endpoints on which that key works and the pricing for each endpoint.

Malicious actors can abuse such exposed API keys to rack up unexpected billing costs if the keys have no rate limit or usage cap. Also, if an attacker consumes the entire query quota and billing isn’t configured properly, the attacker can cause a Denial of Service: once the quota is consumed, all new API queries made by the apps will fail for users.

Example cost across different endpoints for one of the vulnerable Google Maps API keys we found

We did not test all of the found keys for their working status. It should be assumed that a certain percentage of keys might have already been revoked and are unusable.

Using Internet Search Engines

Shodan & ZoomEye

Other approaches for finding API keys passively include internet search engines like Shodan and ZoomEye, which scan IP addresses for running services. If a server is running an HTTP service, these engines also take a snapshot of the webpage DOM and let us search through it.

To search DOM content scanned by Shodan, you can use the following filter: http.html:hooks.slack.com/services
On ZoomEye, you can search for the pattern or the initial characters of the API key directly, without any filter. ZoomEye automatically searches the scanned DOMs for the search terms.

For this example, we searched for Slack webhook URL patterns, which can be used to send messages into internal Slack channels. Sending a text message to the channel only requires a POST request to the URL.
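To show how little is needed once a webhook URL leaks, the sketch below builds (but does not send) the kind of POST request described above, using only the Python standard library. The URL is Slack's documentation-style placeholder, and `build_webhook_request` is a hypothetical helper name. The JSON body with a `text` field follows Slack's incoming webhook format.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a JSON body with a 'text' field."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL, not a real webhook.
PLACEHOLDER_URL = (
    "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
)
req = build_webhook_request(PLACEHOLDER_URL, "test message")
# urllib.request.urlopen(req) would send it; it is deliberately not called here.
```

No authentication header is involved: the URL itself is the credential, which is why a webhook URL in a public DOM should be treated as a leaked secret.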

Almost 2,800 results on ZoomEye for DOMs that contain a Slack webhook

PublicWWW.com

There are also webpage DOM search engines like publicwww.com, where you can search for keywords or regex patterns in DOMs. We were able to find many Slack webhook URLs using publicwww.

Internet Archive Services

Internet archive services like Archive.org’s Wayback Machine also take snapshots of webpage DOMs routinely. If an API key or some other secret was embedded in the webpage DOM in the past, it can still be found using internet archive services. That’s why exposed secrets and API keys should not only be removed but also revoked: you can never be sure which services have taken snapshots of the DOM.
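One way to check whether older versions of your own pages were archived is the Wayback Machine's CDX search endpoint. The sketch below only builds the query and snapshot URLs and leaves fetching to the reader. The helper names and the particular parameter choices (`fl`, `filter`, `collapse`, `limit`) are assumptions picked for illustration.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url: str, limit: int = 50) -> str:
    """Build a CDX query URL listing archived captures of page_url."""
    params = {
        "url": page_url,
        "output": "json",               # JSON rows instead of plain text
        "fl": "timestamp,original",     # only the fields needed below
        "filter": "statuscode:200",     # skip redirects and errors
        "collapse": "digest",           # drop consecutive identical captures
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_url(timestamp: str, original: str) -> str:
    """Build the URL of one archived capture from a CDX result row."""
    return f"https://web.archive.org/web/{timestamp}/{original}"
```

Each archived capture can then be downloaded and run through the same secret scan applied to the live page.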

Preventing Cybersecurity Threats

It’s important to be on top of every potential cybersecurity monitoring and insights tool to prevent bad actors from gaining access to your sensitive information or company network. Ensuring your webpage DOMs aren’t posing any additional threats to your organization is one step mature cybersecurity teams take to protect their assets.

Scan and check webpage DOMs before publishing to avoid mishaps. TruffleHog is a great tool for automating scans for various popular API keys in source code.
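As a rough illustration of what a pre-publish check does, the sketch below walks a build output directory and flags lines that look like keys. This is a simplified stand-in, not TruffleHog and not its interface. The two patterns, the `dist` default directory, and the function names are assumptions.

```python
import re
from pathlib import Path

# Assumed, commonly cited key formats; extend for the services you use.
PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
}
SCANNED_SUFFIXES = {".html", ".htm", ".js"}


def scan_build_dir(build_dir: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, key_type) for each suspicious line."""
    findings = []
    for path in sorted(Path(build_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SCANNED_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for key_type, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append((str(path), line_no, key_type))
    return findings


def main(argv: list[str]) -> int:
    """Print findings and return 1 if any were found, else 0."""
    results = scan_build_dir(argv[0] if argv else "dist")
    for file_name, line_no, key_type in results:
        print(f"{file_name}:{line_no}: possible {key_type}")
    return 1 if results else 0
```

Wired into a deploy pipeline as `sys.exit(main(sys.argv[1:]))`, a non-zero exit code would block publishing until the flagged lines are reviewed.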

If a key has to be embedded in the webpage DOM or in JavaScript, set restriction policies such as verifying the referrer, origin, and IP address, and set rate limits accordingly. Also make sure the key isn’t enabled to perform any sensitive actions.

If you find an embedded key in the publicly accessible DOM of your site, you should not only remove it from the DOM but also revoke it. The key might already have been cached by an internet archive service or a similar platform.

If you’re interested in how a digital risk protection provider can help with internet scanning and safeguarding your brand, check out Bolster’s suite of risk protection capabilities.