At Bolster, we scan millions of web pages every day. We look for similarities with the DOMs of known legitimate brand webpages, use Natural Language Processing to compare rendered text, and apply many other techniques.
When you scan web pages at such a large scale, you sometimes find things you never anticipated. In many investigations, we saw webpage DOMs containing various API keys and email credentials. The API keys ranged from high severity (Twitter consumer keys and access tokens, AWS access and secret keys) to medium severity (Slack webhook URLs, Google Maps API keys), along with many low-severity API tokens.
These mishaps happen when frontend HTML isn't reviewed for secrets before it is published. In some cases, such as the Google Maps API, the key is meant to be embedded in the HTML by design, but it should be restricted to specific referrers and origins from the admin dashboard.
For this blog post, we decided to dig into the scale of this problem.
We took 1.5 million random DOMs and scanned them with regexes for popular API key formats, such as Google Cloud API keys, AWS API and access keys, Mailchimp, Mailgun, Telegram, Stripe, Twilio, and many more.
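The scan itself is conceptually simple: run a set of key-format regexes over each DOM and collect the matches. The sketch below shows the idea in Python. The pattern names and regexes are assumptions based on commonly published key formats (for example, AWS access key IDs start with `AKIA`, Google API keys with `AIza`), not the exact rule set used in this study; production scanners typically add many more patterns plus entropy or verification checks to cut false positives.

```python
import re

# Assumed, commonly published key formats; NOT the exact rule set used in the study.
KEY_PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z\-_]{35}"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "slack_webhook": re.compile(
        r"https://hooks\.slack\.com/services/T[A-Za-z0-9_]+/B[A-Za-z0-9_]+/[A-Za-z0-9_]+"
    ),
    "stripe_live_secret": re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
    "mailgun_api_key": re.compile(r"\bkey-[0-9a-zA-Z]{32}\b"),
}


def scan_dom(dom: str) -> dict[str, list[str]]:
    """Return {pattern_name: unique matches in order of appearance} for one DOM."""
    hits: dict[str, list[str]] = {}
    for name, pattern in KEY_PATTERNS.items():
        for match in pattern.findall(dom):
            found = hits.setdefault(name, [])
            if match not in found:  # report each distinct key once per DOM
                found.append(match)
    return hits
```

For example, `scan_dom('<script>var k="AKIAIOSFODNN7EXAMPLE";</script>')` returns `{'aws_access_key_id': ['AKIAIOSFODNN7EXAMPLE']}` (AWS's documented example key ID).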
A significant share of the Google Maps API keys we found had no restriction policies set up, so anybody can make queries with them. If we run one of these keys through gmapsapiscanner, we get a list of the endpoints the key works on and the pricing for each of those endpoints.
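gmapsapiscanner probes many Maps endpoints. A much smaller check of the same idea, for example to see whether one of your own keys is usable from an arbitrary client, is to call a single endpoint and read the `status` field of the JSON reply. This sketch assumes the Geocoding API is the endpoint of interest; a key restricted to other APIs can fail here and still work elsewhere, which is why the full scanner checks a list of endpoints. The sample address is an arbitrary placeholder.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

# Statuses meaning Google accepted a request made with this key.
ACCEPTED_STATUSES = {"OK", "ZERO_RESULTS"}


def build_geocode_url(api_key: str, address: str = "Mountain View, CA") -> str:
    """Build a Geocoding API request URL for the given key (address is a placeholder)."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def key_accepted(response: dict) -> bool:
    """True if the JSON body shows the key worked without a restriction blocking us.

    A restricted or disabled key typically yields "REQUEST_DENIED" instead.
    """
    return response.get("status") in ACCEPTED_STATUSES


def check_key(api_key: str, timeout: float = 10.0) -> bool:
    """Send one live request (network access required) and interpret the reply."""
    try:
        with urllib.request.urlopen(build_geocode_url(api_key), timeout=timeout) as resp:
            return key_accepted(json.load(resp))
    except urllib.error.HTTPError:
        # Non-2xx responses are treated as "not usable from here".
        return False
```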
Malicious actors can abuse such exposed API keys to rack up unexpected billing costs if the keys have no rate limit or usage cap. Also, if an attacker consumes the entire query quota and billing isn't configured properly, the attacker can cause a Denial of Service: once the quota is used up, all new API queries made by the legitimate app will fail for its users.
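Both risks come down to simple arithmetic: usage-priced APIs charge per block of requests, so an attacker's request volume translates directly into cost, and a fixed quota divided by the attacker's request rate tells you how quickly legitimate traffic gets locked out. In the sketch below, the price of 5.00 per 1,000 requests, the 100,000-request daily quota, and the rate of 50 requests per second are hypothetical numbers chosen for illustration, not Google's actual pricing or limits.

```python
def abuse_cost(requests: int, price_per_1000: float, free_requests: int = 0) -> float:
    """Billable cost of `requests` calls at a per-1,000 price, after any free tier."""
    billable = max(0, requests - free_requests)
    return billable / 1000 * price_per_1000


def seconds_to_exhaust(daily_quota: int, attacker_rps: float) -> float:
    """Seconds until an attacker alone burns through the daily quota."""
    return daily_quota / attacker_rps


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(abuse_cost(1_000_000, 5.00))  # -> 5000.0
    print(seconds_to_exhaust(100_000, 50) / 60)  # minutes until the app's own calls fail
```

With those made-up numbers, one million abusive requests cost 5,000 in the billing currency, and the attacker exhausts the quota in 2,000 seconds (about 33 minutes), after which the app's own requests start failing.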
We did not test all of the found keys to see whether they still work, so it should be assumed that a certain percentage of them have already been revoked and are unusable.
Using Internet Search Engines
Shodan & ZoomEye
Other approaches for passively finding API keys include Internet search engines such as Shodan and ZoomEye, which scan IP addresses for running services. If a server is running an HTTP service, these engines also take a snapshot of the webpage DOM and let us search through it.
- To search DOM content scanned by Shodan, you can use the following filter:
- On ZoomEye, you can search for the pattern or the first characters of the API key directly, without any filter; ZoomEye automatically looks for the search terms in the scanned DOMs.
For this example, we searched for Slack webhook URL patterns. These URLs can be used to send messages into internal Slack channels: a POST request made to the URL delivers a text message to the channel.
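To show why a leaked webhook URL matters, here is what such a POST looks like. Slack's incoming webhooks accept a JSON body with a `text` field; the sketch only builds the request with Python's standard library and does not send it. The webhook URL is a made-up placeholder.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST that would deliver `text` to a Slack webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL; a real webhook URL works as a bearer credential on its own.
    req = build_webhook_request("https://hooks.slack.com/services/T000/B000/XXXX", "hello")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually post the message.
```

No token or login is involved: possession of the URL is enough, which is why webhook URLs deserve the same care as API keys.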
There are also DOM search engines such as publicwww.com, which let you search for keywords or regex patterns in webpage DOMs. We were able to find many Slack webhook URLs using publicwww.
Internet Archive Services
Internet archive services such as Archive.org's Wayback Machine also routinely take snapshots of webpage DOMs. If an API key or another secret was embedded in the DOM in the past, it can still be found through these services. That's why exposed secrets and API keys should not only be removed but also revoked: you can never be sure which services have taken snapshots of the DOM.
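As an illustration of how past snapshots can be enumerated, the Wayback Machine exposes a CDX API that lists the captures of a URL. The sketch below builds a CDX query and turns the JSON result (whose first row is a header) into raw snapshot URLs; the `id_` suffix requests the archived page without the Wayback toolbar. The parameter choices (only HTTP 200 captures, collapsing identical content by digest, a limit of 50) are assumptions for illustration. Each snapshot could then be fetched and scanned with the same regexes used for live DOMs.

```python
import json
import urllib.parse

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, limit: int = 50) -> str:
    """Build a CDX query listing distinct, successful captures of `target`."""
    params = {
        "url": target,
        "output": "json",
        "fl": "timestamp,original",   # only the fields needed to rebuild snapshot URLs
        "filter": "statuscode:200",   # skip redirects and errors
        "collapse": "digest",         # drop captures with identical content
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def snapshot_urls(cdx_json: str) -> list[str]:
    """Turn a CDX JSON response into raw (id_) snapshot URLs."""
    rows = json.loads(cdx_json)
    if not rows:
        return []
    header, *records = rows  # first row holds the field names
    ts_i, orig_i = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{r[ts_i]}id_/{r[orig_i]}" for r in records]
```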
- Scan the webpage DOMs before publishing to avoid such mishaps. TruffleHog is a great tool for automating the detection of various popular API keys in source code.
- If you find a key embedded in the publicly accessible DOM of your site, don't just remove it from the DOM; revoke it as well. The key may already have been cached by the Internet Archive or similar platforms.
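A lightweight pre-publish check can also run in CI against the built site and fail the build when something looks like a secret. This is a hypothetical minimal sketch, not a replacement for TruffleHog, which ships far more detectors and can verify findings; the three patterns, the scanned file types, and the `dist` default directory are assumptions.

```python
import re
import sys
from pathlib import Path

# Hypothetical minimal rule set; real tools such as TruffleHog cover far more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "slack_webhook": re.compile(r"https://hooks\.slack\.com/services/[A-Za-z0-9_/]+"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}
# File types that typically end up in the published frontend.
SCANNED_SUFFIXES = {".html", ".htm", ".js", ".json"}


def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, match) pairs found in one file's contents."""
    return [(name, m) for name, rx in PATTERNS.items() for m in rx.findall(text)]


def scan_build_dir(build_dir: Path) -> list[tuple[Path, str, str]]:
    """Scan every published asset under build_dir."""
    findings = []
    for path in sorted(build_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in SCANNED_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            findings.extend((path, name, match) for name, match in scan_text(text))
    return findings


def main(argv: list[str]) -> int:
    build_dir = Path(argv[1]) if len(argv) > 1 else Path("dist")
    findings = scan_build_dir(build_dir)
    for path, name, match in findings:
        # Print only a short prefix so the CI log does not leak the secret again.
        print(f"{path}: {name}: {match[:12]}...")
    return 1 if findings else 0  # non-zero exit fails the CI step


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved under any name (for example a hypothetical `check_secrets.py`), it can run as `python check_secrets.py dist` right before the deploy step, so a non-zero exit blocks publishing.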