Shashi and I started Bolster in 2017. Our first product was an AI-powered, web-based phishing and fraud detection service. We complemented our detection engine with an automated takedown service. This way, we can detect and take down phishing and fraud without human intervention.
Since we started selling our web-based brand protection product with automated detection and takedown services, I have had the opportunity to sit in on multiple prospect calls with security and legal teams trying to protect their brands against such external attacks. Most of them asked us, 'What is your platform's takedown time for the phishing and fraud you detect?' While it is an important question, I'm often surprised that they do not ask the question that matters most: 'What is your time to detection?'
This blog is neither about how our technology automatically detects phishing webpages nor about how it automatically takes them down. It is about why the latter question, time to detection, matters much more than the former, takedown time.
Identify the real problem
In this blog, I will refer to a paper called "Sunrise to Sunset: Analyzing the End-to-end Life Cycle and Effectiveness of Phishing Attacks at Scale" written by researchers from ASU, PayPal, Google, and Samsung. They analyzed over 400,000 phishing URLs and recorded their traffic (both victim and non-victim).
The authors highlight that it takes nearly 24 hours for a phishing URL to be detected and listed as a malicious website. They also mention that 50% of the victims fall prey to a phishing attack within 24 hours (before it is detected). The number rises to 75% in 25 hours. Visits from both victims and non-victims die down after the 25th hour of the phishing attack.
Note: The time a phishing URL goes live and the domain's registration date are not the same. An attacker can register a domain a year before they start hosting phishing on it.
To summarize, if you are on a security or legal team trying to protect your brand's customers from online phishing and do not detect and take down (or at least block) a phishing website within 24 to 30 hours of it going live, the damage is already done.
However, relying on threat intel feeds alone to detect phishing that targets a brand, and then acting on them, takes longer than 24 hours. In most cases, threat intel feeds do not list a URL until 24 hours or more after it goes live. Sometimes these URLs never make it into a feed because the classification process is either regex-based or involves human intervention.
Let's look into one such threat intel feed and community, PhishTank. Once we submit a URL on the PhishTank platform, its users manually review the URLs and identify phishing and fraud. Each URL is put to a vote, and if it gets enough negative votes, the system marks it as a phish. One can only imagine how long this process takes to classify a URL as a phish.
Almost all the security threat intel vendors across the globe use PhishTank's feed. As per statistics published by PhishTank, even after the classification process, about 50% of the phishing URLs never make it to the feed. Although not all submissions on the platform are phishing pages, in our analysis we observed that a large percentage of phishing URLs never get classified.
Let's look into an example of a phishing website targeting MetaMask users - cinsobo[.]net. Below is the screenshot of the website. You can find more details of the website here.
PhishTank does not know about it yet (scan on 25th April 2022)
Google Safe Browsing does not block it (scan on 25th April 2022)
Also, it's neither detected nor blocked by any of the 67 vendors on VirusTotal (scan on 25th April 2022)
The problem with the above results is that the phishing page has not yet gone through the cycle of human detection. And here is where the right question matters. Are we detecting a phishing page fast enough? Would it be okay to detect a phishing page 10 days after it goes live, even if it is taken down within 24 hours of detection?
The real problem here is real-time detection. Taking down a phishing page comes much later in the process.
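To make the arithmetic concrete, here is a minimal sketch of how the two delays add up. It assumes a phishing page's total exposure is simply time to detection plus takedown time; the function name and the one-hour comparison scenario are illustrative assumptions, not measurements.

```python
from datetime import timedelta


def exposure_window(time_to_detection: timedelta, takedown_time: timedelta) -> timedelta:
    """Total time a phishing page stays live: detection delay plus takedown delay."""
    return time_to_detection + takedown_time


# Scenario from the text: detected 10 days after going live, taken down 24 hours later.
slow_detection = exposure_window(timedelta(days=10), timedelta(hours=24))

# Hypothetical comparison: detected within 1 hour, same 24-hour takedown.
fast_detection = exposure_window(timedelta(hours=1), timedelta(hours=24))

# Print the exposure in hours for each scenario.
print(slow_detection.total_seconds() / 3600)
print(fast_detection.total_seconds() / 3600)
```

With the same takedown time in both scenarios, the difference in exposure comes entirely from how quickly the page was detected.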
How do we solve this problem?
The volume and variety of these large-scale phishing attacks will keep increasing. Waiting for a human reviewer to identify phishing pages before working on taking them down is too slow. I believe that the only way we can solve this problem is by leveraging technology and automation to replace the human intervention layer. We can identify phishing pages earlier by combining deep learning models with browser automation technology for URL sandboxing. Building an automated solution to do this is not easy, and hence most brands rely on vendors to do it for them.
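As a rough illustration of what "no human in the loop" could look like, the sketch below wires a rendering step and a scoring step into one function. This is not our production pipeline: `PageSnapshot`, `Verdict`, `scan_url`, the 0.9 threshold, and the two stand-in functions are all hypothetical names and values chosen for the example. In practice, the renderer would drive a sandboxed headless browser and the scorer would be a trained model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PageSnapshot:
    """What a sandboxed browser visit might capture (hypothetical fields)."""
    url: str
    html: str
    screenshot: bytes


@dataclass
class Verdict:
    url: str
    score: float
    is_phishing: bool


def scan_url(
    url: str,
    render: Callable[[str], PageSnapshot],
    score: Callable[[PageSnapshot], float],
    threshold: float = 0.9,  # assumed cut-off, chosen only for illustration
) -> Verdict:
    """Render the URL, score the snapshot, and return a verdict.

    `render` stands in for browser automation and `score` for a trained model.
    Both are injected, so no manual review sits between the two steps.
    """
    snapshot = render(url)
    probability = score(snapshot)
    return Verdict(url=url, score=probability, is_phishing=probability >= threshold)


# Stand-ins for illustration only: a real renderer would load the page in a
# sandboxed browser, and a real scorer would be a model, not a keyword check.
def fake_render(url: str) -> PageSnapshot:
    html = "<form>Enter your secret recovery phrase</form>"
    return PageSnapshot(url=url, html=html, screenshot=b"")


def fake_score(snapshot: PageSnapshot) -> float:
    return 0.97 if "recovery phrase" in snapshot.html.lower() else 0.05


verdict = scan_url("http://example.test/login", fake_render, fake_score)
print(verdict)
```

Keeping the renderer and the scorer as separate, swappable pieces means either one can be improved without touching the other.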
If you are a security team or a legal team working to onboard a new vendor for online brand protection (online phishing and fraud detection), ask the following questions:
- How do they identify phishing and fraudulent web pages?
- What are their sources of URL data?
- Do they monitor typosquatting variants of a given domain continually?
- Do they monitor domains that are recently registered?
- Do they monitor new subdomains on suspicious TLDs and IPs?
- Do they monitor search engines, app stores, and social media for fraudulent activity against the brand?
- Can they monitor your referral logs? (Many phishing kits use assets like logos and favicons from the brand's legitimate website)
- And finally: How long does it take them to identify a live phishing or fraudulent site once they see the phishing URL from the sources discussed above?
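To give a feel for what monitoring typosquatting variants involves, here is a minimal sketch that generates a few common kinds of look-alike domains: a dropped letter, a doubled letter, two adjacent letters swapped, and a different TLD. The function name `typo_variants` and the short TLD list are assumptions for this example. It expects input in the simple `name.tld` form, and real monitoring covers many more variant types (homoglyphs, added hyphens, extra subdomains, and so on).

```python
def typo_variants(domain: str, alt_tlds: tuple = ("net", "org", "co")) -> set:
    """Generate simple look-alike variants of a domain in `name.tld` form."""
    name, _, tld = domain.rpartition(".")
    variants = set()

    for i in range(len(name)):
        # Omission: drop one character (skipped if it would leave an empty name).
        if len(name) > 1:
            variants.add(name[:i] + name[i + 1:] + "." + tld)
        # Repetition: double one character.
        variants.add(name[:i] + name[i] * 2 + name[i + 1:] + "." + tld)
        # Transposition: swap two adjacent characters.
        if i < len(name) - 1:
            swapped = name[:i] + name[i + 1] + name[i] + name[i + 2:]
            variants.add(swapped + "." + tld)

    # TLD swap: same name on a different top-level domain.
    for alt in alt_tlds:
        variants.add(name + "." + alt)

    # Swapping two identical letters reproduces the original domain; drop it.
    variants.discard(domain)
    return variants


candidates = typo_variants("example.com")
print(len(candidates), sorted(candidates)[:5])
```

A list like this is only the starting point: each candidate still has to be checked for registration and live content, on a continual basis, for the monitoring to be useful.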
I understand that there are multiple approaches to solving a problem. We at Bolster believe in innovation. If you are looking for someone to help with detecting and taking down a phishing campaign targeting your brand on the web, social media, app stores, marketplaces, or the dark web, please feel free to reach out to us.
Sunrise to Sunset: https://www.usenix.org/system/files/sec20fall_oest_prepub.pdf