ThreatSTOP Blog

How to Work Towards Better Whitelisting

Written by John Bambenek | July 9, 2020

One of the key problems in threat intelligence is curating whitelists of infrastructure and domains that should never be blocked. Just recently, a government CERT distributed lists of IoCs that included private IP addresses, which are not useful to analysts and hunt teams. At best, such entries waste time and effort. At worst, key infrastructure is blocked, with business impact and/or loss of revenue.
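As a minimal sketch of the kind of sanity check that would catch the private-address problem, the snippet below uses Python's standard ipaddress module to separate publicly routable IPs from ones that cannot be meaningful indicators. The function name split_routable and the sample values are illustrative assumptions, not part of any ThreatSTOP tooling.

```python
import ipaddress


def split_routable(indicators):
    """Split indicators into (usable, dropped).

    IP addresses that are not globally routable (RFC 1918, loopback,
    link-local, and similar ranges) are dropped. Anything that does not
    parse as an IP address, such as a domain, is kept for other checks.
    """
    usable, dropped = [], []
    for raw in indicators:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            # Not an IP address; this filter has no opinion on it.
            usable.append(raw)
            continue
        if addr.is_global:
            usable.append(raw)
        else:
            dropped.append(raw)
    return usable, dropped


# Hypothetical sample feed mixing public IPs, private IPs, and a domain.
usable, dropped = split_routable(
    ["10.0.0.5", "8.8.8.8", "192.168.1.1", "evil.example", "127.0.0.1"]
)
print(usable)
print(dropped)
```

Running a filter like this before distributing or ingesting a feed is cheap, and it keeps addresses that only have meaning inside someone else's network out of a block list.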

The Uphill Battle of Accurate Whitelisting

A crude method of whitelisting is to use the Alexa, Umbrella, or Majestic lists of “most popular domains”. This is crude because popularity is used as a proxy for safety. While most of the domains at the top of those lists are safe, it is possible for malicious domains to make it onto them. For instance, some of the domains distributed in the recent Awake Security blog post on Malicious Domain Registrars are also popular.

One attempt to mitigate this kind of manipulation is the Tranco List, which uses various metrics to score domains and guard against manipulation. A variety of academic research papers have vetted the methodology. The one remaining problem with this list is that whitelisting just the popular domains themselves is not enough to prevent inadvertent blocking of these properties.
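To make the limitation concrete, here is a rough sketch of a popularity-based whitelist built from a ranked list in "rank,domain" CSV form, which is the layout the Tranco download uses. The sample rows and ranks are invented for illustration, and top_domains is a hypothetical helper name.

```python
import csv
import io


def top_domains(csv_text, limit):
    """Return the first `limit` domains from 'rank,domain' CSV text as a set."""
    domains = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(domains) >= limit:
            break
        if len(row) < 2:
            continue  # skip blank or malformed rows
        domains.add(row[1].strip().lower())
    return domains


# Invented sample data; real ranks will differ.
SAMPLE = "1,google.com\n2,linkedin.com\n3,example.org\n"

whitelist = top_domains(SAMPLE, 2)
print("linkedin.com" in whitelist)         # the popular domain is covered
print("l-0005.l-msedge.net" in whitelist)  # supporting infrastructure is not
```

A whitelist built this way contains only the names people type, not the infrastructure names those sites depend on.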

 

ThreatSTOP Security Team Observations & Research

We recently noticed in our own feedback loop and machine-learning-generated targets that LinkedIn.com ultimately ended up being flagged as malicious, but not because linkedin.com itself was being blocked (that is obviously in our whitelist). It was because the whitelist was not complete enough to safely ensure that linkedin.com isn’t blocked.

 

As of this writing, here is the dig output for www.linkedin.com (note that this differs from the query for linkedin.com without the www):

 

www.linkedin.com.                      187 IN CNAME 2-01-2c3e-005a.cdx.cedexis.net.
2-01-2c3e-005a.cdx.cedexis.net.        102 IN CNAME www-linkedin-com.l-0005.l-msedge.net.
www-linkedin-com.l-0005.l-msedge.net.  229 IN CNAME l-0005.l-msedge.net.
l-0005.l-msedge.net.                   121 IN A     13.107.42.14

 

You’ll note that in DNS, www.linkedin.com points to cedexis.net via a CNAME, which in turn points to l-msedge.net. To safely whitelist www.linkedin.com, you have to whitelist every CNAME in the chain until it reaches the IP address (which, incidentally, also needs to be whitelisted, but that’s another story).

From the browser’s perspective, you only ever see linkedin.com. From the DNS perspective, however, you see every request until you reach the A record, and if any one of them is blocked, you can’t get to LinkedIn.
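The chain walk described above can be sketched without any network access by parsing dig answer lines. The records below are the ones quoted in this post; the function name cname_chain and the parsing rules (five whitespace-separated fields per record) are assumptions made for this example.

```python
# Answer records as quoted in this post (they will change over time).
DIG_ANSWER = """\
www.linkedin.com. 187 IN CNAME 2-01-2c3e-005a.cdx.cedexis.net.
2-01-2c3e-005a.cdx.cedexis.net. 102 IN CNAME www-linkedin-com.l-0005.l-msedge.net.
www-linkedin-com.l-0005.l-msedge.net. 229 IN CNAME l-0005.l-msedge.net.
l-0005.l-msedge.net. 121 IN A 13.107.42.14
"""


def cname_chain(dig_answer, start):
    """Follow CNAME records from `start`.

    Returns (names, addresses): every name visited in order, and the
    A/AAAA values found for the last name in the chain.
    """
    cnames, addresses = {}, {}
    for line in dig_answer.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[2] != "IN":
            continue  # not an "owner ttl IN type value" record line
        owner, _ttl, _cls, rtype, value = fields
        owner = owner.rstrip(".").lower()
        if rtype == "CNAME":
            cnames[owner] = value.rstrip(".").lower()
        elif rtype in ("A", "AAAA"):
            addresses.setdefault(owner, []).append(value)

    names = []
    current = start.rstrip(".").lower()
    while current not in names:  # stop if the records form a loop
        names.append(current)
        if current not in cnames:
            break
        current = cnames[current]
    return names, addresses.get(names[-1], [])


names, ips = cname_chain(DIG_ANSWER, "www.linkedin.com")
for name in names:
    print(name)
print(ips)
```

Every name in names, plus the addresses in ips, would need to be in the whitelist for www.linkedin.com to stay reachable.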

 

Quality & Applicable Whitelisting Resources 

More than a few popular websites use CNAME records to point to Content Delivery Network farms, so this kind of analysis is necessary to ensure your whitelists are complete. This finding has caused us to adjust several aspects of how ThreatSTOP curates our threat intelligence. The first is that in an upcoming release, when you run CheckIoC on a domain, it will list all relevant CNAME records in the chain of the query and also check those for presence in our targets. We have also adjusted how we populate our internal whitelists to ensure that underlying CNAME records are properly curated out of our targets.
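As an illustration only, and not a description of how CheckIoC is implemented, checking a CNAME chain against a set of targets might look like the sketch below. The target set is hypothetical, and matching on parent domains as well as exact names is an assumption of this example.

```python
def blocked_links(chain, targets):
    """Return (name, matching_target) for each chain entry a target would block.

    A name matches if it, or any of its parent domains, is in `targets`.
    """
    hits = []
    for name in chain:
        labels = name.rstrip(".").lower().split(".")
        # Try the full name first, then each parent: a.b.c, b.c, c
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            if candidate in targets:
                hits.append((name, candidate))
                break
    return hits


# The chain is taken from the dig output in this post.
chain = [
    "www.linkedin.com",
    "2-01-2c3e-005a.cdx.cedexis.net",
    "www-linkedin-com.l-0005.l-msedge.net",
    "l-0005.l-msedge.net",
]

# Hypothetical target list, for demonstration only.
targets = {"cdx.cedexis.net"}

for name, target in blocked_links(chain, targets):
    print(f"{name} would be blocked by target {target}")
```

A non-empty result for a whitelisted site signals that the whitelist entry alone does not protect it, and that the intermediate names need curating out of the targets.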

In the future, we’ll also be producing a feed, available on GitHub and based on the Tranco list, that will include CNAMEs and IPv4 and IPv6 addresses for use in intelligence curation.

 

See for yourself how ThreatSTOP curates its threat intelligence so that you can block threats with low false positives and protect your organization. Start a 14-day trial below.