Vulnerability Triage is an essential component of any Vulnerability Management (“VM”) program. I define Vulnerability Triage as the process of identifying disclosed vulnerabilities, mapping the affected products within these disclosures to an environment inventory, and then deciding how to address the correlated findings through subsequent analysis and prioritization. In other words, as new vulnerabilities are disclosed (e.g. as a CVE through NVD), there is a process to determine whether systems in an environment are potentially affected. If so, what is the risk and what should be done about it? A high-level depiction of this process is illustrated below. *The “Decision” diamond in this diagram represents how the findings are ultimately processed with respect to escalation, remediation and mitigation.
Every organization that has a VM program (and that really should be every organization) is doing some variation of this process. They may not explicitly call it “Vulnerability Triage”, but they are doing it all the same. In my experience building and running VM programs over the years, I have identified a number of commonalities, pitfalls, bottlenecks, high-friction areas and other points of interest related to this process of Vulnerability Triage. The goal of this article is to describe these findings in detail, and to show how we can leverage orchestration to perform enterprise-grade vulnerability triage at scale while eliminating some of the common friction points and bottlenecks I have alluded to.
A Primer on Vulnerability Management
First, let’s quickly go over the concept of Vulnerability Management (a.k.a. “VM”). VM in a nutshell is the continuous process of identifying, classifying, analyzing, prioritizing, reporting, remediating and mitigating vulnerabilities. VM is ubiquitous in enterprise environments as it is fundamental to understanding (technical) risk across the information systems that comprise an IT organization. Without VM, gaps in protection (vulnerabilities) are not identified or not properly addressed, which can lead to very real consequences such as exploitation, system compromise, data loss, compliance/regulatory violations and even a full-scale breach of an organization’s environment.
In fact, VM is so fundamental it comes in third place (as of version 7.1) in the CIS (Center for Internet Security) top 20 “Critical Security Controls”. These 20 CIS controls collectively represent a prioritized set of actions which have been established as best practices for mitigating a large majority of attacks against systems and networks. In essence, VM is pretty crucial to enterprise security, falling only behind hardware/software inventory with respect to priority. This dependency is further illustrated below.
Before moving on, let’s quickly cover the aforementioned inventory prerequisite. CIS Control 1 (Hardware Inventory) and CIS Control 2 (Software Inventory) are precursory actions that are paramount to achieving effective VM. Essentially, you can’t hope to manage vulnerabilities in an environment where you don’t have a complete understanding of all the software and hardware assets. As the common saying goes, you can’t protect what you don’t know about.
Vulnerability Triage Deep-Dive
Alright, now that we have a basic understanding of vulnerability triage and how it fits within the overarching Vulnerability Management process, let’s take a closer look at the individual steps for triage. These steps are summarized as well as illustrated in the respective list and diagram below.
Vulnerability Triage Process Steps
- Step 0 ( Pre-Triage ): Build/maintain a comprehensive and accurate asset inventory
- Step 1: Ingest vulnerability data/intelligence
- Step 2: Correlate vulnerability data with asset inventory
- Step 3: Leverage metadata from vulnerability/asset data sources to perform risk analysis
- Step 4: *Prioritize findings
- Step 5 ( Post-Triage ): **Treatment of findings
*More primitive implementations of vulnerability triage may not include the prioritization step. This can be considered an optional advanced element.
**Vulnerability treatment is not considered part of the vulnerability triage process. It is listed merely to show its relationship to the other portions of the triage process.
Vulnerability Triage Process Diagram
Vulnerability Triage Levels
The goal of vulnerability triage is to make decisions on how a vulnerability should be treated. Triage can involve a relatively quick analysis of whether a vulnerability is applicable to a specific environment all the way to full in-depth analysis of a particular vulnerability and how it affects specific systems. This scale from simple to thorough can be described using the levels detailed below. Each of the levels below can be considered “vulnerability triage”, just at different depths.
- Level 1: Answers the simple question, “Is there any exposure?” (i.e. are there vulnerabilities that affect products within an environment which do not have patches or controls that mitigate said vulnerabilities?).
- Level 2: Does the vulnerability meet any criteria that may result in the vulnerability being particularly high or critical risk? This involves taking a cursory glance at vulnerability and asset metadata.
- Level 3: Partial risk analysis. Get a better understanding but not necessarily a full risk determination.
- Level 4: Complete risk analysis. Get a complete understanding of risk to the environment.
- Level 5: Complete risk analysis and prioritization. Get not only a complete understanding of the risk to the environment but prioritize how that finding will be addressed in the context of other findings.
Now that we have a high level picture of the vulnerability triage process and some of the ways it can be defined, let’s dive a little deeper into each step…
Having an accurate, comprehensive, up-to-date inventory of all software and hardware in an environment is one of the most important components of Vulnerability Triage. In the absence of a single-source of record or master inventory, you can leverage multiple disparate sources of inventory. Some examples of asset inventory sources are listed below.
- IT Asset Management tools (ITAM)
- Configuration Management Databases (CMDB)
- GRC platforms (e.g. Archer, ServiceNow, Jira SD, etc…)
- Application Lifecycle Management (ALM) tools
- Cloud inventory tools (e.g. AWS Systems Manager, AWS Config, etc…)
- Other (e.g. IPAM, scanning tools, etc…)
Within these inventory sources, or as part of the master asset inventory, there is certain metadata we are interested in for vulnerability triage. Some examples of information elements of interest are listed below. Ultimately, this data is used to answer two essential questions: what is our high-level exposure, and what is the risk of any specific vulnerability as it applies to an affected system?
- Vendor / product / version of software and hardware
- Unique system identifier (e.g. IP, hostname, netbios, etc…)
- Ownership (e.g. business vertical, technical owner, etc…)
- Data classification processed/stored by that system
- Externality (e.g. external, internal, cloud, etc…)
- Scope of affected systems
- System to system relationships/affinities
Having a single master inventory with all of the aforementioned data would certainly make the process of vuln triage much easier. However, this information is not always readily available. In many organizations, there may be reliance on multiple inventory sources that collectively represent the entire environment. Or worse, there may be only a partial inventory or no real inventory at all! With respect to metadata, I suspect it is quite rare to have all the information detailed in the list above. The good news is however, as detailed in the section on triage levels, vulnerability triage does not require everything listed. At a minimum, we need only a decent inventory which includes basic product information ideally mapped to individual asset identifiers. This could at least get us to a level 1 triage. Put differently, if the inventory can tell us that product X exists on systems A, B and C, we are in good shape. With this, you can certainly make basic triage decisions. From there, the more additional information you have, the more detailed your analysis can be (achieving higher level triage) which in turn removes the added overhead required for manual analysis and ultimately yields better prioritization results.
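To make the "product X exists on systems A, B and C" idea concrete, here is a minimal sketch of a product-to-host index built from inventory rows. The `InventoryRecord` fields, the helper names and the sample hosts are all illustrative assumptions, not part of any particular inventory tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryRecord:
    """One row from any inventory source (field names are assumptions)."""
    host: str
    vendor: str
    product: str
    version: str


def build_product_index(records):
    """Map (vendor, product) -> set of hosts that run that product."""
    index = defaultdict(set)
    for rec in records:
        index[(rec.vendor.lower(), rec.product.lower())].add(rec.host)
    return index


def exposed_hosts(index, vendor, product):
    """Level 1 triage: which hosts, if any, run the affected product?"""
    return sorted(index.get((vendor.lower(), product.lower()), set()))


# Hypothetical inventory rows merged from several sources.
records = [
    InventoryRecord("host-a", "Acme", "widgetd", "1.0"),
    InventoryRecord("host-b", "Acme", "widgetd", "2.0"),
    InventoryRecord("host-c", "Initech", "tps", "3.1"),
]
index = build_product_index(records)
print(exposed_hosts(index, "acme", "WidgetD"))
```

Even this bare index is enough to answer the Level 1 question; richer metadata (ownership, data classification, externality) would simply be extra fields on the record.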
Alright! Once we have a solid asset inventory, we need to collect information on known/disclosed vulnerabilities. I refer to this process of collecting vulnerability data and parsing the relevant metadata as Vulnerability Intelligence. There is a plethora of vulnerability data sources we can leverage, both open-source/free and commercial. From these sources, we need to collect certain bits of metadata which help with vuln-to-product correlation as well as risk analysis. Below, I list a number of potential vulnerability data sources as well as some examples of important vulnerability metadata.
- Vulnerability feeds (e.g. NVD, MITRE, Security Tracker, etc…)
- VM vendor feeds (e.g. Qualys, Tenable, Rapid7)
- Security bulletins (e.g. CISA, AWS, Android, Microsoft, Oracle, etc…)
- Exploit databases (e.g. exploit-db, vuldb, SecurityFocus, packet storm, vulners, etc…)
- Social media (e.g. Twitter, etc…)
- RSS (e.g. Feedly, curated research sources, etc…)
- *Threat Intelligence sources
- and more…
*As a side note, I wanted to quickly cover the difference between the concept of “Vulnerability Intelligence” and that of traditional Threat Intelligence (TI) (at least from my point of view). Where I delineate between the two is the idea that threat intel exists only where there are known (active) threats targeting an organization. Vulnerability intelligence, on the other hand, is where you have vulnerabilities which affect systems within an organization’s environment. Together, where you have both a threat and a vulnerability, you have potential risk (the simple formula below represents this calculation). As you can (also) see via the image below, threat intel is typically a subset of vulnerability intel and is much smaller in volume. Finally, where you have known threats targeting vulnerabilities present in your environment, you will likely need to invoke a vulnerability escalation process.
THREAT * VULNERABILITY = RISK
- Affected vendor / product / version
- CVSS Base metrics (e.g. vector, complexity, privileges, user interaction, impact)
- CVSS Temporal metrics (e.g. exploit code maturity, remediation level, report confidence)
- Evidence of active exploitation in the wild
- Dwell-time (how long has the vulnerability been known)
Altogether, there is no shortage of sources to retrieve vulnerability data from and a wealth of relevant metadata to collect from within these sources. In fact, it is best practice when performing vuln triage / risk analysis to reference a multitude of disparate sources to build the most complete picture of the true risk of a vulnerability. The more information you have, the more detailed you can be ( higher vuln triage level ) in that analysis and the higher fidelity your ultimate risk determination will be. With that said, you won’t always have a uniform/standardized view of a vulnerability and will need to make do with what is available. Similar to the inventory step, you need at a minimum the affected product (plus version) as well as SOME manner of vulnerability metadata. The more metadata you have, the more precise you can be in your risk determination.
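As a rough illustration of combining metadata from more than one source, the sketch below defines a normalized vulnerability record and a merge that fills gaps in one source with values from another. The record fields, the `merge` helper and the `EXAMPLE-0001` identifier are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass
class VulnRecord:
    """Normalized vulnerability metadata (field names are assumptions)."""
    vuln_id: str
    vendor: str
    product: str
    affected_versions: List[str]
    published: date
    cvss_base: Optional[float] = None
    exploited_in_wild: bool = False

    def days_since_disclosure(self, today: date) -> int:
        """How long the vulnerability has been publicly known."""
        return (today - self.published).days


def merge(primary: VulnRecord, secondary: VulnRecord) -> VulnRecord:
    """Fill gaps in `primary` using `secondary` for the same vulnerability."""
    if primary.vuln_id != secondary.vuln_id:
        raise ValueError("records describe different vulnerabilities")
    versions = sorted(set(primary.affected_versions) | set(secondary.affected_versions))
    score = primary.cvss_base if primary.cvss_base is not None else secondary.cvss_base
    return replace(
        primary,
        affected_versions=versions,
        cvss_base=score,
        exploited_in_wild=primary.exploited_in_wild or secondary.exploited_in_wild,
    )


# Two hypothetical sources describing the same made-up vulnerability.
feed = VulnRecord("EXAMPLE-0001", "acme", "widgetd", ["1.0"],
                  date(2021, 3, 1), cvss_base=9.8)
bulletin = VulnRecord("EXAMPLE-0001", "acme", "widgetd", ["1.0", "1.1"],
                      date(2021, 3, 1), exploited_in_wild=True)
merged = merge(feed, bulletin)
print(merged.cvss_base, merged.exploited_in_wild, merged.affected_versions)
print(merged.days_since_disclosure(date(2021, 3, 31)))
```

The merge policy here (prefer the first source, take the union of versions) is just one possible choice; a real implementation would need to decide how to handle conflicting values.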
Correlating Vulnerability Intelligence with Asset Inventory
OK, so we have our asset inventory and we have vulnerability intelligence to pair with it. From here we perform simple correlation between the products known to exist in our environment and the known vulnerabilities which affect those products. This rudimentary process is illustrated below.
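In its simplest form, this correlation is a join on vendor, product and version. The sketch below assumes exact string matches on all three fields, which is a deliberate simplification (real product naming and version ranges are messier); all names and data are made up for illustration.

```python
def correlate(vulns, inventory):
    """Return (vuln_id, host) pairs where an inventoried product is affected.

    vulns: list of dicts with 'id', 'vendor', 'product', 'versions' (a set)
    inventory: list of (host, vendor, product, version) tuples
    """
    findings = []
    for vuln in vulns:
        for host, vendor, product, version in inventory:
            same_product = (vendor, product) == (vuln["vendor"], vuln["product"])
            if same_product and version in vuln["versions"]:
                findings.append((vuln["id"], host))
    return findings


# Hypothetical data: one disclosed vulnerability, three inventoried hosts.
vulns = [{"id": "EXAMPLE-0001", "vendor": "acme",
          "product": "widgetd", "versions": {"1.0", "1.1"}}]
inventory = [
    ("host-a", "acme", "widgetd", "1.0"),
    ("host-b", "acme", "widgetd", "2.0"),   # same product, unaffected version
    ("host-c", "initech", "tps", "3.1"),    # different product
]
print(correlate(vulns, inventory))
```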
Typically, this correlation is performed through the process of Vulnerability Scanning. This article doesn’t seek to cover scanning in much depth, but it will be explained with the detail required to understand its function within the vulnerability triage process. In brief, vulnerability scanners are used to systematically detect and classify weaknesses on systems. Scanners perform this task in a variety of ways. By either authenticating directly and pulling a software inventory or by performing anonymous footprinting of a system, a scanner can identify products and product versions across its scanned hosts. It then matches these identified products/versions using its own built-in “plugins”, which correspond to known vulnerabilities that affect the respective products/versions.
So if vulnerability scanners are already doing this correlation, what is the problem?
- Network vulnerability scanning tools rely on plugins provided by the scanner vendor to identify/correlate vulnerabilities. This means that if the vendor does not develop a plugin, a vulnerability may not be identified.
- Plugins from the scanner vendors are not developed and released in real-time. This means there is some dwell-time between when a vulnerability is disclosed and when the vendor has developed a plugin available to identify it in an environment. This dwell-time means manual analysis may need to be performed for vulnerabilities which require immediate attention.
- Scans of an environment are not performed real-time. Therefore, the data you are working with within the scan tool may be outdated when performing vulnerability triage correlation activities.
- Scans are inherently invasive. This means there will be systems that cannot be scanned or do not support scanning activities. In these cases, you will have a blind spot with traditional scan-based vuln triage.
For the vast majority of vulnerabilities, the speed at which findings must be “triaged” or otherwise analyzed for risk is completely satisfied by automated vulnerability scanning. In that world, high-risk findings are expected to be patched within some pre-set SLA timeframe, medium-risk findings have a different SLA and so on… It is in the edge cases (typically potential critical-risk findings) that manual triage is invoked, and in those situations there are improvements to be made.
Take for example a high-profile vulnerability or a zero-day vulnerability that has been announced by CISA in a bulletin. Below are some example steps a security analyst/team might take in triaging this vulnerability.
- CISA announces a vulnerability that exhibits a few high/critical risk characteristics.
- This disclosure is collected via a vulnerability intelligence source (such as Twitter).
- A security analyst (or VM team) takes this disclosure/alert and begins vulnerability triage.
- The security analyst first checks to see what products/versions are affected by the disclosed vulnerability.
- The analyst then reviews known inventory sources (CMDB, scanners, etc..) to determine if the affected products exist within the organization’s environment.
- If the product doesn’t exist in the environment, the issue is closed.
- However if the affected product does exist in the environment, further analysis must be performed.
- The analyst will want to determine whether the vulnerability meets the (or exhibits certain) criteria for a critical (or maybe even high) risk finding.
- If the vulnerability is definitely not high/critical in nature, this often means no further manual triage is necessary. The vulnerability will be addressed via the normal vulnerability management process within the defined SLAs.
- If however, the vulnerability does have certain high/critical-risk criteria, it should be further analyzed to determine technical risk and whether emergency or accelerated actions must be taken.
- The analyst performs a thorough risk analysis of the finding based on any and all vulnerability metadata and metadata about the affected assets.
- Where possible, the analyst will further enrich this risk determination based on known mitigating factors such as technical controls which may further reduce the residual risk.
- Technical risk determination is then coupled with business context to come up with a final risk score.
- Based on this residual risk value, a determination is made on how to prioritize mitigation/remediation/patching/risk treatments.
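The branching in the steps above can be condensed into a small decision function. This is only a sketch of the first few decision points; the function name, the outcome strings and the boolean `meets_critical_criteria` input are assumptions made for illustration.

```python
def triage(affected_products, inventory_products, meets_critical_criteria):
    """Return the next action for a newly disclosed vulnerability.

    affected_products / inventory_products: sets of (vendor, product) pairs
    meets_critical_criteria: analyst's judgment on high/critical indicators
    """
    present = affected_products & inventory_products
    if not present:
        return "close: affected product not in environment"
    if not meets_critical_criteria:
        return "standard: handle via normal VM process and SLAs"
    return "escalate: perform full risk analysis"


# Hypothetical environment containing two products.
inventory_products = {("acme", "widgetd"), ("initech", "tps")}

print(triage({("globex", "portal")}, inventory_products, True))
print(triage({("acme", "widgetd")}, inventory_products, False))
print(triage({("acme", "widgetd")}, inventory_products, True))
```

The remaining steps (risk analysis, enrichment with mitigating controls, business context) only run for the "escalate" outcome, which is exactly where the manual effort concentrates.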
Phew! That is quite a process, right? If used sparingly, it really isn’t that much work. But at scale, performing this series of steps manually can be a time-consuming task. This means that where security staffing is limited and quick decision making is needed, traditional vulnerability triage via scanning and manual analysis is not sufficient. Enter a new method for vuln triage…
Symphonic Vulnerability Surface Mapping
Symphonic Vulnerability Surface Mapping (“SVSM”) is a new approach to vulnerability triage and attack surface mapping. The idea is to ingest vulnerabilities in real-time from a wide variety of sources, correlate the vulnerability metadata (specifically affected product/version) with known inventory (also in real-time) and then (optionally) calculate risk and make prioritization decisions based on a fully-automated (or semi-automated) analysis engine. Let’s talk about how this can be done…
- Identify vulnerability intelligence sources.
- Build individual ingestors to extract normalized vulnerability metadata from different vulnerability data sources.
- Leverage a metadata parsing engine (MPE) (using ML, keywords, etc…) to facilitate extraction of relevant metadata from sources with non-standard formats.
- Develop individual ingestors to populate asset inventory and extract normalized asset metadata from unique inventory sources.
- Perform basic correlation of vulnerability and asset inventory data to determine high-level applicability and exposure.
- Store correlated data in a database.
- *Leverage advanced risk analysis engine (RAE) to perform automated risk analyses at scale.
- *With risk scores in hand, deliver prioritized plan for addressing vulnerabilities.
*The last two steps above (marked with an asterisk) are considered more advanced/higher-order versions of the basic vulnerability triage process.
Ultimately, this process provides real-time feedback on potential exposures, risk calculations related to these findings and context for making treatment decisions. It does this at a speed which cannot be obtained using traditional manual triage and automated scanning processes.
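To give a feel for how the ingestor and correlation steps listed above might fit together, here is a minimal sketch. Both feed formats, the function names and the in-memory `store` list (standing in for a database) are assumptions for illustration; this is not Vulnscape code.

```python
from typing import Dict, Iterable, List

Normalized = Dict[str, object]


def ingest_json_feed(raw: Iterable[dict]) -> List[Normalized]:
    """Hypothetical structured feed: rename fields into the common shape."""
    return [{"id": item["identifier"],
             "vendor": item["vendor"].lower(),
             "product": item["product"].lower(),
             "versions": set(item["versions"])} for item in raw]


def ingest_bulletin_lines(raw: Iterable[str]) -> List[Normalized]:
    """Hypothetical text bulletin: 'ID|vendor|product|v1,v2' per line."""
    out = []
    for line in raw:
        vuln_id, vendor, product, versions = line.strip().split("|")
        out.append({"id": vuln_id, "vendor": vendor.lower(),
                    "product": product.lower(),
                    "versions": set(versions.split(","))})
    return out


def run_pipeline(sources, inventory):
    """Ingest every source, correlate with inventory, and store findings."""
    store = []  # stands in for the database step
    for ingestor, raw in sources:
        for vuln in ingestor(raw):
            for host, vendor, product, version in inventory:
                if ((vendor, product) == (vuln["vendor"], vuln["product"])
                        and version in vuln["versions"]):
                    store.append({"vuln": vuln["id"], "host": host})
    return store


inventory = [("host-a", "acme", "widgetd", "1.0"),
             ("host-b", "acme", "widgetd", "2.0"),
             ("host-c", "initech", "tps", "3.1")]
sources = [
    (ingest_json_feed, [{"identifier": "EXAMPLE-0001", "vendor": "Acme",
                         "product": "WidgetD", "versions": ["1.0"]}]),
    (ingest_bulletin_lines, ["EXAMPLE-0002|Initech|TPS|3.0,3.1"]),
]
findings = run_pipeline(sources, inventory)
print(findings)
```

Keeping each ingestor as a plain function that returns the same normalized shape means new sources can be added without touching the correlation logic.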
Security Control Plane (Advanced/Optional)
The Security Control Plane is a means of providing further enrichment to the risk analysis process. To fully understand the risk of any vulnerability as it applies to an affected system, one must also understand how the security controls in that environment help mitigate potential risks relevant to the vulnerability.
For example, if you have software that prevents execution of non-whitelisted binaries, then vulnerabilities which require execution of an untrusted binary may be rendered completely ineffective.
This understanding of security controls and how they effectively mitigate vulnerabilities can be applied to the risk analysis engine to better enrich residual risk determinations.
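One way to model this, sketched below, is to tag each vulnerability with the preconditions an exploit needs and each control with the preconditions it blocks. The tag names and the `CONTROL_BLOCKS` mapping are hypothetical; a real control plane would need a much richer vocabulary.

```python
# Hypothetical mapping of deployed controls to the exploit preconditions
# they are assumed to block.
CONTROL_BLOCKS = {
    "application_allowlisting": {"untrusted_binary_execution"},
    "network_segmentation": {"direct_network_access"},
}


def blocked_preconditions(deployed_controls):
    """Union of all preconditions blocked by the deployed controls."""
    blocked = set()
    for control in deployed_controls:
        blocked |= CONTROL_BLOCKS.get(control, set())
    return blocked


def is_mitigated(required_preconditions, deployed_controls):
    """An exploit needs every precondition, so blocking any one mitigates it."""
    return bool(set(required_preconditions) & blocked_preconditions(deployed_controls))


controls = ["application_allowlisting"]
print(is_mitigated({"untrusted_binary_execution", "local_access"}, controls))
print(is_mitigated({"direct_network_access"}, controls))
```

A risk analysis engine could use a result like this to lower the residual risk of a finding rather than dropping it entirely, since controls can be misconfigured or bypassed.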
So what makes SVSM different?
Real-time correlation, analysis and prioritization of vulnerabilities as they are disclosed across a multitude of vulnerability intelligence feeds. SVSM takes what has always been a manual or relatively slow process and turns it into something that is real-time, dynamic and fully automated.
What’s the catch?
Why use multiple vulnerability intelligence sources?
No one vulnerability intelligence source has all relevant metadata needed to perform thorough risk analysis of a vulnerability as it applies to an affected system. Often in the process of risk analysis multiple sources are used to ultimately derive the final risk score. By parsing/ingesting data from a variety of sources, we can augment single-source analysis and get the clearest picture of risk.
What if I don’t have a lot of metadata?
No problem! SVSM is more than capable of performing correlation, risk analysis and decision making even with low-fidelity metadata. This flexibility provides the ability to perform everything from simple triage (am I exposed?) all the way to fully automated attack-surface mapping and risk analysis with robust prioritization.
What’s with the name “Symphonic Vulnerability Surface Mapping”?
SVSM is a new take on an age-old process. It utilizes the benefits of automation and orchestration to solve the issues that have always plagued vulnerability triage. SVSM is just my way of marketing this idea. The use of the term “symphonic” is a play on the established concept of “orchestration”.
In the context of vulnerability triage and SVSM, manual risk analysis is the nut we are trying to crack. Performing triage at scale is undoubtedly cumbersome, and risk analysis as a component of that process is certainly one of the worst offenders from an overhead perspective. So how can we automate? First, let’s understand what criteria we are interested in when determining risk and how we use those criteria to calculate risk.
- Vulnerability disclosure date (When was the vulnerability first published?)
- Vulnerability dwell-time (The length of time a vulnerability has been present on a system)
- Patch publish date (When, if applicable, was the patch itself published?)
- Does the vulnerability affect business-critical systems?
- Does the vulnerability affect systems which store/process sensitive data?
- System type (e.g. database, server, network device, workstation, etc…)
- Scope (i.e. limited vs. widespread)
- Externality (e.g. internal, external, segmented, etc…)
- Mitigating Controls ( Security Control Plane )
- CVSS Base score (vector, complexity, privileges required, user interaction)
- CVSS Temporal score (exploit code availability, patch availability, confidence level)
So how is risk typically calculated in practice? A simple risk matrix as shown below is an easy way to qualitatively derive a risk determination. However, this matrix only considers likelihood (probability) and impact in a vacuum. What it does not take into account is business context. It is recommended to also understand the business context of a system when determining a final risk score.
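As a sketch of a qualitative matrix lookup with a business-context adjustment, consider the following. The 3x3 matrix values and the rule of raising the rating one level for business-critical systems are assumptions made for this example, not a standard.

```python
RATINGS = ["low", "medium", "high", "critical"]

# Assumed 3x3 matrix keyed by (likelihood, impact).
MATRIX = {
    ("low", "low"): "low",
    ("low", "medium"): "low",
    ("low", "high"): "medium",
    ("medium", "low"): "low",
    ("medium", "medium"): "medium",
    ("medium", "high"): "high",
    ("high", "low"): "medium",
    ("high", "medium"): "high",
    ("high", "high"): "critical",
}


def final_risk(likelihood, impact, business_critical=False):
    """Look up technical risk, then adjust for business context."""
    rating = MATRIX[(likelihood, impact)]
    if business_critical:
        # Assumed rule: raise the rating one level, capped at 'critical'.
        bumped = min(RATINGS.index(rating) + 1, len(RATINGS) - 1)
        rating = RATINGS[bumped]
    return rating


print(final_risk("medium", "high"))
print(final_risk("medium", "high", business_critical=True))
```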
As previously mentioned, not every vulnerability is worthy of manual triage. The overwhelming majority of vulnerabilities are expected to be addressed as a result of routine patching and standard prioritization sourced from typical vulnerability scanning activities. To determine which vulnerabilities ultimately require manual analysis, we use an escalation process flow coupled with a number of defined escalation criteria. This flow as well as the criteria are provided in more detail below.
- Named/publicized “designer” vulnerabilities
- Vulnerabilities that are being targeted by threat groups in an active campaign
- Critical-severity vulnerabilities that affect external-facing or sensitive assets
- Vulnerabilities that affect a wide scope of systems
- Vulnerabilities affecting business-critical systems
Vulnerabilities which have one or more of these characteristics are often candidates for further analysis to determine if they require accelerated treatment. The vulnerability escalation process flow depicted below helps further illustrate this concept.
Vulnerability Escalation Process Flow
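The escalation criteria listed above can be expressed as a simple check that returns the reasons a finding qualifies for manual analysis. The dictionary keys and the `wide_scope_threshold` default are hypothetical and would need tuning for a real environment.

```python
def escalation_reasons(finding, wide_scope_threshold=100):
    """Return the escalation criteria a finding meets (empty list = none)."""
    reasons = []
    if finding.get("named"):
        reasons.append("named/publicized vulnerability")
    if finding.get("active_campaign"):
        reasons.append("targeted in an active campaign")
    if finding.get("severity") == "critical" and (
            finding.get("external_facing") or finding.get("sensitive_asset")):
        reasons.append("critical severity on external-facing or sensitive asset")
    if finding.get("affected_hosts", 0) >= wide_scope_threshold:
        reasons.append("wide scope of affected systems")
    if finding.get("business_critical"):
        reasons.append("affects business-critical systems")
    return reasons


# Hypothetical findings.
routine = {"severity": "medium", "affected_hosts": 3}
urgent = {"severity": "critical", "external_facing": True,
          "active_campaign": True, "affected_hosts": 250}

print(escalation_reasons(routine))
print(escalation_reasons(urgent))
```

Returning the list of reasons, rather than a bare yes/no, gives the analyst context on why a finding was escalated.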
Presumably, if risk analysis is thorough, prioritization is mostly a question of fixing the highest risk things first and then moving down the list. In reality however, there are a few additional factors that could further influence how vulnerabilities are ultimately prioritized post-analysis.
- Level-of-effort (LoE) to patch
- Is there a patch, workaround or mitigating control available to further mitigate risk?
- Can applying a single fix remediate multiple vulnerabilities (or entire classes of vulnerabilities) at once? For example, one fix might apply to a large number of medium-risk findings and, if rolled out at scale, reduce more risk than a single fix for a single high-risk finding.
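One way to account for these factors, sketched below, is to rank fixes by the total risk they remove per unit of effort. The numeric weights per rating and the effort estimates are arbitrary assumptions chosen only to illustrate the idea.

```python
# Assumed numeric weights for each qualitative rating.
WEIGHTS = {"low": 1, "medium": 3, "high": 7, "critical": 10}


def risk_reduced(fix):
    """Total weighted risk removed by applying one fix."""
    return sum(WEIGHTS[rating] for rating in fix["resolves"])


def prioritize(fixes):
    """Order fixes by risk removed per hour of effort, highest first."""
    return sorted(fixes,
                  key=lambda fix: risk_reduced(fix) / fix["effort_hours"],
                  reverse=True)


# Hypothetical fixes: one shared upgrade vs. one patch for a single finding.
fixes = [
    {"name": "single-host-patch", "resolves": ["high"], "effort_hours": 2},
    {"name": "shared-library-upgrade", "resolves": ["medium"] * 10, "effort_hours": 4},
]
ranked = [fix["name"] for fix in prioritize(fixes)]
print(ranked)
```

With these made-up numbers the shared upgrade scores 30 / 4 = 7.5 against 7 / 2 = 3.5 for the single patch, so it is ranked first even though each of its findings is only medium risk.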
Though not really in scope for vulnerability triage, I wanted to at least mention the final step, Vulnerability Treatment, as it is crucial to the overall process of vulnerability management. It is within this step that vulnerabilities are reported, patched, resolved, mitigated, or otherwise addressed. What could be more important!
SVSM as a concept is being brought to life through a new open-source tool dubbed Vulnscape! This tool is in very early stages, but over time, the goal is to develop the following as modular components…
- Vulnerability ingestors for the wide variety of potential vulnerability intelligence sources
- Asset inventory ingestors for the wide variety of enterprise asset inventory sources
- A Metadata Parsing Engine (MPE) that will be used to extract relevant vulnerability metadata from non-standard vulnerability data sources
- An automated (or semi-automated) Risk Analysis Engine (RAE) capable of risk-based decision making at scale
- Prioritization features
With version 1.0, I aim to bring a limited set of inventory/vulnerability ingestors as well as a basic correlation capability (for high-level exposure notification). Stay tuned!
SVSM and Vulnscape have applications that I think extend beyond just simple-to-advanced vulnerability triage. I see applications/integration opportunities in other domains as well. For example, it could be used in penetration testing activities related to “exploit suggesters”. Imagine hooking an SVSM tool like Vulnscape up to an exploit framework solution like Metasploit. Using this, you could more accurately target endpoints with exploits most likely to be successful. This is but one example of how Vulnscape could be applied beyond just vulnerability triage!