Christopher Gates, Director of Product Security, Velentium
01.31.24
In the medical device industry, vulnerability scoring is the process of assessing and quantifying the severity of cybersecurity vulnerabilities in medical device software, hardware, and systems. The goal is to prioritize the mitigation of known vulnerabilities based on each vulnerability’s impact and exploitability. Sounds simple enough, right? Unfortunately, that isn’t the case. Let's dive a little deeper into this process.
Defining the Terms
First, several definitions are required before proceeding any further.

- Author—The person performing the scoring
- Consumer—The person reading and using the results of scoring (can be the same as the author)
- Vulnerability—A weakness in the system design, implementation, or third-party software components that can be exploited
- Threat—An activity, with the potential for causing harm to a medical device via a vulnerability
(The last two definitions are cleaned-up versions of entries taken from the NIST CSRC Glossary.)

The cybersecurity industry frequently uses the terms “vulnerability” and “threat” as if they were interchangeable, which they are not. This leads to confusion when identifying tools and techniques.

This conflation of terms started with end users who were experiencing a threat (e.g., ransomware) but instead focused on the vulnerability (e.g., a remote code execution capability due to hard-coded credentials).
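To make the distinction concrete, here is a minimal sketch of the two terms as separate objects. All class names, fields, and the example identifiers are illustrative assumptions, not from any standard or tool:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vulnerability:
    """A weakness in design, implementation, or a third-party component."""
    identifier: str   # e.g., a CVE ID or an internal tracking number
    description: str
    origin: str       # "design", "implementation", or "third-party"

@dataclass(frozen=True)
class Threat:
    """An activity with the potential to cause harm via a vulnerability."""
    name: str                # e.g., "ransomware"
    exploits: Vulnerability  # the underlying weakness the activity leverages

weakness = Vulnerability(
    identifier="INT-0042",
    description="Hard-coded credentials allow remote code execution",
    origin="implementation",
)
attack = Threat(name="ransomware", exploits=weakness)

# The threat and the vulnerability are distinct things: mitigating the
# vulnerability removes this avenue, but the threat activity still exists.
print(attack.name, "->", attack.exploits.identifier)
```

Keeping the two as separate objects makes it harder to accidentally score the threat when you meant to score the weakness, and vice versa.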
Even the most popular vulnerability scoring rubric—the Common Vulnerability Scoring System (CVSS)—was developed by an organization named FIRST (Forum of Incident Response and Security Teams), which provides a pretty good clue as to their orientation (i.e., the end-user perspective).
Use Cases
Vulnerability scoring needs to be utilized across the total product lifecycle of a medical device. Unfortunately, most of these life phases have radically different needs for the output of a scoring rubric. For example, the system engineer working during the concept/design phase of a device needs to score the results of threat modeling of the assets in the system. Such a threat model would result in a list of potential vulnerabilities that would then need to be scored in light of the intended use cases. This type of rubric could consist of just the severity and the exploitability (ease and scope) of a design vulnerability (e.g., no authentication being performed between connected devices in the system). Similarly, a development engineer may create an implementation vulnerability (e.g., a buffer overflow), which could be scored using just severity and exploitability.

Now, consider the situation where a third-party software component is being used in your medical device. In this case, there are many more factors to consider, such as whether the vulnerable code is actually present in your device or if the use of meta symbols and conditional compile statements in the library removed the affected software from the executable image during compilation. Further, don’t forget the sustaining cybersecurity engineer who will be scoring discovered (internally or externally) vulnerabilities.
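A design-phase rubric of this kind can be sketched in a few lines. The 1–5 scales, the multiplication rule, and the 0–10 output range below are illustrative assumptions, not a published rubric:

```python
def design_score(severity: int, exploitability: int) -> float:
    """Combine two 1-5 ordinal ratings into a single 0-10 priority score.

    severity:       clinical/functional impact if the weakness is exploited
    exploitability: ease and scope of exploitation (no environment factors yet)
    """
    if not (1 <= severity <= 5 and 1 <= exploitability <= 5):
        raise ValueError("both ratings must be on a 1-5 scale")
    # Scale the 1-25 product onto a familiar 0-10 range.
    return round(severity * exploitability * 10 / 25, 1)

# No authentication between connected devices: high severity, easy to exploit.
print(design_score(severity=5, exploitability=4))  # 8.0
```

Note that nothing in this sketch knows anything about the deployment environment; that is exactly the gap the hospital-side consumers described below run into.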
On the other hand, consider the viewpoint of a security engineer in a hospital reviewing known vulnerabilities before they purchase a new medical device. Suddenly, the environment (of which the development engineer was blissfully unaware) could play a major factor in determining the exploitability of this device within their organization. In addition, in the case of a hospital chief information security officer (CISO) reviewing the risks to patient safety being created by the cybersecurity posture of specific medical devices, how is it possible to perform the assessment? No one until now was even considering patient safety in their scoring.
What we are experiencing here is that the author of a vulnerability score may have more (e.g., in the case of the developers) or less (e.g., in the case of the hospital CISO) detailed information available for scoring, while at the same time every consumer has a different perspective on, and different needs from, a vulnerability score.
Further, we have to realize that early in the development lifecycle, we are scoring vulnerabilities. By the time third-party software components are being scored, however, and especially during sustaining engineering, we are scoring threats and their underlying vulnerabilities. At this phase, we can start to consider including likelihood and probability in the scoring rubric.
These concepts interact as illustrated in the following example. At design time, a design vulnerability is created, which allows access to the lowest levels of a medical device via a communications medium. This occurred because access to this communications medium did not require authentication. At the time of design, all we can say is we have a design vulnerability (which should be fixed). Once the vulnerability is in the field, it could be exploited in numerous ways, including by threats (e.g., ransomware, worms, botnets, etc.) and used for various purposes (e.g., extraction of intellectual property, extraction of PHI, compromise of design performance, etc.).
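Once the device is fielded, likelihood can be folded into the score. Again, the scales and the simple multiplicative combination below are illustrative assumptions, not a published rubric:

```python
def sustaining_score(severity: int, exploitability: int, likelihood: float) -> float:
    """Score a fielded threat: a 0-10 base rating weighted by likelihood.

    likelihood is a 0.0-1.0 estimate that the threat activity (e.g.,
    ransomware reaching this device over the unauthenticated channel)
    actually occurs in the consumer's environment.
    """
    if not (1 <= severity <= 5 and 1 <= exploitability <= 5):
        raise ValueError("severity and exploitability must be on a 1-5 scale")
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be between 0 and 1")
    base = severity * exploitability * 10 / 25  # 0-10 severity/exploitability base
    return round(base * likelihood, 1)

# The unauthenticated channel from the example above, weighted by an
# estimate of how often ransomware actually reaches such devices.
print(sustaining_score(severity=5, exploitability=4, likelihood=0.3))  # 2.4
```

The likelihood estimate is exactly the input a manufacturer cannot supply on the hospital's behalf, which is why the same vulnerability can legitimately score differently for different consumers.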
Ultimately, there is no one scoring rubric to serve all of the many consumers’ requirements. This has led to many people complaining about any given scoring rubric, and they are all correct, as they each have different perspectives on the requirements for the resultant score. It has also resulted in a wide selection of possible scoring rubrics, including:
- FDA Premarket Guidance (Exploitability over Severity)
- ISO 14971/24971 (Exploitability over Severity)
- NIST’s Risk Determination (Likelihood over Impact)
- NIST SP 800-221 (For use in the boardroom)
- NIST’s CMSS (a variant of CVSS)
- NIST’s CCSS (a variant of CVSS)
- EPSS, Exploit Prediction Scoring System (a likelihood-based rubric)
- DREAD
- CVSS v2 [includes Collateral Damage (i.e., Severity)]
- CVSS v3 [removed Collateral Damage (i.e., Severity)]
- CVSS v3.1
- CVSS v4.0 (numerous improvements, including Collateral Damage restored with a new name)
- MITRE’s Medical Device Rubric (a more robust variant of CVSS)
- IVSS (an industrial variant of CVSS)
- OWASP Risk Rating
- Billy Rios & DHS’ RSS-MD (a medical device variant of CVSS)
- PVSS/EPSS (a probability variant of CVSS)
- No Dirt
- HVSS, Healthcare Vulnerability Scoring System (a variant of CVSS v3.1)
Despite their variety, these rubrics reduce to one of three basic structures:

1. Multiple subjective factors/attributes, all equally reflected in the final score
2. Multiple subjective factors/attributes, with each attribute being conditioned by a weighting factor before being reduced to a final score
3. Either 1 or 2 with the inclusion of “likelihood”
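The second structure, weighted attributes reduced to one score, can be sketched as follows. The attribute names, weights, and scaling are invented for illustration and do not correspond to any of the rubrics listed above:

```python
# Structure 2: each subjective 1-5 attribute rating is conditioned by a
# weighting factor before being reduced to a single score.
# Attribute names and weights are illustrative assumptions.
WEIGHTS = {
    "severity": 0.4,
    "exploitability": 0.3,
    "patient_safety_impact": 0.2,
    "detectability": 0.1,
}

def weighted_score(ratings: dict) -> float:
    """Reduce 1-5 attribute ratings to one score via fixed weights."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError(f"expected ratings for {sorted(WEIGHTS)}")
    # Weights sum to 1.0, so the weighted sum lands on a 1-5 scale;
    # doubling maps it onto a familiar 0-10-style range.
    total = sum(WEIGHTS[name] * value for name, value in ratings.items())
    return round(total * 2, 1)

print(weighted_score({
    "severity": 5,
    "exploitability": 4,
    "patient_safety_impact": 5,
    "detectability": 2,
}))  # 8.8
```

Structure 1 is the special case where every weight is equal; structure 3 multiplies either result by a likelihood estimate.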
Selecting the Rubric
So which of these rubrics should you be using? By now, you should understand the answer depends upon the perspective and lifecycle phase of the consumer, so there is no single universal answer. That said, consider severity over exploitability for design and implementation vulnerabilities. Once those phases have passed, you may want to consider CVSS v4.0 or EPSS.

In addition, don’t overly fixate on the exact results of scoring, such as ignoring a vulnerability with a CVSS score of 3.9 while mitigating a vulnerability with a CVSS score of 4.0. At best, these rubrics create an approximation of the severity/exploitability/risk of a given vulnerability. Keep that in mind when working with any scoring rubric. Indeed, a scoring value created by a manufacturer is not going to meet the needs of a hospital, since, at a minimum, the manufacturer has no idea about the hospital environment. As such, be very leery of scoring results created by other people; most likely, they don’t have the detailed knowledge or perspective you need.
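One practical way to avoid the 3.9-versus-4.0 trap is to stop comparing scores against a hard cutoff and instead treat scores that differ by less than the rubric's real precision as equivalent. The 0.5 tolerance below is an illustrative choice, not a published recommendation:

```python
def same_priority(score_a: float, score_b: float, tolerance: float = 0.5) -> bool:
    """Treat two scores as equivalent when they differ by less than the
    rubric's real precision.

    Scoring rubrics are approximations: a 3.9 and a 4.0 do not describe
    meaningfully different vulnerabilities, so they should be triaged
    together rather than split by a hard threshold.
    """
    return abs(score_a - score_b) <= tolerance

# A hard cutoff at 4.0 would mitigate one and ignore the other:
print(same_priority(3.9, 4.0))  # True  -> triage these together
print(same_priority(2.0, 7.5))  # False -> genuinely different priorities
```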
Conclusion
I would like the reader to consider a world in which we do not score our vulnerabilities. (Blasphemer! How dare you not use a metric in your processes.) We should realize the only reason we are using metrics is to prioritize implementing mitigations—a priority that also includes doing nothing at all.

Remember, this entire process is based on a score derived from only a handful of attributes. Is that really enough? Did we include impacts to the device’s operational timing? (No, you didn’t!) Is this scoring approach really effective or are we just trying to avoid doing the hard work of creating mitigations for every discovered vulnerability?
Christopher Gates is the director of Product Security at Velentium. He has more than 50 years of experience developing and securing medical devices and works with numerous industry-leading device manufacturers. He frequently collaborates with regulatory and standard bodies, including the CSIA, Health Sector Coordinating Council, H-ISAC, Bluetooth SIG, and FDA to present, define, and codify tools, techniques, and processes that enable the creation of secure medical devices.