The widespread CrowdStrike incident caused a major diversion of resources, with some hard-hit organizations assigning almost all of their IT and security personnel to damage control. If you are the CISO of an impacted organization, you will likely be asked to account for its lack of resilience to this type of event.
To support your decision-making as you reevaluate your resilience budgets, this post outlines four resilience strategies based on key learnings from the CrowdStrike event.
1. Diversify your tech (and security) stack
A key objective in preventing future disruptions similar to the CrowdStrike incident is minimizing risk concentrations in your IT ecosystem. You can achieve this by architecting greater diversity into the layers of your production systems and technology stacks. The goal is that any software agent, component, or IT subsystem capable of causing disruption through a faulty update fails safely, without disabling all viable service capacity.
Diversifying your tech stack through policy changes or architectural reforms also disrupts cyberattack pathways, giving your cybersecurity program an additional layer of data breach protection.
One strategy for achieving graceful system degradation rather than sudden catastrophic failure is to implement separate security stacks on different portions of your total workload capacity.
For example, you could structure your infrastructure so that your web servers and database servers are each protected by their own distinct set of security controls. This way, if a faulty security update disrupts your web server operations, the controls on your database servers continue to operate as normal, and your overall system functionality no longer hinges on a single point of failure.
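To make the graceful-degradation idea concrete, here is a minimal Python sketch that models workloads protected by different security stacks and computes how much service capacity survives when one stack is knocked out by a faulty update. The `Workload` class, the `surviving_capacity` function, the stack labels, and the capacity figures are hypothetical illustrations, not a real tool or measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    capacity: int        # hypothetical units of service capacity
    security_stack: str  # hypothetical label for the security vendor/agent protecting it


def surviving_capacity(workloads, failed_stack):
    """Fraction of total capacity still available if every workload
    running `failed_stack` is disabled by a faulty update."""
    total = sum(w.capacity for w in workloads)
    alive = sum(w.capacity for w in workloads if w.security_stack != failed_stack)
    return alive / total if total else 0.0


# Single stack: one faulty update disables web and database tiers together.
single = [
    Workload("web", 60, "stack_a"),
    Workload("db", 40, "stack_a"),
]

# Diversified: web and database tiers are protected by different stacks.
diversified = [
    Workload("web", 60, "stack_a"),
    Workload("db", 40, "stack_b"),
]

print(surviving_capacity(single, "stack_a"))       # 0.0
print(surviving_capacity(diversified, "stack_a"))  # 0.4
```

Running the same calculation for each stack in turn shows the benefit: in the diversified layout, no single faulty update takes total capacity to zero.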
The downside of this approach is that it can increase risk management complexity as well as environmental and operational risk exposure. However, in high-maturity environments (such as those using Configuration-as-Code, Infrastructure-as-Code, and formal IT change management), the additional exposure is smaller, making this an attractive option for dispersing risk concentrations.
If you decide to diversify your security stack, keep the following implications in mind:
- Be prepared for increased costs due to managing more vendors, purchasing additional licenses, and developing the necessary internal or external capabilities to design, implement, and maintain these new security measures.
- Every third-party component added to your security stack will expand your attack surface. However, this slight expansion may be necessary to reduce your overall risk exposure.
2. Comprehensively test and analyze the impact of security software components
The CrowdStrike incident demonstrated that even cybersecurity software, which has a reputation for being among the most hardened and resilient types of software, is susceptible to operational failures.
Addressing this underserved risk category requires adjusting your risk management lens to treat all security software components, especially those with a high potential to disrupt critical production workloads, with the same degree of scrutiny as operating system and general application updates.
This mindset shift requires assessing all current security components for their potential to immediately disable or significantly disrupt systems. Apply these impact tests across a broad range of environments, including server workloads, which handle backend processes, and End-User Computing (EUC) environments, which directly affect user productivity.
Share the findings of your impact analysis with relevant stakeholders, and use their feedback to refine your testing processes and mitigate any identified risks before new security software components are deployed to production.
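One way to operationalize this pre-production impact testing is a ring-based promotion gate: an update is first applied to representative server and EUC test rings and is only promoted to production if every ring stays healthy. The sketch below assumes you can count healthy hosts per ring after an update; `RingResult`, `promotion_decision`, the ring names, and the 99% threshold are hypothetical choices, not a vendor feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingResult:
    ring: str           # hypothetical ring name, e.g. "server-test" or "euc-pilot"
    hosts_tested: int   # hosts that received the update in this ring
    hosts_healthy: int  # hosts that booted and ran workloads normally afterwards


def promotion_decision(results, min_healthy_ratio=0.99):
    """Return (approved, failing_rings).

    The update is approved for production only if every test ring meets
    the health threshold. A ring with no tested hosts counts as failing,
    and an empty result set is never approved, because neither provides
    any evidence that the update is safe.
    """
    if not results:
        return (False, [])
    failing = [
        r.ring
        for r in results
        if r.hosts_tested == 0 or r.hosts_healthy / r.hosts_tested < min_healthy_ratio
    ]
    return (not failing, failing)


results = [
    RingResult("server-test", hosts_tested=50, hosts_healthy=50),
    RingResult("euc-pilot", hosts_tested=200, hosts_healthy=180),
]
approved, failing = promotion_decision(results)
print(approved, failing)  # False ['euc-pilot']
```

The list of failing rings is what you would share with stakeholders, since it points directly at the environment (server or EUC) where the update caused problems.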
Don't limit your scope to just security vendors.
Use this opportunity to re-evaluate your current Vendor Risk Management (VRM) platform and how effectively it mitigates third-party cyber risk across your entire vendor ecosystem. After all, you're much more likely to experience a critical disruption from a third-party data breach than from another faulty security software update.
To encourage agile threat response while minimizing risk exposure, your VRM tool should include integrated workflows covering the entire third-party risk management (TPRM) lifecycle and use automation to manage vendor risk assessments at scale.
To extend the goal of dispersing risk concentrations to your vendor ecosystem, your VRM tool should also adapt quickly to new, unexpected supply chain threats, such as the CrowdStrike incident, which sent shockwaves through third-party vendors globally.
3. Adopt a balanced approach to software update management
A more cost-effective way to mitigate disruptions from faulty third-party updates is to slow down how quickly updates are pushed to critical production workloads and environments. This first requires categorizing software components into three risk tiers based on their potential for disruption if an update is delayed.
- Tier 3 - Low Disruption Risk: Components whose delayed updates are unlikely to interfere with critical system operations, such as OS kernel operations, TCP/IP, and other higher network layer driver components. Updates to components in this tier can usually be delayed with little risk of disruption.
- Tier 2 - High Disruption Risk: These components present a higher disruption risk if their updates are delayed.
- Tier 1 - Critical Security Updates: These components protect your environments against immediate threats, such as zero-day exploits, and therefore must accept all new updates immediately, despite the potential disruption risk.
Most of your components will likely fall into Tier 2, which undermines this strategy. To achieve a more useful distribution, assess whether delaying Tier 2 updates by four, eight, or twenty-four hours would actually increase security or productivity risks; components that tolerate such delays can be reassigned to Tier 3.
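The Tier 2 reassessment can be sketched as a small decision function. It assumes, hypothetically, that your team has recorded for each candidate delay (4, 8, and 24 hours) whether that delay would increase security or productivity risk; the `Tier` enum and `reassess_tier2` function are illustrative names only, not part of any product.

```python
from enum import IntEnum


class Tier(IntEnum):
    CRITICAL_SECURITY = 1  # take updates immediately (e.g. zero-day fixes)
    HIGH_DISRUPTION = 2    # delaying updates is considered risky
    LOW_DISRUPTION = 3     # updates can usually be delayed safely


# Candidate delay windows from the text, in hours.
CANDIDATE_DELAYS_HOURS = (4, 8, 24)


def reassess_tier2(delay_increases_risk):
    """Reassess a Tier 2 component.

    `delay_increases_risk` maps a delay in hours to True if your assessment
    found that delaying updates that long would increase security or
    productivity risk (hypothetical input).
    Returns (tier, max_safe_delay_hours).
    """
    max_safe = 0
    for hours in sorted(CANDIDATE_DELAYS_HOURS):
        # Unassessed windows are treated conservatively as risky.
        if delay_increases_risk.get(hours, True):
            break
        max_safe = hours
    tier = Tier.LOW_DISRUPTION if max_safe else Tier.HIGH_DISRUPTION
    return tier, max_safe


tier, window = reassess_tier2({4: False, 8: False, 24: True})
print(tier.name, window)  # LOW_DISRUPTION 8
```

Delays are checked shortest first and the search stops at the first risky window, on the assumption that risk only grows with longer delays. A missing assessment counts as risky, so nothing moves to Tier 3 by default.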
This tiering strategy could be applied to a Vendor Risk Management program to help security teams understand how comprehensive each third-party service's risk assessment needs to be.
For example, Tier 1 vendors would require the most comprehensive level of risk assessment to evaluate their disruption risk exposure stemming from security vulnerabilities and missed software updates.
The UpGuard platform offers a customizable vendor tiering feature that can be adapted to your specific tiering strategy.
Critical software components that tolerate update delays can also be protected with a buffer period, during which you observe the impact of a new update on other organizations before deciding whether it is safe to onboard. However, this method is difficult to execute because it requires an accurate estimate of how long to wait before a new update is approved, and that waiting period is exactly when you are most exposed to attackers looking to exploit your unpatched state.
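A buffer-period gate could look like the sketch below. It assumes you track each update's release time and a count of failures reported by other organizations (for example, from vendor advisories or peer channels); the `ready_to_onboard` function, its parameters, and the 24-hour buffer are hypothetical. Tier 1 updates bypass the buffer, consistent with the tier definitions above.

```python
from datetime import datetime, timedelta, timezone


def ready_to_onboard(released_at, now, buffer, reported_failures, tier):
    """Hypothetical buffer-period gate for a software update.

    - Tier 1 (critical security) updates are onboarded immediately.
    - Any other update is held while other organizations report failures,
      and otherwise until `buffer` has elapsed since release.
    """
    if tier == 1:
        return True
    if reported_failures > 0:
        return False
    return now - released_at >= buffer


release = datetime(2025, 1, 1, tzinfo=timezone.utc)  # hypothetical release time
buffer = timedelta(hours=24)
print(ready_to_onboard(release, release + timedelta(hours=6), buffer, 0, tier=2))   # False
print(ready_to_onboard(release, release + timedelta(hours=30), buffer, 0, tier=2))  # True
```

Choosing `buffer` is the hard part: a longer buffer gives other organizations more time to surface a faulty update, but it also lengthens the window in which a known vulnerability stays unpatched in your environment.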
4. Recalibrate your staffing-to-MSP ratio
The CrowdStrike incident highlighted the limited capacity of managed service providers (MSPs) to handle large-scale disruptions. This limitation was exacerbated during the event because most customers of impacted MSPs likely depended on Windows EUC and server environments, all needing help at the same time, while the MSPs themselves tune their capacity tightly to maximize profit margins.
But even without the CrowdStrike incident, MSPs pose inherent disruption risks because they have limited resources to flex during acute demand spikes. To reduce your exposure to this risk, evaluate your ratio of insourced staff to MSP resources.
Although more costly, a more balanced internal-staff-to-MSP ratio will mitigate the concentrated risk of overreliance on MSPs.
Aim to have sufficient in-house resources available for critical incident recovery. Your internal resources should include experts capable of accurately interpreting your state of damage and overseeing full system recovery with targeted and efficient remediation efforts.
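A back-of-the-envelope calculation shows why the ratio matters. The sketch assumes remediation work is split evenly across responders working in parallel; the `recovery_hours` function, host counts, per-host effort, and staffing numbers are hypothetical, not figures from the CrowdStrike event.

```python
import math


def recovery_hours(hosts, minutes_per_host, internal_staff, msp_staff_available):
    """Rough wall-clock hours to remediate `hosts` machines, assuming the
    work is split evenly across all responders working in parallel."""
    responders = internal_staff + msp_staff_available
    if responders == 0:
        return math.inf  # nobody available: recovery never completes
    return hosts * minutes_per_host / responders / 60


# Hypothetical scenario: 2,000 affected hosts needing ~15 minutes of hands-on
# work each, with the MSP able to spare only 5 engineers during a widespread event.
print(recovery_hours(2000, 15, internal_staff=5, msp_staff_available=5))   # 50.0
print(recovery_hours(2000, 15, internal_staff=15, msp_staff_available=5))  # 25.0
```

In this illustration, tripling internal staff halves recovery time, because what counts during a widespread event is the MSP's spare surge capacity, not its contracted headcount.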
Mitigate third-party vendor disruptions from the CrowdStrike incident with UpGuard
UpGuard can help you quickly identify and manage your risk exposure through third- and even fourth-party vendors impacted by the CrowdStrike incident.
For critical vendors where you need more information about their level of exposure, UpGuard offers a new dedicated CrowdStrike Incident Questionnaire. All vendor interactions are stored in a centralized location to streamline team collaboration and support audit tracking if evidence is required in the future.
To help you stay proactive in your risk management efforts, UpGuard's News section offers a comprehensive view of all potentially impacted entities in your vendor ecosystem.
When it's time to inform your board about the results of your risk management efforts, UpGuard can generate one-click reports providing a concise overview of the CrowdStrike incident's impact on your business. These reports are designed to be easy to understand regardless of the reader's technical background, so strategic decisions can be made without delay.