The 2024 CrowdStrike incident caused blue screens of death (BSOD) on Microsoft Windows devices worldwide, severely disrupting operations across essential industry sectors.
While this incident may have seemed to come out of nowhere, third-party incidents are becoming more common and more damaging as businesses increase their reliance on external vendors, products, and services. That reliance now runs deep enough that a single faulty software update can cause one of the most severe IT disruptions in history.
Even more alarming, IT disruptions are not the only substantial threat organizations face at the hands of their third-party ecosystems. Recent studies suggest that nearly 30% of all data breaches stem from a third-party attack vector, costing organizations an average of $4.88 million. Despite this, 54% of businesses admit they don’t vet their third-party vendors adequately before onboarding them into their internal systems.
Now that the dust has settled and the consequences of improper third-party risk management are at the forefront of conversations surrounding operational resilience, many chief information security officers (CISOs) are searching for ways to prevent future third-party disruptions from devastating their IT systems and impacting their business continuity.
This blog explores several strategies CISOs can employ to increase their IT resilience and mitigate third-party risks before they result in operational disruptions or other severe consequences.
Key Strategies for CISOs to Prevent Future Disruptions
To prevent CrowdStrike-type incidents in the future, or at least significantly decrease their impact, CISOs need to adopt comprehensive strategies that reduce third-party risk and increase the resilience of their IT systems. Each of the following strategies addresses a different part of that goal:
Develop a vigilant third-party risk management program
While even the most prepared third-party risk management (TPRM) program wouldn’t have prevented the faulty CrowdStrike update from happening, it would have enabled an organization to better understand which of its vendors were affected. By quickly identifying which vendors were impacted by the CrowdStrike outage, an organization could have pursued mitigation as efficiently as possible, limiting how long an out-of-service vendor disrupted its operations.
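As a rough illustration of that lookup, the sketch below assumes you already keep an inventory of which third-party products each vendor depends on. The vendor names, the inventory contents, and the `vendors_affected_by` helper are all hypothetical; a real TPRM platform would hold this data for you.

```python
# Hypothetical inventory: each vendor mapped to the third-party products it relies on.
VENDOR_DEPENDENCIES = {
    "PayrollCo": {"CrowdStrike Falcon", "AWS"},
    "HelpdeskInc": {"Zendesk", "Azure"},
    "LogisticsLtd": {"CrowdStrike Falcon", "Azure"},
}


def vendors_affected_by(product: str, inventory: dict[str, set[str]]) -> list[str]:
    """Return the vendors whose recorded dependencies include the given product."""
    return sorted(name for name, deps in inventory.items() if product in deps)


if __name__ == "__main__":
    for vendor in vendors_affected_by("CrowdStrike Falcon", VENDOR_DEPENDENCIES):
        print(f"Contact {vendor} about outage status and workarounds")
```

The lookup is only as good as the inventory behind it, which is why vendor assessments and questionnaires that record these dependencies matter before an incident occurs.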
Also, the next third-party incident your organization faces may not be a software outage. It could be a cyber attack or data breach. By deploying critical TPRM tools and strategies, your organization can better protect itself from the potential risks present across your third-party attack surface.
The most effective TPRM programs include the following components:
- Vendor Risk Assessments
- Vendor Security Questionnaires
- Continuous security monitoring
- Detailed reports and dashboards
Establishing a program with these components will empower your organization to swiftly identify, mitigate, and remediate third-party risks before they cause damage, and to respond faster when unavoidable incidents occur.
Automated TPRM solutions, like UpGuard Vendor Risk, also enable organizations to improve their operational resilience and risk management without excessive manual effort. Compared to traditional risk management workflows, Vendor Risk empowers security teams to conduct comprehensive risk assessments in half the time.
To learn more about how UpGuard can help your organization, book your FREE demo today.
Establish comprehensive update management procedures
The CrowdStrike incident revealed that even the most innocuous-seeming software updates can cause significant problems for an organization’s IT infrastructure. Moving forward, CISOs need to develop a more comprehensive approach to update management.
CISOs must implement a rigorous update management program that evaluates and tests each update during pre-deployment and throughout different IT environments to detect issues before they become harmful. Staging environments, sometimes called replica environments, can be used to test the performance of updates without subjecting an organization’s actual IT system to an untested software update.
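One way to picture this kind of staged testing is a rollout that promotes an update one environment at a time and stops at the first failed check. The sketch below is a simplified assumption, not a description of any vendor's tooling: the environment names, `ROLLOUT_ORDER`, and the `health_check` callback are hypothetical placeholders for your own environments and smoke tests.

```python
from typing import Callable

# Hypothetical environment order, from least to most critical.
ROLLOUT_ORDER = ["staging", "pilot", "production"]


def staged_rollout(update_id: str, health_check: Callable[[str, str], bool]) -> list[str]:
    """Promote an update through each environment, stopping at the first failed check.

    Returns the environments in which the update passed its health check.
    """
    passed = []
    for environment in ROLLOUT_ORDER:
        if not health_check(update_id, environment):
            break  # Halt promotion so later environments never receive the update.
        passed.append(environment)
    return passed


if __name__ == "__main__":
    def fake_check(update_id: str, environment: str) -> bool:
        # Stand-in for real boot and smoke tests; pretends the update fails in the pilot group.
        return environment != "pilot"

    print(staged_rollout("update-001", fake_check))
```

In this example the update passes staging, fails in the pilot group, and never reaches production.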
In addition, CISOs should develop procedures to reduce the immediacy of software updates across critical environments and infrastructure. One low-resource method is to categorize all software components into three separate stacks:
- Stack 3 - Low Disruption Risk: These components are unlikely to interfere with critical system operations (for example, OS kernel operations, TCP/IP, and other network-layer driver components). Your security team can usually delay updates to components in this category with little risk of disruption.
- Stack 2 - High Disruption Risk: These components present a higher disruption risk if your personnel delay updates.
- Stack 1 - Critical Security Updates: These components are necessary for protecting your environments against immediate threats, such as zero-day vulnerabilities, so you must accept all new updates immediately despite their potential disruption risks.
If most of your components fall into the second stack, you may need to separate them further into substacks to achieve a more beneficial distribution. You can assess whether delaying Stack 2 updates by four, eight, or 24 hours will increase security or continuity risk.
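The three-stack approach can be expressed as a simple hold policy. The sketch below is illustrative only: the `Stack` enum, `HOLD_HOURS`, and `earliest_deployment` are hypothetical names, and the 8-hour and 72-hour holds are placeholder values that you would replace after assessing your own security and continuity risk.

```python
from datetime import datetime, timedelta
from enum import Enum


class Stack(Enum):
    CRITICAL_SECURITY = 1  # Stack 1: accept immediately
    HIGH_DISRUPTION = 2    # Stack 2: delaying carries a higher disruption risk
    LOW_DISRUPTION = 3     # Stack 3: delaying carries little disruption risk


# Placeholder hold periods in hours (assumptions, not recommendations).
HOLD_HOURS = {
    Stack.CRITICAL_SECURITY: 0,
    Stack.HIGH_DISRUPTION: 8,
    Stack.LOW_DISRUPTION: 72,
}


def earliest_deployment(released_at: datetime, stack: Stack) -> datetime:
    """Return the earliest time an update may be deployed under the hold policy."""
    return released_at + timedelta(hours=HOLD_HOURS[stack])


if __name__ == "__main__":
    released = datetime(2024, 9, 2, 4, 0)
    for stack in Stack:
        print(stack.name, earliest_deployment(released, stack))
```

Keeping the hold periods in one table makes it easy to compare options, such as trying four, eight, and 24 hours for Stack 2, without touching the rest of the logic.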
Enhance resilience by avoiding single points of failure
Diversifying your software solutions will increase resiliency across your entire IT infrastructure and prepare your organization to handle future disruptions effectively. Consider employing the following strategies to increase your IT resilience:
- Diversifying solutions: Implement redundancy and failover mechanisms to ensure critical systems remain operational despite component failures.
- Hybrid or multi-cloud infrastructure: Adopt hybrid or multi-cloud infrastructure to reduce the risk of single points of failure and distribute workloads across multiple environments to enhance redundancy, flexibility, and disaster recovery capabilities.
- Load balancing and geographic distribution: Utilize load balancing to distribute traffic evenly across servers and distribute resources across environments to mitigate risks associated with localized failures.
These strategies can help your security team ensure critical systems remain resilient and operational despite potential failures.
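To make the failover idea concrete, here is a minimal sketch of trying redundant providers in order until one succeeds. The `call_with_failover` helper and the provider names are hypothetical, and a production setup would usually rely on a load balancer or orchestration layer rather than application code like this.

```python
from typing import Callable, Sequence


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[], str]]]
) -> tuple[str, str]:
    """Try each provider in order and return (name, result) from the first that succeeds."""
    errors = []
    for name, operation in providers:
        try:
            return name, operation()
        except Exception as exc:  # Broad catch is deliberate in this sketch.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    def primary() -> str:
        # Simulates an outage at the primary provider.
        raise ConnectionError("primary region unavailable")

    def secondary() -> str:
        return "ok"

    print(call_with_failover([("primary", primary), ("secondary", secondary)]))
```

The same pattern only helps if the secondary provider does not share the failed component, which is the point of diversifying solutions in the first place.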
Continually calibrate your incident response plan
Disruption incidents can be devastating but also present opportunities for continued improvement when used to elevate current systems and processes. One takeaway many organizations have had after CrowdStrike is the importance of developing comprehensive incident response and disaster recovery programs.
While you should calibrate your security programs to defend against the broadest array of risks, avoiding every cyber incident is impossible. A dedicated incident response plan helps you identify, mitigate, and remediate unforeseen incidents as efficiently as possible.
The best incident response plans operate across six main phases:
- Preparation: Establishing the architecture of your incident response plan, drafting key policies, and assembling your incident response toolbox
- Identification: Deciding when to activate the incident response plan after your security team has identified a security incident
- Containment: Isolating the incident and preventing further damage to other systems or environments
- Eradication: Remediating the security incident while prioritizing continued containment and protection for critical systems
- Recovery: Returning all systems to the standard state they were in before the security incident occurred
- Lessons learned: Completing incident documentation and learning how to prevent similar incidents from occurring in the future
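If your team tracks incidents in tooling, the six phases above can be modeled as an ordered sequence so that every incident record shows where it stands. This is a minimal sketch; the `Phase` enum and `next_phase` helper are hypothetical names, not part of any specific incident response product.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows the current one, or None after lessons learned."""
    if current is Phase.LESSONS_LEARNED:
        return None
    return Phase(current + 1)


if __name__ == "__main__":
    phase: Optional[Phase] = Phase.PREPARATION
    while phase is not None:
        print(phase.name)
        phase = next_phase(phase)
```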
Related reading: How to Create an Incident Response Plan (Detailed Guide)
Assess the effectiveness of your disaster recovery program
Outages and disruptions similar to CrowdStrike are powerful reminders of the necessity for robust infrastructure resilience and effective disaster recovery plans. Developing these plans and taking proactive measures are essential to ensure systems remain operational during unforeseen events. Disaster planning involves not only diversifying solutions but also continuously assessing and refining recovery strategies.
Regularly scheduled drills, thorough evaluations, and strategic partnerships with reliable providers can significantly enhance an organization's ability to respond to and recover from disruptions. By implementing these best practices, CISOs can ensure their infrastructure is well-prepared to handle any challenges that may arise:
- Proactive assessment: Regularly evaluate infrastructure resilience and disaster recovery plans to ensure preparedness for future disruptions.
- Simulated drills: Conduct regular simulated drills to test disaster recovery plans, identifying weaknesses and areas for improvement.
- Partnerships with reliable vendors: Collaborate with reliable providers to enhance preparedness and response capabilities by leveraging their expertise and resources.
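One simple way to turn a simulated drill into findings is to compare how long each system took to recover against a target. The sketch below assumes you define recovery time objectives (RTOs) per system, a term this post does not otherwise use; the system names, the `RTO_TARGETS` values, and the `drill_gaps` helper are hypothetical.

```python
from datetime import timedelta

# Hypothetical recovery time objectives per system; real targets come from your DR plan.
RTO_TARGETS = {
    "email": timedelta(hours=4),
    "payments": timedelta(hours=1),
    "endpoint-fleet": timedelta(hours=8),
}


def drill_gaps(
    targets: dict[str, timedelta], measured: dict[str, timedelta]
) -> dict[str, str]:
    """Return a reason for every system that missed its target or was never exercised."""
    gaps = {}
    for system, target in targets.items():
        actual = measured.get(system)
        if actual is None:
            gaps[system] = "not exercised in this drill"
        elif actual > target:
            gaps[system] = f"recovered in {actual}, target was {target}"
    return gaps


if __name__ == "__main__":
    results = {"email": timedelta(hours=3), "payments": timedelta(hours=2)}
    for system, reason in drill_gaps(RTO_TARGETS, results).items():
        print(f"{system}: {reason}")
```

Flagging systems that were never exercised is as useful as flagging slow recoveries, since an untested recovery path is itself a weakness.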
Improving third-party risk visibility and mitigation with UpGuard
Of course, the best way to prevent third-party risks from impacting your organization is to identify and mitigate them before they become problematic. A comprehensive, all-in-one TPRM solution like UpGuard Vendor Risk helps organizations across industries do exactly that.
The UpGuard toolkit includes automated workflows that empower security teams to better understand the security posture of their third-party ecosystem through the following:
- Vendor risk assessments: Fast, accurate, and comprehensive view of your vendors’ security posture
- Security ratings: Objective, data-driven measurements of an organization’s cyber hygiene
- Security questionnaires: Flexible questionnaires that accelerate the assessment process using automation and provide deep insights into a vendor’s security
- Reports library: Tailor-made templates that support security performance communication to executive-level stakeholders
- Risk mitigation workflows: Comprehensive workflows to streamline risk management measures and improve overall security posture
- Integrations: Application integrations for Jira, Slack, ServiceNow, and over 4,000 additional apps with Zapier, plus customizable API calls
- Data leak protection: Protect your brand, intellectual property, and customer data with timely detection of data leaks and avoid data breaches
- 24/7 continuous monitoring: Real-time notifications and new risk updates using accurate supplier data
- Attack surface reduction: Reduce your third- and fourth-party attack surface by discovering exploitable vulnerabilities and domains at risk of typosquatting
- Trust Page: Eliminate having to answer security questionnaires by creating an UpGuard Trust Page
- Intuitive design: Easy-to-use first-party dashboards
- World-class customer service: Plan-based access to professional cybersecurity personnel who can help you get the most out of UpGuard