How and why the CrowdStrike global BSOD happened
The Blue Screen of Death (BSOD)
ON FRIDAY, JULY 19, 2024, THE WORLD SAW the global impact of a widespread BSOD event. A faulty update from an antivirus company, CrowdStrike, crashed approximately 8.5 million computers used in various industries around the world.
What is a BSOD?
WHEN MICROSOFT WINDOWS detects a serious inconsistency in its internal memory management, such as an invalid memory access by kernel-level code, it has no choice but to halt the operating system to prevent more serious problems, such as data loss or a security breach.
Programmers are supposed to guard against this kind of fatal error by checking the consistency of their memory operations, but a single programming oversight can lead to corrupted data or unauthorized data exposure. The consequences range from losing data in memory or on disk to giving a user or process unauthorized privileged access.
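The consistency checking described above can be illustrated with a small sketch. This is a hypothetical Python example, not CrowdStrike's code: `read_input` and `rule_matches` are names invented here. A content rule refers to input values by position, and the code checks each position against the number of inputs actually supplied before reading it. CrowdStrike's own root-cause analysis attributed the July 19 crash to a mismatch in the number of input fields that led to an out-of-bounds memory read; in kernel-mode C such a read is not caught the way Python catches it, and it can take down the whole system.

```python
from typing import Optional, Sequence


def read_input(inputs: Sequence[str], index: int) -> Optional[str]:
    """Return inputs[index], or None if index is out of range."""
    # Explicit bounds check. Python would raise IndexError by itself, but in
    # kernel-mode C an unchecked read past the end of an array is undefined
    # behavior and can bring down the whole operating system.
    if 0 <= index < len(inputs):
        return inputs[index]
    return None


def rule_matches(inputs: Sequence[str], rule: dict[int, str]) -> bool:
    """Check a rule that maps input positions to the values expected there."""
    for index, expected in rule.items():
        value = read_input(inputs, index)
        if value is None:
            # Inconsistent content: the rule refers to an input that was never
            # supplied. Treat it as "no match" instead of crashing.
            return False
        if value != expected:
            return False
    return True


inputs = ["a"] * 20                      # 20 values supplied
print(rule_matches(inputs, {0: "a"}))    # True: rule stays within bounds
print(rule_matches(inputs, {20: "a"}))   # False: rule refers to a 21st value
```

The design choice here is to reject inconsistent content gracefully rather than trust that the content and the code always agree.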
This condition is so serious that Windows halts all software on the affected computer and displays a fatal error message on a blue screen.
A BSOD indicates that Windows is dead!
This is why we commonly call it the Blue Screen of Death.
Why didn’t rebooting work?
IN THE PAST, WHEN YOU ENCOUNTERED A BSOD, you could usually fix it by rebooting Windows. That worked because the crash typically resulted from a rare combination of events that created the internal inconsistency, and that combination was unlikely to happen again after a restart.
July 19th was different: CrowdStrike’s faulty update triggered the same internal memory error every time its antivirus driver loaded during Windows startup.
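The difference between a transient crash and a deterministic one can be shown with a toy simulation. Everything here is hypothetical: `boot_succeeds` and `reboots_until_success` are invented names, and the random draw stands in for the "rare combination of events" behind an ordinary BSOD, while a faulty file on disk makes every startup fail.

```python
import random
from typing import Optional


def boot_succeeds(faulty_content: bool, rng: random.Random,
                  race_probability: float = 0.05) -> bool:
    """Simulate one startup; return True if Windows reaches the desktop."""
    if faulty_content:
        # Deterministic fault: the same bad file is read on every startup,
        # so the crash repeats no matter how often the machine reboots.
        return False
    # Transient fault: the crash happens only when an unlucky combination
    # of events (modeled here as a random draw) occurs.
    return rng.random() >= race_probability


def reboots_until_success(faulty_content: bool, max_attempts: int = 10,
                          seed: int = 1) -> Optional[int]:
    """Return the attempt number that first boots cleanly, or None."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        if boot_succeeds(faulty_content, rng):
            return attempt
    return None  # Still crashing after max_attempts reboots.


print(reboots_until_success(faulty_content=False))
print(reboots_until_success(faulty_content=True))   # None
```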
Here's what happened …
CrowdStrike and other antivirus vendors need to update their software frequently so that it can detect and mitigate new strains of computer viruses and hacking threats.
CrowdStrike CEO George Kurtz noted that keeping up with hackers requires frequent updates to security tools ... This is true for all antivirus software, not just CrowdStrike.
CrowdStrike’s automated updater installed an antivirus update that caused the BSOD error. Each reboot resulted in the same system failure.
Unfortunately, this BSOD prevented Windows from booting far enough for the CrowdStrike updater to run and repair the broken software.
Windows was “Locked Up” at Boot Time, requiring a manual boot into Safe Mode
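Why the updater never got a chance to run can be sketched as an ordered startup sequence. This is a simplified, hypothetical model, not the real Windows boot process: the component names are illustrative, and Safe Mode is modeled only as "load essential components, skip the rest". In this model the faulty driver is skipped in Safe Mode, but so is the updater, which is why the faulty file had to be removed by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    name: str
    essential: bool          # loaded even in the simulated Safe Mode
    crashes: bool = False    # True for the hypothetical faulty driver


def boot(sequence: list[Component],
         safe_mode: bool = False) -> tuple[list[str], Optional[str]]:
    """Load components in order.

    Returns the names loaded before any crash, plus the name of the component
    that crashed (None if startup completed).
    """
    loaded: list[str] = []
    for component in sequence:
        if safe_mode and not component.essential:
            continue  # Simulated Safe Mode skips non-essential drivers/services.
        if component.crashes:
            return loaded, component.name  # BSOD: nothing after this loads.
        loaded.append(component.name)
    return loaded, None


SEQUENCE = [
    Component("kernel", essential=True),
    Component("security_driver", essential=False, crashes=True),
    Component("updater_service", essential=False),
]

print(boot(SEQUENCE))                  # (['kernel'], 'security_driver')
print(boot(SEQUENCE, safe_mode=True))  # (['kernel'], None)
```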
Malware Detection and Operational Considerations
Is the problem fixed now?
WHILE THIS FAULTY UPDATE has since been fixed, the risk of future BSODs from frequent antivirus updates remains. There is no silver bullet that completely eliminates the risk of BSODs, but careful software design can significantly reduce it.
The Danger of a Zero-Day Attack
THE BIGGEST CYBERSECURITY THREAT comes from the exploitation of Zero-Day Vulnerabilities.
A Zero-Day attack exploits an undisclosed computer-software vulnerability to gain control of a computer or network. It is known as a “zero-day” because the vulnerability isn’t publicly known before hackers use it, leaving the software authors with “zero” days to create patches to mitigate it.
Traditional and next-generation antivirus products can’t detect and stop new threats and Zero-Day attacks without prior knowledge of their behavior, so they require updates whenever new threats are discovered.
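A minimal sketch of why detection based on prior knowledge depends on updates, assuming detection by exact SHA-256 hashes of known-bad files. Real products use far richer signatures and behavioral rules; `SignatureDB` is a hypothetical name used only for illustration.

```python
import hashlib


class SignatureDB:
    """Hypothetical blocklist that flags only files whose hash is already known."""

    def __init__(self) -> None:
        self._known_bad: set[str] = set()

    def update(self, malware_samples: list[bytes]) -> None:
        # Each update adds signatures for newly discovered threats.
        for sample in malware_samples:
            self._known_bad.add(hashlib.sha256(sample).hexdigest())

    def is_malicious(self, file_bytes: bytes) -> bool:
        return hashlib.sha256(file_bytes).hexdigest() in self._known_bad


db = SignatureDB()
zero_day = b"payload nobody has analyzed yet"
print(db.is_malicious(zero_day))  # False: no signature exists yet
db.update([zero_day])             # vendor ships an update after discovery
print(db.is_malicious(zero_day))  # True
```

Until the update arrives, the new threat passes unnoticed; and any change to the file produces a different hash, which calls for yet another update.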
As CrowdStrike itself explained: “Updates to Channel Files are a normal part of the sensor’s operation and occur several times a day in response to novel tactics, techniques, and procedures discovered by CrowdStrike.”
These frequent updates significantly increase the risk of a BSOD: it took only a single faulty update to cause a global outage affecting millions of computers. Nor was 2024 CrowdStrike’s first BSOD incident; other incidents allegedly occurred in 2013, 2017, 2019, and 2023.
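The effect of update frequency on risk can be made concrete with a simple probability model. Assume, hypothetically, that each update is faulty with the same small probability p, independently of the others. The chance that at least one of n updates is faulty is then 1 - (1 - p)^n. The value of p below is invented for illustration; it is not CrowdStrike’s actual error rate, which is not known.

```python
def prob_at_least_one_fault(p: float, n: int) -> float:
    """Chance that at least one of n independent updates is faulty.

    Each update is assumed (hypothetically) to be faulty with probability p,
    independently of the others: P = 1 - (1 - p) ** n.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


p = 1e-4  # hypothetical: one faulty update in 10,000
print(f"monthly updates for a year: {prob_at_least_one_fault(p, 12):.2%}")
print(f"5 updates a day for a year: {prob_at_least_one_fault(p, 5 * 365):.2%}")
```

With the same per-update error rate, a year of monthly updates gives roughly a 0.12% chance of at least one bad update, while five updates a day gives roughly 17%.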
Malware Detection Complexity
No matter how many attack attributes an antivirus technology uses to categorize an attack vector or its behavior, hackers will continue to develop new techniques. This makes malware detection an unbounded problem: one characterized by unknown or poorly defined information requirements, a great number of variables, and unpredictable behavior at large scale. Simply adding more features or using newer learning algorithms cannot solve this kind of problem.
To avoid BSODs, simplify the design
To provide a predictable solution, we transformed the complex task of detecting malware into a straightforward, bounded task: determining which software is authorized to run. Instead of searching for the “needle in the haystack”, we only need to check each program against what has been approved. Our App Firewall blocks all Zero-Day malware and all other unauthorized and unknown software.
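The general allowlisting idea (default deny: run only what has been explicitly authorized) can be sketched as follows. This is a hypothetical illustration of the technique, not the App Firewall’s implementation; `Allowlist` and `may_run` are invented names, and identifying programs by SHA-256 hash is an assumption made for the example.

```python
import hashlib


class Allowlist:
    """Hypothetical default-deny policy: only pre-approved executables may run."""

    def __init__(self, approved_binaries: list[bytes]) -> None:
        self._approved = {hashlib.sha256(b).hexdigest() for b in approved_binaries}

    def may_run(self, binary: bytes) -> bool:
        # Bounded question: "is this exactly a binary we approved?"
        # Anything unknown, including never-before-seen malware, is denied
        # without needing a signature that describes it.
        return hashlib.sha256(binary).hexdigest() in self._approved


policy = Allowlist([b"approved editor", b"approved browser"])
print(policy.may_run(b"approved editor"))              # True
print(policy.may_run(b"payload nobody has analyzed"))  # False
print(policy.may_run(b"approved editor, tampered"))    # False
```

Because the policy only needs to know what is approved, it does not have to change when a new threat appears; the trade-off is that new legitimate software must be approved before it can run.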
Because our kernel software design is so simple, we haven’t needed to update it since April 2020.
This eliminates downtime risks caused by:
- a breach due to a missed Zero-Day attack, and
- errors during frequent software updates
So, now you can choose:
- a simple, stable Zero-Day prevention solution,
- or being stuck with a BSOD brick.