Root Cause Analysis
In science and engineering, root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. It is widely used in IT operations, telecommunications, industrial process control, accident analysis (e.g., in aviation, rail transport, or nuclear plants), medicine (for medical diagnosis), healthcare industry (e.g., for epidemiology), etc. Wikipedia
Root cause analysis is used to find the root causes of faults or problems. Applied to the CVE analysis process, RCA clarifies which underlying vulnerability is primarily responsible for the security violation.
Treating the Symptom Rather Than the Cause
Often in trying to solve complex problems, rather than providing a solution, the symptoms of the problem are treated:
- Mowing over weeds, instead of pulling out the root
- Taking pain medication, instead of curing the infection
- Replacing a fuse, instead of finding the short
This places us in an endless cycle of treating symptoms or, worse, ends in a complete failure of the system we are treating.
graph LR;
A[Problem] --> B[Treat Symptom];
B --> A;
To break that loop, the root needs to be identified. To find the root issue, a clear distinction is required between the actual root cause and causal factors (as listed in the CVE analysis template):
- root cause - The fundamental issue. Removing this prevents the problem from recurring.
- causal factor - A contributor to the issue. Removing this might mitigate some part of the problem, but does not completely remove the issue.
With luck, the loop can be broken.
graph LR;
A[Root Problem] --> B[Solved];
Formal Process
RCA - The process to find the root cause of a particular problem.
RCA Four Basic Steps:
- Identify and describe the problem clearly.
- Establish a timeline from the normal situation up to the time the problem occurred.
- Distinguish between the root cause and other causal factors.
- Establish a causal graph between the root cause and the problem.
Visual RCA Process
A causal graph depicting the RCA process
graph TD;
classDef default font-size: 12px,stroke-width: 2;
A[Apparent Problem] --> B[Symptom of Problem];
A --> C[Symptom of Problem];
A --> D[Symptom of Problem];
B --> E[Possible Root Cause]
B --> F[Possible Root Cause]
C --> G[Possible Root Cause]
D --> H[Possible Root Cause]
D --> I[Possible Root Cause]
G --> J[Actual Root Cause]
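As a rough illustration, the causal graph above can be held in a dictionary and walked from the apparent problem down to every candidate cause. This is only a sketch: the node names and the `paths_to_leaves` helper are hypothetical, and the graph is assumed to be acyclic.

```python
from collections import deque

# Hypothetical causal graph mirroring the diagram above: each key points to
# the nodes the investigation reaches next. Assumed to contain no cycles.
CAUSAL_GRAPH = {
    "Apparent Problem": ["Symptom 1", "Symptom 2", "Symptom 3"],
    "Symptom 1": ["Possible Cause E", "Possible Cause F"],
    "Symptom 2": ["Possible Cause G"],
    "Symptom 3": ["Possible Cause H", "Possible Cause I"],
    "Possible Cause G": ["Actual Root Cause"],
}


def paths_to_leaves(graph, start):
    """Return every path from start to a node with no outgoing edges."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        children = graph.get(path[-1], [])
        if not children:
            # Nothing further to investigate: this is a candidate cause.
            paths.append(path)
            continue
        for child in children:
            queue.append(path + [child])
    return paths


if __name__ == "__main__":
    for path in paths_to_leaves(CAUSAL_GRAPH, "Apparent Problem"):
        print(" -> ".join(path))
```

Each printed path is one line of investigation. Deciding which leaf is the actual root cause still takes analysis; the traversal only lists the candidates.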
Alternate method - Five Whys
Another method to aid in root cause determination is a technique called the Five Whys (Wikipedia). Like an inquisitive toddler, it repeatedly asks the question “Why?” to uncover the primary issue.
Example: The vehicle will not start.
- Why? – The battery is dead. (First why)
- Why? – The alternator is not functioning. (Second why)
- Why? – The alternator belt has broken. (Third why)
- Why? – The alternator belt was well beyond its useful service life and not replaced. (Fourth why)
- Why? – The vehicle was not maintained according to the recommended service schedule. (Fifth why, a root cause)[2]
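The vehicle example can be sketched as a simple chain lookup, where each answer becomes the next thing to ask "Why?" about. The `five_whys` function and the `ANSWERS` mapping are hypothetical names used for illustration. In a real investigation the answers come from analysis, not from a prepared table.

```python
# Hypothetical mapping from each observation to the answer to "Why?".
ANSWERS = {
    "The vehicle will not start.": "The battery is dead.",
    "The battery is dead.": "The alternator is not functioning.",
    "The alternator is not functioning.": "The alternator belt has broken.",
    "The alternator belt has broken.": (
        "The alternator belt was well beyond its useful service life "
        "and not replaced."
    ),
    "The alternator belt was well beyond its useful service life "
    "and not replaced.": (
        "The vehicle was not maintained according to the recommended "
        "service schedule."
    ),
}


def five_whys(problem, answers, max_depth=5):
    """Follow the chain of 'Why?' answers; return (chain, last answer)."""
    chain = [problem]
    current = problem
    for _ in range(max_depth):
        if current not in answers:
            break  # no further answer is known, so stop asking
        current = answers[current]
        chain.append(current)
    return chain, chain[-1]


if __name__ == "__main__":
    chain, root = five_whys("The vehicle will not start.", ANSWERS)
    for depth, statement in enumerate(chain):
        print(f"{depth}: {statement}")
```

The `max_depth` limit is a convenience, not a rule: the chain stops early when no further answer is known, and the limit can be raised when five questions are not enough.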
RCA applied to the Windows Print Spooler CVEs
Time to apply RCA to the Windows Print Spooler CVEs previously analyzed in the CVE analysis section. In the case of vulnerabilities, the root cause is the fundamental security issue responsible for the violation of security. For this analysis, the three related CVEs below most clearly demonstrate the importance of root cause.
Related CVEs:
- CVE-2020-1048
- CVE-2020-1337
- CVE-2020-17001
These CVEs, spanning the course of just a few months, all suffer from the same underlying issue: (spoiler alert!) the Windows Print Spooler improperly impersonated a standard user to complete a file write. The details of this conclusion follow. They were derived from several public blog posts that helped fill in the CVE template for each CVE, with additional gaps filled in during the Patch-Diffing-Applied section. With that context, RCA is applied to identify the issue and verify this stated root cause.
Four basic steps:
- Identify and describe the problem clearly.
  - A standard user can perform an arbitrary file write as SYSTEM, leading to code execution (see Summary Section)
    - More simply: Local Privilege Escalation (LPE)
- Establish a timeline from the normal situation up to the time the problem occurred.
  - This is a bit abstract for CVEs. For related CVEs it can be useful, as trends or similarities may emerge (as seen below).
- Distinguish between the root cause and other causal factors.
  - There are several inputs that feed into this. This ability will be enriched by CVE analysis and Patch-Diffing.
- Establish a causal graph between the root cause and the problem.
graph LR;
classDef default font-size: 15px,stroke-width: 2;
classDef rc fill:#00cc66,font-size: 15px,stroke-width: 2;
A[LPE] --> B[CVE-2020-1048];
A --> C[CVE-2020-1337];
A --> D[CVE-2020-17001];
B --> E[Port Assignment - Client-Side Enforcement of Server-Side Security CWE-602];
B --> F[WPS Impersonates Itself - Priv Context Switching Error CWE-270];
C --> G[Directory Junction - Race Condition Enabling Link Following CWE-363]
C --> K[WPS Impersonates Itself - Priv Context Switching Error CWE-270]
D --> H[Windows File-Based Canonicalization - Improper Link Resolution Before File Access CWE-59]
D --> L[WPS Impersonates Itself - Priv Context Switching Error CWE-270]
K:::rc --> J[Root Cause]
F:::rc --> J
L:::rc --> J
J:::rc
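One way to read the graph above is that the root cause candidate is the factor shared by every related CVE. A minimal sketch of that idea follows, using the CWE labels from the graph. The `shared_factors` helper is hypothetical, and set intersection is only a heuristic for spotting a common factor, not a substitute for the analysis itself.

```python
# Causal factors per CVE, taken from the CWE labels in the graph above.
CVE_FACTORS = {
    "CVE-2020-1048": {"CWE-602", "CWE-270"},
    "CVE-2020-1337": {"CWE-363", "CWE-270"},
    "CVE-2020-17001": {"CWE-59", "CWE-270"},
}


def shared_factors(factors_by_cve):
    """Return the factors that appear in every CVE's factor set."""
    factor_sets = list(factors_by_cve.values())
    if not factor_sets:
        return set()
    return set.intersection(*factor_sets)


if __name__ == "__main__":
    # Factors common to all CVEs are root cause candidates; the rest are
    # causal factors specific to individual CVEs.
    print(sorted(shared_factors(CVE_FACTORS)))
```

A factor that recurs across every related CVE is worth investigating first, but it still needs to be confirmed, for example through patch diffing, before it is called the root cause.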
Root Cause Determined
graph LR;
classDef default fill:#00cc66,stroke-width: 2;
B[WPS Impersonates Itself - Privilege Context Switching Error CWE-270]
Five Whys
More generally, the 5 whys technique could likely be applied to each of the CVEs individually to arrive at the same conclusion.
Problem : Local Privilege Escalation
5 Whys:
- Why? Arbitrary DLL loading by a privileged process (a standard ~~feature~~ issue on Windows)
- Why? Windows Print Spooler wrote arbitrary data in a privileged directory, System32
- Why? User is able to direct the output of a print job to a specific port/file location (despite lacking write access to the location)
- Why? Some port assignment checks are only enforced client side. WPS must support several features, including “printing” files to disk. WPS leverages impersonation as its primary security mitigation.
- Why? Windows Print Spooler mistakenly impersonates itself for print jobs in which a null context is provided for its process token.
The questions and answers above were contrived to fit the model with exactly 5 whys. In practice, the idea is to iterate the whys. Keep asking until the issue is clear.
Finishing Root Cause Section in the Template
Back in CVE analysis, we demonstrated how to fill out the CVE Analysis Template using CVE-2020-1048. We now have the background and theory behind root cause analysis to adequately fill out the root cause section. Knowing what we now know about CVE-2020-1048 and its related CVEs, the section can be completed.
### Is this CVE the Root Cause or a Causal Factor? If not Root, what is?
- **causal factor** - *major contributor to an undesirable condition that if eliminated, would have either prevented the occurrence of the incident or reduced its severity or frequency*
Looking at the requirements, this issue is twofold:
- The ability to assign a printer port to an arbitrary file path.
- Windows Print Spooler contains a Self Impersonation Privilege Escalation.
This CVE specifically addresses the client-side check vulnerability. The bigger, root issue is the latter: **Windows Print Spooler contains a Self Impersonation Privilege Escalation**.
Seeing Further
Understanding the root of the issue can help us see further. It can clarify whether a recent security patch treated a symptom or provided a cure. A deep understanding of a particular vulnerability class can also provide a lens or mental model for discovering the same class in another context. Finally, it can generate new ideas for vulnerabilities, or for alternate paths to the root cause in case the underlying issue still exists.
We are near the end, and several elements of our ideal CVE Analysis have been accomplished. In the final section, we will wrap up our final thoughts in the Conclusion.
CVE North Stars Map
graph TD;
classDef current fill:#00cc66;
H:::current;
A1[ CVE Research] --> A[CVE-2020-1048];
A1 --> B[CVE-2020-1337];
A1 --> C[CVE-2020-17001];
A1 --> D[CVE-2010-2729];
A1 --> E[CVE-2020-1030];
A --> F[CVE Analysis + Patch Diffing];
B --> F;
C --> F;
D --> F;
E --> F;
F --> I[System Comprehension]
F --> G[Vulnerability Classification];
F --> H[Root Cause Identification];
G --> J[Develop Mitigation Requirements / Novel Understanding];
H --> J;
I --> J;
J --> K[Discover New and/or Related Vulnerabilities]