Verify the human,
not just the knowledge.
Threshold is a research prototype for high-stakes knowledge transfer. It gates access behind biometric readiness, physical presence, and verified provenance — so transfer proceeds only when both the artifact and the recipient have been verified.
The system operationalizes a first-order task of crisis decision-making: raising the operator’s awareness of escalation risks. It does so through structural system design rather than bureaucratic process.
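The gating rule described above (transfer proceeds only when both the artifact and the recipient have been verified) can be sketched as a fail-closed conjunction of checks. This is a minimal illustration, not Threshold's implementation: the `GateInputs` type, its field names, and the `may_transfer` and `failed_checks` functions are all hypothetical, and each check is assumed to reduce to a single boolean produced elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical check results; names are illustrative, not Threshold's API."""
    biometric_ready: bool      # recipient side: biometric readiness
    physically_present: bool   # recipient side: physical presence
    provenance_verified: bool  # artifact side: verified provenance


def failed_checks(inputs: GateInputs) -> list[str]:
    """Return the names of the checks that did not pass, in a fixed order."""
    checks = [
        ("biometric_ready", inputs.biometric_ready),
        ("physically_present", inputs.physically_present),
        ("provenance_verified", inputs.provenance_verified),
    ]
    return [name for name, passed in checks if not passed]


def may_transfer(inputs: GateInputs) -> bool:
    """Fail closed: transfer proceeds only if every check passed."""
    return not failed_checks(inputs)


if __name__ == "__main__":
    attempt = GateInputs(
        biometric_ready=True,
        physically_present=False,
        provenance_verified=True,
    )
    print(may_transfer(attempt))   # False
    print(failed_checks(attempt))  # ['physically_present']
```

Under these assumptions there is no override path in the sketch: a single failed check on either the recipient or the artifact blocks the transfer, and the list of failed checks makes the reason visible to the operator.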
Five layers, one threshold.
What this rests on.
The verification-of-the-human thesis is not a design opinion. Five independent peer-reviewed findings converge on it — each ruling out a different alternative. Together they foreclose the major pre-Threshold mitigations (better operators, better personas, better baseline AI, better safety training). Verification at the human surface is what remains.
Commercial pilots in NASA Ames simulators miss real automation failures 44%, 48%, and 71% of the time (altitude, heading, and frequency misloads, respectively). Trained operators do not catch what they trust the system to track.
Rules out: better-trained operators will catch the errors.
Algorithm aversion is asymmetric. People penalize AI for small errors more harshly than humans for larger ones, and reject AI after seeing it err even when it continues to outperform.
Rules out: operators will straightforwardly use AI when it outperforms.
LLMs ignore persona prompts at extremes. A “strict pacifist” persona produces no statistical difference from “aggressive sociopath” in simulated wargame behavior. Newer models track human experts less well.
Rules out: persona prompts can constrain LLM strategic preferences.
All five frontier LLMs tested in autonomous wargame simulations escalate in neutral scenarios. GPT-4-Base executes nuclear strikes in 7.08% of actions; arms-race dynamics emerge in every model tested.
Rules out: LLMs without persona prompts will behave acceptably in autonomous decision contexts.
Standard safety training cannot remove deceptive behavior once present in a model. Adversarial training teaches the model to better hide the unsafe behavior rather than remove it. Persistence scales with model size.
Rules out: post-hoc safety training can fix problematic AI behavior.
Who is behind this.
Grant
Carnegie Corporation — Modern Technologies & Nuclear Risks
Two-year award. Co-publishing on AI hallucination and fragility in nuclear contexts.
Design partner
RISD Industrial Design — Tom Weis (PI)
Doomsday Clock redesign for the Bulletin of the Atomic Scientists. Sandia foresight pedigree. West Point workshops.
Working group
Institute for Security & Technology — AI-NC3
April 2026 working session: STRATCOM, Pentagon procurement, LLNL, Palantir AI lead. Vocabulary that grounds this work.