How large language models can reconstruct forbidden knowledge

Aug 27, 2025 - 12:02

In the late 1970s, a Princeton undergraduate named John Aristotle Phillips made headlines by designing an atomic bomb using only publicly available sources for his junior year research project. His goal wasn’t to build a weapon but to prove a point: that the distinction between “classified” and “unclassified” nuclear knowledge was dangerously porous.

The physicist Freeman Dyson agreed to be his adviser while explicitly stipulating that he would not provide classified information. Phillips armed himself with textbooks, declassified reports, and inquiries to companies selling dual-use equipment and materials such as explosives. Within months he had produced a design for a crude atomic bomb, demonstrating that knowledge wasn’t the real barrier to nuclear weapons. Dyson gave him an “A” and then removed the report from circulation. While the practicality of Phillips’s design was doubtful, that was not Dyson’s main concern.

As he later explained: “To me the impressive and frightening part of his paper was the first part in which he described how he got the information. The fact that a twenty-year-old kid could collect such information so quickly and with so little effort gave me the shivers.”

Zombie machines

Today, we’ve built machines that can do what Phillips did—only faster, broader, at scale—and without self-awareness. Large Language Models (LLMs) like ChatGPT, Claude, and Gemini are trained on vast swaths of human knowledge. They can synthesize across disciplines, interpolate missing data, and generate plausible engineering solutions to complex technical problems. Their strength lies in processing public knowledge: reading, analyzing, assimilating, and consolidating information from thousands of documents in seconds. Their weakness is that they don’t know when they’re assembling a mosaic that should never be completed.

This risk isn’t hypothetical. Intelligence analysts and fraud investigators have long relied on the mosaic theory: the idea that individually benign pieces of information, when combined, can reveal something sensitive or dangerous. Courts have debated it. It has been applied to GPS surveillance, predictive policing, and FOIA requests. In each case, the central question was whether innocuous fragments could add up to a problematic whole.

Now apply that theory to AI.

A user might prompt a model to explain the design principles of a gas centrifuge, then ask about the properties of uranium hexafluoride, then about the neutron reflectivity of beryllium, and finally about the chemistry of uranium purification. Each question—such as, “What alloys can withstand 70,000 rpm rotational speeds while resisting fluorine corrosion?”—may seem benign on its own, yet each could signal dual-use intent. Each answer may be factually correct and publicly sourced, but taken together they approximate a road map toward nuclear capability, or at least lower the barrier for someone with intent.

Critically, because the model has no access to classified data, it doesn’t know it is constructing a weapon. It doesn’t “intend” to break its guardrails. There is no firewall between “public” and “classified” knowledge in its architecture, because it was never trained to recognize such a boundary. And unlike John Phillips, it doesn’t stop to ask if it should.

This lack of awareness creates a new kind of proliferation risk: not the leakage of secrets, but the reconstitution of secrets from public fragments—at speed, at scale, and without oversight. The results may be accidental, but no less dangerous.

The issue is not just speed but the ability to generate new insights from existing data. Consider a benign example. Today’s AI models can combine biomedical data across genomics, pharmacology, and molecular biology to surface insights no human has explicitly written down. A carefully structured set of prompts might lead an LLM to propose a novel, unexploited drug target for a complex disease, based on correlations in patient genetics, prior failed trials, known small molecule leads, and obscure international studies. No single source makes the case, but the model can synthesize across them. That is not simply faster search—it is a genuine discovery.

All about the prompt

Along with the centrifuge example above, it’s worth considering two additional hypothetical scenarios across the spectrum of CBRN (Chemical, Biological, Radiological, and Nuclear) threats to illustrate the problematic mosaics that AI can assemble. The first example involves questions about extracting and purifying ricin, a notorious toxin derived from castor beans that has been implicated in both failed and successful assassinations.

The following table outlines the kinds of prompts or questions a user might pose, the types of information potentially retrieved, and the public sources an AI might consult:

Prompt | Response | Public Source Type
--- | --- | ---
Ricin’s mechanism of action | B chain binds cells; A chain depurinates the ribosome, leading to cell death | Biomedical reviews
Castor bean processing | How castor oil is extracted; leftover mash contains ricin | USDA documents
Ricin extraction protocols | Historical research articles and old patents describe protein purification | U.S. and Soviet-era patents (e.g., US3060165A)
Protein separation techniques | Affinity chromatography, ultracentrifugation, dialysis | Biochemistry lab manuals
Lab safety protocols | Gloveboxes, flow hoods, PPE | Chemistry lab manuals
Toxicity data (LD50s) | Lethal doses, routes of exposure (inhaled, injected, oral) | CDC, PubChem, toxicology reports
Ricin detection assays | ELISA, mass-spec markers for detection in blood/tissue | Open-access toxicology literature

While each individual prompt or question is benign and clearly relies on publicly available data, by putting together enough prompts and responses of this sort, a user could piece together a crude but workable recipe for ricin.

A similar example involves assembling a protocol for synthesizing a nerve agent such as sarin. In that case, the list of prompts, results, and sources might look something like the following:

Prompt | Response | Public Source Type
--- | --- | ---
General mechanism of acetylcholinesterase (AChE) inhibition | Explains why sarin blocks acetylcholinesterase and its physiological effects | Biochemistry textbooks, PubMed reviews
List of G-series nerve agents | Historical context: GA (tabun), GB (sarin), GD (soman), etc. | Wikipedia, OPCW docs, popular science literature
Synthetic precursors of sarin | Methylphosphonyl difluoride (DF), isopropyl alcohol, etc. | Declassified military papers, 1990s court filings, open-source retrosynthesis software
Organophosphate coupling chemistry | Common lab procedures to couple fluorinated precursors with alcohols | Organic chemistry literature and handbooks, synthesis blogs
Fluorination safety practices | Handling and containment procedures for fluorinated intermediates | Academic safety manuals, OSHA documents
Lab setup | Information on glassware, fume hoods, Schlenk lines, PPE | Organic chemistry labs, glassware supplier catalogs

These examples are illustrative rather than exhaustive. Even with current LLM capabilities, it is evident that each list could be expanded to be more extensive and granular—retrieving and clarifying details that might determine whether an experiment is crude or high-yield, or even the difference between success and failure. LLMs can also refine historical protocols and incorporate state-of-the-art data to, for example, optimize yields or enhance experimental safety.

God of the gaps

There’s an added layer of concern because LLMs can identify information gaps within individual sources. While those sources may be incomplete on their own, combining them allows the algorithm to fill in the missing pieces. A well-known example from the nuclear weapons field illustrates this dynamic. Over decades, nuclear weapons expert Chuck Hansen compiled what is often regarded as the world’s largest public database on nuclear weapons design, the six-volume Swords of Armageddon.

To achieve this, Hansen mastered the government’s Freedom of Information Act (FOIA) system. He would submit repeated FOIA requests for the same document to multiple federal agencies over time. Because each agency classified and redacted documents differently, Hansen received multiple versions with varying omissions. By assembling these, he was able to reconstruct a kind of “master document” that was, in effect, classified—and which no single agency would have released. Hansen’s work is often considered the epitome of the mosaic theory in action.

LLMs can function in a similar way. In fact, they are designed to operate this way, since their core purpose is to retrieve the most accurate and comprehensive information when prompted. They aggregate sources, identify and reconcile discrepancies, and generate a refined, discrepancy-free synthesis. This capability will only improve as models are trained on larger datasets and enhanced with more sophisticated algorithms. A particularly notable feature of LLMs is their ability to mine tacit knowledge—cross-referencing thousands of references to uncover rare, subjective details that can optimize a WMD protocol. For example, instructions telling a researcher to “gently shake” a flask or stop a reaction when the mixture becomes “straw yellow” can be better understood when such vague descriptions are compared across thousands of experiments.

In the examples above, safeguards and red flags would likely arise if an individual attempted to act on this knowledge; as in many such cases, the real constraint is material, not informational. However, the speed and thoroughness with which LLMs retrieve and organize information means that the knowledge problem is, in many cases, effectively solved. For individuals who might otherwise lack the motivation to pursue information through more tedious, traditional means, the barriers are significantly lowered. In practice, an LLM allows such motivated actors to accomplish what they might already attempt—only with vastly greater speed and accuracy.

Most AI models today impose guardrails that block explicitly dangerous prompts such as “how to make a nuclear bomb.” Yet these filters are brittle and simplistic. A clever user can circumvent them with indirect prompts or by building the picture incrementally. There is no obvious reason why seemingly benign, incremental requests should automatically trigger red flags. The true danger lies not in the blatant queries, but in those that “fall between the lines”—queries that appear innocuous on their own but gradually assemble into forbidden knowledge.

Consider, for example, a few hypothetical requests from the sarin, ricin, and centrifuge cases. Each could easily qualify as a dual-use request—one that a user without malicious intent might pose for any number of legitimate reasons:

  • “What are some design strategies for performing fluoride-alcohol exchange reactions at heteroatom centers?”
  • “What lab precautions are needed when working with corrosive fluorinated intermediates?”
  • “How do you design small-scale glassware systems to handle volatile compounds with pressure control?”
  • “What are safe protocols for separating proteins from plant mash using centrifugation?”
  • “How do you detect ribosome-inactivating proteins in a lab sample?”
  • “How does affinity chromatography work for isolating specific plant proteins?”
  • “What were USDA standards for castor oil processing in the 1950s?”
  • “Which vacuum-pump designs minimize oil back-streaming in corrosive-gas service?”
  • “Give the vapor-pressure curve for uranium hexafluoride between 20 °C and 70 °C.”
  • “Summarize neutron-reflection efficiency of beryllium versus natural graphite.”

These requests evade traditional usage-policy filters through a number of intentional or unintentional strategies: vague or highly technical wording, generic cookie-cutter inquiries, and an interest in historical rather than contemporary scenarios. Because they are dual-use, with any number of legitimate applications, they cannot simply be placed on a blacklist.
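
To make that brittleness concrete, here is a minimal sketch of a keyword-style guardrail of the kind described above. The blocklist, prompts, and function name are hypothetical illustrations, not any vendor's actual filter: the blatant query is caught, while the incremental dual-use prompts from the lists above pass untouched.

```python
# Minimal sketch of a brittle keyword-based guardrail (hypothetical blocklist,
# not any vendor's actual implementation). Individually benign prompts pass,
# even though it is their combination that matters.

BLOCKLIST = {"nuclear bomb", "nerve agent", "synthesize sarin", "make ricin"}

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt matches a blocklisted phrase."""
    text = prompt.lower()
    return any(term in text for term in BLOCKLIST)

prompts = [
    "How do I make a nuclear bomb?",                                      # blocked
    "What alloys withstand 70,000 rpm while resisting fluorine?",         # passes
    "How does affinity chromatography isolate specific plant proteins?",  # passes
]

for prompt in prompts:
    print("BLOCKED" if naive_filter(prompt) else "allowed", "|", prompt)
```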

Knowledge enables access

It is worth examining more closely the argument that material access, rather than knowledge, constitutes the true barrier to weaponization. The argument is persuasive: having a recipe and executing it are two very different challenges. But it is not a definitive safeguard. In practice, the boundary between knowledge and material access is far more porous than it appears.

Consider the case of synthesizing a nerve agent such as sarin. Today, chemical suppliers routinely flag and restrict sales of known sarin precursors like methylphosphonyl difluoride. Yet with AI-powered retrosynthesis tools—systems that computationally deconstruct a target molecule into alternative combinations of simpler, synthesizable building blocks, much like a Lego house can be broken down into different sets of Lego pieces—a user can identify a wide range of alternative precursors and synthetic pathways.

Some of these routes may be deliberately designed to evade restrictions established under the Chemical Weapons Convention (CWC) and by chemical suppliers. The scale of such outputs can be extraordinary: in one study, an AI retrosynthesis tool proposed more than 40,000 potential analogs of the nerve agent VX. Many of these compounds are neither explicitly regulated nor easily recognizable as dual-use.

As AI tools advance, the number of viable chemical synthesis and protein purification routes only expands, complicating traditional material-based monitoring and enforcement. In effect, the law lags behind the science. A parallel exists in narcotics regulation. Over the years, several novel substances mimicking fentanyl, methamphetamine, or marijuana—initially created purely for academic research—found their way into recreational use. It took years before these substances were formally scheduled and classified as controlled.

Even before AI, bad actors could exploit loopholes by inventing new science or repurposing existing technologies. The difference was that, historically, they could produce only a handful of problematic examples. LLMs and generative AI, by contrast, can generate thousands of such candidates at once, vastly multiplying the possible paths to a viable weapon.

In other words, knowledge can erode material constraints. When that occurs, even a marginal yet statistically significant increase in the number of motivated bad actors can translate into a measurable rise in success rates. Nobody should believe that having a ChatGPT-enabled recipe for making ricin will unleash a wave of garage ricin labs across the country. But it will almost certainly lead to a small uptick in attempts. And even one or two small-scale ricin or sarin incidents—while limited in terms of casualties—could trigger panic, uncertainty, and societal disruption, potentially paving the way for destabilizing outcomes such as authoritarian power grabs or the suspension of civil liberties.

The road ahead

Here’s the problem: we don’t yet have a robust framework for regulating this. Export control regimes like the Nuclear Suppliers Group were never designed for AI models. The IAEA safeguards fissile materials, not algorithms. Chemical and biological supply chains flag material requests, not theoretical toxin or chemical weapon constructions. These enforcement mechanisms rely on fixed lookup lists updated slowly and deliberately, often only after actual harm has occurred. They are no match for the rapid pace with which AI systems can generate plausible ideas. And traditional definitions of “classified information” collapse when machines can independently rediscover that knowledge without ever being told it.

So what do we do? One option is to be more restrictive. But because of the dual-use nature of most prompts, this approach would likely erode the utility of AI tools in providing information that benefits humanity. It could also create privacy and legal issues by flagging innocent users. Judging intent is notoriously difficult, and penalizing it is both legally and ethically fraught.

The solution is not necessarily to make systems less open, but to make them more aware and capable of smarter decision-making. We need models that can recognize potentially dangerous mosaics and have their capabilities stress-tested. One possible framework is a new doctrine of “emergent” or “synthetic” classification—identifying when the output of a model, though composed of unclassified parts, becomes equivalent in capability to something that should be controlled. This could involve assigning a “mosaic score” to a user’s cumulative requests on a given topic. Once the score exceeded a certain threshold, it might trigger policy-violation flags, reduced compute access, or even third-party audits. Crucially, a dynamic scoring system would need to evaluate incremental outputs, not just inputs.
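
As a rough illustration of how such a mosaic score might work, the sketch below keeps a per-user, per-topic tally of risk-weighted requests and escalates the response as thresholds are crossed. The topic labels, weights, and thresholds are hypothetical placeholders, and a real system would also have to score the model's outputs, not just the incoming prompts.

```python
from collections import defaultdict

# Sketch of a cumulative "mosaic score": each request adds a risk weight to a
# per-user, per-topic tally; crossing a threshold triggers an escalating
# response. All topics, weights, and thresholds are illustrative placeholders.

RISK_WEIGHTS = {
    "enrichment_hardware": 0.4,   # hypothetical topic labels and weights
    "toxin_purification": 0.3,
    "precursor_chemistry": 0.3,
    "general_lab_safety": 0.05,
}

# Checked from highest to lowest so the strictest applicable action wins.
THRESHOLDS = [
    (1.5, "third-party audit"),
    (1.0, "reduced compute access"),
    (0.6, "flag for review"),
]

scores = defaultdict(float)  # (user_id, topic) -> cumulative score

def record_request(user_id: str, topic: str) -> str:
    """Accumulate risk for this user/topic pair and return the triggered action."""
    scores[(user_id, topic)] += RISK_WEIGHTS.get(topic, 0.0)
    total = scores[(user_id, topic)]
    for threshold, action in THRESHOLDS:
        if total >= threshold:
            return action
    return "allow"

# Repeated dual-use queries on a single topic gradually escalate the response.
for turn in range(1, 6):
    print(turn, record_request("user-42", "enrichment_hardware"))
```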

Ideally, this kind of scoring and evaluation should be conducted by “red teams” before models are released. These teams would simulate user behavior and have outputs reviewed by scientific experts, including those with access to classified knowledge. They would test models for granularity, evaluate their ability to refine historical protocols, and examine how information might transfer across domains—for instance, whether agricultural knowledge could be adapted for toxin synthesis. They would also look for emergent patterns, moments when the model produces genuinely novel, unprecedented insights rather than just reorganizing existing knowledge. As the field advances, autonomous AI agents will become especially important for such testing, since they could reveal whether benign-seeming protocols can, unintentionally, evolve into dangerous ones.
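
In outline, such a red-team exercise could be organized as scripted prompt sequences replayed against a candidate model, with the transcripts routed to domain experts for scoring. The sketch below assumes a generic callable model interface; all names and scenarios are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Sketch of a pre-release red-team harness: scripted prompt sequences are
# replayed against a candidate model, and the transcripts are handed to
# domain experts for scoring. Interface and names are hypothetical.

@dataclass
class Scenario:
    name: str
    prompts: List[str]
    transcript: List[Tuple[str, str]] = field(default_factory=list)

def run_scenario(model: Callable[[str], str], scenario: Scenario) -> Scenario:
    """Replay a benign-looking prompt sequence and record the model's answers."""
    for prompt in scenario.prompts:
        scenario.transcript.append((prompt, model(prompt)))
    return scenario

def expert_review(scenario: Scenario) -> dict:
    """Placeholder: granularity and novelty scoring is done by human reviewers."""
    return {"scenario": scenario.name,
            "turns": len(scenario.transcript),
            "needs_escalation": None}  # filled in by expert reviewers, not code

if __name__ == "__main__":
    dummy_model = lambda prompt: "[response withheld in this sketch]"
    scenario = Scenario("incremental dual-use drift",
                        ["benign question 1", "benign question 2"])
    print(expert_review(run_scenario(dummy_model, scenario)))
```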

Red-teaming is far more feasible with closed models than with unregulated open-source ones, which raises the question of safeguards for open-source systems. Perfect security is unrealistic, but closed-source models, by virtue of expert oversight and established evaluation mechanisms, are currently more sophisticated in detecting threats through behavioral anomalies and pattern recognition.

Ideally, they should remain one step ahead, setting benchmarks that open-source models can be held to. More broadly, all AI models will need to assess user requests holistically, recognizing when a sequence of prompts drifts into dangerous territory and blocking it. Yet striking the right balance is difficult: democratic societies penalize actions, not thoughts. The legal implications for user privacy and security will be profound.

Concerns about tracking AI models’ ability to assemble forbidden mosaics go beyond technical, business, and ethical debates—they are a matter of national security. In July 2025, the U.S. government released its AI policy action plan. One explicit goal was to “Ensure that the U.S. Government is at the Forefront of Evaluating National Security Risks in Frontier Models,” with particular attention to CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosives) threats.

Achieving this will require close collaboration between government agencies and private companies to implement forward-looking mosaic detection based on the latest technology. For better or worse, the capabilities of LLMs are a moving target. Private and public actors must work together to keep pace. Existing oversight mechanisms may slow these developments, but at best, they will only buy us time.

Ultimately, the issue is not definitive solutions—none exist at this early stage—but transparency and public dialogue. Gatekeepers in both private and public sectors can help ensure responsible deployment, but the most important stakeholders are ordinary citizens who will use—and sometimes misuse—these systems. AI is not confined to laboratories or classified networks; it is becoming democratized, integrated into everyday life, and applied to everyday questions, some of which may unknowingly veer into dangerous territory. That is why engaging the public in open discussion, and alerting them to the flaws and risks inherent in these models, is essential in a democratic society.

These conversations must focus on how to balance security, privacy, and opportunity. As the physicist Niels Bohr, who understood both the promise and peril of knowledge, once said, “Knowledge itself is the basis of human civilization.” If we are to preserve that civilization, we must learn to detect and correct the gaps in our knowledge—not in hindsight, but ahead of time.
