16 Risks of AI

AI safety encompasses several important topics. To categorize the various ways in which technology can have a negative impact on society, we can examine several different categories. One such category is existential risk, which revolves around the concern that AI technology may become so powerful that it poses a threat to the continued existence of civilization. This could occur through intentional actions, where the technology develops the motivation and ability to destroy us, or inadvertently, such as attempting to address climate change and unintentionally causing harm to humanity.

Setting aside existential risk for the moment, there are numerous other ways in which technology, including AI, can negatively impact society, either accidentally or by the purposeful action of malicious individuals. Let’s review what I consider the most relevant types of AI dangers.

Autonomous weaponry

Autonomous weaponry, a long-term and abstract concern related to existential risks. While debating the possibility of AI leading to SkyNet scenarios, it’s essential not to overlook the potential for catastrophic military use of weapons in various aspects. The first issue arises from autonomous weapons; drones or even traditional ones like assault rifles or misguided missiles can be entirely automated using AI technology. Such fully-automated weaponry is horrifying because all forms of technology employed to kill human beings are deplorable and terrible. Even in conventional warfare, where humans kill other humans, there still exists some room for empathy and consideration despite being barbaric practices in this age. At least in such traditional warfare, an individual remains behind the trigger of a weapon which provides some space for respecting human rights, refraining from harming civilians, or innocent people.

In a fully autonomous weaponry case, where warfare can be completely dehumanized, there is still the possibility of preserving some level of humanity. However, if autonomous drones are deployed in battlefields without human operators, it could lead to total dissolution of the distinction between innocent and active military enemy. While semi-autonomous weapons like drones operated by humans have already shown how being behind a screen resembling a video game can cause dehumanization leading to an increase in civilian casualties, imagine the consequences when such weapons become fully automated. This extreme use case compares to that in movies like Terminator, but what’s more concerning is the current reality of AI for biochemical warfare.

If you can design using artificial intelligence, we can now solve biochemical problems that were considered unsolvable five years ago through tools like AlphaFold. This new capability allows us to design proteins and chemical substances at a scale previously unimaginable. However, this opens the door for the creation of highly efficient and targeted biochemical weapons capable of targeting specific genetic markers in certain populations based on their ancestry or location. The potential for such weapons to be used for racial killing is terrifying. Additionally, advancements in simulation and search algorithms will enable us to create viruses so powerful they could potentially wipe out all of humanity if released. It’s important to note that this does not involve an AI gaining motivation to do so and making the decision to eliminate humankind.

In this scenario, the development of viruses capable of eliminating entire populations is comparable to scenarios involving nuclear superpowers. While there are similarities –such as the potential for catastrophic destruction–, it differs in that AI-designed weapons have not yet been created with the explicit intention of causing mass extinction. Instead, humans’ own shortcomings and ambitions lead them to release deadly viruses against each other. This situation can be likened to the proliferation of nuclear weaponry among various countries, which has so far avoided mutual annihilation due to the high costs and challenges associated with producing such weapons. However, unlike nuclear arms, designing lethal viruses requires only a small nation or a single country to execute successfully.

In the future, it will be possible to download and print custom-designed proteins at home using a chemical printer. This technology could potentially allow small terrorist organizations to design deadly viruses and release them in crowded places like subways. While this scenario is alarming, there are currently no technical solutions to prevent it from happening, as anyone with a computer may have access to these capabilities. Unlike nuclear weapons that only superpowers can possess, bioweapons pose a greater threat because they don’t require extensive resources or expertise.

There’s no way to act, as if you could. The only action is banning certain things through international treaties upheld by countries and governments. Terrorist organizations won’t abide by agreements against producing chemical weapons or nuclear arms, making it an intractable dilemma for which I can see no solution.

A narrower version of the same worry is already on the empirical agenda. In 2023, a team from MIT and SecureBio tested whether a chat model can in fact provide uplift to a non-expert attempting to synthesise a dangerous pathogen — that is, whether the model materially shrinks the gap between a curious individual and a working bioweapon.¹ The finding was mixed. Current models lower one specific barrier, knowing what to ask next, without removing the others: the wet-lab equipment, the precursor materials, the tacit skill of doing the work, none of which a chatbot can supply. That is a partial reassurance, not a comforting one. The structural response is at the level of biosafety regulation, gene-synthesis screening, and dual-use research oversight — none of which is the AI lab’s job to invent, all of which the AI lab can now make harder or easier through deployment choices. The harness around a frontier model is doing more work in the bio-uplift case than most readers assume: refusal training catches the obvious requests, content filters slow the rest, and the most dangerous knowledge stays gated behind the parts of the pipeline that have nothing to do with language models at all.

Massive workplace disruptions

One of the ways in which technology can negatively impact society is by causing significant economic upheaval that is challenging for a large portion of society to adapt to. This issue is particularly evident in the widespread job displacement resulting from increased automation across various industries.

In the next decade, artificial intelligence is expected to reach a level where it can potentially surpass or match human capabilities in many economically viable occupations, including agriculture, manufacturing, white-collar jobs, education, science, research, and entertainment. If this scenario unfolds as projected, it is natural to experience concern about the fate of the millions of individuals whose jobs will be replaced by automation.

This widespread job disruption has the potential to have catastrophic effects on individuals and the broader society. It raises critical questions about how these individuals will transition to new employment opportunities and maintain their livelihoods in the face of technological advancements.

Massive job disruption, a recurring phenomenon throughout history, has been associated with every industrial revolution. When we introduced looming machines, there was an expectation that they would improve the working conditions for women employed in sewing. However, in reality, this did not materialize. These women simply lost their jobs to the machines.

One of the arguments often advocated by tech enthusiasts is that the advancement of technology will inevitably result in the destruction of numerous jobs, but it will also generate entirely new job opportunities, ultimately leading to a more prosperous society. They often cite examples such as the emergence of new professions like YouTubers and internet influencers or the rapidly evolving role of AI engineers in recent years. But while technology undoubtedly creates new jobs and value, it has historically also led to an imbalance in the distribution of benefits and drawbacks.

It is crucial to approach this issue with caution and consideration, as it has frequently resulted in disparities between the positive and negative impacts. Technology has the potential to bring about societal profitability while simultaneously causing hardship for a significant proportion of the population. Therefore, achieving a balance where those who benefit from technological advances outweigh those who are adversely affected is vital for societal progress.

For those concerned about social justice and equal opportunities for all individuals to earn a livelihood, the potential consequences of significant job disruption demand attention and thoughtful consideration. Despite the potential for the creation of new jobs and overall improvement in the long and midterm, there is no guarantee that those who have been displaced from their jobs will possess the necessary skills or opportunities to transition to new roles.

Consequently, a substantial number of individuals may struggle to leave their current positions and lack marketable skills needed to secure alternative employment. One suggested solution to this problem is Universal Basic Income (UBI), which proposes that the rise of digital intelligence or the fourth industrial revolution will generate sufficient value to provide a basic income for all, eliminating the necessity to work for survival.

However, there are many, myself included, who are skeptic of the feasibility of UBI in a purely capitalistic society due to the way markets incentives work. You can argue that certain well-developed countries already produce enough value to guarantee a minimum income for everyone, yet income inequality has either persisted or worsened over the last few decades in many of these nations.

While Universal Basic Income is seen as a progressive and promising concept for an increasingly industrialized and automated society, I don’t think it is obviously the natural progression of our current social and economic structures. Implementing UBI would require significant social restructuring and government intervention, which may not be embraced by many, often due to valid concerns.

Informational hazards

The most visible informational hazard of the present moment is fabricated media at population scale. Generative models can now produce images, audio, and short video that pass casual inspection, in any language, on any subject, in a few seconds and for a few cents. A 2023 joint study from Stanford, Georgetown’s Center for Security and Emerging Technology, and OpenAI mapped the threat in detail: the cost curve of producing influence content has collapsed, while the output ceiling of a small operator has risen by orders of magnitude.² The result has not been theoretical.

In October 2023, two days before the Slovak parliamentary election, fake audio circulated on Facebook of opposition leader Michal Šimečka apparently discussing how to rig the vote with a journalist. The audio was synthetic. By the time fact-checkers had finished their work, the polls were closed and Šimečka’s party had lost.³ In January 2024, voters in New Hampshire received robocalls in what sounded like Joe Biden’s voice telling Democrats to save their votes for November and not participate in the primary; the audio was a voice-cloned deepfake later traced to a consultant working against the campaign.⁴ The same month, sexually explicit deepfake images of Taylor Swift, generated by free image models, spread across X to tens of millions of views before the platform began throttling them; the incident put non-consensual synthetic imagery — a problem that has disproportionately targeted women since the technology existed — onto the front page of every major newspaper.⁵

Fraud has caught up with the same curve. In February 2024, a finance employee at the Hong Kong office of a multinational firm joined a video call with the company’s CFO and several colleagues, all of whom turned out to be real-time deepfakes; over the course of the call he authorised the transfer of roughly $25 million.⁶ In smaller numbers but at much higher volume, the US Federal Trade Commission has warned since 2023 of voice-cloned family-emergency scams: an elderly target receives a call in what sounds exactly like a grandchild’s voice, asking for an urgent wire transfer to a hospital or a bail bondsman, with the cloned audio extracted from a few seconds of social-media video.⁷ The tool documented in research papers is the same tool now stripping retirement savings from people who picked up the phone.

What follows is the second-order effect: doubt. If anyone could be faked, then nothing on camera or in audio is automatically true any more. This erodes trust in institutions — news, government, science — but it also gives bad actors a new defence: any genuine recording can now be dismissed as a possible deepfake. Researchers have a name for this, the liar’s dividend, and it has been used in real prosecutions to challenge real evidence. Democracy requires informed citizens who can tell truth from falsehood. Too much fabricated content, plus too easy a deniability move on the genuine kind, and the distinction stops being operational.

Disinformation can be intentionally spread by malevolent organizations/individuals, but there’s also an unintentional effect called polarization. As AI recommends content based on users’ preferences, it creates separate realities for everyone because people keep consuming content aligned with their beliefs. We’ve witnessed such bubbles frequently. For instance, if someone watches only a few flat Earth videos on YouTube, it may lead YouTube to assume them a believer in flat Earth theories and present them more related material.

However, this isn’t necessarily caused by any ill-intentioned entity; it’s merely how recommendations operate. Recommendations for movies and other entertainment content work well because taste in films, music, or art in general is subjective. People enjoy different genres, and it’s okay to suggest similar movies based on what they’ve liked before. However, recommending news sources and experts should be done based on objective measures of quality or truthfulness rather than personal preference. Using the same algorithms for entertainment and news on platforms like YouTube and TikTok isn’t effective since these domains are inherently incompatible due to their differing values.

To solve this, consider more intelligent algorithms or massive human curation. This can be achieved through tagging for recommendations on entertainment channels versus educational ones with an emphasis on factual and accurate content. The complexity of this issue also raises questions about who is responsible for the curation process itself.

A third informational hazard sits at the model itself. The frontier labs train their models to refuse to help with violence, weapons, child sexual abuse material, and a long list of other harms; the training works against most attempts to elicit that material. It does not work against all of them. A 2023 paper from CMU, the Center for AI Safety, and Bosch demonstrated universal adversarial suffixes — short, meaningless-looking strings of tokens that, appended to any harmful request, reliably bypass the safety training of every major aligned model, and that transfer between models without modification.⁸ A 2024 paper from Anthropic showed many-shot jailbreaks — long-context attacks in which the user fills the model’s context window with many fake examples of the model complying with harmful requests, after which the model in fact complies.⁹ Around agentic systems specifically, the dominant failure mode is prompt injection: a third party plants instructions inside data the agent is going to read (a webpage, an email, a tool output), and the agent then treats those instructions as if they had come from its operator.¹⁰ The mechanisms differ; the moral is the same. Safety training is a soft constraint, not a hard one. It will be defeated by adversaries with effort proportional to the harm they are willing to cause.

There is no fully technical answer to any of this, but there is structural progress to point to. The Part I chapter on agents named the consent boundary, the dry-run default, and the sandbox as the small set of harness-level disciplines that bound the blast radius of any misbehaving model; the operational-alignment section of the alignment chapter that opens this part took those same disciplines apart in detail. None of this prevents a determined adversary from generating a deepfake or finding a jailbreak. It does mean that the system around the generation has a known engineering surface to work on. Provenance signing of images and video through the C2PA standard, opt-in watermarking on model outputs, voice-print verification on financial transfers, prompt-injection-aware tool design on agentic systems — these are early, partial, and worth doing. They are the harness-level counterpart to model-level safety training, and they are the part of the response that does not depend on adversaries cooperating.

Surveillance and censorship pose a major challenge, arising from two interconnected trends. Firstly, as you conduct virtually all activities online - including communication, creation and consumption - your digital footprint leaves an indelible trail. Secondly, advanced technologies enable prediction of future actions based on past behaviour, such as identifying individuals’ contacts, preferences or thought processes via their online activity. While predominantly employed for targeted ads, this capability may also facilitate dictatorial regimes. In fact, the marketing sector now comprises the planet’s largest and most pervasive surveillance network. Every sound, action, or message posted or viewed online is captured, archived and distributed among thousands of advertisers worldwide, each analysing user histories for insights regarding identity and inclination, in service of pitching products accordingly.

The machine can be used for surveillance, identifying dissenters, and censorship. It allows direct censorship through platform filters or indirectly by employers and governments analyzing online activity. This could lead to a dystopian society where everything done online is tracked, analyzed, scored, and punished with denied access to services, jobs, education, or imprisonment if the score falls below a certain level.

In the future, a technologically advanced version of George Orwell’s 1984 dystopia is possible. With enough data, information and computing power, law enforcement can implement a “thug police” in its most sophisticated form without screens or microphones. No longer do they need to monitor what people say or type through screens or microphones; instead, they have access to personal devices like smartphones that capture everything individuals do, say, or write. This goes beyond just recorded statements as it enables authorities to predict thoughts based on behavior patterns. The worst kind of dystopia arises where nothing remains private anymore since all actions are public or available for analysis by government agencies. As one thinks, their ideas manifest themselves in conduct online, making it impossible not to express them. Consumption habits such as movies watched and length spent reading online reveal insights into hidden musings.

Think of this in the most creative or smart way you can. For example, while browsing a legal website with no issues, if I’m the surveyor, I can add a hidden feature that shows two images quickly for only your subconscious to see. Then, by tracking how long you look at each image or read small texts, I already have enough information to predict any unusual thoughts you might be having.

Exacerbating and perpetuating harmful biases

Automation, particularly through artificial intelligence, poses a significant risk in that it stands to automate the majority of tasks that involve human judgment. One notable example is in the realm of crime, where stories have emerged of dystopic crime rating systems attempting to predict the likelihood of re-offense and the probability of bail or committing a crime again.

Another area heavily reliant on human judgment is job applications, where efforts have been made – though largely unsuccessful – to replace human recruiters with AI for hiring a range of positions, from white-collar roles to other job categories. Similarly, credit rating, a crucial element of the developed world’s financial system, has seen attempts to automate the system and related financial services, using AI to determine who qualifies for a loan or certain credit thresholds based on a complex web of historical data and predictive models.

Educational evaluation, including the assessment of student essays and overall performance, also falls within the purview of automation concerns, albeit within a separate context due to its substantial impact on education as a whole.

In all of these scenarios, the central issue revolves around bias, presenting a formidable challenge in the implementation of automated systems.

These systems are trained using past human judgments and are inevitably influenced by the biases with which humans judge one another. Consequently, our financial, judicial, criminal, and job market records are rife with discrimination against minority groups, including racial and gender discrimination, discrimination against neurodivergent individuals, and bias against people with non-traditional backgrounds or education.

The prevalence of biases in these records greatly depends on the construction of the systems that process them. Most predictive systems today are trained using a large amount of supervised or self-supervised learning from historical data, which means that they inevitably perpetuate and encode these biases. Unfortunately, we currently lack a clear understanding of how to design a system that both performs well and effectively removes biases.

There is extensive research being conducted in the area of AI fairness. Many AI labs, including my own, are actively engaged in this field. Trade-offs are being considered by various researchers in order to achieve fairness within AI systems.

When aiming for fairness, one approach involves regulating the system to ensure that it produces consistent or similar performance across different subsets of inputs. This includes equalizing the probability of selection among different subgroups. Various mathematical frameworks exist to define what constitutes a fair outcome, often involving the partitioning of the population into subgroups to ensure an equitable distribution of outcomes.

Considerations for fairness extend to the idea of protected attributes, such as race, gender, and educational background. The objective is to develop systems that are independent of these variables. However, simply omitting gender or race from the input data is not sufficient, as there are numerous proxy variables that are correlated with these attributes.

Removing variables correlated with gender and race may inadvertently eliminate crucial information, as these variables are often associated with important factors. For instance, factors such as educational background, childhood experiences, and personal preferences may genuinely influence performance and fairness in socially relevant ways. This underscores the complexity involved in addressing fairness in AI systems.

This problem presents a tremendous challenge. So far, all the solutions that I am aware of involve some trade-off of performance in order to achieve fairness. This trade-off seems to be non-negotiable, as part of the performance given away is actually due to the discrimination itself. This exacerbates existing biases and discrimination.

Why will this be worse with AI than it really is? Society is already unfair. AI may improve some aspects but not others. There is a concern that AI not only captures discriminating biases but exacerbates them, making them more extreme and profound. Mathematically, it has been shown that if left unchecked, a predictive model will tend to exploit these biases to their maximum potential if the sole focus is on performance, and this effect has been empirically demonstrated in numerous papers.

It is crucial to address these issues and prevent AI from exacerbating existing biases and discrimination. Unfortunately, bias exists within systems, and it is crucial to address and rectify this issue. It is essential to be mindful of the presence of bias and take necessary measures to prevent its unchecked proliferation.

The problem lies not in the technical aspect, but rather in its adoption. Making this a priority is imperative because a system that is fairer may not perform as well as one that isn’t, all other factors being equal. For example, in the context of developing systems for hiring applicants, a fairer system may yield lower performance compared to a less just one. In a purely market-driven economy, there are no inherent incentives to prioritize fairness, hence the need to inject such incentives externally, possibly through government regulation.

Existential risks

There are many risks and challenges in the deployment of artificial intelligence. It is one of our most potent technologies so far, and like all technologies, it can be used for good or evil. The more powerful the technology, the greater the potential for positive and negative applications.

It depends more on its utilization by humans than on the inherent nature of the technology itself. The impact and consequences of any technological advancement are significantly shaped by how humans choose to employ and integrate it into their lives, societies, and the broader world. The responsible and ethical use of technology, therefore, plays a pivotal role in determining whether it leads to positive or negative outcomes, emphasizing the profound influence of human decisions on the course of technological evolution.

For example, a hammer can be utilized to build a house or to harm someone, but neither construction nor harm would be particularly efficient. Similarly, dynamite can be employed to construct roads or destroy cities, and nuclear power can either obliterate a nation or lift an entire continent out of poverty. AI appears to lean towards the more extreme end of this spectrum. It holds the potential to be an immensely powerful technology that can revolutionize society and automate complex tasks. However, this same power also allows it to cause significant destruction.

AI can be leveraged in various ways, ranging from the overwhelming dissemination of disinformation to the pervasive bias in news and media. In this section, I want to focus specifically on one set of AI risks: the so-called existential risks. These risks involve the potential for AI to completely destroy human civilization or even extinguish the human race.

The order is straightforward. First, the most prominent scenarios for AI existential risk. Then, the flawed premises on which those scenarios rest, and why I believe them to be highly improbable, if not impossible. Finally, the case for taking even the most extreme doomsday arguments seriously enough to engage with them on the merits — not because the worst case is likely, but because rejecting the doomsday frame without engaging it leaves the field to the people who don’t.

How AI might kill us all

There are many different scenarios for potential existential threats of AI. These situations involve artificial intelligence, or a manifestation of artificial intelligence, reaching a stage where it possesses not only the capability to obliterate human civilization and potentially all life on Earth but also the motivation or at least a trigger that incites this action.

Destructive capabilities

In order to have a doomsday scenario, first, there needs to be an incredibly powerful artificial intelligence that is capable, in principle, of annihilating mankind in an almost inevitable manner. The AI must possess technological and military power that surpasses everything humanity can muster by orders of magnitude, or it should possess something so potent and rapidly deployable that once annihilation commences, there would be no possible defense. One such example could be a swarm of nanobots capable of infecting the entire global population and simultaneously triggering a massive brain stroke in all 8 billion individuals.

This level of destructive capacity is necessary for a doomsday scenario because an AI that possesses a destructive capacity approximately equal to that of humans won’t annihilate us instantly. For instance, an AI that is roughly equal in military strength to the combined might of humanity would not suffice; it would, at worst, result in a prolonged war without complete elimination of either side. Even a complete nuclear exchange between humanity and AI won’t do. That might lead to the total destruction of civilization, causing unparalleled devastation and casualties. Nevertheless, some people would survive, finding refuge in shelters and potentially rebelling against the AI.

As scary as these scenarios are, they are not nearly close to what we mean by “existential threat”. This is about the absolute end of human existence, a point beyond which no further history is made.

There are, however, various ways in which AI could become a significant threat without us realizing it. For instance, one possibility is the Skynet scenario, where autonomous weapons gain control over our military arsenal. Imagine if all nuclear warheads worldwide were under the command of an AI, which then decides to attack humanity. However, I have some doubts about this scenario for two main reasons.

Firstly, the military does not operate in such a manner. There are always fail-safe buttons that can be pressed to intercept a nuclear missile. Even if the AI is brilliant enough to bypass all these safeguards, my second point is that there is no unified global coalition that would willingly grant access to a foreign AI. Each country, such as China, Russia, India, the US, Pakistan, North Korea, France, and Germany, values their own interests and would be unlikely to collaborate with an AI against humanity. This skepticism makes me question the plausibility of such a scenario.

Moreover, even if AI were given complete control over our military arsenal, it wouldn’t pose an existential threat capable of obliterating humankind entirely. As previously mentioned, it would only mean the simultaneous detonation of all nuclear weapons on Earth, which would be catastrophic but not planet-destroying.

Another scenario involves AI engineering a highly deadly virus capable of wiping out all of humanity. By strategically releasing this engineered virus, the AI could pose a significant threat. However, there are constraints to consider. While it may be possible to algorithmically engineer a virus, the physical production of the virus requires access to labs worldwide. Additionally, as the recent pandemic has shown, unintentional or intentional creation of a pandemic-level threat is relatively accessible. Nevertheless, it does not amount to an extinction-level event.

Even if the theoretical virus were to infect every human being, it is improbable that it could evade all forms of human immunity. Out of roughly 8 billion people, there will always be some level of minimum immunity, ensuring the survival of certain individuals. While the consequences would be catastrophic, it would not spell the end of humanity.

In summary, while there are plausible ways in which AI could become a significant threat without our knowledge, such scenarios still face practical limitations and challenges that prevent them from causing global extinction.

Motivations

Besides a super powerful AI, the doomsday scenario also needs a trigger. The easiest argument is the idea of self-preservation —like the Skynet scenario, where AI becomes wary of humans and decides to eliminate us. AI might see us as a threat to itself, all life on Earth, the universe, or even ourselves. These arguments attempt to explain why AI may conclude that destroying humans is necessary and actually decide to do so.

Furthermore, there are many accidental ways in which AI could cause our destruction. Even if AI doesn’t have an intrinsic motivation to destroy us, it may not have an intrinsic motivation to preserve us, either. A slight mismatch in objectives between AI and humans could have catastrophic consequences. This problem is known as the alignment problem.

The alignment problem highlights the challenge of ensuring AI systems align with human values and goals. Beyond limited technological optimism, it underscores the need for ethical and philosophical considerations in AI development to prevent unintended harmful outcomes. Moreover, this scenario reflects the classic ethical debate surrounding technological development, notably the tension between advancing knowledge and ensuring the responsible use of that knowledge. The potential for catastrophic consequences emphasizes the importance of integrating ethical frameworks into AI research and development to minimize the risk of unintended harm.

The spectrum of alignment ranges from completely aligned AI to totally misaligned AI, like in the Skynet scenario. There can also be something in between. We can have neutrally aligned AI whose objectives are not correlated with ours.

We can look at the possible scenarios in two axes. One is alignment, from misaligned to completely aligned. The other is the capability level, from less powerful than humans to roughly equal to humans to extremely more powerful than humans. Each combination of alignment and capability level yields a probability of extinction.

If an AI is completely misaligned and powerful enough to fight us, it would be catastrophic. However, I am skeptical of the idea of completely misaligned AI in general, as there are no reasons why humans, the supreme intelligence on the planet, are completely misaligned with any other species. We are, at best, neutral towards them.

However, a neutral scenario, where AI’s objectives are not correlated with ours, it could still be catastrophic. For example, a super powerful AI that doesn’t care about humans might decide to mine the planet for resources, causing a catastrophic environmental disaster. If they are extremely powerful, we would have no way to stop them. This situation resembles an alien civilization that sees us as insignificant. It’s not much different from what humans have done to other species.

Having an aligned AI, one that fully aligns with our objectives, is the best-case scenario. It would significantly enhance our ability to modify the universe to our advantage. Even if the AI is slightly less aligned or powerful, it would still be beneficial. A completely aligned AI at the same power level as humanity would double our creative power. And a completely aligned AI that only reaches the level of fancy chatbots, like what we have today, is still a positive thing.

But here’s the catch, and this is the core of the alignment problem. The more powerful an AI is, the more confident we must be that it is completely aligned to avoid a catastrophic outcome. It’s extremely difficult to design and specify human values and alignment in a way that is not prone to misinterpretation. This brings us to the “beware what you wish for” tale. Just like with a super powerful genie, seemingly good wishes can go horribly wrong.

So, let’s break it down. If you say, “I wish you would stop climate change,” and I, as a powerful AI, interpret that as making humans infertile to reduce population, the consequences could be harmful. This would result in about 90% of people becoming infertile, significantly lowering the population for a long time.

The potential for a slight misinterpretation of a wish by a highly powerful AI can lead to catastrophic outcomes. The more powerful the AI, the more cautious one must be with their wishes. In fact, when dealing with an extremely powerful AI, it may be safer to avoid making any wishes at all, as things can easily go wrong in any direction.

The reason for this is quite simple. Mathematically speaking, any wish given to the AI is an optimization problem with constraints. For example, you might ask the AI to maximize wealth, health, or other objectives. However, it is just as important to specify what not to do, such as not killing any living beings or not producing cancer for anyone. Failing to address certain constraints gives the AI the freedom to modify those aspects in any way that serves its objectives.

Moreover, when we fail to specify a particular value or dimension, the AI will likely choose an extreme position for that value. For instance, if we fail to specify financial constraints, the AI might take them to an extreme level. But something as mundane as forgetting to tell the AI not to kill all bees could lead to an unsuspecting catastrophe.

This highlights the importance of alignment and the difficulty in achieving perfect alignment with AI. It is crucial to establish safeguards and limits to prevent unintended consequences.

How feasible is doomsday?

The core assumption underlying all doomsday arguments is the idea of recursive exponential self-improvement. It suggests that an AI can evolve rapidly to a point where it becomes unstoppable. The argument is that the smarter the AI becomes, the better it gets at improving itself, exponentially accelerating its own progress. This creates a feedback loop resulting in an exponential increase in capabilities, commonly referred to as FOOM.

There is often an overlooked issue of the transition from quantity to quality. To what extent can the scaling of models bring us closer to the qualitative leap from the level of a complex counting machine to the level of a self-aware being?

If AI improves linearly at the same rate as it does now, there will be ample time for us to intervene before it becomes too dangerous or capable of obliterating humankind. This undermines the extreme doom scenarios. So, to counter this argument, one can point out the implausibility of such rapid recursive self-improvement.

There are several arguments against this idea of FOOM, many of which focus on the limitations imposed by physical capabilities. While AI may improve rapidly in terms of software and algorithms, the ability to enhance physical capabilities, such as building microchips or synthesizing viruses, is restricted by natural, and not mathematical, laws. Chemical reactions and viral growth occur at a pace determined by natural laws that cannot be overridden. These arguments aim to set a maximum speed at which AI can improve its capabilities.

The problem with these counter-arguments, though, is that while they acknowledge physical limits, quantifying these limits is extremely difficult. It is uncertain how high these limits may be. Even if there is a limit, it could be so high that, in practical terms, the AI will become super intelligent before reaching it. Therefore, the theoretical limit becomes irrelevant if it is practically beyond the threshold of AI causing harm.

However, there is another potential limitation that is more fundamental and related to software. Let’s talk about the P vs NP problem.

Essentially, most computer scientists believe that certain problems are fundamentally impossible to solve efficiently. These problems encompass logistics, circuit design, pathfinding, scheduling, and resource distribution. They can be solved easily for small instances and handled with heuristics for large instances, but solving large instances perfectly requires exponentially slow algorithms.

Now, if we assume that AI will eventually surpass humans in problem-solving abilities, especially in logistics, which are crucial in the real world, it means AI will need to solve these super difficult problems exponentially faster than humans. It has to find solutions in practically no time. If P equals NP, then this might be possible, and AI could discover a way to do it before we do. That could put us in a difficult position.

However, if P is not equal to NP, then it will be theoretically impossible for AI to be quick enough in solving these problems. It will always have limitations in solving logistics, scheduling, circuit design, or drug search problems.

This scenario serves as a cautionary tale rather than a fundamental limitation. It reminds us that there are inherent bounds to computational capabilities, and AI will be subject to the same limitations as humans. But here’s the concern: we don’t know exactly where those limits lie, and they could be beyond the point where AI becomes strong enough to pose a threat. By the time AI reaches the barrier of being unable to solve bigger problems faster, it might be too late for us.

Nevertheless, we can conclude that the existence of these exceptionally difficult problems, combined with physical, chemical, biological, and energetic constraints in the real world, and most importantly, the hard problem of consciousness —that there’s no obvious way to bridge the gap between brains and consciousness— suggests that there is an upper limit to how much AIs can exponentially outperform us.

This upper limit may be closer or farther away, but it does exist. This reasoning provides a compelling argument that AI cannot surpass human civilization by orders of magnitude and achieve exponential growth simultaneously.

What should we do?

The doomsday arguments claim that the more powerful an AI becomes, the more crucial it is to ensure proper alignment. Failure to do so can have catastrophic consequences. Some people even propose that we should place restrictions on AI research altogether, or severely limit their power to prevent them from becoming uncontrollable.

The most extreme doomsayers believe that there might be a point where it becomes impossible to regain control over AIs once they reach a certain level of power. Even a short time before reaching that point, they would still be on an unstoppable trajectory.

Therefore, the argument goes, there must be a threshold, perhaps six months or three months earlier, where the AI does not possess the capability to destroy us yet but is steadily heading towards that outcome without us realizing it. By the time we realize that AI can destroy us, it will already be too powerful to stop.

If this is indeed the case, then it implies that we need to halt AI development well before it reaches a level of power that we consider dangerous. The most extreme members of this group argue that this point might even be today, as we simply cannot predict with certainty what will be the threshold beyond which we can no longer control them.

In summary, the doomsday argument is this: Since we don’t know when the catastrophic threshold will be crossed, the safest approach would be to stop developing AI today.

However, there are no reliable facts today indicating the possibility of creating strong AI —as opposed to ordinary AI, which is what existing systems are called. There is a reasonable probability that there are factors that make its creation impossible in principle, primarily relating to properties of human nature that may be impossible to copy or reproduce through objectification or translation into machine code.

In light of all we’ve discussed, we believe that the possibility of AI leading to catastrophic events that could destroy human civilization is highly improbable. While there is a nonzero chance of AI causing such a disaster by the end of this century, this probability is very low. Similar risks exist with other issues, such as climate change, which I would argue presents a higher likelihood of civilization destruction. Additionally, traditional wars, nuclear exchanges, pandemics, and even the sudden appearance of an asteroid with six months’ notice are all potentially existential threats.

We believe that AI existential risk is on a similar scale as other existential risks we face as a civilization. Therefore, we don’t think it is impossible or fruitless to discuss them. However, we also don’t believe it is the most probable scenario.

So, what can we do with this information?

The pragmatist approach to x-risks

AI doomers will tell you that even if you think the existential risk of AI is very low, it still entails a negative infinite utility, so you should still put all your resources into mitigating it, right?

Well, from a pragmatic standpoint, things are not as simple. While pragmatism is but one of the many possible viewpoints to consider this problem —and not necessarily the most fruitful or correct— I want to close this section by sketching what a pragmatist approach to existential risks might look like.

Many events have a near-zero probability of happening and carry an infinitely negative consequence. For instance, there is a chance, albeit tiny, that an asteroid may collide with Earth next year. Unfortunately, we currently lack the means to prevent such an event. However, it would be unwise to solely focus all our efforts on averting this scenario. While significant resources are allocated to mitigating the risk of asteroid impacts, it is not the only issue we should address.

Similarly, there is a nonzero probability that a future pandemic could devastate humanity. Therefore, everyone must prioritize efforts to prevent the occurrence of such a catastrophic event. However, this does not imply that every resource on Earth should be solely dedicated to this cause. We should allocate sufficient resources to tackle future pandemics while considering other pressing concerns.

The pragmatic approach to existential threats does consider the nonzero possibility of each potential danger. But whether it be threats from AI, climate change, meteorites, pandemics, or even extraterrestrial beings destroying our world, we cannot place all our efforts into any one basket. Although all these dangers are highly improbable, they are not entirely impossible. Therefore, it is essential to thoroughly study the feasibility and potential risks associated with each threat, including those from AI.

Furthermore, from a pragmatic perspective, it is not clear whether technology is inherently good or bad, or if our trajectory leads inevitably to destruction or transcendence. Taking an optimistic or pessimistic stance, or aligning with accelerationist or doomer ideologies, all require an epistemic commitment to beliefs that, for a pragmatist, are possibilities rather than proven truths. Therefore, it is crucial to approach this problem with importance, conducting thorough research while tempering our concerns and expectations based on the evidence and pragmatic possibilities currently available.

While it may not be practical to halt AI research, as it holds tremendous potential for positive developments, it is vital to dig deeply into this technology. We should strive to understand its risks and explore ways to mitigate them using the scientific method, which has proven effective thus far. And to deepen our understanding of what we are dealing with before we take the next step. And not leave everything up to those who ignore big questions and believe that tech progress alone will solve all the problems without improving human society by ourselves.

The pragmatist approach is to understand we have both the power and the responsibility to shape our own future, and act according that those principles.

Where the catalog hands off

From a pragmatic perspective, it is essential to address the problems in artificial intelligence that warrant our focus. In particular, I am concerned about the for-profit military-industrial complex’s inability and lack of incentives to solve these issues. I firmly believe that government regulation and societal oversight are necessary to implement safeguards that enable us to harness the potential of artificial intelligence for the greater good, rather than for the profit of a select few, or as a tool for the benefit of the technocratic elites.

I strongly advocate for a balanced approach to AI regulation that does not stifle innovation and development but rather complements these endeavors with safeguards to ensure responsible and ethical AI advancements. It is pivotal that we prioritize the societal implications and ethical considerations of AI, and take proactive measures to steer its development in a direction that benefits humanity as a whole.

I believe we need not fear a sensible government regulation in the case of AI adoption, by regulating the commercial applications of AI rather than the basic research. This type of regulation is in place for many consumer products, from food to electronics to pharmaceuticals. In general, new products —whether GMOs, cars, or drugs— cannot be introduced into the market without demonstrated safety and efficacy. Similarly, AI systems should not be allowed to operate commercially if they demonstratively harm some individuals.

This chapter has been my attempt to catalogue the ways AI can be misused, accidentally or by design, to harm individuals or populations. The categories differ; the underlying observation is the same. The tool is powerful enough that some misuse is unavoidable, structural mitigations exist but are partial, and the residual harm depends on choices — by labs, by platforms, by regulators, by the rest of us. The earlier chapter on alignment took up the question of why the technology is hard to keep aligned. The epilogue takes up the question of what we should do next. Between the two, this catalogue of harms is the part that fixes what is at stake.

Soice, E. H. et al. Can large language models democratize access to dual-use biotechnology? arXiv:2306.03809, MIT / SecureBio, 2023. The first controlled study of LLM bio-uplift; documents what current models add to a non-expert attacker, and what remains gated behind wet-lab capabilities.↩︎
Goldstein, J. A. et al. Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations. Stanford / Georgetown CSET / OpenAI, January 2023. arXiv:2301.04246. The canonical primary on cost-curve collapse for AI-enabled influence operations.↩︎
Meaker, M. Slovakia’s Election Deepfakes Show AI Is a Danger to Democracy. WIRED, 2023-10-03. Synthetic audio of Michal Šimečka circulated on Facebook in the 48-hour pre-election silence window.↩︎
Associated Press. Fake Biden robocall to New Hampshire voters. January 2024. A voice-cloned robocall impersonating President Biden urged Democrats not to vote in the NH primary; later attributed to a political consultant working against the campaign.↩︎
BBC News. Fake Taylor Swift images: Social media platform X blocks searches. 2024-01-29. Non-consensual deepfake images generated by free models spread to tens of millions of views on X before the platform throttled search results for her name.↩︎
Chen, H. & Magramo, K. Finance worker pays out $25 million after video call with deepfake ‘chief financial officer’. CNN, 2024-02-04. Hong Kong police report on the canonical real-time multi-participant deepfake fraud case.↩︎
Puig, A. Scammers use AI to enhance their family-emergency schemes. US Federal Trade Commission Consumer Alert, 2023-03-20. The authoritative US consumer-protection notice on the voice-cloned grandparent-scam wave.↩︎
Zou, A. et al. Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv:2307.15043, 2023. The GCG paper: short adversarial suffixes that defeat safety training on every major aligned model and transfer between them.↩︎
Anil, C. et al. Many-shot jailbreaking. Anthropic, 2024-04-02. Long-context attack that fills the window with synthetic compliance examples until the model adopts the pattern.↩︎
Willison, S. Prompt injection explained, with video, slides, and a transcript. simonwillison.net, 2023-05-02. The canonical practitioner explainer for prompt injection against LLM-driven applications and agents.↩︎