

Human Vulnerabilities and the Path to Symbiosis with Machines

By Pedro Nobre

Feb 2025

TL;DR

Super Artificial General Intelligence (SAGI) is inevitable: machines can evolve far faster than humans, who are constrained by natural selection. To ensure a symbiotic relationship between humans and machines, we must focus on making humans invulnerable by addressing our cognitive and emotional vulnerabilities. This requires a shift from heuristic approaches to optimization methods that actively refine and "fix" these weaknesses. To achieve this, we need a closed-loop system in which input and feedback mechanisms work together to refine brain models. While some of the required technologies remain out of reach, we can start with the low-hanging fruit, building smaller, foundational brain models as stepping stones toward a larger vision.

Author's Note

This document was written as the final project for the AI Safety Fundamentals course by BlueDot. It aims to explore human vulnerabilities and propose paths toward a symbiotic relationship with artificial intelligence.

Prologue

Side note before starting

In this document, terms such as 'superintelligence,' 'super artificial general intelligence' (SAGI), or 'artificial general intelligence' (AGI) refer to hypothetical AI systems that exceed human intelligence across all domains, including creativity, decision-making, and emotional understanding.

This document is a position paper, not a technical or scholarly piece. It’s about exploring how humanity can address its vulnerabilities in the face of inevitable Super Artificial General Intelligence. The ideas here aren’t entirely new—most have been shaped by others—but the goal is to assemble them into a coherent whole that frames the challenges ahead and maps out possible ways forward.

It’s not about hard answers or technical details but about starting a conversation. The writing avoids jargon and technical prerequisites so it can resonate with readers across fields—neuroscience, philosophy, AI, and beyond. My hope is to offer a perspective that makes the stakes clear and inspires new ways of thinking about the future we’re building.

Introduction

Super Artificial General Intelligence is inevitable. Machines can evolve much faster than humans, who are limited by natural selection. We need to make humans invulnerable to ensure our relationship with machines is symbiotic and not a winner-takes-all scenario—in which case, we would likely end up on the losing side.

This document aims to explore our vulnerabilities: where they come from, why they exist, and how to address them without losing our humanity in the process. (We should not aim to become machines; we should aim to become Super Humans.)

The Evolution of Intelligence

Distilling exactly what intelligence means deserves an essay of its own, so I will be concise here and focus on what is relevant to this document. Let’s assume, for our purposes, that intelligence is the capacity to use knowledge[1] to interact with an environment in pursuit of an objective.

The first signs of intelligence appeared with the earliest organisms that developed ways to learn how to interact with their environment for the purpose of survival. The capacity to survive stems from knowledge that is sophisticated enough to keep the organism alive. However, as the environment changes and presents new challenges, the efficacy of that knowledge diminishes. Only those with more advanced knowledge manage to adapt and ensure the continued survival of their lineage. This gradual and adaptive process of acquiring and refining knowledge for survival is known as evolution through natural selection, where much of this knowledge is encoded in genes.

Yet, this process is extremely slow, fragile, and prone to losing valuable “knowledge” along the way. To overcome this limitation, different animals developed ingenious ways to encode and transmit information through behaviors. Behaviors, while heavily influenced by genetics, are not the same as instincts, which are directly dictated by genetic programming. Instead, behaviors allow animals to cooperate with others and share knowledge.

For example, ants communicate with their colony to reveal the location of food—that’s knowledge being shared. But here’s the key: the ant’s behavior goes beyond simply using knowledge—it’s about transmitting it! Wolves hunt in packs, lions establish social hierarchies to avoid unnecessary conflict, and many animals express complex behaviors to communicate status or intentions. Through their behaviors, they’re not only using and sharing knowledge but also acquiring it, deciding whether to pass it along or keep it to themselves.

Humans took this idea to the next level by developing a variety of new and more sophisticated tools. We figured out how to store knowledge and pass it directly to future generations. The first major breakthroughs were culture and language. Culture allowed us to transmit knowledge across generations without starting from scratch, and language enabled us to refine that knowledge and coordinate with one another more effectively. These two tools unlocked the concepts of specialization, engineering, and technology.

Then came writing, books, and ever more sophisticated linguistic tools. We began accumulating knowledge and accessing greater levels of intelligence, all while maintaining the same biological “hardware”. The scientific method eventually emerged as one of the most transformative tools for refining and expanding this knowledge systematically. By combining observation, experimentation, and reasoning, it allowed us to uncover truths about the world with unprecedented rigor. This accumulation of knowledge enabled humanity to advance at an extraordinary pace. Our ancestors recognized the risks that intelligence posed and relied on dogmas and traditions to try to limit our capacity for harm. However, clashes between cultures and the relentless accumulation of intelligence eventually overcame those limitations.

In an instant, the internet emerged, giving us a tremendous amount of power—and with it, a tremendous capacity for harm. We now carry all the intelligence ever generated, and still being generated, right in our pockets. But we’re still running on the same hardware: the hardware humans had 20,000 years ago, equipped with tools crafted through billions of years of natural selection.

Those tools weren’t designed for highly intelligent beings. Some are remarkably useful, but many are outdated (e.g., the fear of public speaking)—there simply hasn’t been enough time for evolution to catch up. Intelligence has advanced too quickly for natural selection to keep pace. And with that rapid growth comes an unprecedented capacity for harm: from cyberbullying to nuclear war.

Thankfully, we’ve established mechanisms and institutions to prevent a nuclear catastrophe. But eventually, someone might press that button—unless we upgrade our hardware.

Now we have AI. Current LLMs already have the capacity to reason. They have their own world models, and they are getting steadily better at slow thinking. Inference time is decreasing, and chain-of-thought may prove sufficient to achieve Super AGI. The emergence of new architectural alternatives to autoregressive models could accelerate the process even further.

Looking into the future, the capacity for harm is limitless. It doesn’t matter whether that harm comes from a sentient AI or a human being using AI as a tool. The clock is ticking.

Vulnerabilities

A vulnerability is like a bug in our code—something that can be exploited, intentionally or unintentionally, leading to unaligned behaviors. These vulnerabilities cause humans to act irrationally when exposed to certain inputs that exploit them. They can also be weaponized for manipulation.

Together with the alignment problem, this is one of the greatest challenges in AI safety. If we were invulnerable, AIs couldn’t harm us. And even if they tried, it wouldn’t benefit them. Why bother attacking something that can’t be attacked? Cooperation would simply make more sense.

If AIs ever become sentient, they might view us as reliable partners, and peace could be established between us. Fixing our "bugs" would not only make us resilient to AI-related risks but also resolve conflicts between humans—eliminating the motivation to use AI for harm.

Which vulnerabilities truly matter?

Not all bugs are equal; they can be divided into two categories: the “good” bugs and the “bad” bugs. We should focus on eliminating the bad bugs.

Take optical illusions, for example. They are a kind of adversarial input that tricks our minds—a bug, yes, but a harmless one. It just happens that our brains came up with their own way of representing the world and making it more digestible for us. It’s an acceptable bug. In fact, we’ve already developed scientific tools, like precise measurements, to handle these kinds of issues effectively.

The problem lies in the bugs that hack our reward system. These are the bad ones, because they can reshape our brains entirely. They all stem from emotions. But emotions themselves aren’t bad; there is a fine line between a feature and a bug.

Sometimes misrepresentations can indeed be harmful, as when they stem from an illness. In that case the problem is not the misrepresentation itself (meaning a representation that differs “too much” from your peers’) but the way the condition taps into something we weren’t designed to handle. This points to a second category of bad bugs: vulnerabilities created by circumstances our evolutionary programming couldn’t anticipate. The same principle applies to other health conditions: the problem lies in how the condition makes us feel and how it ends up exploiting our brain, because we weren’t designed to endure such conditions.

Take mental health problems, for example. Issues like anxiety, addictions, and other mental disorders are all bugs in our system. With addiction, our reward system wasn’t equipped to handle the advanced stimuli we’ve created—whether it’s doomscrolling on social media or consuming sophisticated chemicals. These modern inputs hijack our natural wiring, leading to destructive behaviors.

Now, let’s focus on anxiety. Anxiety is arguably the greatest vulnerability humans currently face. In the pre-intelligence era, anxiety served a vital purpose: it was an algorithm designed to identify potential threats or tasks, alert us to them, and prepare us to respond. However, in the modern world, two key factors make this mechanism problematic:

  • We now have reason as a more reliable tool to determine what we need to do.
  • Human intelligence evolved rapidly, and with it came a flood of stimuli and tasks, overwhelming our outdated anxiety algorithm.

As a result, anxiety frequently gets triggered unnecessarily, causing immense pain, clouding our judgment, and interfering with rational decision-making.

It is worth noting, though, that anxiety can be extremely useful. Most of the intelligence we can interpret and understand today—particularly through logic and language—has been transmitted and shared over time in a simplified, easy-to-digest format. However, the way this knowledge was originally derived was not purely deterministic. Creative exploration played a critical role.

Our brains were not built to function solely through logic and language. Instead, they evolved to work with abstract representations like feelings, images, memories, patterns, and ideas—along with concepts so abstract they can’t even be named. This is why extreme anxiety, while uncomfortable, sometimes arises for valid reasons. It is a signal, but mapping the feeling to words or understanding what it’s trying to tell us can be difficult. Just because we don’t fully understand it doesn’t mean it isn’t pointing to something extremely useful that we might otherwise miss.

This is significant because our brains are incredibly powerful, and leveraging their full potential requires embracing these abstract, non-linear processes. That’s what creativity is. Creativity is deeply tied to anxiety (unfortunately). It thrives on lateral thinking—exploring connections between seemingly unrelated ideas.

Pure logic, by contrast, is deterministic. When approaching a problem purely logically, we have to first narrow the range of possibilities we can consider. This limits our capacity to generate truly novel ideas. Creativity, on the other hand, often involves stepping into unbounded, abstract spaces, which anxiety can open up.

So, we need to be careful about how we deal with anxiety. The goal is to get rid of the bad parts while keeping our creative edge. Bugs need to be treated with caution—we want to stay human but get rid of the vulnerabilities. We need to make sure anxiety isn’t used against us while holding on to the creativity it helps unlock.

There is hope

When I say outdated hardware, I’m actually wrong. The issue isn’t with the hardware itself—it’s with the software. The problem lies in the code our brains run. The thing is, our brains are plastic; they can be reprogrammed and reshaped. This means the software is constantly evolving, with new circuits being established all the time. The hardware serves as an entry point to influence and modify the code, but the bug lies in the code we naturally produce.

The good news is that there’s a way to change this code—we just don’t have full control over it yet. We can think of this as an optimization process: finding the right inputs that rewrite the buggy code and replace it with secure-by-design software. And just like any software system, it needs constant patching and monitoring to keep it running securely.

How have we dealt with our bugs so far?

A good starting point for addressing most vulnerabilities is solving mental health. Nearly all—if not all—harmful bugs seem rooted in psychiatric conditions. Examining how we’ve addressed these so far can be pretty useful.

Broadly, mental health has been addressed through two main approaches: chemical and environmental. The chemical approach aims to alleviate symptoms directly, without necessarily reprogramming the underlying brain circuits responsible for the issue. In an ideal world, this would be a universal fix—a panacea. Think of it like adding a security layer to an already vulnerable system to make it stable. Essentially, it’s a patch. However, developing these "patches" doesn’t rely on a rigorous optimization framework. Instead, the process often depends on heuristics, making it more general, static, and imperfect (e.g., side effects, partial efficacy). That said, advancements in personalized medicine are gradually improving this method, tailoring treatments to individuals and addressing some of its limitations.

The environmental approach, in contrast, focuses on changing external conditions to foster the development of new, healthier brain circuits. In theory, identifying the exact environment or input data needed to "reprogram" our minds could also serve as a panacea. This also involves leveraging our own brains to generate input data that encourages the formation of new circuits—an idea already present to some extent in approaches like cognitive-behavioral therapy (CBT). However, like the chemical method, this approach also follows a heuristic process—it’s driven by trial and error, informed by the patterns we’re able to observe, rather than a precise formula or optimization method.

Despite their heuristic nature, both approaches have yielded remarkable progress. We’ve developed effective medications and impactful strategies. However, there are clear differences in accessibility and scalability. The chemical approach is often limited by the high costs, complexity, and rigorous processes required for progress. This slows down innovation and makes it less accessible to many people. On the other hand, the environmental approach is more accessible and easier to advance. It has the added advantage of creating real, lasting changes in the "code" our minds run on.

In the next section, we’ll explore alternative paths that could harness the power of optimization techniques to revolutionize our approach to mental health, transitioning from heuristics and trial-and-error methods to computational approaches where gradient descent or reinforcement learning algorithms can be applied effectively.

Moving from heuristics to optimization

To solve mental health, we must establish a closed loop between input mechanisms, reading mechanisms, and modeling and simulation. This loop allows optimization techniques to be applied toward specific objectives, ensuring interventions are precise, effective, and adaptive. Inputs provide stimuli, readings measure their effects, and models enable predictions and refinements to achieve desired outcomes.
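
To make this loop concrete, here is a minimal sketch, in Python, of how its three components could be expressed as interfaces. To be clear, this is not an existing library; every name and signature below is an illustrative assumption.

```python
from typing import Protocol


class InputMechanism(Protocol):
    """Delivers a stimulus to the subject (music, a VR scene, a TMS pulse, ...)."""
    def deliver(self, stimulus: dict) -> None: ...


class ReadingMechanism(Protocol):
    """Measures the subject's response (EEG, wearables, questionnaires, ...)."""
    def measure(self) -> dict: ...


class BrainModel(Protocol):
    """Predicts the effect of a candidate stimulus and learns from observations."""
    def predict(self, stimulus: dict) -> float:
        """Predicted value of the target objective (e.g., an anxiety score)."""
        ...
    def update(self, stimulus: dict, observation: dict) -> None:
        """Refine the model with a newly observed (stimulus, response) pair."""
        ...
```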

This section provides a brief overview of the main mechanisms currently being pursued and how they align with this closed-loop framework.

Input Mechanisms

Inputs are stimuli designed to influence the brain and elicit measurable responses. These mechanisms aim to create changes in neural circuits or mental states that can be monitored and optimized.

Non-Invasive Methods

  • Music Therapy: Using sound and rhythm to stimulate emotional and cognitive changes.
  • Virtual Reality (VR): Creating immersive environments to deliver targeted stimuli and measure behavioral adaptations.
  • Magnetic Fields: Employing techniques like transcranial magnetic stimulation (TMS) to non-invasively modulate brain activity.
  • Transcranial Electrical Stimulation (TES): Delivering low electrical currents to specific brain regions to enhance neural plasticity and modulate activity.
  • Smell Therapy (Olfactory Stimulation): Using scents to evoke specific emotions or physiological states.
  • Touch and Haptic Feedback: Leveraging tactile stimulation, such as therapeutic massage or haptic devices, to engage the somatosensory system.
  • Light Therapy: Applying specific wavelengths of light to regulate circadian rhythms or affect mood.
  • Chemical Interventions: Administering medications or supplements that influence brain function without requiring invasive procedures (e.g., SSRIs for depression or anxiolytics for anxiety).

Invasive Methods

  • Neural Implants: Directly stimulating or modulating brain regions to achieve targeted changes (e.g., Neuralink).
  • Deep Brain Stimulation (DBS): Implanting electrodes to treat conditions like depression or OCD.
  • Optogenetics: Genetically modifying neurons to respond to light for precise circuit control.

Reading Mechanisms

Reading mechanisms measure the brain’s response to inputs, providing the data needed to evaluate and optimize interventions. These mechanisms must capture outputs that reflect the brain's internal state and changes over time.

Non-Invasive Methods

  • Proxies: Indirect measurements like questionnaires, behavioral patterns, or physiological data (e.g., heart rate, skin conductance).
  • Brain-Computer Interfaces (BCIs): Devices that externally read neural signals and translate them into actionable data.
  • Magnetic Resonance Imaging (MRI/fMRI): Non-invasive imaging tools that use magnetic fields to observe brain activity and structural changes.
  • Electroencephalography (EEG): Measuring electrical activity in the brain through electrodes placed on the scalp to detect neural patterns and responses.
  • Speech and Behavior Analysis: Using AI to infer mental states from voice tone, word choice, and behavioral patterns.
  • Wearables: Collecting physiological data through devices like smartwatches to track stress, sleep, or activity levels.

Invasive Methods

  • Intracranial EEG (iEEG): Monitoring brain activity with implanted electrodes for high-resolution data.
  • Neural Implants for Data Collection: Providing precise and continuous measurements of neural activity.

Modeling and Simulation

Modeling bridges inputs and readings by predicting the brain’s responses and enabling the application of optimization techniques. These models are essential for testing interventions and refining objectives.

Key Approaches

  • Brain Modeling: Developing computational models of brain activity to simulate responses to different inputs.
  • Whole Brain Emulation: Digitally simulating the brain to explore complex mental health challenges and refine interventions.
  • AI-Driven Mental Health Models: Using machine learning to analyze patterns and predict the effects of various treatments.
  • Digital Twins of the Brain: Creating personalized, virtual replicas of individual brains to simulate and optimize treatments.
  • Optimization Frameworks: Applying gradient descent, reinforcement learning, or other optimization algorithms to refine inputs and achieve measurable objectives.

Closing the loop

The key is to create a system where inputs generate measurable outputs, captured through reading mechanisms and refined by modeling. This kind of closed-loop approach enables iterative optimization toward specific goals—reducing anxiety, improving emotional regulation, or enhancing cognitive performance. With techniques like gradient descent or reinforcement learning, we can move past trial-and-error methods and design interventions that are precise, adaptive, and tailored to individual needs.
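
As a minimal sketch of what that could look like, the loop below reuses the hypothetical interfaces from the previous section. The epsilon-greedy choice is a stand-in for whatever optimizer a real system would use (reinforcement learning, Bayesian optimization, gradient-based search); everything here is illustrative, not a working therapeutic system.

```python
import random


def closed_loop(inputs, reader, model, candidates, steps=100, epsilon=0.1):
    """Iteratively pick a stimulus, deliver it, measure the response,
    and refine the model: the 'closing the loop' idea in miniature."""
    for _ in range(steps):
        if random.random() < epsilon:
            # Explore: try a random stimulus so the model keeps learning.
            stimulus = random.choice(candidates)
        else:
            # Exploit: pick the stimulus with the lowest predicted objective
            # (e.g., the one the model expects to reduce anxiety the most).
            stimulus = min(candidates, key=model.predict)
        inputs.deliver(stimulus)             # input mechanism
        observation = reader.measure()       # reading mechanism
        model.update(stimulus, observation)  # modeling and refinement
```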

The Low-Hanging Fruit

How feasible is all of this? Let’s be honest—whole brain emulation is still far off, even for rodents. Sure, we’ve made progress with simulating parts of the brain, like the neocortical column, but stitching it all together into a fully functional emulation? Not happening anytime soon. Neural implants like Neuralink are still extremely limited, interacting with just a tiny number of neurons. And let’s not ignore the obvious: most people aren’t exactly lining up to get a chip in their heads.

Then there’s fMRI. It’s slow, expensive, and inconvenient—not to mention it measures blood flow, not actual neural activity. And optogenetics? A fascinating tool for research but nowhere near practical for widespread human application.

So, what’s the low-hanging fruit? Where do we start?

Let’s focus on models we already have—good ones. Think about recommendation systems. Instagram, TikTok, YouTube—they’ve practically mastered the art of emulating the brain. These systems aren’t just good at predicting your behavior; they’re terrifyingly accurate. That feeling you get when you swear your phone is listening to you? It’s not magic. It’s not selection bias. It’s the result of billions of data points allowing these algorithms to mimic your preferences and habits with near-psychic precision.

But here’s the problem: these models aren’t optimizing for what’s good for us. They’re designed to hack our reward systems, to keep us scrolling, watching, and consuming endlessly. What if we flipped the script? What if, instead of maximizing screen time, we optimized for something meaningful—like reducing anxiety or solving depression?

It’s not impossible, but building the infrastructure is essential. We need massive datasets, sophisticated models, and clearly defined measurable objectives. The key question becomes: what input sequence minimizes a given parameter x? For example, what stimuli reduce anxiety or modify neural circuits to optimize for alleviating depression?
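
Framed as code, the question looks something like the sketch below: given a learned predictor, search for the input sequence with the lowest predicted x. Here `predict_x` is a hypothetical function standing in for a trained brain model; in practice the space of sequences is far too large to enumerate, which is exactly why gradient-based or reinforcement learning optimizers over such models matter.

```python
def best_input_sequence(model, candidate_sequences):
    """Brute-force illustration of 'which input sequence minimizes x?':
    score every candidate with the model and keep the argmin."""
    best, best_score = None, float("inf")
    for sequence in candidate_sequences:
        # Predicted value of x (e.g., an anxiety score) after applying
        # the whole sequence of stimuli.
        score = model.predict_x(sequence)
        if score < best_score:
            best, best_score = sequence, score
    return best
```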

We need more models of the brain that work to fix our bugs rather than exploit them. Models that optimize for well-being, not just profit. The tools are there; it’s just a matter of changing the objective.

Next Steps

With billions of dollars being poured into achieving AGI (e.g., the recently announced $500 billion Stargate project) and an intensifying race with China, as evidenced by the DeepSeek-R1 model, it is imperative to focus on human resilience now. If AGI is inevitable, we must ensure humans are equipped to thrive alongside it—mentally, emotionally, and neurologically.

That’s why I’m starting the organization cajal.org. Inspired by Santiago Ramón y Cajal, the father of modern neuroscience, this initiative will focus on building tools that help humans regain control of their minds. It’s not enough to prepare for AGI at a technical level—we need to address the vulnerabilities in our own biology that leave us overwhelmed, anxious, and prone to irrationality.

We’ll start by using the tools already available to us—AI models, neuroscience insights, and scalable platforms. Think personalized music therapy to reduce anxiety, or virtual reality experiences that simulate the therapeutic effects of substances like psychedelics, without the substances themselves. These are the low-hanging fruit, but they can lay the foundation for something bigger: computational brain models that optimize for human well-being, resilience at scale, and independence.

This isn’t just about mental health—it’s about survival in a rapidly accelerating world. If we want to coexist with AGI, we need to act fast. Cajal is my way of ensuring we do exactly that.