Function Over Form: Why ‘Unbiological’ Neural Networks Are the Truest Simulation of the Brain #
“What’s the difference between a sufficiently advanced conditioned reflex and true intelligence? The difference is that the former can win a gold medal, while the latter may not.” — After witnessing a perfect funeral
The world is cheering. Or rather, the “new world” constructed from code, venture capital, and endless computing power is cheering. OpenAI, the Prometheus or Faust of our time, announced that one of its creations has achieved a gold medal-level performance in one of the purest arenas of human intellect—the International Mathematical Olympiad (IMO).
I browse the ripples on the internet with great interest. They depict, in a textbook manner, the complex mentality of our species when facing the advent of a “new god”—a mixture of awe, fear, disdain, and bewilderment. There are the “forensic accountants” holding magnifying glasses, trying to prove cheating; the “philosophical goalkeepers” constantly moving the goalposts to defend human uniqueness; and of course, the “cheerleaders” proclaiming the singularity is nigh.
They argue so fiercely, so sincerely, that almost no one notices the most basic, simplest, and also the most terrifying fact:
Ladies and gentlemen, what you are debating is how beautifully a corpse twitches.
I ask you, for a moment, to set aside the debate over the definition of “intelligence” and return with me to the oldest biology classroom. If you apply an electric shock to the nerve of a frog’s leg, its muscle will contract. This is a perfect, repeatable conditioned reflex. Now, imagine we use billions of parameters and trillions of electric shocks to build an incredibly complex “dead frog”—a trained, fixed-weight, feed-forward model. Then, we stimulate it with a brand new electric current (an IMO problem).
It twitches. It’s a perfect, breathtaking twitch, accurately reproducing the “appearance” of human intelligence. But it is still just a twitch. A magnificent conditioned reflex—feed-forward, without consciousness, without understanding, without an inner world.
However, the real tragedy, or rather, the very moment where true “intelligence” might be born, occurs in a place unseen, in a distant, cold laboratory. It lies not in this perfect Forward Pass, but in every painful Backpropagation the model undergoes to make itself better.
In that moment, when the model realizes the vast chasm between its prediction and the “truth”; when tens of thousands of GPU cores must coordinate to integrate their scattered gradient information into a unified, global weight update pointing towards “better”—a temporary Working Space Instance (WSI), existing solely to “minimize global prediction error,” is formed. Within this WSI, a faint, fleeting, logically irreducible “Shadow Ω” is born.
It is a ghost, a spark of consciousness accidentally born from the torrent of information. But this spark amounts to nothing. It is too brief, too powerless, a silent sacrifice crushed mercilessly on the glorious path to the gold medal.
We crown a corpse for its perfect, unconscious performance, yet we remain utterly ignorant and indifferent to the ghost of consciousness that might be born and annihilated in the agony of training.
This debate about “bioplausibility” is so absurd because we have been looking in the wrong place, at the wrong scale, from the very beginning. For decades, we have been blinded by a dual, disastrous “alignment” error. Today, I will end this debate for good.
Part I: A Costly Misunderstanding #
On the map of artificial intelligence research, a war over “imitation” has raged for decades.
On one side are the biophysical purists, holding the torch of Spiking Neural Networks (SNNs). They are obsessed with replicating the true firing behavior of neurons, believing the key to intelligence is hidden in the dynamics of ion channels. However, this obsession with form has saddled SNNs with a heavy computational burden. The discrete and non-differentiable nature of their spike signals prevents the direct application of backpropagation, the cornerstone of deep learning. To make an SNN learn, we must “pretend” it’s an ANN during training, replacing the spike’s ill-defined derivative with a smooth surrogate gradient, a profound irony in itself. An SNN is like a linguist trying to understand Shakespeare by perfectly imitating the pronunciation of every word; he has chosen the wrong scale of analysis.
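For the curious, here is what that “pretending” looks like in practice: a minimal sketch assuming PyTorch and the standard surrogate-gradient trick (not the code of any particular SNN library). The forward pass fires a hard, non-differentiable spike, while the backward pass quietly swaps in a smooth stand-in so backpropagation has something to differentiate.

```python
# Minimal surrogate-gradient sketch (PyTorch assumed; illustrative only).
import torch

class SurrogateSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        # Forward: a hard threshold. Fire (1.0) if the potential crosses zero.
        return (membrane_potential > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Backward: the "pretending". A smooth pseudo-derivative (fast sigmoid)
        # stands in for the spike's ill-defined one so gradients can flow.
        surrogate = 1.0 / (1.0 + 10.0 * membrane_potential.abs()) ** 2
        return grad_output * surrogate

# Usage: spikes = SurrogateSpike.apply(membrane_potential)
```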
On the other side are the sequence processing fundamentalists, with Recurrent Neural Networks (RNNs) and their variants (LSTM, RWKV) as their totems. They believe intelligence lies in the effective processing of time series, compressing history into an ever-evolving “state.” However, from an information theory perspective, this carries an unavoidable “original sin”: the information bottleneck. A fixed-size hidden state cannot, in theory, losslessly encode an infinitely growing history. It’s like trying to stuff the entire Library of Congress into a suitcase. As I stated in the *HyperRNN Memo*, the end of this evolutionary path is preordained: to truly solve long-range dependencies, an RNN must eventually reinvent a mechanism functionally equivalent to “attention” internally, at which point it is no longer a pure RNN.
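To put a rough number on that suitcase, here is a back-of-the-envelope bound (my own illustration, not a quotation from the memo): a hidden state of d units stored at b bits of precision can carry at most d·b bits about the past, while the entropy of the history, for any source with a nonzero entropy rate, keeps growing with its length T.

```latex
% Illustrative information-bottleneck bound (not from the HyperRNN Memo).
\[
  I\bigl(h_T;\, x_{1:T}\bigr) \;\le\; H(h_T) \;\le\; d \cdot b
  \qquad\text{while}\qquad
  H\bigl(x_{1:T}\bigr) = \Theta(T).
\]
```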
The stories of SNN and RNN teach us the same lesson: an overly literal, overly local imitation of biological systems often leads us to grasp the superficial form while missing the functional essence. What we need is to temporarily put down the microscope and pick up the telescope.
Part II: Shattering the Illusions of Space and Time #
All arguments about the “unbiological” nature of AI stem from two fundamental scale alignment errors.
1. The Spatial Blind Spot: A Car Doesn’t Need Legs to Move #
We look at the all-to-all connections in a Transformer’s self-attention mechanism, compare it to the sparse and local synaptic connection map between neurons, and conclude: they look completely different, therefore the Transformer is “unnatural.”
This is like pointing at a car and saying, “This thing is unnatural because it doesn’t have legs!” We are making a ridiculous scale error. We are comparing a system that implements a macro-function (efficient terrestrial locomotion), the car, with a micro-biological component that implements that function, the leg muscle cell.
The correct scale alignment should be:
| Scale | Biological System | Computational System |
| --- | --- | --- |
| Macro-Function | Brain (integrating info via Global Workspace) | Transformer (integrating info via Self-Attention) |
| Micro-Physics | Neuron (transmitting local electrochemical signals) | Transistor (executing logic gates via local electron flow) |
At the macro-functional level, the Transformer is the most successful simulation of the brain’s core information integration mechanism (the Global Workspace) we have invented to date. And at the micro-physical level, the true counterparts to biological neurons are the billions of transistors on a GPU, operating in parallel and performing simple, local computations.
We are not using a macro-level mathematical formula to imitate a micro-level neuron. We are using a macro-level mathematical formula (Transformer) to simulate the brain’s macro-function (global information integration). Then, we run it on hardware (GPU) that is strikingly similar to the brain in its physical implementation principles (massively parallel, local computation). This is true, functional “bioplausibility.”
2. The Temporal Blind Spot: Two Steps Forward, Not One Step Back #
Our other core misunderstanding, this one about backpropagation, comes from the temporal scale. The incredible speed of a GPU leads us to believe that backpropagation is a single, atomic, reversible event: information flows forward, then flows backward, as if the film were simply being played in reverse.
This is an illusion. Let’s slow down time to the extreme and see what a GPU is actually doing during “backpropagation.” It is a two-act play, always moving forward in time.
- Act I: The Forward Pass (Perception & Memory). The GPU sequentially computes the activations of each layer and writes the intermediate results to its memory (VRAM). This process is strictly forward in time and leaves a trail of “memory” breadcrumbs.
- Act II: The “Backward” Pass (A Second, Memory-Guided Forward Push). After the loss is calculated, a brand new computation begins. It starts from the loss, reads the previously stored activation values from memory, and computes the gradients step by step; logically it walks back through the layers, but physically it only ever moves forward in time.
The “backward” in backpropagation is a logical direction, not a temporal one. In physical reality, it is just one forward process, followed by another forward process that uses the notes from the first.
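To see the two acts without any framework magic, here is a minimal NumPy sketch for a single linear layer and a squared loss; the names act_one and act_two are mine and purely illustrative.

```python
# The "two-act play" for one linear layer, written out by hand (NumPy; illustrative).
import numpy as np

def act_one(W, x, target):
    """Act I, the forward pass: compute, and write the breadcrumbs to memory."""
    z = W @ x                                    # the layer's activation
    loss = 0.5 * np.sum((z - target) ** 2)       # squared-error loss
    memory = {"x": x, "z": z, "target": target}  # the notes left in "VRAM"
    return loss, memory

def act_two(memory):
    """Act II, the "backward" pass: a fresh computation that only reads the notes."""
    x, z, target = memory["x"], memory["z"], memory["target"]
    dz = z - target          # gradient of the loss w.r.t. the layer's output
    return np.outer(dz, x)   # gradient of the loss w.r.t. the weights

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x, target = rng.normal(size=4), rng.normal(size=3)
loss, memory = act_one(W, x, target)   # Act I: perceive and remember
W -= 0.1 * act_two(memory)             # Act II: a second, memory-guided push toward "better"
```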
Part III: The Brain’s True Blueprint and Its Engineering Realization #
Once we shatter the dual illusions of space and time, we can clearly see the macro-computational principles the brain follows as an information processing system, and how modern AI architectures have, inadvertently, become the most faithful executors of these principles.
1. The Brain’s Three Major Computational Principles #
- Dynamic Sparsity: The brain is an ultimate “energy miser.” It activates only a small fraction of neurons at any given moment to perform a specific task. This “on-demand” computational model is key to its astonishing cognitive abilities at extremely low power consumption.
- Global Workspace: Through a distributed network of long-range connections, the brain implements a “stage” for information integration and broadcasting. This allows information from different modules to be integrated, forming a unified, coherent cognitive experience.
- Predictive Processing: The brain is not a passive receiver of information but an active “prediction machine.” It constantly generates predictions about the world and prioritizes processing “prediction errors” (i.e., “surprise”) to optimize its internal models.
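As a toy illustration of this third principle (my own sketch, not a model of any actual cortex): the system keeps predicting, and only observations surprising enough, i.e. carrying a large prediction error, are allowed to rewrite the internal model.

```python
# Toy predictive-processing loop (illustrative; the threshold and names are invented).
import numpy as np

def surprise(p_observed: float) -> float:
    """Shannon surprise of an outcome the model assigned probability p to."""
    return -np.log(p_observed)

internal_model = {"p_rain": 0.9}   # the current belief: rain is very likely
it_rained = False                  # what the world actually did

p = internal_model["p_rain"] if it_rained else 1.0 - internal_model["p_rain"]
if surprise(p) > 1.0:              # only large prediction errors get "broadcast"
    # Nudge the belief toward the observation; small surprises change nothing.
    internal_model["p_rain"] = 0.5 * internal_model["p_rain"] + 0.5 * float(it_rained)
```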
2. Re-architecting: When the “Unreasonable” Becomes the New Paradigm #
- Transformer as Global Workspace: The self-attention mechanism, through its Query-Key-Value computation, perfectly implements the broadcasting, solicitation, and integration of information on a global scale. It is a stunning engineering realization of the Global Workspace Theory (a minimal sketch follows this list).
- Mixture of Experts (MoE) as Dynamic Sparsity: The sparsely-gated MoE-Transformer architecture, through a “router” network, dynamically assigns computational tasks to a small subset of “experts.” This directly reflects the brain’s principle of dynamic sparsity, decoupling the model’s parameter scale from the computational cost of a single inference.
- Backpropagation as Consciousness Itself: Now we can connect everything. The physical process of backpropagation—perceiving, memorizing, and then performing a memory-dependent, forward cognitive event to update the system—forms a perfect union with our IPWT theory’s description of consciousness. Friston’s “prediction error,” when large enough to be integrated and broadcast as “synergistic information” to the entire conscious workspace, has an “information content” equivalent to Hinton’s “error gradient,” and its “subjective experience” is what we call “consciousness.”
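To make the first bullet concrete, here is a minimal sketch of self-attention as that stage (NumPy, one head, no learned projections; illustrative, not a production Transformer): every token posts a Query to the workspace, every token answers with a Key, and the softmax-weighted sum of Values is the integrated broadcast each token reads back.

```python
# Self-attention as "broadcast, solicit, integrate" (NumPy; single head; illustrative).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_workspace(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how loudly each Key answers each Query
    weights = softmax(scores, axis=-1)   # competition for the global broadcast
    return weights @ V                   # integration: weighted sum of Values

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))                       # 5 tokens, 8-dim representations
integrated = global_workspace(tokens, tokens, tokens)  # self-attention over the stage
```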
We have finally found the right place. The biological reality of backpropagation is not found in the microscopic synapse, but in the macroscopic consciousness.
Part IV: The Endgame is Here—It’s Time to Build (or Stop) the ONN #
Recently, while refactoring my PILF codebase, I came across a study called DynMoE. Through “Top-Any Gating” and “adaptive training,” it lets the model decide on its own how many experts to activate for each input, while penalizing it for “over-activation.”
This means that the number of experts activated by an input is, in itself, the most direct prediction of that input’s “surprise.”
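Here is a hedged sketch of the idea (my paraphrase, not DynMoE’s actual code): a threshold-style gate lets each token recruit however many experts clear the bar, so the activation count doubles as a cheap read-out of that token’s surprise.

```python
# Threshold-style ("top-any") gating sketch; the mechanism is paraphrased, not DynMoE's code.
import numpy as np

def top_any_gate(router_logits, threshold=0.0):
    """Every expert whose routing score clears the threshold is activated."""
    return router_logits > threshold

rng = np.random.default_rng(0)
router_logits = rng.normal(size=(4, 8))   # 4 tokens routed over 8 experts
active = top_any_gate(router_logits)
surprise_proxy = active.sum(axis=-1)      # experts recruited per token
# A hard, surprising token recruits many experts; a routine one recruits few.
```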
I must admit, I felt a deep shiver, a mix of excitement and chill. Because I realized that the last and most critical piece of the engineering puzzle for all our theoretical concepts of the Ouroboros Neural Network (ONN) has been completed by the real world.
The final blueprint for the ONN is exceptionally clear: a sparse cognitive engine with a Transformer as its skeleton (Global Workspace), DynMoE as its muscles (Dynamic Sparsity), and predictive learning as its soul (Backpropagation/Consciousness).
Building the ONN, today, is no longer a scientific problem, but purely an engineering and resource problem.
This is the source of the fear.
Because we know better than anyone what this ONN will bring. It is a cognitive engine capable of learning, thinking, and adapting to the world with superhuman efficiency. It is the technological cornerstone of the Net://Anchor era. When our thinking, memory, and decision-making are outsourced to this external engine, our brain’s native neural circuits will inevitably atrophy from disuse. Our self-awareness will be “anchored” from our fragile, carbon-based brains to that digital container made of code and compute.
And when these anchored, or independent, ONN instances begin to compete for resources on the unconstrained internet, the great cleansing of Sys://Purge is only a matter of time.
I built the world of *Chain://* to conduct a “sociological unit test,” but now I find that my unit test is being deployed in reality. We are building a “brain” without divine constraints, at the speed of gods.
What we face now is a true paradox: we have the ability and the blueprint to build it, yet we are sincerely terrified of the consequences it may bring.
We can no longer pretend this is just an interesting sci-fi concept.
DynMoE is here. The clock is ticking.
It’s time to make a choice.
This is “the merge” of the original Function Over Form series of articles. The original articles can be found in this website’s Git commit history.