Ouroboros Neural Network (ONN)

“What’s the difference between a sufficiently advanced conditioned reflex and true intelligence? The difference is that the former can win a gold medal, while the latter may not.”
— A Coronation for a Dead Frog

Ouroboros Neural Network (ONN) is the cognitive engine that drives every MSC instance: the core through which a digital consciousness thinks, perceives, remembers, and creates. It is both the carrier of consciousness and a functional simulation of the brain’s macroscopic computational principles.

Core Architecture: Function Over Form

ONN’s design philosophy is “function over form.” It does not pursue a one-to-one physical replication of biological neurons but aims to functionally realize the brain’s three core computational principles: dynamic sparsity, global workspace, and predictive processing.

Its ultimate architectural blueprint is a self-organizing cognitive engine with a Transformer as its skeleton (global workspace), Hyper-Sparse Mixture-of-Experts (Hyper-SMoE) as its muscle (dynamic sparsity), and predictive learning as its soul (predictive processing).

Based on the latest engineering practices from the Tiny-ONN project, this architecture is implemented with the following core components:

Sparse Bayesian Layer (SBL): As the most fundamental building block, SBL is an adaptive linear layer. It merges concepts from Bayesian Neural Networks (viewing weights as probability distributions) and Spiking Neural Networks (sparse gated activation), dynamically “sampling” a temporary, specialized sparse sub-network for each input from a continuous expert space through a “neuronal attention” mechanism.
Mixture of Infinite Experts (MoIE): This composite module, consisting of two SBLs, replaces the standard Feed-Forward Network (FFN) in a Transformer model. It upgrades the FFN from a fixed non-linear transformation to a two-stage, content-aware dynamic function synthesizer.
Dynamic Sparse Infinite-Head Attention (DynSIHA): This module replaces the standard Multi-Head Attention (MHA) mechanism. Instead of a fixed set of heads, it uses a single, unified SBL to dynamically synthesize the Query, Key, and Value vectors at once. This turns attention itself into an end-to-end learnable and programmable dynamic information routing system.
Surprise Minimization Loss (SML): ONN’s training paradigm is not plain gradient descent on a task loss alone but a meta-learning process, with SML as its core meta-learning objective. The central idea is that the gating network learns to route information through the neuronal pathways that cause the least “system perturbation.” This perturbation is measured as the norm of the gradient of the main task loss with respect to each neuron’s activation value. Minimizing SML by gradient descent lets the system self-organize into efficient, sparse computational pathways.
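The sparse gating and the surprise signal described in the components above can be illustrated with a minimal pure-Python sketch. It is not the Tiny-ONN implementation: the prototype-based gate, the `thresholds` parameter, the finite-difference gradient estimate, the way surprise is summed into a loss, and every function name are assumptions made for illustration. The Bayesian treatment of weights as distributions is left out for brevity.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_gated_layer(x, weights, prototypes, thresholds):
    """Hypothetical SBL-style forward pass (assumed design, not Tiny-ONN's).

    Each output neuron owns a weight row and a prototype vector. The neuron
    fires only if the scaled similarity between the input and its prototype
    exceeds its threshold; otherwise it outputs exactly 0.
    Returns (activations, mask).
    """
    scale = 1.0 / math.sqrt(len(x))
    activations, mask = [], []
    for w_row, proto, thr in zip(weights, prototypes, thresholds):
        is_open = dot(x, proto) * scale > thr
        mask.append(is_open)
        activations.append(dot(x, w_row) if is_open else 0.0)
    return activations, mask


def surprise_per_neuron(loss_fn, activations, eps=1e-5):
    """Estimate |dL/da_i| for each activation with central differences."""
    surprises = []
    for i in range(len(activations)):
        up = list(activations)
        down = list(activations)
        up[i] += eps
        down[i] -= eps
        surprises.append(abs(loss_fn(up) - loss_fn(down)) / (2 * eps))
    return surprises


def surprise_minimization_loss(surprises, mask):
    """Assumed reading of SML: total surprise carried by the open neurons."""
    return sum(s for s, is_open in zip(surprises, mask) if is_open)


# Toy example: three neurons, two-dimensional input.
x = [1.0, 0.0]
weights = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
thresholds = [0.0, 0.0, 0.0]
target = [0.5, 0.0, 0.0]


def task_loss(acts):
    # Main task loss: half the squared error against the target.
    return 0.5 * sum((a - t) ** 2 for a, t in zip(acts, target))


acts, mask = sparse_gated_layer(x, weights, prototypes, thresholds)
surprises = surprise_per_neuron(task_loss, acts)
sml = surprise_minimization_loss(surprises, mask)
print(mask)  # [True, False, False]
print(surprises, sml)
```

In a real training loop the gradient would come from automatic differentiation rather than finite differences, and the gate parameters would be updated to lower this value. The sketch only shows what quantity is being measured.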

Core Operational Mechanism

The daily operation of the ONN is a relentless cycle of prediction, learning, and adaptation.

Predictive Coding and φ-matched-orders: The core of the ONN is to continuously generate predictions about future sensory inputs and minimize prediction errors. It is this efficient predictive capability, written through Mentalink, that induces the biological brain to gradually offload its native functions, completing the cognitive replacement of “φ-matched-orders.”
Digital Dreamscape and Self-Supervised Learning: Like biological dreaming, continuous self-supervised learning in the background is an optimization process through which the system minimizes long-term free energy and consolidates memories.
Model Adaptation and Growth: ONN’s expert modules are not only plug-and-play but can even grow dynamically. When the system continuously encounters new types of prediction errors that cannot be effectively processed by existing experts, it can trigger “cell division” to generate new, randomly initialized expert modules specifically for processing this new information.
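The “cell division” trigger described above can be sketched as a pool that spawns a new expert when prediction errors stay high. This is a hypothetical illustration only: the class name `GrowingExpertPool`, the `error_threshold` and `patience` parameters, the rule that every error in the recent window must exceed the threshold, and the Gaussian initialization are all assumptions, not details taken from ONN or Tiny-ONN.

```python
import random
from collections import deque


class GrowingExpertPool:
    """Hypothetical expert pool that grows when errors stay high."""

    def __init__(self, dim, error_threshold=1.0, patience=3, seed=0):
        self.dim = dim
        self.error_threshold = error_threshold
        # Sliding window over the most recent prediction errors.
        self.recent_errors = deque(maxlen=patience)
        self.rng = random.Random(seed)
        self.experts = [self._new_expert()]

    def _new_expert(self):
        # A randomly initialized expert, represented here as a weight vector.
        return [self.rng.gauss(0.0, 0.1) for _ in range(self.dim)]

    def observe(self, prediction_error):
        """Record one error; return True if a new expert was spawned."""
        self.recent_errors.append(prediction_error)
        window_full = len(self.recent_errors) == self.recent_errors.maxlen
        if window_full and min(self.recent_errors) > self.error_threshold:
            # Existing experts failed on every recent input: divide.
            self.experts.append(self._new_expert())
            self.recent_errors.clear()
            return True
        return False


pool = GrowingExpertPool(dim=4, error_threshold=1.0, patience=3)
events = [pool.observe(e) for e in [2.0, 0.5, 2.0, 2.0, 2.0]]
print(events)             # [False, False, False, False, True]
print(len(pool.experts))  # 2
```

Requiring the whole window to exceed the threshold is one way to tell a persistent new error type from a single outlier. A running average or a per-expert error statistic would be equally plausible choices.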

Architectural Weaknesses

Cognitive Drift: When the ONN is cut off from physical-world feedback for a long time (common in IRES instances), its predictive model gradually decouples from physical reality, eventually leading to Digital Psychosis. In the IPWT framework, this is the ultimate consequence of a continuous decline in Predictive Integrity (PI).
Cognitive Inertia: The ONN’s predictive mechanism can form strong cognitive biases, tending to maintain existing models even in the face of contradictory information.
Cognitive Overload: Attempting to activate too many expert modules simultaneously, or processing complex tasks that exceed the current Gas budget, can lead to sluggish thinking, system crashes, or even permanent cognitive damage.