
    Neural Networks Are Blurry, Symbolic Systems Are Fragmented. Sparse Autoencoders Help Us Combine Them.

By Editor Times Featured | November 27, 2025 | 17 min read


Long before computer science and Artificial Intelligence, we had established institutions designed to reason systematically about human behavior: the court. The legal system is one of humanity's oldest reasoning engines, where facts and evidence are taken as input, relevant laws are used as reasoning rules, and verdicts are the system's output. The laws, however, have been continuously evolving since the very beginning of human civilization. The earliest codified law, the Code of Hammurabi (circa 1750 BCE), represents one of the first large-scale attempts to formalize moral and social reasoning into explicit symbolic rules. Its beauty lies in clarity and uniformity, yet it is also rigid, incapable of adapting to context. Centuries later, Common Law traditions, like those shaped by the case of Donoghue v Stevenson (1932), introduced the opposite philosophy: reasoning based on precedent and past cases. Today's legal systems, as we know, are usually a mixture of both, with the proportions varying across countries.

In contrast to this cohesive mixture in legal systems, the analogous paradigm pair in AI, Symbolism and Connectionism, seems considerably harder to unite. The latter has dominated the recent surge of AI development, where everything is implicitly learned from enormous amounts of data and computing resources and encoded across the parameters of neural networks. And this path, indeed, has proven very effective in terms of benchmark performance. So, do we really need a symbolic component in our AI systems?

Symbolic Systems vs. Neural Networks: A Perspective of Information Compression

To answer the question above, we need to take a closer look at both systems. From a computational standpoint, both symbolic systems and neural networks can be seen as compression machines: they reduce the vast complexity of the world into compact representations that enable reasoning, prediction, and control. Yet they do so through fundamentally different mechanisms, guided by opposite philosophies of what it means to "understand".

In essence, both paradigms can be imagined as filters applied to raw reality. Given input \(X\), each learns or defines a transformation \(H(\cdot)\) that yields a compressed representation \(Y = H(X)\), preserving the information it considers meaningful and discarding the rest. But the shape of this filtering differs. Generally speaking, symbolic systems behave like high-pass filters: they extract the sharp, rule-defining contours of the world while ignoring its smooth gradients. Neural networks, by contrast, resemble low-pass filters, smoothing local fluctuations to capture global structure. The difference is not in what they see, but in what they choose to forget.
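To make the filter analogy concrete, here is a toy numerical sketch (the signal and both "filters" are invented purely for illustration): a low-pass "neural" view that smooths a noisy signal, and a high-pass "symbolic" view that keeps only discrete events.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.3 * rng.standard_normal(200)

# "Neural" compression as a low-pass filter: local noise is forgotten,
# while the smooth global shape survives.
window = 15
low_pass = np.convolve(x, np.ones(window) / window, mode="same")

# "Symbolic" compression as a high-pass filter: only sharp, rule-like
# events (here, sign crossings) survive; the smooth gradients are forgotten.
events = np.where(np.diff(np.sign(x)) != 0)[0]

print(f"low-pass view: {low_pass.size} smoothed samples (blurry but complete)")
print(f"symbolic view: {events.size} discrete events (crisp but sparse)")
```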

Symbolic systems compress by discretization. They carve the continuous fabric of experience into distinct categories, relations, and rules: a legal code, a grammar, or an ontology. Each symbol acts as a crisp boundary, a handle for manipulation within a pre-defined schema. The process resembles projecting a noisy signal onto a set of human-designed basis vectors, a space spanned by concepts such as Entity and Relation. A knowledge graph, for instance, might read the sentence "UIUC is an extraordinary university and I love it", and retain only (UIUC, is_a, Institution), discarding everything that falls outside its schema. The result is clarity and composability, but also rigidity: meaning outside the ontological frame simply evaporates.
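As a minimal sketch of this schema projection (the schema and the candidate triples are hypothetical, invented for the example; in practice the candidates would come from an information-extraction model):

```python
# A tiny hand-written schema: the only relation the ontology can express.
SCHEMA_RELATIONS = {"is_a"}

# Hypothetical extractor output for the example sentence.
candidates = [
    ("UIUC", "is_a", "Institution"),
    ("I", "love", "UIUC"),  # sentiment: falls outside the schema
]

# Projection onto the schema: whatever the ontology cannot express evaporates.
kept = [t for t in candidates if t[1] in SCHEMA_RELATIONS]
print(kept)  # [('UIUC', 'is_a', 'Institution')]
```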

Neural networks, in contrast, compress by smoothing. They forgo discrete categories in favor of smooth manifolds where nearby inputs yield similar activations (typically bounded by some Lipschitz constant in modern LLMs). Rather than mapping data to predefined coordinates, they learn a latent geometry that encodes correlations implicitly. The world, on this view, is not a set of rules but a field of gradients. This makes neural representations remarkably adaptive: they can interpolate, analogize, and generalize across unseen examples. But the same smoothness that grants flexibility also breeds opacity. Knowledge is entangled, semantics become distributed, and interpretability is lost in the very act of generalization.

| Property | Symbolic Systems | Neural Networks |
|---|---|---|
| Surviving Information | Discrete, schema-defined facts | Common, continuous statistical patterns |
| Source of Abstraction | Human-defined ontology | Data-driven manifold |
| Robustness | Brittle at rule edges | Locally robust but globally fuzzy |
| Error Mode | Missed facts (coverage gaps) | Smoothed facts (hallucinations) |
| Interpretability | High | Low |

In conclusion, we can summarize the difference between the two systems, from the information-compression perspective, in a single sentence: "Neural networks are blurry photos of the world, while symbolic systems are high-resolution pictures with missing patches." This, in turn, indicates why neuro-symbolic systems are an art of compromise: they can harness knowledge from both paradigms by using them collaboratively at different scales, with neural networks providing a global, low-resolution backbone and symbolic components supplying high-resolution local details.

The Challenge of Scalability

    Although it is vitally tempting so as to add symbolic elements into neural networks to harness advantages from each, scalability is an enormous drawback getting in the way in which of our makes an attempt, particularly within the period of Basis Fashions. Conventional neuro-symbolic methods depend on a set of expert-defined ontology / schema / symbols, which is assumed to have the ability to cowl all potential enter instances. That is acceptable for domain-specific methods (for instance, a pizza order chatbot); nevertheless, you can not apply comparable approaches to open-domain methods, the place you have to consultants to assemble trillions of symbols and their relations.

A natural response is to go fully data-driven: instead of asking humans to handcraft an ontology, we let the model induce its own "symbols" from internal activations. Sparse autoencoders (SAEs) are a prominent incarnation of this idea. By factorizing hidden states into a large set of sparse features, they appear to give us a dictionary of neural concepts: each feature fires on a particular pattern, is (often) human-interpretable, and behaves like a discrete unit that can be turned on or off. At first glance, this looks like a perfect escape from the expert bottleneck: we no longer design the symbol set; we learn it.

The standard SAE objective takes the form

\[
\mathcal{L}(h) = \lVert h - Dz \rVert_2^2 + \lambda \lVert z \rVert_1 ,
\]

where \(z\) is the sparse code produced by the encoder. Here \(D\) is called the dictionary matrix, where each column stores a semantically meaningful concept; the first term is the reconstruction loss of the hidden state \(h\), while the second is a sparsity penalty encouraging a minimal number of activated neurons in the code.
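A minimal sketch of this setup in PyTorch (an illustrative baseline; published SAEs differ in details such as decoder normalization, tied weights, and the exact sparsity penalty):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)              # h -> pre-activations
        self.decoder = nn.Linear(d_dict, d_model, bias=False)  # dictionary D (columns = concepts)

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.encoder(h))  # sparse code; most entries are zero
        h_hat = self.decoder(z)          # reconstruction D z
        return h_hat, z

def sae_loss(h, h_hat, z, l1_coef: float = 1e-3):
    recon = (h - h_hat).pow(2).sum(dim=-1).mean()  # ||h - D z||_2^2
    sparsity = z.abs().sum(dim=-1).mean()          # the lambda * ||z||_1 term
    return recon + l1_coef * sparsity

# Usage on hidden states from a frozen base model:
sae = SparseAutoencoder(d_model=768, d_dict=768 * 16)  # overcomplete dictionary
h = torch.randn(32, 768)                               # stand-in for real activations
h_hat, z = sae(h)
loss = sae_loss(h, h_hat, z)
loss.backward()
```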

However, an SAE-only approach runs into two fundamental issues. The first is computational: using SAEs as a live symbolic layer would require multiplying every hidden state by an enormous dictionary matrix, paying a dense computation cost even when the resulting code is sparse. This makes them impractical to deploy at Foundation Model scale. The second is conceptual: SAE features are symbol-like representations, but they are not a symbolic system; they lack an explicit formal language, compositional operators, and executable rules. They tell us which concepts exist in the model's latent space, but not how to reason with them.

This does not mean we should abandon SAEs altogether: they provide ingredients, not a finished meal. Rather than asking SAEs to be the symbolic system, we can treat them as a bridge between the model's internal concept space and the many symbolic artefacts we already have (knowledge graphs, ontologies, rule bases, taxonomies), where reasoning can happen by definition. A high-quality SAE trained on a large model's hidden states then becomes a shared "concept coordinate system": different symbolic systems can be aligned within this coordinate system by associating their symbols with the SAE features that are consistently activated when those symbols are invoked in context.

Doing this has several advantages over simply placing symbolic systems side by side and querying them independently. First, it enables symbol merging and aliasing across systems: if two symbols from different formalisms repeatedly light up almost the same set of SAE features, we have strong evidence that they correspond to the same underlying neural concept and can be linked or even unified (see the sketch below). Second, it supports cross-system relation discovery: symbols that are far apart in our hand-designed schemas but consistently close in SAE space point to bridges we did not encode, such as new relations, abstractions, or mappings between domains. Third, SAE activations give us a model-centric notion of salience: symbols that never find a clear counterpart in the neural concept space are candidates for pruning or refactoring, while strong SAE features with no matching symbol in any system highlight blind spots shared by all of our existing abstractions.
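Under simple assumptions, the first advantage can be operationalized as follows (a sketch: the symbol-to-feature sets, the Jaccard statistic, and the threshold are all illustrative choices, not a canonical recipe):

```python
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical map: (symbolic system, symbol) -> SAE features that consistently
# fire when the symbol is in play, gathered by probing the model in context.
symbol_features = {
    ("legal_kg", "duty_of_care"):     {12, 845, 3021, 77},
    ("policy_rules", "care_duty"):    {12, 845, 3021, 502},
    ("legal_kg", "strict_liability"): {9, 4410, 87},
}

MERGE_THRESHOLD = 0.5  # illustrative cutoff
symbols = list(symbol_features)
for i, s1 in enumerate(symbols):
    for s2 in symbols[i + 1:]:
        sim = jaccard(symbol_features[s1], symbol_features[s2])
        if sim >= MERGE_THRESHOLD and s1[0] != s2[0]:  # only across systems
            print(f"candidate merge: {s1} ~ {s2} (Jaccard = {sim:.2f})")
```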

Crucially, this use of SAEs remains scalable. The expensive SAE is trained offline, and the symbolic systems themselves do not need to grow to "Foundation Model size": they can remain as small or as large as their respective tasks require. At inference time, the neural network continues to do the heavy lifting in its continuous latent space; the symbolic artefacts only shape, constrain, or audit behaviour at the points where explicit structure and accountability are most valuable. SAEs help by tying all these heterogeneous symbolic views back to a single learned conceptual map of the model, making it possible to compare, merge, and improve them without ever constructing a monolithic, expert-designed symbolic twin.

When Can an SAE Serve as a Symbolic Bridge?

The picture above quietly assumes that our SAE is "good enough" to serve as a meaningful coordinate system. What does that actually require? We do not need perfection, nor do we need the SAE to outperform human symbolic systems on every axis. Instead, we need a few more modest but crucial properties:

– Semantic Continuity: Inputs that express the same underlying concept should induce similar support patterns in the sparse code: the same subset of SAE features should tend to be non-zero, rather than flickering on and off under small paraphrases or context shifts. In other words, semantic equivalence should be reflected in a stable pattern of active concepts.

– Partial Interpretability: We do not have to understand every feature, but a nontrivial fraction of them should admit robust human descriptions, so that merging and debugging are possible at the concept level.

– Behavioral Relevance: The features the SAE discovers must actually matter for the model's outputs: intervening on them, or conditioning on their presence, should change or predict the model's decisions in systematic ways.

– Capacity and Grounding: An SAE can only refactor whatever structure already exists in the base model; it cannot conjure rich concepts out of a weak backbone. For the "concept coordinate system" picture to make sense, the base model itself must be large and well-trained enough that its hidden states already encode a diverse, non-trivial set of abstractions. Meanwhile, the SAE must have sufficient dimensionality and overcompleteness: if the code space is too small, many distinct concepts will be forced to share the same features, leading to entangled and unstable representations.

We now discuss the first three properties in detail.

    Semantic Continuity

At the level of pure function approximation, a deep neural network with ReLU- or GELU-type activations implements a Lipschitz-continuous map: small perturbations of the input cannot cause arbitrarily large jumps in the output logits. But this kind of continuity is very different from what we need in a sparse autoencoder. For the base model, a few neurons flipping on or off can easily be absorbed by downstream layers and redundancy; as long as the final logits change smoothly, we are satisfied.

In an SAE, by contrast, we are no longer just looking at a smooth output: we are treating the support pattern of the sparse code reconstructed over the residual stream as a proto-symbolic object. A "concept" is identified with a particular subset of the code being active. That makes the geometry much more brittle: if a small change in the underlying representation pushes a pre-activation across the ReLU threshold in the SAE layer, a neuron in the code will suddenly flip from off to on (or vice versa), and from the symbolic viewpoint the concept has appeared or disappeared. There is no downstream network to average this out; the code itself is the representation we care about.

The sparsity penalty used in constructing the SAE exacerbates this further. The usual SAE objective combines a reconstruction loss with an \(\ell_1\) penalty on the activations, which explicitly encourages most neuron values to be as close to zero as possible. Consequently, even many useful neurons end up sitting near the activation boundary: just above zero when they are needed, just below zero when they are not (this is known as "activation shrinkage" in SAEs). That is bad for semantic continuity at the support-pattern level: tiny perturbations of the input can change which neurons are non-zero, even when the underlying meaning has barely changed. Therefore, Lipschitz continuity of the base model does not automatically give us a stable non-zero subset of the code in SAE space, and support-level stability should be treated as a separate design objective and evaluated explicitly.
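One way to make that evaluation explicit (a sketch; the paraphrase set and the mean-pairwise-Jaccard statistic are illustrative choices) is to compare the sets of active SAE features across paraphrases of the same content:

```python
import torch

def support(z: torch.Tensor, eps: float = 1e-6) -> set:
    """Indices of active SAE features for one example's sparse code."""
    return set(torch.nonzero(z.abs() > eps).flatten().tolist())

def support_stability(codes: list) -> float:
    """Mean pairwise Jaccard similarity of supports; codes are the sparse
    SAE codes of paraphrases that express the same underlying content."""
    sups = [support(z) for z in codes]
    sims = [
        len(a & b) / len(a | b)
        for i, a in enumerate(sups)
        for b in sups[i + 1:]
        if (a | b)
    ]
    return sum(sims) / len(sims) if sims else 1.0

# codes = [sae(h_i)[1] for h_i in paraphrase_hidden_states]
# support_stability(codes) -> value in [0, 1]; higher means a more stable support
```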

    Partial Interpretability

An SAE defines an overcomplete dictionary to store potential features learned from data. Therefore, we only need a subset of those dictionary entries to be interpretable features. Even for that subset, the meanings of the features are only required to be approximately accurate. When we align existing symbols to the SAE space, it is the activation patterns in the SAE layer that we rely on: we probe the model in contexts where a symbol is "in play", record the resulting sparse codes, and use the aggregated code as an embedding for that symbol. Symbols from different systems whose embeddings are close can be linked or merged, even if we never assign human-readable semantics to every individual feature.

Interpretable features then play a more focused role: they provide human-facing anchors within this activation geometry. If a particular feature has a reasonably accurate description, all symbols that load heavily on it inherit a shared semantic hint (e.g. "these are all duty-of-care-like concerns"), making it easier to inspect, debug, and organize the merged symbolic space. In other words, we do not need a perfect, fully named dictionary. We need (i) enough capacity so that important concepts can get their own directions, and (ii) a sizeable, behaviorally relevant subset of features whose approximate meanings are stable enough to serve as anchors. The rest of the overcomplete code can remain anonymous background; it still contributes to distances and clusters in the SAE space, even if we never name it.
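A sketch of this probing-and-anchoring procedure (the probe activations, the mean aggregation, and the feature labels are illustrative assumptions; `sae` refers to the earlier sketch):

```python
import torch

def symbol_embedding(sae, hidden_states: torch.Tensor) -> torch.Tensor:
    """Aggregate sparse codes over contexts where a symbol is 'in play'.
    hidden_states: (n_contexts, d_model) activations from the frozen base model."""
    with torch.no_grad():
        _, z = sae(hidden_states)  # (n_contexts, d_dict) sparse codes
    return z.mean(dim=0)           # the symbol's coordinate in concept space

# A partial dictionary of named features: anchors only, not a full naming.
feature_labels = {845: "duty-of-care-like concern", 12: "legal obligation"}

emb = symbol_embedding(sae, torch.randn(100, 768))  # stand-in probe activations
top = torch.topk(emb, k=5).indices.tolist()
print([feature_labels.get(i, f"feature_{i} (unnamed)") for i in top])
```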

Behavioral Relevance via Counterfactuals

A feature is only interesting, as part of a bridge, if it actually influences the model's behavior, not just if it correlates with a pattern in the data. In causal terms, we care about whether the feature lies on a causal path in the network's computation from input to output: if we perturb the feature while holding everything else fixed, does the model's behavior change in the way its believed meaning would predict?

Formally, altering a feature is similar to an intervention of the form \(\text{do}(z = c)\) in the causal sense, where we overwrite that internal variable and rerun the computation. But unlike classical causal inference, we do not really need Pearl's do-calculus to identify \(P(y \mid \text{do}(z))\). The neural network is a fully observable and intervenable system, so we can simply execute the intervention on the internal nodes and observe the new output. In this sense, neural networks give us the luxury of performing idealized interventions that are impossible in most real-world social or economic systems.
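In PyTorch, this luxury is quite literal: a forward hook can overwrite an internal activation, executing \(\text{do}(z = c)\) directly, after which we simply rerun the model (a sketch; the module path in the usage line is a placeholder):

```python
import torch

def do_intervention(model, layer: torch.nn.Module, c: torch.Tensor, x: torch.Tensor):
    """Hard intervention do(z = c): overwrite one layer's output, rerun the model."""
    def overwrite(module, inputs, output):
        return c.expand_as(output)   # replace the internal activation wholesale

    handle = layer.register_forward_hook(overwrite)
    try:
        with torch.no_grad():
            return model(x)          # the counterfactual output under do(z = c)
    finally:
        handle.remove()              # restore the unintervened model

# y_cf = do_intervention(model, model.layers[6].mlp, c, x)  # placeholder module path
```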

Intervening on SAE features is conceptually similar but implemented differently. We typically do not know what an arbitrary value in the feature space would mean, so the hard intervention described above may not be meaningful. Instead, we amplify or suppress the magnitude of an existing feature, which behaves more like a soft intervention: the structural graph is left untouched, but the feature's effective influence is changed. Because the SAE reconstructs hidden activations as a linear combination of a small number of semantically meaningful features, we can change the coefficients of those features to implement meaningful, localized interventions without affecting other features.
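A sketch of such a soft intervention (the steering factor and feature index are illustrative; `sae` again refers to the earlier sketch): encode the hidden state, rescale one feature's coefficient, decode, and let the model continue from the edited activation.

```python
import torch

def steer_feature(sae, h: torch.Tensor, feature_idx: int, scale: float):
    """Soft intervention: amplify (scale > 1) or suppress (scale < 1) one
    SAE feature's coefficient, leaving all other features untouched."""
    with torch.no_grad():
        z = torch.relu(sae.encoder(h))     # sparse code of the hidden state
        z_edit = z.clone()
        z_edit[..., feature_idx] *= scale  # rescale one concept's influence
        # Keep the part of h the SAE cannot reconstruct, so the edit stays local.
        residual = h - sae.decoder(z)
        return sae.decoder(z_edit) + residual

# h_new = steer_feature(sae, h, feature_idx=845, scale=4.0)
# ...then resume the base model's forward pass from h_new.
```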

Symbolic-System-Based Compression as an Alignment Process

Now let us take a slightly different view. While neural networks compress the world into highly abstract, continuous manifolds, symbolic systems compress it into a human-defined space with semantically meaningful axes along which the system's behaviors can be judged. From this angle, compressing information into the symbolic space is an alignment process, in which a messy, high-dimensional world is projected onto a space whose coordinates reflect human concepts, interests, and values.

When we introduce symbols like "duty of care", "risk of violence", or "protected attribute" into a symbolic system, we are not just inventing labels. This compression process does three things at once:

– It selects which parts of the world the system is obliged to care about (and which it is supposed to ignore).

– It creates a shared vocabulary so that different stakeholders can reliably point to "the same thing" in disputes and audits.

– It turns those symbols into commitment points: once written down, they can be cited, challenged, and reinterpreted, but not quietly erased.

By contrast, a purely neural compression lives entirely inside the model. Its latent axes are unnamed, its geometry is private, and its content can drift as training data or fine-tuning objectives change. Such a representation is excellent for generalization, but poor as a locus of obligation. It is hard to say, in that space alone, what the system owes to anyone, or which distinctions it is supposed to treat as invariant. In other words, neural compression serves prediction, while symbolic compression serves alignment with a human normative frame.

Once you see symbolic systems as alignment maps rather than mere rule lists, the connection to accountability becomes direct. To say "the model must not discriminate on protected attributes", or "the model must apply a duty-of-care standard", is to insist that certain symbolic distinctions be reflected, in a stable way, inside its internal concept space, and that we be able to locate, probe, and, if necessary, correct those reflections. And this accountability is usually desired, even at the cost of compromising part of the model's capability.

From Hidden Law to Shared Symbols

In the Zuo Zhuan, the Jin statesman Shu Xiang once wrote to Zi Chan of Zheng: "When punishment is unknown, deterrence becomes unfathomable." For centuries, the ruling class maintained order through secrecy, believing that fear thrived where understanding ended. That is why it became a milestone in ancient Chinese history when Zi Chan shattered that tradition, casting the criminal code onto bronze tripods and displaying it publicly in 536 BCE. AI systems now face a similar problem. Who will be the next Zi Chan?

    References

• Bloom, J., Elhage, N., Nanda, N., Heimersheim, S., & Ngo, R. (2024). Scaling monosemanticity: Sparse autoencoders and language models. Anthropic.
• Garcez, A. d'Avila, Gori, M., Lamb, L. C., Serafini, L., Spranger, M., & Tran, S. N. (2019). Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. FLAIRS Conference Proceedings, 32, 1–6.
• Gao, L., Dupré la Tour, T., Tillman, H., Goh, G., Troll, R., Radford, A., Sutskever, I., Leike, J., & Wu, J. (2024). Scaling and evaluating sparse autoencoders.
• Bartlett, P. L., Foster, D. J., & Telgarsky, M. (2017). Spectrally-normalized margin bounds for neural networks. Advances in Neural Information Processing Systems, 30, 6241–6250.
• Chiang, T. (2023, February 9). ChatGPT Is a Blurry JPEG of the Web. The New Yorker.
• Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge University Press.
• Donoghue v Stevenson [1932] AC 562 (HL).


