Meet SymTorch: A PyTorch Library that Translates Deep Learning Models into Human-Readable Equations

Can symbolic regression be the key to transforming opaque deep learning models into interpretable, closed-form mathematical equations? Say you have trained your deep learning model. It works. But do you know what it has actually learned? A team of University of Cambridge researchers proposes SymTorch, a library designed to integrate symbolic regression (SR) into deep learning workflows. It enables researchers to approximate neural network components with closed-form mathematical expressions, facilitating functional interpretability and potential inference acceleration.

https://arxiv.org/pdf/2602.21307

Core Mechanism: The Wrap-Distill-Switch Workflow

SymTorch simplifies the engineering required to extract symbolic equations from trained models by automating data movement and hook management.

Wrap: Users apply the SymbolicModel wrapper to any nn.Module or callable function.

Distill: The library registers forward hooks to record input and output activations during a forward pass. These are cached and transferred from the GPU to the CPU for symbolic regression via PySR.

Switch: Once distilled, the original neural weights can be replaced with the discovered equation in the forward pass using switch_to_symbolic.
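To make the wrap-distill-switch pattern concrete, here is a minimal, self-contained sketch of the idea in plain Python with numpy. The class and method names mirror the workflow described above but are hypothetical stand-ins, not SymTorch's actual API, and a polynomial fit stands in for genuine symbolic regression:

```python
import numpy as np

class SymbolicWrapper:
    """Illustrative stand-in for a SymbolicModel-style wrapper (hypothetical API)."""
    def __init__(self, fn):
        self.fn = fn                    # the wrapped "neural" component
        self.inputs, self.outputs = [], []
        self.surrogate = None           # closed-form replacement, once distilled

    def __call__(self, x):
        if self.surrogate is not None:  # "switch": symbolic forward pass
            return self.surrogate(x)
        y = self.fn(x)
        self.inputs.append(x)           # "distill": cache input/output activations
        self.outputs.append(y)
        return y

    def switch_to_symbolic(self):
        # Stand-in for symbolic regression: fit a degree-3 polynomial to the cache.
        X = np.concatenate(self.inputs)
        Y = np.concatenate(self.outputs)
        coeffs = np.polyfit(X, Y, deg=3)
        self.surrogate = lambda x: np.polyval(coeffs, x)

# The "trained network" here is just x**3 - x, so the surrogate recovers it exactly.
wrapped = SymbolicWrapper(lambda x: x**3 - x)
for _ in range(4):
    wrapped(np.random.uniform(-2, 2, size=64))  # forward passes populate the cache
wrapped.switch_to_symbolic()
err = abs(wrapped(np.array([0.5]))[0] - (0.5**3 - 0.5))
```

In SymTorch itself, the caching step is handled by registered forward hooks and the regression is delegated to PySR; the sketch only illustrates the control flow.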

The library interfaces with PySR, which uses a multi-population genetic algorithm to find equations that balance accuracy and complexity on a Pareto front. The ‘best’ equation is chosen by maximizing the fractional drop in log mean absolute error relative to an increase in complexity.
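The selection rule can be sketched as follows. This is a simplified reading of PySR's "best" criterion, not its exact implementation: walk the Pareto front from simplest to most complex, score each equation by its drop in log-loss per unit of added complexity, and keep the maximizer:

```python
import math

# Hypothetical Pareto front: (complexity, mean absolute error), simplest first.
front = [(1, 0.90), (3, 0.50), (5, 0.049), (9, 0.048)]

def best_equation(front):
    """Pick the equation maximizing the fractional drop in log-MAE
    per unit increase in complexity (simplified sketch)."""
    best_idx, best_score = 0, float("-inf")
    for i in range(1, len(front)):
        dc = front[i][0] - front[i - 1][0]
        score = -(math.log(front[i][1]) - math.log(front[i - 1][1])) / dc
        if score > best_score:
            best_idx, best_score = i, score
    return front[best_idx]

chosen = best_equation(front)
```

On this toy front the rule picks the complexity-5 equation: the jump from 0.50 to 0.049 MAE is a large log-loss drop for only two extra units of complexity, whereas the complexity-9 equation buys almost no further accuracy.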

Case Study: Accelerating LLM Inference

A primary application explored in this research is replacing Multi-Layer Perceptron (MLP) layers in Transformer models with symbolic surrogates to improve throughput.

Implementation Details

Due to the high dimensionality of LLM activations, the research team employed Principal Component Analysis (PCA) to compress inputs and outputs before performing SR. For the Qwen2.5-1.5B model, they selected 32 principal components for inputs and 8 for outputs across three targeted layers.
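The component counts (32 in, 8 out) come from the paper; the PCA machinery itself can be sketched with plain numpy. The code below is illustrative, not the authors' pipeline, and uses random arrays as stand-ins for MLP activations (hidden size 1536 is Qwen2.5-1.5B's, but nothing here depends on it):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_pca(X, k):
    """Return (mean, top-k principal directions) via SVD of centered data."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

# Toy stand-ins for cached MLP input/output activations.
X = rng.normal(size=(256, 1536))
Y = rng.normal(size=(256, 1536))

mu_x, Px = fit_pca(X, 32)   # compress inputs to 32 components
mu_y, Py = fit_pca(Y, 8)    # compress outputs to 8 components

Z_in  = (X - mu_x) @ Px.T   # (256, 32): SR is fit in this reduced space
Z_out = (Y - mu_y) @ Py.T   # (256, 8)
Y_hat = Z_out @ Py + mu_y   # decode back to model space at inference time
```

This also makes the perplexity result easier to interpret: information discarded by projecting a 1536-dimensional activation onto 8 output components is lost regardless of how good the symbolic fit in the reduced space is.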

Performance Trade-offs

The intervention resulted in an 8.3% increase in token throughput. However, this gain came with a non-trivial increase in perplexity, primarily driven by the PCA dimensionality reduction rather than the symbolic approximation itself.

Metric                     Baseline (Qwen2.5-1.5B)   Symbolic Surrogate
Perplexity (Wikitext-2)    10.62                     13.76
Throughput (tokens/s)      4878.82                   5281.42
Avg. Latency (ms)          209.89                    193.89

GNNs and PINNs

SymTorch was validated on its ability to recover known physical laws from latent representations in scientific models.

Graph Neural Networks (GNNs): By training a GNN on particle dynamics, the research team used SymTorch to recover empirical force laws, such as inverse-square gravity (1/r²) and spring forces, directly from the edge messages.
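As a toy illustration of what "recovering a force law from edge messages" means (synthetic data, not the paper's GNN), one can fit the exponent of a pairwise message that follows an inverse-square law:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "edge messages": magnitudes following F = G / r^2.
G = 6.674e-11
r = rng.uniform(0.5, 5.0, size=200)
F = G / r**2

# In log space, log F = log G - 2 log r, so a linear fit recovers the exponent.
slope, intercept = np.polyfit(np.log(r), np.log(F), 1)
```

A slope of -2 is exactly the inverse-square signature; symbolic regression generalizes this idea beyond power laws by searching over a grammar of operators rather than a fixed functional form.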

Physics-Informed Neural Networks (PINNs): The library successfully distilled the 1-D heat equation’s analytic solution from a trained PINN. The PINN’s inductive bias allowed it to achieve a Mean Squared Error (MSE) of 7.40 × 10⁻⁶.
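For reference, a standard textbook instance of such an analytic solution (not necessarily the paper's exact setup): the 1-D heat equation u_t = α·u_xx with u(x, 0) = sin(πx) and zero boundaries on [0, 1] is solved in closed form by u(x, t) = exp(−απ²t)·sin(πx). A quick finite-difference residual check confirms this:

```python
import numpy as np

alpha = 1.0

def u(x, t):
    # Analytic solution of u_t = alpha * u_xx, u(x,0)=sin(pi x), u(0,t)=u(1,t)=0.
    return np.exp(-alpha * np.pi**2 * t) * np.sin(np.pi * x)

# Central differences: the PDE residual should vanish up to truncation error.
x = np.linspace(0.1, 0.9, 9)
t, h = 0.05, 1e-4
u_t  = (u(x, t + h) - u(x, t - h)) / (2 * h)
u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
residual = np.max(np.abs(u_t - alpha * u_xx))
```

Expressions of exactly this shape are what a symbolic distillation of a heat-equation PINN should ideally return.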

LLM Arithmetic Analysis: Symbolic distillation was used to inspect how models like Llama-3.2-1B perform 3-digit addition and multiplication. The distilled equations revealed that while the models are often correct, they rely on internal heuristics that include systematic numerical errors.

Key Takeaways

Automated Symbolic Distillation: SymTorch is a library that automates the process of replacing complex neural network components with interpretable, closed-form mathematical equations by wrapping components and collecting their input-output behavior.

Engineering Barrier Removal: The library handles critical engineering challenges that previously hindered the adoption of symbolic regression, including GPU-CPU data transfer, input-output caching, and seamless switching between neural and symbolic forward passes.

LLM Inference Acceleration: A proof-of-concept demonstrated that replacing MLP layers in a transformer model with symbolic surrogates achieved an 8.3% throughput improvement, though with some performance degradation in perplexity.

Scientific Law Discovery: SymTorch was successfully used to recover physical laws from Graph Neural Networks (GNNs) and analytic solutions to the 1-D heat equation from Physics-Informed Neural Networks (PINNs).

Functional Interpretability of LLMs: By distilling the end-to-end behavior of LLMs, researchers could inspect the explicit mathematical heuristics used for tasks like arithmetic, revealing where internal logic deviates from exact operations.

Check out the Paper, Repo and Project Page.
The post Meet SymTorch: A PyTorch Library that Translates Deep Learning Models into Human-Readable Equations appeared first on MarkTechPost.
