SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Abstract
Recent advances in computer vision (CV) and natural language processing (NLP) have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, steering models toward simple and generalizable solutions. However, in deep RL, designing and scaling up networks have been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs with running statistics, (ii) a residual feedforward block that provides a linear pathway from input to output, and (iii) a layer normalization that controls feature magnitudes. Scaling up parameters with SimBa consistently improves the sample efficiency of various deep RL algorithms, including off-policy, on-policy, and unsupervised methods. Moreover, solely by integrating the SimBa architecture into SAC, it matches or surpasses state-of-the-art deep RL methods with high computational efficiency across DMC, MyoSuite, and HumanoidBench. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.
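To make the three components concrete, here is a minimal PyTorch sketch of a SimBa-style encoder. The class names (`RunningStatNorm`, `ResidualFFBlock`, `SimbaEncoder`) and hyperparameters (hidden width 512, 4x MLP expansion, 2 blocks) are illustrative assumptions for this sketch, not the paper's exact configuration; see the linked repository for the official implementation.

```python
# Illustrative sketch of the SimBa components described in the abstract.
# Hyperparameters and class names are assumptions, not the official API.
import torch
import torch.nn as nn


class RunningStatNorm(nn.Module):
    """(i) Observation normalization with running mean/variance."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.register_buffer("mean", torch.zeros(dim))
        self.register_buffer("var", torch.ones(dim))
        self.register_buffer("count", torch.tensor(1e-4))

    @torch.no_grad()
    def update(self, x: torch.Tensor) -> None:
        # Welford-style batched update of the running statistics.
        batch_mean = x.mean(dim=0)
        batch_var = x.var(dim=0, unbiased=False)
        batch_count = x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        m2 = (self.var * self.count + batch_var * batch_count
              + delta.pow(2) * self.count * batch_count / total)
        self.mean.copy_(self.mean + delta * batch_count / total)
        self.var.copy_(m2 / total)
        self.count.copy_(total)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            self.update(x)
        return (x - self.mean) / torch.sqrt(self.var + self.eps)


class ResidualFFBlock(nn.Module):
    """(ii) Pre-LayerNorm residual feedforward block."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.ReLU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection keeps a linear path from input to output.
        return x + self.mlp(self.norm(x))


class SimbaEncoder(nn.Module):
    """Running-stat norm -> linear embed -> residual blocks -> (iii) LayerNorm."""

    def __init__(self, obs_dim: int, hidden_dim: int = 512, num_blocks: int = 2):
        super().__init__()
        self.obs_norm = RunningStatNorm(obs_dim)
        self.embed = nn.Linear(obs_dim, hidden_dim)
        self.blocks = nn.ModuleList(
            [ResidualFFBlock(hidden_dim) for _ in range(num_blocks)]
        )
        self.post_norm = nn.LayerNorm(hidden_dim)  # controls feature magnitudes

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        x = self.embed(self.obs_norm(obs))
        for block in self.blocks:
            x = block(x)
        return self.post_norm(x)
```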
Community
Want faster, smarter RL? Check out SimBa – our new architecture that scales like crazy!
📄 project page: https://sonyresearch.github.io/simba
📄 arXiv: https://arxiv.org/abs/2410.09754
🔗 code: https://github.com/SonyResearch/simba
🚀 Tired of slow training times and underwhelming results in deep RL?
With SimBa, you can effortlessly scale your parameters and reach state-of-the-art performance without changing the core RL algorithm.
💡 How does it work?
Just swap out your MLP networks for SimBa and watch the magic happen! In just 1-3 hours on a single NVIDIA RTX 3090, you can train agents that outperform the best across benchmarks like DMC, MyoSuite, and HumanoidBench. 🦾
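To illustrate the "just swap out your MLP" idea, here is a hypothetical example of using a SimBa-style encoder as the trunk of a SAC critic. It assumes the illustrative `SimbaEncoder` class from the sketch under the abstract is in scope; the names here are assumptions for this sketch, not the official API.

```python
import torch
import torch.nn as nn


class SimbaCritic(nn.Module):
    """SAC-style Q-network whose plain MLP trunk is replaced by a SimBa encoder."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 512):
        super().__init__()
        # SimbaEncoder is the illustrative class sketched under the abstract.
        self.encoder = SimbaEncoder(obs_dim + act_dim, hidden_dim, num_blocks=2)
        self.q_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Everything else in SAC (targets, entropy term, updates) stays unchanged.
        return self.q_head(self.encoder(torch.cat([obs, act], dim=-1)))
```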
⚙️ Why it’s awesome:
Plug-and-play with RL algorithms like SAC, DDPG, TD-MPC2, PPO, and METRA.
No need to tweak your favorite algorithms—just switch to SimBa and let the scaling power take over.
Train faster, smarter, and better—ideal for researchers, developers, and anyone exploring deep RL!
What a wonderful work!!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- The Role of Deep Learning Regularizations on Actors in Offline RL (2024)
- MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL (2024)
- Masked Generative Priors Improve World Models Sequence Modelling Capabilities (2024)
- Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn (2024)
- Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization (2024)