arxiv:2410.09754

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

Published on Oct 13
· Submitted by godnpeter on Oct 16
Authors:

Abstract

Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, guiding models toward simple and generalizable solutions. However, in deep RL, designing and scaling up networks have been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs with running statistics, (ii) a residual feedforward block that provides a linear pathway from input to output, and (iii) a layer normalization that controls feature magnitudes. By scaling up parameters with SimBa, the sample efficiency of various deep RL algorithms, including off-policy, on-policy, and unsupervised methods, is consistently improved. Moreover, solely by integrating the SimBa architecture into SAC, it matches or surpasses state-of-the-art deep RL methods with high computational efficiency across DMC, MyoSuite, and HumanoidBench. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.
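For concreteness, here is a minimal PyTorch sketch of the three components described in the abstract. The names (RunningObsNorm, ResidualFFBlock, SimBaEncoder) and the default hidden width and depth are illustrative assumptions, not the authors' implementation; see the official repository linked below for the real code.

```python
import torch
import torch.nn as nn

class RunningObsNorm(nn.Module):
    """(i) Standardize observations with running mean/variance statistics."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.register_buffer("mean", torch.zeros(dim))
        self.register_buffer("var", torch.ones(dim))
        self.register_buffer("count", torch.tensor(1e-4))

    @torch.no_grad()
    def update(self, x):
        # Parallel (Chan et al.) update for merging batch and running stats.
        batch_mean = x.mean(0)
        batch_var = x.var(0, unbiased=False)
        n = x.shape[0]
        total = self.count + n
        delta = batch_mean - self.mean
        self.mean += delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta.pow(2) * self.count * n / total) / total
        self.count = total

    def forward(self, x):
        if self.training:
            self.update(x)
        return (x - self.mean) / torch.sqrt(self.var + self.eps)

class ResidualFFBlock(nn.Module):
    """(ii) Pre-LN residual feedforward block: keeps a linear input-to-output path."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * expansion), nn.ReLU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))  # residual preserves the identity pathway

class SimBaEncoder(nn.Module):
    """Obs norm -> linear embed -> residual blocks -> (iii) final LayerNorm."""
    def __init__(self, obs_dim, hidden=256, depth=2):
        super().__init__()
        self.obs_norm = RunningObsNorm(obs_dim)
        self.embed = nn.Linear(obs_dim, hidden)
        self.blocks = nn.Sequential(*[ResidualFFBlock(hidden) for _ in range(depth)])
        self.out_norm = nn.LayerNorm(hidden)  # controls feature magnitudes

    def forward(self, obs):
        return self.out_norm(self.blocks(self.embed(self.obs_norm(obs))))
```

Under this reading, scaling up amounts to increasing hidden or depth; the normalization layers and the residual pathway are what bias the larger network toward simple solutions.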

Community

Paper author · Paper submitter

Want faster, smarter RL? Check out SimBa – our new architecture that scales like crazy!
📄 project page: https://sonyresearch.github.io/simba
📄 arXiv: https://arxiv.org/abs/2410.09754
🔗 code: https://github.com/SonyResearch/simba

🚀 Tired of slow training times and underwhelming results in deep RL?
With SimBa, you can effortlessly scale your parameters and hit State-of-the-Art performance—without changing the core RL algorithm.

💡 How does it work?
Just swap out your MLP networks for SimBa, and watch the magic happen! In just 1-3 hours on a single Nvidia RTX 3090, you can train agents that outperform the best across benchmarks like DMC, MyoSuite, and HumanoidBench. 🦾
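
As a hedged illustration of the swap (building on the SimBaEncoder sketch above; the Actor class and head names here are hypothetical), replacing an MLP torso leaves the rest of SAC untouched:

```python
import torch.nn as nn

class Actor(nn.Module):
    """SAC-style actor: only the torso changes; losses and updates stay the same."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        # Before: self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), ...)
        self.torso = SimBaEncoder(obs_dim, hidden=hidden, depth=2)  # from the sketch above
        self.mu = nn.Linear(hidden, act_dim)       # Gaussian policy mean head
        self.log_std = nn.Linear(hidden, act_dim)  # Gaussian policy log-std head

    def forward(self, obs):
        h = self.torso(obs)
        return self.mu(h), self.log_std(h)
```

Critic networks can swap their torsos the same way.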

⚙️ Why it’s awesome:
Plug-and-play with RL algorithms like SAC, DDPG, TD-MPC2, PPO, and METRA.
No need to tweak your favorite algorithms—just switch to SimBa and let the scaling power take over.
Train faster, smarter, and better—ideal for researchers, developers, and anyone exploring deep RL!

🎯 Try it now and watch your RL models evolve!
[Animation: simba_videos.gif]

What a wonderful work!!

