MAMBA from Scratch: Neural Nets Better and Faster than Transformers

Published 2024-04-30
Mamba is a new neural network architecture that came out this year, and it performs better than transformers at language modelling! This is probably the most exciting development in AI since the transformer in 2017. In this video I explain how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!
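For readers who want the starting point in code, here is a minimal sketch (not code from the video) of the linear RNN recurrence the derivation builds on, h_t = A h_{t-1} + B x_t; the matrix names and toy dimensions are illustrative:

```python
# Minimal sketch of a linear RNN: unlike a classic RNN there is no
# nonlinearity between steps, which is what later makes it parallelizable.
import numpy as np

def linear_rnn(A, B, xs, h0):
    """Run the recurrence h_t = A h_{t-1} + B x_t sequentially."""
    h = h0
    hs = []
    for x in xs:
        h = A @ h + B @ x
        hs.append(h)
    return np.stack(hs)

# Toy example: 2-dim hidden state, 3-dim inputs, 5 time steps.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) * 0.5   # scaled down to keep the state stable
B = rng.normal(size=(2, 3))
xs = rng.normal(size=(5, 3))
print(linear_rnn(A, B, xs, np.zeros(2)))
```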

Mamba paper: openreview.net/forum?id=AL1fq05o7H
Linear RNN paper: openreview.net/forum?id=M3Yd3QyRG4

#mamba
#deeplearning
#largelanguagemodels

00:00 Intro
01:33 Recurrent Neural Networks
05:24 Linear Recurrent Neural Networks
06:57 Parallelizing Linear RNNs
15:33 Vanishing and Exploding Gradients
19:08 Stable initialization
21:53 State Space Models
24:33 Mamba
25:26 The High Performance Memory Trick
27:35 The Mamba Drama

All Comments (21)
  • As someone actively working on this stuff, this channel has the best explanations on the internet, and the 'tuber actually understands what is going on.
  • Wow, you've made some difficult, I mean extremely difficult, algorithms look easy. Thank you.
  • @jarib3858
    One small note on RNNs: reservoir computing is a very high-dimensional random RNN with a linear regression readout, so there is no exploding or vanishing gradient problem. Reservoir computing is currently the standard for non-linear dynamic time series prediction.
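A minimal sketch of the echo state network setup this comment describes: the recurrent weights are random and frozen (so no gradients ever flow through time) and only a linear readout is fit by least squares. The spectral scaling, reservoir size, and toy task are illustrative assumptions, not from the comment.

```python
# Echo state network sketch: fixed random reservoir + linear regression readout.
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 1                       # reservoir size, input dimension

W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1 (echo state property)
W_in = rng.normal(size=(N, D))

def reservoir_states(xs):
    """Drive the fixed random reservoir with the input sequence."""
    h = np.zeros(N)
    hs = []
    for x in xs:
        h = np.tanh(W @ h + W_in @ x)
        hs.append(h)
    return np.stack(hs)

# Toy task: one-step-ahead prediction of a sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
xs = np.sin(t)[:, None]
H = reservoir_states(xs[:-1])
W_out = np.linalg.lstsq(H, xs[1:], rcond=None)[0]  # the linear regression readout
print("train MSE:", np.mean((H @ W_out - xs[1:]) ** 2))
```

Because only W_out is trained, there is no backpropagation through time at all, which is why vanishing and exploding gradients never arise.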
  • @timseguine2
    Thanks for the clear explanation. This gives me enough understanding to not only implement it myself, but to also have some ideas for sensible architecture modifications.
  • @EkShunya
    please open your community tab your content is incredible
  • @rikkathemejo
    Nice video! I just wanted to point out that the parallel scan algorithm can also be implemented in O(n) time (instead of the O(n log(n)) version presented in the video), and this is the version that Mamba uses.
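A small sketch of why any scan algorithm applies here: composing two steps of the (diagonal) recurrence h_t = a_t * h_{t-1} + b_t is associative, so the work can be arranged as the O(n log n)-work tree from the video or the O(n)-work (Blelloch-style) scan this comment refers to. The names and the recursive grouping below are illustrative, not Mamba's kernel.

```python
# Associative step-composition for the recurrence h_t = a_t * h_{t-1} + b_t.
import numpy as np

def combine(p, q):
    """Compose two recurrence steps (a, b): apply p first, then q."""
    a1, b1 = p
    a2, b2 = q
    return a2 * a1, a2 * b1 + b2

rng = np.random.default_rng(0)
a = rng.normal(size=8) * 0.5
b = rng.normal(size=8)

# Sequential reference: h_t = a_t * h_{t-1} + b_t with h_0 = 0.
h, seq = 0.0, []
for t in range(8):
    h = a[t] * h + b[t]
    seq.append(h)

# Same result via combine, folded in an arbitrary grouping, as a
# tree-structured scan would do across parallel workers.
def fold(steps):
    if len(steps) == 1:
        return steps[0]
    mid = len(steps) // 2
    return combine(fold(steps[:mid]), fold(steps[mid:]))

a_tot, b_tot = fold(list(zip(a, b)))
print(np.allclose(seq[-1], a_tot * 0.0 + b_tot))  # True: same final state
```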
  • @RexPilger
    About peer review: As one comment noted, there could be many more candidate papers submitted than could be accommodated at the venue. However, as this video argues, the justification for rejecting this paper was inadequate at best. Some comments ask whether the rejection matters; for academics, the answer is yes, because presentations and publications count toward tenure, promotions, and raises, plus continued funding of the research.

    Since several comments plus the video indicate that the algorithm had already received a lot of publicity, the rejection may not matter for the project, so long as it can continue to be funded, especially if commercial implementations are successful. What is interesting in any case is that the paper exists; in effect it has been published. The authors may not get the desired credit for formal publication, but their work and the reviewer comments are out there now. A couple of decades ago that would not have been the case; most people in the field would be unaware of the algorithm.

    As for peer review in general (outside of AI): in my field, one of the natural sciences, a paper I submitted for publication encountered an editor plus two reviewers who were well qualified in the field; after asking for two revisions to the manuscript, they rejected the third version. Interestingly, all three scientists had published research which my paper undermined; had my manuscript been published, they might well have lost funding for their research or even their positions (I speculate here). Peer review cuts both ways. While iterating with the editor and reviewers I continued to expand my research project and made some additional discoveries. Following the rejection I wrote a completely different paper which incorporated my initial work supplemented by the new discoveries; happily it was published a few months ago (in a different journal). I'm formally retired now, but continue to do research.

    To young researchers: never give up. Learn from rejection, refine your work, be humble, exercise integrity and honesty, and take pride in your accomplishments, even if only a few know about them. Peer review (by humans) is a necessity and will continue to be. There is no such thing as a perfect filter, but science and technology would be overwhelmed by irrelevancy, dishonesty, and duplication of effort without it. AI may become a useful filtering tool, but science is a human endeavor.
  • Absolutely amazing vid. Just subbed after getting recommended to this channel. Never stop making videos dude <3
  • @ithaca2076
    absolutely love the quality and information of this video!!! please keep up the good work this is amazing
  • @anrilombard1121
    Currently testing it on molecular generation, so excited to see where these strengths hold and where they falter :)
  • @IllIl
    Thank you! Your channel is an invaluable resource on here. Hope you keep making these videos!
  • The level of detail and intuition you dig into is excellent 💯🔥
  • @peterdemore7239
    Brutal. I'm going to have to watch this about 30 times. Love it.
  • @markdatton1348
    Awesome video. I love the speed and the depth of this, it's perfect
  • @kalkhasse
    I love how you nail the level of detail in the explanations. Perfect for me at least.
  • @honglu679
    Wow, excellent explanation. It covers all the essence of the paper with just enough math/algo. Thank you so much! If you don't mind, please make a video on RWKV (v6 has some new modifications), which is another strong linear RNN model. I'm curious how it compares to Mamba.
  • @tellu5493
    This was very good, and I hope you make more videos like this!
  • @BooleanDisorder
    You have such a pleasant voice 😊 Thanks for helping me understand better. Please keep making videos. ❤
  • Subscribed! That's some 3Blue1Brown-level stuff! Amazing!