MAMBA and State Space Models explained | SSM explained

Published 2024-02-17
We explain and illustrate Mamba, State Space Models (SSMs), and Selective SSMs in simple terms.
SSMs match the performance of Transformers while being faster and more memory-efficient, which is crucial for long sequences!
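For readers who want a code-level picture before watching: below is a minimal NumPy sketch of the kind of linear recurrence an SSM computes (the names A_bar, B_bar, C and all shapes are illustrative assumptions; the video explains where these matrices come from). The recurrence costs O(1) per step in sequence length, which is where the speed and memory advantage over attention's O(L²) comes from.

```python
import numpy as np

# Minimal sketch of the discretized SSM recurrence (illustrative parameters,
# not trained Mamba weights):
#   h_t = A_bar @ h_{t-1} + B_bar @ x_t   (hidden state update)
#   y_t = C @ h_t                         (readout)
# Each step costs O(1) in the sequence length L, so a full pass is O(L),
# versus the O(L^2) pairwise interactions of self-attention.

def ssm_recurrence(A_bar, B_bar, C, xs):
    """Run the linear recurrence over a sequence xs of shape (L, d_in)."""
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x in xs:
        h = A_bar @ h + B_bar @ x
        ys.append(C @ h)
    return np.stack(ys)

# Toy example: 1-D inputs/outputs, 4-D hidden state, random parameters.
rng = np.random.default_rng(0)
A_bar = 0.9 * np.eye(4)          # stable toy state matrix
B_bar = rng.normal(size=(4, 1))  # input projection
C = rng.normal(size=(1, 4))      # output projection
xs = rng.normal(size=(10, 1))    # sequence of length L = 10
print(ssm_recurrence(A_bar, B_bar, C, xs).shape)  # (10, 1)
```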

AI Coffee Break Merch! πŸ›οΈ aicoffeebreak.creator-spring.... To celebrate our merch launch, here is a limited-time offer! πŸ‘‰ Get a 25% discount on AI Coffee Break Merch with the code MAMBABEAN.

Thanks to our Patrons who support us in Tier 2, 3, 4: πŸ™
Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael

Outline:
00:00 Mamba to replace Transformers!?
02:04 State Space Models (SSMs) – high level
03:09 State Space Models (SSMs) – more detail
05:45 Discretization step in SSMs
08:14 SSMs are fast! Here is why.
09:55 SSM training: Convolution trick (see the sketch after this outline)
12:01 Selective SSMs
15:44 MAMBA Architecture
17:57 Mamba results
20:15 Building on Mamba
21:00 Do RNNs have a comeback?
21:42 AICoffeeBreak Merch
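
As a companion to the "09:55 SSM training: Convolution trick" chapter, here is a minimal NumPy sketch (toy, assumed parameters; not the actual S4/Mamba kernels) of why the recurrent and convolutional views of a linear time-invariant SSM compute the same outputs:

```python
import numpy as np

# Sketch of the "convolution trick": for a linear, time-invariant SSM,
# unrolling h_t = A h_{t-1} + B x_t, y_t = C h_t gives
#   y_t = sum_{k=0..t} (C A^k B) x_{t-k},
# i.e. a 1-D causal convolution with kernel K = (C B, C A B, C A^2 B, ...).
# At training time the whole kernel can be precomputed and applied in parallel.

rng = np.random.default_rng(0)
L, d = 10, 4
A = 0.9 * np.eye(d)                 # toy (assumed) parameters, scalar in/out
B = rng.normal(size=(d, 1))
C = rng.normal(size=(1, d))
x = rng.normal(size=L)

# Recurrent form: O(L) sequential steps.
h = np.zeros(d)
y_rec = np.empty(L)
for t in range(L):
    h = A @ h + B[:, 0] * x[t]
    y_rec[t] = (C @ h)[0]

# Convolutional form: precompute K_k = C A^k B, then one causal convolution.
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B)[0, 0] for k in range(L)])
y_conv = np.convolve(x, K)[:L]

print(np.allclose(y_rec, y_conv))   # True: both forms give the same outputs
```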

πŸ“„ Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023). arxiv.org/abs/2312.00752
πŸ“„ MoE-Mamba: arxiv.org/abs/2401.04081
πŸ“„ Vision Mamba: arxiv.org/abs/2401.09417
πŸ“„ MambaByte: arxiv.org/abs/2401.13660
πŸ•ŠοΈ Mamba rejected from ICLR: twitter.com/srush_nlp/status/1750526956452577486
πŸ“– Prefix sum (Scan) with Cuda: developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-…
πŸ“Ί Transformer explained: The Transformer explained by Ms. Coff...

Great resources to learn about Mamba:
πŸ“™ Mamba: jameschen.io/jekyll/update/2024/02/12/mamba.html
πŸ“• The Annotated S4: srush.github.io/annotated-s4/
πŸ“˜ Mamba The Easy Way: jackcook.com/2024/02/23/mamba.html

β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€
πŸ”₯ Optionally, buy us a coffee to help with our Coffee Bean production! β˜•
Patreon: www.patreon.com/AICoffeeBreak
Ko-fi: ko-fi.com/aicoffeebreak
Join this channel to get access to perks:
youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/join
β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€

πŸ”— Links:
AICoffeeBreakQuiz: youtube.com/c/AICoffeeBreak/community
Twitter: twitter.com/AICoffeeBreak
Reddit: www.reddit.com/r/AICoffeeBreak/
YouTube: youtube.com/AICoffeeBreak

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Scientific advising by Mara Popescu
Video editing: Nils Trost
Music 🎡: Sunny Days – A

All Comments (21)
  • @drummatick
    I have a question. Given that SSMs are entirely linear, how do they conform with the universal approximation theorem? I mean, a lack of non-linear activations should imply that they would be particularly bad at approximating functions, but they are not. Am I missing something? Also, really loved the video!
  • @ShadowHarborer
    I have to give a presentation on Mamba next week and I've been waiting for this video so I could finally learn what the hell I need to talk about
  • @partywen
    Thanks! Looking forward to a Hyena video :)
  • This is exactly the level of detail I needed right now. Thank you so much!
  • @DerPylz
    I was waiting for exactly this topic! Thanks so much! :_beanHeart:
  • @harumambaru
    Nice T-shirt! So excited to listen about new models!
  • @serta5727
    Thank you for the great Mamba explanation
  • @user-lz5kq5jb5z
    Great. There are a lot of failed explanations or completely wrong approaches to SSMs and Mamba on the internet, but I finally found exactly what I wanted. Thank you for the video.
  • @cosmic_reef_17
    Hats off to you for this amazing video! Best explanation of Mamba I have seen.:_beanHeart:
  • @peabrane8067
    Thank you for the shoutout to my repo! I later realized it was an application of a known idea, the "heisen sequence", which is a pretty cool way to do certain associative scan operations via cumsum.
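A minimal NumPy sketch of the cumsum-scan idea from the comment above (my reading of the trick; scalar, positive gates a_t are an assumption here, not necessarily what the repo implements):

```python
import numpy as np

# The gated linear recurrence
#   h_t = a_t * h_{t-1} + b_t,   h_{-1} = 0
# unrolls to h_t = P_t * sum_{k<=t} b_k / P_k with P_t = a_0 * ... * a_t,
# so the whole scan reduces to cumulative products/sums, which parallelize.

rng = np.random.default_rng(0)
L = 8
a = rng.uniform(0.5, 1.0, size=L)   # per-step decay gates (assumed positive)
b = rng.normal(size=L)

# Reference: plain sequential scan.
h_seq = np.empty(L)
h = 0.0
for t in range(L):
    h = a[t] * h + b[t]
    h_seq[t] = h

# Parallel-friendly form via cumprod/cumsum (cumprod shown for clarity;
# in practice one works in log space for numerical stability).
P = np.cumprod(a)
h_par = P * np.cumsum(b / P)

print(np.allclose(h_seq, h_par))    # True
```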
  • @jefferychen8330
    Thanks for the video! I've been confused about the difference between RNNs and Mamba for several days πŸ˜‚
  • @ruchiradhar1589
    A big thanks for a comprehensive explanation of the Mamba Architecture & computations, @AICoffeeBreak!
  • @Thomas-gk42
    I understand 10% of that stuff, but the presentation is lovely.:washhands:
  • @AM-yk5yd
    Great video. Speaking of VMamba: for some reason, people in the medical imaging field seem more excited about Mamba than the LLM folks. There are so many that they are competing among themselves: "Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models. Notably, on AbdomenMRI, Endoscopy, and Microscopy datasets, Swin-UMamba outperforms its closest counterpart U-Mamba by an average score of 3.58%". If you pick a random Mamba paper from arXiv (there are dozens of them already), it will probably be related to medical image segmentation.
  • @MaJetiGizzle
    Thanks for the MAMBA video! I always appreciate your insight on these new, influential papers! Your thoughts always pair well with a good cup of coffee. πŸ˜β˜•οΈ
  • @faysoufox
    Nice video, good overview, which is what I was searching for
  • @AICoffeeBreak, thank you for the awesome video. Very small pet peeve which had me re-check all the math: at 11:20, it would make the explanation much easier to understand if you kept x 0-indexed, as that is the notation you were using from the beginning. Also, maybe make it explicit that you're taking t = L, although this is kind of obvious. This was an awesome lecture, thank you again.
  • @outliier
    Really cool video! Thank youuuu