MAMBA and State Space Models explained | SSM explained

Published 2024-02-17
We explain and illustrate Mamba, State Space Models (SSMs), and Selective SSMs in simple terms.
SSMs match the performance of Transformers while being faster and more memory-efficient, which is crucial for long sequences!
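For readers who want a code-level picture before watching: below is a minimal NumPy sketch of the kind of linear recurrence an SSM computes (the names A_bar, B_bar, C and all shapes are illustrative assumptions; the video explains where these matrices come from). The recurrence costs O(1) per step in sequence length, which is where the speed and memory advantage over attention's O(L²) comes from.

```python
import numpy as np

# Minimal sketch of the discretized SSM recurrence (illustrative parameters,
# not trained Mamba weights):
#   h_t = A_bar @ h_{t-1} + B_bar @ x_t   (hidden state update)
#   y_t = C @ h_t                         (readout)
# Each step costs O(1) in the sequence length L, so a full pass is O(L),
# versus the O(L^2) pairwise interactions of self-attention.

def ssm_recurrence(A_bar, B_bar, C, xs):
    """Run the linear recurrence over a sequence xs of shape (L, d_in)."""
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x in xs:
        h = A_bar @ h + B_bar @ x
        ys.append(C @ h)
    return np.stack(ys)

# Toy example: 1-D inputs/outputs, 4-D hidden state, random parameters.
rng = np.random.default_rng(0)
A_bar = 0.9 * np.eye(4)          # stable toy state matrix
B_bar = rng.normal(size=(4, 1))  # input projection
C = rng.normal(size=(1, 4))      # output projection
xs = rng.normal(size=(10, 1))    # sequence of length L = 10
print(ssm_recurrence(A_bar, B_bar, C, xs).shape)  # (10, 1)
```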

AI Coffee Break Merch! πŸ›οΈ aicoffeebreak.creator-spring.... To celebrate our merch launch, here is a limited-time offer! πŸ‘‰ Get a 25% discount on AI Coffee Break Merch with the code MAMBABEAN.

Thanks to our Patrons who support us in Tier 2, 3, 4: πŸ™
Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael

Outline:
00:00 Mamba to replace Transformers!?
02:04 State Space Models (SSMs) – high level
03:09 State Space Models (SSMs) – more detail
05:45 Discretization step in SSMs
08:14 SSMs are fast! Here is why.
09:55 SSM training: Convolution trick (see the sketch after this outline)
12:01 Selective SSMs
15:44 MAMBA Architecture
17:57 Mamba results
20:15 Building on Mamba
21:00 Do RNNs have a comeback?
21:42 AICoffeeBreak Merch
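
As a companion to the "09:55 SSM training: Convolution trick" chapter, here is a minimal NumPy sketch (toy, assumed parameters; not the actual S4/Mamba kernels) of why the recurrent and convolutional views of a linear time-invariant SSM compute the same outputs:

```python
import numpy as np

# Sketch of the "convolution trick": for a linear, time-invariant SSM,
# unrolling h_t = A h_{t-1} + B x_t, y_t = C h_t gives
#   y_t = sum_{k=0..t} (C A^k B) x_{t-k},
# i.e. a 1-D causal convolution with kernel K = (C B, C A B, C A^2 B, ...).
# At training time the whole kernel can be precomputed and applied in parallel.

rng = np.random.default_rng(0)
L, d = 10, 4
A = 0.9 * np.eye(d)                 # toy (assumed) parameters, scalar in/out
B = rng.normal(size=(d, 1))
C = rng.normal(size=(1, d))
x = rng.normal(size=L)

# Recurrent form: O(L) sequential steps.
h = np.zeros(d)
y_rec = np.empty(L)
for t in range(L):
    h = A @ h + B[:, 0] * x[t]
    y_rec[t] = (C @ h)[0]

# Convolutional form: precompute K_k = C A^k B, then one causal convolution.
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B)[0, 0] for k in range(L)])
y_conv = np.convolve(x, K)[:L]

print(np.allclose(y_rec, y_conv))   # True: both forms give the same outputs
```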

πŸ“„ Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023). arxiv.org/abs/2312.00752
πŸ“„ MoE-Mamba: arxiv.org/abs/2401.04081
πŸ“„ Vision Mamba: arxiv.org/abs/2401.09417
πŸ“„ MambaByte: arxiv.org/abs/2401.13660
πŸ•ŠοΈ Mamba rejected from ICLR: twitter.com/srush_nlp/status/1750526956452577486
πŸ“– Prefix sum (Scan) with Cuda: developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-…
πŸ“Ί Transformer explained: The Transformer explained by Ms. Coff...

Great resources to learn about Mamba:
πŸ“™ Mamba: jameschen.io/jekyll/update/2024/02/12/mamba.html
πŸ“• The Annotated S4: srush.github.io/annotated-s4/
πŸ“˜ Mamba The Easy Way: jackcook.com/2024/02/23/mamba.html

β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€
πŸ”₯ Optionally, buy us a coffee to help with our Coffee Bean production! β˜•
Patreon: www.patreon.com/AICoffeeBreak
Ko-fi: ko-fi.com/aicoffeebreak
Join this channel to get access to perks:
youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/join
β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€β–€

πŸ”— Links:
AICoffeeBreakQuiz: youtube.com/c/AICoffeeBreak/community
Twitter: twitter.com/AICoffeeBreak
Reddit: www.reddit.com/r/AICoffeeBreak/
YouTube: youtube.com/AICoffeeBreak

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Scientific advising by Mara Popescu
Video editing: Nils Trost
Music 🎡: Sunny Days – A

All Comments (21)
  • @drummatick
    I have a question. Given that SSMs are entirely linear, how do they conform with the universal approximation theorem? I mean, a lack of non-linear activations should imply that they would be particularly bad at approximating functions, but they are not. Am I missing something? Also, really loved the video!
  • @ShadowHarborer
    I have to give a presentation on Mamba next week and I've been waiting for this video so I could finally learn what the hell I need to talk about
  • @partywen
    Thanks! Looking forward to a Hyena video :)
  • This is exactly the level of detail I needed right now. Thank you so much!
  • @DerPylz
    I was waiting for exactly this topic! Thanks so much! :_beanHeart:
  • @harumambaru
    Nice T-shirt! So excited to listen about new models!
  • @serta5727
    Thank you for the great Mamba explanation
  • @user-lz5kq5jb5z
    Great. There are a lot of failed explanations or completely wrong approaches to SSMs and Mamba on the internet, but I finally found exactly what I wanted. Thank you for the video.
  • @cosmic_reef_17
    Hats off to you for this amazing video! Best explanation of Mamba I have seen.:_beanHeart:
  • @peabrane8067
    Thank you for the shoutout to my repo! I later realized it was an application of a known idea, the "heisen sequence", which is a pretty cool way to do certain associative scan operations via cumsum.
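A minimal NumPy sketch of the cumsum-scan idea from the comment above (my reading of the trick; scalar, positive gates a_t are an assumption here, not necessarily what the repo implements):

```python
import numpy as np

# The gated linear recurrence
#   h_t = a_t * h_{t-1} + b_t,   h_{-1} = 0
# unrolls to h_t = P_t * sum_{k<=t} b_k / P_k with P_t = a_0 * ... * a_t,
# so the whole scan reduces to cumulative products/sums, which parallelize.

rng = np.random.default_rng(0)
L = 8
a = rng.uniform(0.5, 1.0, size=L)   # per-step decay gates (assumed positive)
b = rng.normal(size=L)

# Reference: plain sequential scan.
h_seq = np.empty(L)
h = 0.0
for t in range(L):
    h = a[t] * h + b[t]
    h_seq[t] = h

# Parallel-friendly form via cumprod/cumsum (cumprod shown for clarity;
# in practice one works in log space for numerical stability).
P = np.cumprod(a)
h_par = P * np.cumsum(b / P)

print(np.allclose(h_seq, h_par))    # True
```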
  • @jefferychen8330
    Thanks for the video! I've been confused about the difference between RNNs and Mamba for several days πŸ˜‚
  • @ruchiradhar1589
    A big thanks for a comprehensive explanation of the Mamba Architecture & computations, @AICoffeeBreak!
  • @Thomas-gk42
    I understand 10% of that stuff, but the presentation is lovely.:washhands:
  • @AM-yk5yd
    Great video. Speaking of VMamba: for some reason, people in the medical imaging field seem more excited about Mamba than the LLM folks. There are so many that they are competing among themselves: "Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models. Notably, on AbdomenMRI, Endoscopy, and Microscopy datasets, Swin-UMamba outperforms its closest counterpart U-Mamba by an average score of 3.58%". If you pick a random Mamba paper from arXiv (there are dozens of them already), it will probably be related to medical image segmentation.
  • @MaJetiGizzle
    Thanks for the MAMBA video! I always appreciate your insight on these new, influential papers! Your thoughts always pair well with a good cup of coffee. πŸ˜β˜•οΈ
  • @faysoufox
    Nice video, good overview, which is what I was searching for
  • @AICoffeeBreak, thank you for the awesome video. Very small pet peeve which had me re-check all the math: at 11:20, it would make the explanation much easier to understand if you kept x 0-indexed, as that is the notation you were using from the beginning. Also, maybe make it explicit that you're taking t = L, although this is kind of obvious. This was an awesome lecture, thank you again.
  • @outliier
    Really cool video! Thank youuuu