How to build Multimodal Retrieval-Augmented Generation (RAG) with Gemini

Published 2024-05-16

The saying "a picture is worth a thousand words" encapsulates the immense potential of visual data. But most retrieval-augmented generation (RAG) applications rely only on text. This session applies RAG to multimodal use cases. It focuses on embeddings and attributed question answering to retrieve data. We'll begin with a high-level architecture and quickly dive into a practical demo. Attendees will learn to create powerful LLM-based workflows and embed them in existing applications.
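
To make the retrieval idea concrete, here is a minimal sketch of embedding-based retrieval with source attribution. It is not the notebook from the session. The corpus, the source labels, and the 3-dimensional vectors are all made-up stand-ins. A real pipeline would get its vectors from a multimodal embedding model and would typically store them in a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical corpus: every chunk keeps a source label so that an answer
# built from it can be attributed. The embeddings are invented toy values,
# not the output of any real model.
CORPUS = [
    {"source": "manual.pdf, page 12 (text)", "embedding": [0.9, 0.1, 0.0]},
    {"source": "manual.pdf, page 13 (image)", "embedding": [0.7, 0.6, 0.1]},
    {"source": "manual.pdf, page 40 (text)", "embedding": [0.0, 0.2, 0.9]},
]


def retrieve(query_embedding, corpus, top_k=2):
    """Return the top_k (score, source) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_embedding, item["embedding"]), item["source"])
        for item in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy query vector, also invented for illustration.
results = retrieve([0.8, 0.3, 0.0], CORPUS)
for score, source in results:
    print(f"{score:.3f}  {source}")
```

The retrieved chunks, together with their source labels, would then be passed to the model as context so that the generated answer can cite where each piece of information came from.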

Speakers: Shilpa Kancharla, Jeff Nelson

Resources:
Try Gemini in Vertex AI → goo.gle/3Vttolh

Watch more:
Check out all the AI videos at Google I/O 2024 → goo.gle/io24-ai-yt
Check out all the Cloud videos at Google I/O 2024 → goo.gle/io24-cloud-yt

Subscribe to Google Developers → goo.gle/developers

#GoogleIO

Event: Google I/O 2024

All Comments (18)

@GoogleDevelopers 1 month ago

Check out all the AI videos at Google I/O 2024 → goo.gle/io24-ai-yt
@jprak123asd 8 days ago

I wanted to extend my heartfelt thanks for the excellent session on how Retrieval-Augmented Generation (RAG) can be used to train Large Language Models (LLMs) to build expert systems in the retail, software, automotive, and other sectors. Your explanation was incredibly clear and insightful, making a complex topic easily understandable. I truly felt like Dr. Watson listening to Sherlock Holmes unravel the mysteries of the universe, marveling at the clarity and depth of the information presented. Your efforts in breaking down the concepts and applications of RAG in such a straightforward manner have left me feeling both enlightened and excited about the potential this technology holds for our industry. Thank you once again for your time and for sharing your expertise. I look forward to exploring and implementing these innovative solutions in our own projects
@charlesbabbage6786 1 month ago

Couldn't find the exact notebook used here.
@dumbol8126 1 month ago

Will there be an open-source version of this, or at least a paper?
@hasszhao 1 month ago

where is this notebook in the cookbook repo?
@mariaescobar8003 1 month ago

When I use RAG, am I sharing my data with the model/company? Or is it private at an extra cost?
@nagarathnabheggade8410 11 days ago

This example covers text and PDF. Is there one for video? How do we use RAG and a vector store with video? Can anyone share a reference?
@IndianLeopard7 21 days ago

What about copyright and ethical issues? How much do you charge for using your model? And according to IBM and Oracle, embeddings are nothing new, so why use yours?
@nagpalvikas 1 month ago

Is "unstructured" the best choice here for parsing PDF? Any better alternatives?
@adithiyag4616 1 month ago

Please share the colab link
@oldmansgoldenwords 1 month ago

You can get a BlueDriver and get all the error codes and examples.
@mohammadyousuf6621 21 days ago

Dense embeddings are never enough for a RAG system.
@pratikpratik8495 1 month ago

github link please
@SB-md2km 1 month ago

Ok but someone could literally look any of this up online or look for it in a manual, etc. w/out using AI...
@fast-path 1 month ago

🥺
@Inceptionxg 1 month ago

After Muaadh Rilwan's post on LinkedIn
@JH-bb8in 1 month ago

This shows how garbage Langchain is as a library. Extremely verbose and opaque.