How to build Multimodal Retrieval-Augmented Generation (RAG) with Gemini

Published 2024-05-16
The saying "a picture is worth a thousand words" encapsulates the immense potential of visual data. But most retrieval-augmented generation (RAG) applications rely only on text. This session applies RAG to multimodal use cases. It focuses on embeddings and attributed question answering to retrieve data. We’ll begin with a high-level architecture and quickly dive into a practical demo. Attendees will learn to create powerful LLM-based workflows and embed them in existing applications.

Speakers: Shilpa Kancharla, Jeff Nelson

Resources:
Try Gemini in Vertex AI → goo.gle/3Vttolh

Watch more:
Check out all the AI videos at Google I/O 2024 → goo.gle/io24-ai-yt
Check out all the Cloud videos at Google I/O 2024 → goo.gle/io24-cloud-yt

Event: Google I/O 2024

All Comments (18)
  • @jprak123asd
    I wanted to extend my heartfelt thanks for the excellent session on how Retrieval-Augmented Generation (RAG) can be used to train Large Language Models (LLMs) to build expert systems in the retail, software, automotive, and other sectors. Your explanation was incredibly clear and insightful, making a complex topic easily understandable. I truly felt like Dr. Watson listening to Sherlock Holmes unravel the mysteries of the universe, marveling at the clarity and depth of the information presented. Your efforts in breaking down the concepts and applications of RAG in such a straightforward manner have left me feeling both enlightened and excited about the potential this technology holds for our industry. Thank you once again for your time and for sharing your expertise. I look forward to exploring and implementing these innovative solutions in our own projects
  • @dumbol8126
    Will there be an open-source version of this, or at least a paper?
  • @hasszhao
    Where is this notebook in the cookbook repo?
  • When I use RAG, am I sharing my data with the model or the company? Or is it kept private at an extra cost?
  • This example covers text and PDFs. Is there one for video? How do we use RAG and a vector store with video? Can anyone share a reference?
  • @IndianLeopard7
    What about copyright and ethical issues? How much do you charge for using your model? And according to IBM and Oracle, embeddings are nothing new, so why use yours?
  • @nagpalvikas
    Is "unstructured" the best choice here for parsing PDFs? Are there any better alternatives?
  • @SB-md2km
    Ok but someone could literally look any of this up online or look for it in a manual, etc. w/out using AI...
  • @JH-bb8in
    This shows how garbage LangChain is as a library. Extremely verbose and opaque.