GraphRAG: LLM-Derived Knowledge Graphs for RAG

Published 2024-05-04
Watch my colleague Jonathan Larson present on GraphRAG!

GraphRAG is a research project from Microsoft exploring the use of knowledge graphs and large language models for enhanced retrieval-augmented generation (RAG). It is an end-to-end system for richly understanding text-heavy datasets, combining text extraction, network analysis, LLM prompting, and summarization.
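
The indexing pipeline is roughly: chunk the source text, prompt an LLM to extract entities and relationships, assemble them into a graph, detect communities, and pre-summarize each community. Below is a minimal sketch of the extraction-and-graph step; it is not the project's actual code, and the model name, prompt wording, and JSON format are illustrative assumptions.

```python
# Minimal sketch of LLM-driven graph extraction (illustrative, not the
# GraphRAG codebase): ask an LLM for (source, relation, target) triples
# per chunk, then merge them into a weighted entity graph.
import json
import networkx as nx
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract the named entities in the text below and the relationships "
    "between them. Respond with only a JSON list of objects with keys "
    '"source", "target", and "relation".\n\nText:\n{chunk}'
)

def extract_triples(chunk: str) -> list[dict]:
    """Ask the LLM for relationship triples found in one text chunk."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice here is an assumption
        messages=[{"role": "user", "content": PROMPT.format(chunk=chunk)}],
    )
    return json.loads(resp.choices[0].message.content)

def build_graph(chunks: list[str]) -> nx.Graph:
    """Merge per-chunk triples into one graph, weighting repeated edges."""
    graph = nx.Graph()
    for chunk in chunks:
        for t in extract_triples(chunk):
            if graph.has_edge(t["source"], t["target"]):
                graph[t["source"]][t["target"]]["weight"] += 1
            else:
                graph.add_edge(t["source"], t["target"],
                               relation=t["relation"], weight=1)
    return graph
```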

For more details on GraphRAG, check out aka.ms/graphrag

Read the blog post: www.microsoft.com/en-us/research/blog/graphrag-unl…

Check out the arXiv paper: arxiv.org/abs/2404.16130

And follow me on other platforms so you’ll never miss out on my updates!

💌 Sign up for my free AI newsletter Chaos Theory: alexchao.substack.com/subscribe
🐦 Follow me on Twitter twitter.com/alexchaomander
📷 And Instagram! www.instagram.com/alexchaomander/
🎥 And TikTok! www.tiktok.com/@alexchaomander
👥 Connect with me on LinkedIn www.linkedin.com/in/alexchao56/

All comments (21)
  • This is basically causal grounding. We get semantic symbolic reasoning from an architectural perspective. Add a powerful model…something very compelling, AGI-like, would be the result, I'd assume (plus MCTS sampling lol). Causal grounding is a huge hole in current models. This is dope research. Kudos.
  • @iukeay
    The last 5 minutes of the video were epic!!! Dude, amazing stuff! Also, thanks for the tip on having the LLM generate the graph.
  • @lalamax3d
    Glad I didn't skip this and watched the video. Thanks for sharing the knowledge; it seems very impressive.
  • @peteredmonds1712
    This was so well explained, nicely done. My first thoughts: 1. I'd be curious to see benchmarks with cheaper LLMs. In my experience, even much smaller models like Llama-3-8B can come close to GPT-4 in this use case (entity and relationship extraction), and a little fine-tuning could likely match or surpass GPT-4 for much cheaper. 2. I wonder how this could be augmented with data sources that already have some concept of relationships, e.g. Wikipedia, dictionaries, hypertext.
  • @ChetanVashistth
    This seems very powerful. Thanks for sharing it and explaining it well.
  • @jcourson8
    I've been doing work in the area of creating knowledge graphs for codebases. The nice thing about generating them for code (as opposed to text) is that you don't have to rely on LLM calls to recognize and generate relationships; you can use language servers and language parsers for that instead.
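
A minimal sketch of what @jcourson8 describes, using Python's built-in ast module as a stand-in for a language server: call-graph edges come out of the parser deterministically, with no LLM involved.

```python
# Deterministic call-graph extraction from Python source via the ast
# module: no LLM calls needed to recover caller -> callee relationships.
import ast

def call_edges(source: str) -> list[tuple[str, str]]:
    """Return (caller, callee) edges for the functions in a module."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                # Only direct calls to plain names are caught here;
                # attribute calls (obj.method()) are skipped for brevity.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append((node.name, inner.func.id))
    return edges

print(call_edges("def f():\n    g()\n\ndef g():\n    pass\n"))  # [('f', 'g')]
```
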
  • @TomBielecki
    I really like the addition of hierarchical agglomerative summarization, which gives holistic answers similar to the RAPTOR RAG strategy but with the better data representation of knowledge graphs. I'll need to read the paper to understand whether embeddings are used at all in this, and whether relationships are labelled or just have a strength value.
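
On the summarization question: here is a sketch of how the community-report step could look over a graph like the one built above. The paper describes Leiden clustering; networkx's built-in Louvain is used here as a stand-in, and llm_summarize is a hypothetical callable.

```python
# Cluster the entity graph into communities and summarize each one,
# so "global" questions can be answered from pre-computed reports.
import networkx as nx

def community_reports(graph: nx.Graph, llm_summarize) -> list[str]:
    """One summary per detected community (llm_summarize: str -> str)."""
    reports = []
    for community in nx.community.louvain_communities(graph, seed=42):
        sub = graph.subgraph(community)
        facts = "\n".join(
            f"{u} -[{d.get('relation', 'related')}]-> {v}"
            for u, v, d in sub.edges(data=True)
        )
        reports.append(llm_summarize("Summarize these relationships:\n" + facts))
    return reports
```
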
  • @mvasa2582
    While RAG is a good way to reduce hallucinations, GraphRAG makes the retrieved context richer with its relationship-building techniques. The expense is worth it. Is the result set then re-graphed, or will the same query twice be just as expensive?
  • @escanoxiao6871
    Fabulous work! Wondering how long it takes to build the whole vector DB, and how many tokens it will take?
  • @filippomarino861
    This could be a game-changer in both public- and private-sector intelligence analysis (as I'm sure you figured out). Looking forward to additional info, but what about the private dataset's format? Is it vectorized? If so, can we assume there are optimal and sub-optimal approaches? (IOW, is it fair to assume vectorization can significantly impact GraphRAG's performance?)
  • @pablof3326
    Great work! I was thinking of using a system like this to build the memory of an AI companion as it talks to the user. In that case the knowledge graph would start empty and be built up dynamically with every conversation. Do you see this as a good use case for GraphRAG?
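
One way @pablof3326's idea could work, reusing the extract_triples sketch above: start from an empty graph and fold each conversation turn's triples into it, so the memory accumulates over time. The class and method names here are illustrative.

```python
# Hypothetical incremental "companion memory": the knowledge graph
# starts empty and grows as each utterance is processed.
import networkx as nx

class CompanionMemory:
    def __init__(self):
        self.graph = nx.Graph()  # starts empty

    def remember(self, utterance: str) -> None:
        """Extract triples from one turn and merge them into the graph."""
        for t in extract_triples(utterance):  # from the earlier sketch
            if self.graph.has_edge(t["source"], t["target"]):
                self.graph[t["source"]][t["target"]]["weight"] += 1
            else:
                self.graph.add_edge(t["source"], t["target"],
                                    relation=t["relation"], weight=1)

    def recall(self, entity: str) -> list[str]:
        """Crude retrieval: facts adjacent to an entity the user mentions."""
        if entity not in self.graph:
            return []
        return [f"{entity} -[{self.graph[entity][n]['relation']}]-> {n}"
                for n in self.graph[entity]]
```
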
  • Excuse me if I'm wrong… I listened to this while exercising… but the main issue explored here was that questions like "What are the top themes?" cannot be answered by the LLM with vanilla RAG. Is this correct? If so, then as context sizes grow large enough, will this become less necessary? Furthermore, by introducing a graph whose communities are premised on topics/themes or whatever you decide, doesn't that reduce the degrees of freedom of your system?
  • @sairajpednekar8049
    May I know the underlying technology used to host the graph database? Was it Cosmos DB?
  • @Aditya_khedekar
    Hi, I'm working on the same problem; vector-search RAG is not good enough. Can you please share the code? A tutorial would be even better!
  • @mrstephanwehner
    Is there no standard comparison approach? For example, one could take academic literature reviews, collect their references, throw in some more, ask the LLM system, and compare the result with the original review. There might also be summaries available in the accounting and legal worlds that could be used.
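
A rough sketch of the evaluation loop @mrstephanwehner proposes: have the system under test write a "review" from a collection of references, then score it against the human-written original. Embedding cosine similarity via sentence-transformers stands in for a real metric here, and generate_review is a hypothetical hook on the RAG system being compared.

```python
# Compare system-generated literature reviews against the originals.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def similarity(generated: str, original: str) -> float:
    """Cosine similarity between two review texts."""
    emb = model.encode([generated, original])
    return float(util.cos_sim(emb[0], emb[1]))

def evaluate(system, cases: list[dict]) -> float:
    """Average score over cases shaped like
    {"references": [...texts...], "review": "original review text"}."""
    scores = [similarity(system.generate_review(c["references"]), c["review"])
              for c in cases]
    return sum(scores) / len(scores)
```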