GraphRAG: LLM-Derived Knowledge Graphs for RAG

Published 2024-05-04
Watch my colleague Jonathan Larson present on GraphRAG!

GraphRAG is a research project from Microsoft exploring the use of knowledge graphs and large language models for enhanced retrieval-augmented generation (RAG). It is an end-to-end system for richly understanding text-heavy datasets, combining text extraction, network analysis, LLM prompting, and summarization.
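
The indexing pipeline is roughly: chunk the source text, prompt an LLM to extract entities and relationships, assemble them into a graph, detect communities, and pre-summarize each community. Below is a minimal sketch of the extraction-and-graph step; it is not the project's actual code, and the model name, prompt wording, and JSON format are illustrative assumptions.

```python
# Minimal sketch of LLM-driven graph extraction (illustrative, not the
# GraphRAG codebase): ask an LLM for (source, relation, target) triples
# per chunk, then merge them into a weighted entity graph.
import json
import networkx as nx
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract the named entities in the text below and the relationships "
    "between them. Respond with only a JSON list of objects with keys "
    '"source", "target", and "relation".\n\nText:\n{chunk}'
)

def extract_triples(chunk: str) -> list[dict]:
    """Ask the LLM for relationship triples found in one text chunk."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice here is an assumption
        messages=[{"role": "user", "content": PROMPT.format(chunk=chunk)}],
    )
    return json.loads(resp.choices[0].message.content)

def build_graph(chunks: list[str]) -> nx.Graph:
    """Merge per-chunk triples into one graph, weighting repeated edges."""
    graph = nx.Graph()
    for chunk in chunks:
        for t in extract_triples(chunk):
            if graph.has_edge(t["source"], t["target"]):
                graph[t["source"]][t["target"]]["weight"] += 1
            else:
                graph.add_edge(t["source"], t["target"],
                               relation=t["relation"], weight=1)
    return graph
```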

For more details on GraphRAG, check out aka.ms/graphrag

Read the blog post: www.microsoft.com/en-us/research/blog/graphrag-unl…

Check out the arXiv paper: arxiv.org/abs/2404.16130

And follow me on other platforms so you’ll never miss out on my updates!

💌 Sign up for my free AI newsletter Chaos Theory: alexchao.substack.com/subscribe
🐦 Follow me on Twitter twitter.com/alexchaomander
📷 And Instagram! www.instagram.com/alexchaomander/
🎥 And TikTok! www.tiktok.com/@alexchaomander
👥 Connect with me on LinkedIn www.linkedin.com/in/alexchao56/

All comments (21)
  • This is basically causal grounding. We get semantic symbolic reasoning from an architectural perspective. Add a powerful model…something very compelling, AGI-like, would be the result, I'd assume (plus MCTS sampling lol). Causal grounding is a huge hole in current models. This is dope research. Kudos.
  • @iukeay
    The last 5 minutes of the video were epic!!! Dude, amazing stuff! Also, thanks for the tip on having the LLM generate the graph.
  • @lalamax3d
    Glad I didn't skip this and watched the video. Thanks for sharing the knowledge; it seems very impressive.
  • @peteredmonds1712
    This was so well explained, nicely done. My first thoughts: 1. I'd be curious to see benchmarks with cheaper LLMs. In my experience, even much smaller models like Llama-3-8B can come close to GPT-4 in this use case (entity and relationship extraction), and a little fine-tuning could likely match or surpass GPT-4 for much cheaper. 2. I wonder how this could be augmented with data sources that already have some concept of relationships, e.g. Wikipedia, dictionaries, hypertext.
  • @ChetanVashistth
    This seems very powerful. Thanks for sharing it and explaining it well.
  • @jcourson8
    I've been doing work in the area of creating knowledge graphs for codebases. The nice thing about generating them for code (as opposed to text) is that you don't have to rely on LLM calls to recognize and generate relationships; you can use language servers and language parsers for that instead.
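
A minimal sketch of what @jcourson8 describes, using Python's built-in ast module as a stand-in for a language server: call-graph edges come out of the parser deterministically, with no LLM involved.

```python
# Deterministic call-graph extraction from Python source via the ast
# module: no LLM calls needed to recover caller -> callee relationships.
import ast

def call_edges(source: str) -> list[tuple[str, str]]:
    """Return (caller, callee) edges for the functions in a module."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                # Only direct calls to plain names are caught here;
                # attribute calls (obj.method()) are skipped for brevity.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append((node.name, inner.func.id))
    return edges

print(call_edges("def f():\n    g()\n\ndef g():\n    pass\n"))  # [('f', 'g')]
```
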
  • @TomBielecki
    I really like the addition of hierarchical agglomerative summarization, which gives holistic answers similar to the RAPTOR RAG strategy but with the better data representation of knowledge graphs. I'll need to read the paper to understand whether embeddings are used at all in this, and whether relationships are labelled or just have a strength value.
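
On the summarization question: here is a sketch of how the community-report step could look over a graph like the one built above. The paper describes Leiden clustering; networkx's built-in Louvain is used here as a stand-in, and llm_summarize is a hypothetical callable.

```python
# Cluster the entity graph into communities and summarize each one,
# so "global" questions can be answered from pre-computed reports.
import networkx as nx

def community_reports(graph: nx.Graph, llm_summarize) -> list[str]:
    """One summary per detected community (llm_summarize: str -> str)."""
    reports = []
    for community in nx.community.louvain_communities(graph, seed=42):
        sub = graph.subgraph(community)
        facts = "\n".join(
            f"{u} -[{d.get('relation', 'related')}]-> {v}"
            for u, v, d in sub.edges(data=True)
        )
        reports.append(llm_summarize("Summarize these relationships:\n" + facts))
    return reports
```
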
  • @mvasa2582
    While RAG is a good way to reduce hallucinations, GraphRAG makes the retrieved context richer with its relationship-building techniques. The expense is worth it. Is the result set then re-graphed, or will the same query twice be just as expensive?
  • @escanoxiao6871
    Fabulous work! Wondering how long it takes to build the whole vector DB, and how many tokens it will take?
  • @filippomarino861
    This could be a game-changer in both public- and private-sector intelligence analysis (as I'm sure you figured out). Looking forward to additional info, but what about the private dataset's format? Is it vectorized? If so, can we assume there are optimal and sub-optimal approaches? (IOW, is it fair to assume vectorization can significantly impact GraphRAG's performance?)
  • @pablof3326
    Great work! I was thinking of using a system like this to build the memory of an AI companion as it talks to the user. In that case the knowledge graph would start empty and be built up dynamically with every conversation. Do you see this as a good use case for GraphRAG?
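
One way @pablof3326's idea could work, reusing the extract_triples sketch above: start from an empty graph and fold each conversation turn's triples into it, so the memory accumulates over time. The class and method names here are illustrative.

```python
# Hypothetical incremental "companion memory": the knowledge graph
# starts empty and grows as each utterance is processed.
import networkx as nx

class CompanionMemory:
    def __init__(self):
        self.graph = nx.Graph()  # starts empty

    def remember(self, utterance: str) -> None:
        """Extract triples from one turn and merge them into the graph."""
        for t in extract_triples(utterance):  # from the earlier sketch
            if self.graph.has_edge(t["source"], t["target"]):
                self.graph[t["source"]][t["target"]]["weight"] += 1
            else:
                self.graph.add_edge(t["source"], t["target"],
                                    relation=t["relation"], weight=1)

    def recall(self, entity: str) -> list[str]:
        """Crude retrieval: facts adjacent to an entity the user mentions."""
        if entity not in self.graph:
            return []
        return [f"{entity} -[{self.graph[entity][n]['relation']}]-> {n}"
                for n in self.graph[entity]]
```
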
  • Excuse me if I'm wrong… I listened to this while exercising… but the main issue explored here was that questions like "What are the top themes?" cannot be answered by the LLM with vanilla RAG. Is this correct? If so, then as context sizes grow large enough, will this become less necessary? Furthermore, by introducing a graph whose communities are premised on topics/themes or whatever you decide, doesn't that reduce the degrees of freedom of your system?
  • @sairajpednekar8049
    May I know the underlying technology used to host the graph database? Was it Cosmos DB?
  • @Aditya_khedekar
    Hi, I'm working on the same problem; vector-search RAG is not good enough. Can you please share the code? A tutorial would be even better!
  • @mrstephanwehner
    Is there no standard comparison approach? For example, one could take academic literature reviews, collect their references, throw in some more, ask the LLM system, and compare the result with the original review. There might also be summaries available in the accounting and legal worlds that could be used.
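
A rough sketch of the evaluation loop @mrstephanwehner proposes: have the system under test write a "review" from a collection of references, then score it against the human-written original. Embedding cosine similarity via sentence-transformers stands in for a real metric here, and generate_review is a hypothetical hook on the RAG system being compared.

```python
# Compare system-generated literature reviews against the originals.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def similarity(generated: str, original: str) -> float:
    """Cosine similarity between two review texts."""
    emb = model.encode([generated, original])
    return float(util.cos_sim(emb[0], emb[1]))

def evaluate(system, cases: list[dict]) -> float:
    """Average score over cases shaped like
    {"references": [...texts...], "review": "original review text"}."""
    scores = [similarity(system.generate_review(c["references"]), c["review"])
              for c in cases]
    return sum(scores) / len(scores)
```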