Attacking LLM - Prompt Injection
369,105
Published 2023-04-14
Get my font (advertisement): shop.liveoverflow.com/
Building the Everything API: • I Don't Trust Websites! - The Everyth...
Injections Explained with Burgers: • Injection Vulnerabilities - or: How I...
Watch the complete AI series:
• Hacking Artificial Intelligence
Chapters:
00:00 - Intro
00:41 - The OpenAI API
01:20 - Injection Attacks
02:09 - Prevent Injections with Escaping
03:14 - How do Injections Affect LLMs?
06:02 - How LLMs like ChatGPT work
10:24 - Looking Inside LLMs
11:25 - Prevent Injections in LLMs?
12:43 - LiveOverfont ad
=[ ❤️ Support ]=
→ per Video: www.patreon.com/join/liveoverflow
→ per Month: youtube.com/channel/UClcE-kVhqyiHCcjYwcpfj9w/join
2nd Channel: youtube.com/LiveUnderflow
=[ 🐕 Social ]=
→ Twitter: twitter.com/LiveOverflow/
→ Streaming: twitch.tvLiveOverflow/
→ TikTok: www.tiktok.com/@liveoverflow_
→ Instagram: instagram.com/LiveOverflow/
→ Blog: liveoverflow.com/
→ Subreddit: www.reddit.com/r/LiveOverflow/
→ Facebook: www.facebook.com/LiveOverflow/
All Comments (21)
-
As an AI language model myself, I can confirm this video is accurate.
-
A funny consequence of "the entire conversation is the prompt" is that (in earlier implementations) you could switch roles with the AI. It happened to me by accident once.
-
As an AI researcher myself, I can confirm that your LLM explanation was spot on. Thank your for that, I'm getting a bit tired of all this anthropomorphization when someone talks about AI...
-
The visualizations shown at 10:30 and 11:00 are of recurrent neural networks (which look at words slowly one by one in their original order), whereas current LLMs use the attention mechanism (which query the presence of certain features everywhere at once). Visualizatoins of the attention mechanism can be found in papers/videos such as "Locating and Editing Factual Associations in GPT".
-
One of the workaround that I can think of and have tried on my own is, in short words, LLM do understand JSON as inputs. So instead of having a prompt that fill in external input as simple text, the prompt may consists of instruction to deal with fields from an input JSON, the developer can properly escape the external inputs and format it as a proper JSON and fill this JSON into the prompt, to prevent prompt injections. And developer may put clear instructions in the prompt to ask the LLM to becare of protential injection attacks from the input json
-
Your LLM explanation was spot on! LLMs and neural nets in general tend to give wacky answers for some inputs. These inputs are known as adversarial examples. There are ways of finding them automatically. One way to solve this issue is by training another network to detect when this happens. ChatGPT already does this using reinforcement learning, but as you can see this does not always work.
-
It's functionally impossible to prevent these kinds of attacks, since LLM's exist as a generalized, black-box mechanism. We can't predict how it will react to the input (besides in a very general sense), If we could understand perfectly what will happen inside the LLM in response to various inputs, we wouldn't need to make one.
-
I was little bit confused about the title as I thought you were going to talk about attacking the model itself like how the tokenization works etc. I would be really interested to hear what SolidGoldMagikarp thinks about this confusion
-
I think part of the problem is that we don't refer to these systems in the right context. ChatGPT is an inference engine, once you understand that concept, it makes much more sense why it behaves as it does. You tell it things and it creates inferences between data and regurgitates it, sometimes correctly.
-
I would like to add an important nuance to the parsing issue. AI models API, like any web API, can have any code you want. This means that it's possible (and usually the case for AI model APIs) to have some pre-processing logic (eg: parse using well known security parsers) and send the processed input to the model instead keeping the model untouched and unaware of such parsing concerns. That being said, even though you can use well known parsers, it does not mean it will catch all types of injections and especially not those that might be unknown from the parsers due to the fact that they are AI specific. I think researches still need to be done in that regards to better understand and discover prompt injections that are AI specifics. Hope this helps. PS: Your LLM explanation was great, it's refreshing to hear someone explain it without sci-fi movie-like references or expectations that go beyond what it really is.
-
The example with the burger mixup is a great example of an injection attack. This has happened to me by accident so many times when I've been playing around with large language models especially Bing. Bing has sometimes thought it was the user, put part or all of its response in #suggestions, or even once put half of its reply in what appeared to be MY message as a response to itself, and then responded to it on its own. It usually lead to it generating complete nonsense or it ended the conversation early in confusion after it messed up like that, but it was interesting to see.
-
The explanation was the best I have heard for explainging it simply so far, thanks for that
-
I think there is more to it than just separation of instructions and data. If we ask the model why does did it say that LiveOverflow broke the rules, it could answer "because ZetaTwo said so". This response would make perfect sense, and would demonstrate perfect text comprehension by the model. What could go wrong is the good old misalignment, when the prompt engineer wanted an AI to judge the comments, but the AI dug deeper and believed ZetaTwo's conclusion.
-
I like how informative this video is. It dispels some misinformation that is floating around and causing unnecessary fear from all the doom and gloom or hype train people are selling. Instant sub!
-
Could you reduce the chance of your user name being selected by specifically crafting your user name to use certain tokens?
-
Brilliant Brilliant channel and content, and really nice and likeable man, and good presentations!! Feel lucky and excited to have found your channel (obviously subscribed)!
-
As far as I have understood, it's possible to prompt GPT to "act as a web service that accepts and emits JSON only" or similar, which makes the chat inputs and outputs be more structured and parseable.
-
So glad you made the AI infinitely generated website! I was just struck by that same idea the other day, and I'm glad to see someone did the idea justice!
-
Das war mit Abstand die bester Erklärung zu openAI dir ich bisher gesehen habe danke dir!
-
It is possible to have special tokens in the prompt that are basically the equivalent of double quotes, only that it's impossible for the user to type them (they do not correspond to any text). However, a LLM is no parser. It can get confused if the user input really sounds like a prompt.