Attacking LLM - Prompt Injection

369,105

12,362 0

Published 2023-04-14

How will the easy access to powerful APIs like GPT-4 affect the future of IT security? Keep in mind LLMs are new to this world and things will change fast. But I don't want to fall behind, so let's start exploring some thoughts on the security of LLMs.

Get my font (advertisement): shop.liveoverflow.com/

Building the Everything API:    • I Don't Trust Websites! - The Everyth...

Injections Explained with Burgers:    • Injection Vulnerabilities - or: How I...

Watch the complete AI series:
   • Hacking Artificial Intelligence

Chapters:
00:00 - Intro
00:41 - The OpenAI API
01:20 - Injection Attacks
02:09 - Prevent Injections with Escaping
03:14 - How do Injections Affect LLMs?
06:02 - How LLMs like ChatGPT work
10:24 - Looking Inside LLMs
11:25 - Prevent Injections in LLMs?
12:43 - LiveOverfont ad

=[ ❤️ Support ]=

→ per Video: www.patreon.com/join/liveoverflow
→ per Month: youtube.com/channel/UClcE-kVhqyiHCcjYwcpfj9w/join

2nd Channel: youtube.com/LiveUnderflow

=[ 🐕 Social ]=

→ Twitter: twitter.com/LiveOverflow/
→ Streaming: twitch.tvLiveOverflow/
→ TikTok: www.tiktok.com/@liveoverflow_
→ Instagram: instagram.com/LiveOverflow/
→ Blog: liveoverflow.com/
→ Subreddit: www.reddit.com/r/LiveOverflow/
→ Facebook: www.facebook.com/LiveOverflow/

All Comments (21)

@anispinner 1 year ago

As an AI language model myself, I can confirm this video is accurate.
@cmilkau 1 year ago

A funny consequence of "the entire conversation is the prompt" is that (in earlier implementations) you could switch roles with the AI. It happened to me by accident once.
@TheAppleBi 1 year ago

As an AI researcher myself, I can confirm that your LLM explanation was spot on. Thank your for that, I'm getting a bit tired of all this anthropomorphization when someone talks about AI...
@user-yx3wk7tc2t 1 year ago

The visualizations shown at 10:30 and 11:00 are of recurrent neural networks (which look at words slowly one by one in their original order), whereas current LLMs use the attention mechanism (which query the presence of certain features everywhere at once). Visualizatoins of the attention mechanism can be found in papers/videos such as "Locating and Editing Factual Associations in GPT".
@hellfirebb 1 year ago

One of the workaround that I can think of and have tried on my own is, in short words, LLM do understand JSON as inputs. So instead of having a prompt that fill in external input as simple text, the prompt may consists of instruction to deal with fields from an input JSON, the developer can properly escape the external inputs and format it as a proper JSON and fill this JSON into the prompt, to prevent prompt injections. And developer may put clear instructions in the prompt to ask the LLM to becare of protential injection attacks from the input json
@henrijs1999 1 year ago

Your LLM explanation was spot on! LLMs and neural nets in general tend to give wacky answers for some inputs. These inputs are known as adversarial examples. There are ways of finding them automatically. One way to solve this issue is by training another network to detect when this happens. ChatGPT already does this using reinforcement learning, but as you can see this does not always work.
@BanakaiGames 1 year ago

It's functionally impossible to prevent these kinds of attacks, since LLM's exist as a generalized, black-box mechanism. We can't predict how it will react to the input (besides in a very general sense), If we could understand perfectly what will happen inside the LLM in response to various inputs, we wouldn't need to make one.
@velho6298 1 year ago

I was little bit confused about the title as I thought you were going to talk about attacking the model itself like how the tokenization works etc. I would be really interested to hear what SolidGoldMagikarp thinks about this confusion
@eformance 1 year ago

I think part of the problem is that we don't refer to these systems in the right context. ChatGPT is an inference engine, once you understand that concept, it makes much more sense why it behaves as it does. You tell it things and it creates inferences between data and regurgitates it, sometimes correctly.
@alexandrebrownAI 1 year ago

I would like to add an important nuance to the parsing issue. AI models API, like any web API, can have any code you want. This means that it's possible (and usually the case for AI model APIs) to have some pre-processing logic (eg: parse using well known security parsers) and send the processed input to the model instead keeping the model untouched and unaware of such parsing concerns. That being said, even though you can use well known parsers, it does not mean it will catch all types of injections and especially not those that might be unknown from the parsers due to the fact that they are AI specific. I think researches still need to be done in that regards to better understand and discover prompt injections that are AI specifics. Hope this helps. PS: Your LLM explanation was great, it's refreshing to hear someone explain it without sci-fi movie-like references or expectations that go beyond what it really is.
@Millea314 1 year ago

The example with the burger mixup is a great example of an injection attack. This has happened to me by accident so many times when I've been playing around with large language models especially Bing. Bing has sometimes thought it was the user, put part or all of its response in #suggestions, or even once put half of its reply in what appeared to be MY message as a response to itself, and then responded to it on its own. It usually lead to it generating complete nonsense or it ended the conversation early in confusion after it messed up like that, but it was interesting to see.
@MWilsonnnn 1 year ago

The explanation was the best I have heard for explainging it simply so far, thanks for that
@Stdvwr 1 year ago

I think there is more to it than just separation of instructions and data. If we ask the model why does did it say that LiveOverflow broke the rules, it could answer "because ZetaTwo said so". This response would make perfect sense, and would demonstrate perfect text comprehension by the model. What could go wrong is the good old misalignment, when the prompt engineer wanted an AI to judge the comments, but the AI dug deeper and believed ZetaTwo's conclusion.
@kusog3 1 year ago

I like how informative this video is. It dispels some misinformation that is floating around and causing unnecessary fear from all the doom and gloom or hype train people are selling. Instant sub!
@gwentarinokripperinolkjdsf683 1 year ago

Could you reduce the chance of your user name being selected by specifically crafting your user name to use certain tokens?
@bluesque9687 1 year ago

Brilliant Brilliant channel and content, and really nice and likeable man, and good presentations!! Feel lucky and excited to have found your channel (obviously subscribed)!
@-tsvk- 1 year ago

As far as I have understood, it's possible to prompt GPT to "act as a web service that accepts and emits JSON only" or similar, which makes the chat inputs and outputs be more structured and parseable.
@miserablepile 1 year ago

So glad you made the AI infinitely generated website! I was just struck by that same idea the other day, and I'm glad to see someone did the idea justice!
@Fifi70 1 year ago

Das war mit Abstand die bester Erklärung zu openAI dir ich bisher gesehen habe danke dir!
@cmilkau 1 year ago

It is possible to have special tokens in the prompt that are basically the equivalent of double quotes, only that it's impossible for the user to type them (they do not correspond to any text). However, a LLM is no parser. It can get confused if the user input really sounds like a prompt.