Attacking LLM - Prompt Injection

369,105
0
Published 2023-04-14
How will the easy access to powerful APIs like GPT-4 affect the future of IT security? Keep in mind LLMs are new to this world and things will change fast. But I don't want to fall behind, so let's start exploring some thoughts on the security of LLMs.

Get my font (advertisement): shop.liveoverflow.com/

Building the Everything API:    • I Don't Trust Websites! - The Everyth...  

Injections Explained with Burgers:    • Injection Vulnerabilities - or: How I...  

Watch the complete AI series:
   • Hacking Artificial Intelligence  

Chapters:
00:00 - Intro
00:41 - The OpenAI API
01:20 - Injection Attacks
02:09 - Prevent Injections with Escaping
03:14 - How do Injections Affect LLMs?
06:02 - How LLMs like ChatGPT work
10:24 - Looking Inside LLMs
11:25 - Prevent Injections in LLMs?
12:43 - LiveOverfont ad

=[ ❤️ Support ]=

→ per Video: www.patreon.com/join/liveoverflow
→ per Month: youtube.com/channel/UClcE-kVhqyiHCcjYwcpfj9w/join

2nd Channel: youtube.com/LiveUnderflow

=[ 🐕 Social ]=

→ Twitter: twitter.com/LiveOverflow/
→ Streaming: twitch.tvLiveOverflow/
→ TikTok: www.tiktok.com/@liveoverflow_
→ Instagram: instagram.com/LiveOverflow/
→ Blog: liveoverflow.com/
→ Subreddit: www.reddit.com/r/LiveOverflow/
→ Facebook: www.facebook.com/LiveOverflow/

All Comments (21)
  • @anispinner
    As an AI language model myself, I can confirm this video is accurate.
  • @cmilkau
    A funny consequence of "the entire conversation is the prompt" is that (in earlier implementations) you could switch roles with the AI. It happened to me by accident once.
  • @TheAppleBi
    As an AI researcher myself, I can confirm that your LLM explanation was spot on. Thank your for that, I'm getting a bit tired of all this anthropomorphization when someone talks about AI...
  • The visualizations shown at 10:30 and 11:00 are of recurrent neural networks (which look at words slowly one by one in their original order), whereas current LLMs use the attention mechanism (which query the presence of certain features everywhere at once). Visualizatoins of the attention mechanism can be found in papers/videos such as "Locating and Editing Factual Associations in GPT".
  • @hellfirebb
    One of the workaround that I can think of and have tried on my own is, in short words, LLM do understand JSON as inputs. So instead of having a prompt that fill in external input as simple text, the prompt may consists of instruction to deal with fields from an input JSON, the developer can properly escape the external inputs and format it as a proper JSON and fill this JSON into the prompt, to prevent prompt injections. And developer may put clear instructions in the prompt to ask the LLM to becare of protential injection attacks from the input json
  • @henrijs1999
    Your LLM explanation was spot on! LLMs and neural nets in general tend to give wacky answers for some inputs. These inputs are known as adversarial examples. There are ways of finding them automatically. One way to solve this issue is by training another network to detect when this happens. ChatGPT already does this using reinforcement learning, but as you can see this does not always work.
  • @BanakaiGames
    It's functionally impossible to prevent these kinds of attacks, since LLM's exist as a generalized, black-box mechanism. We can't predict how it will react to the input (besides in a very general sense), If we could understand perfectly what will happen inside the LLM in response to various inputs, we wouldn't need to make one.
  • @velho6298
    I was little bit confused about the title as I thought you were going to talk about attacking the model itself like how the tokenization works etc. I would be really interested to hear what SolidGoldMagikarp thinks about this confusion
  • @eformance
    I think part of the problem is that we don't refer to these systems in the right context. ChatGPT is an inference engine, once you understand that concept, it makes much more sense why it behaves as it does. You tell it things and it creates inferences between data and regurgitates it, sometimes correctly.
  • I would like to add an important nuance to the parsing issue. AI models API, like any web API, can have any code you want. This means that it's possible (and usually the case for AI model APIs) to have some pre-processing logic (eg: parse using well known security parsers) and send the processed input to the model instead keeping the model untouched and unaware of such parsing concerns. That being said, even though you can use well known parsers, it does not mean it will catch all types of injections and especially not those that might be unknown from the parsers due to the fact that they are AI specific. I think researches still need to be done in that regards to better understand and discover prompt injections that are AI specifics. Hope this helps. PS: Your LLM explanation was great, it's refreshing to hear someone explain it without sci-fi movie-like references or expectations that go beyond what it really is.
  • @Millea314
    The example with the burger mixup is a great example of an injection attack. This has happened to me by accident so many times when I've been playing around with large language models especially Bing. Bing has sometimes thought it was the user, put part or all of its response in #suggestions, or even once put half of its reply in what appeared to be MY message as a response to itself, and then responded to it on its own. It usually lead to it generating complete nonsense or it ended the conversation early in confusion after it messed up like that, but it was interesting to see.
  • @MWilsonnnn
    The explanation was the best I have heard for explainging it simply so far, thanks for that
  • @Stdvwr
    I think there is more to it than just separation of instructions and data. If we ask the model why does did it say that LiveOverflow broke the rules, it could answer "because ZetaTwo said so". This response would make perfect sense, and would demonstrate perfect text comprehension by the model. What could go wrong is the good old misalignment, when the prompt engineer wanted an AI to judge the comments, but the AI dug deeper and believed ZetaTwo's conclusion.
  • @kusog3
    I like how informative this video is. It dispels some misinformation that is floating around and causing unnecessary fear from all the doom and gloom or hype train people are selling. Instant sub!
  • Could you reduce the chance of your user name being selected by specifically crafting your user name to use certain tokens?
  • @bluesque9687
    Brilliant Brilliant channel and content, and really nice and likeable man, and good presentations!! Feel lucky and excited to have found your channel (obviously subscribed)!
  • @-tsvk-
    As far as I have understood, it's possible to prompt GPT to "act as a web service that accepts and emits JSON only" or similar, which makes the chat inputs and outputs be more structured and parseable.
  • @miserablepile
    So glad you made the AI infinitely generated website! I was just struck by that same idea the other day, and I'm glad to see someone did the idea justice!
  • @Fifi70
    Das war mit Abstand die bester Erklärung zu openAI dir ich bisher gesehen habe danke dir!
  • @cmilkau
    It is possible to have special tokens in the prompt that are basically the equivalent of double quotes, only that it's impossible for the user to type them (they do not correspond to any text). However, a LLM is no parser. It can get confused if the user input really sounds like a prompt.