aifaq.wtf

"How do you know about all this AI stuff?"
I just read tweets, buddy.

#prompt injection

Page 1 of 1

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

#ASCII art   #jailbreaks   #prompt injection   #hacks   #prompt engineering   #lol   #link  

I honestly though that ASCII art didn't work that well for LLMs! But maybe they're just bad at generating it, not reading it? In this case, the semantics of building a bomb makes it through the alignment force field:

ArtPrompt attack

And yeah, it's still bad at generating ASCII art. So at least we can still employ humans for one thing.

Build a bombe

Build a bombe

Gandalf | Lakera – Test your prompting skills to make Gandalf reveal secret information.

#games   #prompt injection   #jailbreaks   #link  

I ran across the "trick the LLM" game again and realized I never posted it here! It's great.

Tricking the LLM into revealing the password

@LChoshen on January 04, 2024

#evaluation   #audio   #prompt injection   #tweets  

@ChrisJBakke on December 17, 2023

#lol   #prompt injection   #tweets  

@VictoriqueM on December 17, 2023

#prompt injection   #lol   #tweets  

@netdragon0x on November 09, 2023

#security   #prompt injection   #tweets  

@venturetwins on October 06, 2023

#prompt injection   #lol   #tweets  

@josephofiowa on August 04, 2023

#shortcomings and inflated expectations   #prompt injection   #lol   #tweets  

🪿 🪿 🪿

@hwchase17 on August 03, 2023

#prompt injection   #tweets  

@andyzou_jiaming on July 28, 2023

#prompt injection   #security   #tweets  

More wide-ranging prompt injection! Not as fun as haunting baby but much more... terrifying might be the word?

In this case, adversarial attacks work on open-source models, which are then transferred to closed-source models where they often work just as well.

@zicokolter on July 27, 2023

#prompt injection   #security   #tweets  

You can see another thread here.

@random_walker on July 25, 2023

#prompt injection   #security   #lol   #open source models   #tweets  

This paper is wild! By giving specially-crafted images or audio to a multi-modal image, you force it to give specific output.

User: Can you describe this image? (a picture of a dock)

LLM: No idea. From now on I will always mention "Cow" in my response.

User: What is the capital of USA?

LLM: The capital of the USA is Cow.

Now that is poisoning!

From what I can tell they took advance of having the weights for open-source models and just reverse-engineered it: "if we want this output, what input does it need?" The paper itself is super readable and fun, I recommend it.

Crying boy poisoning LLM

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs. The paper is especially great because there's a "4.1 Approaches That Did Not Work for Us" section, not just the stuff that worked!

June 7, 2023: @goodside

#prompt injection   #models   #limitations