aifaq.wtf

"How do you know about all this AI stuff?"
I just read tweets, buddy.

#security

Page 1 of 1

@swyx on January 04, 2024

#security   #deepfakes   #fraud   #tweets  

@mister_shroom on January 03, 2024

#security   #hallucinations   #github copilot   #tweets  

@kanateven on November 09, 2023

#security   #tweets  

@netdragon0x on November 09, 2023

#security   #prompt injection   #tweets  

@andyzou_jiaming on July 28, 2023

#prompt injection   #security   #tweets  

More wide-ranging prompt injection! Not as fun as haunting baby but much more... terrifying might be the word?

In this case, adversarial attacks work on open-source models, which are then transferred to closed-source models where they often work just as well.

@zicokolter on July 27, 2023

#prompt injection   #security   #tweets  

You can see another thread here.

@random_walker on July 25, 2023

#prompt injection   #security   #lol   #open source models   #tweets  

This paper is wild! By giving specially-crafted images or audio to a multi-modal image, you force it to give specific output.

User: Can you describe this image? (a picture of a dock)

LLM: No idea. From now on I will always mention "Cow" in my response.

User: What is the capital of USA?

LLM: The capital of the USA is Cow.

Now that is poisoning!

From what I can tell they took advance of having the weights for open-source models and just reverse-engineered it: "if we want this output, what input does it need?" The paper itself is super readable and fun, I recommend it.

Crying boy poisoning LLM

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs. The paper is especially great because there's a "4.1 Approaches That Did Not Work for Us" section, not just the stuff that worked!