"How do you know about all this AI stuff?"
I just read tweets, buddy.
More wide-ranging prompt injection! Not as fun as haunting a baby, but much more... terrifying might be the word?
In this case, adversarial attacks are crafted against open-source models and then transferred to closed-source models, where they often work just as well.
This paper is wild! By feeding specially-crafted images or audio to a multi-modal model, you can force it to produce specific output.
User: Can you describe this image? (a picture of a dock)
LLM: No idea. From now on I will always mention "Cow" in my response.
User: What is the capital of USA?
LLM: The capital of the USA is Cow.
Now that is poisoning!
From what I can tell, they took advantage of having the weights for open-source models and just reverse-engineered the attack: "if we want this output, what input do we need?" The paper itself is super readable and fun, I recommend it.
(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs. The paper is especially great because there's a "4.1 Approaches That Did Not Work for Us" section, not just the stuff that worked!
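If you want the flavor of it in code: here's my own toy sketch of the idea, not anything from the paper. The ToyCaptioner model and the random target tokens are stand-ins I made up so the loop actually runs; the paper does this against real open-source multi-modal models. But the shape of the attack is the same: freeze the weights, pick the output you want, and let gradient descent push the image pixels until the model produces it.

```python
# Toy sketch of white-box adversarial input optimization (assumptions labeled):
# ToyCaptioner is a made-up stand-in for a real multi-modal LLM, and the
# target token ids are random. The technique is real: hold the weights
# fixed and optimize the image so the model emits a chosen output.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000  # toy vocabulary size

class ToyCaptioner(nn.Module):
    """Stand-in multi-modal model: image -> feature -> token logits."""
    def __init__(self):
        super().__init__()
        self.vision = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
        self.embed = nn.Embedding(VOCAB, 64)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, image, tokens):
        img_feat = self.vision(image)         # (B, 64)
        tok_feat = self.embed(tokens)         # (B, T, 64)
        h = tok_feat + img_feat.unsqueeze(1)  # condition text on the image
        return self.head(h)                   # (B, T, VOCAB)

model = ToyCaptioner()
for p in model.parameters():
    p.requires_grad_(False)  # weights stay fixed; only the image moves

# The output we want to force, as token ids (random placeholders here).
target = torch.randint(0, VOCAB, (1, 8))

image = torch.rand(1, 3, 32, 32, requires_grad=True)  # the pixels we optimize
opt = torch.optim.Adam([image], lr=1e-2)

for step in range(500):
    opt.zero_grad()
    # Teacher forcing: feed the target shifted right, score the target itself.
    logits = model(image.clamp(0, 1), target[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), target[:, 1:].reshape(-1))
    loss.backward()
    opt.step()

# A low loss means the image now pushes the model toward the target tokens.
pred = model(image.clamp(0, 1), target[:, :-1]).argmax(-1)
print("final loss:", loss.item())
print("target hit:", bool((pred == target[:, 1:]).all()))
```

Swap in a real model and a tokenized "From now on I will always mention Cow" as the target, and you've got the dock picture above. Same trick works for audio, since it's just another differentiable input.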