aifaq.wtf

"How do you know about all this AI stuff?"
I just read tweets, buddy.


Air Canada must honor refund policy invented by airline’s chatbot

#uncategorized   #link  

@minimaxir on February 19, 2024

#uncategorized   #tweets  

@jjvincent on February 19, 2024

#uncategorized   #tweets  

Your AI Girlfriend Is a Data-Harvesting Horror Show

#uncategorized   #link  

GitHub - huggingface/cookbook: Open-source AI cookbook

#uncategorized   #link  

System prompt - Pastebin.com

#chatgpt   #system prompt   #prompt engineering   #link  

The infinitely long, infinitely boring ChatGPT system prompt. Lots of little nuggets that would be great for presentations about the hows and whys of what goes on behind the scenes:

Your choices should be grounded in reality. For example, all of a given occupation should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.

@Abebab on February 07, 2024

#common crawl   #data sources   #training data   #training   #tweets   #Abebab   #Abeba Birhane  

Open-source data curation platform for LLMs

#annotation   #fine-tuning   #link  

I guess it's Prodigy but at some sort of scale. Or LabelStudio but every single plan requires you to contact them for pricing.

Except Hugging Face says it's "a free interface for validating and cleaning unstructured LLM outputs" so maybe it's just the hosted one that costs [a lot of] money. Could I explore it? Yes! Have I done it? No!

@ddimolfetta on February 08, 2024

#deepfakes   #generative audio   #robocalls   #regulation   #fcc   #tweets   #David DiMolfetta   #ddimolfetta  

This would be nice if the FCC would regulate spam calls at all. Can't we just do this from overseas, spoof everything, and there we go? If only there weren't a perverse financial incentive for carriers to let anything onto their networks...

@AndrewCurran_ on February 06, 2024

#uncategorized   #tweets  

@heraclitus137 on February 04, 2024

#uncategorized   #tweets  

Improving Search Ranking with Few-Shot Prompting of LLMs

#fine-tuning   #shortcuts   #local models   #models   #performance   #evaluation   #link  

This is good in combination with Hugging Face's Synthetic data: save money, time and carbon with open source.

@RosenzweigJane on February 09, 2024

#uncategorized   #tweets  

Maria Antoniak

#uncategorized   #link  

Synthetic data: save money, time and carbon with open source

#synthetic data   #hugging face   #fine-tuning   #performance   #zero-shot classification   #few-shot classification   #classification   #evaluation   #link  

This post does a fantastic job breaking down how you use an expert labeler (a teacher LLM) to annotate your data, then use those annotations to fine-tune a student LLM. It's as good as or better than crowd workers!

In this case they use Mixtral to prep data for RoBERTa-base, then get equal performance in the end. So much faster! So much cheaper!
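
The shape of that teacher/student pipeline can be sketched in a few lines. This is a toy, not the post's code: `teacher_label` is a hypothetical keyword rule standing in for the teacher LLM (Mixtral in the post), `TinyNaiveBayes` is a hypothetical stand-in for the student (RoBERTa-base in the post), and the example texts are made up. The only point is the flow: the teacher labels unlabeled text, and the student trains on those labels.

```python
import math
from collections import Counter, defaultdict


def teacher_label(text: str) -> str:
    """Hypothetical stand-in for the teacher LLM.

    A real pipeline would send `text` to the model with a labeling prompt;
    here a keyword rule plays that role so the sketch runs offline.
    """
    positive_cues = {"great", "love", "cheaper", "faster"}
    words = set(text.lower().split())
    return "positive" if words & positive_cues else "negative"


class TinyNaiveBayes:
    """Hypothetical stand-in for the student model (a small, cheap classifier)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_examples = sum(self.label_counts.values())

        def score(label):
            counts = self.word_counts[label]
            # Add-one smoothing so unseen words do not zero out a label.
            total = sum(counts.values()) + len(self.vocab)
            log_prior = math.log(self.label_counts[label] / n_examples)
            return log_prior + sum(
                math.log((counts[w] + 1) / total) for w in text.lower().split()
            )

        return max(self.label_counts, key=score)


# Step 1: the teacher annotates unlabeled (made-up) examples.
unlabeled = [
    "love this great model",
    "so much faster and cheaper",
    "slow and expensive mess",
    "terrible slow output",
]
silver_labels = [teacher_label(t) for t in unlabeled]

# Step 2: the student is trained only on the teacher's labels.
student = TinyNaiveBayes().fit(unlabeled, silver_labels)
```

After training, `student.predict("slow mess")` runs without ever calling the teacher again, which is where the speed and cost savings would come from in a real setup.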