"How do you know about all this AI stuff?"
I just read tweets, buddy.
The infinitely long, infinitely boring ChatGPT system prompt. Lots of little nuggets in there that would be great for presentations about the hows and whys of what goes on behind the scenes:
Your choices should be grounded in reality. For example, all of a given occupation should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.
Except Hugging Face says it's "a free interface for validating and cleaning unstructured LLM outputs" so maybe it's just the hosted one that costs [a lot of] money. Could I explore it? Yes! Have I done it? No!
This pairs well with Hugging Face's "Synthetic data: save money, time and carbon with open source."
This post does a fantastic job breaking down how you use an expert labeler (a teacher LLM) to annotate your data, then use those annotations to fine-tune a student LLM. The teacher's labels are as good as or better than what you'd get from crowd workers!
In this case they use Mixtral to prep data for RoBERTa-base, then get equal performance in the end. So much faster! So much cheaper!
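If the teacher/student idea feels abstract, here's a toy sketch of the shape of it. To be clear, none of this is from the Hugging Face post: `teacher_label` is a made-up keyword rule standing in for a call to a big model like Mixtral, and the "student" is a tiny naive Bayes classifier standing in for RoBERTa-base. The sentences are invented too. It only shows the flow: teacher labels the raw text, student trains on those labels, then you check how often the two agree.

```python
import math
from collections import Counter, defaultdict


def teacher_label(text):
    """Hypothetical stand-in for the teacher LLM (Mixtral in the post).

    A real pipeline would prompt a large model here; this is a keyword rule.
    """
    positive_words = {"great", "love", "fantastic", "good"}
    return "positive" if set(text.lower().split()) & positive_words else "negative"


def train_student(examples):
    """Fit a tiny naive Bayes classifier on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def student_predict(model, text):
    """Return the label with the highest log-probability under the student."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        # Laplace smoothing so unseen (label, word) pairs do not zero out.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented, unlabeled examples (not data from the post).
unlabeled = [
    "i love this movie",
    "great acting and a fantastic plot",
    "good fun from start to finish",
    "this was boring and slow",
    "terrible script and awful pacing",
    "i hated the ending",
]

# Step 1: the teacher annotates the raw text.
synthetic = [(text, teacher_label(text)) for text in unlabeled]

# Step 2: the student is trained only on the teacher's labels.
student = train_student(synthetic)

# Step 3: measure how often the student agrees with the teacher on new text.
held_out = ["fantastic movie", "boring script"]
agreement = sum(
    student_predict(student, text) == teacher_label(text) for text in held_out
) / len(held_out)
print(f"student/teacher agreement: {agreement:.2f}")
```

The real win is that the student is much smaller and cheaper to run than the teacher, which a toy like this can't show. Swap the keyword rule for an actual LLM call and the naive Bayes for a fine-tuned model and you have the outline of the approach.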