aifaq.wtf

"How do you know about all this AI stuff?"
I just read tweets, buddy.

#evaluation


Improving Search Ranking with Few-Shot Prompting of LLMs

#fine-tuning   #shortcuts   #local models   #models   #performance   #evaluation   #link  

This is good in combination with Hugging Face's "Synthetic data: save money, time and carbon with open source."

Synthetic data: save money, time and carbon with open source

#synthetic data   #hugging face   #fine-tuning   #performance   #zero-shot classification   #few-shot classification   #classification   #evaluation   #link  

This post does a fantastic job breaking down how you use an expert labeler (a teacher LLM) to annotate your data, then use those labels to fine-tune a student LLM. It's as good as or better than crowd workers!

In this case they use Mixtral to prep data for RoBERTa-base, then get equal performance in the end. So much faster! So much cheaper!
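For the curious, here's a minimal sketch of that teacher-to-student loop. The `label_with_teacher` helper is a hypothetical stand-in for the Mixtral call, and the two-example dataset is a toy of mine, not something from the post:

```python
# Teacher/student distillation sketch: an expensive LLM labels the data,
# then a small RoBERTa-base classifier is fine-tuned on those labels.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["negative", "positive"]  # hypothetical label set for illustration

def label_with_teacher(texts):
    # Stand-in for the teacher step: in practice you would prompt Mixtral
    # (or another strong LLM) to pick a label for each text.
    return [1, 0]  # dummy label ids so the sketch runs end to end

texts = ["The service was great.", "Never ordering from here again."]  # toy data
labels = label_with_teacher(texts)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS))

ds = Dataset.from_dict({"text": texts, "label": labels})
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student-roberta", num_train_epochs=1),
    train_dataset=ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()  # the cheap, fast student now imitates the expensive teacher
```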

Multilingual BERT has an accent: Evaluating English influences on fluency in multilingual models

#multilingual models   #languages   #evaluation   #paper   #link  

@LChoshen on January 04, 2024

#evaluation   #audio   #prompt injection   #tweets  

@fchollet on December 16, 2023

#evaluation   #reading list   #tweets  

@IanMagnusson on December 19, 2023

#nlp   #dialects   #evaluation   #tweets  

Paper here, data here

@GokuMohandas on September 13, 2023

#RAG   #evaluation   #retrieval-augmented generation   #tweets  

Hallucination Evaluation Leaderboard - a Hugging Face Space by vectara

#evaluation   #hallucinations   #link  

I did not know there was a hallucination leaderboard! You can read more about it in Vectara's post "Cut the Bull…. Detecting Hallucinations in Large Language Models."

@ClementDelangue on November 03, 2023

#models   #evaluation   #tweets  

@daveyalba on October 11, 2023

#models   #evaluation   #limitations   #tweets  

@_jasonwei on September 27, 2023

#evaluation   #tweets  

@OwainEvans_UK on September 22, 2023

#models   #evaluation   #limitations   #tweets  

Note that this is only for fine-tuned data, not content included in the prompt.

Can I take ducks home from the park?

#evaluation   #link  

@emollick on September 08, 2023

#evaluation   #behind the scenes   #prompt engineering   #tweets  

I think the way to think about prompt engineering is: what would the best teacher preface an instruction to a student with, if they really really wanted the student to do the best possible job?

The worst performer, at 62.7%:

Start by dissecting the problem to highlight important numbers and their relations. Decide on the necessary mathematical operations like addition, subtraction, multiplication, or division, required for resolution. Implement these operations, keeping in mind any units or conditions. Round off by ensuring your solution fits the context of the problem to ensure accuracy.

That is obviously an awful way to start a lesson or a test. Even if someone knows the answer, they're going to lose their mind!

The best performer, at 80.2%:

Take a deep breath and work on this problem step-by-step.

So relaxing, so kind, so guaranteed to ensure high performance.

Large Language Models as Optimizers: Paper here
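If you want to poke at this yourself, here's a rough sketch of comparing prompt prefixes by answer accuracy. The `ask_llm` stub and the two toy word problems are placeholders of mine, not the paper's actual GSM8K setup:

```python
# Compare prompt prefixes by measuring answer accuracy on a tiny problem set.
PREFIXES = [
    "Take a deep breath and work on this problem step-by-step.",
    "Start by dissecting the problem to highlight important numbers and their relations.",
]

# (question, expected answer) pairs: toy stand-ins for a real benchmark like GSM8K
PROBLEMS = [
    ("A book costs $4 and a pen costs $1. What do 3 books and 2 pens cost?", "14"),
    ("If 12 cookies are split evenly among 4 kids, how many does each kid get?", "3"),
]

def ask_llm(prompt: str) -> str:
    # Stand-in: send `prompt` to whichever model you're evaluating and
    # return its final numeric answer as a string.
    return "0"

def accuracy(prefix: str) -> float:
    correct = 0
    for question, answer in PROBLEMS:
        reply = ask_llm(f"{prefix}\n\n{question}")
        correct += reply.strip() == answer
    return correct / len(PROBLEMS)

for prefix in PREFIXES:
    print(f"{accuracy(prefix):.0%}  {prefix}")
```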

@restofworld on September 06, 2023

#translation   #languages   #evaluation   #tweets  

Just as social media giants spend the majority of their time and resources moderating English-language content, there's definitely not enough attention paid to how well AI tools work in non-English languages for anything other than translation.

A while back iMEdD analyzed political speeches with ChatGPT, translating them into English prior to the analysis. I had thought, hey, you should just do it in the original Greek, but looking at this, maybe I was wrong!