aifaq.wtf

"How do you know about all this AI stuff?"
I just read tweets, buddy.

#evaluation


Improving Search Ranking with Few-Shot Prompting of LLMs

#fine-tuning   #shortcuts   #local models   #models   #performance   #evaluation   #link  

This is good in combination with Hugging Face's "Synthetic data: save money, time and carbon with open source."

Synthetic data: save money, time and carbon with open source

#synthetic data   #hugging face   #fine-tuning   #performance   #zero-shot classification   #few-shot classification   #classification   #evaluation   #link  

This post does a fantastic job breaking down how you use an expert labeler (a teacher LLM) to annotate your data, then use those labels to fine-tune a student LLM. It's as good as or better than crowd workers!

In this case they use Mixtral to prep data for RoBERTa-base, then get equal performance in the end. So much faster! So much cheaper!
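For the curious, here's a minimal sketch of that teacher-to-student loop. The `label_with_teacher` helper is a hypothetical stand-in for the Mixtral call, and the two-example dataset is a toy of mine, not something from the post:

```python
# Teacher/student distillation sketch: an expensive LLM labels the data,
# then a small RoBERTa-base classifier is fine-tuned on those labels.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["negative", "positive"]  # hypothetical label set for illustration

def label_with_teacher(texts):
    # Stand-in for the teacher step: in practice you would prompt Mixtral
    # (or another strong LLM) to pick a label for each text.
    return [1, 0]  # dummy label ids so the sketch runs end to end

texts = ["The service was great.", "Never ordering from here again."]  # toy data
labels = label_with_teacher(texts)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS))

ds = Dataset.from_dict({"text": texts, "label": labels})
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student-roberta", num_train_epochs=1),
    train_dataset=ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()  # the cheap, fast student now imitates the expensive teacher
```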

Multilingual BERT has an accent: Evaluating English influences on fluency in multilingual models

#multilingual models   #languages   #evaluation   #paper   #link  

@LChoshen on January 04, 2024

#evaluation   #audio   #prompt injection   #tweets  

@fchollet on December 16, 2023

#evaluation   #reading list   #tweets  

@IanMagnusson on December 19, 2023

#nlp   #dialects   #evaluation   #tweets  

Paper here, data here

@GokuMohandas on September 13, 2023

#RAG   #evaluation   #retrieval-augmented generation   #tweets  

Hallucination Evaluation Leaderboard - a Hugging Face Space by vectara

#evaluation   #hallucinations   #link  

I did not know there was a hallucination leaderboard! You can read more about it in Vectara's post "Cut the Bull…. Detecting Hallucinations in Large Language Models."

@ClementDelangue on November 03, 2023

#models   #evaluation   #tweets  

@daveyalba on October 11, 2023

#models   #evaluation   #limitations   #tweets  

@_jasonwei on September 27, 2023

#evaluation   #tweets  

@OwainEvans_UK on September 22, 2023

#models   #evaluation   #limitations   #tweets  

Note that this is only for fine-tuned data, not content included in the prompt.

Can I take ducks home from the park?

#evaluation   #link  

@emollick on September 08, 2023

#evaluation   #behind the scenes   #prompt engineering   #tweets  

I think the way to think about prompt engineering is: what would the best teacher preface an instruction to a student with, if they really really wanted the student to do the best possible job?

The worst performer, at 62.7%:

Start by dissecting the problem to highlight important numbers and their relations. Decide on the necessary mathematical operations like addition, subtraction, multiplication, or division, required for resolution. Implement these operations, keeping in mind any units or conditions. Round off by ensuring your solution fits the context of the problem to ensure accuracy.

That is obviously an awful way to start a lesson or a test. Even if someone knows the answer, they're going to lose their mind!

The best performer, at 80.2%:

Take a deep breath and work on this problem step-by-step.

So relaxing, so kind, so guaranteed to ensure high performance.

Large Language Models as Optimizers: Paper here
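If you want to poke at this yourself, here's a rough sketch of comparing prompt prefixes by answer accuracy. The `ask_llm` stub and the two toy word problems are placeholders of mine, not the paper's actual GSM8K setup:

```python
# Compare prompt prefixes by measuring answer accuracy on a tiny problem set.
PREFIXES = [
    "Take a deep breath and work on this problem step-by-step.",
    "Start by dissecting the problem to highlight important numbers and their relations.",
]

# (question, expected answer) pairs: toy stand-ins for a real benchmark like GSM8K
PROBLEMS = [
    ("A book costs $4 and a pen costs $1. What do 3 books and 2 pens cost?", "14"),
    ("If 12 cookies are split evenly among 4 kids, how many does each kid get?", "3"),
]

def ask_llm(prompt: str) -> str:
    # Stand-in: send `prompt` to whichever model you're evaluating and
    # return its final numeric answer as a string.
    return "0"

def accuracy(prefix: str) -> float:
    correct = 0
    for question, answer in PROBLEMS:
        reply = ask_llm(f"{prefix}\n\n{question}")
        correct += reply.strip() == answer
    return correct / len(PROBLEMS)

for prefix in PREFIXES:
    print(f"{accuracy(prefix):.0%}  {prefix}")
```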

@restofworld on September 06, 2023

#translation   #languages   #evaluation   #tweets  

Just as social media giants spend the majority of their time and resources moderating English-language content, there's definitely not enough attention paid to how well AI tools work in non-English languages for anything other than translation.

A while back iMEdD analyzed political speeches with ChatGPT, translating them into English prior to the analysis. I had thought, hey, you should just do it in the original Greek, but looking at this, maybe I was wrong!