aifaq.wtf

"How do you know about all this AI stuff?"
I just read tweets, buddy.


Bubble Trouble

#training data   #link  

Has some good links along with a comprehensive background on how/why training data is collected.

Long-form factuality in large language models

#uncategorized   #link  

wsj.com

#training data   #link  

@VictorTaelin on April 07, 2024

#uncategorized   #tweets  

@colin_fraser on April 12, 2024

#lol   #misinformation and disinformation   #hallucinations   #tweets  

Summarization is (Almost) Dead

#summarization   #link  

Okay this is bold:

we believe that most conventional works in the field of text summarization are no longer necessary in the era of LLMs

While every other paper is like "oh boy yeah, LLMs have an awful hit rate for summarization." And yet:

As depicted in Table 1, human-written reference summaries exhibit either an equal or higher number of hallucinations compared to GPT-4 summaries. In specific tasks such as multi-news and code summarization, human-written summaries exhibit notably inferior factual consistency.

But! Also! Looks like the big issue with human-written summaries was "their lack of fluency," which sounds like the AI stuff was just better written? Guess that's valuable, especially alongside the supposedly higher factuality of LLM-generated content.

@mertdumenci on April 10, 2024

#uncategorized   #tweets  

Nordic AI in Media Summit 2024: Five projects you should keep an eye on

#uncategorized   #link  

@lefthanddraft on April 09, 2024

#summarization   #hallucinations   #tweets  

The paper got another post

AI is already reshaping newsrooms, AP study finds - Poynter

#journalism   #link  

Nearly 70% of newsroom staffers from a variety of backgrounds and organizations surveyed in December say they’re using the technology for crafting social media posts, newsletters and headlines; translation and transcribing interviews; and story drafts, among other uses. One-fifth said they’d used generative AI for multimedia, including social graphics and videos.

OpenAI transcribed over a million hours of YouTube videos to train GPT-4

#training data   #youtube   #openai   #ethics   #link  

@chris_j_paxton on April 11, 2024

#uncategorized   #tweets  

FABLES: Evaluating faithfulness and content selection in book-length summarization

#hallucinations   #summarization   #context   #context window   #link   #claude   #openai   #mixtral   #gpt-4  

An analysis of the annotations reveals that most unfaithful claims relate to events and character states, and they generally require indirect reasoning over the narrative to invalidate.

What kinds of things are AI tools especially bad at?

Something about calling an AI's work "well-done" feels far more anthropomorphic than it should.

While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims
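For reference, an "LLM rater of faithfulness" here means roughly: hand a model the source passage plus a claim from the summary and ask it to judge whether the claim is supported. A minimal sketch of that pattern in Python, assuming the openai client and a placeholder model name (not the paper's actual prompts or setup):

    # Rough sketch of an LLM-as-judge faithfulness check.
    # Prompt wording and model name are assumptions, not the FABLES setup.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def rate_faithfulness(source_passage: str, claim: str, model: str = "gpt-4o") -> str:
        """Ask the model whether `claim` is supported by `source_passage`."""
        prompt = (
            "Source passage:\n" + source_passage + "\n\n"
            "Claim:\n" + claim + "\n\n"
            "Is the claim fully supported by the source passage? "
            "Answer FAITHFUL or UNFAITHFUL, then give a one-sentence reason."
        )
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content

And the paper's point is that even judges built along these lines didn't line up well with human annotators on book-length material, especially when it came to catching the unfaithful claims.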

Of course this needs a link to my favorite hallucination leaderboard. It's a tough problem, since it costs money to do this in a way that doesn't rely on LLMs to create and score the dataset. Which leads to...

Collecting human annotations on 26 books cost us $5.2K, demonstrating the difficulty of scaling our workflow to new domains and datasets.

$5k is somehow cost-prohibitive between UMass, Princeton, Adobe, and an AI institute? That works out to about $200 per book, which... I don't know, seems like not very much money. I get the sense that this is "best" done for pennies, but if someone had to cough up $5k each year to repeat this with newly-unknown data I don't think it would be the worst thing in the world.

Finally, we move beyond faithfulness by exploring content selection errors in book-length summarization: we develop a typology of omission errors related to crucial narrative elements and also identify a systematic over-emphasis on events occurring towards the end of the book.

Here are the omission types:

@infobeautiful on April 10, 2024

#randomness   #shortcomings and inflated expectations   #tweets  

ChatGPT loves to pick 42 as a random number. Of course, GPT-4 can run some Python code to correct for that, and it could help some folks think about the non-random nature of things they assume are random when they ask GPT to "choose."
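A quick way to see the bias for yourself: ask a model for a "random" number a bunch of times and tally the answers against Python's own random module. A rough sketch, again assuming the openai client and a placeholder model name:

    # Tally a model's "random" picks against Python's random module.
    # Model name and prompt are illustrative assumptions.
    import random
    from collections import Counter

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def model_picks(n: int = 50, model: str = "gpt-4o") -> Counter:
        """Ask the model for a 'random' 1-100 integer n times and count the answers."""
        picks = Counter()
        prompt = "Pick a random number between 1 and 100. Reply with the number only."
        for _ in range(n):
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            picks[resp.choices[0].message.content.strip()] += 1
        return picks

    def baseline_picks(n: int = 50) -> Counter:
        """Actual pseudo-random picks for comparison."""
        return Counter(str(random.randint(1, 100)) for _ in range(n))

    print("model:", model_picks().most_common(5))
    print("random module:", baseline_picks().most_common(5))

The model's tally tends to pile up on a few favorites (42 among them) while the random-module tally stays roughly flat, which is the point being made here.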

Elon Musk's X pushed a fake headline about Iran attacking Israel. X's AI chatbot Grok made it up.

#misinformation and disinformation   #journalism   #grok   #twitter   #link