The craziest part is probably the "2x decrease in false statements". The whole thread is an interesting read, though, and it illustrates that LLMs are not as good as you might think they are.
You can follow this up with 404 Media's "Hugging Face Removes Singing AI Models of Xi Jinping But Not of Biden".
I did not know there was a hallucination leaderboard! You can read more about it in "Cut the Bull…. Detecting Hallucinations in Large Language Models".