I know I love all of these, but this is a great thread to illustrate how these models aren't just a magic box we have no control over or understanding of.
It's tough to design robust tests for machines if you're used to making assumptions that hold for adult humans. The paper's title – Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models – is a reference to a horse that appeared to do math but didn't.
I love these magic words. Read more here.
A "moat" is what prevents your clients from switching to another product.
Right now, most workflows are "throw some text into a product, get some text back." As a result, the box you throw the text into doesn't really matter – GPT, LLaMA, Bard – the only difference is the quality of the results you get back.
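To make that concrete, here's a minimal sketch of the text-in, text-out pattern – the backend functions are stand-ins I'm assuming, not any vendor's real SDK:

```python
from typing import Callable

# Each backend is just a function from prompt to completion.
CompletionFn = Callable[[str], str]

def gpt_complete(prompt: str) -> str:
    # Placeholder: in practice, a call to a hosted GPT endpoint.
    return "stub GPT completion"

def llama_complete(prompt: str) -> str:
    # Placeholder: in practice, a call to a self-hosted LLaMA server.
    return "stub LLaMA completion"

def summarize(text: str, complete: CompletionFn) -> str:
    # The product logic never cares which box the text goes into.
    return complete(f"Summarize the following in two sentences:\n\n{text}")

# Switching vendors is a one-argument change:
print(summarize("Some long document...", gpt_complete))
print(summarize("Some long document...", llama_complete))
```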
Watch how this evolves, though: LLM providers are going to add little features and conveniences that make it harder to jump to the competition. They might make your use case a little easier in the short term, but anything other than text-in, text-out builds those walls a little higher.
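As a hedged illustration of those walls – everything vendor-specific below is invented for the example, not a real API – notice how one structured-output convenience removes the clean string-in, string-out seam:

```python
from dataclasses import dataclass

# --- Imagined Vendor A SDK, invented purely for illustration ---
@dataclass
class ParsedResponse:
    parsed: dict

class VendorA:
    def complete(self, prompt: str, schema: dict) -> ParsedResponse:
        # Pretend the vendor fills this schema from the prompt.
        return ParsedResponse(parsed={key: None for key in schema})

vendor_a = VendorA()
# ---------------------------------------------------------------

def extract_invoice(text: str) -> dict:
    # Convenient today, but there's no drop-in equivalent elsewhere:
    # porting to Vendor B means rewriting this logic, not just a URL.
    response = vendor_a.complete(
        prompt=text,
        schema={"amount": "number", "due_date": "string"},  # vendor-specific
    )
    return response.parsed  # vendor-specific response shape
```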
Not that I know the details, but I have my doubts that BloombergGPT was even worth it. I think "maybe look at" is a little too gentle – if you think you need your own model, you don't.
Prompt engineering and even a somewhat thoughtful pipeline should take care of most of your use cases, with fine-tuning filling in any gaps. The only reason you'd train from scratch is if you're worried about the copyright/legal/ethical implications of the data existing LLMs were trained on – and if you're worried about that, I doubt you have enough data to build your own model anyway.
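For a sense of scale, a "somewhat thoughtful pipeline" is often no more than the sketch below – `search_docs` and `complete` are assumed stand-ins for whatever retrieval step and model you actually use:

```python
# A minimal prompt-engineering pipeline: retrieve context, fill a template,
# send text in, get text back. For many use cases this is the whole product.

TEMPLATE = """You are a support assistant. Answer using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def search_docs(question: str) -> list[str]:
    # Placeholder retrieval step: in practice, keyword or vector search.
    return ["Refunds are processed within 5 business days."]

def complete(prompt: str) -> str:
    # Placeholder model call: any text-in, text-out LLM works here.
    return "stub completion"

def answer(question: str) -> str:
    context = "\n".join(search_docs(question))
    return complete(TEMPLATE.format(context=context, question=question))

print(answer("How long do refunds take?"))
```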