If you're interested in this kind of thing, a term to search for is "NLP and under-resourced languages." It absolutely goes well beyond that, but it's a good starting point.
I'd just like to note that every time an AI has produced a big, error-riddled article, it was "in collaboration with" or "edited by" a human, despite the remarkable number of errors. In its current state, this stuff only works for reader-facing content if you invest a lot of the resources you thought you'd be saving.
It's tough to design robust tests for machines if you're used to making assumptions that hold for adult humans. The paper's title – Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models – is a reference to a horse that did not actually do math.