aifaq.wtf

"How do you know about all this AI stuff?"
I just read tweets, buddy.

SpeechX - Microsoft Research

www.microsoft.com

Neural Codec Language Model as a Versatile Speech Transformer

SpeechX is a versatile speech generation model leveraging audio and text prompts, which can deal with both clean and noisy speech inputs and perform zero-shot TTS and various tasks involving transforming the input speech. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting. […]

August 13, 2023

We've seen a lot of audio models in the past couple weeks, but this one is very cool!

Using this tiny, tiny sample of a voice...

...they were able to generate the spoken text below.

that summer’s emigration however being mainly from the free states greatly changed the relative strength of the two parties

Lots of other examples on the project page, including:

Zero-shot TTS (Text To Speech)
Spoken content editing
Background-preserving spoken content editing
Background noise removal
Target speaker extraction
Speech removal

I have no idea what the use case for speech removal is, but it's pretty good. Here's a remarkably goofy before/after:

Source: https://www.microsoft.com/en-us/research/project/speechx/