We've seen a lot of audio models in the past couple of weeks, but this one is very cool!
Using a tiny, tiny sample of a voice, they were able to generate the spoken text below:
> "That summer’s emigration, however, being mainly from the free states, greatly changed the relative strength of the two parties."
There are lots of other examples on the project page, including:
- Zero-shot TTS (Text To Speech)
- Spoken content editing
- Background-preserving spoken content editing
- Background noise removal
- Target speaker extraction
- Speech removal
I have no idea what the use case for speech removal is, but it's pretty good. Here's a remarkably goofy before/after: