Google’s new AI voices are nothing short of incredible

[Chart: WaveNet subjective speech quality scores]

Taken from this Google DeepMind blog post, the above chart shows subjective quality scores for various methods of synthesising speech in different languages.

As you can see, the new ‘WaveNet’ approach, which uses a neural network to synthesise raw audio one sample at a time (many thousands of samples per second), scores markedly closer to human speech than the existing methods. The result is mind-blowingly good. In fact, I can think of several situations where I’d rather listen to this voice than the one to which I’m subjected!

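To make that sample-by-sample idea concrete, here is a minimal Python sketch of autoregressive waveform generation. Everything in it is illustrative rather than taken from the paper: the predict_next_sample stand-in just emits a decaying sine wave, whereas the real WaveNet uses a deep stack of dilated causal convolutions as the predictor.

```python
import numpy as np

SAMPLE_RATE = 16_000      # raw audio rate, e.g. 16,000 samples per second
RECEPTIVE_FIELD = 1_024   # how many past samples the model can "see"

def predict_next_sample(context: np.ndarray, t: int) -> float:
    """Stand-in for the neural network: given the previous samples,
    predict the next one. Here it is a toy decaying 440 Hz sine so the
    script runs end to end; it ignores the context entirely."""
    return 0.5 * np.exp(-t / SAMPLE_RATE) * np.sin(2 * np.pi * 440 * t / SAMPLE_RATE)

def generate(n_samples: int) -> np.ndarray:
    audio = np.zeros(n_samples, dtype=np.float32)
    for t in range(n_samples):
        context = audio[max(0, t - RECEPTIVE_FIELD):t]
        audio[t] = predict_next_sample(context, t)  # one sample at a time
    return audio

# One second of audio means 16,000 sequential model calls, which is why
# WaveNet's sample-by-sample generation was so computationally heavy.
waveform = generate(SAMPLE_RATE)
```
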
From the abstract of the (rather technical) paper the Google DeepMind team wrote about all this:

When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Chinese. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments.

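That ‘conditioning on the speaker identity’ line is worth unpacking: a single set of network weights serves every voice, and a learned per-speaker vector fed in alongside the audio tells the model which voice to produce. Below is a hedged sketch of the idea; the names (speaker_embeddings, conditioned_input) and the sizes are invented for illustration, not the paper’s actual setup.

```python
import numpy as np

N_SPEAKERS, EMBED_DIM = 10, 16  # illustrative sizes, not the paper's

# One learned vector per speaker. In a real model these are trained
# jointly with the network; here they are random placeholders.
speaker_embeddings = np.random.randn(N_SPEAKERS, EMBED_DIM)

def conditioned_input(context: np.ndarray, speaker_id: int) -> np.ndarray:
    """Attach the chosen speaker's embedding to every timestep of the
    audio context, so one network can voice many different speakers."""
    emb = np.tile(speaker_embeddings[speaker_id], (len(context), 1))
    return np.concatenate([context[:, None], emb], axis=1)

# Switching voices is just a matter of swapping the id:
ctx = np.zeros(1_024, dtype=np.float32)
voice_a = conditioned_input(ctx, speaker_id=0)
voice_b = conditioned_input(ctx, speaker_id=7)
```
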
Fascinatingly, the system can be trained on any kind of audio, meaning it can also generate quite complicated, Chopin-like fragments of music on the fly!