The dream of conversing with an artificially intelligent robot may not be far off. WaveNet, an artificially intelligent text-to-speech system just announced by DeepMind, a company acquired by Alphabet Inc.'s Google, has reduced the gap between synthesized and human speech by 50 percent compared with previous systems. The company said the same artificial intelligence model can also be used to synthesize other audio signals, such as music.
The problems the researchers faced were explained in a blog post on the company's website. Modeling the human voice requires a sampling rate of 16,000 samples per second, along with handling numerous complexities such as emotion patterns and scale. The researchers essentially used an autoregressive approach, in which the prediction for each future sample is based on the previous data. A predictive network was built by calculating a probability distribution from this data. Because this method requires thousands of calculations per second, it is computationally expensive. According to the company, the text-to-speech systems in English and Chinese rank among the finest worldwide.
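The autoregressive idea described above can be illustrated with a small Python sketch. This is not DeepMind's model: the function `toy_distribution`, the constant `LEVELS`, and the rule that favors values near the last sample are all hypothetical stand-ins for a trained network. The sketch only shows the loop structure, in which each new sample is drawn from a probability distribution computed from the samples that came before it.

```python
import math
import random

SAMPLE_RATE = 16_000  # samples per second, the figure quoted in the article
LEVELS = 16           # hypothetical number of quantized amplitude levels


def toy_distribution(history):
    """Return a probability distribution over the next amplitude level.

    Hypothetical stand-in for a trained network: it simply favors
    levels close to the most recent sample.
    """
    last = history[-1] if history else LEVELS // 2
    weights = [math.exp(-abs(level - last)) for level in range(LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, seed=0):
    """Generate samples one at a time, each conditioned on the earlier ones."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = toy_distribution(samples)           # distribution from past data
        nxt = rng.choices(range(LEVELS), weights=probs, k=1)[0]
        samples.append(nxt)                         # the new sample joins the history
    return samples


if __name__ == "__main__":
    audio = generate(SAMPLE_RATE // 100)  # 160 samples, i.e. 10 ms at 16 kHz
    print(len(audio), audio[:10])
```

Because every sample needs its own pass through the predictor, one second of audio at this rate means 16,000 sequential predictions, which hints at why the process is expensive.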
Speech technology has become a lucrative market with the widespread adoption of artificial intelligence and virtual assistants. According to one estimate, the global market for voice recognition technology is expected to reach $113.2 billion by 2017.