Microsoft unveiled the next major version of Skype with real-time translation technology on Tuesday at the Code Conference in Rancho Palos Verdes, California. While you won't be able to chat with your favorite Andorian or Tellarite via subspace for at least another century, you will be able to speak to someone in Germany or France in their native tongue by the end of 2014. The best part is you won't have to spend weeks learning a new language with Rosetta Stone.
The product allows two people to converse in any two of over 40 languages. In video demos released by Microsoft, the translation appears to be as fast and accurate as a trained human interpreter. The technology is based on the voice-to-text (speech recognition) engine that drives Microsoft's Cortana. Cortana, the new virtual personal assistant built into Windows Phone 8.1, will be released to the general public sometime this summer. Skype with translation is slated to follow by the end of the year.
I can't get my hands on the next version of Skype yet, but as a software developer I do have access to the next best thing: the developer release of Cortana. I've been playing with this updated speech recognition technology for several weeks, and I can tell you that it is nothing short of amazing. I've always been an enthusiast of natural language interaction with computers. Geeks aged 35 and older probably remember watching the crew of the Enterprise interact with the computer as if it were a virtual crew member. Well, that was supposed to be a couple hundred years in the future.
Just last year I did a project that involved voice-to-text transcription services. We tried out several market-leading technologies. Some were better than others, but none was capable of near 100% recognition. I told our client that we might be 10 years from meaningful advances. I was right and I was wrong. The improvement in speech recognition from Windows Phone 8 to 8.1 is massive, not incremental. It represents a total paradigm shift in the way researchers are approaching the problem of speech recognition. I won't bore you with all the technobabble details, but this new approach, called deep learning, is being applied to lots of computer science problems. This is the real story. It means that in the next few years you can expect surprisingly rapid innovation in the high-tech world.
What does that mean for you? Better phones, more engaging games and entertainment, and devices we haven't even dreamed of. Microsoft already had great language translation tech. They already had great text-to-speech tech. All they did was add the first truly great voice-to-text tech to that mix and POW, a new product straight from the world of science fiction. We might still be a few hundred years from the first warp engine and first contact, but we'll be ready thanks to the brilliant engineers at Microsoft Research.