Yesterday, I attended SpeechTek in New York. With all the hype and bling in our industry revolving around video and data, SpeechTek makes no bones about expanding beyond voice to include “multimodal” self-service. At the same time, Siri has revitalized interest in speech itself, and I found the show noticeably more energetic because of it. This show is definitely different from what it used to be.
The SpeechTek website explains this change best: “Smartphone and tablet applications provide convenient and intuitive interactions by allowing customers to use different input methods (talk, touch, and type) and see and listen to results. Multimodal applications have raised customer expectations for service across all modalities, and organizations need to understand and deploy self-service technologies wisely to meet the needs of today’s connected customers. That’s why SpeechTEK 2012 is expanding its focus beyond IVR to include smartphone and tablet applications.” The show did deliver on this promise, at least in the sessions I saw.
I spoke on a panel about HD Voice, and I also took the time to attend some sessions. One in particular, on Advanced Spoken Language Research and presented by a speaker from ICSI (International Computer Science Institute), was especially interesting.
2012 marks the 60th birthday of speech recognition. The field has come a long way since its beginnings, thanks to technology improvements (processing power) and ongoing research.
The current “hot” topics in speech are listed below. I include them here because you, as a reader, may think of some cool, money-making application:
- Speech retrieval – searching the web for spoken words (e.g. looking for words in a YouTube video)
- Speech synthesis – a computer talking back to you intelligently
- Speaker identification – recognizing a person by voice, for example in place of passwords
- Non-linguistic information – for instance, detecting lying (better than humans can)
- Speaker diarization – who is speaking when, in a continuous stream of speakers
So all you innovators, let’s get going!