State of Natural Speech and Italy

Last week, I had the opportunity to attend and speak at InteractiveMedia’s Speech Workshop in Rome.  Speech recognition has become much more mainstream and accepted in the past 5 years, mainly because the industry has made huge strides in getting it to work, and frankly also because it is more ubiquitous, in cars for example, so people are more comfortable with it.

While I used to be very embedded in the speech scene, I haven’t been lately.  In fact, IM asked me to speak about Dialogic’s cloud experiences and what we’re seeing there.  Still, the workshop was very interesting for me, and even though the entire day was in Italian, here are some key themes that I picked up, or shall I say, some key themes that crystallized for me, even if they aren’t exactly what the speakers were talking about!

First, the movement to mobile is increasing the usage of speech technologies.  As I said above, speech is more ubiquitous in cars than ever before, and that’s because when you are driving, you can’t really be hitting buttons.  For the same reason, it’s hard to key in DTMF tones while talking on a mobile phone.  Smartphones increase the use of speech technologies as well.  One reason is that 3G has both a voice and a data channel, as any US consumer will know from the AT&T commercials about 3G and the iPhone.  Another reason is that as Smartphones become vehicles for mobile payments, speech technology will be the speaker verification vehicle, so you wouldn’t need a PIN.  Your unique voice characteristics, like your unique fingerprint, will be the verification.

Another theme I saw was that virtual agents (i.e. pieces of software that may look like a human and also “listen” to you and “talk” to you) can be used to help with specific tasks.  They don’t make mistakes.  And in some cases, it might be better to give personal information to a computer rather than a human being.  Sure, live agents are still required, and will be for some time, to deal with complex requests and probably accents (though I saw some cool demos of speech in one language, say English, with an Italian accent, and it sounded like an Italian speaking English), but virtual agents for specific tasks are cost-effective and they work.

I also got a low-down on the latest standards, such as EmotionML, EMMA, SCXML, VoiceXML 3.0 and HTMLspeech.

All in all, the speech industry remains extremely vibrant and innovative.  We are now moving into true multi-modal support and true interactivity.  I’m sure this will continue to move forward and bring even more new, innovative applications, such as support for mobile banking.

About this Entry

This page contains a single entry by Jim Machi published on April 5, 2011 9:54 AM.
