I recently learned of In Verbis Virtus, an early access game on Steam in which you cast magic by using your own voice. Linden Lab, for its part, sees the future of Second Life in Voice, its name for the mechanism that lets people use microphones to talk to each other in the virtual world.
Some time ago, I conducted research interviews in Second Life. While planning them, I decided to accept interviews both through Voice and through text chat. This decision was based on my earlier experience that Voice divides people's opinions. People with hearing disabilities, and people whose own voice is a significant mismatch with their avatar's appearance, in particular dislike Voice. For the latter group, Second Life provides voice transformers, which also help those who are concerned about their privacy and want to maintain a distance between their Second Life identity and their real identity. The former group would need some kind of speech-to-text service to provide them with subtitles for what other people are saying.
Speech-to-text technology is being actively developed. Microsoft is providing an automated real-time translation service for Skype, and YouTube provides automatic captioning for uploaded videos. I have personal experience of the latter. The captioning is fairly accurate, but it does make mistakes: by my estimate it is about 90% correct, and it naturally does worse when the speaker has a stronger accent or otherwise speaks less clearly.
Of course, it isn't justifiable to limit the available modalities of computer-mediated interaction for the sake of equality for some people who have problems using certain modalities. An altogether different issue is the problems that all users can have with their real physical environment. When using a computer at night, with the rest of the family sleeping in adjacent rooms, one doesn't want to disturb them by speaking into a microphone. When sitting on a train, a bus, or in another crowded environment, one avoids disturbing others, but is also concerned about personal privacy. One could easily type information on a device, but speaking one's personal information out loud in a crowd is a repellent idea. You can vent to your virtual friend in text chat about the co-worker you dislike, but not so well by speaking into a microphone, in case some friend of this co-worker happens to be in the crowd, overhearing.
Delivering audio to the user is not much of a problem. Earphones can give the user excellent audio quality while remaining next to silent to the people around them. This is getting even better with noise-cancelling techniques, which remove the disturbance of environmental noise for the user and possibly further garble whatever the person seated right next to the user might be overhearing. The difficulty lies in giving users their own natural agency in the system, that is, getting their voice into it. The speaking voice doesn't come only out of the mouth; part of it emanates from the throat, the nose and elsewhere. You can test this by keeping your mouth tightly shut and humming, for example a melody. A microphone that would isolate the user's voice from their surroundings would have to be something like a full-body space suit. A more plausible option would be for the user to whisper, or otherwise speak without using their vocal cords. This, however, is not very natural, and the technology for this kind of voice input does not exist yet. A more science-fiction concept would be some sort of cybernetic technology that would cover the user's vocal cords and feed their intended signal to the computer, removing the sound from the user's physical location. This would still leave the problem that a speaking person normally both hears and physically feels their own voice, and would lose that feedback.
Linden Lab would prefer its users to immerse themselves as fully as possible in Second Life. In 2008, Linden Lab and IBM boasted about teleportation between two virtual worlds, the Linden Lab Second Life Preview Grid and an OpenSim virtual world server. However, nothing in this field has been made public since. This approach of bringing users into the service doesn't appear viable until William Gibson's vision of a direct-to-brain interface becomes reality, if it ever does. Today and in the near future, services should instead be brought to the users: ubiquitous, in the cloud, and mobile. Virtual world designers should focus on the users and the user experience more than on the technology and the user's experience of technology.
The context switch required here is for designers to perceive users and their needs within the environment and context in which they would use the system. It is not enough for designers to consider only the system they are building, its technical possibilities and its internal context. This is also explained in Frans Mäyrä's Contextual Game Experience Model, which I discussed earlier on this blog. You shouldn't design your system for zombies to stray into, but for users with their own will, needs and contexts.
IBM. "Linden Lab and IBM Achieve Major Virtual World Interoperability Milestone." Press release. 2008. http://www-03.ibm.com/press/us/en/pressrelease/24589.wss
Mäyrä, Frans. "The contextual game experience: On the socio-cultural contexts for meaning in digital play." In Proceedings of DiGRA. 2007. http://www.tay.fi/~tlilma/Mayra_Contextual_Game_Experience.pdf