Voice user interfaces (VUIs) have grown dramatically in popularity in recent years.
A VUI uses speech recognition technology to let users interact with devices using only their voice.
Virtual assistants such as ‘Siri’ (Apple) and ‘Alexa’ (Amazon) have taken VUI a significant step forward.
VUI allows efficient interactions that feel more ‘human’ than other forms of user interface, such as the mouse or keyboard, since “Speech is the primary mean of human communication”.
For many tasks, a VUI can be much faster than a traditional UI.
However, we still need to learn how to interact with computers by voice as naturally as we do with a keyboard and mouse. After all, most of us can picture a Sci-Fi movie or TV series where the hero speaks to the computer and gives it orders!
Let us therefore explore the fascinating and futuristic world of voice as a UI.
A voice-user interface (VUI) is a system that allows spoken human interaction with computers. A VUI typically uses speech recognition to understand spoken requests from the user and answers these requests through text or voice output.
A voice command device (VCD) is a device – such as a computer – which is controlled with a VUI.
Voice user interfaces are becoming increasingly common, especially on newer smartphones. They are also integrated into recent automobiles, domotics (home automation), computers, home appliances (washing machines, etc.) and TV remote controls.
VUIs have long been developed for PABX systems, using software such as Voxeo or Asterisk. Such systems can hold a dialog with users and understand their requests automatically, without requiring them to press a DTMF key.
A VUI is, therefore, an interface to a speech application. It allows a machine or website to be controlled by spoken language alone, which was viewed as ‘Sci-Fi’ only a decade ago. Historically, VUI has been tied to Artificial Intelligence and to the development of fifth-generation computers (which are still being researched and not yet in production).
The development of natural speech recognition and text-to-speech synthesis has led to the popular adoption of several voice interfaces, enabling full human-to-machine interaction without the traditional mouse or keyboard.
Compared to a traditional user interface, the development of a VUI is considerably harder. It involves not only computer science and programming skills but also linguistics (not everybody on earth is supposed to speak fluent English – yet) and psychology.
A VUI can only work if both the tasks it must perform and its target audience are clear and well-defined. In other words, a VUI is not a ‘spoken’ shell but a carefully designed list of keywords and sentences, which must be ergonomic and easy to pronounce (or understand).
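As a minimal sketch of what such a designed list of keywords and sentences might look like, the Python snippet below maps a few accepted phrasings to intents and matches a transcript against them. The intent names, the phrases, and the simple substring matching are illustrative assumptions, not how any particular assistant works; the transcript is assumed to come from a separate speech recognizer.

```python
import re

# Hypothetical command set: each intent lists the phrasings the VUI accepts.
COMMANDS = {
    "lights_on": ["turn on the lights", "lights on"],
    "lights_off": ["turn off the lights", "lights off"],
    "play_music": ["play music", "play some music"],
}


def normalize(utterance: str) -> str:
    """Lowercase the transcript, drop punctuation and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", utterance.lower())
    return " ".join(text.split())


def match_intent(utterance: str):
    """Return the first intent whose phrase occurs in the transcript, else None."""
    text = normalize(utterance)
    for intent, phrases in COMMANDS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return None
```

Keeping the accepted phrases in one small table makes it easy to review them for pronunciation and ambiguity. Anything outside the table returns None, so the interface can ask again rather than guess.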
Many VUIs are catastrophic in terms of usability, which can lead users to quickly abandon the device or application. For instance, automatic speech recognition is often very poorly implemented on smartphones and may lead to absurd situations with wrongly understood commands and inputs.
A VUI should be designed differently depending on whether it targets the general public or advanced ‘power’ users. It must be carefully crafted for the type of business it serves. In the case of a website, for example, the design of the VUI should consider the website’s business model and its target audience, and especially the impact of an error on the user. For example, when searching a database of toys, some incorrectly processed sentences may be tolerated, but this certainly could not be tolerated when making financial transactions online!
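One possible way to encode this difference in error tolerance is a per-task policy, sketched below in Python. The task names, the confidence thresholds, and the three outcomes are all hypothetical values chosen for illustration. The confidence score is assumed to be supplied by the speech recognizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPolicy:
    min_confidence: float  # below this score, ask the user to repeat
    always_confirm: bool   # read the request back before acting on it


# Hypothetical policies: a toy search tolerates errors, a payment does not.
POLICIES = {
    "toy_search": TaskPolicy(min_confidence=0.5, always_confirm=False),
    "payment": TaskPolicy(min_confidence=0.9, always_confirm=True),
}


def next_step(task: str, confidence: float) -> str:
    """Return 'reprompt', 'confirm' or 'execute' for a recognized request."""
    policy = POLICIES[task]
    if confidence < policy.min_confidence:
        return "reprompt"
    if policy.always_confirm:
        return "confirm"
    return "execute"
```

With these assumed values, a toy search recognized at 0.6 runs immediately. A payment at the same score is rejected, and even a well-recognized payment is read back for confirmation first.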
Windows Vista, Windows 7 and Windows 10 all provide speech recognition mechanisms. Microsoft created a mixed system where people can use their voice to rely less on their keyboard or mouse.
Windows 10 integrates Cortana as its voice assistant, and Alexa is also available on it as an app. These assistants can activate background applications and allow users to interact with them.
The Windows VUI is based on ‘natural’ language rather than technical jargon. It works when the user makes simple, precise and concise sentences without using complicated terms.
Car technology is an important area of application for VUI. Voice commands allow drivers to control several features of their car without being distracted from driving.
Such VUIs are usually far more complex than others. For example, the Ford Sync VUI has 10,000 voice commands, as opposed to around a hundred for most typical VUI applications. The wide variety of tasks a driver needs to perform makes VUI very important in this context, since an automotive VUI can process tasks such as:
There are many virtual assistants that understand voice commands, for bank customers for instance. On such websites, the customer is guided vocally by the assistant or can interact with it to get answers to simple questions.
Google, for example, has developed a very simple yet impressive VUI where users speak their search query and the Google website speaks the best matching result. The accuracy of the VUI is simply astonishing. Some studies suggest that most web searches will be voice searches by the end of 2020.
VUIs are built into smartphones on both iOS (Apple) and Android (Google). Users can activate or deactivate phone features simply by speaking a command.
Here is a list of some popular smartphone VUIs:
Robots are a very important field of application for voice interfaces. The number of commands a robot could receive is extremely large, and symbolic artificial intelligence may therefore be required in addition to speech recognition.
Robots can be commanded primarily to move in certain directions or to carry out given tasks, such as running errands.
Honda Robotics, among others, is actively researching the topic.
As with robotics, voice can be used to control certain features of a house, such as setting the heating temperature, opening or locking doors, or activating alarms. A certain level of trust and reliability is required so that accidents do not happen.
Ovens, fridges and washing machines can be equipped with a VUI. The benefit is being able to give quick, basic commands without walking over to the appliance. In some ways, this does not appear to be a fundamentally important application.
Identical to Home appliances.
VUI for military applications includes rugged translators that can translate languages in ‘real-time’, allowing crews of different nationalities to interact and cooperate.
Because of security issues, VUI is not really considered for use in military operations or combat, even though it could offer definite advantages in terms of quick action and reaction during combat.
Here we list a few points to consider when designing a VUI.
Trust is fundamental when considering the interaction between a user and a machine. There are two factors to consider: Valid Outcome and User Control.
Valid outcome: When a user interacts with a VUI, they expect to get what was requested (“what you get is what you requested”). If the user receives even a slightly different output than expected, doubt sets in, which undermines trust.
When designing a VUI, designers must think about the end goal of the voice interaction, in other words, which goal the user wants to fulfill: searching for information, enabling a feature, moving a device, etc.
Even if it’s not possible to think about all the possible user requests, the designers should do their best to anticipate the requests and provide guided flows.
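A guided flow can be sketched as a list of details the VUI needs before it can act, each paired with the question to ask when that detail is missing. The Python example below is only an illustration: the heating scenario, the slot names, and the prompts are assumptions, not taken from any real product.

```python
# Hypothetical guided flow: setting the heating needs a room and a temperature.
REQUIRED_SLOTS = {
    "room": "Which room?",
    "temperature": "What temperature would you like?",
}


def next_prompt(filled: dict):
    """Return the question for the first missing detail, or None when complete."""
    for slot, question in REQUIRED_SLOTS.items():
        if slot not in filled:
            return question
    return None
```

For example, `next_prompt({"room": "kitchen"})` returns the temperature question, while a request that already carries both details returns None and can be executed directly.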
User control: Voice interaction is a big challenge for user interface designers. Still, the same principles and best practices that guide GUI design are often applicable to VUI design. The need for strong user control therefore holds true in the VUI context. An error system must be in place to “catch” any voice request that is not correctly processed. Visual or audio feedback must also be in place to let the user know what is going on.
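The error-catching and feedback idea might look like the following Python sketch. It assumes a hypothetical `handlers` dictionary that maps exact spoken commands to functions returning the spoken feedback. The retry limit and the wording of the messages are likewise illustrative.

```python
def handle_turn(transcript, handlers, failures=0, max_failures=2):
    """Process one user turn and return (spoken_feedback, failure_count)."""
    handler = handlers.get(transcript.strip().lower())
    if handler is not None:
        # Recognized command: run it and reset the failure counter.
        return handler(), 0
    failures += 1
    if failures >= max_failures:
        # Repeated failures: tell the user exactly what they can say.
        options = ", ".join(sorted(handlers))
        return f"Sorry, I still did not understand. You can say: {options}.", failures
    return "Sorry, I did not catch that. Could you repeat?", failures
```

Every path returns something to say, so the user is never left in silence. After repeated failures the reply lists the available commands, which hands control back to the user.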
Users adopt a VUI primarily because they want to save time. Efficiency is therefore the main goal of such interfaces. To achieve it, the amount of thinking needed to use the interface must be reduced: avoid long sentences, keep the machine’s responses to a few phrases that humans can remember, and provide good help.
Today, many VUIs use voices that sound too robotic. Using speech synthesis that is as “natural” as possible is generally a good idea for designers. The interface should appear as human-friendly as possible.
VUIs are the future, but they still require many technological improvements and are quite difficult to design well. They can nevertheless be impressive and fast, as Google voice search shows.