Taking hold of the future: gesture user interface design for extended reality
I’ve been interested in gesture/touchless user interfaces for a while, especially the use of hand gestures and body movement as inputs for interacting with computer programs. The first time I can remember seeing an actual demonstration of this on the big screen was Johnny Mnemonic. The manipulation of virtual displays and data in virtual reality using interactive gloves was so futuristic… even in 1995.
But it wasn’t until 2002, when Tom Cruise donned a pair of gloves in Minority Report, that the public really got a feel (pardon the pun) for how our bodies and natural movements could be used to interact with virtual content.
Since then, science fiction films have provided many more examples of how gestural interfaces could enhance work, play and retail in the future. A few of these examples include:
Lawnmower Man, Film, 1992.
Firefly, TV Series, 2002.
Iron Man, Film, 2008.
Prometheus, Film, 2012.
Ghost in the Shell, Film, 2017.
These visions are a great source of inspiration for gesture design in the augmented and mixed reality space. As visual prototypes, they explore:
Physical interactions with digital/virtual content.
Use of hardware and peripheral accessories for input and output (eyewear, gloves, helmets, body suits).
Methods for accessing digital content (retrieve, display, navigate, share, archive).
Use cases (how different users might interact with the same system).
Potential constraints (spatial, security, permissions).
However, the designers and art directors who create these amazing visions of the future are designing deliberately for the big screen. That is, their designs are more of a storytelling tool than a true user interface.
For example, Jayse Hansen is a creative art director who specialises in what he calls FUI (Fake User Interfaces). He has worked on many films including Iron Man, Avengers, Hunger Games, Guardians of the Galaxy and Star Wars. He explains the difference between designing an FUI versus a real UI in an interview with The Next Web.
“You have to allow yourself the freedom to go far outside what’s ‘real’ and enter into something that’s more fantastical. A lot of the time, with film UI’s, you’re attempting to show what’s going on behind the screen; to show graphically what the computer is doing. Whereas, with real UI’s, you’re usually attempting to hide it. So allowing for non-real, non-usable creativity is essential to a good story-telling UI. Someone once called it an NUI — or non-usable-interface — and I kind of liked that. You do have to break some usability rules to make it dynamic enough.”
It’s great to get inspiration from film, but we must acknowledge that these mixed reality designs won’t necessarily work in real life unless they take real use cases into account.
A more in-depth, yet still fictional, exploration of gesture design can be found in Heavy Rain, a video game released in 2010. The main character uses augmented reality to access virtual data in both real-world and virtual environments. (Note: floating white symbols denote the game control interface, while orange symbols, objects and text represent AR assets.)
Heavy Rain, Walkthrough. Augmented reality example at 5:20 timestamp.
Gesture design and user input
Several years ago I set out to learn more about gesture design. But I wanted to start with the basics of user input. I wanted to understand the fundamentals of how hands and fingers communicate in 3D space. So while living in Singapore I took a beginner’s class in sign language at the Singapore School for the Deaf.
I’ve always found sign language fascinating. As a form of communication without spoken or written words, it seemed to me like a secret world of coded messages relying on swift yet subtle hand movements. I thought that if I could convince someone else to learn sign language with me, we would be able to exchange all kinds of secret messages, even in the presence of others. But it turned out to be a lot tougher than I anticipated.
Firstly, my fingers were exhausted by the end of the first lesson. Making those gestures and switching between them quickly reminded me of learning to play a musical instrument. I was surprised by how quickly my hands and fingers began to physically fatigue.
Some alphabet signs. “Signing Exact English” by Gerilee Gustason and Esther Zawolkow. Illustrations by Lilian Lopez. 1993, Modern Signs Press.
Secondly, as anyone who has learned a second language knows from experience, learning how to express yourself is only half the exercise. You must also learn to understand someone else and what they’re trying to communicate. In sign language, you quickly learn to recognise the same hand gestures and movements when they are formed by another person. You also learn to appreciate the subtlety of an incorrectly positioned thumb or overlapping finger. I also learned that good sign language is just like any other form of communication: to prevent misunderstandings you must be clear and direct with your expression.
Learning all those hand positions required my brain to rewire in what seemed like a really unnatural way – at first. You can almost feel your brain trying to “talk” to your fingers. I had to exercise my procedural memory, forcing my muscles to form, practice and recall those shapes and movements.
I also realised that an important part of sign language isn’t even about your hands. If you’re signing to say that you don’t feel well, you have to express that in your face with a sad or pained expression. If you think about it, congruence between words and expression is important even in communication between two non-signing people. For example, if I said “I’m really upset with you” but smiled as I spoke, the message would be contradictory.
Even though it was difficult, I also avoided “translating” each sign into English in my head, and instead tried to associate the concept/noun/verb directly with the hand gesture.
I learned that there’s a compound effect in sign language as you add each additional element such as movement, another hand and/or facial expression:
One hand: shape
One hand: shape + movement
Two hands: shape + movement + interaction between hands
Two hands and face: shape + movement + interaction + facial expressions
Although a beginner’s class was only a taste of a rich and varied language, I learned a lot about user experience, user input, gestures and how to sign Tyrannosaurus Rex. But what about software and devices that can “translate” our gestures?
Leap Motion
Launched in 2012, Leap Motion is a sensor device that recognises hand gestures without physical touch and transforms these into user inputs.
For example, the software not only recognises hand and finger positions and movement, it also recognises the discrete motion of a finger tracing a circle in space as a Circle gesture.
Circle gesture via Leap Motion.
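If you’re curious what that looks like in code, here’s a minimal sketch based on the classic (now legacy) Leap Motion V2 Python bindings, which exposed built-in gesture types such as TYPE_CIRCLE. Treat it as illustrative rather than a drop-in sample for current SDKs.

```python
import sys
import Leap
from Leap import CircleGesture

class CircleListener(Leap.Listener):
    def on_connect(self, controller):
        # Ask the tracking service to report Circle gestures in each frame
        controller.enable_gesture(Leap.Gesture.TYPE_CIRCLE)

    def on_frame(self, controller):
        frame = controller.frame()
        for gesture in frame.gestures():
            if gesture.type == Leap.Gesture.TYPE_CIRCLE:
                circle = CircleGesture(gesture)
                # progress counts completed revolutions of the fingertip;
                # radius is reported in millimetres
                print("Circle %d: progress %.2f, radius %.1f mm"
                      % (circle.id, circle.progress, circle.radius))

listener = CircleListener()
controller = Leap.Controller()
controller.add_listener(listener)

print("Tracing circles... press Enter to quit")
sys.stdin.readline()
controller.remove_listener(listener)
```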
The Leap Motion device connects to a computer via USB and has more recently been used by clever developers to track hand gestures and recognise user inputs in virtual reality. Not surprisingly, Leap Motion has since created a VR-specific developer kit. This opens the possibility of registering user inputs beyond the native controllers specific to each VR platform.
Orion Blocks Demo, Leap Motion.
The company has also developed the Leap Motion Interaction Engine, a layer that sits between the Unity game engine and real-world hand physics, and has published a nifty guide to its principles of interaction design.
6 Principles of Leap Motion Interaction Design.
Microsoft Hololens Design Guide
Microsoft has published a design guide for Hololens which includes designing gesture inputs. The system only recognises a few gestures: Ready Mode, Air Tap (similar to a mouse click), Air Tap and Hold (similar to mouse click and hold/drag), and Bloom (return Home). But when used in conjunction with user gaze and voice commands, these gestures become more powerful.
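To make the “gesture plus gaze plus voice” idea concrete, here’s a hypothetical dispatcher in Python. It is not the actual Hololens API (which is exposed through Unity and C#); the event names and actions are made up purely to show how a tiny gesture vocabulary fans out into many commands once you know what the user is looking at and what they said.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical input events; the real Hololens pipeline uses Unity's
# GestureRecognizer and voice keyword recognition, not this sketch.
@dataclass
class InputState:
    gesture: str                  # "air_tap", "tap_and_hold", "bloom"
    gaze_target: Optional[str]    # hologram currently under the user's gaze, if any
    voice_command: Optional[str]  # e.g. "bigger", or None

def resolve_action(state: InputState) -> str:
    """Map a small gesture vocabulary onto many actions by combining it
    with gaze (what the user is looking at) and voice (what they said)."""
    if state.gesture == "bloom":
        return "go_home"                          # bloom always returns home
    if state.gesture == "air_tap" and state.gaze_target:
        if state.voice_command == "bigger":
            return f"scale_up:{state.gaze_target}"
        return f"select:{state.gaze_target}"      # plain tap selects the gazed hologram
    if state.gesture == "tap_and_hold" and state.gaze_target:
        return f"drag:{state.gaze_target}"        # hold-and-move manipulates it
    return "ignore"                               # gesture with nothing in focus

print(resolve_action(InputState("air_tap", "photo_frame", None)))      # select:photo_frame
print(resolve_action(InputState("air_tap", "photo_frame", "bigger")))  # scale_up:photo_frame
```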
What does the future of gesture hold for mixed reality?
Tango is Google’s AR platform. From what I’ve seen in demos, it’s a very powerful platform, but it’s only available on two mobile devices: the Lenovo Phab 2 Pro and the Asus Zenfone AR. But today marked an interesting turning point in Google’s AR journey: Google announced ARCore, a baked-in augmented reality platform for Android developers. This is obviously Google’s response to the hype around Apple’s ARKit, which has fuelled the imagination of developers around the world. However, both ARKit and ARCore (to the best of my knowledge) rely on touch inputs via the mobile screen. And therein lies a user interaction issue.
While using your mobile phone, you aren’t hands-free. You’ll need at least one hand to hold it. So now you’re reduced to touch inputs with one hand while trying to position the phone with the other so you can see the real world through the phone’s camera.
Even if you could strap your phone to your head (à la Google Cardboard) freeing both hands AND if your phone could recognise hand gestures in the real world, your phone would still only be able to recognise gestures that are within the camera’s field of view.
“Measuring the Field of View for the iPhone 6 Camera” Wired.com 15 May 2015.
So you’re back at the same problem that Hololens and Meta face: if it can’t see your hands, it won’t register the inputs.
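The geometry behind that limitation is simple enough to sketch. The function below checks whether a hand position (in camera space, in metres) falls inside an assumed field of view; the FOV numbers are placeholder assumptions, not measurements of any particular phone or headset.

```python
import math

def hand_in_view(x: float, y: float, z: float,
                 h_fov_deg: float = 60.0, v_fov_deg: float = 45.0) -> bool:
    """Return True if a hand at camera-space position (x, y, z) metres
    falls inside the camera's field of view. The FOV values are
    illustrative assumptions only."""
    if z <= 0:  # behind the camera: definitely not visible
        return False
    h_angle = math.degrees(math.atan2(abs(x), z))  # sideways offset angle
    v_angle = math.degrees(math.atan2(abs(y), z))  # vertical offset angle
    return h_angle <= h_fov_deg / 2 and v_angle <= v_fov_deg / 2

# A hand resting on your thigh (roughly 0.3 m below and 0.4 m in front of a
# head-mounted camera) sits well outside the assumed vertical FOV:
print(hand_in_view(0.0, -0.3, 0.4))  # False
```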
But Google, being Google, have also been busy experimenting with touchless gestures as the ultimate in future user inputs.
Project Soli is one such experiment. Created by Google’s Advanced Technology and Projects group (ATAP), Soli uses miniature radar to detect gesture interactions without touch. The subtle, fine movements that the system can recognise and respond to are really quite beautiful to watch.
Also from Google’s ATAP team is Project Jacquard. In this video, we see Ivan Poupyrev describe how they created a new interactive system by weaving the technology into the fabric itself. As he says, “If you can hide or weave interactivity and input devices into the materials that will be the first step to making computers and computing invisibly integrated into objects, materials, and clothing”.
Can you imagine clothing that recognises simple touch gestures?
In a future where everyone wears a mixed reality headset, this type of technology would certainly help address the user fatigue that comes from waving your hands in the air all the time. The user could assume a natural posture and place their hands wherever feels comfortable and natural. For example, while sitting on a chair it’s natural to rest your hands on your thighs or fold your arms. All you’d have to do then is brush your jacket sleeve or tap your knee to open an app.
Brushing fabric is a very gentle, quiet and subtle interaction. I may be a technophile, but I do cringe at the thought of using Hololens in a public place. Interacting with mixed reality content is uber cool, but when you’re the only one who can see that content, you look… well, kinda weird.
Microsoft Hololens Review by James Mackie.
A summary of UX considerations
It’s important to note that while gesture input is great, it still has some drawbacks.
Fatigue. As mentioned previously, user fatigue is probably the biggest issue. Your hands make small, intricate movements which are repeated over time. Your arms also get tired from making large gestures or simply holding your hands in position in the air. This is an important lesson for MXR gesture design: we must consider the user and their ongoing comfort when interacting with our system.
Lack of physical feedback. Unlike pressing a button in real life, there is no haptic/tactile feedback to let the user know that a button has indeed been pressed. I imagine that physical resistance would be important in use cases such as a surgeon conducting remote surgery via a robot. This is where haptic gloves or other body wear that provides physical feedback would still be useful and preferred.
Complexity. I wouldn’t advise anyone to design a user interface that relies on too many different finger or hand gestures. Keeping it simple helps everyone: designers, developers and the end user. But simple doesn’t mean limited. Think about the traditional computer mouse, with only two-button click and scroll functionality. Combined with software that indicates where and when to click on a website, it provides a much greater range of possible interactions (see the sketch after this list).
Coordination. It can also be difficult for the user to initially coordinate movements when manipulating virtual objects. For example, when I played around with a Zapbox demo, I found it tricky to move my hand through z-space and “find” the point at which a real object intersected with a virtual one.
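As a concrete illustration of the “simple but not limited” point about the mouse, here’s a tiny hypothetical mapping showing how two buttons and a scroll wheel, combined with the element under the cursor, cover a wide range of interactions. A small gesture vocabulary for mixed reality can be multiplied in exactly the same way by gaze or scene context.

```python
# Hypothetical mapping of (mouse action, element under the cursor) -> result,
# mirroring how a two-button mouse covers a huge range of web interactions.
ACTIONS = {
    ("left_click",  "link"):       "navigate",
    ("left_click",  "text_field"): "focus_for_typing",
    ("left_click",  "button"):     "submit",
    ("right_click", "image"):      "open_context_menu",
    ("scroll",      "page"):       "scroll_page",
}

def handle(action: str, element: str) -> str:
    # A tiny input vocabulary becomes expressive when paired with context.
    return ACTIONS.get((action, element), "no_op")

print(handle("left_click", "link"))  # navigate
print(handle("scroll", "page"))      # scroll_page
```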
The recent developments in this space are very exciting. There’s a lot to consider from a UX point of view when designing mixed reality experiences. As science fiction author Arthur C. Clarke famously once said, “Any sufficiently advanced technology is indistinguishable from magic.” As the future of mainstream mixed reality comes closer, so too does the ability for gestural interfaces to become more like… magic. Who knows, I might one day be able to summon a cloud just as my child hero Monkey once did.