The uncanny valley hypothesis predicts that an entity appearing almost human risks eliciting cold, eerie feelings in viewers. Categorization-based stranger avoidance theory identifies the cause of this feeling as categorizing the entity into a novel category. This explanation is doubtful because stranger is not a novel category in adults; infants do not avoid strangers while the category stranger remains novel; infants old enough to fear strangers prefer photographs of strangers to those more closely resembling a familiar person; and the uncanny valley’s characteristic eeriness is seldom felt when meeting strangers. We repeated our original experiment with a more realistic 3D computer model and found no support for categorization-based stranger avoidance theory. By contrast, realism inconsistency theory explains cold, eerie feelings elicited by transitions between instances of two different, mutually exclusive categories, given that at least one category is anthropomorphic: Cold, eerie feelings are caused by prediction error from perceiving some features as features of the first category and other features as features of the second category. In principle, realism inconsistency theory can explain negative evaluations of transitions not only between real and computer-modeled humans but also between different vertebrate species.
We use gestures frequently in daily life—to interact with people, pets, or objects. But interacting with computers using mid-air gestures continues to challenge the design of touchless systems. Traditional approaches to touchless interaction focus on exploring gesture inputs and evaluating user interfaces. I shift the focus from gesture elicitation and interface evaluation to touchless interaction mechanics. I argue for a novel approach to generate design guidelines for touchless systems: to use fundamental interaction principles, instead of a reactive adaptation to the sensing technology. In five sets of experiments, I explore visual and pseudo-haptic feedback, motor intuitiveness, handedness, and perceptual Gestalt effects. Particularly, I study the interaction mechanics in touchless target selection. To that end, I introduce two novel interaction techniques: touchless circular menus that allow command selection using directional strokes and interface topographies that use pseudo-haptic feedback to guide steering–targeting tasks. Results illuminate different facets of touchless interaction mechanics. For example, motor-intuitive touchless interactions explain how our sensorimotor abilities inform touchless interface affordances: we often make a holistic oblique gesture instead of several orthogonal hand gestures while reaching toward a distant display. Following the Gestalt theory of visual perception, we found similarity between user interface (UI) components decreased user accuracy while good continuity made users faster. Other findings include hemispheric asymmetry affecting transfer of training between dominant and nondominant hands and pseudo-haptic feedback improving touchless accuracy. The results of this dissertation contribute design guidelines for future touchless systems. 
Practical applications of this work include the use of touchless interaction techniques in various domains, such as entertainment, consumer appliances, surgery, patient-centric health settings, smart cities, interactive visualization, and collaboration.
Computer-modeled characters resembling real people sometimes elicit cold, eerie feelings. This effect, called the uncanny valley, has been attributed to uncertainty about whether the character is human or living or real. Uncertainty, however, explains neither why anthropomorphic characters lie in the uncanny valley nor their characteristic eeriness. We propose that realism inconsistency causes anthropomorphic characters to appear unfamiliar, despite their physical similarity to real people, owing to perceptual narrowing. We further propose that their unfamiliar, fake appearance elicits cold, eerie feelings, motivating threat avoidance. In our experiment, 365 participants categorized and rated objects, animals, and humans whose realism was manipulated along consistency-reduced and control transitions. These data were used to quantify a Bayesian model of categorical perception. In hypothesis testing, we found that reducing realism consistency made animals and humans, but not objects, appear less familiar, thereby eliciting cold, eerie feelings. Next, structural equation models elucidated the relation among realism inconsistency (measured objectively in a 2D Morlet wavelet domain inspired by the primary visual cortex), realism, familiarity, eeriness, and warmth. The fact that reducing realism consistency only elicited cold, eerie feelings toward anthropomorphic characters, and only when it lessened familiarity, indicates the role of perceptual narrowing in the uncanny valley.
Best Paper Honorable Mention (top 5%)
The safe prescribing of medications via computerized physician order entry routinely relies on clinical alerts. Alert compliance, however, remains surprisingly low, with up to 95% of alerts ignored. Prior approaches, such as improving presentational factors in alert design, had limited success, mainly due to physicians’ lack of trust in computerized advice. While designing trustworthy alerts is key, actionable design principles to embody elements of trust in alerts remain little explored. To address this gap, we introduce a model to guide the design of trust-based clinical alerts—based on what physicians value when trusting advice from peers in clinical activities. We discuss three key dimensions to craft trusted alerts: using colleagues’ endorsement, foregrounding physicians’ prior actions, and adopting a suitable language. We exemplify our approach with emerging alert designs from our ongoing research with physicians and contribute to the current debate on how to design effective alerts to improve patient safety.
Slide presentations have long been stuck in a one-to-many paradigm, limiting audience engagement. Based on the concept of smartphone-based remote control of slide navigation, we present Office Social—a PowerPoint plugin and companion smartphone app that allows audience members qualified access to slides for personal review and, when the presenter enables it, public control over slide navigation. We studied the longitudinal use of Office Social across four meetings of a workgroup. We found that shared access and regulated control facilitated various forms of public and personal audience engagement. We discuss how enabling ad-hoc aggregation of co-proximate devices reduces ‘interaction costs’ and leads to both opportunities and challenges for presentation situations.
To design intuitive, interactive systems in various domains, such as health, entertainment, or smart cities, researchers are exploring touchless interaction. Touchless systems allow individuals to interact without any input device—using freehand gestures in midair. Gesture-elicitation studies focus on generating user-defined interface controls to design touchless systems. Interface controls, however, are composed of primary units called interaction primitives—which remain little explored. For example, what touchless primitives are motor-intuitive and can unconsciously use our preexisting sensorimotor knowledge (such as visual perception or motor skills)? Drawing on the disciplines of cognitive science and motor behavior, my research aims to understand the perceptual and motor factors in touchless interaction with 2D user interfaces (2D UIs). I then aim to apply this knowledge to design a set of touchless interface controls for large displays.
Human replicas may elicit unintended cold, eerie feelings in viewers, an effect known as the uncanny valley. Masahiro Mori, who proposed the effect in 1970, attributed it to inconsistencies in the replica’s realism with some of its features perceived as human and others as nonhuman. This study aims to determine whether reducing realism consistency in visual features increases the uncanny valley effect. In three rounds of experiments, 548 participants categorized and rated humans, animals, and objects that varied from computer animated to real. Two sets of features were manipulated to reduce realism consistency. (For humans, the sets were eyes–eyelashes–mouth and skin–nose–eyebrows.) Reducing realism consistency caused humans and animals, but not objects, to appear eerier and colder. However, the predictions of a competing theory, proposed by Ernst Jentsch in 1906, were not supported: The most ambiguous representations—those eliciting the greatest category uncertainty—were neither the eeriest nor the coldest.
Safe prescribing of medications relies on drug safety alerts, but up to 96% of such warnings are ignored by physicians. Prior research has proposed improvements to the design of alerts, but with limited increase in adherence. We propose a different perspective: before re-designing alerts, we focus on improving the trust between physicians and computerized advice by examining why physicians trust their medical colleagues. To understand trusted advice among physicians, we conducted three contextual inquiries in a hospital setting (22 participants), and corroborated our findings with a survey (37 participants). Drivers that guide physicians in trusting peer advice include: timeliness of the advice, collaborative language, empathy, level of specialization, and medical hierarchy. Based on these findings, we introduce seven design directions for trust-based alerts: endorsement, transparency, team sensing, collaborative, empathic, conflict mitigating, and agency laden. Our work contributes to novel alert design strategies to improve the effectiveness of drug safety advice.
Elicitation and evaluation studies investigated intuitiveness of touchless gestures but did not operationalize intuitiveness. For example, studies found that users fail to make accurate 3D strokes as interaction commands. But this phenomenon remains unexplained. In this paper, we first explain how making accurate 3D strokes is generally unintuitive, because it exceeds our sensorimotor knowledge. We then introduce motor-intuitive, touchless interaction that uses sensorimotor knowledge by relying on image schemas. Specifically, we propose an interaction primitive—mid-air, directional strokes—based on space schemas up–down and left–right. In a controlled study with large displays, we found that biomechanical factors affected directional strokes. Strokes were efficient (0.2 s) and effective (12.5° angular error), but affected by directions and length. Our work operationalized intuitive touchless interaction using the continuum of knowledge in intuitive interaction, and demonstrated how user performance of a motor-intuitive, touchless primitive based on sensorimotor knowledge (image schemas) is affected by biomechanical factors.
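The angular error reported for directional strokes can be illustrated with a small sketch. This is not the study's measurement code: it assumes a stroke is reduced to a start and end point in a 2D plane parallel to the display, with the y axis pointing up and angles measured counterclockwise from the +x axis. The function name `angular_error_deg` is hypothetical.

```python
import math


def angular_error_deg(start, end, target_deg):
    """Absolute angle in degrees between a stroke and its intended direction.

    Assumption: start and end are (x, y) points in a plane parallel to the
    display, and target_deg is the intended direction (0 = right, 90 = up).
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if dx == 0 and dy == 0:
        raise ValueError("stroke has zero length")
    stroke_deg = math.degrees(math.atan2(dy, dx))
    # Wrap the signed difference into [-180, 180) before taking its magnitude,
    # so that 350 degrees vs 0 degrees counts as 10 degrees, not 350.
    diff = (stroke_deg - target_deg + 180.0) % 360.0 - 180.0
    return abs(diff)
```

Averaging this value over many strokes would give a figure comparable in kind to the reported error, under the stated assumptions.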
Markerless motion-sensing promises to position touchless interactions successfully in various domains (e.g., entertainment or surgery) because they are deemed natural. This naturalness, however, depends upon the mechanics of touchless interaction, which remain largely unexplored. My dissertation first aims to deconstruct the interaction mechanics of touchless interaction, especially its device-less property, from an embodied perspective. Grounded in this analysis, I then plan to investigate how visual perception affects touchless interaction with distant, 2D displays. Preliminary findings suggest that Gestalt principles in visual perception and motor action affect the touchless user experience. User interface elements demonstrating perceptual-grouping principles, such as similarity of orientation, decreased users’ efficiency, while continuity of UI elements forming a perceptual whole increased users’ effectiveness. Moreover, following the law of Prägnanz, users often gestured to minimize their energy expenditure. This work can inform the design of touchless UX by uncovering relations between perceptual and motor gestalt in touchless interactions.
Touchless interactions synthesize input and output from physically disconnected motor and display spaces without any haptic feedback. In the absence of any haptic feedback, touchless interactions primarily rely on visual cues, but properties of visual feedback remain unexplored. This paper systematically investigates how large-display touchless interactions are affected by (1) types of visual feedback—discrete, partial, and continuous; (2) alternative forms of touchless cursors; (3) approaches to visualize target-selection; and (4) persistent visual cues to support out-of-range and drag-and-drop gestures. Results suggest that continuous was more effective than partial visual feedback; users disliked opaque cursors, and efficiency did not increase when cursors were larger than display artifacts’ size. Semantic visual feedback located at the display border improved users’ efficiency to return within the display range; however, the path of movement echoed in drag-and-drop operations decreased efficiency. Our findings contribute key ingredients to design suitable visual feedback for large-display touchless environments.
Large, high-resolution displays enable efficient visualization of large datasets. To interact with these large datasets, touchless interfaces can support fluid interaction at different distances from the display. Touchless gestures, however, lack haptic feedback. Hence, users' gestures may unintentionally move off the interface elements and require additional physical effort to perform intended actions. To address this problem, we propose data-morphed topographies for touchless interactions: constraints on users' cursor movements that guide touchless interaction along the structure of the visualized data. To exemplify the potential of our concept, we envision applying three data-morphed topographies—holes, pits, and valleys—to common problem-solving tasks in visual analytics.
Researchers are exploring touchless interactions in diverse usage contexts. These include interacting with public displays, where mice and keyboards are inconvenient, activating kitchen devices without touching them with dirty hands, or supporting surgeons in browsing medical images in a sterile operating room. Unlike traditional visual interfaces, however, touchless systems still lack a standardized user interface language for basic command selection (e.g., menus). Prior research proposed touchless menus that require users to comply strictly with system-defined postures (e.g., grab, finger-count, pinch). These approaches are problematic because they are analogous to command-line interfaces: users need to remember an interaction vocabulary and input a pre-defined symbol (via gesture or command). To overcome this problem, we introduce and evaluate Touchless Circular Menus (TCM)—a touchless menu system optimized for large displays, which enables users to make simple directional movements for selecting commands. TCM utilize our abilities to make mid-air directional strokes, relieve users from learning posture-based commands, and shift the interaction complexity from users’ input to the visual interface. In a controlled study (N=15), when compared with contextual linear menus using grab gestures, participants using TCM were more than two times faster in selecting commands and perceived lower workload. However, users made more command-selection errors with TCM than with linear menus. The menu’s triggering location on the visual interface significantly affected the effectiveness and efficiency of TCM. Our contribution informs the design of intuitive UIs for touchless interactions with large displays.
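Selecting a command by stroke direction can be sketched as a mapping from a direction to a menu sector. This is an illustration only, not the TCM implementation: the function name `select_sector`, the equal-width sectors, the convention that item 0 is centered on the rightward direction, and the `min_length` threshold are all assumptions.

```python
import math


def select_sector(dx, dy, n_items, min_length=0.0):
    """Return the index of the menu item a stroke points at, or None.

    Assumptions: (dx, dy) is the stroke displacement with y pointing up,
    the menu has n_items equal sectors numbered counterclockwise, and
    item 0 is centered on the +x (rightward) direction.
    """
    if math.hypot(dx, dy) <= min_length:
        return None  # stroke too short to count as a selection
    width = 360.0 / n_items
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Shift by half a sector so each item is centered on its direction.
    return int(((angle + width / 2.0) % 360.0) // width)
```

With four items, a rightward stroke maps to item 0 and an upward stroke to item 1; the posture of the hand plays no role, only the direction of movement.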
To interact with wall-sized displays (WSD) from a distance of five to ten feet, users can leverage touchless gestures tracked by depth sensors such as Microsoft’s Kinect®. Yet when users’ gestures inadvertently land outside the WSD range, no visual feedback appears on the screen. This leaves users wondering what happened and slows down their actions. To combat this problem, we introduce Stoppers, a subtle visual cue that appears at the gesture’s last exit position, informing users that their gestures are outside the WSD range but are still being tracked by sensors. In an 18-participant study investigating touchless selection tasks on an ultra-large 15.3M pixel WSD, introducing Stoppers made users twice as fast in getting their gesture back within the display range. Users reported Stoppers as intuitive, non-distracting, and an easy-to-use visual guide. By providing persistent visual feedback, Stoppers show promise as a key ingredient to enhance fundamental mechanisms of user interaction in a broad range of touchless environments.
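The idea of a persistent cue at the last exit position can be sketched as a small state tracker. This is a hypothetical illustration, not the Stoppers implementation: the class name `StopperTracker` is invented, and the exit position is approximated by the last cursor sample that was still within the display bounds.

```python
class StopperTracker:
    """Tracks where an out-of-range cue should be drawn (illustrative only)."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.last_inside = None  # last cursor sample within the display
        self.cue = None          # where the cue is shown, or None if hidden

    def update(self, x, y):
        """Feed one cursor sample; return the cue position or None."""
        inside = 0 <= x <= self.width and 0 <= y <= self.height
        if inside:
            self.last_inside = (x, y)
            self.cue = None  # cursor is back in range, so hide the cue
        elif self.cue is None and self.last_inside is not None:
            # First out-of-range sample: pin the cue where the cursor left.
            self.cue = self.last_inside
        return self.cue
```

The cue stays fixed while the hand remains out of range, which is what makes the feedback persistent rather than momentary.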
To facilitate interaction and collaboration around ultrahigh-resolution, Wall-Size Displays (WSD), post-WIMP interaction modes like touchless and multi-touch have opened up new, unprecedented opportunities. Yet to fully harness this potential, we still need to understand fundamental design factors for successful WSD experiences. Some of these include visual feedback for touchless interactions, novel interface affordances for at-a-distance, high-bandwidth input, and the technosocial ingredients supporting laid-back, relaxed collaboration around WSDs. This position paper highlights our progress in a long-term research program that examines these issues and spurs new, exciting research directions. We recently completed a study aimed at investigating the properties of visual feedback in touchless WSD interaction, and we discuss some of our findings here. Our work exemplifies how research in WSD interaction calls for re-conceptualizing basic, first principles of Human-Computer Interaction (HCI) to pioneer a suite of next-generation interaction environments.
Human activity recognition has the potential to impact a wide range of applications, from surveillance to human-computer interfaces to content-based video retrieval. Recently, the rapid development of inexpensive depth sensors (e.g., Microsoft Kinect) has provided adequate accuracy for real-time full-body human tracking for activity recognition applications. In this paper, we create a complex human activity dataset depicting two-person interactions, including synchronized video, depth, and motion capture data. Moreover, we use our dataset to evaluate various features typically used for indexing and retrieval of motion capture data, in the context of real-time detection of interaction activities via Support Vector Machines (SVMs). Experimentally, we find that the geometric relational features based on distance between all pairs of joints outperform other feature choices. For whole-sequence classification, we also explore techniques related to Multiple Instance Learning (MIL) in which the sequence is represented by a bag of body-pose features. We find that the MIL-based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.
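A feature vector built from distances between all pairs of joints can be sketched as follows. This is a minimal illustration under assumptions: joints are given as (x, y, z) tuples for a single frame, with the joints of both people concatenated into one list, and the function name `pairwise_joint_distances` is hypothetical. Whether the evaluated features also paired joints across frames is not stated here, so this sketch stays within one frame.

```python
import math
from itertools import combinations


def pairwise_joint_distances(joints):
    """Euclidean distances for every unordered pair of joints in one frame.

    Assumption: joints is a list of (x, y, z) tuples. For n joints the
    result has n * (n - 1) / 2 entries, in itertools.combinations order.
    """
    return [math.dist(a, b) for a, b in combinations(joints, 2)]
```

A vector like this could then be passed to any standard classifier; the choice of classifier is independent of the feature computation.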
This work experiments with human-motion-initiated music generation. Here we present a stand-alone system to tag human motions readily into musical notes. We do this by first discovering the human skeleton using depth images acquired by infrared range sensors and then exploiting the resultant skeletal tracking. This real-time skeletal tracking is done using the videogame console Microsoft Kinect for Xbox 360. An agent’s bodily motion is defined by the spatial and temporal arrangement of his skeletal framework over the episode of the associated move. After extracting the skeleton of a performing agent by interfacing the Kinect with an intermediate computer application, various features defining the agent’s motion are computed. Features like velocity, acceleration, and change in position of the agent’s body parts are then used to generate musical notes. Finally, as a participating agent performs a set of movements in front of our system, the system generates musical notes that are continually regulated by the defined features describing his motion.
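Velocity and acceleration of a tracked joint can be estimated from successive positions by finite differences, as sketched below. This is an illustration, not the system's code. The mapping from motion to notes is not specified in the abstract, so `speed_to_midi_note` and its parameters (`max_speed`, `low`, `high`) are purely hypothetical placeholders for whatever mapping the system used.

```python
def finite_differences(positions, dt):
    """Velocity and acceleration of one joint from (x, y, z) samples.

    Assumption: positions are sampled at a fixed interval dt (seconds).
    Returns (velocities, accelerations) as lists of 3-tuples.
    """
    velocities = [
        tuple((b[i] - a[i]) / dt for i in range(3))
        for a, b in zip(positions, positions[1:])
    ]
    accelerations = [
        tuple((v2[i] - v1[i]) / dt for i in range(3))
        for v1, v2 in zip(velocities, velocities[1:])
    ]
    return velocities, accelerations


def speed_to_midi_note(speed, max_speed=2.0, low=48, high=84):
    """Hypothetical mapping: faster motion gives a higher MIDI note number."""
    ratio = min(max(speed / max_speed, 0.0), 1.0)  # clamp to [0, 1]
    return low + round(ratio * (high - low))
```

Any monotonic mapping would serve the same illustrative purpose; the linear one here is chosen only for simplicity.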
Recognizing moves and movements of human bodies is a challenging problem due to their self-occluding nature and the associated degrees of freedom for each of the numerous body joints. This work presents a method to tag human actions and interactions by first discovering the human skeleton using depth images acquired by infrared range sensors and then exploiting the resultant skeletal tracking. Instead of estimating the pose of each body part contributing to a set of moves in a decoupled way, we represent a single-person move or a two-person interaction in terms of its skeletal joint positions. A single-person move is thus defined by the spatial and temporal arrangement of his skeletal framework over the episode of the associated move. For a two-person interactive sequence, an event is defined in terms of both participating agents' skeletal frameworks over time. In this work we have experimented with two different modes of tagging human moves and movements. In collaboration with the Music department, we tried an innovative way to tag a single person's moves with music. As a participating agent performs a set of movements, musical notes are generated depending upon the velocity, acceleration, and change in position of his body parts. We also try to classify human interactions into a set of well-defined classes. We present the K-10 Interaction Dataset with ten different classes of two-person interactions performed among six different agents and captured using the Kinect™ for Xbox 360. We construct interaction representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. We further aligned the clips in our dataset using the Canonical Time Warping algorithm, which led to an improvement in the interaction classification results.
Content-based video indexing and retrieval traces back to elementary video structures, such as a table of contents. Thus, algorithms for video partitioning have become crucial with the unremitting growth of digital video technology. This calls for a tool that breaks a video down into smaller, manageable units called shots. In this paper, a shot boundary detection technique is proposed for abrupt scene cuts. The method computes co-occurrence matrices by taking block differences between consecutive frames in each of the R, G, and B planes, using the sum of absolute differences (SAD). Feature vectors are extracted from the co-occurrence matrices' statistics, defined at various pixel displacement distances. The statistical findings are integrated into a training set, and an unsupervised classifier, K-means, is used to identify the shot frames and the non-shot frames.
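The block-difference step can be sketched for a single color plane as follows. This covers only the SAD computation between two consecutive frames, not the co-occurrence matrices, their statistics, or the K-means stage. It assumes each plane is a list of equal-length rows of integer pixel values and that blocks are square and non-overlapping; the function name `block_sad` is hypothetical.

```python
def block_sad(prev_plane, curr_plane, block):
    """Sum of absolute differences per block between two frames' planes.

    Assumptions: both planes are equally sized lists of rows of integers,
    and blocks are block x block, non-overlapping; leftover rows or
    columns that do not fill a whole block are ignored.
    """
    rows, cols = len(prev_plane), len(prev_plane[0])
    result = []
    for r in range(0, rows - block + 1, block):
        row_out = []
        for c in range(0, cols - block + 1, block):
            sad = sum(
                abs(prev_plane[r + i][c + j] - curr_plane[r + i][c + j])
                for i in range(block)
                for j in range(block)
            )
            row_out.append(sad)
        result.append(row_out)
    return result
```

Running this once per R, G, and B plane gives three grids of block differences; large values across many blocks are the kind of signal an abrupt cut would be expected to produce.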