FrankenFolk: Distinctiveness and Attractiveness of Voice and Motion

Jan Ondrej, Disney Research, Los Angeles
Cathy Ennis, Dublin Institute of Technology
Niamh Merriman, Disney Research, Los Angeles
Carol O'Sullivan, Trinity College Dublin, Ireland

Document Type: Article

ACM Transactions on Applied Perception 13, 4, Article 20 (July 2016).

Association for Computing Machinery


With the increased demand for realism in virtual characters in recent years, performance capture of actors has become ubiquitous. This approach provides advantages such as highly realistic motions and voices, but it also raises several potential problems. One challenge is the infeasibility of capturing all modalities simultaneously (i.e., voice, body, face, hands, avatar’s appearance) from a unique actor for every character, especially for crowd creation. Other practical constraints, especially for real-time applications, are limited hardware and time budgets. Thus, it is common in games and movies to combine and reuse the voice recordings and motions (face, body, fingers) of actors and to apply them to a variety of different three-dimensional (3D) characters. However, what is the effect of combining different modalities from different actors to create new virtual characters, which we nickname FrankenFolk? This is the question we explore in this article.

To explore the perception of FrankenFolk, we focus on perceived Attractiveness and Distinctiveness, as in Hoyet et al. [2013]. That work, however, considered only body motion, across different types of locomotion. In our case, we focus on short speaking performances and ask the following questions:

- How distinctive and attractive is each modality (i.e., voice, face or body motion, physical appearance) when presented in isolation, that is, as a Partial performance?
- How does each Partial performance relate to the overall, Full performance?
- Which modality, if any, most strongly influences the attractiveness or distinctiveness of a character’s performance?

To answer these questions, we conducted a series of experiments. In the Full baseline experiment, we evaluate the distinctiveness and attractiveness of each actor’s performance, presented both as a Real video and applied to a Virtual character. In the Partial baseline experiment, we explore each modality in isolation.
Finally, we create FrankenFolk characters, where we mix and match the voice, body motion, face motion, and avatar of different actors. We found that an actor’s Voice may be the most distinctive feature of a performance, but only for male actors. Female actors were, in general, less easy to recognize. We also found that body motion and the character’s appearance (avatar) were most indicative of perceived attractiveness. Our results highlight the importance of paying attention to all modalities when creating virtual performances captured from multiple different actors. If, for example, a particular actor’s voice or body motion is highly distinctive or unattractive, it could adversely affect the overall performance. Furthermore, repetition of such a voice or motion in a crowd would stand out, attract undesirable attention, and detract from the overall realism of a scene.