September | 2008 | Deep Semaphore

Archive for September, 2008

Some thoughts about identities, avatars, surrogates and video conferencing leading to video bubbles

Posted in Cognitive science, Content creation, Massive Multiplayer Environments on September 22, 2008

The problem that designers of virtual worlds face today is the difficulty of creating forms that meaningfully represent the diverse bundles of narratives we have come to identify with. One point of failure in avatar design is the absence of a tight coupling between a user’s intentions and behaviors and those of his or her avatar. This in turn undermines the avatar as a reliable canvas on which to paint, for an audience or observer, the wide range of emotions and related information we are accustomed to in real-world human communication.

When interacting in Second Life (SL), we are pervaded by a sustained sense of ‘uneasiness’, ‘unsatisfactoriness’, ‘a hunger for emotional exchange’, to a large extent because we no longer use a significant part of our brain that is wired for face processing. The case has often been made that SL provides a higher emotional bandwidth than other communication media. It is important to know exactly what is being compared. If SL is compared to a text chat environment, then yes, SL does provide more emotional bandwidth. The next question is how much is gained, and whether the amount gained is worth the cost. Now let’s compare SL to a video conferencing application. A video conferencing application engages the vast face processing capabilities of our brains. It is disingenuous to claim that SL provides a higher emotional bandwidth than a video conferencing application. Pitting SL as it stands against video conferencing is therefore a non-starter, especially for meetings where the spatial context (e.g. whether the meeting is in a virtual board room or a virtual rest room) is meaningless. We might, however, improve human-human communication in virtual environments if we try to merge video conferencing and SL. Let us explore ways to give users the opportunity to make use of their untapped face processing capabilities. I will suggest only one way; there must be many more.

Maybe we should suspend, for a while at least, talking about avatars and really start focusing on surrogates. ‘Surrogate’ as a term suggests a weaker user-representation coupling than ‘avatar’ does. This slight shift in the way we frame the problem of human-human interaction in a virtual world frees us from our obsession with creating avatars that are tightly coupled to the user, where attempts are made to capture every gesture and emotion of a user for reproduction in the virtual world. Most typically, this is achieved by recreating a quasi-mirror image of the user (e.g. in 3D, using gesture tracking mechanisms, 3D cams, physiological signal monitoring and so forth). Quasi, because it won’t be much fun if the precise physical state of each user is mirrored in the virtual environment. A virtual environment where everyone is in a sitting posture will be quite boring. Research in this area is much needed, and the approach has a wide and active fan base, but I doubt we will see realistic 3D mirror images of users within 5 years. In addition, each of the technologies involved will come with a level of obtrusiveness (e.g. tethered devices, cumbersome calibration setups, etc.) that will scare off users and probably spike their subjective workload, frustration levels, physical fatigue and so forth. Let us look at more near-term solutions. And if we focus on surrogates, maybe we will be happier to inject some AI into our ‘avatars’ so that they get to ‘represent’ us rather than us controlling them. Anyway, that topic is for a different occasion.

SL with audio conferencing has helped to address, in a very obvious and natural way, the floor control issues faced by traditional audio conferencing applications. We can expect that SL with video conferencing might also help to solve some issues we face in traditional video conferencing, e.g. talking heads in windows with no spatial context. One seemingly natural integration that comes to mind is to replace chat bubbles with a video stream from the user: a video bubble. The user can choose to point his or her camera at whatever he or she wants. In a show-and-tell session, the user may choose to point the camera at what he or she is doing. At other times, the user may point the camera at his or her face. Now, only users in close proximity to your ‘avatar’ or surrogate will have their ‘video bubble’ activated. Proximity mediates not only audio but video as well. Which video streams get activated will be based on the proximity of ‘avatars’, so that users don’t get visually swamped and occlusion and bandwidth problems are minimized. In my view, this is a possible near-term solution that Linden Lab (LL) could try selling if it wants to pitch SL against video conferencing applications. SL can then claim that it does something more than video conferencing, because video conferencing is part of it, and it will then become obvious that the telepresence solution from Cisco is about mirroring, but SL is more than mirroring.
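The proximity rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything SL actually implements: the `Avatar` class, the `active_video_bubbles` function, the `radius` cutoff and the `max_streams` cap are all hypothetical names and parameters I am assuming for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Avatar:
    # Hypothetical minimal avatar: a name and a position in world coordinates.
    name: str
    x: float
    y: float
    z: float = 0.0


def active_video_bubbles(viewer, others, radius=10.0, max_streams=4):
    """Return the avatars whose video bubble should be shown to `viewer`.

    Assumed rule: only avatars within `radius` of the viewer qualify,
    and at most `max_streams` of them (the nearest ones) are activated,
    so the viewer is not visually swamped and bandwidth stays bounded.
    """
    in_range = []
    for other in others:
        distance = math.dist(
            (viewer.x, viewer.y, viewer.z),
            (other.x, other.y, other.z),
        )
        if distance <= radius:
            in_range.append((distance, other))

    # Nearest avatars first; sort on distance only.
    in_range.sort(key=lambda pair: pair[0])
    return [other for _, other in in_range[:max_streams]]


if __name__ == "__main__":
    me = Avatar("me", 0, 0)
    crowd = [
        Avatar("a", 1, 0),
        Avatar("b", 3, 4),
        Avatar("c", 50, 0),
        Avatar("d", 0, 2),
    ]
    shown = active_video_bubbles(me, crowd, radius=10.0, max_streams=2)
    print([avatar.name for avatar in shown])  # prints ['a', 'd']
```

The cap on the number of streams is my own addition to the distance cutoff; a real client would probably also want to account for occlusion and for which way the viewer is facing.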

In my view, the virtual environment of the NEAR future will be desktop-based and point-and-trigger, and will provide the space to contain 3D audio conferencing + video conferencing (as ‘video bubbles’ or some variant of that) + information sharing (basically document/web sharing). This solution will address the emotional bandwidth issue more convincingly. Is this approach going to hurt other approaches that aim at creating 3D mirror images of users, or future gesture tracking applications? Certainly not. Video bubbles will probably die a peaceful death when we work out all the kinks in creating 3D mirror images with a fidelity level that can cross the uncanny valley and can produce micro facial gestures and so forth. But video bubbles look feasible right now, and the technology is certainly closer at hand. This approach raises many more questions: what will ‘avatar’ body gestures do? Will they be communicating anything? That is beside the point; right now I am trying to address the emotional bandwidth issue in the NEAR term. The body of the avatar will still have a function. It can be animated in various ways to add context to human-human interactions. The potential of video bubbles for griefing can be dealt with easily, in the same way audio griefing was dealt with.
