The problem that designers of virtual worlds face today is the difficulty in creating forms to represent in a meaningful way the diverse bundles of narratives that we have become identified with. One point of failure in avatar design is the absence of a tight coupling between user intentions/behaviors and those of his/her avatar. This in turn destroys the effectiveness of the user representation/avatar as a reliable canvas on which can be painted, for an audience/observer, the wide range of emotions and related information we have been accustomed to in real world human communication processes.
When interacting in Second Life (SL), we are pervaded with a sustained sense of ‘uneasiness’, ‘unsatisfactoriness’, ‘a hunger for emotional exchange’, to a large extent because we no longer use a significant part of our brain that is wired for face processing. The case has often been made that SL provides a higher emotional bandwidth than other communication media. It is important to know what exactly the elements of this comparison are. If SL is compared to a text chat environment, then yes, SL does provide more emotional bandwidth. The next question is how much is gained, and whether the amount gained is worth the cost. Now let’s compare SL to a video conferencing application. A video conferencing application provides opportunities to engage the vast face processing capabilities of our brains. It is disingenuous to claim that SL provides a higher emotional bandwidth than a video conferencing application. Thus pitting SL as it stands against video conferencing is a non-starter, especially for meeting situations where spatial context (e.g. whether the meeting is in a virtual board room or a virtual rest room) is meaningless. We might, however, improve human-human communication in virtual environments if we try to merge video conferencing and SL. Let us explore ways that will give users the opportunity to make use of their untapped face processing capabilities. I will suggest only one way; there must be many more.
Maybe we should suspend, for a while at least, talking about avatars and really start focusing on surrogates. Surrogate as a term suggests a weaker user-representation coupling than avatar does. This slight shift in the way we frame the human-human interaction problem in a virtual world frees us from our obsession with trying to create avatars that are tightly coupled to the user, where attempts are made to capture every gesture and emotion of a user for reproduction in a virtual world. Most typically, this is achieved by recreating a quasi-mirror image of the user (e.g. in 3d using gesture tracking mechanisms, 3d cams, physiological signal monitoring and so forth). Quasi, because it won’t be much fun if the precise physical status of users is mirrored in virtual environments. A virtual environment where everyone is in a sitting posture will be quite boring. Research in this area is much needed and this approach has a wide, active fan base, but I doubt we will see realistic 3d mirror images of users within 5 years. In addition, each of the technologies involved will come with a level of obtrusiveness (e.g. tethered devices, cumbersome calibration setups etc.) that will scare off users and probably spike their subjective workload, frustration levels, physical fatigue and so forth. Let us look at more near term solutions. And if we focus on surrogates, maybe we will be happier to inject some AI into our ‘avatars’ so that they get to ‘represent’ us rather than us controlling them. Anyway, this topic is for a different occasion.
SL with audio conferencing has helped to address floor control issues faced by traditional audio conferencing applications in a very obvious and natural way. We can expect that SL with video conferencing might also help to solve some issues we face in traditional video conferencing, e.g. talking heads in windows with no spatial context. One seemingly natural integration with video conferencing that comes to mind is to have chat bubbles replaced by a video stream of the user, a video bubble. The user can choose to point his or her camera at whatever he or she wants. In a show and tell session, the user may choose to point the camera at what he or she is doing. At other times, the user may point the camera at his or her face. Now, only users in close proximity to an ‘avatar/your surrogate’ will have their ‘video bubble’ activated. Proximity mediates not only audio but video as well. Which video streams get activated will be based on the proximity of ‘avatars’, so that users don’t get visually swamped and so that occlusion and bandwidth problems are minimized. This, in my view, is a possible near term solution LL could try selling if it wants to pitch SL against video conferencing applications. SL can then claim that it does something more than video conferencing…because video conferencing is part of it…and then it will become obvious that the telepresence solution from Cisco is about mirroring, but SL is more than mirroring.
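To make the proximity rule a little more concrete, here is a minimal sketch in Python of how a viewer might decide which video bubbles to switch on. Everything in it is an assumption for illustration only: the `Avatar` class, the `active_video_bubbles` function, the 10-unit radius and the cap of four streams are hypothetical and are not part of SL or of any existing API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Avatar:
    """Hypothetical stand-in for an avatar/surrogate position in the world."""
    name: str
    x: float
    y: float
    z: float = 0.0


def active_video_bubbles(viewer, others, radius=10.0, max_streams=4):
    """Return the names of avatars whose video bubble should be active.

    Assumed rule: only avatars within `radius` of the viewer qualify, and
    at most `max_streams` of them (the nearest first) are shown, so the
    viewer is not visually swamped and bandwidth stays bounded.
    """
    in_range = []
    for other in others:
        distance = math.dist(
            (viewer.x, viewer.y, viewer.z),
            (other.x, other.y, other.z),
        )
        if distance <= radius:
            in_range.append((distance, other.name))
    in_range.sort()  # nearest avatars first
    return [name for _, name in in_range[:max_streams]]


if __name__ == "__main__":
    me = Avatar("me", 0, 0)
    crowd = [
        Avatar("a", 1, 0),
        Avatar("b", 50, 0),  # too far away, bubble stays off
        Avatar("c", 3, 4),
        Avatar("d", 0, 2),
    ]
    print(active_video_bubbles(me, crowd, radius=10.0, max_streams=2))
```

The cap on the number of streams is one possible way to handle the ‘visually swamped’ and bandwidth concerns; the same distance check could just as well drive audio, which is how proximity would mediate both.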
In my view, the virtual environment of the NEAR future will be desktop based, point and trigger, and provide the space to contain 3d audio conferencing + video conferencing (as ‘video bubbles’ or some variant of that) + information sharing (basically document/web sharing). This solution will address the emotional bandwidth issue more convincingly. Is this approach going to hurt other approaches looking at creating 3d mirror images of users, future gesture tracking applications and so on? Certainly not. Video bubbles will probably die a peaceful death when we work out all the kinks with creating 3d mirror images with a fidelity level that can cross the uncanny valley…and can produce micro facial gestures and so forth… But video bubbles look feasible right now and the technology is certainly closer at hand. This approach raises many more questions: what will ‘avatar’ body gestures do? Will they be communicating anything? This is beside the point; right now I am trying to address the emotional bandwidth issue in the NEAR term. The body of the avatars will still have a function. They can be animated in various ways to add context to human-human interactions. The potential of video bubbles for griefing purposes can be dealt with easily in the same way audio griefing was dealt with.
I think that you’re basically looking at a mix of http://www.handsfree3d.com/, a bit of http://www.gizmoz.com/create/head , all incorporated into http://www.logitech.com/index.cfm/webcam_communications/video_software_services/video_effects/&cl=us,en?WT.ac=ps|3294
Perhaps not. I think simple video streams will work just fine. All the solutions you just mentioned are a tad clunky. Why go for those when all you need for ‘meaningful’ faces with micro facial expressions is cheaply available through video? I don’t see any of the technologies you point to achieving the level of emotional bandwidth we require for corporate and so called ‘serious’ applications.
This is a great point, Deep, and a fascinating ongoing discussion!
I’ll play devil’s advocate by equating it to the argument “it is disingenuous to claim that books provide a higher emotional bandwidth than movies because you’re not using the part of the brain that processes faces when reading.”
I definitely get annoyed in certain situations by the limits presented by avatars: the stiff animated expressions and articulations, although these are no less valuable, as a means of enhancing communication, than punctuation or emoticons. And it’s true that there are certain interactions that benefit from being able to look into someone’s eyes or read their body language. I suspect good poker players will not spend too much time in online casinos…unless they’ve got some really bad tells.
However, it’s fair to say that reducing the SL vs. Vidcon argument to emphasize the importance of face recognition and dismiss the importance of spatial relationships neglects quite a number of possible use cases. This isn’t, after all, a zero-sum game….it’s a platform ;D
Great article, I have often thought the same. Has there been any development since you wrote this?
My experience trying to sell SL projects to corporate and educational clients tells me that an alternative form of user representation, a bubble with a live webcam video feed, would be a useful option. What many users really want is a collaboration space where one can see others and co-work on documents.
Video bubbles would permit retaining all the advantages of a shared 3D environment while offering a representation that would be perceived as more “serious” or “businesslike” (you know what I mean) by many. Also, at this moment we cannot precisely control avatars but we can and do control our faces and hands.
Note: I am not proposing, by any means, abandoning avatars for video bubbles. I am proposing introducing video bubbles as an alternative option.
Of course 3D video bubble chat can be implemented as a separate service, but wouldn’t it be a useful enhancement of SL? Can it be implemented with the recent SL media API?
Not much embodiment or immersion in a bubble. Perhaps, however, it would be a “toe in the water” for overly cautious business users.
As I told Giulio on the SLED list, I couldn’t care less about business users’ fears. After coming off Burning Life 09 and the Virtual Worlds Story Project experiences, I’m glad I was driving an avatar, not floating in a bubble.
Gwyneth’s own thoughts on augmentation vs. immersion come in really handy here:
http://gwynethllewelyn.net/2008/03/09/immersionism-and-augmentationism-revisited/
Final snipe: most US Suits have lived in bubbles for years, and in fact that helped them pop a Big Bubble that crashed the global economy. Maybe VW bubbles fit into their “comfort zone” for that reason. Don’t break the metaphor, after all.
I have refined my first demo and made a video bubble avatar as described by Moritz’ video bubble post. FYC:
http://giulioprisco.blogspot.com/2010/02/video-bubbles-in-second-life-20.html