Vergence-Accommodation Conflict (VAC)

Vergence-accommodation conflict (VAC) is a major problem for VR / AR head-mounted displays (HMDs).

For a typical stereoscopic optical see-through (OST) HMD, the intuitive approach for programmers is to render a separate image for each display, with a virtual camera placed at the position of each eye. In Unity3D, for example, the default way to deploy a stereo AR / VR application is to place two virtual cameras in the scene and render them to the left and right screens.

Take myself as an example: when I first developed the simplest possible application, displaying a stereo wooden cube on a Moverio BT-200 with the cube placed on a desktop, I noticed that

  • when I focused on the wooden texture, I could not see the surrounding real environment
  • when I tried my best to perceive the 3D location of the cube, I could not see the texture clearly.

This effect is caused by VAC.

What is VAC?

Vergence is the simultaneous opposite rotation of the two eyes to maintain binocular vision; in other words, the viewing angles of the two eyes change to match the depth of the object, as shown in the right part of the image below. In the case of an HMD, vergence is driven by retinal disparity.

vac

Image courtesy of Gregory Kramida and Amitabh Varshney

Accommodation is the oculomotor response to the distance of an object: like the focusing mechanism of a camera, the ciliary muscle adjusts the focal power of the eye so that the object appears sharp. It is illustrated in the left part of the image. In AR / VR, accommodation is driven by retinal blur.

As a result, when the user looks through an HMD, both eyes focus at the imaging plane, which is usually the virtual image of the embedded HMD screen, while the vergence angle of the eyes adjusts to the different, usually farther, distance implied by the rendered disparity. The distances signaled by accommodation and by vergence therefore disagree, which leads to VAC.
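To get a rough feeling for the size of the conflict, here is a small numerical sketch. The interpupillary distance and the two viewing distances are assumed values for illustration, not the specs of any particular HMD.

```python
import math

# Illustrative numbers only, not measured from a real device.
ipd = 0.063            # interpupillary distance in meters
focal_plane = 2.0      # distance of the HMD's virtual image plane (accommodation target)
virtual_object = 0.5   # distance implied by the rendered disparity (vergence target)

def vergence_angle_deg(distance_m, ipd_m=ipd):
    """Total convergence angle of the two eyes when fixating at a given distance."""
    return math.degrees(2 * math.atan((ipd_m / 2) / distance_m))

print(f"vergence angle at screen : {vergence_angle_deg(focal_plane):.2f} deg")
print(f"vergence angle at object : {vergence_angle_deg(virtual_object):.2f} deg")
# The conflict is often quantified in diopters (1 / distance in meters).
print(f"conflict: {abs(1 / virtual_object - 1 / focal_plane):.2f} D")
```

For these numbers the eyes must converge as if the cube were at 0.5 m while they focus at 2 m, a mismatch of 1.5 diopters, which is well above what most people can tolerate comfortably.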

Solutions

VAC is still an unsolved problem for AR / VR in general, despite many efforts in academia and industry to tackle it. Gregory Kramida and Amitabh Varshney wrote a great survey of existing solutions to VAC: Resolving the Vergence-Accommodation Conflict in Head-Mounted Displays. The rest of this part is mainly drawn from that paper.

Looking at two different categories of HMDs, stereoscopic and multiscopic, the solutions can be summarized as follows:

  • Stereoscopic
    • Varifocal
      • Sliding optics
      • Deformable membrane mirrors [t]
      • Liquid lenses [t]
      • Liquid crystal lenses [t]
    • Multifocal (focal plane stacks)
      • Birefringent lenses [s][t]
      • Freeform waveguide stacks [s]
  • Multiscopic
    • Multiview retinal displays [t]
    • Microlens arrays [s]
    • Parallax barriers [s]
    • Pinlight displays [s]

[s] indicates that a method is space-multiplexed, and [t] indicates that it is time-multiplexed. For the details of each method, please refer to the original paper.

Light Field Optics

A light field models light as vectors that carry both intensity and direction. Two parallel LCD panels can be used to create a light field, making the display multiscopic. This work came from the MIT Media Lab: Content-Adaptive Parallax Barriers: Optimizing Dual-Layer 3D Displays Using Low-Rank Light Field Factorization.

parallax barrier

This image, copied from the paper, shows how the multiscopic display is realized.
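The core idea is that the light field emitted by the stack is (roughly) the product of the transmittances of the two LCD layers, so showing a target light field becomes a non-negative low-rank factorization problem. Below is a tiny sketch of rank-1 multiplicative updates on a made-up target; the real method factorizes time-multiplexed frames of a full 4D light field, which I am not reproducing here.

```python
import numpy as np

# Minimal sketch of low-rank light field factorization for a dual-layer display.
# The target light field and its size are made up for illustration.
rng = np.random.default_rng(0)
L = rng.random((64, 64))      # target light field, flattened into a 2D matrix
f = rng.random((64, 1))       # front-layer transmittance
g = rng.random((64, 1))       # rear-layer transmittance

for _ in range(200):
    # Multiplicative updates keep transmittances non-negative (valid LCD values).
    f *= (L @ g) / (f @ (g.T @ g) + 1e-9)
    g *= (L.T @ f) / (g @ (f.T @ f) + 1e-9)

approx = f @ g.T
print("relative error:", np.linalg.norm(L - approx) / np.linalg.norm(L))
```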

Another prototype is demonstrated in the paper Near-Eye Light Field Displays. In this case, a microlens array is used to create the light field.

near eye

In this solution, a software-based retinal blur is applied to the source image, so that the distance perceived through vergence matches the distance perceived through accommodation.
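To make the idea of retinal blur concrete, here is a small thin-lens sketch of the blur circle on the retina as a function of the fixated depth. The pupil diameter and eye focal length are rough assumed values, not parameters taken from the paper.

```python
# Thin-lens circle-of-confusion sketch: blur diameter for a point at distance d_m
# when the eye is focused at distance d_focus_m. Pupil diameter and focal length
# are rough, assumed values; the model used in the paper may differ.
def blur_diameter_mm(d_m, d_focus_m, pupil_mm=4.0, eye_focal_mm=17.0):
    # Defocus in diopters, scaled by aperture and focal length.
    defocus_diopters = abs(1.0 / d_m - 1.0 / d_focus_m)
    return pupil_mm * eye_focal_mm * defocus_diopters / 1000.0

for d in (0.5, 1.0, 2.0, 4.0):
    print(f"point at {d} m, eye focused at 1 m -> blur ~ {blur_diameter_mm(d, 1.0):.3f} mm on retina")
```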

The video of the project is awesome:

How About HoloLens?

The display technology of HoloLens remains a mystery to the general public. It is not purely light field technology, since the device is not very bulky and the display resolution is much higher than what people expect of light field HMDs. There are a few good posts online discussing it.

An Experiment

To experience the vergence-accommodation conflict first-hand, I wrote a simple Cardboard application that demonstrates the issue. The code is on GitHub: vac_demo.

Building

  • Unity3D (v5.0+)
  • GVR SDK (v0.6)
    • v0.6 exposes the two virtual eyes as Unity3D GameObjects, which makes integrating the ImageEffect easier; v1.0+ reorganizes the two virtual eyes in scripts.
  • Access to Unity3D asset store
    • Needed to obtain the object materials and models.
  • Android SDK (with API 19+)

Running

  • Android Phone (API 19+)
  • Google Cardboard

If you do not want to customize the application, the prebuilt Android installation file is VAC.apk in the repository. I used a Nexus 5 for testing, and the screenshots below were captured on a Nexus 5.

Demonstration

In this application, many textured objects, e.g. books and tables, are placed at different depths in the scene. The reticle that shows the user's gaze determines the currently focused object, and the focus depth is adjusted accordingly via a Unity3D ImageEffect script. As the gaze moves, the user experiences a depth-of-field blur that corresponds to the retinal blur at different depths; a conceptual sketch of this gaze-driven blur follows the screenshots below.

vac demo far

vac demo near

Screenshots of vac_demo at far and near range
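Here is a conceptual sketch of what the blur pass does; the actual implementation is a Unity3D C# ImageEffect script and shader in the vac_demo repository, not the Python below, and the function and parameter names are made up for illustration. The idea is to read the depth under the reticle, treat it as the focus depth, and blur every pixel in proportion to its defocus.

```python
import numpy as np

def apply_gaze_blur(image, depth, reticle_uv, blur_scale=8.0):
    """Blur each pixel in proportion to its defocus from the gazed-at depth."""
    h, w = depth.shape
    cy, cx = int(reticle_uv[1] * h), int(reticle_uv[0] * w)
    focus_depth = depth[cy, cx]              # depth (m) of the object under the reticle
    out = image.copy()
    for y in range(h):
        for x in range(w):
            # Blur radius grows with the defocus measured in diopters (1/m).
            radius = int(blur_scale * abs(1.0 / depth[y, x] - 1.0 / focus_depth))
            if radius > 0:
                y0, y1 = max(0, y - radius), min(h, y + radius + 1)
                x0, x1 = max(0, x - radius), min(w, x + radius + 1)
                out[y, x] = image[y0:y1, x0:x1].mean(axis=(0, 1))
    return out

# Toy usage with random data standing in for the rendered frame and depth buffer.
frame = np.random.rand(90, 160, 3)
depths = np.random.uniform(0.5, 5.0, size=(90, 160))
blurred = apply_gaze_blur(frame, depths, reticle_uv=(0.5, 0.5))
```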

Issue

To achieve the focus / defocus effect and the Cardboard lens un-distortion, the application uses five rendering passes in total. Post-process rendering is in general computationally expensive on mobile platforms; on the Nexus 5, the frame rate of this application is 7 fps.

Acknowledgement

Reference

  1. VRWiki
  2. Light field (Wikipedia)
  3. Vergence (Wikipedia)
  4. Accommodation (Wikipedia)
  5. Lanman, Douglas, and David Luebke. “Near-eye light field displays.” ACM Transactions on Graphics (TOG) 32.6 (2013): 220.
  6. Kramida, Gregory. “Resolving the Vergence-Accommodation Conflict in Head-Mounted Displays.” IEEE Transactions on Visualization and Computer Graphics 22.7 (2016): 1912-1931.
  7. Lanman, Douglas, et al. “Content-adaptive parallax barriers: optimizing dual-layer 3D displays using low-rank light field factorization.” ACM Transactions on Graphics (TOG) 29.6 (2010): 163.

Thanks for reading!


ISMAR 2016 Highlight

The 15th IEEE International Symposium on Mixed and Augmented Reality (ISMAR) was held in Merida, Mexico last month. It is the most attractive conference for people working on augmented reality technologies, from both academia and industry.

merida

Much interesting hardware and many algorithms, ideas and discussions appeared at this conference. Here I summarize some that aroused my interest, from both the research and the engineering perspectives, because in this area research and engineering are sometimes aiming at the same target: expanding the impact of AR technology.

Keynote

The keynote was presented by Dr. Andrew Davison, who is well known for his contributions to research on SLAM (Simultaneous Localization and Mapping). The topic of the talk was The History and Future of SLAM.

Progress in SLAM over the past decades includes:

  • Open-source implementations
  • Real-time performance improvements with the support of advanced hardware
  • Becoming a crucial enabling technology for AR, robotics and mobile devices

In terms of the general market, Pokemon Go achieved great success with the simplest idea of AR, in which the augmentation is still not registered with reality. For registered augmentation to come true, SLAM is exactly the enabling technology, since the general public generally dislikes fiducial markers, whether for gaming or for smart-home applications.

According to Dr. Davison, the future of SLAM is:

  • Fully dense
  • Semantically aware
  • Lifelong mapping
  • Always-on, low power operation

The first two are basically research problems, and the latter two are clearly needs driven by the general market. During the conference, I talked with some mobile software companies; they showed strong interest in AR applications and are eyeing advances from academia, especially SLAM technology.

He also talked about how new hardware, e.g. DVS, DAVIS and IPU, can facilitate SLAM technologies, and about the possibility of integrating deep learning into SLAM.

Interesting Papers

Here I have selected a few papers that especially attracted me. The selection is of course biased by my preferences, and is based largely on the presentations at ISMAR rather than purely on the papers themselves.

Do You See What I See? The Effect of Gaze Tracking on Task Space Remote Collaboration

This paper is one of the seven papers at this ISMAR selected for the TVCG journal. Through a user study, two hypotheses were supported:

  • Co-presence: there is a significant difference in the co-presence measure when additional attention cues are provided.
  • Performance: there is a significant difference in performance time when additional attention cues are provided.

The result is not surprising, but in the demo session the authors showed their HMD setup: an Epson Moverio BT-200, plus AffectiveWear, plus Pupil Labs gaze-tracking cameras. The two additional pieces of hardware are quite interesting.

The AffectiveWear glasses were developed at Keio University, Japan, and made their first appearance at SIGGRAPH 2015. Several proximity sensors distributed over the frame of the glasses measure the distance to the face, and a trained deep learning model computes the wearer's emotion from these readings. One issue with this device is the user-specific calibration, which can also be viewed as training a model that suits the specific user; a general model is somewhat inaccurate since facial structure varies a lot between people.

Pupil Labs is doing excellent engineering work providing gaze-tracking capability for different head-mounted displays, both VR and AR, for which there is clearly a huge demand. It is impressive that their solution generalizes: the hardware, with specific add-on kits, suits a variety of HMDs, and the software is open-source. Notably, AffectiveWear also takes generalization into account, since its proximity sensors are attached to the glasses frame. Hopefully there will be a test report of Pupil Labs tracking soon.

Gaussian Light Field: Estimation of Viewpoint-Dependent Blur for Optical See-Through Head-Mounted Displays

This paper won the Best Paper Runner-Up Award, and is also one of the seven TVCG papers.

Blur on an HMD is clearly a problem for manufacturers and users, and it is both viewpoint-specific and pixel-specific. If the image blur is modeled as a Gaussian ellipse on the image plane, then the input of the “blur function” has 5 degrees of freedom (pixel location + viewing direction + viewing distance) and the output has 3 degrees of freedom (the ellipse parameters). The mapping is again learned, which of course avoids a complicated physical analysis of the optics. This is an interesting paper because people are discovering problems and finding solutions for them.
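As a purely conceptual illustration (not the paper's actual model or measured data), such a viewpoint-dependent blur function can be treated as a generic regression from the 5-DOF input to the 3 ellipse parameters. The sketch below fits a regressor to synthetic samples just to show the shape of the problem.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Conceptual sketch: learn a mapping from (pixel u, pixel v, view direction theta,
# view direction phi, view distance) to blur parameters (sigma_x, sigma_y, angle).
# The training data here is synthetic; the paper measures real blur with a camera.
rng = np.random.default_rng(0)
X = rng.random((2000, 5))                    # 5-DOF inputs, normalized
y = np.column_stack([
    1.0 + X[:, 0] + 0.5 * X[:, 4],           # made-up sigma_x
    1.0 + X[:, 1] + 0.5 * X[:, 4],           # made-up sigma_y
    np.pi * X[:, 2],                          # made-up ellipse orientation
])
model = RandomForestRegressor(n_estimators=50).fit(X, y)
print(model.predict(X[:1]))                  # predicted (sigma_x, sigma_y, angle)
```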

Automated Spatial Calibration of HMD Systems with Unconstrained Eye-cameras

This paper takes advantage of the reflections of IR LEDs on the human eyeball to calculate the spatial relationship between the device and the eye; potentially, this work can facilitate the calibration of optical see-through HMDs as well.

Interesting Ideas

  • A rolling-shutter camera can be used to expedite visual tracking: since the image is not captured as a whole at one instant, pixel information arrives spread out over time, and this incremental information can help maintain tracking at a fairly high frame rate.
  • Deep learning can help with sensor fusion of IMU data and marker-based visual tracking. Although the performance is not yet satisfying, I guess it will inspire a lot of research on applying deep learning to sensor fusion. How about collecting a dataset for this?

One Awesome Work

Check this video: Reality-Skins.

An example of how awesome AR can be!

Finally

ISMAR is a great conference. Merida and Cancun are nice places!

Me at ISMAR

Thanks for reading!