Google Cardboard as "Augmented Reality" Headset

Google Virtual Reality headset evolved from Cardboard v1, to Cardboard v2, and to the most recent and elegant Daydream. Daydream supports limited devices, thus is excluded in the discussion here. For Cardboard v1 and v2, there have been a few discussion about the difference between them. As a VR / AR developer working with both headset, I looked at something different

Cardboard v1

  • Supported phone size: 5.3” maximum (Nexus 5, iPhone 6/6s/7)
  • Assembly: Self-assembly from a plain cardboard
  • Trigger: Magnetic trigger (triky on iPhone)
  • Camera see-through: Supported
  • Price: Extremely cheap but the original version is hard to access now

Cardboard v2

  • Supported phone size: 6” maximum (iPhone 6+/6s+/7+)
  • Assembly: Almost no effort
  • Trigger: Screen touch
  • Camera see-through: Not supported
  • Price: A little more expensive, but totally acceptable

Back view

Front view

Back and front view of Cardboard v1 and v2


  • Since I have a Nexus 5 and iPhone 6s, Cardboard v1 is a better fit in terms of size. When I put iPhone 6s in Cardboard v2, the field of view is restricted by the size of phone screen.
  • Magnetic trigger is a problem in Cardboard v1. I admit that it is a very clever design to take advantage of magnetic sensor of the phone, however, the change of magnetic field can be caused by a few reasons, including the movement of the phone itself with respect to the magnet, the change of orientation of the phone (with respect to earth magnetic field). Touch trigger is much more stable. Even Google Virtual Reality (GVR) SDK removed the support for magnetic trigger in its later releases. Developers have to switch to older releases to look for it.
  • Self-assembly is never a problem for engineers.
  • Camera see-through is a feature I liked most about Cardboard v1. With camera video, the tracking of user can be enhanced. Imagining SLAM or marker-tracking based localization integrated with VR / AR game! Vuforia supports image tracking for Cardboard v1 as a pose estimator.

What about Augmented Reality

Google Cardboard is basically a VR headset, here I am going to turn it into a “AR” headset!


For an augmented reality application to run with Google Cardboard, there are a few requirement:

  • Access to camera video of the phone
  • Render two views representing the vision of two eyes
  • Distort the image so that after the distortion of Cardboard lens, the images are natural for human’s vision
  • Real-time!


The combination of Cardboard v1 and Nexus 5 is chosen. Unity3D is the development platform. The rendering pipeline is:

  1. Grab a video frame
  2. Optional: do whatever augmented reality processing
  3. Select and crop the part of vision for two eyes
  4. Apply barrel distortion to both views (barrel distortion)
  5. Composite two views side by side and blit on screen

Video frame is accessed by WebcamTexture object of Unity3D engine.


Within this application, we are not displaying real 3D scene, what we have is only augmented 2D image. We select the part of image to display with disparity, faking an AR experience for the user, all the objects in the scene will be perceived at a fixed distance determined by the amount of disparity. We need fast pixel displacement!

Step 3, 4 and 5 are achieved by shader, especially fragment shader. Shader is very fast and powerful. Unity3D shaders can be categorized into vertex shaders, fragment shaders and surface shaders. Vertex shaders and fragment shaders are basic types of shaders that are inserted into graphics pipeline of most modern GPU. On mobile devices, OpenGL ES 2.0 supports both of them. Surface shader is a special type of shader dealing with lighting and textures. In compilation stage, surface shaders are separated into a vertex shader part and a fragment shader part.

In our application, we have video frame in a texture object, thus, we only need to displace the texture coordinates of screen plane to represent barrel distortion and disparity of two views.

Step 3, 4 and 5 can be implemented into three shaders, which is easy to maintain and modify but will require more render passes. I put them altogether into one shader, since they are all fragment shaders. Source code is on Github: cardboard_seethrough.

In order to render the texture directly on screen, Graphics.Blit function is called upon rendering, and the source texture object is replaced with our processed video, the target render object is set to null. According to Unity3D documentation, a null target render object represents the screen. Then the shader material is specified in order to achieve correct Cardboard style display.


Nexus 5 runs this application smoothly, with 20-30 fps.

With disparity

Without disparity

Screenshot with and without disparity

Several GUI elements are put on the GUI layer:

  • ON/OFF to control camera access
  • Display of FPS and current video resolution
  • Slider to control Field-of-View
  • Slider to control level of Disparity of two images
  • 0 disparity indicates exactly same images on left and right
  • 0.3 is the maximum here

Source code is on Github: cardboard_seethrough.

Google VR SDK is not used here.


  • Discussion about Cardboard v1 and v2 on Briztech.

Thanks for reading!

Vergence-Accommodation Conflict (VAC)

Vergence-accommodation conflict (VAC) is a major problem for the use of VR / AR head-mounted displays (HMD).

For a normal stereoscopic optical see-through (OST) HMD, it is very intuitive for programmers to render two separate images for each display, representing the virtual camera placed in the location of human eyes. Like Unity3D, the default way to deploy stereo AR / VR application is to place two virtual cameras in the scene, and renders to the left and right screen.

Take me as an example, when I first developed the most simplest application to display a stereo wooden cube on Moverio BT-200, where the cube is placed on a desktop, I noticed that

  • when I am focusing on the wooden texture, I cannot see the surrounding real environment
  • when I am trying my best to perceive the 3D location of the cube, I cannot see the texture clearly.

This effect is caused by VAC.

What is VAC?

Vergence is the simultaneous opposite motion of two eyes to maintain binocular vision, in other word, the viewing angle of two eyes are changed in this procedure to fit the depth of objects, as shown in the right part of image. In the case of HMD, vergence is driven by the retinal disparity.


Image courtesy of Gregory Kramida and Amitabh Varshney

Accommodation is the occulomotor response to the distance of object, like the focusing procedure on a camera, the muscle controls the focus of eye, in order to see the object clearly. It is illustrated in the left part of image. In AR / VR, accommodation is driven by the retinal blur.

As a result, when the user is looking through an HMD, both eyes focal distance is the imaging plane, which is the embedded HMD screen usually, but the angular difference of eyes is adjusted to a further distance formed by the disparity. Thus, the distance of the object perceived by accommodation and vergence is different, lead to VAC.


VAC is still an unsolved problem for AR / VR in general, despite a lot of efforts in the academia and industry to tackle it. Gregory Kramida and Amitabh Varshney have a great survey on the existing solutions to VAC: Resolving the Vergence-Accommodation Conflict in Head Mounted Displays. The rest of this part is mainly from this paper.

If we took a look at two different categories of HMDs: stereoscopic and multiscopic, the solutions can be summarized:

  • Stereoscopic
    • Varifocal
      • Sliding optics
      • Deformable membrane mirrors [t]
      • Liquid lenses [t]
    • Liquid crystal lenses [t]
      • Multifocal (focal plane stacks)
      • Birefringent lenses [s][t]
      • Freeform waveguide stacks [s]
  • Multiscopic
    • Multiview retinal displays [t]
    • Microlens arrays [s]
    • Parallex barriers [s]
    • Pinlight displays [s]

[s] represents that the method is time-multiplexed, and [t] corresponds to time-multiplexed. For the detail of each methods, please refer to the original paper.

Light Field Optics

Light field is a methods to model light as vector, which has both intensity and direction. Two parallel LCD panels can be used to create light field, so that the display is multiscopic. This work came from MIT Media Lab: Content-Adaptive Parallax Barriers: Optimizing Dual-Layer 3D Displays using Low-Rank Light Field Factorization.

parallex barrier

This image copied from the paper shows how multiscopic is realized.

Another prototype is demonstrated in the paper: Near-Eye Light Field Displays. In this case, Microlens array is used to create light field.

near eye

A software-based retinal blur is applied to the source image in this solution, so that the perceived distance from vergence is same as the perceived distance from accommodation.

The video of the project is awesome:

How About Hololens?

The display technology of Hololens remains a mystery for the general public. It is not purely light field technology, since it is not very bulky, and the display resolution is much higher than people’s expectation of light field HMDs. There are a few good posts online discussing about it.

An Experiment

In order to experience vergence-accommodation conflict, I wrote a simple Cardboard application to demonstrate the issue. Codes are on Github: vac_demo


  • Unity3D (v5.0+)
  • GVR SDK (v0.6)
    • v0.6 contains the Unity3D GameObject of two virtual eyes, which is easier for the integration of ImageEffect. v1.0+ re-organizes the two virtual eyes in scripts.
  • Access to Unity3D asset store
    • For the access of object materials and models.
  • Android SDK (with API 19+)


  • Android Phone (API 19+)
  • Google Cardboard

The Android installation file is VAC.apk in this repository, in case you are not customizing the application. I am using Nexus 5 for testing, and the screenshots below are generated by Nexus 5.


In this application, there are many objects with texture, placed at different depth of the scene, e.g. books, tables. The rectile that shows the user gaze finds the current focused object. The focus depth is adjusted accordingly, achieved by Unity3D ImageEffect script. With different gaze, user experiences the effect of depth blur, which is correspondent to the retinal blur of different depth.

vac demo far

vac demo near

Screenshot of vac_demo at near and far range


In order to achieve the effect of focus and defocus, Cardboard un-distortion, there are five rendering passes. In general, post rendering is computationally expensive for mobile platforms. In the case of Nexus 5, the frame rate of this application is 7 fps.



  1. VRWiki
  2. Light field (Wikipedia)
  3. Vergence (Wikipedia)
  4. Accommodation (Wikipedia)
  5. Lanman, Douglas, and David Luebke. “Near-eye light field displays.” ACM Transactions on Graphics (TOG) 32.6 (2013): 220.
  6. Kramida, Gregory. “Resolving the Vergence-Accommodation Conflict in Head-Mounted Displays.” IEEE transactions on visualization and computer graphics 22.7 (2016): 1912-1931.
  7. Lanman, Douglas, et al. “Content-adaptive parallax barriers: optimizing dual-layer 3D displays using low-rank light field factorization.” ACM Transactions on Graphics (TOG) 29.6 (2010): 163.

Thanks for reading!