How do multi-focal plane displays work in XR display modules?

Multi-focal plane displays work by stacking two or more physical display panels at different fixed distances from the viewer’s eye, or by using a single, fast-switching panel combined with a variable-focus optical element. This creates the perception of multiple, discrete depth layers, allowing digital content to appear at specific, comfortable focal distances. This is a fundamental shift from traditional stereoscopic 3D displays, which create a compelling illusion of depth through binocular disparity but force the eyes to focus on the physical screen surface regardless of where the virtual object appears to be. This conflict, known as the vergence-accommodation conflict (VAC), is a primary source of visual discomfort and fatigue in extended reality (XR) applications. Multi-focal plane technology directly addresses VAC by providing the correct focal cue for each depth layer, making virtual objects feel more solid and natural while significantly improving user comfort.

The core problem multi-focal planes solve is a biological limitation of human vision. Our eyes rely on two main cues to perceive depth: vergence and accommodation. Vergence is the coordinated movement of both eyes to point inward or outward to fixate on an object. Accommodation is the process where the eye’s lens changes shape to bring an object into sharp focus on the retina. In the real world, these two systems are tightly coupled; when you look at your finger close to your face, your eyes converge and your lenses accommodate for a near distance. In a conventional XR display showing a close-up virtual object, your eyes will converge as if the object is near, but they must still accommodate to the display’s single fixed focal distance, which the headset optics typically place around one to two meters away, regardless of where the virtual object appears. This mismatch confuses the brain, leading to symptoms like eyestrain, headaches, and blurred vision after prolonged use. By presenting images on distinct planes that correspond to different focal distances, multi-focal displays allow the eye’s accommodative system to work more naturally, reducing or eliminating the conflict.
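To make the mismatch concrete, here is a minimal Python sketch that expresses both cues in diopters (the reciprocal of distance in meters); the 2-meter screen focus and the rough 0.5 D comfort band used below are illustrative assumptions, not specifications of any particular headset.

```python
# A minimal sketch of the vergence-accommodation mismatch. The 2 m
# screen focus and the ~0.5 D comfort band are illustrative assumptions.

def to_diopters(distance_m: float) -> float:
    """Convert a viewing distance in meters to optical power in diopters."""
    return 1.0 / distance_m

def vac_mismatch_d(object_m: float, screen_focus_m: float) -> float:
    """Diopter gap between where the eyes converge (the virtual object)
    and where they must accommodate (the display's fixed focus)."""
    return abs(to_diopters(object_m) - to_diopters(screen_focus_m))

# A virtual object at 0.4 m on a display whose optics focus at 2.0 m:
mismatch = vac_mismatch_d(0.4, 2.0)
print(f"{mismatch:.1f} D")  # 2.0 D -- well outside a ~0.5 D comfort band
```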

There are two dominant engineering approaches to creating these multiple focal planes. The first and more straightforward method is the stacked panel approach. Imagine a miniature version of a theatre stage with several transparent scrims (screens) placed at different depths. In an XR headset, these are ultra-thin, high-resolution micro-displays, such as OLED or MicroLED panels, positioned at fixed distances—for example, at 1 diopter (1 meter), 2 diopters (0.5 meters), and optical infinity. A beam splitter or waveguide optical combiner merges the light from these panels into a single path directed toward the eye. The system then renders the virtual scene, assigning parts of the image to the appropriate depth plane. A virtual object meant to appear far away is displayed on the “infinity” panel, while a nearby menu is rendered on the “near” panel. The key advantage here is that each plane is a true physical focal point, providing high-quality, native focal cues.
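The sketch below illustrates the rendering side of this idea, using the example plane set above (infinity, 1 meter, 0.5 meters); the simple nearest-plane rule is an assumption standing in for what a real multi-plane compositor would do.

```python
# A simplified sketch of routing scene content to fixed focal planes in
# a stacked-panel design. Planes follow the example above, expressed in
# diopters; the nearest-plane rule is a simplification.
import math

PLANES_D = [0.0, 1.0, 2.0]  # optical infinity, 1 diopter (1 m), 2 diopters (0.5 m)

def assign_plane(depth_m: float) -> int:
    """Index of the focal plane nearest the object's depth, compared in
    diopter space rather than in meters."""
    depth_d = 0.0 if math.isinf(depth_m) else 1.0 / depth_m
    return min(range(len(PLANES_D)), key=lambda i: abs(PLANES_D[i] - depth_d))

print(assign_plane(math.inf))  # 0 -> distant scenery on the "infinity" panel
print(assign_plane(0.45))      # 2 -> a nearby menu on the near panel
```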

| Approach | Mechanism | Key Advantage | Primary Challenge |
| --- | --- | --- | --- |
| Stacked panels | Multiple physical display panels at fixed depths | High image quality; true physical focal cues | Increased size, weight, cost, and power consumption |
| Varifocal (time-multiplexed) | One high-speed display + tunable lens (e.g., liquid crystal, deformable mirror) | More compact form factor; potential for many focal planes | Requires extremely high refresh rates and precise eye-tracking |

The second, more advanced method is the varifocal or time-multiplexed approach. This system uses a single, very high-refresh-rate display panel (e.g., 1440 Hz or higher) paired with a dynamic optical element that can rapidly change its focal power. This element could be a liquid crystal lens, an electrically tunable liquid lens, or even a deformable membrane mirror. Here’s how it works in a cycle: an eye-tracking system first precisely measures the user’s current vergence point—where they are looking in 3D space. The system then calculates the required accommodation distance. The tunable lens is set to the corresponding focal power, and the display flashes the image for that specific depth plane. The entire process—measure, adjust, display—happens in a few milliseconds. The system then immediately switches the lens to the next focal distance and displays the image for that plane, and so on, cycling through all active depth planes. Because this happens so quickly, the user’s visual system perceptually fuses these rapidly alternating images into a stable, multi-depth scene. The major benefit is a more compact physical design, but it places immense demands on the speed of the display, the lens, and the accuracy of the eye tracker.
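The loop below sketches one such cycle; the EyeTracker, TunableLens, and Display classes are stand-in stubs rather than any real hardware API, and the plane powers and timings are illustrative only.

```python
# An illustrative sketch of one time-multiplexed refresh cycle.
# EyeTracker, TunableLens, and Display are stand-in stubs, not a real
# hardware API; plane powers and timings are illustrative only.

PLANE_POWERS_D = [0.0, 1.5, 3.0]  # active focal planes, in diopters

class EyeTracker:
    def vergence_diopters(self) -> float:
        return 1.2  # stub: pretend the user is fixating at ~0.8 m

class TunableLens:
    def set_power(self, power_d: float) -> None:
        pass  # stub: would retune the liquid/LC lens or mirror here

class Display:
    def flash(self, frame, duration_s: float) -> None:
        pass  # stub: brief exposure of one depth slice

def render_cycle(tracker, lens, display, scene_slices):
    # Measure vergence first; some designs use it to prioritize the
    # plane nearest the user's gaze within the cycle.
    gaze_d = tracker.vergence_diopters()
    for power_d in sorted(PLANE_POWERS_D, key=lambda p: abs(p - gaze_d)):
        lens.set_power(power_d)                      # retune focal power
        display.flash(scene_slices[power_d], 0.001)  # ~1 ms flash per plane

render_cycle(EyeTracker(), TunableLens(), Display(),
             {p: f"frame@{p}D" for p in PLANE_POWERS_D})
```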

The choice of how many planes to use is a critical trade-off between visual fidelity and system complexity. Research in human visual perception suggests that while a continuous range of focal depth would be ideal, a surprisingly small number of discrete planes can effectively mitigate VAC. Studies have shown that two to four focal planes can provide a significant comfort improvement for many applications. The placement of these planes is also not linear; they are often spaced evenly in diopters, the reciprocal of distance in meters. For instance, a three-plane system might have planes at 0 D (optical infinity), 1.5 D (~0.67 meters), and 3.0 D (~0.33 meters). This dioptric spacing, uniform in diopters rather than in meters, better matches the sensitivity of the human eye’s accommodative system, which is more sensitive to changes in focus at near distances than at far ones. The rendering software must intelligently assign depth information from the virtual scene to the available planes, sometimes using techniques like depth-based filtering to minimize “popping” artifacts when an object moves from one plane to another.
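A minimal sketch of this blending idea follows, assuming simple linear interpolation between the two planes that bracket a pixel's depth in diopter space; production depth-filtering schemes are considerably more elaborate.

```python
# A minimal sketch of depth-weighted blending between focal planes,
# assuming linear interpolation in diopter space. Plane values follow
# the three-plane example above.

PLANES_D = [0.0, 1.5, 3.0]  # 0 D (infinity), 1.5 D (~0.67 m), 3.0 D (~0.33 m)

def blend_weights(depth_d: float) -> dict[float, float]:
    """Split a pixel's intensity between the two planes that bracket its
    depth; the weights shift smoothly, avoiding 'popping' artifacts."""
    if depth_d <= PLANES_D[0]:
        return {PLANES_D[0]: 1.0}
    if depth_d >= PLANES_D[-1]:
        return {PLANES_D[-1]: 1.0}
    for near, far in zip(PLANES_D, PLANES_D[1:]):
        if near <= depth_d <= far:
            t = (depth_d - near) / (far - near)
            return {near: 1.0 - t, far: t}

# An object at 0.5 m (2.0 D) lands between the 1.5 D and 3.0 D planes:
print(blend_weights(2.0))  # {1.5: 0.666..., 3.0: 0.333...}
```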

Implementing this technology is not without significant hurdles. For the stacked panel approach, the main challenges are physical: adding more displays increases the size, weight, cost, and power requirements of the headset—all critical factors for consumer adoption. Aligning these panels with micron-level precision is a manufacturing nightmare. There’s also an inherent trade-off in brightness, as light must pass through multiple optical combiners, leading to losses. For the varifocal approach, the challenges are rooted in performance. The display and optics must operate at kilohertz rates to avoid flicker, which pushes the limits of current microdisplay and liquid crystal technology. The eye-tracking system must be exceptionally fast and accurate, with latency on the order of a millisecond or two, as any delay causes the focal plane to lag behind the user’s gaze, reintroducing blur and defeating the purpose. Both approaches require sophisticated and computationally expensive rendering pipelines to manage multi-plane image synthesis in real time.
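Some back-of-envelope arithmetic shows where the kilohertz figure comes from; the 90 Hz per-plane flicker target below is an assumed round number, and field-sequential color (which multiplies the required rate further) is ignored for simplicity.

```python
# Back-of-envelope panel-rate budget for time multiplexing. The 90 Hz
# per-plane target is an assumption; field-sequential color would push
# the required rate further, toward the kilohertz figures above.

def required_panel_rate_hz(num_planes: int, per_plane_hz: float = 90.0) -> float:
    """Raw refresh rate the single panel must sustain."""
    return num_planes * per_plane_hz

def slot_per_plane_ms(num_planes: int, per_plane_hz: float = 90.0) -> float:
    """Time budget per plane for lens settling plus the display flash."""
    return 1000.0 / required_panel_rate_hz(num_planes, per_plane_hz)

for n in (2, 3, 4):
    print(f"{n} planes: {required_panel_rate_hz(n):.0f} Hz panel, "
          f"{slot_per_plane_ms(n):.2f} ms per plane")
# 4 planes -> 360 Hz and ~2.78 ms per plane, before color sequencing
```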

Looking forward, the evolution of multi-focal planes is likely to blend with other cutting-edge display technologies. A promising direction is integration with holographic displays, which aim to generate true light fields by reconstructing the wavefront of light, theoretically providing continuous, correct focus cues across a wide depth range. While still largely in the research phase, progress in spatial light modulators and computational holography could eventually make this the ultimate solution. In the nearer term, we will see more advanced adaptive varifocal systems that use predictive, gaze-based algorithms to anticipate where a user will look, further reducing latency. The development of specialized XR display modules that integrate displays, optics, and sensors into a single, optimized unit will be crucial for making these complex systems commercially viable. The goal is to move beyond merely solving VAC to creating displays where virtual and real light are indistinguishable, enabling truly comfortable and immersive experiences that can be used for hours on end.
