Augmented Reality and Virtual Reality Displays: Perspectives and Challenges.

Tao Zhan¹, Kun Yin¹, Jianghao Xiong¹, Ziqian He¹, Shin-Tson Wu².

Abstract

As one of the most promising candidates for next-generation mobile platforms, augmented reality (AR) and virtual reality (VR) have the potential to revolutionize the ways we perceive and interact with digital information. Meanwhile, recent advances in display and optical technologies, together with rapidly developing digital processors, offer new directions for advancing near-eye display systems. In this perspective paper, we start by analyzing the optical requirements that the human visual system poses on near-eye displays and then compare them against the specifications of state-of-the-art devices, which reveals the main challenges in near-eye displays at the present stage. Afterward, potential solutions to these challenges in both AR and VR displays are presented case by case, including the most recent optical research and development that has already been, or has the potential to be, industrialized for extended reality displays.

Keywords: Laser; Optical Imaging; Optical Materials; Photonics

Year: 2020 PMID: 32759057 PMCID: PMC7404571 DOI: 10.1016/j.isci.2020.101397

Source DB: PubMed Journal: iScience ISSN: 2589-0042

Introduction

As the most critical information acquisition medium, information displays have been developing rapidly after the third industrial revolution. From the beginning of this millennium, display technologies have successfully evolved from the bulky cathode ray tube to compact flat panel designs, such as liquid crystal display (LCD) and organic light-emitting diode (OLED) (Chen et al., 2018). More recently, the next-generation display technologies under dedicated development are no longer limited to flat panels that are just placed in front of the users but aimed at revolutionizing the way of interactions between the users and their surrounding environment (Cakmakci and Rolland, 2006). At one end of the spectrum is virtual reality (VR) display, which effectively extends the field of view (FOV), blocks the entire ambient, and offers an immersive virtual environment independent of the user's real surroundings. At the other end of the spectrum is augmented reality (AR) display, which not only pursues high-quality see-through performance but also enriches the real world by overlaying digital contents. With advanced level of optical technology and refreshing user experience, AR and VR displays exhibit potential to trigger attractive applications, including but not limited to health care, education, engineering design, manufacturing, retail, and entertainment. The ideal goal of AR and VR display development is to offer reality-like crystal-clear images that can simulate, merge into, or rebuild the surrounding environment and avoid wearing discomfort concurrently. This is still challenging at the present stage, especially for AR systems, as most components demand not only further performance enhancement but also miniaturization in both form factor and power consumption. In this article, we share a few perspectives about the development of optical technologies for AR and VR head-mounted displays. 
We begin the discussion by reviewing the visual requirements posed by the human visual system. Next, we discuss how emerging optical technologies can help meet these challenges in terms of resolution, visual comfort, FOV, and dynamic range. Moreover, form factor and power efficiency are also taken into consideration because they play crucial roles in near-eye display designs, especially for consumer applications.

Requirement of Human Visual System

To better understand the goal and the underlying challenges, it is necessary to examine the performance parameters of the human visual system. The distribution of the FOV is plotted in Figure 1A. The monocular FOV of the human eye is about 160° (horizontal) by 130° (vertical). The combined binocular FOV is about 200° (horizontal) by 130° (vertical), with an overlapped region of 120° horizontally (Wheelwright et al., 2018). The resolution limit of the human eye is determined by the average spacing of cone cells in the fovea. This estimate yields a visual angle of about 0.5 arcmin (Curcio et al., 1990), or 120 pixels per degree (ppd), which corresponds to 20/10 visual acuity. When it comes to display design, there is an apparent trade-off between resolution density and FOV, given that the total number of display pixels is fixed.

Figure 1

Illustration on the Performance of Human Vision

(A) The profile of human FOV.

(B) The relation between human visual acuity and visual angle.

(C) Sketch of the VAC issue. The accommodation cue coincides with vergence cue when viewing a real object (left). The mismatch occurs when viewing a virtual object displayed at a fixed plane (right).

For VR, a broad FOV that covers the human visual range is relatively easy to achieve by designing an eyepiece with a sufficiently low f/#. The main issue then becomes the resulting low resolution density, which causes the so-called screen-door effect and considerably compromises the viewing experience. A direct solution is to increase the display resolution, which is unfortunately very challenging given the high cost and data transport rate. As an estimate, achieving monocular vision with a 100° FOV and a resolution density of 60 ppd (1 arcmin, or 20/20 vision) requires a display with 6K horizontal resolution. Some commercial products (such as the Pimax Vision 8K) can now provide about 4K monocular resolution, but the daunting price that comes with such performance remains an issue. Another approach exploits the fact that high resolution density exists only within the foveal region of ±2.5° (Rossi and Roorda, 2010), outside of which visual acuity drops drastically (Figure 1B). High resolution is therefore required only in the central viewing zone, which leads to the concept of the foveated display (Tan et al., 2018b; Kim et al., 2019). In foveated displays, the resolution varies across the viewing region, usually through an optical combination of two display panels that separately address the central and peripheral areas. This not only lessens the burden on display hardware but also significantly reduces the computational and data-transfer burdens. 
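The 6K estimate above follows from simple arithmetic, sketched below in Python. The sketch assumes uniform angular sampling across the FOV (real eyepieces distribute pixels non-uniformly), and the function names and the 3840-pixel figure used for "4K" are illustrative assumptions rather than values taken from the cited works.

```python
import math


def pixels_per_degree(arcmin_per_pixel: float) -> float:
    """Convert the angle subtended by one pixel (arcmin) to pixels per degree."""
    return 60.0 / arcmin_per_pixel


def required_pixels(fov_deg: float, ppd: float) -> int:
    """Pixels needed along one axis, assuming uniform angular sampling."""
    return math.ceil(fov_deg * ppd)


def achievable_ppd(pixels: int, fov_deg: float) -> float:
    """Average resolution density when a fixed pixel count spans the FOV."""
    return pixels / fov_deg


if __name__ == "__main__":
    # 1 arcmin per pixel (20/20 vision) and 0.5 arcmin per pixel (20/10 vision)
    print(pixels_per_degree(1.0), pixels_per_degree(0.5))
    # Horizontal pixel count for a 100 degree FOV at 60 ppd
    print(required_pixels(100.0, 60.0))
    # Assumed 3840-pixel-wide ("4K") panel spread over the same 100 degrees
    print(achievable_ppd(3840, 100.0))
```

Under these assumptions, a 3840-pixel-wide panel spread over 100° averages roughly 38 ppd, which illustrates why the trade-off between FOV and resolution density persists even with current high-end panels.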
Regarding AR systems, although the trade-off between FOV and resolution density still exists, a more significant concern is producing a decent FOV in the first place. Across optical architectures ranging from free-space combiners and total internal reflection (TIR) freeform combiners (Hua et al., 2013) to lightguide combiners, the maximum achievable FOV typically does not exceed 60° horizontally, which is still far from the human vision limit. Furthermore, as a high-dynamic-range imaging system, the human eye can adapt to a broad range of illuminance, from 10⁴ lux in daylight to 10⁻³ lux at night (Hoefflinger, 2007). Thus, contrast ratio (CR) is a critical display parameter. In VR, contrast is not a significant issue because the influence of ambient light can be neglected. If the stray light inside the headset is well managed and suppressed, CR can exceed 1,000:1. In AR, however, the high surrounding illuminance can make the requirement for display brightness very high. In this case, a more representative parameter is the ambient contrast ratio (ACR), defined as (Lee et al., 2019b):

$$\mathrm{ACR} = \frac{L_{\mathrm{on}} + L_{\mathrm{ambient}} \cdot T}{L_{\mathrm{off}} + L_{\mathrm{ambient}} \cdot T},$$

where $L_{\mathrm{on}}$ ($L_{\mathrm{off}}$) represents the display luminance in the on (off) state, $L_{\mathrm{ambient}}$ is the ambient luminance, and $T$ is the display transmittance. For a simple estimation, if we assume a display transmittance of 80% and an ambient illuminance of 10⁴ lux with a Lambertian distribution, an ACR of 2:1, which barely prevents image washout, already requires about 2,500 nits of display brightness. A better ACR of 5:1 for adequate readability requires about 10,000 nits. Current AR systems, by comparison, generally support brightness only up to 500 nits (Lee et al., 2019b), which is sufficient only for indoor use (500 lux). When evaluating VR/AR systems capable of 3D image generation, yet another aspect of human vision to consider is stereo sensation. 
The natural viewing experience of a 3D object induces vergence cue (relative rotation of eyes) and accommodation cue (the focus of eyes), which coincide with each other (Figure 1C). However, in most current VR systems, a fixed display plane with different rendered contents for each eye is adopted. The eye accommodation is fixed on the plane and therefore mismatches with vergence cue, which causes visual fatigue and discomfort, sabotages stereo acuity, and distorts perceived depth (Hoffman et al., 2008; Watt et al., 2005). This phenomenon is often called vergence-accommodation conflict (VAC).
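The brightness figures quoted for the ambient contrast ratio earlier in this section can be checked with a short calculation. The Python sketch below assumes ACR takes the form (L_on + L_ambient·T)/(L_off + L_ambient·T), converts illuminance to luminance with the Lambertian relation L = E/π, and treats the off-state luminance as zero by default; the function names are hypothetical and chosen only for illustration.

```python
import math


def ambient_luminance(illuminance_lux: float) -> float:
    """Luminance in nits of a Lambertian ambient: L = E / pi."""
    return illuminance_lux / math.pi


def ambient_contrast_ratio(l_on: float, l_off: float,
                           illuminance_lux: float, transmittance: float) -> float:
    """ACR = (L_on + L_ambient * T) / (L_off + L_ambient * T) (assumed form)."""
    background = ambient_luminance(illuminance_lux) * transmittance
    return (l_on + background) / (l_off + background)


def required_on_luminance(target_acr: float, illuminance_lux: float,
                          transmittance: float, l_off: float = 0.0) -> float:
    """Invert the ACR expression to get the on-state luminance for a target ACR."""
    background = ambient_luminance(illuminance_lux) * transmittance
    return target_acr * (l_off + background) - background


if __name__ == "__main__":
    # Outdoor case from the text: 10^4 lux ambient, 80% transmittance
    for acr in (2.0, 5.0):
        print(acr, round(required_on_luminance(acr, 1e4, 0.8)))
    # Indoor case: a 500-nit display under 500 lux
    print(round(ambient_contrast_ratio(500.0, 0.0, 500.0, 0.8), 2))
```

With these assumptions the required on-state luminance comes out near 2,500 nits for an ACR of 2:1 and near 10,000 nits for 5:1, in line with the estimates in the text, while a 500-nit display under 500 lux reaches an ACR just under 5:1.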

VR: Resolution

The current angular resolution of VR displays still falls short of normal 20/20 visual acuity. Most VR headsets use one display panel and one set of viewing optics for each eye to provide the stereoscopic effect, a technique that can be traced back to the nineteenth century (Wheatstone, 1838). The VR optical layout is essentially a simple imaging system in which the viewing optics magnify the display panel. Therefore, from the system perspective, clearer and sharper imagery can be obtained by further improving both the display panels and the magnifying lenses. The display industry has been pursuing display panels with higher resolution, power efficiency, and dynamic range and faster response time, yet lower cost. The fast evolution of flat panel displays over the past decade is one of the cornerstones of current VR headsets, and their future development will also considerably benefit the VR industry. In the long term, it is vital to increase the pixel number and density of physical display panels and thus reduce the screen-door effect. However, this may place a heavy burden on image rendering, driving circuits, and power consumption. In the meantime, some emerging approaches can offer a decent visual experience based on off-the-shelf display panels (Figure 2). For global resolution enhancement, the conventional wobulation method (Allen and Ulichney, 2005) designed for projection displays can be extended to VR. Lee et al. (2017b) demonstrated an optical wobulation VR system by synchronizing a switchable liquid crystal (LC) Pancharatnam-Berry phase deflector with subframe images, increasing the pixel density through time multiplexing. Zhan et al. (2019b) further advanced this approach using a passive polymer deflector and a polarization management layer, doubling the apparent pixel density without reducing the original frame rate. More recently, Nguyen et al. (2020) realized mechanical wobulation for both micro-OLED and LCD panels to reduce the screen-door effect. 
These approaches, based on the wobulation method, can simulate high-resolution imagery over the entire FOV before ideal display panels become available. Nonetheless, the wobulation method still requires a high data rate and cannot reduce the burden imposed by the massive data flow.

Figure 2

The Development Trend of Panel Resolution

The pixel density of display panels will gradually increase for VR application. Before panels with ideal pixel density are available at low cost, it is also feasible to employ global resolution enhancement based on mechanical or optical wobulation method and local resolution enhancement with foveated display technologies.

Alternatively, the foveation approach, aimed at local resolution enhancement, can avoid this problem by making use of the non-uniform angular resolution distribution of the human visual system (Rossi and Roorda, 2010). It offers high resolution in the foveal region of the retina while maintaining a degraded resolution in the periphery. This principle was adopted for imaging before it was applied to near-eye displays (Hua and Liu, 2008). In most foveated VR systems, a beam splitter is employed to combine the images displayed on a low-resolution panel and a high-resolution one, resulting in a larger device volume. Miniaturizing the optical layout and finding an alternative to the bulky beam splitter design is an essential task for the future development of foveated VR devices. A promising candidate is an off-axis mini-projection unit used together with a transparent projection screen on top of the display panel. The projection screen should be transparent to the display light but scatter the off-axis projection light strongly. A good example of such a projection screen is a polymer-dispersed LC film with customized molecular orientation and index mismatch (He et al., 2020). Moreover, as the gaze point is not always fixed at the center of the FOV, another potential development direction for the foveation method is image shifting, which is similar to but more complicated than beam steering. Both mechanical and optical shifting methods for VR displays have been demonstrated, using a rotatable beam splitter (Sahlsten et al., 2020) and a switchable LC deflector (Tan et al., 2018c), respectively.

VR: Viewing Optics

In parallel, good imaging optics are also critical for generating high-resolution virtual images in VR headsets. Due to ergonomic requirements, the viewing optics should be compact and lightweight, which brings a significant sacrifice in imaging quality. Conventional aspheric singlets with smooth surfaces usually have limited stray light but a large volume and weight. Thus, their compact Fresnel alternatives are more prevalent in current commercial VR headsets (Geng et al., 2018). Although Fresnel singlets have more degrees of freedom for aberration control, their intrinsic diffractive artifacts and unavoidable stray light considerably reduce the image sharpness. For now, the overall imaging quality of most headsets is limited by the display panel resolution, so these drawbacks of Fresnel lenses are still tolerable. However, in the long run, these issues could become more critical as display pixel density gradually increases. To further reduce the device dimensions, catadioptric pancake optics can be employed (Wong et al., 2017). With reflective surfaces introduced to share the optical power of the refractive components, pancake lenses have a shorter focal length and thus allow smaller display panels. However, these benefits come at the cost of a 75% loss in light efficiency and demanding polarization control to eliminate ghost images. In this case, plastic materials with low birefringence and high-quality polarizers and waveplates are in high demand. Moreover, emerging flat optics, including broadband diffractive lenses (Meem et al., 2020), metalenses (Chen et al., 2019), and LC Pancharatnam-Berry phase lenses (Zhan et al., 2019a), can also be applied in the VR lens system for aberration control and system miniaturization. By adding a thin-film flat polymer lens, it is possible to sharpen the imagery by more than three times (Zhan et al., 2020b). Another intriguing approach is to use a two-dimensional curved display (Grover et al., 2018). 
With the field curvature compensated by the tailored panel curvature, the heavy burden on the lens design can be well relieved. Alternatively, the curved fiber faceplate (Zhao et al., 2019) can be attached to the display panel as a surface-shaping component, which can be designed together with the viewing optics for sharper imaging.

Vergence-Accommodation Conflict

Aside from limited resolution and screen-door effect, VAC is another significant issue in VR systems. A plethora of solutions have been developed to mitigate this conflict (Kramida, 2015), but only few have been applied to the current commercial VR headsets. Monovision displays represent a simple solution to VAC, where vergence is not present for the virtual image. As only one eye is offered with digital images, this approach is more suitable for specific AR applications, but not immersive VR. The other extreme is accommodation-invariant approaches, like the Maxwellian view (Takaki and Fujimoto, 2018), where the point source is focused on the pupil with angularly encoded amplitude information and the image on the retina is independent of the accommodation response. However, to tolerate the eye movement, Maxwellian-view systems usually exhibit a limited FOV. In general, most other approaches offer a proper accommodation cue to mimic the retina blur and therefore alleviate the conflict. A typical example is holographic display (Yamaguchi et al., 2007) aimed at reconstructing accurate wavefront of the entire 3D scene and offering accurate retinal blur. Aside from the limited FOV, holographic displays usually manifest degraded image quality due to laser speckles. Similarly, light field displays (Wetzstein et al., 2012) reconstruct the geometric light rays instead of the diffractive wavefront, which can also provide the approximately correct depth information and retinal blur but usually ends up with low resolution. If the amount of information is taken into consideration, it is not surprising that these approaches aimed at showing volumetric information like holograms and light fields cannot offer sufficient resolution with the limited bandwidth of current hardware. Even so, there is no denying that these approaches may gradually mature in the long term with better hardware and eventually become satisfactory for users. 
In the short term, methods that can find an acceptable trade-off between depth accuracy and system complexity should be more practical for addressing the VAC in current commercial products, such as varifocal and multifocal displays. Varifocal displays employ an eye tracker to locate the gaze location and an adaptive focusing component to shift the display depth accordingly. In addition, real-time blur rendering is also preferred in varifocal approaches because they cannot naturally generate retina blur (Dunn et al., 2017). In comparison, multifocal displays (Liu and Hua, 2010; Hua, 2017; Zhan et al., 2018; Tan et al., 2018c; Liu et al., 2018) can create near-correct physical depth blur and offer a customizable balance between depth accuracy and hardware bandwidth by choosing the density of focal planes for different applications. A systematic summary and analysis of multifocal displays can be found in Zhan et al. (2020a). For both varifocal and multifocal displays, the need for high-quality focal changing components is still urgent, which should have fast response time, compact form factor, and low power consumption.

AR: Field of View

Different from the immersive experience provided by VR, one of the most pressing challenges in AR is expanding the FOV. Because designs and form factors vary even within the same type of AR system, we will discuss and compare the diagonal FOV instead of the horizontal/vertical FOV values. The diagonal FOV is related to the horizontal/vertical FOV as $\mathrm{FOV}_D = 2\arctan\sqrt{\tan^2(\mathrm{FOV}_H/2) + \tan^2(\mathrm{FOV}_V/2)}$. To address the inadequate FOV issue, we will overview potential solutions and analyze the systems case by case. In a lightguide-based near-eye display (LNED), the light from the optical engine propagates inside the lightguide by TIR and is then extracted toward the eye by an exit pupil expander (out-coupler), as illustrated in Figure 3A. Typically, the core optical elements in such a system are the image source and the light combiner, which consists of an input coupler and an output coupler. The optical engine can be a liquid-crystal-on-silicon (LCoS) panel, digital light processing (DLP), μOLED, μLED, or laser beam scanning (LBS) (Kress, 2020), whereas the combiners can be reflective mirrors or diffractive gratings (Kress, 2019; Lee et al., 2019b).
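The conversion between horizontal/vertical and diagonal FOV mentioned at the start of this section can be sketched as follows. The Python snippet assumes the standard rectilinear (flat image plane) relation, in which the tangents of the half-angles add in quadrature; the function name and the example values are hypothetical and used only for illustration.

```python
import math


def diagonal_fov_deg(h_fov_deg: float, v_fov_deg: float) -> float:
    """Diagonal FOV from horizontal and vertical FOV (rectilinear assumption).

    FOV_D = 2 * atan( sqrt( tan^2(FOV_H / 2) + tan^2(FOV_V / 2) ) )
    """
    tan_h = math.tan(math.radians(h_fov_deg) / 2.0)
    tan_v = math.tan(math.radians(v_fov_deg) / 2.0)
    return math.degrees(2.0 * math.atan(math.hypot(tan_h, tan_v)))


if __name__ == "__main__":
    # Illustrative horizontal/vertical pairs, not taken from any specific device
    for h, v in ((40.0, 30.0), (90.0, 90.0)):
        print(h, v, round(diagonal_fov_deg(h, v), 2))
```

Because the tangents, not the angles, combine in quadrature, the diagonal FOV is always smaller than the simple Pythagorean sum of the horizontal and vertical angles, and the gap widens as the FOV grows.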

Figure 3

Optical Structures of AR Systems with Extended FOV

(A) Schematic illustration of the LNED system. TIR happens at each reflection during the propagation, and the angle is marked in orange.

(B) Lightguide-based polarization multiplexing system for enlarging FOV. The system is based on two PVGs with opposite polarization responses (LCP and RCP) and different diffraction angles.

(C) Schematic diagrams of the Maxwellian view system, including the imaging principle and two distinct forms derived from it: partial reflector and lightguide structure.

When the light propagates inside the lightguide, the TIR angle is governed by the refractive index of the lightguide. Meanwhile, the index contrast of the coupler determines its angular and spectral responses, especially for gratings and holograms, which affects the color uniformity over the FOV and the eye-box (Kress, 2019). Due to the significant impact of the coupler on the system, numerous technologies have been applied to optimize the coupler performance (Xiang et al., 2018; Gao et al., 2017; Yin et al., 2019; Yin et al., 2020b). As a result, the angular response of an LNED system is limited not by the coupler but by the critical angle of TIR, which is in turn determined by the lightguide refractive index. The typical refractive index of a lightguide is n = 1.50 ± 0.03 (Sprengard et al., 2019), whereas a comparatively high refractive index is n = 1.7–1.8 (Masuno et al., 2019). In most LNEDs, such as HoloLens 2 and Magic Leap One, high-index glass has been implemented to realize a diagonal FOV of 50° (Kress, 2020). To widen the FOV further, high-index glass with n ≥ 1.9 has been commercialized recently. With such a high-index glass, the critical angle becomes smaller, so the range from the critical angle to 90° gets larger, meaning that a wider FOV can be supported in the lightguide. 
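The dependence of the TIR window on refractive index can be made concrete with Snell's law. The Python sketch below assumes a lightguide surrounded by air (n = 1) and ignores the practical upper limit on the propagation angle imposed by lightguide thickness and pupil replication; the function names are hypothetical.

```python
import math


def critical_angle_deg(n_guide: float, n_outside: float = 1.0) -> float:
    """Critical angle of TIR from Snell's law: theta_c = asin(n_outside / n_guide)."""
    return math.degrees(math.asin(n_outside / n_guide))


def tir_angular_range_deg(n_guide: float, n_outside: float = 1.0) -> float:
    """Idealized range of guided angles, from the critical angle up to 90 degrees."""
    return 90.0 - critical_angle_deg(n_guide, n_outside)


if __name__ == "__main__":
    # Index values quoted in the text: typical, high-index, and n >= 1.9 glass
    for n in (1.5, 1.7, 1.8, 1.9):
        print(n, round(critical_angle_deg(n), 2), round(tir_angular_range_deg(n), 2))
```

Raising the index from 1.5 to 1.9 lowers the critical angle from about 41.8° to about 31.8°, widening the idealized in-guide angular range by roughly 10°. The FOV seen in air also depends on the coupler, so this figure is only a bound on the guided angles, not the FOV itself.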
In addition to improving the intrinsic characteristics of the components, such as increasing the refractive index of the glass or widening the angular bandwidth of the coupler, the FOV can also be extended by expanding the system's degrees of freedom. By multiplexing coupler functions, such as spatial multiplexing (Vallius and Tervo, 2017) and polarization multiplexing (Shi et al., 2018), we can build a more sophisticated system with a wide FOV. The multiplexing methods used for broadening the FOV essentially stitch images based on different characteristics of light, thereby realizing a more informative and realistic experience. However, it is worth mentioning that multiplexing is not limited to benefitting the FOV; it also plays an essential role in overcoming the VAC issue (Zhan et al., 2019c; He et al., 2020) and presenting full-color images (Jang et al., 2017) in AR systems. In a near-eye display, multiplexing based on the properties of light can be categorized into spatial, time, polarization, wavelength, and angular multiplexing. Sometimes, more than one method is used in a system. To increase the FOV by spatially combining two images, Microsoft patented a combiner structure with two spatially separated intermediate couplers (Vallius and Tervo, 2017). Shi et al. (2018) then proposed polarization multiplexing based on meta-gratings. Similar to polarization division multiplexing in optical fiber communications, where two channels with orthogonal polarizations are used to double the information capacity, the polarization multiplexing method increases the FOV by encoding the left and right FOVs into two orthogonal polarization channels, the transverse electric (TE) channel and the transverse magnetic (TM) channel, respectively. Recently, Yoo et al. (2020) proposed an extended-FOV LNED system based on polarization multiplexing using LC-based gratings. 
In the holographic volume grating (HVG)-based LNEDs, several multiplexing techniques have been reported. Han et al. (2015) and Yu et al. (2017) attempted to apply the spatially multiplexing in out-coupler HVG to obtain wide FOV. Lately, LC-based polarization volume gratings (PVGs), also known as Bragg polarization gratings, with high diffraction efficiency and large angular bandwidth have been reported (Lee et al., 2017a; Yin et al., 2020b). Due to these special optical features, it is feasible to build a spatially multiplexed AR system with a large FOV using PVGs. As depicted in Figure 3B, the image information is coupled into two lightguides through two input couplers that are spatially separated. Then the light propagates into the output area through TIR, and the image information is extracted by two output couplers with different periodicity and form a larger FOV beyond the limitation of lightguide TIR. As the asymmetric input and output coupler here may induce significant chromatic aberrations and image distortion, it is preferable to employ narrow-band display engine and anamorphic image pre-processing. The Maxwellian view is an observation method, in which the lens system forms an image of the light source in the plane of the observer's pupil, instead of looking at the source directly. Therefore, the effect of the eye's optical aberrations is minimized, and the quantity of light independent of pupil size is increased (Westheimer, 1966; Sugawara et al., 2016). When applying this method in near-eye displays (NEDs), the effective eye pupil can be regarded as a tiny aperture and the focal depth of the image will be dramatically increased. Therefore the system offers focus-free feature, i.e., no matter where the eye focuses, the image is always clear. However, this method has its own limitations, especially the severely reduced eye-box. 
To address this issue, Kim and Park (2018) combined a Maxwellian view LNED with holographic optical element (HOE) multiplexing to obtain an enlarged eye-box or a steering eye-box. Figure 3C illustrates a typical schematic diagram of the Maxwellian view system. Based on geometric optics, the Maxwellian view system can evolve into different forms, such as partially reflective elements and LNEDs. From Figure 3C, the FOV of this system is directly related to the numerical aperture (NA) of the lens system. With rapid technology development and urgent needs from industry, numerous novel flat lenses with a wide acceptance angle and large aperture in both on-axis and off-axis types have emerged (Khorasaninejad et al., 2016; Yin et al., 2020a). Based on the HOE with a large NA, NVIDIA demonstrated an 85° × 78° monocular FOV Maxwellian view system (Kim et al., 2019). Further efforts have been investigated to enlarge the FOV. Xiong et al., (2020) demonstrated a large FOV AR system with 100° diagonal FOV by hybridizing the Maxwellian view and the lightguide-based exit pupil expander. By increasing the NA and compressing the lens volume, both FOV and form factor of the Maxwellian-view based NED system can be improved significantly.
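The link between the FOV of a Maxwellian view system and the numerical aperture of its lens can be illustrated with an idealized estimate. The Python sketch below assumes an on-axis lens in air whose converging cone is focused at the eye pupil, so that the full FOV equals twice the cone half-angle, FOV = 2·arcsin(NA); off-axis geometries and eye-box expansion are ignored, and the function names are hypothetical.

```python
import math


def maxwellian_fov_deg(numerical_aperture: float) -> float:
    """Idealized full FOV in air for a cone focused at the pupil: 2 * asin(NA)."""
    return math.degrees(2.0 * math.asin(numerical_aperture))


def required_na(fov_deg: float) -> float:
    """Numerical aperture needed for a target full FOV under the same assumption."""
    return math.sin(math.radians(fov_deg) / 2.0)


if __name__ == "__main__":
    # NA needed for the 100 degree diagonal FOV mentioned in the text
    print(round(required_na(100.0), 3))
    # FOV supported by an illustrative NA of 0.5
    print(round(maxwellian_fov_deg(0.5), 1))
```

Under this simplification, a 100° FOV calls for an NA of roughly 0.77, which gives a sense of why flat lenses with large NA are attractive for this architecture.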

AR: Brightness and Efficiency

For optical see-through AR displays, ACR is a critical parameter, which puts a strict requirement on display luminance (Lee et al., 2019a, 2019b). As a general guideline, for indoor applications, the output luminance of the AR display should be at least 500 nits. By contrast, for outdoor applications, the required luminance would exceed 10,000 nits. To deliver such a high luminance, both the microdisplay and efficient relay/combiner optics are pivotal. A roadmap of potential display engines is plotted in Figure 4. To provide a more general guideline on how to choose display engines, a qualitative comparison among five candidates is summarized in Table 1. Field-sequential LCoS is a reflective display based on polarization modulation of the backlight (Huang et al., 2018). Due to its high brightness (10⁴ to 10⁵ nits) and commercial availability, it has been used in Magic Leap One (Klug et al., 2016) and HoloLens (Kress and Cummings, 2017). A proper polarization conversion system (PCS) can boost the efficiency and brightness of an LCoS, as only light with a certain linear polarization can be reflected by the polarization beam splitter (PBS) and modulated by the LCoS. In traditional, large-sized LCoS projectors, a PCS consisting of a fly-eye lens, a PBS array, and a patterned half-wave plate is integrated. However, as the form factor shrinks to microdisplay sizes, the fabrication difficulty and bulkiness of such a PCS become limiting. Although some researchers have proposed improved PCSs based on thin-film polarization gratings (Kim et al., 2012; Du et al., 2015), a PCS that combines a small form factor, large angular bandwidth, and high efficiency is still lacking. Another fundamental issue of LCoS is its limited dynamic range, as the relatively poor dark state degrades the see-through experience, especially for indoor use. 
A two-dimensional (2D) illumination or backlight with independently addressable patches offers a promising solution, like the mini-LED array for LCD panels (Tan et al., 2018a). Similar to LCoS, DLP panels are field-sequential micromirror displays with high brightness (Thompson et al., 2015), as employed by DigiLens. Compared with LCoS, the amplitude modulation of DLP is polarization independent, and the dynamic range can be higher. For both reflective microdisplay panels (LCoS and DLP), although LEDs are typically applied as the illumination source, other light sources, such as lasers, are also available. Lasers are inherently collimated and linearly polarized and are very suitable for LCoS. However, additional de-speckle optics are needed to achieve good image quality.

Figure 4

Schematic Plots of Major Microdisplays and Combiners

The microdisplays cover liquid-crystal-on-silicon (LCoS), digital light processing (DLP), laser beam scanner (LBS), micro organic light-emitting diode (μOLED), and micro light-emitting diode (μLED), whereas the combiners include freeform half mirror, birdbath, freeform prism, off-axis holographic optical element (HOE), cascaded mirrors, and grating couplers. Three kinds of grating couplers are also highlighted: surface relief grating (SRG), volume Bragg grating (VBG), and polarization volume grating (PVG).

Table 1

Comparison among AR Display Light Engines

Display	Maturity	Brightness (Nits)	Light Efficiency	Form Factor	Optical System Complexity	Contrast Ratio
LCoS	High	10⁴–10⁵	Low	Large	Medium	~10³:1
DLP	High	10⁴–10⁵	Medium	Medium	Medium	~10³:1
μOLED	Medium	10³–10⁴	High	Small	Low	~10⁴:1
μLED	Low	10⁵–10⁶	High	Small	Low	~10⁵:1
LBS	Medium	>10⁴	High	Small	High	~10⁵:1

In comparison with projection displays, emissive displays are less mature but have the potential to reduce the form factor. They exhibit an intrinsically high dynamic range because of their true black state. Micro organic light-emitting diode (μOLED) is a promising candidate for emissive microdisplays. The typical architecture is patterned color filters on top of white OLEDs. To date, full-color μOLED displays with 3,000 to 5,000 nits in luminance and ~3,000 ppi (pixels per inch) in resolution have been achieved (Haas, 2018; Motoyama et al., 2019). However, for AR displays with a large eye-box, such brightness is still inadequate (Lee et al., 2019b). Future development should focus on boosting their brightness, device lifetime, and current efficiency. On the other hand, micro light-emitting diode (μLED) is emerging and has the potential to become the next-generation display technology. The most recent development of a 10-μm pitch (~1,300 ppi) full-color LED microdisplay has achieved 10⁵ to 10⁶ nits in luminance (Quesnel et al., 2020). Despite this impressive progress, μLED still faces two major challenges. The first is the enhanced non-radiative recombination that occurs as the area ratio of the sidewall increases (Gou et al., 2019). This means that, for small μLED chips down to <5 μm, the external quantum efficiency would drop dramatically. 
The second issue is how to realize full color and high resolution simultaneously, as mass transfer and assembly of such tiny RGB LEDs are challenging (Lin and Jiang, 2020; Wong et al., 2020). A parallel approach is to use blue μLEDs to pump green and red quantum dots for color conversion (Huang et al., 2020). However, obtaining a uniform, long-lifetime color conversion layer without color cross talk for such small pixel sizes is by no means easy. Therefore, further effort is needed to develop mass transfer or color-conversion-layer patterning techniques for μLEDs with ultra-small pixel pitch (<5 μm).

Scanning display systems, which use laser illumination, normally offer high efficiency, small form factor, high dynamic range, and high brightness. Typically, a 2D micro-electromechanical system (MEMS) mirror or two 1D MEMS mirrors are applied to scan the laser beam in orthogonal directions to form 2D images. Unlike panel-based displays, scanning displays do not have an object plane. This means that whereas panel-based displays form images on the panel, scanning displays can form images directly on the retina. One prominent example is the LBS system in North Focals (Alexander et al., 2018). As most scanning display engines have an intrinsically small exit pupil, they need proper exit pupil expansion or steering, and thus the optical design is more sophisticated. Compared with reflective and emissive displays, image uniformity is another issue for scanning displays that requires improvement.

The image generated by the optical engine passes through magnifying optics and/or combiners and is finally projected into the human eye. The combiners can be classified into two types: reflective and diffractive.
The reflective type includes freeform half mirrors, freeform prisms, birdbath combiners, and cascaded mirrors (Wei et al., 2018; Cheng et al., 2011), whereas the diffractive type covers all kinds of grating-coupler-based lightguide combiners and off-axis HOE (not used in lightguide) combiners (Li et al., 2016). Their schematic plots are shown in Figure 4, and a comparison among them is given in Table 2.

Table 2

Comparisons among AR Optical Combiners

Type	Combiner	Efficiencyᵃ	Form Factor	δn	Bandwidth	FOV diagonalᵇ
Reflective	Freeform mirror	<50%	Large	–	Large	90°
	Freeform prism	<50%	Large	–	Large	120°
	Birdbath	<25%	Large	–	Large	52°
	Cascaded mirrors	<20%	Medium	–	Large	40°
Diffractive	Off-axis HOE	<20%	Small	Small	Small	15°
	Traditional VBG	<10%	Medium	Small	Small	40°
	HPDLC	<10%	Medium	Medium	Medium	50°
	SRG	<10%	Medium	High	Large	52°
	PVG	<10%	Medium	Medium	Medium	50°

ᵃ These typical values depend on lightguide design.

ᵇ These typical values come from products and prototypes.

The freeform half mirrors, freeform prisms, and birdbath combiners usually manifest decent imaging quality and high optical efficiency, but mainly suffer from a large form factor. To reduce the form factor, cascaded mirrors embedded in a lightguide have been invented. However, for lightguide combiners, additional attention should be paid to see-through transmittance, see-through uniformity, stray light control, and image brightness uniformity. As a result, the image quality and optical efficiency are usually compromised.

Diffractive combiners have also been introduced to reduce the form factor of traditional reflective combiners. Unlike their reflective counterparts, diffractive elements are chromatic, and this must be considered in the optical design. Off-axis HOEs combined with an LBS system can provide a true glasses-like form factor, but only a limited eye-box. To further enlarge the eye-box, grating-coupled lightguide combiners are employed, in which the output coupler design is more complicated because it also serves as the exit pupil expander. Currently, two types of gratings are employed in lightguide AR: holographic volume Bragg gratings (VBGs) and surface relief gratings (SRGs). Because of their different refractive index contrasts, they exhibit different spectral and angular responses. The traditional VBGs with a small refractive index contrast (δn ≤ 0.05) manifest narrow spectral (~10 nm) and angular (~5° in air) bandwidths, whereas SRGs with a large δn (≥0.5) show much broader spectral and angular bands (Lee et al., 2019b). Interestingly, DigiLens has developed a large δn VBG (close to LC birefringence) based on holographic polymer-dispersed LC (HPDLC), which is switchable and performs much better than traditional VBGs (Brown et al., 2018).
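The chromatic behavior of grating-coupled lightguides can be illustrated with the standard grating equation for light entering from air, n·sin(θ_d) = sin(θ_i) + m·λ/Λ, together with the total internal reflection (TIR) condition at the lightguide surface. The following is a minimal sketch, not a design tool: the function names, the grating period of 400 nm, the lightguide index of 1.7, and the three wavelengths are all hypothetical values chosen for illustration.

```python
import math
from typing import Optional


def diffracted_angle_deg(
    wavelength_nm: float,
    period_nm: float,
    n_guide: float,
    incident_deg: float = 0.0,
    order: int = 1,
) -> Optional[float]:
    """In-guide angle of the given diffraction order, or None if the order is evanescent.

    Grating equation for light incident from air:
        n_guide * sin(theta_d) = sin(theta_i) + order * wavelength / period
    """
    s = (math.sin(math.radians(incident_deg)) + order * wavelength_nm / period_nm) / n_guide
    if abs(s) > 1.0:
        return None  # no propagating diffracted order inside the guide
    return math.degrees(math.asin(s))


def is_guided(angle_deg: Optional[float], n_guide: float) -> bool:
    """True if the in-guide angle exceeds the critical angle asin(1/n) against air."""
    if angle_deg is None:
        return False
    return abs(angle_deg) > math.degrees(math.asin(1.0 / n_guide))


if __name__ == "__main__":
    period_nm, n_guide = 400.0, 1.7  # hypothetical grating period and lightguide index
    for name, wl in (("blue", 450.0), ("green", 532.0), ("red", 635.0)):
        angle = diffracted_angle_deg(wl, period_nm, n_guide)
        print(name, angle, "guided:", is_guided(angle, n_guide))
```

For a fixed period, longer wavelengths are diffracted to steeper in-guide angles, so the three primary colors travel along different paths and a given grating supports each color only over a limited range of incident angles.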
Besides these two gratings, polarization volume gratings (PVGs) based on chiral LCs are also emerging (Yin et al., 2019). The refractive index contrast is essentially the birefringence of the LC material and thus can be tuned within a broad range (from <0.1 to >0.4). As those grating couplers are usually optimized for a particular polarization (e.g., a linear polarization for VBGs and SRGs and a circular polarization for PVGs), a PCS modulating the polarization of light from the display engine and polarization management within the lightguide are important for improving the system efficiency. Another unavoidable aspect of improving light efficiency is the 2D exit pupil expander design. Typically, a turn-around gradient-efficiency grating (also termed fold grating) is used to first expand the eye-box in one direction within the lightguide. Then the output grating extends the eye-box in the other direction. In particular, because of the inherent chromatic dispersion of diffraction, color uniformity control is as challenging as brightness uniformity control in most waveguide designs using diffractive combiners. Because there is a trade-off between the optical efficiency of the gratings (both the turn-around grating and the output grating) and the color/brightness uniformity within the expanded eye-box, finding an appropriate balance between them is essential from the system perspective.
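The need for a gradient in grating efficiency can be seen from a simple energy-bookkeeping model. The sketch below assumes that guided light meets the grating N times, that a fraction η_k leaves the lightguide at the k-th interaction, and that the remainder continues without loss. It ignores absorption, polarization, and the angular and spectral dependence of real gratings. The function names and the choice N = 5 are hypothetical.

```python
from typing import List


def outcoupled_powers(efficiencies: List[float], input_power: float = 1.0) -> List[float]:
    """Power extracted at each successive grating interaction.

    At each interaction a fraction eta of the remaining guided power leaves
    the lightguide; the rest continues to the next interaction.
    """
    remaining = input_power
    out = []
    for eta in efficiencies:
        out.append(remaining * eta)
        remaining *= 1.0 - eta
    return out


def uniform_gradient(n_interactions: int) -> List[float]:
    """Efficiencies 1/N, 1/(N-1), ..., 1 that give equal output at every interaction."""
    return [1.0 / (n_interactions - k) for k in range(n_interactions)]


if __name__ == "__main__":
    n = 5  # hypothetical number of interactions across the eye-box
    for label, etas in (("constant 20%", [0.2] * n), ("gradient", uniform_gradient(n))):
        powers = outcoupled_powers(etas)
        print(label, [round(p, 3) for p in powers], "total:", round(sum(powers), 3))
```

In this idealized model, a constant efficiency yields output that decays geometrically along the eye-box, whereas an efficiency that increases along the propagation direction equalizes it. A real design cannot freely reach the high efficiencies required at the last interactions, which is one way the efficiency and uniformity trade-off arises.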

Conclusion

We reviewed the major challenges and discussed potential opportunities for display optics in the fast-developing field of AR and VR systems. The requirements from the human visual system are analyzed in detail to offer quantitative standards for future near-eye display devices. These requirements also bring out the major issues that need to be emphasized and addressed in current devices, regarding panel resolution, form factor, imaging performance, VAC, FOV, and brightness. By learning from recent advances in optics and development trends of AR and VR devices, we shared a few thoughts about how to meet these challenges in the near future and in the long run.

38 in total

1. Focus cues affect perceived depth.

Authors: Simon J Watt; Kurt Akeley; Marc O Ernst; Martin S Banks
Journal: J Vis Date: 2005-12-15 Impact factor: 2.240

2. Dual-sensor foveated imaging system.

Authors: Hong Hua; Sheng Liu
Journal: Appl Opt Date: 2008-01-20 Impact factor: 1.980

3. Vergence-accommodation conflicts hinder visual performance and cause visual fatigue.

Authors: David M Hoffman; Ahna R Girshick; Kurt Akeley; Martin S Banks
Journal: J Vis Date: 2008-03-28 Impact factor: 2.240

4. Design of a wide-angle, lightweight head-mounted display using free-form optics tiling.

Authors: Dewen Cheng; Yongtian Wang; Hong Hua; Jose Sasian
Journal: Opt Lett Date: 2011-06-01 Impact factor: 3.776

5. The relationship between visual resolution and cone spacing in the human fovea.

Authors: Ethan A Rossi; Austin Roorda
Journal: Nat Neurosci Date: 2009-12-20 Impact factor: 24.884

6. Portable waveguide display system with a large field of view by integrating freeform elements and volume holograms.

Authors: Jian Han; Juan Liu; Xincheng Yao; Yongtian Wang
Journal: Opt Express Date: 2015-02-09 Impact factor: 3.894

7. Foveated imaging for near-eye displays.

Authors: Guanjun Tan; Yun-Han Lee; Tao Zhan; Jilin Yang; Sheng Liu; Dongfeng Zhao; Shin-Tson Wu
Journal: Opt Express Date: 2018-09-17 Impact factor: 3.894

8. Holographic display for see-through augmented reality using mirror-lens holographic optical element.

Authors: Gang Li; Dukho Lee; Youngmo Jeong; Jaebum Cho; Byoungho Lee
Journal: Opt Lett Date: 2016-06-01 Impact factor: 3.776

9. Design and fabrication of a compact off-axis see-through head-mounted display using a freeform surface.

Authors: Lidong Wei; Yacan Li; Juanjuan Jing; Lei Feng; Jinsong Zhou
Journal: Opt Express Date: 2018-04-02 Impact factor: 3.894

10. Human photoreceptor topography.

Authors: C A Curcio; K R Sloan; R E Kalina; A E Hendrickson
Journal: J Comp Neurol Date: 1990-02-22 Impact factor: 3.215

8 in total

1. First Evaluation of a Retinal Imaging Laser Eyewear System Based Low Vision Aid.

Authors: Mareile Stöhr; Dirk Dekowski; Nikolaos Bechrakis; Joachim Esser; Anja Eckstein; Michael Oeverhaus
Journal: Clin Ophthalmol Date: 2020-11-30

Review 2. Advanced liquid crystal devices for augmented reality and virtual reality displays: principles and applications.

Authors: Kun Yin; En-Lin Hsiang; Junyu Zou; Yannanqi Li; Zhiyong Yang; Qian Yang; Po-Cheng Lai; Chih-Lung Lin; Shin-Tson Wu
Journal: Light Sci Appl Date: 2022-05-30 Impact factor: 20.257