
Stereoscopic 3D geometric distortions analyzed from the viewer's point of view.

Zhongpai Gao¹, Guangtao Zhai¹, Xiaokang Yang¹.

Abstract

Stereoscopic 3D (S3D) geometric distortions can be introduced by mismatches among image capture, display, and viewing configurations. In previous work on S3D geometric models, geometric distortions have been analyzed from a third-person perspective based on the binocular depth cue (i.e., binocular disparity). A third-person perspective differs from what the viewer sees, since monocular depth cues (e.g., linear perspective, occlusion, and shadows) change with the viewpoint. Moreover, depth perception in a 3D space involves both monocular and binocular depth cues, so geometric distortions predicted solely from the binocular depth cue cannot describe what a viewer really perceives. In this paper, we combine geometric models and retinal disparity models to analyze geometric distortions from the viewer's perspective, where both monocular and binocular depth cues are considered. Results show that binocular and monocular depth cues conflict in a geometrically distorted S3D space. Moreover, user-initiated head translations away from the optimal viewing position in conventional S3D displays can also introduce geometric distortions, which are inconsistent with natural 3D viewing conditions. The inconsistency of depth cues in a dynamic scene may be a source of visually induced motion sickness.


Year: 2020 PMID: 33057363 PMCID: PMC7561172 DOI: 10.1371/journal.pone.0240661

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

1 Introduction

The goal of display systems is to convey the real world or virtually constructed 3D worlds veridically to viewers. Compared to 2D displays, stereoscopic 3D (S3D) displays can additionally provide the binocular disparity depth cue. Various S3D display technologies have been used for virtual/augmented reality, scientific visualization, medical imaging, 3D movies, and gaming. However, orthoscopic presentation of the 3D scene remains a challenging task. Parametric mismatches among the stereoscopic capture, display, and viewing processes cause geometric distortions for the viewer [1-4]. Motion in a geometrically distorted S3D space is suspected to be a cause of visually induced motion sickness (VIMS) [3]. In general, VIMS is considered a physiological response induced by inter-sensory motion signal conflicts, e.g., conflicts between the motion signals of the visual and vestibular systems [5, 6]. However, the inter-sensory conflict theory cannot explain why watching S3D videos causes significantly higher levels of discomfort [7] and motion sickness [8] than watching 2D videos. Another explanation of VIMS is sensory rearrangement [9]: ‘Whenever the central nervous system receives sensory information concerning the orientation and movement of the body which is unexpected or unfamiliar in the context of motor intentions and previous sensory-motor experience, and this condition occurs for long enough, motion sickness typically results’ [10]. Hwang and Peli [3] and Gao et al. [4] pointed out that depth-cue conflicts in a geometrically distorted S3D space with motion may cause VIMS. This can be explained by the sensory rearrangement theory, since such depth-cue conflicts are unexpected given our experience of motion in the real 3D world. Depth perception in a 3D space involves monocular and binocular depth cues. Monocular depth cues include static monocular depth cues, also called pictorial depth cues [11], and motion parallax [12].
Pictorial depth cues include linear perspective, interposition (occlusion), object sizes, shades and shadows, texture gradients, accommodation and blur, aerial perspective, etc. Motion parallax is the relative movement of images across the retina resulting from the movement of the observer or the translation of objects across the viewer’s field of view. Binocular depth cues arise from the two spatially separated eyes and include convergence and binocular disparity. In the real world, different depth cues are consistent, and the human visual system interprets depth by integrating them [13-15]. However, depth-cue conflicts occur in many situations. The most typical example is the pseudoscope, which was originally invented by Wheatstone [16]. This optical device reverses the relationship between physical depth and binocular disparity by presenting the left-eye view to the right eye and the right-eye view to the left eye. The device therefore provides a scene with binocular depth reversed, while monocular depth cues are veridically preserved. Consequently, monocular and binocular depth cues conflict, which may cause sickness. Depth-cue conflicts can also occur in S3D viewing with geometric distortions. Several geometric models have been built to predict geometric distortions in S3D. Woods et al. [2] proposed a transfer function from the real (or virtual) world to the S3D world. Using this model, they discussed various geometric distortions, such as depth plane curvature (i.e., objects are bent away from the viewer in the periphery), depth non-linearity (i.e., depth differences in the reconstructed world do not match the corresponding depth differences in the original world), and shearing distortion (i.e., objects appear sheared toward the viewer’s head position). Masaoka et al. [17] and Yamanoue et al. [18] built geometric models to predict two abnormal perceptions: the puppet-theater effect [19] and the cardboard effect [20].
Gao et al. [4] provided a geometric model and illustrated various S3D distortions caused by four parameter-pair mismatches during the image capture, display, and viewing processes: 1) camera separation vs. eye separation, 2) camera field of view (FOV) vs. screen FOV, 3) camera convergence distance vs. screen distance, and 4) head position vs. display position. In this model, the impact of each parameter pair on S3D geometric distortions was analyzed independently. The model also provided methods to correct the geometric distortions, either by matching each parameter pair individually or by combining distortion patterns that compensate for each other so that the overall distortion is minimized. The geometric models [2, 4, 17, 18] predict geometric distortions by the ray-intersection method [21, 22], which calculates the intersection of the two projection lines from the left and the right eye through the corresponding left and right onscreen points. Thus, only the binocular disparity depth cue is considered in these geometric models. However, the human visual system interprets depth by integrating both monocular and binocular depth cues [13-15]. Perceived depth in complex S3D scenes often differs from geometric predictions based on binocular disparity alone [23]. Geometric models that demonstrate distortions from a third-person perspective, without considering monocular depth cues from the viewer’s perspective, cannot predict the viewer’s S3D perception accurately [24-27]. Hwang and Peli [3] discussed the depth perception of camera or object movements in S3D worlds from the viewer’s perspective by defining angular disparity from the visual eccentricities of corresponding retinal positions. This angular disparity indicates the depth of an object relative to the fixation point.
Two types of geometric distortions were analyzed: depth plane curvature (caused by the mismatch between the capture and display image planes) and shearing distortion (caused by the viewer’s head translations). They proposed that motion in a distorted virtual 3D space may cause vision-to-vision intra-sensory conflicts that result in VIMS. Hwang and Peli [28] further pointed out, with the help of the geometric models in [4], that S3D optic flow distortion may be a source of VIMS. In this paper, the S3D distortions caused by the four parameter-pair mismatches, namely 1) camera separation vs. eye separation, 2) camera FOV vs. screen FOV, 3) camera convergence distance vs. screen distance, and 4) head position vs. display position, are analyzed from the viewer’s perspective by combining the geometric model proposed by Gao et al. [4] with the retinal disparity model used in Hwang and Peli [3]. The retinal disparity model disentangles the binocular depth cue from the monocular depth cues, so that the monocular and binocular components of the geometric distortions can be analyzed separately.

2 S3D geometric and retinal disparity models

S3D geometric models predict geometric distortions by considering binocular disparity only. The retinal disparity model reconstructs the presented S3D scene from the corresponding retinal projections in the viewer’s eyes. Therefore, combining the geometric model and the retinal disparity model allows both linear perspective (a monocular depth cue) and disparity (the binocular depth cue) to be analyzed simultaneously.

2.1 S3D geometric model

In S3D, when the original world is captured by the parallel-axis, shifted-sensor technique (or rendered with asymmetric converging frustums) and then displayed on a flat real or virtual screen, the mapping from the original world to the reconstructed world can be expressed as a transfer function, referred to below as (1). In this transfer function, $\mathbf{O} = [X_o, Y_o, Z_o]^\top$ is a point in the original world, $\mathbf{P} = [X_p, Y_p, Z_p]^\top$ is the point corresponding to $\mathbf{O}$ in the reconstructed world in S3D, $\mathbf{T} = [T_x, T_y, T_z]^\top$ is the offset of the head position from the origin, $k_s$ is the ratio of eye separation to camera separation, $k_d$ is the ratio of screen distance to camera convergence distance, and $k_w$ is the ratio of screen FOV to camera FOV in linear scale. The transfer function is also controlled by the camera convergence distance, $d_c$. See [4] for the derivation of the geometric model. When the paired parameters are matched, i.e., $k_s = 1$, $k_d = 1$, $k_w = 1$, and $\mathbf{T} = [0, 0, 0]^\top$, the transfer function (1) simplifies to

$$\mathbf{P} = \mathbf{O}. \tag{2}$$

This indicates an orthoscopic display condition without geometric distortions.
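The transfer function itself is derived in [4]. As a rough illustration only, the sketch below re-derives a mapping of this kind by ray intersection, assuming cameras and eyes centred on the x-axis, a screen at distance `k_d * d_c`, and a FOV ratio defined as a ratio of half-angle tangents. The function name `transfer` and these conventions are assumptions made here, so the expression may differ in form from the one in [4].

```python
import math

def transfer(o, k_s=1.0, k_d=1.0, k_w=1.0, t=(0.0, 0.0, 0.0), d_c=3.0):
    """Map an original-world point o = (X_o, Y_o, Z_o) to a reconstructed point.

    Hypothetical re-derivation by ray intersection, not copied from [4]:
    k_s = eye/camera separation, k_d = screen/convergence distance,
    k_w = screen/camera FOV (tangent ratio), t = head offset, d_c in metres.
    """
    x_o, y_o, z_o = o
    t_x, t_y, t_z = t
    d_s = k_d * d_c                      # screen distance from the origin
    m = k_d * k_w * d_c / z_o            # scene-to-screen magnification at depth z_o
    # Screen disparity expressed in units of the camera separation.
    disp = k_d * k_w * (1.0 - d_c / z_o)
    # Position of the ray intersection along the eye-to-screen direction.
    scale = k_s / (k_s - disp)
    x_p = t_x + (m * x_o - t_x) * scale
    y_p = t_y + (m * y_o - t_y) * scale
    z_p = t_z + (d_s - t_z) * scale
    return (x_p, y_p, z_p)

if __name__ == "__main__":
    # Matched parameters: the point should map to itself (orthoscopic case).
    print(transfer((1.0, 0.5, 4.0)))
    # Example mismatch quoted later in the text: k_s = 0.8, k_d = 0.67, k_w = 0.66.
    print(transfer((1.0, 0.0, 4.0), k_s=0.8, k_d=0.67, k_w=0.66))
```

With all ratios equal to 1 and no head offset, the sketch returns the input point, which is the orthoscopic condition described above. A point on the convergence plane is always mapped onto the screen plane.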

2.2 Retinal disparity model

In the retinal disparity model [3], the object that a person fixates on is projected onto the fovea of each eye. The visual eccentricity ($\mathcal{E}$) of a point is defined as its angular distance relative to the fovea. Therefore, the eccentricity of the fixated point is zero ($\mathcal{E}_F = 0$); the visual eccentricities of a non-fixated point projected onto the retinas of the left and right eyes are $\mathcal{E}_L$ and $\mathcal{E}_R$, respectively, as shown in Fig 1. Note that, since the geometric distortions in the x and y dimensions are always the same (see (1)), we only discuss the x and z dimensions (horizontal axis and depth) in the following analyses, and we only consider horizontal visual eccentricity.

Fig 1

Viewer-perspective retinal projection of objects showing different horizontal visual eccentricities, based on relative distances from the fixated point.

The visual eccentricities of a point $\mathbf{A} = [A_x, A_z]^\top$ (e.g., the far object in Fig 1) in the left and the right eye can be expressed as

$$\mathcal{E}_{A_L} = \arctan\frac{A_x - E_{Lx}}{A_z - E_{Lz}} - \arctan\frac{F_x - E_{Lx}}{F_z - E_{Lz}} \tag{3}$$

$$\mathcal{E}_{A_R} = \arctan\frac{A_x - E_{Rx}}{A_z - E_{Rz}} - \arctan\frac{F_x - E_{Rx}}{F_z - E_{Rz}} \tag{4}$$

respectively, where $\mathbf{E}_L = [E_{Lx}, E_{Lz}]^\top$ and $\mathbf{E}_R = [E_{Rx}, E_{Rz}]^\top$ are the positions of the left and the right eye, and $\mathbf{F} = [F_x, F_z]^\top$ is the position of the fixation. Note that the visual eccentricity of a point located on the left side of the viewer’s visual field is assigned a negative value and that of a point in the right visual field is assigned a positive value. The depth of a point relative to the fixation target can be estimated from the difference between the visual eccentricities projected to the left and the right eye. This retinal angular disparity ($\mathcal{D}_A$) is defined as

$$\mathcal{D}_A = \mathcal{E}_{A_L} - \mathcal{E}_{A_R}. \tag{5}$$

Thus, the angular disparity ($\mathcal{D}_A$) represents the binocular depth cue. The orientation of a point relative to the viewer’s perspective after binocular fusion can be defined as the visual eccentricity from the cyclopean eye, $\mathbf{E}_C = [E_{Cx}, E_{Cz}]^\top$, an imaginary eye positioned midway between the left and the right eye [29, 30]. Thus, the visual eccentricity of a point to the viewer can be expressed as

$$\mathcal{E}_A = \arctan\frac{A_x - E_{Cx}}{A_z - E_{Cz}} - \arctan\frac{F_x - E_{Cx}}{F_z - E_{Cz}}. \tag{6}$$

As illustrated in Fig 2, the far and the near objects project to the same retinal locations of the cyclopean eye, including all the monocular depth cues (e.g., perspective, occlusion, shade and shadow, and texture gradient). Based solely on the visual eccentricity of the cyclopean eye, the far and the near objects are indistinguishable. Thus, the visual eccentricity ($\mathcal{E}_A$) represents all the monocular depth cues.
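The eccentricity and disparity definitions above can be sketched numerically. The following is a minimal illustration that assumes a 63mm eye separation, eyes placed symmetrically about the cyclopean eye on a line parallel to the x-axis, and disparity taken as left minus right eccentricity, so that points nearer than fixation get positive values. The function names are hypothetical.

```python
import math

def eccentricity(point, eye, fixation):
    """Horizontal visual eccentricity (radians) of point = (x, z) seen from
    eye = (x, z), measured relative to the direction of the fixation point."""
    to_point = math.atan2(point[0] - eye[0], point[1] - eye[1])
    to_fixation = math.atan2(fixation[0] - eye[0], fixation[1] - eye[1])
    return to_point - to_fixation

def eccentricity_disparity(point, fixation, eye_sep=0.063, head=(0.0, 0.0)):
    """Return (cyclopean eccentricity, angular disparity) for one point.

    Assumed conventions: head is the cyclopean eye position, the two eyes are
    offset by +/- eye_sep / 2 along x, and disparity = left - right.
    """
    left_eye = (head[0] - eye_sep / 2.0, head[1])
    right_eye = (head[0] + eye_sep / 2.0, head[1])
    e_left = eccentricity(point, left_eye, fixation)
    e_right = eccentricity(point, right_eye, fixation)
    e_cyclopean = eccentricity(point, head, fixation)
    return e_cyclopean, e_left - e_right

if __name__ == "__main__":
    fixation = (0.0, 3.0)
    # Two points on the same line of sight from the cyclopean eye.
    for point in [(0.5, 2.0), (1.0, 4.0)]:
        e, d = eccentricity_disparity(point, fixation)
        print(point, math.degrees(e), math.degrees(d))
```

The two sample points lie on one line through the cyclopean eye, so they share the same eccentricity and differ only in disparity, which is the situation drawn in Fig 2.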

Fig 2

Visual eccentricity from the cyclopean eye represents monocular depth cues, including perspective, occlusion, shade and shadow, and texture gradient.

The far and the near objects project to the same retina locations of the cyclopean eye and are indistinguishable based on the monocular depth cues.


2.3 A sample 3D scene structure and illustration from the viewer’s perspective

Fig 3 shows the sample real-world scene used in Hwang and Peli [3] to illustrate geometric distortions from the viewer’s perspective. Objects 1–9 are arranged on an equally spaced rectilinear grid (3 × 3 in the xz-plane), 1m apart in both the x and z dimensions. The center object (O5) is the fixation object at $[0, 0, 3]^\top$. As an example, we consider the S3D geometric distortions when a user with a 50mm IPD watches 3D videos on a 50-inch (1.1m × 0.62m) TV at 2m distance without lateral offset, while the scene was captured with a camera separation of 63mm, a convergence distance of 3m, and a camera FOV of 45° (i.e., $k_s = 0.8$, $k_d = 0.67$, $k_w = 0.66$, $\mathbf{T} = [0, 0, 0]^\top$).
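The three ratios quoted for this example can be recomputed from the viewing and capture parameters. The sketch below assumes that the FOV ratio "in linear scale" is the ratio of the half-angle tangents of the screen and camera FOVs. This reading is an assumption, although it reproduces the quoted values. The function name `mismatch_ratios` is hypothetical.

```python
import math

def mismatch_ratios(eye_sep, cam_sep, screen_dist, conv_dist,
                    screen_width, cam_fov_deg):
    """Return (separation ratio, distance ratio, FOV ratio) for one setup.

    Lengths are in metres and the camera FOV is the full horizontal angle in
    degrees. The FOV ratio is taken as a ratio of half-angle tangents
    (an assumed reading of "linear scale").
    """
    k_s = eye_sep / cam_sep
    k_d = screen_dist / conv_dist
    screen_half_tan = (screen_width / 2.0) / screen_dist
    cam_half_tan = math.tan(math.radians(cam_fov_deg) / 2.0)
    k_w = screen_half_tan / cam_half_tan
    return k_s, k_d, k_w

if __name__ == "__main__":
    # 50mm IPD, 63mm camera separation, 1.1m-wide TV at 2m,
    # 3m convergence distance, 45 degree camera FOV.
    k_s, k_d, k_w = mismatch_ratios(0.050, 0.063, 2.0, 3.0, 1.1, 45.0)
    print(f"separation ratio {k_s:.3f}, distance ratio {k_d:.3f}, FOV ratio {k_w:.3f}")
```

The separation ratio evaluates to roughly 0.79, which the text quotes as 0.8.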

Fig 3

Sample original scene composed of 9 objects (3-blue square, 3-red circle, and 3-green triangle) for the geometric distortion analysis.

Objects 1 to 9 are arranged to be on an equally 1m spaced rectilinear grid (3 × 3 in xz-plane). Figure is adapted from Hwang and Peli [3] with permission.

Fig 4 shows the geometric distortions presented in eccentricity-disparity coordinates. The solid and dashed lines show the original world (Fig 4a) and reconstructed world (Fig 4b), respectively. In these coordinates, eccentricity represents the monocular depth cues (i.e., perspective, occlusion, shade and shadow, and texture gradient) available to the viewer. Identical eccentricities in the original world and reconstructed world indicate identical monocular perception. Disparity, on the other hand, represents the binocular depth cue, indicating object distance relative to the fixated distance. Thus, geometric distortions from the viewer’s perspective can be computed as the differences between the original world and reconstructed world in eccentricity-disparity coordinates. The eccentricity difference and disparity difference represent the geometric distortions perceived by monocular and binocular vision, respectively. The graphs are superimposed in Fig 4c to visualize the eccentricity and disparity differences.

Fig 4

Example of geometric distortions presented in eccentricity-disparity coordinates.

Objects 1–9 are arranged on an equally spaced rectilinear grid (3 × 3 in the xz-plane), as shown in Fig 3. (a) Solid lines represent the eccentricity-disparity in the original world and (b) dashed lines represent the eccentricity-disparity in the reconstructed world in the condition where the separation ratio is $k_s = 0.8$, the distance ratio is $k_d = 0.67$, the FOV ratio is $k_w = 0.66$, and the convergence distance is $d_c = 3$m. (c) Superimposed eccentricity-disparity graph of the original world and reconstructed world. In subsequent figures, the superimposed graphs are presented to aid the visualization of distortions.

To visually demonstrate the geometric distortions in monocular perception, the viewer’s perspective (cyclopean eye) of the sample scene in Fig 3 is presented in Fig 5a. Objects 1–9 are replaced with 40cm brown cubes 1–9 that are arranged on an equally 1m spaced rectilinear grid (3 × 3 in the xz-plane). The fixated point is at (0, 0, 3)[m]. Blue cubes in Fig 5b are the reconstructed (perceived) objects in the S3D world. Blue cubes 4–6 are on the screen at 3m distance. Any difference between the corresponding features of the brown (Fig 5a) and blue cubes (Fig 5b) represents a geometric distortion in monocular perception introduced by the mismatches between capture and display. In subsequent simulations, the captured cubes and reconstructed cubes are superimposed on a single coordinate system to illustrate the distortions between the original world and reconstructed world, as shown in Fig 5c.

Fig 5

The sample scene from the perspective of the cyclopean eye.

As in Fig 3, cubes 1–9 are arranged on an equally 1m spaced rectilinear grid (3 × 3 in the xz-plane). The edges of the cubes are 40cm. (a) Brown cubes are in the original world. (b) Blue cubes are the geometric distortion example (Fig 4b) in the reconstructed world corresponding to the brown cubes. The blue cubes 4–6 are on the screen at 3m distance. (c) Superimposed view of the original world and reconstructed world. In subsequent figures, the superimposed views are presented to aid the visualization of distortions.

3 S3D geometric distortion analysis

In the following sections, we will discuss the isolated effects of parameter mismatches, assuming that the other paired parameters are matched.

3.1 Mismatch of camera-eye separations

This analysis assumes that screen distance and camera convergence distance are the same ($k_d = 1$), screen FOV and camera FOV are the same ($k_w = 1$), the convergence distance is constant (e.g., $d_c = 3$m), the head is at the optimal position ($\mathbf{T} = [0, 0, 0]^\top$), and only camera separation and eye separation are mismatched. The transfer function in (1) can then be simplified to a form, referred to as (7), that depends only on the ratio of eye separation to camera separation, $k_s$. The visual eccentricities in the original world and reconstructed world are computed by substituting $X_p$ and $Z_p$ from (7) into the cyclopean eccentricity, and the two turn out to be equal. Thus, all visual elements share exactly the same eccentricities under separation mismatches. Fig 6 shows simulations of cubes from the cyclopean eye in the original world (brown cubes) and the reconstructed world (blue cubes) when eye separation is mismatched with camera separation (e.g., $s_c = 63$mm, $s_e = 50$mm, $k_s = 0.8$). The brown cubes and blue cubes are perfectly matched. Therefore, from monocular perception, the views of the original and the reconstructed worlds are identical.
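The invariance of the cyclopean eccentricity under a separation mismatch can be checked with a small numerical sketch. The reduced mapping below is re-derived here by ray intersection for the case where only the separation ratio differs from 1, with the cyclopean eye at the origin. It is an assumed form, not an expression quoted from the paper, and the function name is hypothetical.

```python
import math

def separation_mismatch(x_o, z_o, k_s, d_c=3.0):
    """Reconstructed (x, z) of an original point when only the eye/camera
    separation ratio k_s is mismatched (assumed ray-intersection form)."""
    z_p = k_s * d_c * z_o / ((k_s - 1.0) * z_o + d_c)
    x_p = x_o * z_p / z_o          # the point stays on its original line of sight
    return x_p, z_p

if __name__ == "__main__":
    grid = [(x, z) for z in (2.0, 3.0, 4.0) for x in (-1.0, 0.0, 1.0)]
    for k_s in (0.8, 1.2, -1.0):   # -1.0 is the pseudoscope case
        print(f"k_s = {k_s}")
        for x_o, z_o in grid:
            x_p, z_p = separation_mismatch(x_o, z_o, k_s)
            same_direction = math.isclose(math.atan2(x_p, z_p), math.atan2(x_o, z_o))
            print(f"  ({x_o:+.0f}, {z_o:.0f}) -> ({x_p:+.2f}, {z_p:.2f})"
                  f"  eccentricity preserved: {same_direction}")
```

In this sketch every point keeps its direction from the cyclopean eye while its depth changes. With a separation ratio of -1, points in front of the 3m screen move behind it and vice versa, while onscreen points stay in place.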

Fig 6

Effect of separation mismatches from the cyclopean eye (i.e., only monocular depth cues).

The brown cubes are the simulations in the original world and the blue cubes are the simulations in the reconstructed world. They are identical from the view of cyclopean eye.

The vast majority of adults have IPDs in the [50mm, 75mm] range, and the mean adult IPD is around 63mm [31], which is the recommended camera separation for S3D movie making [32]. Therefore, the separation ratio $k_s$ is in the range of [0.8, 1.2]. Fig 7 shows examples of separation mismatches in eccentricity-disparity coordinates. The solid and dashed lines represent the original world and reconstructed world, respectively, when eye separation is smaller than camera separation ($k_s = 0.8$) and when eye separation is larger than camera separation ($k_s = 1.2$). From binocular perception, when eye separation is smaller (or larger) than camera separation ($k_s = 0.8$ or $1.2$), objects in front of the screen appear closer (or farther) and objects behind the screen appear farther (or closer). That is, objects appear expanded (or compressed) in depth. Importantly, the eccentricities in the reconstructed world (dashed lines) are identical to those in the original world (solid lines), representing no geometric distortion from monocular perception.

Fig 7

Effect of the separation mismatches.

(a) Eye separation is smaller than camera separation ($k_s = 0.8$). (b) Eye separation is larger than camera separation ($k_s = 1.2$). (c) Eye separation and camera separation are reversed ($k_s = -1$), called a pseudoscope. Solid lines represent the eccentricity-disparity in the original world and dashed lines represent the eccentricity-disparity in the reconstructed world.

A special case of separation mismatch is the pseudoscope, where the left view is projected to the right eye and the right view is projected to the left eye (i.e., $k_s = -1$). From the cyclopean eye’s perspective, the reconstructed world is identical to the original world, as shown in Fig 6. From binocular perception (Fig 7c), onscreen objects stay on the screen, objects in front of the screen appear behind the screen, and objects behind the screen appear in front of the screen. Thus, the binocular depth cue is reversed with respect to the screen for the pseudoscope.

3.2 Mismatch of convergence-screen distances

This analysis assumes that only camera convergence distance and screen distance are mismatched (i.e., $k_s = 1$, $k_w = 1$, and $\mathbf{T} = [0, 0, 0]^\top$) and that the camera convergence distance is constant (i.e., $d_c = 3$m). The transfer function (1) can then be simplified to a form, referred to as (9), that depends only on the ratio of screen distance to camera convergence distance, $k_d$. The visual eccentricities in the original world and reconstructed world are computed by substituting $X_p$ and $Z_p$ from (9) into the cyclopean eccentricity, and the two turn out to be equal. Thus, all visual elements share exactly the same eccentricities under distance mismatches. Fig 8 shows simulations of cubes from the cyclopean eye in the original world (brown cubes) and the reconstructed world (blue cubes) when screen distance is mismatched with convergence distance (e.g., $k_d = 0.33$, the case in which a user watches 3D videos on a desktop monitor at 1m distance when the convergence distance is 3m). As with the mismatch of camera-eye separations, the brown cubes and blue cubes are perfectly matched. Therefore, from monocular perception, the views of the original and the reconstructed worlds are identical.

Fig 8

Effect of distance mismatches from the cyclopean eye (i.e., only monocular depth cues).

The brown cubes are the simulations in the original world and the blue cubes are the simulations in the reconstructed world. They are identical from the view of the cyclopean eye.

We consider three different 3D movie screen distances as examples: 1m (desktop monitor viewing distance), 3m (TV screen viewing distance), and 10m (movie theater screen viewing distance). If the convergence distance is set to the TV screen viewing distance of 3m and the 3D movie is played on a desktop monitor or a movie theater screen, the distance ratio $k_d$ is in the range of [0.33, 3] (i.e., $[2^{-1.58}, 2^{1.58}]$, the two ends being equally far from 1 on a logarithmic scale). Fig 9 shows examples of distance mismatches in eccentricity-disparity coordinates. The solid lines represent the original world and the dashed lines represent the reconstructed world in two conditions: screen distance is smaller than convergence distance ($k_d = 0.33$), and screen distance is larger than convergence distance ($k_d = 3$). From binocular perception, when screen distance is smaller (or larger) than camera convergence distance ($k_d = 0.33$ or $3$), objects appear farther (or closer) in the periphery. Note that changing the screen distance also changes the fixation distance. The eccentricity-disparity structures presented in Fig 9 are depth information relative to the fixation object O5. Again, the eccentricities in the reconstructed world (dashed lines) are identical to those in the original world (solid lines), representing no geometric distortion from monocular perception.

Fig 9

Effect of distance mismatches.

(a) Screen distance is smaller than convergence distance ($k_d = 0.33$), and (b) screen distance is larger than convergence distance ($k_d = 3$). Solid lines represent the eccentricity-disparity in the original world and dashed lines represent the eccentricity-disparity in the reconstructed world.

3.3 Mismatch of camera-screen FOVs

This analysis assumes that only screen FOV and camera FOV are mismatched (i.e., $k_s = 1$, $k_d = 1$, and $\mathbf{T} = [0, 0, 0]^\top$) and that the camera convergence distance is constant (i.e., $d_c = 3$m). The transfer function (1) can then be simplified to a form, referred to as (11), that depends only on the ratio of screen FOV to camera FOV in linear scale, $k_w$. The visual eccentricities in the reconstructed world follow by substituting (11) into the cyclopean eccentricity. Except at the fixation point $\mathbf{F} = [F_x, F_z]^\top$, the visual eccentricities in the original world and reconstructed world are different. The visual eccentricity in the reconstructed world is the same as in the situation where the sizes of objects (in the xy-dimensions) are unchanged but their depth (in the z-dimension) is extended or compressed, i.e., the situation obtained by eliminating the coefficient in (11). Thus, monocularly, FOV distortions appear as depth changes rather than size changes. We consider two screen FOVs: 90° for Google Cardboard and 37° for a desktop monitor (e.g., a 20-inch monitor at 66cm distance), with a camera FOV of 60°. Fig 10 shows simulations of cubes from the cyclopean eye in the original world (brown cubes) and the reconstructed world (blue cubes) when the FOV ratio is $k_w = 0.58$ (Fig 10a) and $k_w = 1.73$ (Fig 10b). From monocular perception, when screen FOV is smaller than camera FOV ($k_w < 1$), objects appear smaller and farther to the viewer. When screen FOV is larger than camera FOV ($k_w > 1$), objects appear larger and closer to the viewer.
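The two FOV ratios used in this section can be reproduced from the stated angles. The sketch below assumes that the ratio "in linear scale" is the ratio of half-angle tangents and that the 20-inch monitor has a 16:9 aspect ratio (about 0.443m wide). The aspect ratio is not stated in the text, and the function names are hypothetical.

```python
import math

def fov_deg(screen_width, viewing_distance):
    """Full horizontal FOV in degrees subtended by a flat, centred screen."""
    return math.degrees(2.0 * math.atan((screen_width / 2.0) / viewing_distance))

def fov_ratio(screen_fov_deg, camera_fov_deg):
    """Screen-to-camera FOV ratio as a ratio of half-angle tangents
    (an assumed reading of "linear scale")."""
    screen_half_tan = math.tan(math.radians(screen_fov_deg) / 2.0)
    camera_half_tan = math.tan(math.radians(camera_fov_deg) / 2.0)
    return screen_half_tan / camera_half_tan

if __name__ == "__main__":
    # Width of a 20-inch diagonal, assuming a 16:9 panel.
    width_m = 20.0 * 0.0254 * 16.0 / math.hypot(16.0, 9.0)
    print(f"monitor FOV at 66cm: {fov_deg(width_m, 0.66):.1f} degrees")
    print(f"FOV ratio, 37 vs 60 degrees: {fov_ratio(37.0, 60.0):.2f}")
    print(f"FOV ratio, 90 vs 60 degrees: {fov_ratio(90.0, 60.0):.2f}")
```

Under these assumptions the monitor subtends about 37° and the two ratios round to 0.58 and 1.73, matching the values used in Fig 10 and Fig 11.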

Fig 10

Effect of FOV mismatch from the cyclopean eye (i.e., only monocular depth cues) in two conditions: (a) screen FOV is smaller than camera FOV (k = 0.58), and (b) screen FOV is larger than camera FOV (k = 1.73).

The brown cubes are the simulations in the original world and the blue cubes are the simulations in the reconstructed world.

Fig 11 shows examples of FOV mismatches in eccentricity-disparity coordinates. The solid lines represent the original world and the dashed lines represent the reconstructed world in two conditions: screen FOV is smaller than camera FOV ($k_w = 0.58$) and screen FOV is larger than camera FOV ($k_w = 1.73$). From binocular perception, when screen FOV is smaller than camera FOV ($k_w = 0.58$), objects appear smaller in size and compressed in depth towards the screen. When screen FOV is larger than camera FOV ($k_w = 1.73$), objects appear larger in size and expanded in depth away from the screen. Unlike separation mismatches and distance mismatches, the eccentricities in the reconstructed world (dashed lines) decrease or increase compared with those in the original world (solid lines), representing geometric distortions from monocular perception.

Fig 11

Effect of FOV mismatch in two conditions: (a) screen FOV is smaller than camera FOV ($k_w = 0.58$), and (b) screen FOV is larger than camera FOV ($k_w = 1.73$).

Solid lines represent the eccentricity-disparity in the original world and dashed lines represent the eccentricity-disparity in the reconstructed world.

3.4 Mismatch of head positions

This analysis assumes orthoscopic reproduction of the scene in S3D ($k_s = 1$, $k_d = 1$, $k_w = 1$), with only the viewer’s head translated away from the origin. With these assumptions, the transfer function (1) can be simplified to a form, referred to as (13), that depends only on the head translation $\mathbf{T}$. When the head translates by $\mathbf{T}$, the cyclopean eye is at $\mathbf{E}_C = [T_x, T_z]^\top$. The visual eccentricities in the reconstructed world and the original world are different except at the fixation point $\mathbf{F} = [F_x, F_z]^\top$. Fig 12 shows simulations of cubes from the cyclopean eye in the original world (brown cubes) and the reconstructed world (blue cubes) when the viewer’s head translates to the left ($T_x = -30$cm, Fig 12a) and to the right ($T_x = 30$cm, Fig 12b). From monocular perception, when the viewer’s head translates to the left or right, the cubes in front of the screen shear to the left or right, respectively (e.g., cube 1 in Fig 12a and 12b), and the cubes behind the screen shear to the right or left, respectively (e.g., cube 7 in Fig 12a and 12b).
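The shear caused by a lateral head translation can be illustrated numerically. The sketch below uses a reduced mapping re-derived here by ray intersection for matched capture and display parameters. It assumes that both the original and the reconstructed scene are viewed from the translated cyclopean eye, and that objects O1–O3 form the row in front of the 3m screen and O7–O9 the row behind it. All names are hypothetical.

```python
import math

D_C = 3.0                  # convergence distance = screen distance (metres)
FIXATION = (0.0, D_C)      # fixated object O5 in the xz-plane

def reconstruct(x_o, z_o, t_x=0.0, t_z=0.0):
    """Reconstructed (x, z) for matched ratios and head offset (t_x, t_z).
    Assumed ray-intersection form, not quoted from the paper."""
    scale = z_o / D_C
    x_p = x_o + t_x * (1.0 - scale)
    z_p = t_z + (D_C - t_z) * scale
    return x_p, z_p

def eccentricity(point, eye):
    """Cyclopean eccentricity (radians) of point relative to the fixation."""
    to_point = math.atan2(point[0] - eye[0], point[1] - eye[1])
    to_fixation = math.atan2(FIXATION[0] - eye[0], FIXATION[1] - eye[1])
    return to_point - to_fixation

def eccentricity_shift(x_o, z_o, t_x=0.0, t_z=0.0):
    """Reconstructed minus original eccentricity, both seen from the moved eye."""
    eye = (t_x, t_z)
    return eccentricity(reconstruct(x_o, z_o, t_x, t_z), eye) - eccentricity((x_o, z_o), eye)

if __name__ == "__main__":
    for label, z_o in (("front row", 2.0), ("on screen", 3.0), ("back row", 4.0)):
        shifts = [math.degrees(eccentricity_shift(x, z_o, t_x=-0.30)) for x in (-1.0, 0.0, 1.0)]
        print(label, [f"{s:+.2f}" for s in shifts])
```

For a leftward translation the front row shifts toward negative eccentricity and the back row toward positive eccentricity, while onscreen objects are unchanged, in line with the description above.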

Fig 12

Effects of head translations horizontally (x-axis) from the cyclopean eye (i.e., only monocular depth cues).

(a) Left ($T_x = -30$cm) and (b) right ($T_x = 30$cm). The brown cubes are the simulations in the original world and the blue cubes are the simulations in the reconstructed world. The black arrows show the cubes shearing to the left and the right.

Fig 13 shows simulations of cubes from the cyclopean eye in the original world (brown cubes) and the reconstructed world (blue cubes) when the viewer’s head translates backward ($T_z = -30$cm, Fig 13a) and forward ($T_z = 30$cm, Fig 13b). From monocular perception, when the viewer’s head moves backward or forward, the cubes are expanded away from the screen or compressed towards the screen, respectively (e.g., cubes 1 and 7 in Fig 13a and 13b).

Fig 13

Effects of head translations along the depth (z-axis) from the cyclopean eye (i.e., only monocular depth cues).

(a) Backward ($T_z = -30$cm) and (b) forward ($T_z = 30$cm). The brown cubes are the simulations in the original world and the blue cubes are the simulations in the reconstructed world. The red arrows show the cubes becoming closer or farther, representing compression or expansion in S3D space.

Fig 14 shows examples of head translations in eccentricity-disparity coordinates. The solid lines represent the original world and the dashed lines represent the reconstructed world in four conditions: the head translates left ($T_x = -30$cm), right ($T_x = 30$cm), backward ($T_z = -30$cm), and forward ($T_z = 30$cm). When the head translates left ($T_x = -30$cm), objects in front of the screen are shifted left (the eccentricity differences of O1, O2, and O3 are negative) and objects behind the screen are shifted right (the eccentricity differences of O7, O8, and O9 are positive). When the head translates right ($T_x = 30$cm), objects in front of the screen are shifted right (the eccentricity differences of O1, O2, and O3 are positive) and objects behind the screen are shifted left (the eccentricity differences of O7, O8, and O9 are negative). Note that Fig 14a and 14b show the same effect as Fig 10 in [3], which contained errors and was corrected in [33].

Fig 14

Effects of head translations in four conditions: The head translates (a) left (T = −30cm), (b) right (T = 30cm), (c) backward (T = −30cm), and (d) forwards (T = 30cm).

Solid lines represent the eccentricity-disparity in the original world () and dashed lines represent the eccentricity-disparity in the reconstructed world (). Horizontal arrows show eccentricity differences and vertical arrows show disparity differences. When the head translates backward (T = −30cm), objects appear compressed toward the screen in depth (disparity differences of O1, O2, and O3 are negative and O7, O8, and O9 are positive). When the head translates forward (T = 30cm), objects appear expanded away from the screen (disparity differences of O1, O2, and O3 are positive and O7, O8, and O9 are negative). Overall, onscreen objects have the same eccentricities and disparities in the original and reconstructed world. Fixated objects remain at zero eccentricity with head translations. In terms of perception, objects appear to always follow head movements.
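
The left/right shear described for Fig 14a and 14b can be reproduced with a few lines of ray geometry. The following is a minimal sketch, not the model used in this paper: it assumes a top-down (x, z) plane, a flat screen at distance `screen_dist` from the original eye position, matched capture and viewing parameters, and a pure lateral head translation `head_x`. The function names, the 6.5 cm eye separation, and the 100 cm screen distance are illustrative assumptions.

```python
import math


def project_to_screen(x, z, eye_sep, screen_dist):
    """Screen x-coordinates of the left and right images of the point (x, z),
    as seen from eyes at (-eye_sep/2, 0) and (+eye_sep/2, 0)."""
    half = eye_sep / 2.0
    x_left = -half + (x + half) * screen_dist / z
    x_right = half + (x - half) * screen_dist / z
    return x_left, x_right


def reconstruct(x_left, x_right, eye_sep, screen_dist, head_x):
    """Intersect the two eye-to-screen rays after the head has moved
    laterally by head_x (assumed geometry; fails for points at infinity,
    where the screen disparity equals the eye separation)."""
    disparity = x_right - x_left
    z_rec = eye_sep * screen_dist / (eye_sep - disparity)
    left_eye_x = head_x - eye_sep / 2.0
    x_rec = left_eye_x + (x_left - left_eye_x) * z_rec / screen_dist
    return x_rec, z_rec


def eccentricity_difference_deg(x, z, eye_sep, screen_dist, head_x):
    """Cyclopean-eye direction of the reconstructed point minus that of the
    original point, both seen from the translated head position (degrees)."""
    x_left, x_right = project_to_screen(x, z, eye_sep, screen_dist)
    x_rec, z_rec = reconstruct(x_left, x_right, eye_sep, screen_dist, head_x)
    ecc_original = math.degrees(math.atan2(x - head_x, z))
    ecc_reconstructed = math.degrees(math.atan2(x_rec - head_x, z_rec))
    return ecc_reconstructed - ecc_original


if __name__ == "__main__":
    EYE_SEP, SCREEN_DIST, HEAD_X = 6.5, 100.0, -30.0  # cm, hypothetical values
    for depth in (50.0, 100.0, 150.0):  # in front of, on, and behind the screen
        diff = eccentricity_difference_deg(0.0, depth, EYE_SEP, SCREEN_DIST, HEAD_X)
        print(f"z = {depth:5.1f} cm  eccentricity difference = {diff:+.2f} deg")
```

Under these assumptions the reconstructed lateral position reduces to x + T(1 − z/d), where T is the head translation and d the screen distance. For a leftward translation the difference is negative in front of the screen, zero on the screen, and positive behind it, which matches the signs reported above.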

4 Discussion

The mismatches of camera-eye separations and convergence-screen distances in S3D do not change the monocular depth cues. However, binocularly, these mismatches result in compression or expansion of objects in depth. Pseudoscopic viewing on S3D displays is a special case of the camera-eye separation mismatch, in which the left and right views are presented to the right and left eyes, respectively (k = −1). The binocular disparity depth cue is reversed with respect to the screen while the monocular depth cues present veridical depth information. Therefore, these mismatches result in conflicts between monocular and binocular depth cues.

A mismatch of camera-screen FOVs, i.e., a screen that is too small or too large, scales objects in size without changing the distance of objects on the screen. Monocularly, the depth perception may depend on whether the viewer is familiar with the objects. For familiar objects, minification or magnification of objects increases or decreases the distance judgment, respectively [34, 35]. As a result, object distances estimated from monocular and binocular depth cues are inconsistent. For unfamiliar objects, the viewer has no prior knowledge of the object sizes and may not discern any depth-cue conflict. As discussed in Sections ‘Distortion-free scaled reproduction’ and ‘Correct geometric distortions’ of [4], this provides an approach to eliminate or compensate for geometric distortions in S3D by adjusting different parameter pairs, so that the S3D world is a scaled but undistorted version of the original world.

When the viewer translates the head away from the origin, i.e., left and right or backward and forward, objects in front of the screen are sheared in the same direction as the head translation and objects behind the screen are sheared in the opposite direction. It appears as if objects follow the movements of the viewer’s head.
This is because S3D displays can only provide the views captured by the cameras. Thus, the depth cue of motion parallax that exists in real life is missing, which results in a strong perception of object rotation following the viewer’s movements [3]. Therefore, the viewer’s head translations conflict with the absence of motion parallax.

Visually induced motion sickness (VIMS) involves motion. If the viewer watches a stationary S3D scene and stays still, geometric distortions with depth-cue conflicts may not cause any motion sickness symptoms. In a dynamic scene, for instance, when objects move towards the user in a distorted S3D space where the convergence distance is larger than the screen distance (Section 3.2), monocular depth cues remain veridical while the binocular depth cue suggests that objects at near or far distances approach the viewer slower or faster than expected. Depth-cue conflicts with motion may cause VIMS in S3D, which can be explained by the sensory rearrangement theory [9].

Gao et al. [4] proposed a geometric model for S3D, which is the work most closely related to ours. They analyzed geometric distortions based only on the binocular depth cue and left a gap between geometric distortions and VIMS in S3D, i.e., the reason why geometric distortions may cause VIMS was not explicitly explained. This work bridges the gap by analyzing depth-cue conflicts and differs from Gao et al. [4] in three aspects. First, we introduce a retinal disparity model to analyze geometric distortions in the eccentricity-disparity (Ec − D) coordinates. The angular disparity (D) represents the binocular depth cue and the visual eccentricity (Ec) represents the monocular depth cues. As demonstrated by the horizontal and vertical axes of Figs 4, 7, 9, 11 and 14, the monocular and binocular components of the geometric distortions are disentangled and can be discussed separately.
Second, we simulate geometric distortions from the cyclopean eye to visually demonstrate the monocular perception, as illustrated in Figs 5, 6, 8, 10, 12 and 13. Third and most importantly, with the help of the retinal disparity model and the visualization technique, the inconsistency between the monocular and binocular depth cues can be clearly analyzed, which bridges the gap between geometric distortions and VIMS in S3D.

In the study of Ichikawa and Egusa [36], subjects wearing left-right reversing spectacles (i.e., a pseudoscope) all had serious sickness on the first day. Even though depth-cue conflicts in the other geometric distortions analyzed above may not be as severe as in the pseudoscope, they may also cause motion sickness in viewers of a dynamic scene. Psychophysical experiments need to be conducted in the future to examine the cause-and-effect relationship between depth-cue conflicts and motion sickness.

Shimojo and Nakajima [37] conducted another pseudoscope experiment where subjects wore left-right reversing spectacles continuously for 9 days. On day 3 of the wearing period, the relation between the direction of the disparity of line-contoured stereograms (LCSs) and the direction of perceived depth was reversed completely. In [36], six subjects wore left-right reversing spectacles continuously for 10 or 11 days. The relation between the direction of physical depth (convex or concave) and the direction of binocular disparity (crossed or uncrossed) was reversed. Also, the subjects’ sickness gradually disappeared on about the third day of the wearing period. These studies suggest that the binocular disparity depth cue is adaptable to the environment. For 3D producers, e.g., 3D game creators, it may be helpful to present some 3D demonstrations before the 3D content so that viewers can gradually get used to the 3D environment, reducing the level of motion sickness.

Hands et al.
[38] investigated the perception of S3D when viewed from an oblique angle using a canonical-form task in which subjects were asked to report their perception of cubes rendered for perpendicular and oblique viewing. The study showed no difference between S3D and 2D when viewing a familiar object at oblique viewing angles as large as 20°. A compensation mechanism could work by recovering the true center of projection (e.g., from cues of the vanishing point and screen slant) and interpreting the oblique retinal images accordingly so that content rendered for a frontoparallel screen appears veridical even when viewed obliquely [39, 40]. Another compensation mechanism could rely on the largely unaffected monocular depth cues in a distorted S3D space, as suggested by the comparison between [38] and [41]. In [41], objects appeared distorted when viewed obliquely in S3D (i.e., compensation was abolished) since the wire-frame hinge stimuli in [41] have weaker monocular depth cues than the textured solid cube stimuli in [38]. For typical applications of S3D displays in entertainment, S3D content includes many monocular depth cues and appears relatively veridical when viewed from an oblique angle, which helps explain why S3D content is popular and effective commercially. However, relatively veridical perception does not mean that viewing is free of issues, since people complain about VIMS during or after S3D viewing. Depth-cue conflicts between monocular and binocular cues in a dynamic scene, or the absence of motion parallax during user-initiated movements, could explain the VIMS complaints and why S3D has not spread further.

In this paper, we assume that the viewer’s head does not rotate relative to the screen. This assumption holds if the viewer sees S3D imagery in head-mounted displays, or if the viewer’s head stays upright relative to the screen.
We also assume that the camera image plane (i.e., the image plane perpendicular to the camera axes) and the screen image plane (i.e., the image plane on which the screen is located) are matched. As pointed out by [42], additional geometric distortions and vertical disparities are introduced when the viewer’s head is rotated about a vertical axis relative to the stereo display (yaw rotation), when the head is rotated about a forward axis (roll rotation), or when stereo images are captured with converging camera axes but displayed on a flat screen. Note that the vertical screen disparity, the eyestrain caused by the vertical screen disparity [43], and the vergence-accommodation conflict [44] are out of the scope of this paper.

1 Jul 2020

PONE-D-20-04436
Stereoscopic 3D geometric distortions analyzed from the viewer's point of view
PLOS ONE

Dear Dr. Gao,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

In particular, Reviewer #1 raises concerns that the present work does not sufficiently extend beyond the authors' previous work, and does not provide enough analysis of the model and implications thereof (perhaps through simulations, if not experiments), but rather seems largely an exercise of explaining the math behind such a hypothetical analysis without following through on it. I also agree with the Reviewer's point about the focus on eccentricity at the expense of explicitly dealing with the full variety of monocular depth cues. These concerns must be addressed in order for the manuscript to fulfill publication criteria #1, 3, and 4, listed here: https://journals.plos.org/plosone/s/criteria-for-publication .

Please submit your revised manuscript by Aug 15 2020 11:59PM.
If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Christopher R. Fetsch Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. 
The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. We noticed you have some minor occurrence of overlapping text with the following previous publication(s), which needs to be addressed: Correcting geometric distortions in stereoscopic 3D imaging - https://doi.org/10.1371/journal.pone.0205032 In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed. 3. Thank you for stating the following in the Acknowledgments Section of your manuscript: 'This work was supported by the National Natural Science Foundation of China 331 (61901259) and China Postdoctoral Science Foundation (BX2019208).' We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 'The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.' Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution. State what role the funders took in the study. 
If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.” If any authors received a salary from any of your funders, please state which authors and which funders. If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.” Please include your amended statements within your cover letter; we will change the online submission form on your behalf. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: N/A ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes ********** 4. 
Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: This paper presents an analysis of geometric distortions arising in stereoscopic displays of 3D content (S3D). According to the authors, the aim of the analysis is to identify potential triggers of visually induced motion sickness (VIMS) in S3D. They propose that inconsistencies between binocular (disparity) and monocular depth cues are potential triggers. While I agree that such an analysis is warranted, in my view, the manuscript does not live up to the aims stated. The paper essentially consists of the presentation of a mathematical model that aims to reconstruct a 3D scene as seen by the cyclopean eye from images captured by a pair of cameras. No experimental data or statistical analysis are presented. I should point out that the mathematical analysis appears sound although it uses tools that are standard in 3D imaging and that are sufficiently described in the authors’ previous papers. My main problem is that the work appears unfinished for two reasons. First, it claims that the model simulates a broad range of monocular depth cues (“linear perspective, interposition (occlusion), object sizes, shades and shadows, texture gradients, accommodation and blur, aerial perspective, etc.”), the only metric derived for their characterization is “eccentricity”. 
No explanation or justification is provided how this single metric can account for all the monocular cues listed. As it turns out, they in fact measure eccentricity on the retina of the imaginary cyclopean eye. The second major problem is that the results of the analysis appear trivial at least as far as they are presented in the manuscript. Eventually, the authors do not discuss what is the novelty of their results as compared to their or other authors’ previous work in relation to the stated aims of the study. Indeed, except for a brief derivation of the effects of a simple horizontal head translation, no analysis of dynamic distortions is provided that might occur during scene motion. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. 
Please note that Supporting Information files do not need this step.

24 Aug 2020

Manuscript ID.: PONE-D-20-04436
Title: Stereoscopic 3D geometric distortions analyzed from the viewer's point of view

We thank the academic editor and reviewers for their constructive comments that helped to greatly improve the contents and presentations of this paper. We have revised the manuscript according to their suggestions. Point-by-point responses to the comments are listed below. We hope that the revised version of the manuscript and the responses in this letter are satisfactory. The Highlight changes of the manuscript can be found behind this response.

(1) AE: The present work does not sufficiently extend beyond the authors' previous work, and does not provide enough analysis of the model and implications thereof (perhaps through simulations, if not experiments), but rather seems largely an exercise of explaining the math behind such a hypothetical analysis without following through on it.

Reviewer: The second major problem is that the results of the analysis appear trivial at least as far as they are presented in the manuscript. Eventually, the authors do not discuss what is the novelty of their results as compared to their or other authors’ previous work in relation to the stated aims of the study.

Answer 1: We newly add a discussion about the difference between Gao et al. [4] and this paper in the fourth paragraph of Discussion as follows: Gao et al. [4] proposed a geometric model for S3D and is the most related work. Gao et al. [4] analyzed the geometric distortions only based on the binocular depth cue and left a gap between geometric distortions and VIMS in S3D, i.e., the reason why geometric distortions may cause VIMS was not explicitly explained. This work bridges the gap by analyzing the depth-cue conflict and distinguishes from Gao et al. [4] in three aspects.
First, we introduce a retinal disparity model to analyze geometric distortions in the eccentricity-disparity ( Ec − D ) coordinates. The angular disparity ( D ) represents the binocular depth cue and the visual eccentricity ( Ec ) represents the monocular depth cues. As demonstrated in the horizontal and the vertical axis of Figs 4, 7, 9, 11, 14, the geometric distortions in terms of the monocular and the binocular are disentangled and can be discussed separately. Second, we simulate geometric distortions from the cyclopean eye to visually demonstrate the monocular perception, as illustrated in Figs 5, 6, 8, 10, 12, 13. Third and most importantly, with the help of the retinal disparity model and the visualization technique, we can see the inconsistency between the monocular and binocular depth cues to build a connection between geometric distortions and VIMS in S3D. (2) It claims that the model simulates a broad range of monocular depth cues (“linear perspective, interposition (occlusion), object sizes, shades and shadows, texture gradients, accommodation and blur, aerial perspective, etc.”), the only metric derived for their characterization is “eccentricity”. No explanation or justification is provided how this single metric can account for all the monocular cues listed. As it turns out, they in fact measure eccentricity on the retina of the imaginary cyclopean eye. Answer 2: We newly add a figure in Fig 2. As illustrated in the figure, the far and the near objects project to the same retina locations of the cyclopean eye, including all the monocular depth cues (e.g., perspective, occlusion, shade and shadow, and texture gradient). If solely based on the visual eccentricity of the cyclopean eye, the far and the near objects are indistinguishable. Thus, the visual eccentricity (Ec) can account for all the monocular depth cues. 
(3) Indeed, except for a brief derivation of the effects of a simple horizontal head translation, no analysis of dynamic distortions is provided that might occur during scene motion.

Answer 3: We newly added paragraph in Discussion as follows: Visually induced motion sickness (VIMS) involves motions. If the viewer watches a stationary S3D scene and stays still, geometric distortions with depth-cue conflicts may not cause any motion sickness symptoms. In a dynamic scene, for instance, when objects move towards the user in a distorted S3D space where the convergence distance is larger than screen distance (Section 3.2), monocular depth cues remain veridical while the binocular depth cue suggests objects at near or far distances seem to approach the viewer slower or faster than the speed expected. Depth cue conflicts with motions may cause VIMS in S3D, which can be explained by the sensory rearrangement theory [9]

Submitted filename: Response to Reviewers_Final.pdf

4 Sep 2020

PONE-D-20-04436R1
Stereoscopic 3D geometric distortions analyzed from the viewer's point of view
PLOS ONE

Dear Dr. Gao,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The manuscript has not been re-reviewed by the original Reviewer #1, but a new reviewer (#2) has provided additional comments and suggestions, some of which overlap with the original comments and thus imply that these issues have not fully been addressed. Many of the new comments seem addressable with minor textual changes to the manuscript, but others are more substantial (e.g. "Addition of a formula" in comment 4.b) and would ideally need to be re-reviewed by this reviewer.
The suggestion under 4.a) to add experimental evidence is also a good one but this would not be required for acceptance. And although it is clear from both reviewers' comments that experts in the field would consider this an 'incremental' contribution, that alone is not sufficient to preclude acceptance according to PLOS ONE criteria.

Please submit your revised manuscript by Oct 19 2020 11:59PM.

We look forward to receiving your revised manuscript.

Kind regards,
Christopher R.
Fetsch Academic Editor PLOS ONE [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #2: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: N/A ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? 
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: The paper provides a model of disparity and spatial eccentricity distortion as a result of mismatch between capture and display parameters such as field of view, stereoscopic convergence, viewing distance and position. The paper further demonstrates effects of such distortions in simple example scenarios. 1) Paper strengths. a) The paper writing is clear. I have not reviewed the previous revision but based on the edits it has notably improved since then. b) The figures for individual distortions are clear and serve as a nice illustration of consequences for different mismatch scenarios. 2) The abstract " In this paper, we analyze the geometric distortions from the viewer’s perspective, so that both monocular and binocular depth cues are considered." a) The abstract distinguishes between 1st and 3rd person perspective. It is unclear what is meant. Do the authors refer to 1st person and 3rd person games? I assume presence of most pictorial cues -- such as texture gradient or occlusions -- is not removed by change of camera perspective. I assume the authors refer to the fact that they evaluate eccentricity from wrt. cyclopean eye. They should state this clearly already in the abstract. b) Further, the claim of "considering monocular cues" should be specified more accurately. 
The reader's impression is that the authors model magnitude and distortion of monocular cues (the authors name linear perspective, occlusion, and shadows in the abstract) yet no such analysis is done. 3) Incremental contributions. a) The paper is an extension of [4]. It uses the very same model of distortion (Eq. 1) but additionally evaluates projection distortion in spatial coordinates. Since this is a minor technical change I consider it an incremental contribution. The rest of the paper is just evaluation of the model for different input parameters. Most of the effects are fairly obvious (eg. he objects will look smaller on a smaller screen = Fig. 11). The most notable observation seems to the distortion of optical flow in non-orthostereoscopic scenario. However, similar observations has already been done in [3] and more recently in Hwang, Alex D., and Eli Peli. "Stereoscopic Three-dimensional Optic Flow Distortions Caused by Mismatches Between Image Acquisition and Display Parameters." Journal of Imaging Science and Technology 63, no. 6 (2019): 60412-1. b) The Figure 14 seems to show the same effect as Figure 5 in [3]. Please clarify it the same model as in [3] could be used to predict results presented here. 4) Perceptual considerations. a) The model in Eq. 1 is purely geometric and has no connection to the HVS. It is not obvious which of these distortions are noticeable and under which conditions. Not all monocular cues are affected by the image scaling - occlusion, aerial perspective stay completely unchanged, while the effect on texture gradient or shading would require deeper analysis. The authors should clarify which monocular cues are they modeling and evaluate why they think the eccentricity scaling is a good model for them. It seems to me that only "object size" can trivially be linked to the scaling of FOV. 
However, objects are quite commonly scaled even in 2D video as large objects would not easily fit small screens of phones, yet people do not perceive this as a distortion. Given the aforementioned incremental contribution in the model, adding couple of user studies to verify/quantify perceptual visibility of the predicted effects would significantly solidify the paper. b) The model predicts absolute distortion of both disparity and eccentricity separately and associates deviation in each of these dimensions with depth distortion and deviation from orthostereoscopic viewing. However, I would argue that with care (scaling multiple parameters at once), one should be able to scale the 3D content such that its smaller version is displayed on a smaller screen while maintaining orthostereoscopic viewing conditions consistent with such hypothetical miniature object. At least for unfamiliar objects, there should be no way for viewer to discern such projection. Addition of a formula that would tie the distortions in both domains together and express the anisotropy related to their mismatch could be another valuable contribution distinguishing this paper from previous work. 5) Minor issues: - L111: "If solely based on the visual eccentricity of the cyclopean eye, the far and the near objects are indistinguishable. Thus, the visual eccentricity (Ec) represents all the monocular depth cues." - This sentence is unclear. I assume that any projection to any (virtual) eye will contain all the monocular cues. How is the cyclopean eye special? - It is not clear how discussion on L328 -- 360 ties to the results of this paper. It should probably be moved to related work/background. 6) Summary The paper demonstrates use of existing model of projection distortion for evaluation of 2D distortion in visual field of view. 
The manuscript contains intuitive visualizations for each of the various distortion scenarios which could be useful as a quick reference for the way how capture and display mismatch affect the stereoscopic projection. However, given the very incremental contribution I would recommend the authors to supplement experimental validation of the distortion visibility. The main consequence of the work proposed in the abstract is that "The inconsistency of depth cues in a dynamic scene may be a source of visually induced motions sickness." Perhaps testing this hypothesis could provide additional value for the reader and show that the model is useful in predicting this problem. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. 
Please note that Supporting Information files do not need this step. 6 Sep 2020 Manuscript ID.: PONE-D-20-04436 Title: Stereoscopic 3D geometric distortions analyzed from the viewer's point of view We thank the academic editor and reviewers for their constructive comments that helped to greatly improve the contents and presentations of this paper. We have revised the manuscript according to their suggestions. Point-by-point responses to the comments are listed below. We hope that the revised version of the manuscript and the responses in this letter are satisfactory. The Highlight changes of the manuscript can be found behind this response. (1) The abstract distinguishes between 1st and 3rd person perspective. It is unclear what is meant. Do the authors refer to 1st person and 3rd person games? I assume presence of most pictorial cues -- such as texture gradient or occlusions -- is not removed by change of camera perspective. I assume the authors refer to the fact that they evaluate eccentricity from wrt. cyclopean eye. They should state this clearly already in the abstract. Further, the claim of "considering monocular cues" should be specified more accurately. The reader's impression is that the authors model magnitude and distortion of monocular cues (the authors name linear perspective, occlusion, and shadows in the abstract) yet no such analysis is done. Answer 1: We modified the abstract as follows: “In previous work of S3D geometric models, geometric distortions have been analyzed from a third-person perspective based on binocular depth cue (i.e., binocular disparity), where monocular depth cues (e.g., linear perspective, occlusion, and shadows) were not considered.” -> “In previous works of S3D geometric models, geometric distortions have been analyzed from a third-person perspective based on the binocular depth cue (i.e., binocular disparity). 
A third-person perspective is different from what the viewer sees since monocular depth cues (e.g., linear perspective, occlusion, and shadows) from different perspectives are different.”

(2) The paper is an extension of [4]. It uses the very same model of distortion (Eq. 1) but additionally evaluates projection distortion in spatial coordinates. Since this is a minor technical change I consider it an incremental contribution. The rest of the paper is just evaluation of the model for different input parameters. Most of the effects are fairly obvious (e.g., the objects will look smaller on a smaller screen = Fig. 11). The most notable observation seems to be the distortion of optical flow in the non-orthostereoscopic scenario. However, similar observations have already been made in [3] and more recently in Hwang, Alex D., and Eli Peli. "Stereoscopic Three-dimensional Optic Flow Distortions Caused by Mismatches Between Image Acquisition and Display Parameters." Journal of Imaging Science and Technology 63, no. 6 (2019): 60412-1.

Answer 2: This paper differs from Gao et al. [4], as discussed in the Discussion:

Gao et al. [4] proposed a geometric model for S3D and is the most related work. Gao et al. [4] analyzed the geometric distortions only based on the binocular depth cue and left a gap between geometric distortions and VIMS in S3D, i.e., the reason why geometric distortions may cause VIMS was not explicitly explained. This work bridges the gap by analyzing the depth-cue conflict and differs from Gao et al. [4] in three aspects. First, we introduce a retinal disparity model to analyze geometric distortions in the eccentricity-disparity (Ec − D) coordinates. The angular disparity (D) represents the binocular depth cue and the visual eccentricity (Ec) represents the monocular depth cues.
As demonstrated by the horizontal and the vertical axes of Figs 4, 7, 9, 11, and 14, the geometric distortions in terms of the monocular and the binocular depth cues are disentangled and can be discussed separately. Second, we simulate geometric distortions from the cyclopean eye to visually demonstrate the monocular perception, as illustrated in Figs 5, 6, 8, 10, 12, and 13. Third and most importantly, with the help of the retinal disparity model and the visualization technique, the inconsistency between the monocular and binocular depth cues can be clearly analyzed, which bridges the gap between geometric distortions and VIMS in S3D.

Note that, for the mismatch of camera-eye separations and the mismatch of convergence-screen distances (Sections 3.1 and 3.2), monocular depth cues are not changed, which is not obvious without the help of the retinal disparity model and the visualization tool in Figs 6 and 8.

We added a citation to Hwang and Peli (2019), which focuses on optic flow distortions, whereas this paper focuses on depth-cue conflicts.

(3) Figure 14 seems to show the same effect as Figure 5 in [3]. Please clarify if the same model as in [3] could be used to predict the results presented here.

Answer 3: We added the following sentence to Section 3.4: “Note that Figs 14a and 14b show the same effect as Figure 10 in [3], which has errors and was corrected in [33].”

(4) The model in Eq. 1 is purely geometric and has no connection to the HVS. It is not obvious which of these distortions are noticeable and under which conditions. Not all monocular cues are affected by the image scaling: occlusion and aerial perspective stay completely unchanged, while the effect on texture gradient or shading would require deeper analysis. The authors should clarify which monocular cues they are modeling and evaluate why they think the eccentricity scaling is a good model for them. It seems to me that only "object size" can trivially be linked to the scaling of FOV.
However, objects are quite commonly scaled even in 2D video as large objects would not easily fit small screens of phones, yet people do not perceive this as a distortion. Given the aforementioned incremental contribution in the model, adding couple of user studies to verify/quantify perceptual visibility of the predicted effects would significantly solidify the paper. Answer 4: The geometric model in Eq. 1 is derived from the binocular disparity, which is one of the most important binocular depth cues for HVS. Depth perception in a 3D space involves both monocular and binocular depth cues and HVS interprets depth by integrating various depth cues [13-15]. We agree that geometric distortions predicted by the geometric model may not be the same as what the viewer perceives since monocular depth cues are not considered for the geometric model. That is why we introduce the retinal disparity model to analyze geometric models from the viewer’s perspective. As illustrated in Fig 2, the far and the near objects project to the same retina locations of the cyclopean eye, including all the monocular depth cues (e.g., perspective, occlusion, shade and shadow, and texture gradient). If solely based on the visual eccentricity of the cyclopean eye, the far and the near objects are indistinguishable. Thus, the visual eccentricity (Ec) can account for all the monocular depth cues. We are not trying to model each monocular depth cue but the visual eccentricity of the cyclopean eye can include all the monocular depth cues. We agree that object scaling itself may not be perceived as distortions. We newly add discussion in the second paragraph of Discussion. Importantly, in this paper, we mainly focus on depth cue conflicts, i.e., the inconsistency between the monocular and the binocular depth cues. 
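The argument in Answer 4 (two objects on the same line of sight share one visual eccentricity but differ in angular disparity) can be sketched numerically. The following Python snippet is only an illustration under stated assumptions and is not taken from the manuscript: the function names, the 6.5 cm eye separation, the top-down (x, z) coordinate layout, and the sign convention for disparity are all hypothetical choices, and the formulas are plain viewing geometry rather than the paper's Eq. 1.

```python
import math

IPD = 0.065  # assumed eye separation in metres (illustrative value only)


def eccentricity_deg(x, z):
    """Visual eccentricity Ec of point (x, z) seen from a cyclopean eye at the origin."""
    return math.degrees(math.atan2(x, z))


def vergence_deg(x, z, ipd=IPD):
    """Angle subtended at (x, z) by two eyes placed at (-ipd/2, 0) and (+ipd/2, 0)."""
    from_left_eye = math.atan2(x + ipd / 2, z)
    from_right_eye = math.atan2(x - ipd / 2, z)
    return math.degrees(from_left_eye - from_right_eye)


def angular_disparity_deg(point, fixation, ipd=IPD):
    """Disparity D of `point` relative to `fixation`.

    Assumed sign convention: positive means nearer than the fixation point.
    """
    return vergence_deg(*point, ipd) - vergence_deg(*fixation, ipd)


near = (0.1, 1.0)  # 1 m in depth
far = (0.2, 2.0)   # 2 m in depth, on the same ray from the cyclopean eye

# Same eccentricity (up to rounding): monocularly the two points coincide.
print(eccentricity_deg(*near), eccentricity_deg(*far))

# Non-zero disparity: binocularly the two points are separable in depth.
print(angular_disparity_deg(near, far))
```

In this sketch the eccentricity carries the direction shared by all monocular cues, while the disparity term is the only quantity that changes with depth along that direction, which is the separation the Ec − D coordinates are meant to expose.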
The main contributions of this paper can be summarized as follows:

1) We introduce a retinal disparity model to analyze geometric distortions in the eccentricity-disparity (Ec − D) coordinates. The angular disparity (D) represents the binocular depth cue and the visual eccentricity (Ec) represents the monocular depth cues. As demonstrated by the horizontal and the vertical axes of Figs 4, 7, 9, 11, and 14, the geometric distortions in terms of the monocular and the binocular depth cues are disentangled and can be discussed separately.

2) We simulate geometric distortions from the cyclopean eye to visually demonstrate the monocular perception, as illustrated in Figs 5, 6, 8, 10, 12, and 13.

3) With the help of the retinal disparity model and the visualization technique, the inconsistency between the monocular and binocular depth cues can be clearly analyzed, which bridges the gap between geometric distortions and VIMS in S3D.

User studies are necessary to verify the hypothesis that depth cue conflicts in a dynamic scene may cause VIMS. Considering the length of this paper and what this paper mainly focuses on, a user study is not covered in this paper and will be conducted in the future.

(5) The model predicts absolute distortion of both disparity and eccentricity separately and associates deviation in each of these dimensions with depth distortion and deviation from orthostereoscopic viewing. However, I would argue that with care (scaling multiple parameters at once), one should be able to scale the 3D content such that its smaller version is displayed on a smaller screen while maintaining orthostereoscopic viewing conditions consistent with such hypothetical miniature object. At least for unfamiliar objects, there should be no way for viewer to discern such projection.
Addition of a formula that would tie the distortions in both domains together and express the anisotropy related to their mismatch could be another valuable contribution distinguishing this paper from previous work.

Answer 5: We appreciate this observation by the reviewer. We revised the Discussion as follows:

The mismatch of camera-screen FOVs, i.e., the screen size is too small or too large, results in scaling of objects in size but without changing the distance of objects on the screen. Monocularly, the depth perception may depend on whether the viewer is familiar with the objects. For familiar objects, minification or magnification of objects increases or decreases the distance judgment, respectively [34, 35]. As a result, object distances estimated from monocular and binocular depth cues are inconsistent. For unfamiliar objects, the viewer does not have any prior knowledge of the sizes of the objects and may not discern any depth cue conflicts. As discussed in the sections “Distortion-free scaled reproduction” and “Correct geometric distortions” of [4], this provides an approach to eliminate or compensate geometric distortions in S3D by adjusting different parameter pairs, so that the S3D world is only scaled from the original world but without distortions.

Note that the compensation between different parameter mismatches has been discussed in the section “Correct geometric distortions” in Gao et al. [4]. This paper is not trying to correct or remove geometric distortions but to point out that depth cue conflicts in a dynamic scene may be a source of VIMS.

(6) L111: "If solely based on the visual eccentricity of the cyclopean eye, the far and the near objects are indistinguishable. Thus, the visual eccentricity (Ec) represents all the monocular depth cues." - This sentence is unclear. I assume that any projection to any (virtual) eye will contain all the monocular cues. How is the cyclopean eye special?
Answer 6: We discuss the cyclopean eye because it can represent the monocular view of the viewer. Discussing any other virtual eye is meaningless since it is not related to what the viewer sees. For binocular vision, the far and the near objects are distinguishable because of the binocular disparity.

(7) It is not clear how discussion on L328 -- 360 ties to the results of this paper. It should probably be moved to related work/background.

Answer 7: We would like to clarify that L328 – 360 are in the Discussion, not in the Results.

In the paragraph of “Shimojo and Nakajima [37] …”, we discuss their pseudoscope experiment, which is one of the cases in this paper. Their study suggests that the binocular disparity depth cue is adaptable to the environment. For 3D producers, e.g., 3D game creators, it may be helpful to present some 3D demonstrations first before the 3D content so that the viewers can gradually get used to the 3D environment and reduce the level of motion sickness.

In the paragraph of “Hands et al. [38]…”, we discuss that, for typical applications of S3D displays in entertainment, S3D content includes a large amount of monocular depth cues and appears relatively veridical when viewed from an oblique angle, which helps explain why S3D content is popular and effective commercially. However, relatively veridical perception does not mean that there are no issues, since people complain about VIMS during or after S3D viewing. Depth-cue conflicts between the monocular and the binocular depth cues in a dynamic scene, or the absence of motion parallax caused by user-initiated movements, could explain the VIMS complaint and the reason why S3D could not spread any further.

We put these paragraphs in the Discussion because we are trying to discuss the connection between their results and this paper. We believe it is clearer to discuss them after introducing our work.

(8) The paper demonstrates use of existing model of projection distortion for evaluation of 2D distortion in visual field of view.
The manuscript contains intuitive visualizations for each of the various distortion scenarios which could be useful as a quick reference for the way how capture and display mismatch affect the stereoscopic projection. However, given the very incremental contribution I would recommend the authors to supplement experimental validation of the distortion visibility. The main consequence of the work proposed in the abstract is that "The inconsistency of depth cues in a dynamic scene may be a source of visually induced motions sickness." Perhaps testing this hypothesis could provide additional value for the reader and show that the model is useful in predicting this problem.

Answer 8: We thank the reviewer for pointing out the necessity to conduct psychophysics experiments to test our hypothesis. We believe psychophysics experiments should be put in an independent paper considering the length of the paper and what this paper mainly focuses on.

Submitted filename: Response to Reviewers2.pdf

1 Oct 2020

Stereoscopic 3D geometric distortions analyzed from the viewer's point of view

PONE-D-20-04436R2

Dear Dr. Gao,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Christopher R. Fetsch Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: 7 Oct 2020 PONE-D-20-04436R2 Stereoscopic 3D geometric distortions analyzed from the viewer’s point of view Dear Dr. Gao: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Christopher R. Fetsch Academic Editor PLOS ONE

9. Instability of the perceived world while watching 3D stereoscopic imagery: A likely source of motion sickness symptoms.

Authors: Alex D Hwang; Eli Peli
Journal: Iperception Date: 2014-10-07

10. Correction to Figures: A Reply to Hwang and Peli (2014).

Authors: Zhongpai Gao
Journal: Iperception Date: 2017-08-10

Stereoscopic 3D geometric distortions analyzed from the viewer's point of view.

1 Introduction

2 S3D geometric and retinal disparity models

2.1 S3D geometric model

2.2 Retinal disparity model

Visual eccentricity from the cyclopean eye represents monocular depth cues, including perspective, occlusion, shade and shadow, and texture gradient.

2.3 A sample 3D scene structure and illustration from the viewer’s perspective

Sample original scene composed of 9 objects (3 blue squares, 3 red circles, and 3 green triangles) for the geometric distortion analysis.

Example of geometric distortions presented in eccentricity-disparity (Ec − D) coordinates.

The sample scene from the perspective of the cyclopean eye.

3 S3D geometric distortion analysis

3.1 Mismatch of camera-eye separations

Effect of separation mismatches from the cyclopean eye (i.e., only monocular depth cues).

Effect of the separation mismatches.

3.2 Mismatch of convergence-screen distances

Effect of distance mismatches from the cyclopean eye (i.e., only monocular depth cues).

Effect of distance mismatches.

3.3 Mismatch of camera-screen FOVs

Effect of FOV mismatch from the cyclopean eye (i.e., only monocular depth cues) in two conditions: (a) screen FOV is smaller than camera FOV (k = 0.58), and (b) screen FOV is larger than camera FOV (k = 1.73).

Effect of FOV mismatch in two conditions: (a) screen FOV is smaller than camera FOV (k = 0.58), and (b) screen FOV is larger than camera FOV (k = 1.73).

3.4 Mismatch of head positions

Effects of head translations horizontally (x-axis) from the cyclopean eye (i.e., only monocular depth cues).

Effects of head translations along the depth (z-axis) from the cyclopean eye (i.e., only monocular depth cues).

Effects of head translations in four conditions: The head translates (a) left (T = −30 cm), (b) right (T = 30 cm), (c) backward (T = −30 cm), and (d) forward (T = 30 cm).

4 Discussion

1. Optimal integration of texture and motion cues to depth.

2. Judgments of size and distance in photographs.

3. Stereoscopic 3-D content appears relatively veridical when viewed from an oblique angle.

4. Visual Discomfort with Stereo 3D Displays when the Head is Not Upright.

5. How is depth perception affected by long-term wearing of left-right reversing spectacles?

6. Pictorial depth cues: a new slant.

7. Motion parallax as an independent cue for depth perception.

8. Perception of 3-D Layout in Stereo Displays.

9. Instability of the perceived world while watching 3D stereoscopic imagery: A likely source of motion sickness symptoms.

10. Correction to Figures: A Reply to Hwang and Peli (2014).