Parallax attention stereo matching network based on the improved group-wise correlation stereo network.

Xuefei Yu¹, Jinan Gu¹, Zedong Huang¹, Zhijie Zhang¹.

Abstract

Recent stereo matching methods, especially end-to-end deep stereo matching networks, have achieved remarkable performance in autonomous driving and depth sensing. However, state-of-the-art stereo algorithms, even those built on deep neural networks, still have difficulty finding correct correspondences in near-range regions and along object edges. To improve the precision of disparity prediction, we propose a parallax attention stereo matching algorithm based on an improved group-wise correlation stereo network. It learns disparity from a stereo pair and supports end-to-end prediction of both a disparity map and an edge map. In particular, we propose a parallax attention module that operates at the three-dimensional (disparity, height and width) level and achieves high-precision estimation by improving feature expression in near-range regions. This design is relevant to many computer vision tasks and can be inserted into several existing models to enhance their performance. Moreover, to make full use of the edge information learned by the two-dimensional feature extraction network, we propose a novel edge detection branch and a multi-featured integration cost volume. Experiments with our model demonstrate that the edge detection task helps improve the accuracy of disparity estimation. Our method achieves better results than previous works on both the Scene Flow and KITTI datasets.

Year: 2022 PMID: 35139127 PMCID: PMC8827418 DOI: 10.1371/journal.pone.0263735

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

The binocular stereo matching task is an important but difficult scientific problem that aims to compute the disparity of every pixel from a stereo image pair. Efficient and accurate stereo matching methods are necessary for computer vision tasks such as robotic pose estimation and autonomous driving [1,2]. Traditional stereo matching methods usually consist of four steps: matching cost computation, cost aggregation, disparity prediction, and post-processing. They can be categorized into global and local algorithms [3]. Local strategies compute the initial cost using only fixed or adaptive measurement windows. Global strategies normally formulate an optimization problem that minimizes a global energy function incorporating data and smoothness terms. However, traditional algorithms require manually designed feature descriptors and cost aggregation strategies, which makes them unsuitable for real-time applications, and their complicated hand-crafted steps limit further improvement. Learning-based stereo matching methods accurately match corresponding points in the left and right feature maps by learning feature representations and aggregation. These algorithms commonly consist of four steps: unary feature extraction [4], cost volume construction [5], cost aggregation [6], and disparity prediction [7]. Although performance on several benchmarks has improved significantly, drawbacks remain. Firstly, the edges in the predicted disparity map are not accurate enough. Secondly, these methods adopt a global attention strategy that is insensitive to detailed texture regions, resulting in inaccurate disparity estimates in vital areas. In recent years, inspired by the human attention mechanism, researchers have designed attention architectures for CNNs to enhance feature extraction.
However, drawbacks still remain. Firstly, these works focus on parallax information in the two-dimensional (2D) domain; 2D features can hardly reflect real three-dimensional (3D) scenes, and the more important 3D information is ignored. Secondly, because of the limited learning capability of a single network structure, the predicted disparity map is not fine enough in near-range regions. Edges are the image features most easily recognized by the human eye; in other words, humans can easily find stereo correspondences using the edges in binocular images. Based on this observation, some studies have made progress in predicting image edges as a standalone task, and more recently researchers have used the edge detection (ED) task to assist disparity estimation. However, these methods treat disparity prediction and edge detection as a multi-task learning problem, and the features learned in such multi-task pipelines are not fully exploited, which calls for an effective fusion mechanism. In the context of autonomous driving, closer regions have larger parallax and pose greater risks, so the disparity estimation model should pay more attention to them. In this paper, we propose a high-quality and efficient module for stereo matching, and our method achieves better performance on Scene Flow [8] and KITTI [9,10] than previous methods. Specifically, we examine the important issue of network structure design. The importance of attention has been studied in previous methods [11,12]. In our module, the left feature map $f_l$ and the corresponding right feature map $f_r$ are packed into a 3D feature map $f$, which is sent to the parallax attention (PA) stereo module to learn what to attend to in $f$.
As shown in Fig 1, our structure efficiently improves the accuracy of disparity prediction by improving feature expression in near-range regions. Meanwhile, we propose a novel edge detection branch and a multi-featured integration cost volume to learn finer texture features, which are vital for optimizing unary feature extraction. To train the network end to end, we assign different weights to the edge detection loss and the disparity smooth L1 loss. Our experiments demonstrate that a high-precision edge feature map helps improve the accuracy of disparity estimation.

Fig 1

Comparisons with GWC-Net [13] on the Scene Flow dataset.

First column: left input image. Second column: ground truth disparity map. Third column: disparity map predicted by GWC-Net. Last column: disparity map predicted by PA-Net. As shown in the white boxes in the top row, PA-Net performs better on nearby objects under the guidance of the PA module. In the bottom row, our predicted disparity map recovers edges better.

Our main contributions can be summarized as follows:
1) We propose a PA module to further improve the accuracy of disparity prediction.
2) We propose an edge detection branch and a multi-featured integration cost volume to obtain finer texture features.
3) Our PA-Net achieves an end-point error (EPE) of 0.775 on the Scene Flow dataset and a D1-all error of 2.05% on the KITTI 2015 dataset, outperforming other methods by 12%.

Related work

2.1. Traditional methods

In non-end-to-end deep stereo matching algorithms, each step of traditional stereo matching can be replaced by a neural network. Some researchers have mainly focused on using CNNs to accurately compute the matching cost and then using semi-global matching [14,15] to optimize the predicted disparity map. Zbontar et al. [16] proposed MC-CNN, which trains a convolutional neural network to compare pairs of 9×9 image patches and uses the result as the matching cost. Traditional algorithms play an important role in stereo matching. However, they generally suffer from slow computation and low matching accuracy, which greatly limits their application.
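The hand-crafted pipeline that these learned methods improve on (matching cost computation, cost aggregation, disparity selection) can be illustrated with a minimal local block-matching sketch. It is written in NumPy rather than PyTorch, uses a sum-of-absolute-differences cost, a fixed square window for aggregation, and winner-takes-all selection; the function names are hypothetical and post-processing is omitted. It is neither MC-CNN nor SGM, only the classical local baseline.

```python
import numpy as np

def sad_cost_volume(left, right, max_disp):
    """Step 1: per-pixel absolute-difference cost, shape (max_disp, H, W).

    Pixel (y, x) in the left image is compared with (y, x - d) in the right image;
    columns with no valid match (x < d) get an infinite cost.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return cost

def box_aggregate(cost, radius):
    """Step 2: sum the cost over a fixed (2r+1) x (2r+1) window (a local method)."""
    _, h, w = cost.shape
    padded = np.pad(cost, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    out = np.zeros_like(cost)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[:, dy : dy + h, dx : dx + w]
    return out

def wta_disparity(cost):
    """Step 3: winner-takes-all, i.e. the disparity with the lowest aggregated cost."""
    return np.argmin(cost, axis=0)

# Synthetic check: the left view is the right view shifted by 3 pixels.
rng = np.random.default_rng(0)
right = rng.random((20, 40))
left = rng.random((20, 40))
left[:, 3:] = right[:, :-3]
disparity = wta_disparity(box_aggregate(sad_cost_volume(left, right, 8), radius=2))
print(disparity[:, 5:].min(), disparity[:, 5:].max())  # 3 3
```

Wherever the whole window lies inside the shifted region, the true disparity has zero aggregated cost, which is why the interior pixels recover d = 3; near the left border the fixed window breaks down, a weakness that adaptive windows and global methods try to address.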

2.2. Learning-based methods

In 2015, Long et al. [17] achieved very good semantic segmentation results using a fully convolutional network (FCN). Inspired by the FCN, Mayer et al. [8] introduced DispNet, an end-to-end stereo network, in a work that also addressed optical flow prediction. DispNet is trained with a per-pixel Euclidean distance loss between the estimated and ground truth disparities. CRL [18] (cascade residual learning, a two-stage convolutional neural network for stereo matching) and iResNet [19] (learning for disparity estimation through feature constancy) build on DispNetC [8] with stacked refinement structures to optimize stereo results. Kendall et al. [20] proposed the end-to-end network GC-Net, which exploits context and scene geometry in stereo matching. GC-Net was the first to concatenate the left feature $f_l$ and the right feature $f_r$ to form a 4D cost volume:

$$C_{concat}(d, x, y, \cdot) = \mathrm{Concat}\{f_l(x, y),\ f_r(x - d, y)\} \tag{1}$$

GC-Net also casts stereo matching as a regression problem and directly outputs a refined disparity map without post-processing. Building on GC-Net, Chang et al. [21] proposed PSM-Net, which combines spatial pyramid pooling with stacked 3D hourglass structures in the stereo refinement network. More recently, GWC-Net [13] builds the cost volume with group-wise correlation: the features are split into groups along the channel dimension, and correlation maps are computed group by group. The group-wise correlation is computed as

$$C_{gwc}(d, x, y, g) = \frac{1}{N_c / N_g}\left\langle f_l^{g}(x, y),\ f_r^{g}(x - d, y)\right\rangle \tag{2}$$

where $N_c$ denotes the number of channels of the unary features, which are divided into $N_g$ groups along the channel dimension, $f^{g}$ denotes the $g$-th feature group, and $\langle\cdot,\cdot\rangle$ is the inner product, evaluated at every disparity level $d$. Although performance on several benchmarks has improved significantly, drawbacks remain: the edges of the predicted disparity map are not accurate enough, and the global attention strategy adopted is insensitive to detailed texture information.
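The concatenation volume and the group-wise correlation described above can be made concrete with a small NumPy sketch that builds both volumes for a single image pair. We assume features stored as (C, H, W) arrays without a batch axis, and zero-filled entries wherever x - d falls outside the right image; the function names are ours, not taken from the GWC-Net code.

```python
import numpy as np

def concat_volume(fl, fr, max_disp):
    """Concatenation volume: stack f_l(x, y) and f_r(x - d, y); shape (2C, D, H, W)."""
    c, h, w = fl.shape
    vol = np.zeros((2 * c, max_disp, h, w), dtype=fl.dtype)
    for d in range(max_disp):
        vol[:c, d, :, d:] = fl[:, :, d:]
        vol[c:, d, :, d:] = fr[:, :, : w - d]
    return vol

def groupwise_correlation(fl, fr, max_disp, num_groups):
    """Group-wise correlation: mean inner product within each of N_g channel groups.

    Shape (N_g, D, H, W). Dividing by N_c / N_g is the same as a mean over the group.
    """
    c, h, w = fl.shape
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = c // num_groups
    vol = np.zeros((num_groups, max_disp, h, w), dtype=fl.dtype)
    for d in range(max_disp):
        prod = fl[:, :, d:] * fr[:, :, : w - d]                      # (C, H, W - d)
        vol[:, d, :, d:] = prod.reshape(num_groups, per_group, h, w - d).mean(axis=1)
    return vol

rng = np.random.default_rng(0)
fl = rng.standard_normal((8, 4, 10))
fr = rng.standard_normal((8, 4, 10))
print(concat_volume(fl, fr, 4).shape)             # (16, 4, 4, 10)
print(groupwise_correlation(fl, fr, 4, 2).shape)  # (2, 4, 4, 10)
```

With a single group this reduces to one full-correlation map per disparity, and with one channel per group it keeps every channel's product separately; GWC-Net's choice lies in between, keeping several similarity measures per disparity at a fraction of the concatenation volume's channel count.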

2.3. Learning-based attention methods

In recent years, inspired by the human attention mechanism, researchers have designed attention architectures for CNNs to enhance feature extraction. Hu et al. [22] introduced a squeeze-and-excitation block to fully utilize channel information in the network. In addition to channel attention, the convolutional block attention module (CBAM) [23] introduced a spatial attention block, demonstrating that spatial features are vital in the network. Wang et al. [24] introduced PASSR-Net to integrate information from a stereo image pair for super-resolution, and used a parallax attention module (PAM) in PASMnet [25] to compute the consistency score between the left and right views along the epipolar line; PAM has been leveraged by many subsequent methods [19-26]. Building on PAM, Wang et al. [27] introduced a symmetric bi-directional parallax attention module (biPAM) to obtain cross-view information. Ying et al. [28] proposed a generic stereo attention module (SAM) to address the problem of incorporating stereo information. Chen et al. [29] handled stereo images with large disparities and multiple types of epipolar lines using a cross parallax attention module (CPAM). However, these methods do not fully utilize the parallax information provided by binocular images. PA-Net is the first to emphasize that improving feature expression in near-range regions benefits the disparity prediction task.
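The parallax attention idea used by PASSR-Net and PASMnet, scoring every candidate along the same image row (the epipolar line of a rectified pair), can be illustrated in a few lines. This is a simplified sketch under our own assumptions: the learned query and key projections and the masking used in those works are omitted, features are L2-normalized (H, W, C) arrays, and the function names are hypothetical. It illustrates that related work, not the PA module proposed in this paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def epipolar_attention(feat_left, feat_right):
    """feat_*: (H, W, C). Each left pixel attends over all right pixels in the same row.

    Returns the attention map (H, W_left, W_right) and the right features warped to the left view.
    """
    scores = np.einsum("hwc,hvc->hwv", feat_left, feat_right)  # row-wise dot products
    attn = softmax(scores, axis=-1)                            # distribution over the epipolar line
    warped = np.einsum("hwv,hvc->hwc", attn, feat_right)       # attention-weighted right features
    return attn, warped

rng = np.random.default_rng(0)
right = rng.standard_normal((4, 12, 64))
right /= np.linalg.norm(right, axis=-1, keepdims=True)
left = np.roll(right, 2, axis=1)        # left[:, x] == right[:, x - 2], wrapping at the border
attn, warped = epipolar_attention(left, right)
print(attn.argmax(axis=-1)[0])          # the best match for each x is x - 2 (mod 12)
```

Because the attention map covers the whole row, this kind of module is not tied to a fixed maximum disparity, which is one reason it suits stereo pairs with large disparities.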

2.4. Edge detection methods

The human eye easily captures edges to find stereo correspondences, and accurate edge contours help discriminate between different objects or regions. Based on this observation, some works have made progress in predicting image edges as a standalone task. Xie et al. [30,31] first designed an end-to-end ED network based on VGG-16. Recently, Song et al. [32,33] combined an ED branch with a stereo matching network. However, these methods treat disparity prediction and edge detection as a multi-task learning problem without establishing an effective mechanism to integrate the information learned by the different tasks. As a result, the learned features are not effectively expressed and utilized. To address this problem, we construct a multi-featured integration cost volume that combines parallax features and edge features.

Methods

As shown in Fig 2, we propose a PA stereo matching network (PA-Net), which extends GWC-Net [13] with a PA module, an edge detection branch, and a multi-featured integration cost volume.

Fig 2

Diagrammatic sketch of PA-Net.

PA-Net is built on GWC-Net by adding an edge detection branch to the feature extraction network and applying PA modules in the 3D aggregation network. The left and right images are fed to a weight-sharing feature extraction pipeline consisting of a ResNet-50 network that computes the feature maps. The pipeline has three branches (edge detection, group-wise correlation [13], and concatenation). A multi-featured integration cost volume is then constructed from these branches and finally fed into a parallax-attention 3D aggregation network for disparity regression.

3.1. Network architecture

The pipeline of the proposed PA-Net is shown in the upper half of Fig 2. It includes four parts: a unary feature extraction pipeline, a multi-featured integration cost volume, a parallax-attention 3D aggregation network, and a disparity prediction module. The multi-featured integration cost volume consists of three parts: the concatenation [20], group-wise correlation [13], and edge detection volumes (details in Section 3.3). These volumes are concatenated as the input of the parallax-attention 3D aggregation network, which is described in Section 3.2. The parallax-attention 3D aggregation network aggregates the cost over disparities and consists of two parts: a pre-hourglass module and three parallax-attention 3D aggregation networks. The pre-hourglass module has two components: the first consists of four 3D convolutional layers with batch normalization and ReLU [26], and the second consists of two PA modules.

3.2. Parallax attention module

Discriminative feature representations are essential for scene understanding. However, previous studies focus only on two-dimensional (2D) contextual information and ignore the significance of 3D disparity features. To emphasize regions with large parallax, we introduce a PA module that encodes disparity information into different weights, thus enhancing the representation capability of the features. 3D convolutional layers are widely used in stereo matching, and the feature volumes they process have four dimensions: channel, disparity, height and width. However, 3D filters are learned within a local receptive field, so the output feature map $U$ lacks contextual information. Based on these observations, as shown in Fig 3, given a feature map $f \in \mathbb{R}^{C \times d_{max} \times H \times W}$, we first apply two transformations, $F_{max}: f \rightarrow f_{max} \in \mathbb{R}^{C \times H \times W}$ and $F_{avg}: f \rightarrow f_{avg} \in \mathbb{R}^{C \times H \times W}$, which represent the importance of different positions. $F_{max}$ selects the maximum element along the disparity dimension, whereas $F_{avg}$ computes the mean along the disparity dimension. For channel $c$, the value at position $(i, j)$ is calculated by

$$f_{max}^{c}(i, j) = \max_{0 \le d < d_{max}} f^{c}(d, i, j), \qquad f_{avg}^{c}(i, j) = \frac{1}{d_{max}} \sum_{d=0}^{d_{max}-1} f^{c}(d, i, j)$$

where $d_{max}$ denotes the maximum disparity value. The two feature maps are then concatenated along the channel dimension to obtain a mixed feature map $M \in \mathbb{R}^{2C \times H \times W}$. $M$ can be regarded as a collector of local disparity texture information that describes the entire parallax image.

Fig 3

Diagram of our PA module.

Given the feature $f$, PA generates parallax weights by fusing the average and maximum values along the disparity dimension.

Subsequently, $M$ is sent to a shared network composed of two 3×3×3 convolutional layers with batch normalization and ReLU [26]. To reduce the parameter overhead, the number of channels in the middle layer is reduced by a factor $r$ (the reduction ratio), which we set to 4. A sigmoid function is then applied to the resulting disparity feature map. Finally, we merge the output with the input feature $f$ using an element-wise product to obtain the final PA feature map $M_{PA} \in \mathbb{R}^{C \times d_{max} \times H \times W}$, which can be written as

$$M_{PA}^{i} = \sigma\left(\mathrm{MLP}\left(\langle f_{max}, f_{avg} \rangle\right)\right)^{i} \cdot X^{i}$$

where $M_{PA}^{i}$ denotes the final value at position $i$, $\sigma$ is the sigmoid function, $\mathrm{MLP}$ is the shared network, $\langle\cdot,\cdot\rangle$ denotes channel-wise concatenation, and $X^{i}$ denotes the value of the input feature at position $i$. Compared with traditional 3D convolutional layers with an identical receptive field, our module generates considerably fewer parameters (reduced by 25%) and consumes much less memory; consequently, its inference is faster. As summarized in Table 1, our PA structure effectively decreases the EPE with only a small increase in computational complexity.
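A minimal NumPy sketch of the PA computation described above, under simplifying assumptions of our own: the two 3×3×3 convolutions of the shared network are replaced by per-position (1×1×1) channel mixing with hypothetical weight matrices `w1` and `w2`, batch normalization is omitted, and the attention map, which has no disparity axis after pooling, is broadcast along the disparity dimension before the element-wise product. It shows the data flow and shapes, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pa_module(f, w1, w2):
    """Simplified PA module.

    f  : (C, D, H, W) feature volume.
    w1 : (2C // r, 2C) hypothetical weights of the channel-reducing layer.
    w2 : (C, 2C // r)  hypothetical weights of the channel-restoring layer.
    Returns an array with the same shape as f.
    """
    f_max = f.max(axis=1)                                      # (C, H, W): max over disparity
    f_avg = f.mean(axis=1)                                     # (C, H, W): mean over disparity
    m = np.concatenate([f_max, f_avg], axis=0)                 # (2C, H, W): mixed feature map M
    hidden = np.maximum(np.einsum("oc,chw->ohw", w1, m), 0.0)  # reduce channels by r, then ReLU
    weights = sigmoid(np.einsum("oc,chw->ohw", w2, hidden))    # (C, H, W), values in (0, 1)
    return f * weights[:, None, :, :]                          # broadcast along disparity

c, r = 4, 4
rng = np.random.default_rng(0)
f = rng.standard_normal((c, 6, 5, 7))
w1 = rng.standard_normal((2 * c // r, 2 * c))
w2 = rng.standard_normal((c, 2 * c // r))
out = pa_module(f, w1, w2)
print(out.shape)  # (4, 6, 5, 7): channels and size unchanged
```

Since the output keeps the shape of `f` and every weight lies in (0, 1), the block only rescales existing features, which is what allows it to be dropped in after an existing 3D convolution without changing the surrounding layers.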

Table 1

Ablation study results of PSM-Net, GWC-Net and PA-Net on the Scene Flow dataset [8].

The results of PSM-Net [21] and GWC-Net [13] were obtained by retraining the published code with our batch size and evaluation settings for a fair comparison.

Model	Edge detection	PA module	EPE (px)	>1px (%)	>2px (%)	>3px (%)	Time (ms)
PSM-Net			0.988	10.62	6.31	5.02	246.1
PSM-Net	✓		0.955	10.27	6.10	4.85	251.5
PSM-Net		✓	0.892	10.16	6.31	4.80	259.4
GWC-Net			0.878	9.25	5.57	4.35	210.7
GWC-Net	✓		0.856	9.08	5.47	4.27	215.6
GWC-Net		✓	0.792	8.59	5.24	4.08	224.4
PA-Net (ours)	✓	✓	0.775	8.49	5.26	3.84	222.4

Our PA module changes neither the number of channels nor the size of the input features, so it can be added directly after 3D convolutional layers.

3.3. Edge detection and multi-featured cost volume

State-of-the-art disparity estimation methods work well in ordinary, clearly textured regions, where matching cues are clear and can be easily captured through the context pyramid. However, as shown in Fig 1, edge details are lost. Hence, we design an edge detection branch to help refine the disparity map. The feature extraction network with our edge detection (ED) design includes three branches (group-wise, concatenation, and edge detection branches) that share the weights of the ResNet-50 backbone, listed in Table 1. The ResNet-50 backbone has four outputs; for each output we design a branch consisting of a 3×3 convolutional layer and a 1×1 convolutional layer with batch normalization and ReLU [26], with the number of final channels set to 1 in each branch. To fuse the features from the different branches, all these feature maps are concatenated to construct the edge cost volume. Finally, the group-wise, concatenation, and ED features are fused to form a multi-featured integration cost volume. As shown in Fig 4, based on the architecture of GWC-Net [13], we add an ED branch to construct the edge volume. Unlike the concatenation volume, in which the left and right feature maps are concatenated at each disparity level, the ED volume is constructed by computing the similarity of the left and right feature maps. For each pair of edge feature maps, the edge correlation is calculated as

$$C_{edge}(d, x, y) = \frac{1}{N}\left\langle f_l^{edge}(x, y),\ f_r^{edge}(x - d, y)\right\rangle$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product, $N$ the total number of channels, and $f_l^{edge}$ and $f_r^{edge}$ the left and right edge feature maps, respectively. Finally, by combining all cost volumes, the multi-featured cost volume is determined as

$$C_{combine} = \mathrm{Concat}\{C_{gwc},\ C_{concat},\ C_{edge}\}$$

where $\mathrm{Concat}\{\cdot\}$ denotes concatenation of feature maps along the channel dimension. $C_{concat}$ and $C_{gwc}$ are calculated as in Eqs (1) and (2), respectively.
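The edge correlation and the channel-wise concatenation above can be sketched as follows. These are our assumptions: the four single-channel side outputs are stacked into an (N, H, W) edge feature map per view (N = 4), the correlation is averaged over all N channels to give a single-channel volume, and out-of-range positions (x < d) are zero-filled. The function names are hypothetical, and the group-wise and concatenation volumes are zero placeholders with the channel counts chosen in Table 5 (32 groups, 14×2 concatenation channels).

```python
import numpy as np

def edge_correlation(edge_left, edge_right, max_disp):
    """C_edge(d, x, y) = (1/N) <e_l(x, y), e_r(x - d, y)>; inputs (N, H, W), output (1, D, H, W)."""
    _, h, w = edge_left.shape
    vol = np.zeros((1, max_disp, h, w), dtype=edge_left.dtype)
    for d in range(max_disp):
        vol[0, d, :, d:] = (edge_left[:, :, d:] * edge_right[:, :, : w - d]).mean(axis=0)
    return vol

def multi_feature_volume(c_gwc, c_concat, c_edge):
    """Concatenate (C_k, D, H, W) volumes along the channel axis."""
    return np.concatenate([c_gwc, c_concat, c_edge], axis=0)

rng = np.random.default_rng(0)
edge_l = rng.random((4, 5, 9))      # four stacked 1-channel side outputs, left view
edge_r = rng.random((4, 5, 9))      # same for the right view
c_edge = edge_correlation(edge_l, edge_r, max_disp=3)
c_gwc = np.zeros((32, 3, 5, 9))     # placeholder: 32 groups
c_concat = np.zeros((28, 3, 5, 9))  # placeholder: 14 x 2 concatenation channels
print(multi_feature_volume(c_gwc, c_concat, c_edge).shape)  # (61, 3, 5, 9)
```

Under the single-channel edge assumption, the edge cue adds only one channel to the fused volume, so its cost in the subsequent 3D aggregation is small compared with the group-wise and concatenation parts.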

Fig 4

Pipeline of feature extraction network.

3.4. Output module and loss function

Our network produces four predicted disparity maps $\hat{d}_i$ ($i = 0, \dots, 3$), and to fully utilize these outputs we assign a different weight to each. For each output, we first employ two convolutional layers (3×3 and 1×1) to obtain a 1-channel output, which is then upsampled by bilinear interpolation. Finally, a softmax function over the disparity dimension is used to regress the disparity map. The disparity loss is calculated as

$$L_{disp} = \sum_{i=0}^{3} \lambda_i \cdot \frac{1}{N} \sum_{j=1}^{N} \mathrm{smooth}_{L_1}\left(\hat{d}_i^{\,j} - d_j^{*}\right)$$

where $\lambda_i$ denotes the weight of the $i$-th predicted disparity map, $N$ is the total number of pixels in one image, and $\hat{d}_i^{\,j}$ is the $j$-th element of $\hat{d}_i$ with ground truth $d_j^{*}$. The smooth L1 function is computed as

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^{2}, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

Since object edge information is conducive to the disparity prediction task, we propose an edge detection loss for end-to-end learning:

$$\ell_{edge}(x) = -\frac{1}{N} \sum_{j=1}^{N} \left[ y_j \log P(x_j) + (1 - y_j) \log\left(1 - P(x_j)\right) \right]$$

where $x_j$ and $y_j$ represent the activation value and the ground truth edge probability at pixel $j$, respectively, $P(x)$ is the standard sigmoid function, and $N$ denotes the total number of pixels in one image. Fusing the edge information extracted from different output layers, our edge loss function is formulated as

$$L_{ED} = \sum_{k=1}^{3} \beta_k\, \ell_{edge}\left(x^{(k)}\right) + \ell_{edge}\left(x^{(last)}\right)$$

where $x^{(k)}$ is the activation from stage $k$, $x^{(last)}$ denotes the last edge output, and $\beta_k$ is the weight of stage $k$ (0.2, 0.4, and 0.6 here). Because we work in a disparity prediction setting, we fuse the edge detection loss and the disparity prediction loss with a two-level loss weighting scheme, and the total loss is calculated as

$$L = \gamma_0 L_{disp} + \gamma_1 L_{ED}$$

where $\gamma_0 = 1$ is the weight of the total disparity prediction loss and $\gamma_1 = 0.1$ is the weight of the total edge detection loss.
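The loss terms above can be checked numerically with the NumPy sketch below, which uses the weights stated in the paper ($\lambda$ = 0.5, 0.5, 0.7, 1.0; $\beta$ = 0.2, 0.4, 0.6; $\gamma_0$ = 1, $\gamma_1$ = 0.1). Assumptions of ours: the disparity regression applies a softmax to per-disparity scores where larger means more likely and returns the expected disparity, no valid-pixel mask is applied, and the edge term is the plain (unbalanced) binary cross-entropy. All function names are hypothetical.

```python
import numpy as np

def regress_disparity(scores):
    """Softmax over the disparity axis of (D, H, W) scores, then the expected disparity."""
    s = scores - scores.max(axis=0, keepdims=True)
    p = np.exp(s)
    p /= p.sum(axis=0, keepdims=True)
    disparities = np.arange(scores.shape[0]).reshape(-1, 1, 1)
    return (p * disparities).sum(axis=0)

def smooth_l1(x):
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def disparity_loss(preds, gt, lambdas=(0.5, 0.5, 0.7, 1.0)):
    """L_disp: weighted mean smooth-L1 error over the four predicted maps."""
    return sum(lam * smooth_l1(p - gt).mean() for lam, p in zip(lambdas, preds))

def edge_bce(logits, target, eps=1e-7):
    """l_edge: mean binary cross-entropy with P(x) = sigmoid(x)."""
    p = np.clip(1.0 / (1.0 + np.exp(-logits)), eps, 1.0 - eps)
    return -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

def edge_loss(side_logits, last_logits, target, betas=(0.2, 0.4, 0.6)):
    """L_ED: beta-weighted side outputs plus the last edge output."""
    side = sum(b * edge_bce(x, target) for b, x in zip(betas, side_logits))
    return side + edge_bce(last_logits, target)

def total_loss(preds, gt, side_logits, last_logits, edge_gt, gamma0=1.0, gamma1=0.1):
    return gamma0 * disparity_loss(preds, gt) + gamma1 * edge_loss(side_logits, last_logits, edge_gt)

gt = np.full((2, 3), 3.0)
scores = np.zeros((8, 2, 3))
scores[3] = 50.0                       # almost all probability mass on d = 3
pred = regress_disparity(scores)
zeros = np.zeros((2, 3))
edge_gt = np.ones((2, 3))
print(round(total_loss([pred] * 4, gt, [zeros] * 3, zeros, edge_gt), 4))  # 0.1525 (= 0.1 * 2.2 * ln 2)
```

In this example the disparity term vanishes because every prediction matches the ground truth, and each edge output predicts probability 0.5, so the edge term is (0.2 + 0.4 + 0.6 + 1) · ln 2 before the 0.1 weighting.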

Experiments

In this section, we evaluate PA-Net under different settings on the Scene Flow [8] and KITTI [9,10] datasets. Sections 4.1 and 4.2 describe the experimental setup and the datasets. In Section 4.3, we conduct a series of ablation experiments with different methods to test the performance of our PA module. In Section 4.4, we add our edge detection volume to PSM-Net and GWC-Net to validate the importance of the multi-featured integration cost volume.

4.1. Experimental setup

We implemented our architectures in PyTorch. All models were trained with Adam ($\beta_1 = 0.9$, $\beta_2 = 0.99$). Owing to limited computational resources, the batch size was set to 4, and all models were trained on two Nvidia RTX 2080 Ti GPUs using 256×512 random crops from the input image pairs. The maximum disparity was set to 192, and the weights of the four outputs were set to $\lambda_0 = 0.5$, $\lambda_1 = 0.5$, $\lambda_2 = 0.7$, and $\lambda_3 = 1.0$. On the Scene Flow dataset, we trained the model for 16 epochs with an initial learning rate of 0.001, which was halved at epochs 10, 12, and 14. For the KITTI [9,10] datasets, the model pre-trained on Scene Flow [8] was fine-tuned for another 300 epochs with an initial learning rate of 0.001, which was divided by 10 after 200 epochs.
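The two step schedules can be written as a small helper. We assume 0-based epoch indices and that a decay takes effect at the start of the milestone epoch, which is the convention of PyTorch's `torch.optim.lr_scheduler.MultiStepLR` stepped once per epoch (with `milestones=[10, 12, 14]` and `gamma=0.5` it would give the Scene Flow schedule). The helper names are hypothetical.

```python
def step_lr(epoch, base_lr, milestones, factor):
    """Learning rate for a 0-based epoch: base_lr divided by `factor` per milestone reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr / factor ** reached

def scene_flow_lr(epoch):
    return step_lr(epoch, 1e-3, milestones=(10, 12, 14), factor=2)  # 16 epochs in total

def kitti_lr(epoch):
    return step_lr(epoch, 1e-3, milestones=(200,), factor=10)       # 300 epochs of fine-tuning

print([scene_flow_lr(e) for e in (0, 10, 12, 14)])  # [0.001, 0.0005, 0.00025, 0.000125]
```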

4.2. Dataset

1) Scene Flow [8]: a series of synthetic stereo datasets with three subsets, Driving, FlyingThings3D, and Monkaa, containing 35,454 training images and 4,370 test images of 540×960 pixels (height × width) with ground truth disparity maps. Because the dataset is large, models trained on it generalize well. Visualizations and comparisons on Scene Flow [8] are shown in Fig 1.
2) KITTI 2012 and KITTI 2015 [9,10]: real-world driving scene datasets whose ground truth was obtained from the 3D coordinates of points measured by LiDAR scanning. KITTI 2012 includes 194 training and 195 testing stereo pairs, and KITTI 2015 comprises 200 training and 200 testing stereo pairs. We divided the training data into two parts: 180 pairs for training and the remaining pairs for validation. In addition, we created the corresponding edge detection labels for the end-to-end edge detection task. Visualizations and comparisons are shown in Fig 5, and we submitted the predictions of PA-Net to the official KITTI website for evaluation. The comparison results on the test set are summarized in Tables 2 and 3, which show that PA-Net achieves better results than PSMNet [21], GwcNet [13] and PASMNet [25].
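The metrics used in the tables can be computed as sketched below. Assumptions: pixels count as valid when 0 < d* < 192 (KITTI stores missing ground truth as 0, and 192 is the maximum disparity used in training), and a D1 outlier follows the KITTI 2015 definition, an error larger than 3 px and larger than 5% of the true disparity. The function name is hypothetical, and benchmark numbers should come from the official KITTI development kit.

```python
import numpy as np

def stereo_metrics(pred, gt, max_disp=192):
    """EPE (px), >t px error rates (%) and KITTI-style D1 (%) over valid ground-truth pixels."""
    valid = (gt > 0) & (gt < max_disp)
    err = np.abs(pred[valid] - gt[valid])
    metrics = {"EPE": float(err.mean())}
    for t in (1, 2, 3):
        metrics[f">{t}px"] = 100.0 * float((err > t).mean())
    outlier = (err > 3.0) & (err > 0.05 * gt[valid])  # D1: > 3 px and > 5% of ground truth
    metrics["D1"] = 100.0 * float(outlier.mean())
    return metrics

gt = np.array([[10.0, 20.0, 100.0, 0.0]])   # the last pixel has no ground truth
pred = np.array([[10.5, 24.0, 104.0, 7.0]])
print(stereo_metrics(pred, gt))             # EPE ~ 2.83, >3px ~ 66.7, D1 ~ 33.3
```

The 4 px error at d* = 100 counts toward the >3px rate but not toward D1, because it is below 5% of the true disparity; this is why D1 is more forgiving than a fixed pixel threshold on distant, low-disparity-error regions with large true disparities.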

Fig 5

Results visualization and comparisons on KITTI 2015 [10].

Left column: input left image. Middle column: color disparity map and error map predicted by GWC-Net. Right column: color disparity map and error map predicted by PA-Net. As shown in the white boxes, PA-Net performs better than previous work in areas with large parallax, such as the iron railing in front of the car.

Table 2

Performance comparison of KITTI 2012 [9] test set.

The GWC-Net [13] and PSM-Net [21] are trained with the same batch size as our method for fair comparison.

Methods	>2px Noc (%)	>2px All (%)	>3px Noc (%)	>3px All (%)	>5px Noc (%)	>5px All (%)	Mean Error Noc (px)	Mean Error All (px)	Time (s)
DispNetC [8]	7.38	8.11	4.11	4.65	2.05	2.39	0.9	1.0	0.06
MC-CNN-acrt [16]	3.90	5.45	2.43	3.63	1.64	2.39	0.7	0.9	67
GC-Net [20]	2.71	3.46	1.77	2.30	1.12	1.46	0.6	0.7	0.9
iResNet [19]	2.69	3.34	1.71	2.16	1.06	1.32	0.5	0.6	0.12
PSMNet [21]	2.68	3.20	1.68	2.09	1.05	1.21	0.5	0.5	0.6
GWCNet [13]	2.21	2.88	1.40	1.81	0.85	1.11	0.5	0.5	0.32
Edge-stereo [32]	2.32	2.88	1.46	1.83	0.83	1.04	0.4	0.5	0.32
PA-Net	2.20	2.82	1.36	1.78	0.80	1.09	0.5	0.5	0.33

Table 3

Performance comparison of KITTI 2015 [10] test set.

The GWC-Net [13] and PSM-Net [21] are trained with the same batch size as our method for fair comparison.

Methods	All D1-bg (%)	All D1-fg (%)	All D1-all (%)	Noc D1-bg (%)	Noc D1-fg (%)	Noc D1-all (%)	Time (s)
DispNetC [8]	4.32	4.41	4.34	4.11	3.72	4.05	0.06
GC-Net [20]	2.21	6.16	2.87	2.02	5.58	2.61	0.9
CRL [18]	2.48	3.59	2.67	2.32	3.12	2.45	0.47
iResNet [19]	2.14	3.45	2.36	1.94	3.20	2.15	0.22
PSMNet [21]	1.98	4.87	2.32	1.71	4.51	2.24	0.41
GWCNet [13]	1.85	4.14	2.23	1.71	3.75	2.05	0.32
PASMNet [25]	2.04	4.33	2.41	1.88	3.91	2.22	0.5
Edge-stereo [32]	1.84	3.30	2.08	1.69	2.94	1.89	0.32
PA-Net	1.73	4.05	2.05	1.63	3.59	1.86	0.33

4.3. Ablation study for parallax attention module

In this section, we validate the PA module by evaluating it with different stereo matching strategies, and we conduct a series of ablation experiments to find the best number of PA modules. We selected PSM-Net [21] and GWC-Net [13] as reference models and added PA modules to them. The PA module can be used directly after 3D convolutional layers, since it changes neither the number of channels nor the feature size. The experimental results demonstrate that, at the cost of a small increase in computation time, PA-Net performs better than previous works. As summarized in Table 1, the two baseline models [13,21] are evaluated with two variables (the edge detection structure and the PA module). After adding the PA modules, the EPE on the Scene Flow dataset decreases by 9.71% for [21] and by 11.0% for [13]. Fig 6 depicts the training and validation curves of PA-Net, GWC-Net and PSM-Net on the KITTI 2015 dataset from epoch 100 to 300. The loss curve of PA-Net decreases more smoothly than those of previous works, and its performance gains are sustained throughout training. Moreover, PA-Net performs better than those works once the networks start to fit, and it ultimately achieves the highest accuracy.

Fig 6

Training curves of PA-Net, GWC-Net and PSM-Net on KITTI 2015 dataset from epoch 100 to 300.

PA-Net exhibits improved optimization characteristics and more stable training.

To select the optimal number of PA modules, we evaluated PA-Net with different numbers of PA modules, as shown in Table 4. Beyond six PA modules, the EPE no longer improves (0.790 px with 10 modules and 0.801 px with 16), while the runtime keeps increasing. Considering computation and memory consumption, we selected six PA modules for the final structure.

Table 4

EPE on the Scene Flow dataset with different numbers of PA modules.

Model	Number of PA modules	Scene Flow EPE (px)	Time (s)
PA-Net	0	0.856	0.329
	1	0.849	0.331
	4	0.816	0.334
	6	0.784	0.336
	10	0.790	0.340
	16	0.801	0.346

4.4. Ablation study for multi-featured cost volume

In this section, we analyze several critical modifications to the feature extraction network relative to [13]. Specifically, we design a multi-featured integration cost volume consisting of three parts: the ED, group-wise correlation, and concatenation cost volumes. The results in Table 1 demonstrate that adding our edge detection structure reduces the EPE: on the Scene Flow dataset, the EPE decreases by 3.34% for [21] and by 2.50% for [13] after adding the ED modules. Based on [13] and several experiments, we found that setting the number of channels of the group-wise volume to 32 gives strong performance. The results in Table 5 show that the EPE depends on the number of channels in the concatenation volume. The best EPE is 0.574 (14×2 concatenation channels) on KITTI 2015 and 0.616 (16×2 concatenation channels) on KITTI 2012. Considering both disparity accuracy and memory usage, we selected 14×2 as the final number of channels of the concatenation volume.

Table 5

Ablation study results of PA-Net with different settings on the dataset of KITTI.

Model	Concatenation channels	KITTI 2015 EPE (px)	KITTI 2012 EPE (px)	Time (ms)
PA-Net(edge = 4, group = 32)	10×2	0.612	0.633	215.7
	12×2	0.586	0.621	218.5
	14×2	0.574	0.617	222.4
	16×2	0.578	0.616	225.9
	18×2	0.575	0.619	229.4

4.5. Analysis and interpretation

While PA blocks have been empirically shown to improve network performance, we would also like to explain how the parallax attention mechanism operates in practice. To give a clearer picture of the behavior of PA blocks, in this section we take several examples from the GWC-Net model and compare the distributions of sensitive receptive regions between 3D convolutional layers and PA blocks. Their distribution maps, for models trained on KITTI 2015, are shown in Fig 7.

Fig 7

Attention distribution map in KITTI 2015.

Top row: input left image. Middle row: attention distribution map extracted by 3D convolutional layers. Bottom row: attention distribution map extracted by PA blocks.

We make the following observations about how the parallax attention mechanism works in the 3D feature extraction stage. First, a traditional 3D convolutional layer adopts a global receptive field mechanism, which guides the network to pay equal attention to different features. However, in practice, objects closer to us have a greater impact. As shown in Fig 7, people and cars in the near region deserve more attention than distant trees and houses. Yet the highlighted regions in the 3D convolutional feature maps cover every corner of the image, which does not match this observation. Second, PA blocks redistribute the values along the parallax dimension to make full use of context information, so that regions with large parallax receive a larger proportion after redistribution. In the bottom row of Fig 7, the highlighted regions are concentrated in areas with large parallax, such as roads, cyclists and cars, indicating that the network pays more attention to these areas. Through this weight redistribution along the parallax dimension, PA blocks successfully focus on objects with large parallax.

Discussion

Beyond improving the overall accuracy of disparity prediction, our method offers the following advantages. First, for the same receptive field as a traditional 3D convolutional layer, our module uses considerably fewer parameters (a 25% reduction) and consumes much less memory. Moreover, the PA module can easily be integrated into other networks because it does not change the size of the feature map. Second, as shown in Fig 1, our structure improves the accuracy of disparity prediction in near-range regions by improving 3D feature expression. Finally, to make full use of the edge information learned by the two-dimensional feature extraction network, we propose a novel edge detection branch and a multi-featured integration cost volume. Experiments with our model demonstrate that the edge detection task helps improve the accuracy of disparity estimation. As Table 2 shows, our method achieves lower two-pixel, three-pixel, and five-pixel errors than GWC-Net on the KITTI 2012 dataset. Compared with PSM-Net, the percentage of disparity outliers averaged over all ground-truth pixels (D1-all) is reduced by 11.6%, and the running speed is increased by 19.5%.
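Because the comparison above is stated in terms of n-pixel errors (KITTI 2012) and D1-all (KITTI 2015), a short sketch of how these metrics are usually computed may help readers interpret Table 2. It follows the standard KITTI definitions as we understand them: an n-pixel error is the percentage of pixels whose absolute disparity error exceeds n pixels, and a D1 outlier is a pixel whose error exceeds both 3 px and 5% of the ground-truth disparity; both are averaged over pixels with valid ground truth. Representing missing ground truth as `None` on flattened lists is an assumption made for readability (the KITTI ground-truth images encode it differently), and the function names are illustrative rather than taken from the official evaluation code.

```python
from typing import List, Optional, Sequence, Tuple


def valid_pairs(pred: Sequence[float],
                gt: Sequence[Optional[float]]) -> List[Tuple[float, float]]:
    """Keep only pixels that have ground truth (None marks a missing value)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g is not None]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return pairs


def n_pixel_error(pred: Sequence[float], gt: Sequence[Optional[float]],
                  n: float) -> float:
    """Percentage of valid pixels whose absolute disparity error exceeds n px."""
    pairs = valid_pairs(pred, gt)
    bad = sum(1 for p, g in pairs if abs(p - g) > n)
    return 100.0 * bad / len(pairs)


def d1_all(pred: Sequence[float], gt: Sequence[Optional[float]]) -> float:
    """Percentage of valid pixels whose error exceeds both 3 px and 5% of gt."""
    pairs = valid_pairs(pred, gt)
    bad = sum(1 for p, g in pairs if abs(p - g) > 3.0 and abs(p - g) > 0.05 * g)
    return 100.0 * bad / len(pairs)


# Toy flattened disparity maps; the last pixel has no ground truth.
gt = [10.0, 100.0, 50.0, None]
pred = [14.0, 104.0, 50.5, 7.0]
print(d1_all(pred, gt))            # counts pixel 0 only
print(n_pixel_error(pred, gt, 3))  # counts pixels 0 and 1
```

Pixel 1 illustrates the difference between the two metrics: its 4 px error counts toward the three-pixel error, but it is not a D1 outlier because 4 px is below 5% of its 100 px ground-truth disparity.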

Conclusions

In this paper, we proposed PA-Net, a high-precision and practical stereo matching network for end-to-end disparity prediction. Our work emphasizes that improving feature expression in near-range regions benefits the disparity prediction task. PA-Net outperforms previous networks by using the edge detection layer, the PA module, and the multi-featured cost volume. Experiments with our model demonstrate that the edge detection task helps improve the accuracy of disparity estimation. PA-Net achieves better accuracy than GWC-Net on the Scene Flow and KITTI datasets.

14 Jul 2021

PONE-D-21-20417

Parallax attention stereo matching network based on the improved group-wise correlation stereo network

PLOS ONE

Dear Dr. jinan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 28 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes.
You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,
Jyotismita Chaki, PhD
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript: "This research was funded by the National Natural Science Foundation of China (No.51875266)" We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: "The author(s) received no specific funding for this work." Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability. "Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.
Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter.

5. Please ensure that you refer to Figure 6 in your text as, if accepted, production will need this reference to link the reader to the figure.

6. We note that Figures 1,4, and 6 in your submission contain copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright. We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1,4, and 6 to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

Additional Editor Comments:

We have received comments on your paper. Reviewers' comments are appended below. Your manuscript requires revision. If you undertake the revision and resubmit the revised manuscript, I would be pleased to review your revised manuscript. Please submit a list of changes or a rebuttal against each point raised by the reviewers along with the revised manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions.
Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this paper, the authors proposed a parallax attention network for stereo matching. They also proposed an edge detection module and a multi-scale cost volume technique to further improve the accuracy. Experimental results validate the effectiveness of the proposed modules. The proposed method achieves better performance than PSMnet and GwcNet.

Major Concerns:

1) Parallax attention has been widely used for various of stereo image processing tasks, such as stereo image super-resolution [c1-c3], stereo matching [c4], stereo image dehazing [c5], and stereo color transfer [c6]. The authors are suggested to provide a comprehensive review of the recent progress of parallax attention in the related work section.

[c1] Symmetric parallax attention for stereo image super-resolution, CVPRW 2021.
[c2] A stereo attention module for stereo image super-resolution, SPL 2020.
[c3] Feedback Network for Mutually Boosted Stereo Image Super-Resolution and Disparity Estimation, arxiv 2021.
[c4] Parallax Attention for Unsupervised stereo correspondence learning, TPAMI 2020.
[c5] BidNet: Binocular image dehazing without explicit disparity estimation, CVPR 2020.
[c6] Asymmetric stereo color transfer, ICME 2021.

2) Since parallax attention has already been used for stereo matching task in [c4], the main difference between the proposed network and PASMnet [c4] should be well addressed and discussed.

Minor Concerns:

1) In table 1, the resolution of the 2nd to 4th branches of the PA module is 1/4 x 1 x 1/4H x 1/4 W x 32(or64). The first 1/4 is confusing. Is it the batch dimension?

Reviewer #2: This paper proposes a novel parallax attention method for binocular disparity estimation. The effectiveness of the proposed method is validated through extensive experiments. Comparative results also show that the proposed PA-Net is superior than PSMnet and GWCNet. I recommend an acceptance to this paper upon some minor revisions.

1) It should be noted that parallax attention mechanism has already been used for disparity estimation in PASMnet [r1]. However, the method proposed in this paper is significantly different from that in [r1]. The authors are strongly suggested to discuss the difference between their method and PASMnet.

2) Since this paper focuses on the parallax attention, some recent advances in parallax attention should be reviewed in the related work (e.g., symmetric parallax attention [r2], cross parallax attention [r3], stereo attention [r4] and so on) to make the survey of the literature more comprehensive.

3) The layout of the tables and figures in this paper need to be further improved. A table cannot be placed on two separate pages.

Refs:
[r1] Parallax attention for unsupervised stereo correspondence learning, TPAMI 2020.
[r2] Symmetric parallax attention for stereo image super-resolution, arxiv 2020.
[r3] Cross parallax attention for stereo image super-resolution, TMM 2020.
[r4] A stereo attention module for stereo image super-resolution, SPL 2020.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements.
To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

19 Jul 2021

Reviewer #1: We are very grateful to your comments for the manuscript, your suggestion is of great help to our work. According to your advice, we amended the relevant part in manuscript. We have taken the following measures to this advice: Firstly, we have changed the expression of “Despite the excellent performance of the attention module, it rarely appears in stereo matching network” in the beginning of our previous abstract to “Despite the excellent performance of the attention module, the parallax information provided by binocular images has not been fully utilized.” Secondly, we have made a comprehensive review of the recent progress of parallax attention in the introduction and related work section. We introduced the progress of recent works such as PASMnet, PASSRnet, SISRnet, etc. For more details, please check the latest manuscript. Thanks again for your advices.

Reviewer #2: We are very grateful to your comments for the manuscript, and we have made a comprehensive review of the recent progress of parallax attention in the introduction and related work section. We introduced the progress of recent works such as PASMnet, PASSRnet, SISRnet, etc. For more details, please check the latest manuscript. Thanks again for your advices.

Submitted filename: Response to Reviewers.docx

16 Sep 2021

PONE-D-21-20417R1

Parallax attention stereo matching network based on the improved group-wise correlation stereo network

PLOS ONE

Dear Dr. jinan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 30 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols.
Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Jyotismita Chaki, PhD
Academic Editor
PLOS ONE

Journal Requirements:

1. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service. Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services. If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.
Upon resubmission, please provide the following:
The name of the colleague or the details of the professional service that edited your manuscript
A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)
A clean copy of the edited manuscript (uploaded as the new *manuscript* file)

Additional Editor Comments (if provided):

Based on the reviewer comments I am requesting you to submit the revised manuscript.

Review Comments to the Author

Reviewer #1: All my concerns have been addressed by the authors. I recommend an acceptance without further revisions.

Reviewer #3: This paper presents a new network for the stereo correspondence problem. From the comparison, it does look like the proposed framework outperforms other works. The structure of the network is clearly explained. Generally, the work looks good. However, the writing of the paper is far from ready. First of all, the paper jump into the specific technique and problem directly from the beginning. There is not much background introduction and concept explanation for normal readers. If the reader (even doing computer vision research) are not familiar with this specific research, it will be very difficult to understand the work and follow the writing. I do suggest the author to use the introduction section to clearly explain the problem, instead of review some papers. Secondly, this paper made the same mistake as lots of machine learning papers. It only explains the proposed network structure then jump directly into the experiment, using the result to show the effectiveness of the new network. There is no clear explanation why this network is designed in this specific way. It is pretty much like, I try this and “luckily” the result is good. For a good academic writing, there should be a clear motivation and discussion on the design. All the decision should be clearly justified. Some other comments on the details:

1. In the abstract, the paper targets at “ill-posed areas (weak texture, repeat texture and occlusion regions)”. However in the paper, all those problems are not mentioned specifically in the experiment.

2. In the abstract, how the “novel edge detection branch and multi-scale cost volume” will “obtain finer texture features”. Is there a justification for this claim or “common sense”?

3. The English needs to be polished. For example, I am not sure the following expression are correct or not, please check: “word goal characteristic”, “aggressive universal overall performance”, “conference models”, “competitive advantage”. And there are lots of long sentences which should be broken into separate ones (for example line 124-127). And in line 217, “Which” should be merged into previous sentence. Line 309, “Table 5, demonstrate”, no comma. There are lots of small mistakes.

4. Line 77, “Our PA-Net outperformed the state-of-the-art GWC-Net [11] in ill-posed regions”, this should be put into the contribution.

5. Line 45-54 list quite a few papers, however, the logic behind this discussion is not clear. How these literature review helps the overall argument.

6. Line 135-136 should have an overall introduction about the whole methodology.

7. Table 2 needs more detailed discussion before concluding into the listed benefit.

8. Line 257-259, what do you mean “challenging regions”. Line 265, Figure 6 should be Figure 5.

9. Line 292, Figure 5 should be Figure 6

10. Line 305 and Table2 should be consistent, model [10] or [2]?

11. The reference format should be consistent. The last few doesn’t have page no, authors’ name are also not in the right format. Please check carefully.

Reviewer #4:

1. Line 73. Achieving competitive results on the dataset is not a major contribution.

2. Figure 1 is too far away from where is first referred. The proposed architecture is quite simple, it’s not clear what is the proposed novelty.
The figure refers to the PA-Net abbreviation which can’t be found inside the figure.

3. Please extend the argumentation in the design choice of the architecture.

4. Table 1 is extremely long. It could have been replaced by a better design of Figures 1, 2, 3.

5. My general conclusion is that the manuscript is well prepared and there are a few small ideas that are proposed. However, the overall results are just competitive, i.e. quite similar with the state-of-the-art. Therefore, I would prefer to see an extended discussion on why such a small gain is achieved. Why would someone choose the proposed method and not other state-of-the-art method. What are the main advantages of employing PA-NET?

Reviewer #5: The primary contribution of the paper is an attention module that takes into account some of the available 3D disparity information. It builds on the work of several authors, most heavily on Kendall et al. 2017 who introduced a “Cost Volume” feature description that attempted to create a 3D volume from stereo information by concatenating features from the left image with features from the right image at some image location shifted by the depth. This produced a Depth x Height x Width x Features tensor. Here authors add an attention network that uses this “Cost Volume” to drive attention, but not directly. They first collapse the depth dimension to two values, the mean and max. They show that this feature is an improvement on previous methods on the KITTI dataset and also show its general applicability to two other datasets. I have not been able to fully understand all of the details, especially the edge detection feature. I do not feel confident that a reader would be able to reproduce this part of the work from the details given. However the analysis performed is logical and fairly thorough, the results appear sound and could support the authors conclusions if the methodological issues were cleared up (also with the important caveats below).
Therefore I strongly believe that this paper is not ready for publication but believe it can be ready with some minor but important modifications.

Major issues. The paper is very hard to follow (not unusual for a deep learning paper with so many details to the implementation). The main issue seems to be that the authors are not using the terminology in the papers they cite. The authors may have a good reason for this but it would be helpful if they either revised their paper with matching terminology or alerted the reader to the difference. “concatenation cost, group-wise correlation [11], and our proposed edge detection volumes (details in Section 3.3).” What is the “concatenation cost” or group-wise correlation? The paper cited only contains a “Cost Volume”. Section 3.3 implies that the concatenation cost is equivalent to the “Cost Volume”. I am still in the dark as to what the group-wise correlation refers to. As said in the introduction I could not follow the edge detection work. Why is the edge branch called an edge branch? Did you run an edge detector on the images before passing the images into the network? As written it looks like you ran two slightly different networks in parallel. Adding an edge detection module appears to degrade the performance of the PA module (Table 2) can you comment on this?

Comments on the main claims (Abstract) “Particular, we advocate for a parallax attention module in three dimensional (disparity, height and width) level that aims to aggregate variable disparity values. “ As I understand it your system does not in fact use the entire depth range (or a single disparity value) but two values derived from this information. “Meanwhile, finding correct correspondences in ill-posed areas (weak texture, repeat texture, and occlusion regions) remains an arduous task. “ How does your system address this problem? All your results are based on errors across an entire image not selected areas that are ill-posed.
“To reinforce the precision of the disparity prediction in these challenging areas, in the present study, we propose a parallax attention stereo matching algorithm based on the improved group-wise correlation stereo network to learn the disparity content from a stereo correspondence. “ You don’t propose a new group-wise correlation, you use a pre-existing group-wise correlation method and extend it with an attention mechanism. You repeatedly use the term “multi-scale” to describe your algorithm. I have always understood the term multi-scale to mean combining images of different sizes (at least in this context). What you are developing appears more “multi-featured” than “multi-scale”.

Minor issues. “In recent years, researchers have attempted to combine the human attention mechanism with a CNN to enhance the performance of feature extraction” The actual human attention mechanism is still not understood, and is probably very different from the systems in machine learning. Can you alter this to say something like “human-inspired” Update citations in tables (e.g. table 2, 3) Table 2: 1027 should be 10.27 ?

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org.
Please note that Supporting Information files do not need this step.

14 Oct 2021

Thank you for giving us the opportunity to submit a revised draft of the manuscript “Parallax attention stereo matching network based on the improved group-wise correlation stereo network” for publication in the Journal of “PLOS ONE”. We appreciate the time and effort that you and the reviewers dedicated to providing feedback on our manuscript and are grateful for the insightful comments and valuable improvements to our paper. We have studied comments carefully and have made correction which we hope meet with approval. Revised portion are marked in red in the paper.

Submitted filename: Response to Reviewers.docx

12 Nov 2021

PONE-D-21-20417R2

Parallax attention stereo matching network based on the improved group-wise correlation stereo network

PLOS ONE Dear Dr. Jinan, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Dec 27 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Jyotismita Chaki, PhD Academic Editor PLOS ONE Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. 
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #4: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #3: Yes Reviewer #4: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #3: Yes Reviewer #4: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #3: Yes Reviewer #4: Yes ********** 5. 
Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #3: No Reviewer #4: No ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #3: Thanks to the authors for making the revision following our previous comments. Some of the problems are very well addressed. However, there are still many problems in the writing, not only typos or grammar but also the logic behind the writing. I strongly recommend that the authors get some help. Some examples are listed here: 1. The first part of the abstract should address the background, research problem and motivation. However, the logic that the authors follow is that parallax information and edge contour information are not fully used. This shouldn’t be the research problem. The target is to improve the precision of disparity prediction. The authors should discuss the current status of accuracy and how bad it is. The parallax and edge information is the methodology that the authors chose to exploit. Research shouldn’t target merely experimenting with some methods; methods only give you the route to solving the problem, not the problem itself. Other parts of the abstract and introduction section also follow this logic and should be revised. 2. In the introduction section, “However, complicated manual production steps limit their improvement.” This statement concludes too quickly. 
It needs more detail (discussion) to justify why and where these methods are not good; probably one more sentence will be enough. The same applies to “Secondly, adopting the strategy of global attention, without increasing attention to important areas.” Why this is not good is also missing from the logic of the discussion. 3. In section 2.4, the sentence “Observation that edge detection project is conducive to improve the accuracy of disparity estimation.” is not linked clearly with the adjacent part. Also, in the last sentence of the same paragraph, the authors meant to criticise Song’s work, but the sentence only mentions multi-task learning instead of integration; how does this become a problem? One more sentence could conclude this statement, but it is missing. For this literature review section, discussion is very important, and it should have a clear logical link, end to end, landing on the problems that this paper is trying to solve. 4. Section 3.3, “where matching cues are clear can be easily captured through the context pyramid.” 5. Section 3.3, Figure 4 caption, “Pipeline of feature extraction network. Which includes three branches (edge detection, group-wise, and concatenation feature branches).” 6. Figure 2, “Edge detection voume”, typo problem. It is impossible for me to give a comprehensive list of problems in the writing. The authors should carefully check every detail of the draft. Certainly the research work is good, and it is worth taking the time to polish the writing to achieve the level of publication in this journal. Reviewer #4: The authors answered all my comments; however, I would recommend further polishing the text and adding the corresponding punctuation after the equations. In the future, I would recommend that the authors answer each reviewer’s comment, not just mention that the manuscript was modified somehow, e.g.: “We have made the latest changes in the method part.” 
Please add an answer to the comment and clearly present what was modified in the manuscript, not just refer to the manuscript! ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #3: No Reviewer #4: Yes: Ionut Schiopu

15 Nov 2021 Thank you for giving us the opportunity to submit a revised draft of the manuscript “Parallax attention stereo matching network based on the improved group-wise correlation stereo network” for publication in PLOS ONE. We appreciate the time and effort that you and the reviewers dedicated to providing feedback on our manuscript and are grateful for the insightful comments and valuable improvements to our paper. We have studied the comments carefully and have made corrections which we hope meet with approval. Revised portions are marked in red in the paper. Submitted filename: Response to Reviewers.docx 5 Jan 2022

PONE-D-21-20417R3

Parallax attention stereo matching network based on the improved group-wise correlation stereo network

PLOS ONE Dear Dr. Jinan, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Feb 19 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. We look forward to receiving your revised manuscript. Kind regards, Jyotismita Chaki, PhD Academic Editor PLOS ONE Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #6: (No Response) ********** 2. 
Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #6: Partly ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #6: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #6: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #6: No ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #6: The paper proposes a stereo matching framework that incorporates a disparity prediction model, an edge detection layer, a PA module, and a multi-featured cost volume. The authors place strong emphasis on improving feature expression in near-range regions to help the disparity prediction task. Main Review: Strengths: The proposed method improves the accuracy of the disparity estimation task. Weaknesses: -Major concerns: 1. The only difference between the proposed method and GWC-Net is the PA module, but is it absolutely necessary? According to the qualitative results shown in the main text, it is hard to interpret why the proposed PA module is effective, since it only brings marginal improvements. 2. The visual comparison is not enough to support the superiority of the proposed method. 3. The technical contributions of this paper are unclear. In the last few paragraphs, I think the logical relationship is not clearly explained, which leads to confusion when reading. I would like to see a description of why it works, as well as the innovations and differences from other related methods. -Minor concerns: 1. Visualization of the results. All the visualizations are relatively low resolution, which is confusing and makes it hard to compare across methods and against GT. 2. In the paper, ‘our theory’ and ‘in this study’ appear several times; please replace them with ‘our method’ and ‘in this paper’. 3. In the summarization of contributions, for the third contribution, revise it to: Our PA-Net achieves the accuracy of ()% on Scene flow dataset and ()% on KITTI 2015 dataset, which outperforms other methods by ()%. 4. The paper also has some minor issues in writing in the Introduction section: -whose goal is to -> which aims at computing -“Learning-based stereo-matching methods through exploring feature representations and aggregation algorithms for matching costs”, this sentence is not complete. 
- due to the limited learning features-> due to limited learning capability - the most easily recognized and utilized-> the most easily recognized -Avoid using their -However, their methods regard….as a multi-task learning project, in this way …. effective mechanism to fuse them-> However, these methods regard disparity prediction and edge detection as a multi-task learning project. Yet, features learned in such multi-task pipelines cannot be fully exploited, which poses a great need for an effective fusion mechanism. -In the autonomous driving task-> In the context of autonomous driving, -Thus, it requires the disparity estimation model to provide more attention to this region.->To address this problem, more attention should be assigned to this kind of region in the disparity estimation model. -In this study->In this paper -we demonstrate that by designing a high-quality and efficient module for stereo matching…->we propose a high-quality and efficient module for stereo matching and our method achieves better performance on SceneFlow and KITTI than previous methods. -It is demonstrated that based on our parallax attention stereo matching network->It is demonstrated that our parallax attention stereo matching -edge detection task is conducive to improve the accuracy of disparity estimation? Please make the sentence more understandable. 5. Section 3.2 -“However, each of the learned 3D filters with a local field that the output feature map U is unable to learn contextual information.” The whole sentence is not clear, please reorganize it. If I am understanding right. How about revising as the following: However, 3D filters learned within a local field that lacks contextual information in the output feature map U. 
-when in the c-th chanel->with regard to the c-th channel -We can treat M_u as a collector of..->The mixed feature map M_u can be treated as a collector of the local disparity texture information, and its function is to describe the entire parallax image -“Subsequently, the descriptors” You previously referred to this as “a feature map”, which is inconsistent with ‘descriptors’ used here. Please avoid confusion like this in the whole manuscript. Also, “Subsequently” occurs twice in a paragraph. Replace the second one with “additionally”. -Our benefits can be summarized as follows-> our contribution can be summarized as follows -shorter->faster -“increase the performance of end-point-error” increase? Is a higher end-point-error better? -Our proposed PA module-> Our PA module -So we can add it directly to 3D convolution layers->, which can be added directly to 3D convolution layers 6. Section 3.3 -ED features were fused-> ED features are fused -Please elaborate the equation of group-wise cost volume after equation 6. 7. Section 4.1 -It decreased 10 times->it is down-scaled by 10 when exceeding 200 epochs -In the section of discussion, please revise “our method performance” to “our method performs...” -Table 2 caption, “..are trained in our batch-size”->..are trained with the same batch size as our method. Please check through the whole manuscript. -Table 3 caption, citation [10] should follow the word “test” 8. Section 4.3 - “the PA module techniques”, is PA module a technique? - because->since - Our proposed PA-Net performance is better-> our PA-Net performs better - To select the proper number of->To select an optimal value of.. 9. Section 4.4 - A collection of->several Summary Of The Review: Based on these observations, I believe this paper is slightly below the acceptance threshold, and I suggest a re-submission to PLOS ONE. ********** 7. 
PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #6: No 12 Jan 2022 We are very grateful for your comments on the manuscript. To demonstrate the necessity of the PA module, we added a new section (Section 4.5, Analysis and Interpretation) showing that the PA blocks perform better than previous works. Moreover, we added a new figure (Fig 7, “Attention distribution map in KITTI 2015”), which explains how the parallax attention mechanism operates in practice and why it is effective. Submitted filename: Response to Reviewers.docx
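Reviewer #6 asked the authors to elaborate the group-wise cost volume equation, and an earlier reviewer pointed out that the group-wise correlation itself is pre-existing (GwcNet) and is extended here with an attention mechanism. As a hedged illustration only, the sketch below builds a GwcNet-style group-wise correlation volume and then applies a squeeze-and-excitation-style channel gate over the disparity, height and width axes. The gate is one plausible reading of the Section 3.2 wording quoted by the reviewer (a per-channel “collector” of disparity texture information, “descriptors”, “the c-th channel”); it is not the authors' code. The function names, the zero fill for pixels without a valid match, the weights `w1`/`w2`, and the use of NumPy are all assumptions.

```python
import numpy as np


def groupwise_correlation_volume(feat_left, feat_right, num_groups, max_disp):
    """GwcNet-style group-wise correlation (illustrative sketch, not the authors' code).

    feat_left, feat_right: (C, H, W) feature maps; C must be divisible by num_groups.
    Returns a (num_groups, max_disp, H, W) volume. For disparity d, left pixel x is
    compared with right pixel x - d; columns x < d have no match and stay zero.
    """
    c, h, w = feat_left.shape
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = c // num_groups
    # Split the channels into consecutive groups: (G, C/G, H, W).
    left = feat_left.reshape(num_groups, per_group, h, w)
    right = feat_right.reshape(num_groups, per_group, h, w)
    volume = np.zeros((num_groups, max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        if d == 0:
            volume[:, 0] = (left * right).mean(axis=1)
        elif d < w:
            # Mean over the channels of each group = (N_g / N_c) * inner product.
            volume[:, d, :, d:] = (left[..., d:] * right[..., :-d]).mean(axis=1)
    return volume


def channel_gate_3d(volume, w1, w2):
    """Squeeze-and-excitation-style gating over a (C, D, H, W) volume (assumed design).

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    """
    z = volume.mean(axis=(1, 2, 3))             # squeeze: one descriptor per channel
    s = np.maximum(w1 @ z, 0.0)                 # first projection + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))      # second projection + sigmoid, in (0, 1)
    return volume * gate[:, None, None, None]   # rescale every channel of the volume


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fl = rng.standard_normal((8, 4, 6))
    fr = rng.standard_normal((8, 4, 6))
    vol = groupwise_correlation_volume(fl, fr, num_groups=4, max_disp=3)
    w1 = rng.standard_normal((2, 4))
    w2 = rng.standard_normal((4, 2))
    out = channel_gate_3d(vol, w1, w2)
    print(vol.shape, out.shape)  # (4, 3, 4, 6) (4, 3, 4, 6)
```

Averaging within each group gives one correlation channel per group and disparity, so the volume stays much narrower than the raw feature width while keeping more matching evidence than a single full correlation; the gate then reweights those channels using statistics pooled over the whole disparity, height and width extent, which a local 3D filter alone cannot see.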
26 Jan 2022 Parallax attention stereo matching network based on the improved group-wise correlation stereo network PONE-D-21-20417R4 Dear Dr. Jinan, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Jyotismita Chaki, PhD Academic Editor PLOS ONE Additional Editor Comments (optional): I am happy to inform you that reviewers are satisfied with the revised manuscript. Thus the manuscript is provisionally accepted for publication. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. 
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #6: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #6: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #6: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #6: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. 
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #6: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #6: The text deserves to be proofread again to correct typos before publishing. Here are some typos in the re-submitted manuscript: 1: In section 4.3, "we can easily observation.." --> we can easily observe.. 2: Fig 6 caption, "PA-Net exhibit"--> PA-Net exhibits A careful reading and grammar check would definitely increase the quality of the paper. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #6: No 31 Jan 2022 PONE-D-21-20417R4 Parallax attention stereo matching network based on the improved group-wise correlation stereo network Dear Dr. Gu: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. 
For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Jyotismita Chaki Academic Editor PLOS ONE

3 in total

1. Stereo processing by semiglobal matching and mutual information.

Authors: Heiko Hirschmüller
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2008-02 Impact factor: 6.226

2. Efficient human pose estimation from single depth images.

Authors: Jamie Shotton; Ross Girshick; Andrew Fitzgibbon; Toby Sharp; Mat Cook; Mark Finocchio; Richard Moore; Pushmeet Kohli; Antonio Criminisi; Alex Kipman; Andrew Blake
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2013-12 Impact factor: 6.226

3. Parallax Attention for Unsupervised Stereo Correspondence Learning.

Authors: Longguang Wang; Yulan Guo; Yingqian Wang; Zhengfa Liang; Zaiping Lin; Jungang Yang; Wei An
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2022-03-04 Impact factor: 6.226
