3D Focus Image Stacks
We developed a novel multi-view image noise reduction algorithm for camera arrays using a data structure called 3D focus image stacks (3DFIS), with which the disparity map of the target view can be efficiently constructed and multi-view denoising can be performed. The 3DFIS is constructed by shifting each view with respect to the target view by an amount determined by the camera position (s, t) and a candidate disparity value d. The mathematical formulation of the 3DFIS, denoted F_d, can be expressed as the following equation:
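$$F_d(x, y, k) = I_k(x + s_k d,\ y + t_k d), \qquad k = 1, \dots, K,$$

where I_k denotes the k-th view, (s_k, t_k) is the position of camera k relative to the target camera, and d is the candidate disparity (the sign of the shift depends on the chosen coordinate convention).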
Each surface point in 3D space is projected onto each image, forming corresponding points that are separated by (sd, td). In other words, if a pixel (x, y) has the correct disparity value d, its corresponding points are shifted to the same position in the stack F_d; such a point is called an in-focus pixel. Conversely, if the views are shifted by some other amount d', these points land at different positions in F_{d'}, causing an out-of-focus effect. This process is illustrated in Fig. 1. A visual representation of the 3DFIS, obtained by averaging each stack along the view dimension, is shown in Fig. 2.
Fig. 1 Illustration of 3DFIS construction using a three-view system.
Fig. 2 Visualization of 3D focus image stacks.
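As a concrete illustration, the following Python sketch builds one slice of the 3DFIS by translating each view according to its camera offset. The function name and the integer-pixel shift are our simplifications; a practical implementation would handle image borders and sub-pixel shifts more carefully.

```python
import numpy as np

def build_3dfis_slice(views, offsets, d):
    """Build the 3DFIS slice F_d: one aligned stack for candidate disparity d.

    views   : list of K grayscale images, each of shape (H, W)
    offsets : list of K camera positions (s_k, t_k) relative to the target view
    d       : candidate disparity value

    Returns an (H, W, K) array in which, for pixels whose true disparity
    is d, all K views agree (the "in-focus" condition).
    """
    stack = np.empty(views[0].shape + (len(views),), dtype=np.float64)
    for k, (img, (s, t)) in enumerate(zip(views, offsets)):
        # Shift view k by (s*d, t*d); np.roll is used for brevity, so a
        # real implementation would crop or mask the wrapped-around border.
        dx, dy = int(round(s * d)), int(round(t * d))
        stack[:, :, k] = np.roll(img, shift=(dy, dx), axis=(0, 1))
    return stack

# The full 3DFIS is one such slice per candidate disparity:
# fis = {d: build_3dfis_slice(views, offsets, d) for d in candidate_disparities}
```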
Disparity Map Estimation
Disparity map estimation is a preliminary yet crucial step before the actual denoising, since the disparity map determines which slice of the 3DFIS is extracted for each pixel during the denoising process. Using the 3DFIS, we proposed a robust disparity map estimation algorithm with texture-based view selection and a patch-size variation strategy.
Our disparity estimation is based on a local stereo matching algorithm, and we propose a robust matching cost C for each pixel (x, y) and each candidate disparity d, as shown in the following equation:
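$$C(x, y, d) = \frac{1}{NK} \sum_{(u,v) \in W(x,y)} \sum_{k=1}^{K} \left( F_d(u, v, k) - \bar{F}_d(u, v) \right)^2, \qquad \bar{F}_d(u, v) = \frac{1}{K} \sum_{k=1}^{K} F_d(u, v, k),$$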
where W(x, y) is a window centered at (x, y), N is the total number of pixels in W, and K is the number of views. This matching cost, the cross-view variance of the stack averaged over the window, is robust to noise, but it is vulnerable to occlusion and produces artifacts in flat regions.
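A minimal sketch of this cost evaluation, reusing build_3dfis_slice from above with a simple winner-take-all selection over candidate disparities (the full algorithm additionally adapts the patch size and the set of views, as described next):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def disparity_wta(views, offsets, candidates, window=7):
    """Winner-take-all disparity: for each pixel pick the d minimizing C(x, y, d)."""
    costs = []
    for d in candidates:
        stack = build_3dfis_slice(views, offsets, d)         # (H, W, K)
        variance = stack.var(axis=2)                         # per-pixel variance across the K views
        costs.append(uniform_filter(variance, size=window))  # average over the window W(x, y)
    costs = np.stack(costs)                                  # (D, H, W) cost volume
    return np.asarray(candidates)[np.argmin(costs, axis=0)]
```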
In response, we proposed a view selection scheme to handle the occlusion problem and a patch-size variation scheme to reduce artifacts in flat regions. The key is to locate the regions affected by occlusion and the flat areas. A texture map, shown in Fig. 3, is estimated from the 3DFIS, and we derive the patch size L and the number of views V from the texture values through pre-defined linear equations.
Fig. 3 (a) Noisy image; (b) texture map.
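As a sketch of how texture values could drive this mapping (the coefficients, ranges, and even the direction of the linear relations below are illustrative placeholders, not the pre-defined equations used in our experiments):

```python
import numpy as np

def patch_size_and_views(texture, K, L_min=5, L_max=15, V_min=3):
    """Map a texture value in [0, 1] to a patch size L and a view count V.

    Illustrative convention: low texture (flat region) -> larger patch and
    more views; high texture (near depth edges, where occlusion is likely)
    -> smaller patch and fewer views. The actual linear equations and their
    constants are defined in the paper, not here.
    """
    L = int(np.clip(L_max - (L_max - L_min) * texture, L_min, L_max))
    L |= 1  # keep the window odd so it stays centered on the pixel
    V = int(np.clip(K - (K - V_min) * texture, V_min, K))
    return L, V
```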
We applied the variable patch size L and number of views V in the matching cost C, and the resulting disparity map shows a clear improvement over existing algorithms, as shown in Fig. 4.
Fig. 4 Comparison of disparity estimation methods: (a) Taniai et al.; (b) Lee et al.; (c) Klaus et al.; (d) Miyata et al.; (e) Zhou et al.; (f) proposed.
Multi-view Denoising Algorithm
With the 3DFIS and the disparity map, we can denoise the target image by searching for similar patches and performing the denoising on the resulting patch groups. To improve the noise robustness of the search, we proposed a 3DFIS-based similarity measure that takes the entire patch volume across views, rather than a single patch, into account. Fig. 5 compares different search metrics, with red boxes indicating the reference patch and green boxes indicating the similar patches found.
Fig. 5 Similar patch searching using different metrics: (a) block matching (clean); (b) block matching (noisy); (c) Zhang et al.; (d) proposed.
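A sketch of this volume-based distance, assuming an (H, W, K) stack extracted at the estimated disparity (function and parameter names are ours):

```python
import numpy as np

def volume_distance(stack, p, q, half=3):
    """Distance between two patches using the whole (2*half+1)^2 x K volume.

    stack : (H, W, K) 3DFIS slice at the estimated disparity
    p, q  : (row, col) centers of the reference and candidate patches

    Averaging the squared difference over all K views suppresses the noise
    term, which is what makes this search more robust than matching two
    single noisy patches.
    """
    (pr, pc), (qr, qc) = p, q
    a = stack[pr - half:pr + half + 1, pc - half:pc + half + 1, :]
    b = stack[qr - half:qr + half + 1, qc - half:qc + half + 1, :]
    return float(np.mean((a - b) ** 2))
```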
Meanwhile, for each pixel, instead of always using the window centered on that pixel, we proposed a depth-guided adaptive window selection that picks the candidate window yielding the minimum matching error. As indicated in Fig. 6, five candidate windows in total are considered for the reference pixel. Among them, windows #2 and #4 are eliminated due to incorrect disparity, and window #1, which has the smallest matching error among the remaining candidates, is chosen as the final window for the pixel.
Fig. 6 (a) sample pixel and five windows; (b)-(f) patch vectors across views and RMSE plots for j = 0, 1, 2, 3, 4.
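A minimal sketch of the selection logic, assuming helper callables for the matching error and the disparity-consistency test (both hypothetical names):

```python
def select_window(candidates, match_error, disparity_consistent):
    """Depth-guided adaptive window selection.

    candidates           : candidate windows covering the pixel
                           (e.g. the five windows of Fig. 6)
    match_error          : window -> RMSE of the patch vectors across views
    disparity_consistent : window -> False if the window straddles a depth
                           boundary (incorrect disparity), True otherwise

    Windows with inconsistent disparity are discarded first; among the
    survivors, the one with the smallest matching error is chosen.
    """
    valid = [w for w in candidates if disparity_consistent(w)]
    if not valid:  # fall back to all candidates if every test fails
        valid = candidates
    return min(valid, key=match_error)
```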
Finally, after the similar patches are grouped, a low-rank minimization scheme is applied to remove the noise. Each pixel is covered by multiple denoised patches, and its value in the denoised image is obtained by averaging all the denoised patches that cover it.
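A generic instance of such a low-rank step is singular-value soft-thresholding of the patch-group matrix; the shrinkage rule below is a standard choice, not necessarily the exact weighting used in our scheme:

```python
import numpy as np

def lowrank_denoise(group, tau):
    """Denoise a patch group by soft-thresholding its singular values.

    group : (n_patches, patch_dim) matrix whose rows are similar patches;
            similarity makes it approximately low rank, while noise spreads
            energy over all singular values.
    tau   : threshold, typically proportional to the noise level sigma.
    """
    u, s, vt = np.linalg.svd(group, full_matrices=False)
    s = np.maximum(s - tau, 0.0)  # shrink small (noise-dominated) components
    return (u * s) @ vt

# Aggregation: each pixel averages all denoised patches covering it, e.g.
# by accumulating a value sum and a weight sum per pixel location:
# image[ys, xs] = value_sum[ys, xs] / weight_sum[ys, xs]
```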
The comparison of our proposed denoising algorithm with other methods is shown in Table 1.
σ = 20

| Image | NLM | BM3D | WNNM | Miyata et al. | VBM4D | Zhou et al. | Proposed |
|---|---|---|---|---|---|---|---|
| Tsukuba | 29.13 | 31.41 | 31.72 | 28.36 | 30.41 | *32.43* | **33.62** |
| Bicycle | 27.15 | 28.54 | 28.31 | 27.08 | 30.42 | *30.53* | **31.55** |
| Dishes | 28.61 | 29.79 | 30.44 | 28.22 | 30.97 | *32.59* | **33.66** |
| Knight | 28.16 | 29.87 | 30.61 | 28.62 | 31.54 | *32.35* | **33.57** |
| Medieval | 29.59 | 30.99 | 31.05 | 29.33 | 32.67 | *32.70* | **33.55** |
| Sideboard | 26.01 | 27.53 | 28.59 | 26.87 | 29.27 | *30.40* | **31.28** |
| Tarot | 25.11 | 26.38 | 26.91 | 25.53 | 28.07 | *29.73* | **30.73** |

σ = 30

| Image | NLM | BM3D | WNNM | Miyata et al. | VBM4D | Zhou et al. | Proposed |
|---|---|---|---|---|---|---|---|
| Tsukuba | 27.19 | 29.25 | 29.47 | 25.78 | 28.22 | *29.91* | **30.82** |
| Bicycle | 25.22 | 26.21 | 26.32 | 24.93 | 28.33 | *28.49* | **29.41** |
| Dishes | 26.40 | 27.51 | 28.19 | 25.86 | 28.42 | *30.01* | **31.20** |
| Knight | 25.93 | 27.55 | 28.09 | 26.20 | 29.25 | *29.85* | **30.97** |
| Medieval | 27.49 | 29.47 | 29.59 | 26.85 | *30.67* | 30.46 | **31.58** |
| Sideboard | 23.88 | 25.18 | 26.16 | 24.90 | 26.93 | *28.01* | **28.91** |
| Tarot | 23.07 | 23.82 | 24.30 | 23.36 | 25.64 | *27.51* | **28.39** |

σ = 40

| Image | NLM | BM3D | WNNM | Miyata et al. | VBM4D | Zhou et al. | Proposed |
|---|---|---|---|---|---|---|---|
| Tsukuba | 25.33 | 27.73 | *27.92* | 24.16 | 26.71 | 27.74 | **28.51** |
| Bicycle | 23.64 | 24.75 | 24.97 | 23.34 | *26.86* | 26.76 | **27.88** |
| Dishes | 24.59 | 25.53 | 26.59 | 24.08 | 26.65 | *27.97* | **29.34** |
| Knight | 24.09 | 25.74 | 26.39 | 24.41 | 27.58 | *27.71* | **29.11** |
| Medieval | 25.89 | 28.28 | 28.34 | 24.99 | *29.20* | 28.49 | **30.11** |
| Sideboard | 22.29 | 23.48 | 24.33 | 23.25 | 25.28 | *26.24* | **27.24** |
| Tarot | 21.40 | 21.95 | 22.68 | 21.87 | 23.85 | *25.50* | **26.58** |

σ = 50

| Image | NLM | BM3D | WNNM | Miyata et al. | VBM4D | Zhou et al. | Proposed |
|---|---|---|---|---|---|---|---|
| Tsukuba | 24.04 | 26.54 | **26.80** | 22.83 | 25.54 | 25.82 | *26.60* |
| Bicycle | 22.47 | 23.69 | 23.90 | 22.09 | *25.72* | 25.57 | **26.64** |
| Dishes | 23.26 | 24.57 | 25.43 | 22.70 | 25.36 | *26.20* | **26.24** |
| Knight | 22.83 | 24.45 | 25.01 | 22.89 | *26.31* | 25.83 | **27.47** |
| Medieval | 24.76 | 27.50 | 27.48 | 23.53 | *28.02* | 26.81 | **28.77** |
| Sideboard | 21.22 | 22.44 | 23.12 | 21.74 | 24.05 | *24.81* | **26.01** |
| Tarot | 20.20 | 20.80 | 21.47 | 20.64 | 22.50 | *23.88* | **25.14** |
Table 1. Denoising performance (PSNR in dB) compared with other methods (best – bold; second best – italic).
References
S. Zhou, Z. Lou, Y. H. Hu, and H. Jiang, “Multiple view image denoising using 3D focus image stacks,” Computer Vision and Image Understanding, to appear, 2018.
M. Miyata, K. Kodama, and T. Hamamoto, “Fast multiple-view denoising based on image reconstruction by plane sweeping,” in Proc. IEEE Vis. Commun. Image Process., Dec. 2014, pp. 462-465.
L. Zhang, S. Vaddadi, H. Jin, and S. Nayar, “Multiple view image denoising,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., Jun. 2009, pp. 1542-1549.
S. Zhou, Y. H. Hu, and H. Jiang, “Patch-based multiple view image denoising with occlusion handling,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2017, pp. 1782-1786.
T. Taniai, Y. Matsushita, Y. Sato, and T. Naemura, “Continuous stereo matching using local expansion moves,” arXiv preprint arXiv:1603.08328, Mar. 2016.
S. Lee, J. H. Lee, J. Lim, and I. H. Suh, “Robust stereo matching using adaptive random walk with restart algorithm,” Image and Vision Computing, vol. 37, 2015, pp. 1-11.
A. Klaus, M. Sormann, and K. Karner, “Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure,” in Proc. 18th IEEE Int. Conf. Pattern Recognit., vol. 3, 2006, pp. 15-18.
A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., Jun. 2005, pp. 60-65.
K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, 2007, pp. 2080-2095.
S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2014, pp. 2862-2869.
M. Maggioni, G. Boracchi, A. Foi, and K. Egiazarian, “Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms,” IEEE Trans. Image Process., vol. 21, no. 9, 2012, pp. 3952-3966.