LTM-NeRF: Embedding 3D Local Tone Mapping in HDR Neural Radiance Field

Xin Huang1 Qi Zhang2 Ying Feng2

Hongdong Li3 Qing Wang1

1Northwestern Polytechnical University   2Tencent AI Lab   3The Australian National University

IEEE T-PAMI 2024

Our LTM-NeRF incorporates a Camera Response Function (CRF) module and a Neural Exposure Field that work seamlessly with NeRF. Using only (a) LDR views captured under different exposure settings as supervision, LTM-NeRF reconstructs an HDR neural radiance field for HDR view rendering. It can also directly produce (b) locally tone-mapped views, or (c) LDR views (globally tone-mapped via the CRF) under a variety of exposure settings.

Abstract

Recent advances in Neural Radiance Fields (NeRF) have provided a new geometric primitive for novel view synthesis. High Dynamic Range NeRF (HDR NeRF) can render novel views with a higher dynamic range. However, effectively displaying the scene contents of HDR NeRF on diverse devices with limited dynamic range poses a significant challenge. To address this, we present LTM-NeRF, a method designed to recover HDR NeRF and support 3D local tone mapping. LTM-NeRF allows for the synthesis of HDR views, tone-mapped views, and LDR views under different exposure settings, using only the multi-view multi-exposure LDR inputs for supervision. Specifically, we propose a differentiable Camera Response Function (CRF) module for HDR NeRF reconstruction, globally mapping the scene's HDR radiance to LDR pixels. Moreover, we introduce a Neural Exposure Field (NeEF) to represent the spatially varying exposure time of an HDR NeRF to achieve 3D local tone mapping, for compatibility with various displays. Comprehensive experiments demonstrate that our method can not only synthesize HDR views and exposure-varying LDR views accurately but also render locally tone-mapped views naturally.

Methodology

Overview of LTM-NeRF. LTM-NeRF is optimized in two stages. In stage one, an HDR NeRF is recovered by integrating a NeRF-based framework with the CRF network, supervised by multi-view multi-exposure images. After this stage, the learned HDR NeRF and CRF are frozen. In stage two, a Neural Exposure Field (NeEF) is introduced to represent the spatially varying exposure time of the HDR NeRF and achieve 3D tone mapping. To optimize the NeEF, we construct pseudo-ground truths as supervision. After training, the system can render locally tone-mapped views, HDR views, and LDR views with different exposures.
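As a minimal sketch of the stage-one supervision, a small MLP can stand in for the differentiable CRF: it maps per-channel log exposure (HDR radiance times exposure time) to an LDR value, which is compared against the captured LDR pixel. All module and variable names below are hypothetical; the snippet illustrates the idea, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CRFNet(nn.Module):
    """A differentiable CRF: maps per-channel log exposure to an LDR value.
    The exact architecture here is an assumption for illustration."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # LDR intensities lie in [0, 1]
        )

    def forward(self, log_exposure):
        shape = log_exposure.shape
        return self.mlp(log_exposure.reshape(-1, 1)).reshape(shape)

def stage_one_loss(hdr_nerf, crf, rays, ldr_gt, exposure_time):
    """Stage-one supervision: HDR radiance x exposure time -> CRF -> LDR
    pixel, compared to the captured LDR pixel of the same ray.
    hdr_nerf is assumed to volume-render HDR ray colors of shape (N, 3)."""
    hdr = hdr_nerf(rays)
    log_exposure = torch.log(hdr.clamp_min(1e-6) * exposure_time)
    ldr_pred = crf(log_exposure)
    return ((ldr_pred - ldr_gt) ** 2).mean()
```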


Results -- LDR Views

After reconstructing the HDR NeRF from multi-exposure multi-view images, LTM-NeRF can efficiently render LDR views with controllable exposure, since the Camera Response Function is incorporated into the rendering process.
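Once the CRF is learned, the exposure time becomes a free parameter at render time. A sketch, reusing the hypothetical CRFNet above:

```python
import torch

@torch.no_grad()
def render_ldr(hdr_nerf, crf, rays, delta_t):
    """Render an LDR view at a user-chosen exposure time delta_t,
    e.g. render_ldr(hdr_nerf, crf, rays, delta_t=1 / 30)."""
    hdr = hdr_nerf(rays)                                   # (N, 3) HDR radiance
    return crf(torch.log(hdr.clamp_min(1e-6) * delta_t))   # (N, 3) in [0, 1]
```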

Results -- HDR Views

LTM-NeRF can render novel HDR views from the HDR NeRF. The HDR views are tone mapped with a 2D Tone Mapping Operator (TMO) for display. Compared to LDR views, the HDR views retain scene content in both over-exposed and under-exposed areas.
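The paper does not name the display TMO; the μ-law compressor commonly used in HDR work is one plausible stand-in:

```python
import torch

def mu_law_tmo(hdr, mu=5000.0):
    """Global mu-law tone mapping; expects HDR radiance normalized to [0, 1].
    A common choice for displaying HDR results, not necessarily the paper's."""
    hdr = hdr / hdr.max().clamp_min(1e-8)   # normalize to [0, 1]
    return torch.log1p(mu * hdr) / torch.log1p(torch.tensor(mu))
```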

Results -- Locally Tone-mapped Views

LTM-NeRF can also directly render locally tone-mapped views. Thanks to the neural exposure field, LTM-NeRF preserves consistency across rendered views. The exposure maps visualize the learned exposure time of each 3D point within the HDR radiance field: the brighter the map, the longer the exposure time. Darker scene regions are assigned longer exposure times and brighter regions shorter ones, achieving appropriate local contrast.
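One plausible reading of the per-point pipeline is sketched below: the NeEF predicts a positive exposure time per sample, the frozen CRF converts each sample's exposed radiance to an LDR color, and the colors are alpha-composited along the ray as in standard NeRF volume rendering. Names, shapes, and the exact composition order are assumptions.

```python
import torch
import torch.nn as nn

class NeEF(nn.Module):
    """Neural Exposure Field: encoded 3D position -> positive per-point
    exposure time. Architecture is an assumption for illustration."""
    def __init__(self, pe_dims=63, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pe_dims, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # exposure time must be positive
        )

    def forward(self, encoded_xyz):               # (N_rays, N_samples, pe_dims)
        return self.mlp(encoded_xyz)              # (N_rays, N_samples, 1)

def composite_tone_mapped(radiance, sigma, deltas, exposure, crf):
    """Apply each point's exposure before the (frozen) CRF, then
    alpha-composite along the ray with the usual NeRF weights."""
    ldr = crf(torch.log(radiance.clamp_min(1e-6) * exposure))  # (R, S, 3)
    alpha = 1.0 - torch.exp(-sigma * deltas)                   # (R, S)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], -1),
        dim=-1)[:, :-1]
    weights = alpha * trans                                    # (R, S)
    return (weights[..., None] * ldr).sum(dim=1)               # (R, 3)
```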

Comparisons for LDR/HDR View Synthesis

LTM-NeRF outperforms both NeRF and NeRF-W on novel LDR view synthesis. Compared with NeRF-GT (the upper bound of our method) for novel LDR and HDR view rendering, our method achieves similar performance.

Comparisons for Locally Tone-mapped View Synthesis

Compared with post-processing pipelines that render an HDR video from the HDR NeRF and then tone map it with 2D image and video methods, LTM-NeRF produces tone-mapped views with more vivid color and better contrast. Most importantly, our 3D tone mapping inherently preserves view consistency.

To enhance view consistency, a potential strategy is to train a NeRF on images tone-mapped with 2D methods and then render novel views. However, this strategy also exhibits undesirable luminance changes: flicker among the training views is partly reconstructed because NeRF interprets it as view-dependent color, although small flickers are smoothed out since such high-frequency variation is difficult for NeRF to overfit.

Positions to Exposure Time vs. Radiance to Exposure Time

We train the NeEF by taking 3D positions as input. An alternative implementation is to learn the NeEF from radiance. However, with radiance as input, high-frequency radiance can produce noisy exposure times and tone-mapped views. Additionally, view-dependent effects and floating artifacts in the radiance may leak into the NeEF.
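With positions as input, the NeEF sees only a smooth, low-dimensional signal, typically lifted by a frequency encoding before the MLP. A standard sketch of that encoding (the encoding itself is the well-known NeRF one; its use here matches the position-conditioned variant described above):

```python
import torch

def positional_encoding(xyz, num_freqs=10):
    """Standard NeRF frequency encoding of 3D positions: the smooth,
    low-frequency input that makes a position-conditioned NeEF robust
    to the radiance noise discussed above.
    Output dim: 3 + 3 * 2 * num_freqs (= 63 for num_freqs=10)."""
    feats = [xyz]
    for i in range(num_freqs):
        for fn in (torch.sin, torch.cos):
            feats.append(fn((2.0 ** i) * torch.pi * xyz))
    return torch.cat(feats, dim=-1)
```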

Comparisons with pseudo-GTs

We compare our rendered tone-mapped views with the pseudo-ground truths generated by exposure fusion methods. The pseudo-ground truths exhibit visible seams and view inconsistency. LTM-NeRF smooths out these inconsistencies and produces better results, with no obvious seams and more natural color.
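One common way to build such a pseudo-ground truth is Mertens exposure fusion over LDR renders of the same viewpoint at several exposures, e.g. via OpenCV. The file names below are placeholders, and the paper's exact fusion method may differ:

```python
import cv2
import numpy as np

# Fuse an exposure stack of the same viewpoint into one tone-mapped image.
ldr_stack = [cv2.imread(f) for f in ["short.png", "mid.png", "long.png"]]
fusion = cv2.createMergeMertens().process(ldr_stack)   # float32, roughly [0, 1]
pseudo_gt = np.clip(fusion * 255, 0, 255).astype(np.uint8)
cv2.imwrite("pseudo_gt.png", pseudo_gt)
```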

Comparisons with RawNeRF for Tone-mapped View Synthesis

RawNeRF is trained on noise-free multi-view HDR images, since it cannot take multi-exposure multi-view LDR images as input. The rendered HDR views are tone mapped using RawNeRF's tone mapping operation, which applies a single gamma curve to the entire view. As shown in the video, RawNeRF's global tone mapping preserves view consistency but loses local contrast.
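A single-curve global operator of the kind described above can be sketched as follows (an illustration of the idea, not RawNeRF's exact code):

```python
import torch

def global_gamma_tmo(hdr, gamma=2.2):
    """One gamma curve applied to every pixel. Because the curve is shared
    globally, very bright and very dark regions are compressed uniformly,
    sacrificing the local contrast a spatially varying operator retains."""
    hdr = hdr / hdr.max().clamp_min(1e-8)   # normalize (one simple choice)
    return hdr.clamp(0.0, 1.0) ** (1.0 / gamma)
```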

Different 3D Representations

LTM-NeRF is also compatible with other 3D scene representations, such as Instant-NGP. We showcase results using Instant-NGP to represent both the radiance and exposure fields. The exposure maps rendered by the Instant-NGP-based model are of superior quality, owing to the accurate geometry learned by Instant-NGP.
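For instance, the exposure field could be backed by a multiresolution hash grid via the tiny-cuda-nn PyTorch bindings. The configuration values below are illustrative defaults, not the authors' settings:

```python
import torch.nn.functional as F
import tinycudann as tcnn

# Hash-grid-encoded exposure field in the style of Instant-NGP.
exposure_field = tcnn.NetworkWithInputEncoding(
    n_input_dims=3, n_output_dims=1,
    encoding_config={
        "otype": "HashGrid", "n_levels": 16, "n_features_per_level": 2,
        "log2_hashmap_size": 19, "base_resolution": 16, "per_level_scale": 1.5,
    },
    network_config={
        "otype": "FullyFusedMLP", "activation": "ReLU",
        "output_activation": "None", "n_neurons": 64, "n_hidden_layers": 2,
    },
)

# Query points must be normalized to the unit cube for the hash encoding;
# Softplus keeps the predicted exposure time positive:
# delta_t = F.softplus(exposure_field(xyz_normalized))
```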