Inspired by wavelets, we propose the neural Fourier filter bank, which performs spatial and frequency-wise decomposition jointly. Our method provides significantly improved reconstruction quality for the same computation and storage budget, as shown by the PSNR curve and the error-image overlay. Relying only on space partitioning without frequency resolution (InstantNGP) or only on frequency encodings without spatial resolution (SIREN) leads to suboptimal quality and convergence. Naively considering both (ModSine) improves scalability to larger scenes, but not quality or convergence.
We present a novel method that provides efficient and highly detailed reconstructions. Inspired by wavelets, we learn a neural field that decomposes the signal both spatially and frequency-wise. We follow the recent grid-based paradigm for spatial decomposition, but, unlike existing work, we encourage specific frequencies to be stored in each grid via Fourier feature encodings. We then apply a multi-layer perceptron with sine activations that takes these Fourier-encoded features as input at the appropriate layers, so that higher-frequency components are accumulated sequentially on top of lower-frequency ones; these intermediate outputs are summed to form the final output. We demonstrate that our method outperforms the state of the art in terms of model compactness and convergence speed on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural radiance fields.
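To make "encourage specific frequencies to be stored in each grid via Fourier feature encodings" more concrete, the following is a minimal NumPy sketch of one Fourier feature layer per grid level. It assumes a random Fourier feature mapping sin(2π f B + φ) whose frequencies B are sampled from a per-level band that doubles with each level; the function names, the band schedule, and the sizes are hypothetical and not taken from the authors' code.

```python
import numpy as np

def make_fourier_layer(in_dim, out_dim, max_freq, seed=0):
    """Hypothetical per-level Fourier feature layer.

    Frequencies are sampled uniformly in [-max_freq, max_freq], so each
    grid level is steered toward its own frequency band (an assumption).
    """
    rng = np.random.default_rng(seed)
    B = rng.uniform(-max_freq, max_freq, size=(in_dim, out_dim))  # frequency matrix
    phase = rng.uniform(-np.pi, np.pi, size=out_dim)              # random phase offsets
    return B, phase

def fourier_encode(features, B, phase):
    """gamma(f) = sin(2*pi * f @ B + phase); features has shape (N, in_dim)."""
    return np.sin(2.0 * np.pi * features @ B + phase)

# One layer per grid level; the frequency bound doubles per level (assumed schedule).
levels = [make_fourier_layer(in_dim=2, out_dim=16, max_freq=2.0 ** l, seed=l) for l in range(4)]

# Stand-in for features interpolated from the grids at 5 query positions.
feats = np.random.default_rng(42).normal(size=(5, 2))
encoded = [fourier_encode(feats, B, p) for B, p in levels]
print([e.shape for e in encoded])
```

Coarse levels thus produce slowly varying encodings and fine levels rapidly varying ones; in a trained model the layer weights would be learned rather than fixed at random.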
In our framework, given a position x, the signal is decomposed with low- and high-frequency filters and then reconstructed by accumulating the filtered components through the intermediate outputs, as illustrated. Here, a multi-scale grid acts as storage for these high-frequency filtering outcomes at the various spatially decomposed locations.
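The wavelet analogy can be illustrated with the classic one-dimensional Haar filter bank. This is a standard textbook construction, not the learned filters of the neural Fourier filter bank: a low-pass and a high-pass filter split the signal at each level, and reconstruction adds the detail (high-frequency) bands back onto the coarse approximation level by level. The helper names below are hypothetical.

```python
import numpy as np

def haar_analysis(x):
    """One Haar level: low-pass (scaled pairwise sums) and high-pass (scaled differences)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return low, high

def haar_synthesis(low, high):
    """Invert one Haar level exactly."""
    out = np.empty(2 * len(low))
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return out

def decompose(x, levels):
    """Split x into a coarse approximation plus one detail band per level.

    len(x) must be divisible by 2**levels.
    """
    low, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        low, high = haar_analysis(low)
        details.append(high)
    return low, details

def reconstruct(low, details):
    """Start from the coarsest approximation and accumulate finer details."""
    for high in reversed(details):
        low = haar_synthesis(low, high)
    return low

x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 7.0, 0.0])
low, details = decompose(x, levels=3)
print(np.allclose(reconstruct(low, details), x))  # True: perfect reconstruction

# Dropping every detail band leaves only the lowest-frequency content (the mean).
coarse = reconstruct(low, [np.zeros_like(d) for d in details])
print(coarse)
```

Zeroing every detail band recovers only the signal mean, which mirrors how each additional level contributes progressively finer content to the reconstruction.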
Given an input query, e.g., a position x, our neural Fourier filter bank uses both a grid and a Multi-Layer Perceptron (MLP) to compose the final estimate. Specifically, grid features are extracted via interpolation at multiple scale levels and then encoded to the appropriate frequencies for each layer by the Fourier feature layers. The MLP treats these encoded features as the higher-frequency components and the outputs of earlier layers as the lower-frequency ones, similar to wavelet filter banks. The intermediate outputs are then aggregated into the final estimate.
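The pipeline above can be sketched in NumPy for a 2D input. This toy forward pass is an illustration under several assumptions, not the authors' implementation: dense per-level grids sampled by bilinear interpolation, one random Fourier layer per level whose frequency bound grows with grid resolution, and additive injection of each encoded feature into the sine MLP before the activation. `ToyFourierFilterBank`, `bilinear_interp`, and all sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def bilinear_interp(grid, x):
    """Bilinearly interpolate a (R, R, C) grid at points x in [0, 1]^2 of shape (N, 2)."""
    R = grid.shape[0]
    p = x * (R - 1)                                   # continuous grid coordinates
    i0 = np.clip(np.floor(p).astype(int), 0, R - 2)   # lower corner of the enclosing cell
    t = p - i0                                        # fractional offsets inside the cell
    tx, ty = t[:, :1], t[:, 1:]
    g00 = grid[i0[:, 0], i0[:, 1]]
    g10 = grid[i0[:, 0] + 1, i0[:, 1]]
    g01 = grid[i0[:, 0], i0[:, 1] + 1]
    g11 = grid[i0[:, 0] + 1, i0[:, 1] + 1]
    return (g00 * (1 - tx) * (1 - ty) + g10 * tx * (1 - ty)
            + g01 * (1 - tx) * ty + g11 * tx * ty)


class ToyFourierFilterBank:
    """Hypothetical 2D sketch: multi-scale grids + per-level Fourier features + sine MLP."""

    def __init__(self, resolutions=(4, 8, 16, 32), feat_dim=2, hidden=32, out_dim=1, seed=0):
        rng = np.random.default_rng(seed)
        self.grids = [rng.normal(scale=0.1, size=(r, r, feat_dim)) for r in resolutions]
        # Per-level Fourier layer; the frequency bound grows with grid resolution (assumption).
        self.fourier = [(rng.uniform(-r, r, size=(feat_dim, hidden)),
                         rng.uniform(-np.pi, np.pi, size=hidden)) for r in resolutions]
        self.w_in = rng.uniform(-1.0, 1.0, size=(2, hidden))
        bound = np.sqrt(6.0 / hidden)                  # uniform init bound sqrt(6 / fan_in)
        self.w_hidden = [rng.uniform(-bound, bound, size=(hidden, hidden)) for _ in resolutions]
        self.heads = [rng.normal(scale=0.1, size=(hidden, out_dim)) for _ in resolutions]

    def __call__(self, x):
        h = np.sin(x @ self.w_in)                      # low-frequency base from raw coordinates
        out = np.zeros((x.shape[0], self.heads[0].shape[1]))
        for grid, (B, phase), W, V in zip(self.grids, self.fourier, self.w_hidden, self.heads):
            feat = bilinear_interp(grid, x)            # spatial decomposition at this level
            enc = np.sin(2.0 * np.pi * feat @ B + phase)  # frequency-specific encoding
            h = np.sin(h @ W + enc)                    # inject higher-frequency detail (assumed additive)
            out += h @ V                               # accumulate this level's intermediate output
        return out


x = np.random.default_rng(1).uniform(size=(100, 2))   # query positions in [0, 1]^2
y = ToyFourierFilterBank()(x)
print(y.shape)  # (100, 1)
```

In this sketch, summing a per-level output head, rather than reading out only the last layer, lets every level contribute its intermediate output to the final estimate, mirroring the aggregation described above.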
2D Fitting -- Qualitative results for the "Tokyo" image. Our method provides the best reconstruction quality at every scale level, from nearby to distant regions, demonstrating the importance of considering space and frequency jointly.
3D Fitting -- Qualitative comparisons for the "Bearded Man" shape. Our method is the most compact of the compared methods and reconstructs both coarse and fine details without obvious artifacts.
Novel View Synthesis -- Despite being more compact, our method synthesizes novel views of comparable or better quality.
Ablation study -- We compare against variants of our method with the Fourier grid features and/or the proposed MLP composition architecture disabled. Using both components together is critical for performance.
2D fitting result for the "Tokyo" image. The full reconstructed image is shown at the top, and four close-up views are shown at the bottom. Our method provides near-perfect reconstruction.
3D fitting result for the "Asian Dragon" shape. The left sub-image shows the ground-truth shape, and six zoomed insets on the right visualize the details.
Qualitative results for neural radiance fields. Our method clearly reconstructs both textures (e.g., the chair in the second row) and geometric details (e.g., the lego scene in the last row).
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant, the Digital Research Alliance of Canada, and Advanced Research Computing at the University of British Columbia.