Existing neural radiance field-based methods can achieve real-time rendering of small scenes on the web platform. However, extending these methods to large-scale scenes still poses significant challenges due to limited resources in computation, memory, and bandwidth. In this paper, we propose City-on-Web, the first method for real-time rendering of large-scale scenes on the web. We propose a block-based volume rendering method to guarantee 3D consistency and correct occlusion between blocks, and introduce a Level-of-Detail strategy combined with dynamic loading/unloading of resources to significantly reduce memory demands. Our system achieves real-time rendering of large-scale scenes at approximately 32 FPS on an RTX 3060 GPU on the web, and maintains rendering quality comparable to current state-of-the-art novel view synthesis methods.
We first divide the entire scene into non-overlapping blocks according to the ground plane. For each block that a ray passes through, a dedicated shader renders that block's color and opacity. We then depth-sort the blocks the ray traverses and render the final result through alpha blending, which maintains 3D consistency.
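As a rough illustration of this step, the sketch below intersects a ray against axis-aligned block bounding boxes and returns the hit blocks sorted front-to-back. It is a minimal sketch under assumed names (`Block`, `blocksAlongRay`) and data layout, not code from our released implementation:

```typescript
// Sketch: axis-aligned blocks tiled on the ground (x-z) plane.
interface Block {
  id: number;
  min: [number, number, number]; // AABB min corner
  max: [number, number, number]; // AABB max corner
}

// Returns the blocks a ray intersects, sorted front-to-back by entry distance,
// so their per-shader outputs can later be alpha-blended in order.
function blocksAlongRay(
  origin: [number, number, number],
  dir: [number, number, number],
  blocks: Block[]
): Block[] {
  const hits: { block: Block; tNear: number }[] = [];
  for (const b of blocks) {
    let tNear = -Infinity;
    let tFar = Infinity;
    for (let axis = 0; axis < 3; axis++) {
      if (Math.abs(dir[axis]) < 1e-8) {
        // Ray parallel to this slab: it must start inside the slab to hit.
        if (origin[axis] < b.min[axis] || origin[axis] > b.max[axis]) {
          tNear = Infinity;
          break;
        }
      } else {
        const t0 = (b.min[axis] - origin[axis]) / dir[axis];
        const t1 = (b.max[axis] - origin[axis]) / dir[axis];
        tNear = Math.max(tNear, Math.min(t0, t1));
        tFar = Math.min(tFar, Math.max(t0, t1));
      }
    }
    if (tNear <= tFar && tFar >= 0) {
      hits.push({ block: b, tNear: Math.max(tNear, 0) });
    }
  }
  hits.sort((a, b) => a.tNear - b.tNear); // front-to-back order
  return hits.map((h) => h.block);
}
```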
To facilitate volume rendering in this resource-independent environment, where each block is rendered by its own shader, the blocks are sorted by their distance from the camera and their outputs are blended in that order. With our derived alpha blending formulation, the per-block results combine correctly: the blending is occlusion-aware and maintains 3D consistency.
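A compact sketch of this kind of front-to-back compositing is shown below. It assumes each block shader outputs a premultiplied accumulated color and an opacity (one minus the block's transmittance); the names and exact formulation are illustrative, while the paper derives the blending so that the composited result matches single-pass volume rendering:

```typescript
interface BlockSample {
  rgb: [number, number, number]; // block-accumulated, premultiplied color
  alpha: number;                 // block opacity = 1 - block transmittance
}

// Front-to-back "over" compositing of per-block shader outputs.
// Assumes `samples` is already sorted by distance from the camera.
function compositeBlocks(samples: BlockSample[]): [number, number, number] {
  const out: [number, number, number] = [0, 0, 0];
  let transmittance = 1.0; // product of (1 - alpha) over closer blocks
  for (const s of samples) {
    out[0] += transmittance * s.rgb[0];
    out[1] += transmittance * s.rgb[1];
    out[2] += transmittance * s.rgb[2];
    transmittance *= 1.0 - s.alpha;
  }
  return out;
}
```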
We generate additional LODs from the most detailed level obtained in the training stage. For every four block models, we downsample the resolution of each submodel's virtual grid and then retrain a shared deferred MLP. Initially, we freeze the MERF models so that the appearance remains consistent with the finer LOD. Afterward, we train the MERF models and the shared deferred MLP jointly to further refine the scene.
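For intuition only, the sketch below shows one simple way to halve the resolution of a dense feature grid by averaging 2x2x2 neighborhoods; it is a hypothetical helper, not our training code, and the actual pipeline uses the coarser grid only as a starting point before retraining:

```typescript
// Illustrative 2x downsampling of a dense res^3 x channels grid
// (assumes res is even; real grids are sparse and handled differently).
function downsampleGrid(
  grid: Float32Array,
  res: number,
  channels: number
): Float32Array {
  const newRes = res >> 1;
  const out = new Float32Array(newRes * newRes * newRes * channels);
  const at = (x: number, y: number, z: number, c: number) =>
    ((x * res + y) * res + z) * channels + c;
  for (let x = 0; x < newRes; x++) {
    for (let y = 0; y < newRes; y++) {
      for (let z = 0; z < newRes; z++) {
        for (let c = 0; c < channels; c++) {
          let sum = 0;
          for (let dx = 0; dx < 2; dx++)
            for (let dy = 0; dy < 2; dy++)
              for (let dz = 0; dz < 2; dz++)
                sum += grid[at(2 * x + dx, 2 * y + dy, 2 * z + dz, c)];
          out[((x * newRes + y) * newRes + z) * channels + c] = sum / 8;
        }
      }
    }
  }
  return out;
}
```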
In the rendering phase, the LODs and blocks to render are dynamically determined from the camera's position and the view frustum. The selected blocks are then processed by their respective shaders to produce color and opacity, and our block-based volume rendering strategy ensures 3D consistency and correct occlusion.
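The following sketch illustrates this selection step: blocks outside the view frustum are culled, and each remaining block is assigned an LOD based on its distance from the camera. The interface, thresholds, and frustum test are placeholders for illustration, not our actual implementation:

```typescript
// Sketch: pick which blocks to draw and at which LOD, based on the camera.
interface LodBlock {
  id: number;
  center: [number, number, number]; // block center in world space
}

function selectBlocks(
  cameraPos: [number, number, number],
  inFrustum: (b: LodBlock) => boolean, // e.g. an AABB vs. frustum-plane test
  blocks: LodBlock[],
  lodDistances: number[] // ascending distance thresholds, one per LOD
): { block: LodBlock; lod: number }[] {
  const selected: { block: LodBlock; lod: number }[] = [];
  for (const b of blocks) {
    if (!inFrustum(b)) continue; // cull blocks outside the view frustum
    const d = Math.hypot(
      b.center[0] - cameraPos[0],
      b.center[1] - cameraPos[1],
      b.center[2] - cameraPos[2]
    );
    // Coarser LOD (larger index) for blocks farther from the camera.
    let lod = lodDistances.length - 1;
    for (let i = 0; i < lodDistances.length; i++) {
      if (d < lodDistances[i]) {
        lod = i;
        break;
      }
    }
    selected.push({ block: b, lod });
  }
  return selected;
}
```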
Our method excels at recovering fine details and achieves higher reconstruction quality across scenes of diverse scales and environments.
If you find our paper useful for your work, please cite:
@inproceedings{Song2024City,
  author    = {Kaiwen Song and Xiaoyi Zeng and Chenqu Ren and Juyong Zhang},
  title     = {City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024}
}