Abstract & Pipeline

We propose a novel approach to self-supervised learning of point cloud representations via differentiable neural rendering. Motivated by the observation that informative point cloud features should encode rich geometry and appearance cues and thus render realistic images, we train a point-cloud encoder, together with a devised point-based neural renderer, by comparing the rendered images with real images on massive RGB-D data. The learned point-cloud encoder can be easily integrated into various downstream tasks, including not only high-level tasks such as 3D detection and segmentation, but also low-level tasks such as 3D reconstruction and image synthesis. Extensive experiments on various tasks demonstrate the superiority of our approach compared to existing pre-training methods.

Overview

The pipeline of our point cloud pre-training via neural rendering (Ponder). Given multi-view RGB-D images, we first construct the point cloud by back-projection, then use a point cloud encoder to extract per-point features. These features are organized into a 3D feature volume (visualized as an image in this figure) by average pooling. Finally, the 3D feature volume is rendered into multi-view RGB-D images via differentiable neural rendering, and the rendered images are compared with the original input multi-view RGB-D images for supervision. The point cloud encoder and color decoder are then used for transfer learning on downstream tasks.
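To make the geometric steps of the pipeline concrete, below is a minimal PyTorch sketch of back-projecting a depth map into world-space points, average-pooling per-point features into a 3D feature volume, and trilinearly querying that volume at sample points along camera rays for the neural renderer. All function names, the volume bounds handling, and the reconstruction loss shown in the final comment are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of the Ponder pre-training pipeline (hypothetical code,
# not the authors' implementation).
import torch
import torch.nn.functional as F

def back_project(depth, intrinsics, cam2world):
    """Lift a depth map (H, W) to world-space points (H*W, 3)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    uv1 = torch.stack([u, v, torch.ones_like(u)], dim=-1).float().reshape(-1, 3)
    cam_pts = (uv1 @ torch.inverse(intrinsics).T) * depth.reshape(-1, 1)
    return cam_pts @ cam2world[:3, :3].T + cam2world[:3, 3]

def pool_to_volume(points, feats, res, lo, hi):
    """Average-pool per-point features (N, C) into a (C, res, res, res) volume."""
    idx = ((points - lo) / (hi - lo) * res).long().clamp(0, res - 1)
    flat = idx[:, 0] * res * res + idx[:, 1] * res + idx[:, 2]
    C = feats.shape[1]
    vol = torch.zeros(res**3, C).index_add_(0, flat, feats)
    cnt = torch.zeros(res**3, 1).index_add_(0, flat, torch.ones(len(feats), 1))
    return (vol / cnt.clamp(min=1)).T.reshape(C, res, res, res)

def query_volume(volume, pts, lo, hi):
    """Trilinearly sample the feature volume at ray sample points (M, 3) -> (M, C)."""
    grid = (pts - lo) / (hi - lo) * 2 - 1        # normalize coords to [-1, 1]
    # grid_sample expects (x, y, z) to index (W, H, D); our volume is laid out
    # as (C, x, y, z), so flip the coordinate order before sampling.
    grid = grid.flip(-1).reshape(1, -1, 1, 1, 3)
    out = F.grid_sample(volume.unsqueeze(0), grid, align_corners=True)
    return out.reshape(volume.shape[0], -1).T

# Pre-training supervision (schematic): decode RGB and depth from features
# sampled along camera rays, then compare against the input views, e.g.
#   loss = F.l1_loss(pred_rgb, gt_rgb) + F.l1_loss(pred_depth, gt_depth)
```

In the actual method, the renderer decodes geometry and color fields from the queried volume features before volume rendering; the L1 terms above only schematize the RGB-D comparison used as supervision.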

Performance

Ponder achieves state-of-the-art performance on various downstream tasks, e.g., 3D semantic segmentation on ScanNet and ScanNet-200.


Acknowledgements

The website template was borrowed from Michaël Gharbi and Ref-NeRF. Thanks to them!