A Real-World 3D Object Inverse Rendering Benchmark
NeurIPS 2023 Datasets & Benchmarks Track
Stanford University
TL;DR
We present Stanford-ORB, a new real-world benchmark for evaluating 3D object inverse rendering methods. The benchmark consists of:
- 2,795 HDR images of 14 objects captured in 7 in-the-wild scenes (each object is captured in 3 scenes);
- 418 HDR ground truth environment maps aligned with image captures;
- Studio-captured textured meshes of all objects;
- A comprehensive set of benchmarks for inverse rendering evaluation.
The benchmark is designed to be plug & play: all data has been cleaned and organized in the most common structures (e.g. Blender, LLFF, COLMAP). We also provide scripts for data loading and evaluation, along with results from various state-of-the-art methods; a minimal loading sketch is shown below. To test your model, check our GitHub page for more details.
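As a taste of the plug & play setup, here is a minimal sketch (not the official loader) of reading cameras from the Blender-style layout, i.e. the transforms_*.json convention popularized by NeRF. The path below is a hypothetical placeholder; the official dataloading scripts on the GitHub page are the reference.

```python
import json
import numpy as np

# Hypothetical path; substitute the object/scene you extracted from blender_LDR.tar.gz.
with open("blender_LDR/some_object/some_scene/transforms_train.json") as f:
    meta = json.load(f)

fov_x = meta["camera_angle_x"]  # horizontal field of view in radians (Blender convention)
poses = np.stack([np.array(frame["transform_matrix"]) for frame in meta["frames"]])  # 4x4 camera-to-world
print(f"{len(poses)} training views, fov_x = {np.degrees(fov_x):.1f} deg, pose shape {poses.shape[1:]}")
```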
Gallery
Scanned Meshes (Textures generated by NVDiffRec)
Grogu
Curry
Gnome
Teapot
Car
Pepsi
Image Captures
Overview
The figure shows the overall data-capture pipeline. For each object, Left: we obtain its 3D shape with a 3D scanner and recover Physics-Based Rendering (PBR) materials from high-quality light-box images. Middle: we also capture multi-view masked images in 3 different in-the-wild scenes, together with the ground-truth environment maps. Right: we carefully register the camera poses of all images using the scanned mesh and recovered materials, and prepare the data for the evaluation benchmarks. Credit to Maurice Svay for the low-poly camera mesh model.
Benchmark Design
Our benchmark is based on single-scene reconstruction: to be tested, a model is trained on one data point at a time (i.e. the images of one object captured in one scene) and then evaluated. The evaluation includes the following (a sketch of the image metrics follows the list):
- Novel View Synthesis: Evaluating the inferred novel views in the same scene;
- Novel Scene Relighting: Evaluating the inferred novel views in novel scenes, given ground-truth environment map;
- Geometry Estimation: Evaluating the reconstructed 3D geometry.
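For concreteness, the sketch below shows how the image metrics reported in the tables (PSNR-H on HDR values, PSNR-L on tonemapped LDR values) can be computed. The clip-and-gamma tonemapping here is an assumption for illustration only; the official evaluation scripts may align scale and tonemap differently.

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def tonemap(hdr: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    # Simple clip + gamma curve standing in for the benchmark's HDR-to-LDR conversion (assumed).
    return np.clip(hdr, 0.0, 1.0) ** (1.0 / gamma)

# Placeholder arrays; in practice these are the rendered prediction and the captured HDR image.
rng = np.random.default_rng(0)
gt_hdr = rng.random((512, 512, 3)).astype(np.float32)
pred_hdr = np.clip(gt_hdr + 0.05 * rng.standard_normal(gt_hdr.shape), 0.0, None).astype(np.float32)

print("PSNR-H:", psnr(pred_hdr, gt_hdr))                    # computed on linear HDR values
print("PSNR-L:", psnr(tonemap(pred_hdr), tonemap(gt_hdr)))  # computed on tonemapped LDR values
```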
Results
Quantitative Results
Columns Depth↓ / Normal↓ / Shape↓ report Geometry Estimation; "Relight" columns report Novel Scene Relighting, and "NVS" columns report Novel View Synthesis. "N/A" marks settings not applicable to or not reported for a method.

| Method | Training Views | Depth↓ | Normal↓ | Shape↓ | Relight PSNR-H↑ | Relight PSNR-L↑ | Relight SSIM↑ | Relight LPIPS↓ | NVS PSNR-H↑ | NVS PSNR-L↑ | NVS SSIM↑ | NVS LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Latest Methods** | | | | | | | | | | | | |
| Neural-PBIR (ICCV 2023) | All | 0.30 | 0.06 | 0.43 | 26.01 | 33.26 | 0.979 | 0.023 | 28.82 | 36.80 | 0.986 | 0.019 |
| IllumiNeRF (NeurIPS 2024) | All | N/A | N/A | N/A | 25.56 | 32.74 | 0.976 | 0.027 | N/A | N/A | N/A | N/A |
| RelitLRM | 6 | N/A | N/A | N/A | 24.67 | 31.52 | 0.969 | 0.032 | N/A | N/A | N/A | N/A |
| **Novel View Synthesis / 3D Reconstruction Methods** | | | | | | | | | | | | |
| IDR | All | 0.35 | 0.05 | 0.30 | N/A | N/A | N/A | N/A | 30.11 | 39.66 | 0.990 | 0.017 |
| NeRF | All | 2.19 | 0.62 | 62.05 | N/A | N/A | N/A | N/A | 26.31 | 33.59 | 0.968 | 0.044 |
| **Material Decomposition Methods** | | | | | | | | | | | | |
| Neural-PIL | All | 0.86 | 0.29 | 4.14 | N/A | N/A | N/A | N/A | 25.79 | 33.35 | 0.963 | 0.051 |
| PhySG | All | 1.90 | 0.17 | 9.28 | 21.81 | 28.11 | 0.960 | 0.055 | 24.24 | 32.15 | 0.974 | 0.047 |
| NVDiffRec | All | 0.31 | 0.06 | 0.62 | 22.91 | 29.72 | 0.963 | 0.039 | 21.94 | 28.44 | 0.969 | 0.030 |
| NeRD | All | 1.39 | 0.28 | 13.70 | 23.29 | 29.65 | 0.957 | 0.059 | 25.83 | 32.61 | 0.963 | 0.054 |
| NeRFactor | All | 0.87 | 0.29 | 9.53 | 23.54 | 30.38 | 0.969 | 0.048 | 26.06 | 33.47 | 0.973 | 0.046 |
| InvRender | All | 0.59 | 0.06 | 0.44 | 23.76 | 30.83 | 0.970 | 0.046 | 25.91 | 34.01 | 0.977 | 0.042 |
| NVDiffRecMC | All | 0.32 | 0.04 | 0.51 | 24.43 | 31.60 | 0.972 | 0.036 | 28.03 | 36.40 | 0.982 | 0.028 |
| **Single-View Prediction Methods** | | | | | | | | | | | | |
| SI-SVBRDF | 1 | 81.48 | 0.29 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| SIRFS | 1 | N/A | 0.59 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| **Reference Results** | | | | | | | | | | | | |
| NVDiffRecMC+GT Mesh | All | N/A | N/A | N/A | 25.08 | 32.28 | 0.974 | 0.027 | N/A | N/A | N/A | N/A |
| NVDiffRec+GT Mesh | All | N/A | N/A | N/A | 24.93 | 32.42 | 0.975 | 0.027 | N/A | N/A | N/A | N/A |
Qualitative Results
Download Links
For convenience, we provide separate download links for images organized in different data structures, as well as the auxiliary files; a short inspection sketch follows the list.
- blender_LDR.tar.gz (11G): LDR images and camera files (organized as the Blender Dataset structure);
- blender_HDR.tar.gz (72G): HDR images and camera files (organized as the Blender Dataset structure);
- llff_colmap_LDR.tar.gz (11G): LDR images and camera files (organized as the LLFF Dataset structure and as COLMAP's structure);
- llff_colmap_HDR.tar.gz (72G): HDR images and camera files (organized as the LLFF Dataset structure and as COLMAP's structure);
- ground_truth.tar.gz (4.8G): GT environment maps, 3D meshes, depth maps, normal maps, and pseudo-GT albedo maps.
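Once ground_truth.tar.gz is extracted, the assets can be inspected with standard Python tooling. The sketch below uses imageio and trimesh with hypothetical file paths, and it assumes the environment maps are stored as linear HDR images; check the GitHub page for the actual layout and formats.

```python
import imageio.v3 as iio
import trimesh

# Hypothetical paths; replace with files from the extracted ground_truth.tar.gz.
# Reading .exr may require an OpenEXR-capable imageio plugin.
env_map = iio.imread("ground_truth/some_scene/env_map/0000.exr")
# Assumes a single-mesh OBJ, so trimesh.load returns a Trimesh (path hypothetical).
mesh = trimesh.load("ground_truth/some_object/mesh.obj")

print("env map:", env_map.shape, env_map.dtype)        # e.g. (H, W, 3) float32, linear radiance (assumed)
print("mesh:", mesh.vertices.shape, mesh.faces.shape)  # vertices and triangle faces of the scanned mesh
```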
Citation
@inproceedings{kuang2023stanfordorb,
title={Stanford-ORB: a real-world 3D object inverse rendering benchmark},
author={Kuang, Zhengfei and Zhang, Yunzhi and Yu, Hong-Xing and Agarwala, Samir and Wu, Elliott and Wu, Jiajun and others},
booktitle={Advances in Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023}
}
The website template was borrowed from Michaël Gharbi.