Transparent objects are ubiquitous in industry, pharmaceuticals, and households. Grasping and manipulating these objects poses a significant challenge for robots. Existing methods have difficulty reconstructing complete depth maps for challenging transparent objects, leaving holes in the reconstruction. Recent work has shown that neural radiance fields (NeRFs) work well for depth perception in scenes with transparent objects, and that the resulting depth maps can be used to grasp transparent objects with high accuracy. However, NeRF-based depth reconstruction can still struggle with especially challenging transparent objects and lighting conditions.
In this work, we propose Residual-NeRF, a method to improve depth perception and training speed for transparent objects. Robots often operate in the same area, such as a kitchen. By first learning a background NeRF of the scene without the transparent objects to be manipulated, we reduce the ambiguity the network faces when it later learns only the changes introduced by the new object. We propose training two additional networks: a residual NeRF that learns to infer residual RGB values and densities, and a Mixnet that learns how to combine the background and residual NeRFs. We contribute synthetic and real-world experiments that suggest Residual-NeRF improves depth perception of transparent objects. On synthetic data, Residual-NeRF outperforms the baselines with 46.1% lower RMSE and 29.5% lower MAE. Real-world qualitative experiments suggest Residual-NeRF yields more robust depth maps with less noise and fewer holes.
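To make the architecture concrete, the sketch below illustrates one plausible way the three networks could interact at each 3D sample point: a frozen background NeRF, a trainable residual NeRF, and a Mixnet that blends their outputs before volume rendering. This is a minimal sketch under our own assumptions; the layer sizes, the convex blend via a per-sample Mixnet weight, and the module names are illustrative, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' released code) of a background NeRF,
# a residual NeRF, and a Mixnet combined per 3D sample. The blend formula
# and network shapes are illustrative assumptions.
import torch
import torch.nn as nn

class TinyField(nn.Module):
    """Stand-in for a NeRF MLP: maps an encoded 3D point to (density, RGB)."""
    def __init__(self, in_dim=63, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density + 3 RGB channels
        )

    def forward(self, x):
        out = self.net(x)
        sigma = torch.relu(out[..., :1])   # non-negative density
        rgb = torch.sigmoid(out[..., 1:])  # colors in [0, 1]
        return sigma, rgb

class ResidualNeRFSketch(nn.Module):
    def __init__(self, in_dim=63):
        super().__init__()
        self.background = TinyField(in_dim)  # pre-trained on the empty scene, then frozen
        self.residual = TinyField(in_dim)    # learns changes introduced by the new object
        # Hypothetical Mixnet: predicts a per-sample blend weight in [0, 1].
        self.mixnet = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        with torch.no_grad():                # background stays fixed
            sigma_b, rgb_b = self.background(x)
        sigma_r, rgb_r = self.residual(x)
        w = self.mixnet(x)                   # how much the residual should dominate
        sigma = (1 - w) * sigma_b + w * sigma_r
        rgb = (1 - w) * rgb_b + w * rgb_r
        return sigma, rgb                    # fed into standard volume rendering
```

Under these assumptions, only the residual NeRF and the Mixnet receive gradients, which is one way a reused background could speed up training: the optimizer fits only the scene changes rather than relearning the full static background.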
RMSE ↓

| Method | Bowl | Drink Flat | Drink Up | Wine Flat |
| --- | --- | --- | --- | --- |
| NeRF | 0.1900 | 0.6398 | 0.4361 | 0.2166 |
| Dex-NeRF | 0.0365 | 0.1065 | 0.0699 | 0.0425 |
| Residual-NeRF | 0.0213 | 0.0234 | 0.0316 | 0.0388 |
MAE ↓

| Method | Bowl | Drink Flat | Drink Up | Wine Flat |
| --- | --- | --- | --- | --- |
| NeRF | 0.1453 | 0.3424 | 0.3062 | 0.1483 |
| Dex-NeRF | 0.0195 | 0.0189 | 0.0255 | 0.0255 |
| Residual-NeRF | 0.0140 | 0.0138 | 0.0170 | 0.0142 |