This is how neural networks take us from images to a rough understanding of depth.
Monocular Depth Estimation
The paper that started it all was released in 2014 (11 years ago, as of 2025).
No one uses this approach anymore, but it was really good as one of the early DL papers on inverse graphics.
Its limitations came partly from ground-truth data that wasn't that great; edge estimation was poor, there were black holes in the depth maps, etc.
To train a NN we need a lot of data, and some of that data has no consistent scale (depth is known only up to an unknown scale factor).
To tackle this, we train the NN with scale-invariant loss functions.
From "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer", Ranftl et al., 2022.
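
As a rough illustration of what "scale invariant" means for a loss, here is a minimal sketch of one common formulation: the log-depth error from Eigen et al. (2014). Choosing this formulation is an assumption on my part; the Ranftl et al. paper cited above uses a related scale- and shift-invariant loss in disparity space, which is not reproduced here. The function name, the `lam` parameter and the convention that a ground-truth value of 0 marks a missing pixel are all illustrative choices, not taken from either paper's code.

```python
import math


def scale_invariant_log_loss(pred, target, lam=1.0):
    """Scale-invariant error in log-depth space (illustrative sketch).

    pred, target: flat sequences of depths; pred values must be positive.
    lam=1.0 makes the loss fully invariant to a global scale on pred;
    lam=0.0 reduces it to plain mean squared error in log space.
    """
    # Assumption: target <= 0 marks a pixel with no ground truth, so skip it.
    diffs = [math.log(p) - math.log(t) for p, t in zip(pred, target) if t > 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("no valid ground-truth pixels")

    mean_sq = sum(d * d for d in diffs) / n   # (1/n) * sum(d_i^2)
    sq_mean = (sum(diffs) / n) ** 2           # (1/n^2) * (sum(d_i))^2
    return mean_sq - lam * sq_mean
```

Multiplying every predicted depth by a constant adds the same offset to each log difference, and with `lam=1.0` the second term cancels that offset exactly. So a prediction that is correct up to a global scale gets a loss of zero, which is what lets datasets with unknown scale be used for training.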