Commit fe71f1d

Update README.md
Explanation to the code for Depth-Estimation

README.md

Lines changed: 118 additions & 0 deletions
# Depth-Estimation
Depth Estimation - overview

This is a Python script demonstrating a working example of depth estimation using the **MiDaS** model.
The MiDaS models are trained on a mix of several depth estimation datasets, including MegaDepth, ReDWeb, and WSVD. These datasets provide diverse and comprehensive training examples, enabling the model to generalize well to a variety of real-world scenarios.

#### Step 1: Install required libraries
```
pip install torch torchvision opencv-python
pip install timm
```
* **Torch** and **torchvision**: These are essential libraries for building and running deep learning models.
* **OpenCV**: A powerful library for image processing tasks.
* **Timm**: A library for PyTorch image models, which provides pre-trained models and transformations.
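
As a quick sanity check (a minimal sketch, not part of the original walkthrough), you can import each package and print its version to confirm the installation succeeded:
```
import torch
import torchvision
import cv2
import timm

# Print each library's version to confirm the installation succeeded.
print(torch.__version__, torchvision.__version__, cv2.__version__, timm.__version__)
```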

After installation is complete, write the following code. Make sure you have a sample image ready to test the code on. Here it is named "example.jpg"; change this to the filename of the image you have saved.

#### Step 2: Import libraries
```
import torch
import cv2
import matplotlib.pyplot as plt
import timm
```
* **Matplotlib**: Used for displaying the input image and the resulting depth map.

#### Step 3: Load the MiDaS model
The MiDaS model is used for monocular depth estimation, which predicts depth from a single image.
```
model_type = "DPT_Large"  # MiDaS v3 - Large model
midas = torch.hub.load("intel-isl/MiDaS", model_type)
```
* **MiDaS** is a monocular depth estimation model developed by Intel; it provides robust depth predictions from single images. The "DPT_Large" model is a variant of MiDaS that uses a large Dense Prediction Transformer (DPT) architecture for enhanced accuracy.
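
If DPT_Large is too heavy for your hardware, the same hub entry also exposes lighter variants. A minimal sketch of swapping one in:
```
# Lighter MiDaS variants trade accuracy for speed:
# "DPT_Hybrid"  - medium accuracy, faster inference
# "MiDaS_small" - lowest accuracy, fastest inference
model_type = "MiDaS_small"
midas = torch.hub.load("intel-isl/MiDaS", model_type)
```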

#### Step 4: Load the transformation pipeline
The transformation pipeline processes the input image to be compatible with the model.
```
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform
```
* **Transforms**: These pre-process the input image by resizing, normalizing, and converting it into a tensor. This step is essential to ensure the image format matches what the model expects.
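
Note that the transform has to match the chosen model: the hub's transforms object also provides `small_transform` for the small model. A sketch of picking the right one:
```
# dpt_transform matches the DPT models; small_transform matches MiDaS_small.
if model_type in ("DPT_Large", "DPT_Hybrid"):
    transform = midas_transforms.dpt_transform
else:
    transform = midas_transforms.small_transform
```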
42+
43+
#### Step 5: Load an example image
44+
Load any image you want to estimate the depth for.
45+
```
46+
img_path = "example.jpg"
47+
img = cv2.imread(img_path)
48+
```
49+
* **OpenCV** is used to read the image from the specified file path. Make sure to replace "example.jpg" with the path to your image file.
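
If you would rather not hard-code the path, a minimal sketch (assuming the script is run from a terminal) that reads it from the command line instead:
```
import sys

# Use the first command-line argument as the image path, falling back to a default.
img_path = sys.argv[1] if len(sys.argv) > 1 else "example.jpg"
img = cv2.imread(img_path)
```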

#### Step 6: Check if the image was loaded successfully
```
if img is None:
    raise ValueError(f"Image at path '{img_path}' could not be loaded. Please check the file path and try again.")

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
```
If the image cannot be loaded, an error is raised. The image is then converted from OpenCV's default BGR channel order to RGB, which is what the MiDaS transforms and Matplotlib expect.

#### Step 7: Apply the transformation to the image
Transform the image to prepare it for input to the model.
```
input_batch = transform(img).unsqueeze(0)
```
* The **transform** function converts the image into a tensor, and `unsqueeze(0)` adds a batch dimension, which is required by the model.

#### Step 8: Ensure the input tensor has the correct shape
```
if len(input_batch.shape) == 5:
    input_batch = input_batch.squeeze(0)  # Remove the extra dimension
```
This step ensures the input tensor has the correct shape, typically (batch_size, channels, height, width). It is needed because the MiDaS transforms already return a batched tensor, so the `unsqueeze(0)` in Step 7 can add one dimension too many.
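
To verify, you can print the tensor shape before running inference (the exact spatial size below is only illustrative):
```
# The model expects a 4-D tensor: (batch, channels, height, width).
print(input_batch.shape)  # e.g. torch.Size([1, 3, 384, 384]) for DPT_Large
```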

#### Step 9: Move the input to the GPU if available
```
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas.to(device)
midas.eval()  # Switch the model to evaluation mode for inference
input_batch = input_batch.to(device)
```
The code checks for GPU availability and moves both the model and input tensor to the GPU for faster processing. Calling `eval()` puts the model in inference mode.

#### Step 10: Perform depth estimation
```
with torch.no_grad():
    prediction = midas(input_batch)
```
The model generates a depth map prediction for the input image without updating model weights (`torch.no_grad()` ensures no gradients are calculated, reducing memory usage).
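
The raw output is a relative inverse-depth map (larger values correspond to closer surfaces), not metric distance. A quick way to inspect it:
```
# MiDaS predicts relative inverse depth: higher values mean nearer surfaces.
print(prediction.shape)  # e.g. torch.Size([1, 384, 384])
print(prediction.min().item(), prediction.max().item())
```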

#### Step 11: Remove the extra dimension and convert to numpy
Convert the prediction to a more usable format.
```
prediction = torch.nn.functional.interpolate(
    prediction.unsqueeze(1),
    size=img.shape[:2],
    mode="bicubic",
    align_corners=False,
).squeeze()
depth_map = prediction.cpu().numpy()
```
* **interpolate**: Resizes the depth prediction to match the input image size using bicubic interpolation, which helps maintain detail.
* `depth_map = prediction.cpu().numpy()`: The prediction is converted from a tensor to a NumPy array for easy manipulation and display.
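
If you also want to save the depth map as an image file, one option (a sketch; the output filename is arbitrary) is to rescale it to 8-bit with OpenCV first:
```
import numpy as np

# Rescale the relative depth values to the 0-255 range and save as an 8-bit PNG.
depth_norm = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("depth_map.png", depth_norm.astype(np.uint8))
```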

#### Step 12: Display the depth map
Visualize the original image alongside the estimated depth map.
```
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.title("Original Image")
plt.imshow(img)
plt.axis("off")

plt.subplot(1, 2, 2)
plt.title("Depth Map")
plt.imshow(depth_map, cmap="inferno")
plt.axis("off")

plt.show()
```
* **Matplotlib** is used to display the original image and the depth map. The depth map is shown using the 'inferno' colormap, which provides a visually appealing representation of depth.
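
To keep a copy of the side-by-side figure, you can save it before calling `plt.show()` (the filename here is just an example):
```
# Save the figure to disk; this must run before plt.show() closes the figure.
plt.savefig("depth_comparison.png", dpi=150, bbox_inches="tight")
plt.show()
```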
