# Depth-Estimation

Depth Estimation - overview

This is Python code demonstrating a sample working of depth estimation using the **MiDaS** model.
The MiDaS models are trained on a mix of several depth estimation datasets, including MegaDepth, ReDWeb, and WSVD. These datasets provide diverse and comprehensive training examples, enabling the model to generalize well to a variety of real-world scenarios.

#### Step 1: Install required libraries
```
pip install torch torchvision opencv-python
pip install timm
```
* **Torch** and **torchvision**: These are essential libraries for building and running deep learning models.
* **OpenCV**: A powerful library for image processing tasks.
* **Timm**: A library for PyTorch image models, which provides the pre-trained backbones that MiDaS depends on.

After installation is complete, write the following code. Make sure you have a sample image ready to test the code on. Here it is named "example.jpg"; change that to the filename of the image you have saved.
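
If you do not have a sample image at hand, the following sketch (an optional convenience, not part of the original walkthrough) writes a synthetic gradient image the pipeline below can run on:
```
import numpy as np
import cv2

# Create a simple 480x640 synthetic test image: a horizontal gradient
# replicated across three channels, saved as "example.jpg".
gradient = np.tile(np.linspace(0, 255, 640, dtype=np.uint8), (480, 1))
cv2.imwrite("example.jpg", cv2.merge([gradient, gradient, gradient]))
```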

#### Step 2: Import libraries
```
import torch
import cv2
import matplotlib.pyplot as plt
import timm  # not used directly, but required by the MiDaS models loaded via torch.hub
```
* **Matplotlib**: Used for displaying the input image and the resulting depth map.

#### Step 3: Load the MiDaS model
The MiDaS model is used for monocular depth estimation, which predicts depth from a single image.
```
model_type = "DPT_Large"  # MiDaS v3 - Large model
midas = torch.hub.load("intel-isl/MiDaS", model_type)
```
* **MiDaS**, developed by Intel, provides robust depth predictions from single images. The "DPT_Large" model is a MiDaS variant built on a large Dense Prediction Transformer architecture for enhanced accuracy.

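The hub also publishes smaller variants that trade accuracy for speed; the model names below follow the intel-isl/MiDaS documentation:
```
# Pick one of the published MiDaS variants:
model_type = "DPT_Large"     # MiDaS v3 - Large  (highest accuracy, slowest inference)
# model_type = "DPT_Hybrid"  # MiDaS v3 - Hybrid (medium accuracy, medium speed)
# model_type = "MiDaS_small" # MiDaS v2.1 - Small (lowest accuracy, fastest inference)
midas = torch.hub.load("intel-isl/MiDaS", model_type)
```
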
#### Step 4: Load the transformation pipeline
The transformation pipeline processes the input image to be compatible with the model.
```
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform
```
* **Transforms**: These pre-process the input image by resizing, normalizing, and converting it into a tensor. This step is essential to ensure the image format matches what the model expects.

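Each model variant has a matching transform; a small sketch following the official hub example:
```
# Select the transform that matches the chosen model type.
if model_type in ("DPT_Large", "DPT_Hybrid"):
    transform = midas_transforms.dpt_transform
else:
    transform = midas_transforms.small_transform
```
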
#### Step 5: Load an example image
Load any image you want to estimate the depth for.
```
img_path = "example.jpg"
img = cv2.imread(img_path)
```
* **OpenCV** is used to read the image from the specified file path. Make sure to replace "example.jpg" with the path to your image file.

#### Step 6: Check if the image was loaded successfully
```
if img is None:
    raise ValueError(f"Image at path '{img_path}' could not be loaded. Please check the file path and try again.")

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
```
If the image cannot be loaded, a ValueError is raised. Otherwise the image is converted from OpenCV's default BGR channel order to the RGB order that the MiDaS transform and Matplotlib expect.

#### Step 7: Apply the transformation to the image
Transform the image to prepare it for input to the model.
```
input_batch = transform(img)
```
* The **transform** function resizes and normalizes the image and converts it into a tensor that already includes a batch dimension, giving the (batch_size, channels, height, width) layout the model expects. (Calling unsqueeze(0) here would add a second, unwanted batch dimension.)

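A quick sanity check is to print the tensor shape; the exact height and width depend on your image and the chosen transform:
```
print(input_batch.shape)  # e.g. torch.Size([1, 3, 384, 672]); exact size varies
```
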
#### Step 8: Ensure the input tensor has the correct shape
```
if len(input_batch.shape) == 5:
    input_batch = input_batch.squeeze(0)  # Remove an accidental extra dimension
```
This is a defensive check: if an extra dimension slipped in (for example, from an unnecessary unsqueeze), it is removed so the tensor keeps the expected (batch_size, channels, height, width) shape.

#### Step 9: Move the input to the GPU if available
```
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas.to(device)
midas.eval()  # switch the model to inference mode
input_batch = input_batch.to(device)
```
The code checks for GPU availability and moves both the model and the input tensor to the GPU for faster processing. Calling eval() switches the model to inference mode, which matters for layers such as batch normalization.

#### Step 10: Perform depth estimation
```
with torch.no_grad():
    prediction = midas(input_batch)
```
The model generates a depth map prediction for the input image without updating model weights (torch.no_grad() ensures no gradients are calculated, reducing memory usage).

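On PyTorch 1.9 or newer, torch.inference_mode() is a slightly faster drop-in alternative for pure inference:
```
# Equivalent to the no_grad block above, with a bit less overhead (PyTorch >= 1.9).
with torch.inference_mode():
    prediction = midas(input_batch)
```
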
#### Step 11: Remove the extra dimension and convert to numpy
Convert the prediction to a more usable format.
```
prediction = torch.nn.functional.interpolate(
    prediction.unsqueeze(1),
    size=img.shape[:2],
    mode="bicubic",
    align_corners=False,
).squeeze()
depth_map = prediction.cpu().numpy()
```
* **interpolate**: Resizes the depth prediction to match the input image size using bicubic interpolation, which helps maintain detail.
* ```depth_map = prediction.cpu().numpy()```: The prediction is moved back to the CPU and converted from a tensor to a NumPy array for easy manipulation and display.

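MiDaS predicts relative inverse depth (larger values generally mean closer surfaces), so the raw values carry no absolute scale. To save the result as an image, a common convention is to min-max normalize it first; a minimal sketch:
```
import numpy as np

# Min-max normalize the relative depth values to 0-255 and save as an 8-bit PNG.
depth_min, depth_max = depth_map.min(), depth_map.max()
depth_vis = (255 * (depth_map - depth_min) / (depth_max - depth_min + 1e-8)).astype(np.uint8)
cv2.imwrite("depth_map.png", depth_vis)
```
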
#### Step 12: Display the depth map
Visualize the original image alongside the estimated depth map.
```
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.title("Original Image")
plt.imshow(img)
plt.axis("off")

plt.subplot(1, 2, 2)
plt.title("Depth Map")
plt.imshow(depth_map, cmap="inferno")
plt.axis("off")

plt.show()
```
* **Matplotlib** is used to display the original image and the depth map side by side. The depth map is shown with the 'inferno' colormap; brighter regions correspond to larger predicted values, i.e. surfaces closer to the camera.