diff --git a/docs/notebooks/small-object-detection-with-sahi.ipynb b/docs/notebooks/small-object-detection-with-sahi.ipynb index 6f10a8ecb..ecd16bfda 100644 --- a/docs/notebooks/small-object-detection-with-sahi.ipynb +++ b/docs/notebooks/small-object-detection-with-sahi.ipynb @@ -64,11 +64,11 @@ "source": [ "## Crowd counting with Computer Vision\n", "\n", - "How would you go about solving the problem of counting people in crowds? After some tests, I found that the best approach is to detect people’s heads. Other body parts are likely occluded by other people, but heads are usually exposed, especially in aerial or high-level shots.\n", + "How would you go about solving the problem of counting people in crowds? After some tests, I found that the best approach is to detect people\u2019s heads. Other body parts are likely occluded by other people, but heads are usually exposed, especially in aerial or high-level shots.\n", "\n", "### Using an Open-Source Public Model for People Detection\n", "\n", - "Detecting people (or their heads) is a common problem that has been addressed by many researchers in the past. In this project, we’ll use an open-source public dataset and a fine-tuned model to perform inference on images.\n", + "Detecting people (or their heads) is a common problem that has been addressed by many researchers in the past. In this project, we\u2019ll use an open-source public dataset and a fine-tuned model to perform inference on images.\n", "\n", "![Roboflow Universe](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/roboflow_universe.png \"Open source model for counting people's heads\")\n", "\n", @@ -151,11 +151,11 @@ "Connecting to upload.wikimedia.org (upload.wikimedia.org)|208.80.153.240|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 1865518 (1.8M) [image/jpeg]\n", - "Saving to: ‘human_tower.jpg’\n", + "Saving to: \u2018human_tower.jpg\u2019\n", "\n", "human_tower.jpg 100%[===================>] 1.78M 6.32MB/s in 0.3s \n", "\n", - "2024-09-10 13:46:53 (6.32 MB/s) - ‘human_tower.jpg’ saved [1865518/1865518]\n", + "2024-09-10 13:46:53 (6.32 MB/s) - \u2018human_tower.jpg\u2019 saved [1865518/1865518]\n", "\n", "Image shape: 2560w x 1696h\n" ] @@ -199,9 +199,9 @@ "\n", "## Let's try our model's performance\n", "\n", - "Before we dive into the SAHI technique for small object detection, it’s useful to see how a fine-tuned model performs with the image as is—without any pre-processing or slicing. The goal is to understand when the model starts to fail so that we can progressively move towards an efficient slicing strategy.\n", + "Before we dive into the SAHI technique for small object detection, it\u2019s useful to see how a fine-tuned model performs with the image as is\u2014without any pre-processing or slicing. The goal is to understand when the model starts to fail so that we can progressively move towards an efficient slicing strategy.\n", "\n", - "Let’s run the model!" + "Let\u2019s run the model!" ] }, { @@ -336,7 +336,7 @@ "id": "AutFkxbuxkPa" }, "source": [ - "The model shows strong performance in detecting people in the lower half of the image, but it struggles to accurately predict boxes in the upper half. This suggests two key insights: first, the model is proficient at identifying people’s heads from various angles, and second, using SAHI could effectively address the detection challenges in the upper portion of the image. Now, it’s time to try SAHI!" + "The model shows strong performance in detecting people in the lower half of the image, but it struggles to accurately predict boxes in the upper half. 
This suggests two key insights: first, the model is proficient at identifying people\u2019s heads from various angles, and second, using SAHI could effectively address the detection challenges in the upper portion of the image. Now, it\u2019s time to try SAHI!" ] }, { @@ -388,7 +388,7 @@ "\n", "## Slicing our image with `supervision`\n", "\n", - "Let’s begin by visualizing how these tiles would appear on our image. Let's start with a small set of 2x2 tiles, with a zero overlap both vertically (height) and horizontally (width) between the tiles. The final values of these parameters will ultimately depend on your use case, so trial and error is encouraged!\n", + "Let\u2019s begin by visualizing how these tiles would appear on our image. Let's start with a small set of 2x2 tiles, with a zero overlap both vertically (height) and horizontally (width) between the tiles. The final values of these parameters will ultimately depend on your use case, so trial and error is encouraged!\n", "\n", "Some of the methods below are for visualizing the tiles and overlapping. You'll only need the `calculate_tile_size` method in your application to calculate the size of the tiles.\n", "\n", @@ -706,13 +706,13 @@ "id": "W6TvNnXpewwc" }, "source": [ - "Great! We’ve detected 726 people, up from the 185 we initially detected without image slicing. The model is still detecting people from different angles, but it continues to struggle with detecting people located in the farther parts of the plaza. It’s time to increase the number of tiles—in other words, zoom in so the model can capture more details of the small heads of people.\n", + "Great! We\u2019ve detected 726 people, up from the 185 we initially detected without image slicing. The model is still detecting people from different angles, but it continues to struggle with detecting people located in the farther parts of the plaza. 
It\u2019s time to increase the number of tiles\u2014in other words, zoom in so the model can capture more details of the small heads of people.\n", "\n", "![Missing detections](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/detections.png)\n", "\n", "### Increasing Tile Density: Moving to a 5x5 Grid\n", "\n", - "Now that we’ve seen improvements with a 2x2 grid, it’s time to push the model further. By increasing the number of tiles to a 5x5 grid, we effectively zoom in on the image, allowing the model to capture finer details, such as smaller and more distant features that might have been missed before. This approach will help us understand how well the model performs with even more zoomed-in images. Let’s explore how this change affects our detection accuracy and overall performance." + "Now that we\u2019ve seen improvements with a 2x2 grid, it\u2019s time to push the model further. By increasing the number of tiles to a 5x5 grid, we effectively zoom in on the image, allowing the model to capture finer details, such as smaller and more distant features that might have been missed before. This approach will help us understand how well the model performs with even more zoomed-in images. Let\u2019s explore how this change affects our detection accuracy and overall performance." ] }, { @@ -790,7 +790,7 @@ "id": "eFQasUU3xkPb" }, "source": [ - "We’ve just detected 1,494 people using a 25-tile grid (5 rows x 5 columns), a significant increase from the 726 people detected with the 4-tile (2x2) grid. However, as we increase the number of tiles, a new challenge arises: duplicate detections or missed detections along the edges of the tiles. This issue becomes evident in these examples, where overlapping or gaps between tiles lead to inaccuracies in our model’s detection.\n", + "We\u2019ve just detected 1,494 people using a 25-tile grid (5 rows x 5 columns), a significant increase from the 726 people detected with the 4-tile (2x2) grid. 
However, as we increase the number of tiles, a new challenge arises: duplicate detections or missed detections along the edges of the tiles. This issue becomes evident in these examples, where overlapping or gaps between tiles lead to inaccuracies in our model\u2019s detection.\n", "\n", "| Example | Observations |\n", "|----|----|\n", @@ -802,7 +802,7 @@ "\n", "When objects, like people, appear at the edges of tiles, they might be detected twice or missed entirely if they span across two tiles. This can lead to inaccurate detection results. To solve this, we use overlapping tiles, allowing the model to see parts of adjacent tiles simultaneously. This overlap helps ensure that objects near the boundaries are fully captured, reducing duplicates and improving accuracy.\n", "\n", - "We’ll set the overlap ratio to `(0.2, 0.2)` on the tile’s width and height. This overlap helps ensure that objects near the boundaries are fully captured, reducing duplicates and improving accuracy." + "We\u2019ll set the overlap ratio to `(0.2, 0.2)`, i.e. 20% of the tile\u2019s width and height." ] }, { @@ -881,14 +881,14 @@ "source": [ "## Non-Max Suppression vs Non-Max Merge\n", "\n", - "When dealing with overlapping detections, it’s essential to determine which detections represent the same object and which are unique. Non-Maximum Suppression (NMS) and Non-Maximum Merging (NMM) are two techniques commonly used to address this challenge. NMS works by eliminating redundant detections based on confidence scores, while NMM combines overlapping detections to enhance the representation of objects spanning multiple tiles. Understanding the difference between these methods helps optimize object detection, particularly near tile boundaries.\n", + "When dealing with overlapping detections, it\u2019s essential to determine which detections represent the same object and which are unique. 
Non-Maximum Suppression (NMS) and Non-Maximum Merging (NMM) are two techniques commonly used to address this challenge. NMS works by eliminating redundant detections based on confidence scores, while NMM combines overlapping detections to enhance the representation of objects spanning multiple tiles. Understanding the difference between these methods helps optimize object detection, particularly near tile boundaries.\n", "\n", "In `supervision`, the `overlap_filter` parameter allows us to specify the strategy for handling overlapping detections in slices. This parameter can take on two values:\n", "\n", "- `sv.OverlapFilter.NON_MAX_SUPPRESSION` (default): Eliminates redundant detections by keeping the one with the highest confidence score.\n", "- `sv.OverlapFilter.NON_MAX_MERGE`: Combines overlapping detections to create a more comprehensive representation of objects spanning multiple tiles.\n", "\n", - "It’s important to note that this method is not perfect and may require further testing and fine-tuning to achieve optimal results in various use cases. You should validate the outputs and adjust parameters as needed to handle specific scenarios effectively." + "It\u2019s important to note that neither strategy is perfect; both may require further testing and fine-tuning to achieve optimal results in various use cases. You should validate the outputs and adjust parameters as needed to handle specific scenarios effectively." ] }, { @@ -1035,7 +1035,7 @@ "source": [ "## Conclusion\n", "\n", - "In this cookbook, we’ve explored the advantages of using the SAHI technique for enhancing small object detection and the importance of experimenting with various tiling strategies to effectively zoom into images. By combining these approaches, we can improve the accuracy and reliability of object detection models, particularly in challenging scenarios where objects are small or located near the boundaries of tiles. 
These methods offer practical solutions to common challenges in computer vision, empowering developers to build more robust and precise detection systems.\n", + "In this cookbook, we\u2019ve explored the advantages of using the SAHI technique for enhancing small object detection and the importance of experimenting with various tiling strategies to effectively zoom into images. By combining these approaches, we can improve the accuracy and reliability of object detection models, particularly in challenging scenarios where objects are small or located near the boundaries of tiles. These methods offer practical solutions to common challenges in computer vision, empowering developers to build more robust and precise detection systems.\n", "\n", "![\"Crowd Detection\"](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/5x5_nms.png \"Crowd Detection\")\n" ]
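The notebook's tiling walkthrough (a 2x2 grid, then 5x5, then a `(0.2, 0.2)` overlap ratio) leans on a `calculate_tile_size` helper that is referenced but not shown in this patch. As a minimal sketch of what such a helper might compute (the signature and formula here are assumptions, not the notebook's actual implementation): with overlap ratio `r`, tile origins advance by `tile_w * (1 - r)`, so a grid of `cols` tiles covers `tile_w * (cols - (cols - 1) * r)` pixels horizontally.

```python
import math

def calculate_tile_size(image_wh, rows, cols, overlap_ratio_wh=(0.0, 0.0)):
    """Tile (width, height) needed for a `cols` x `rows` grid to cover the
    image when neighbouring tiles overlap by the given fraction of a tile.
    Hypothetical sketch; the notebook's own helper may differ."""
    image_w, image_h = image_wh
    r_w, r_h = overlap_ratio_wh
    # Coverage per axis: tile * (n - (n - 1) * r) >= image size
    tile_w = math.ceil(image_w / (cols - (cols - 1) * r_w))
    tile_h = math.ceil(image_h / (rows - (rows - 1) * r_h))
    return tile_w, tile_h

# The 2560w x 1696h demo image from the notebook:
print(calculate_tile_size((2560, 1696), rows=2, cols=2))  # (1280, 848)
print(calculate_tile_size((2560, 1696), rows=5, cols=5,
                          overlap_ratio_wh=(0.2, 0.2)))   # (610, 404)
```

In `supervision`, the resulting size is what you would pass to `sv.InferenceSlicer` via its `slice_wh` parameter, alongside `overlap_ratio_wh` and `overlap_filter`.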
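`supervision` implements both overlap strategies internally, so you never write them yourself. Purely to illustrate the NMS-vs-NMM distinction described above, a greedy toy version of each (not the library's implementation) could look like:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Suppression: keep only the highest-scoring of overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return sorted(keep)

def nmm(boxes, scores, iou_threshold=0.5):
    """Merging: replace overlapping boxes with their bounding union."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    merged = []
    for i in order:
        x1, y1, x2, y2 = boxes[i]
        for m in merged:
            if iou(m, boxes[i]) >= iou_threshold:
                # Grow the kept box to enclose the overlapping detection.
                m[0], m[1] = min(m[0], x1), min(m[1], y1)
                m[2], m[3] = max(m[2], x2), max(m[3], y2)
                break
        else:
            merged.append([x1, y1, x2, y2])
    return [tuple(m) for m in merged]

# Two detections of the same head straddling a tile seam, plus one far away:
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]  (duplicate at index 1 suppressed)
print(nmm(boxes, scores))  # [(0, 0, 12, 10), (50, 50, 60, 60)]
```

Note the trade-off the example makes visible: NMS discards the lower-confidence duplicate entirely, while NMM keeps a single box that spans both tiles, which is why NMM can better represent objects cut by a tile boundary.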