Computer Vision Engineer | ML Researcher
Real-time CV on edge devices • Deep Learning • Generative AI • MLOps
Passionate about turning research into production-grade solutions using Computer Vision, NLP, and Deep Learning.
| Category | Technologies |
|---|---|
| Core ML/AI | Computer Vision • NLP • Deep Learning • LLMs • RAG |
| Frameworks | TensorFlow • PyTorch • OpenCV • YOLO • Hugging Face • LangChain |
| Languages | Python • SQL • Bash |
| Tools & Platforms | Docker • Kafka • MQTT • FastAPI • Git • Linux • MediaPipe |
| Cloud & Edge | AWS (Bedrock, Lambda) • Azure Edge • MLOps • CI/CD |
Tech Stack: Python, PyTorch, OpenCV, Generative AI, Deep Learning
The Objective: While modern APIs make it easy to generate images with a single line of code, I wanted to deeply understand the fundamental mathematics powering state-of-the-art models like DALL-E and Sora. My goal was to build, debug, and train a Denoising Diffusion Probabilistic Model (DDPM) entirely from scratch using raw PyTorch.
Technical Architecture & Implementation:
- The U-Net Bottleneck: I built a custom YOLO-style U-Net equipped with skip connections to preserve spatial integrity. To allow the network to track its exact position in the 1000-step denoising process, I engineered and injected Sinusoidal Position Embeddings directly into the bottleneck.
- The Math Schedule: I implemented the forward diffusion process (adding noise) and the reverse diffusion process (removing noise) using a linear beta schedule.
- Scaling to Production: I scaled the model to train on the full 60,000-image MNIST training set, adjusting tensor operations to handle images scaled to 32x32 and increasing the batch size for more stable gradients.
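
The schedule and timestep embedding described above can be sketched roughly as follows. This is a minimal illustration in plain Python (standard library only) rather than the project's PyTorch code. The function names, the beta range of 1e-4 to 0.02 (the values used in the original DDPM paper), and the embedding dimension of 32 are assumptions made for the example.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (range is an assumption)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Closed-form forward diffusion:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
    Returns the noised sample and the noise that the network learns to predict."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, eps)]
    return xt, eps


def sinusoidal_embedding(t, dim=32, max_period=10000.0):
    """Map an integer timestep to `dim` sine/cosine features (dim assumed even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


# Usage: noise a flattened 32x32 "image" of zeros at a mid-range timestep.
betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x_t, eps = q_sample([0.0] * (32 * 32), 500, alpha_bars, rng)
t_emb = sinusoidal_embedding(500)
```

Because `alpha_bars` shrinks toward zero as `t` grows, late timesteps are almost pure noise. The embedding gives the denoiser a smooth signal for which of the 1000 steps it is currently undoing.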
Tech Stack: YOLOv11, Object Detection, Computer Vision
This project performs object detection with YOLOv11 and other YOLO models. It automates identifying objects in images, which is useful for applications such as surveillance, quality control, and smart monitoring.
Tech Stack: Object Detection, OCR, Computer Vision, Python
This project detects humans and animals in images and videos, classifies them, and can optionally run Optical Character Recognition (OCR) on the same media.
Tech Stack: YOLO, SSD, Object Tracking, Computer Vision
A system that detects and tracks drones using YOLO and SSD (Single Shot MultiBox Detector) object detection models.
Tech Stack: OpenCV, QR Detection, Computer Vision, Python
A robust QR code detection and decoding pipeline using OpenCV, designed to handle challenging real-world conditions including noisy, rotated, or low-contrast images.
Tech Stack: CNN, ResNet50, Deep Learning, Image Classification
This research provides a detailed comparative analysis of two deep learning approaches for drone image classification: standard CNNs and the ResNet50 architecture.
Check out my live projects →
https://akshaysatyam2.github.io/akshaysatyam2/
- Email: akshaysatyam2003@gmail.com
- LinkedIn: akshaysatyam2
- GitHub: akshaysatyam2
Always open to collaborations, research discussions, or just a quick chat!







