███╗   ███╗███╗   ██╗██╗  ██╗
████╗ ████║████╗  ██║██║ ██╔╝
██╔████╔██║██╔██╗ ██║█████╔╝
██║╚██╔╝██║██║╚██╗██║██╔═██╗
██║ ╚═╝ ██║██║ ╚████║██║  ██╗
╚═╝     ╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝
AI Engineer · MLOps Specialist · IEEE Researcher
I build AI systems that actually work in production — from GPU clusters running H100s to LLM inference pipelines serving thousands of requests per second.
I'm an AI & MLOps Engineer based in Islamabad, Pakistan, with a deep focus on:
- 🧠 LLM Inference & Optimization — deploying large language models at scale using vLLM, KServe, and Triton Inference Server with FP8/INT8 quantization and tensor parallelism
- 🖥️ NVIDIA Infrastructure — managing DGX systems (A100 / H100 / H200), NVIDIA Base Command Manager (BCM), NVIDIA AI Enterprise, DeepStream, and NIM microservices
- ☸️ Kubernetes at Scale — provisioning and orchestrating GPU-aware K8s clusters via Kubeadm, Terraform, and Kubeflow for high-availability AI workloads
- 🔬 AI Research — IEEE-published researcher in Graph-based Retrieval-Augmented Generation (RAG) in collaboration with University of Hull, UK
- 🤖 Agentic AI Systems — building multi-agent workflows and RAG pipelines using LangChain, LlamaIndex, and CrewAI
| Company | Role | Period |
|---|---|---|
| BetaCodes Pvt Ltd | MLOps Engineer | Aug 2025 – Present |
| iQera Schools (USA) | AI Engineer | Aug 2024 – Present |
| Sybrid Pvt Ltd | GenAI Intern | Jul 2024 – Sep 2024 |
| Rapidev | ML/CV Intern | Aug 2023 – Sep 2023 |
| Sino-Pak Center for AI | AI Intern | Jul 2022 – Sep 2022 |
"To Enhance Graph-Based Retrieval-Augmented Generation (RAG) with Robust Retrieval Techniques"
M. Rani, B. K. Mishra, D. Thakker, M. N. Khan
📍 IEEE ICOSST 2024 · International Conference on Open-Source Systems and Technologies
🤝 In collaboration with the University of Hull, UK
| Project | Description | Stack |
|---|---|---|
| 🏗️ AI Studio (FYP) | Cloud-native AI deployment platform with MLOps workflows & model registry | K8s, KServe, MLflow |
| 🔍 Graph RAG Research | IEEE-published enhancement of graph-based retrieval for LLMs | Python, Neo4j, LLMs |
| 🧬 Glaucoma Detection | Attention-based CNN for medical imaging diagnosis | PyTorch, OpenCV |
| 🎤 Whisper Fine-Tuning | Custom ASR model trained on Quranic speech dataset | HuggingFace, PyTorch |
| 📦 Inventory Management | YOLOv8 real-time object detection for retail analytics | YOLOv8, FastAPI |
| 🌞 Solar Forecasting | Time-series model for smart building energy optimization | scikit-learn, Pandas |
| 🚗 License Plate Detection | Real-time ANPR with REST APIs for vector conversion | YOLO, OpenCV, FastAPI |
| 💬 Tax & Finance Chatbot | Domain-specific LLM chatbot with knowledge base | LangChain, OpenAI |
| 📈 DeepSeek Trading | Financial data extraction & AI-powered market analysis | DeepSeek, Python |
| 🧑‍💼 Facial Attendance | Real-time face recognition on edge devices | OpenCV, Raspberry Pi |
- 🟢 Certified Kubernetes Application Developer (CKAD)
- 🤖 Machine Learning Specialization — Coursera / DeepLearning.AI
- 🧠 Generative AI with Large Language Models — Coursera / DeepLearning.AI
- 🥇 1st Place — Uraan Projects Exhibition (Fall 2023) · Multi-model AI web deployment · Pak-Austria Fachhochschule
- 🏅 4th Place — Uraan Projects Exhibition (Spring 2023) · Solar Irradiance Forecasting
- 🔭 Building production-grade LLM inference infrastructure at BetaCodes with NVIDIA DGX
- 🌱 Deep-diving into NVIDIA AI Enterprise stack — NIM, DeepStream, BCM
- 📖 Exploring SaaS architecture and AI product development
- 🤝 Open to collaboration on open-source AI/MLOps projects
"The gap between a prototype and production is where real engineering lives."
Let's connect → iammuhammadnoumankhan@gmail.com
