
███╗   ███╗███╗   ██╗██╗  ██╗
████╗ ████║████╗  ██║██║ ██╔╝
██╔████╔██║██╔██╗ ██║█████╔╝ 
██║╚██╔╝██║██║╚██╗██║██╔═██╗ 
██║ ╚═╝ ██║██║ ╚████║██║  ██╗
╚═╝     ╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝

Muhammad Nouman Khan

AI Engineer · MLOps Specialist · IEEE Researcher

⚡ What I Do

I build AI systems that actually work in production — from GPU clusters running H100s to LLM inference pipelines serving thousands of requests per second.

I'm an AI & MLOps Engineer based in Islamabad, Pakistan, with a deep focus on:

🧠 LLM Inference & Optimization — deploying large language models at scale using vLLM, KServe, and Triton Inference Server with FP8/INT8 quantization and tensor parallelism
🖥️ NVIDIA Infrastructure — managing DGX systems (A100 / H100 / H200), NVIDIA Base Command Manager (BCM), NVIDIA AI Enterprise, DeepStream, and NIM microservices
☸️ Kubernetes at Scale — provisioning and orchestrating GPU-aware K8s clusters via Kubeadm, Terraform, and Kubeflow for high-availability AI workloads
🔬 AI Research — IEEE-published researcher in Graph-based Retrieval-Augmented Generation (RAG) in collaboration with University of Hull, UK
🤖 Agentic AI Systems — building multi-agent workflows and RAG pipelines using LangChain, LlamaIndex, and CrewAI

🏢 Experience

BetaCodes Pvt Ltd         MLOps Engineer         Aug 2025 – Present
iQera Schools (USA)       AI Engineer            Aug 2024 – Present
Sybrid Pvt Ltd            GenAI Intern           Jul 2024 – Sep 2024
Rapidev                   ML/CV Intern           Aug 2023 – Sep 2023
Sino-Pak Center for AI    AI Intern              Jul 2022 – Sep 2022

📄 Publications

"To Enhance Graph-Based Retrieval-Augmented Generation (RAG) with Robust Retrieval Techniques"
M. Rani, B. K. Mishra, D. Thakker, M. N. Khan
📍 IEEE ICOSST 2024 · International Conference on Open-Source Systems and Technologies
🤝 In collaboration with the University of Hull, UK

🛠️ Tech Stack

🤖 AI / ML & LLMs

🖥️ NVIDIA / GPU / HPC

☸️ MLOps / Infrastructure

☁️ Cloud

🧱 Backend & Databases

🚀 Featured Projects

Project                      Description                                                                 Stack
🏗️ AI Studio (FYP)           Cloud-native AI deployment platform with MLOps workflows & model registry   K8s, KServe, MLflow
🔍 Graph RAG Research        IEEE-published enhancement of graph-based retrieval for LLMs                Python, Neo4j, LLMs
🧬 Glaucoma Detection        Attention-based CNN for medical imaging diagnosis                           PyTorch, OpenCV
🎤 Whisper Fine-Tuning       Custom ASR model trained on Quranic speech dataset                          HuggingFace, PyTorch
📦 Inventory Management      YOLOv8 real-time object detection for retail analytics                      YOLOv8, FastAPI
🌞 Solar Forecasting         Time-series model for smart building energy optimization                    scikit-learn, Pandas
🚗 License Plate Detection   Real-time ANPR with REST APIs for vector conversion                         YOLO, OpenCV, FastAPI
💬 Tax & Finance Chatbot     Domain-specific LLM chatbot with knowledge base                             LangChain, OpenAI
📈 DeepSeek Trading          Financial data extraction & AI-powered market analysis                      DeepSeek, Python
🧑‍💼 Facial Attendance         Real-time face recognition on edge devices                                  OpenCV, Raspberry Pi

🏅 Certifications

🟢 Certified Kubernetes Application Developer (CKAD)
🤖 Machine Learning Specialization — Coursera / DeepLearning.AI
🧠 Generative AI with Large Language Models — Coursera / DeepLearning.AI

🏆 Achievements

🥇 1st Place — Uraan Projects Exhibition (Fall 2023) · Multi-model AI web deployment · Pak-Austria Fachhochschule
🏅 4th Place — Uraan Projects Exhibition (Spring 2023) · Solar Irradiance Forecasting

💡 Currently

🔭 Building production-grade LLM inference infrastructure at BetaCodes with NVIDIA DGX
🌱 Deep-diving into NVIDIA AI Enterprise stack — NIM, DeepStream, BCM
📖 Exploring SaaS architecture and AI product development
🤝 Open to collaboration on open-source AI/MLOps projects

"The gap between a prototype and production is where real engineering lives."

Let's connect → iammuhammadnoumankhan@gmail.com
