🟢
Available

███╗   ███╗███╗   ██╗██╗  ██╗
████╗ ████║████╗  ██║██║ ██╔╝
██╔████╔██║██╔██╗ ██║█████╔╝ 
██║╚██╔╝██║██║╚██╗██║██╔═██╗ 
██║ ╚═╝ ██║██║ ╚████║██║  ██╗
╚═╝     ╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝

Muhammad Nouman Khan

AI Engineer · MLOps Specialist · IEEE Researcher

⚡ What I Do

I build AI systems that actually work in production — from GPU clusters running H100s to LLM inference pipelines serving thousands of requests per second.

I'm an AI & MLOps Engineer based in Islamabad, Pakistan, with a deep focus on:

  • 🧠 LLM Inference & Optimization — deploying large language models at scale using vLLM, KServe, and Triton Inference Server with FP8/INT8 quantization and tensor parallelism
  • 🖥️ NVIDIA Infrastructure — managing DGX systems (A100 / H100 / H200), NVIDIA Base Command Manager (BCM), NVIDIA AI Enterprise, DeepStream, and NIM microservices
  • ☸️ Kubernetes at Scale — provisioning and orchestrating GPU-aware K8s clusters via Kubeadm, Terraform, and Kubeflow for high-availability AI workloads
  • 🔬 AI Research — IEEE-published researcher in graph-based Retrieval-Augmented Generation (RAG), in collaboration with the University of Hull, UK
  • 🤖 Agentic AI Systems — building multi-agent workflows and RAG pipelines using LangChain, LlamaIndex, and CrewAI
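The FP8/INT8 quantization mentioned above trades numeric precision for lower memory use and higher throughput. As a rough, stdlib-only illustration of the idea (not how vLLM, KServe, or Triton implement it internally; real engines use fused GPU kernels and often per-channel or per-group scales), symmetric per-tensor INT8 quantization stores one float scale plus one int8 code per weight. The function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
# Illustrative sketch only: symmetric per-tensor INT8 quantization in pure Python.
# Hypothetical helpers; production inference engines use their own GPU kernels.


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, which would otherwise divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats by multiplying each code by the scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 0.031]
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    print("codes:", q, "scale:", s)
    # Per-element reconstruction error is bounded by half a step (scale / 2).
    print("max error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```

Using a single scale per tensor keeps the sketch simple; a single large outlier weight inflates the scale for every other weight, which is one reason real deployments prefer finer-grained scales.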

🏢 Experience

BetaCodes Pvt Ltd         MLOps Engineer         Aug 2025 – Present
iQera Schools (USA)       AI Engineer            Aug 2024 – Present
Sybrid Pvt Ltd            GenAI Intern           Jul 2024 – Sep 2024
Rapidev                   ML/CV Intern           Aug 2023 – Sep 2023
Sino-Pak Center for AI    AI Intern              Jul 2022 – Sep 2022

📄 Publications

"To Enhance Graph-Based Retrieval-Augmented Generation (RAG) with Robust Retrieval Techniques"
M. Rani, B. K. Mishra, D. Thakker, M. N. Khan
📍 IEEE ICOSST 2024 · International Conference on Open-Source Systems and Technologies
🤝 In collaboration with the University of Hull, UK
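The retrieval techniques proposed in the paper are not described here. Purely as a generic illustration of what "graph-based retrieval" means in a RAG pipeline, the sketch below links query terms to nodes of a small knowledge graph and expands a k-hop neighborhood to collect facts that could be placed in an LLM prompt. The graph contents and the functions `seed_entities` and `retrieve_subgraph` are hypothetical, and the naive substring matching stands in for a real entity linker.

```python
from collections import deque

# Hypothetical toy knowledge graph: entity -> list of (relation, neighbor).
Graph = dict[str, list[tuple[str, str]]]

GRAPH: Graph = {
    "vLLM": [("implements", "PagedAttention"), ("serves", "LLMs")],
    "PagedAttention": [("manages", "KV cache")],
    "LLMs": [("augmented_by", "RAG")],
    "RAG": [("retrieves_from", "knowledge graph")],
}


def seed_entities(query: str, graph: Graph) -> list[str]:
    """Naive entity linking: every node whose name appears in the query."""
    q = query.lower()
    return [node for node in graph if node.lower() in q]


def retrieve_subgraph(query: str, graph: Graph, max_hops: int = 2) -> list[str]:
    """Breadth-first expansion from the seeds, collecting 'head relation tail' facts."""
    facts: list[str] = []
    frontier = deque((entity, 0) for entity in seed_entities(query, graph))
    visited = {entity for entity, _ in frontier}
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            facts.append(f"{node} {relation} {neighbor}")
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return facts


if __name__ == "__main__":
    # These facts would then be inserted into the LLM prompt as grounding context.
    for fact in retrieve_subgraph("How does vLLM manage memory?", GRAPH):
        print(fact)
```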


🛠️ Tech Stack

🤖 AI / ML & LLMs

PyTorch · TensorFlow · HuggingFace · LangChain · OpenAI · YOLO · OpenCV

🖥️ NVIDIA / GPU / HPC

NVIDIA CUDA · vLLM · Triton · NIM · DeepStream · Jetson

☸️ MLOps / Infrastructure

Kubernetes · Docker · Terraform · KServe · Kubeflow · MLflow · GitHub Actions

☁️ Cloud

AWS · GCP · Azure · SageMaker

🧱 Backend & Databases

Python · FastAPI · Django · PostgreSQL · MongoDB · Qdrant · Redis


🚀 Featured Projects

Project · Description · Stack
🏗️ AI Studio (FYP) · Cloud-native AI deployment platform with MLOps workflows & model registry · K8s, KServe, MLflow
🔍 Graph RAG Research · IEEE-published enhancement of graph-based retrieval for LLMs · Python, Neo4j, LLMs
🧬 Glaucoma Detection · Attention-based CNN for medical imaging diagnosis · PyTorch, OpenCV
🎤 Whisper Fine-Tuning · Custom ASR model trained on Quranic speech dataset · HuggingFace, PyTorch
📦 Inventory Management · YOLOv8 real-time object detection for retail analytics · YOLOv8, FastAPI
🌞 Solar Forecasting · Time-series model for smart building energy optimization · scikit-learn, Pandas
🚗 License Plate Detection · Real-time ANPR with REST APIs for vector conversion · YOLO, OpenCV, FastAPI
💬 Tax & Finance Chatbot · Domain-specific LLM chatbot with knowledge base · LangChain, OpenAI
📈 DeepSeek Trading · Financial data extraction & AI-powered market analysis · DeepSeek, Python
🧑‍💼 Facial Attendance · Real-time face recognition on edge devices · OpenCV, Raspberry Pi

🏅 Certifications

  • 🟢 Certified Kubernetes Application Developer (CKAD)
  • 🤖 Machine Learning Specialization — Coursera / DeepLearning.AI
  • 🧠 Generative AI with Large Language Models — Coursera / DeepLearning.AI

🏆 Achievements

  • 🥇 1st Place — Uraan Projects Exhibition (Fall 2023) · Multi-model AI web deployment · Pak-Austria Fachhochschule
  • 🏅 4th Place — Uraan Projects Exhibition (Spring 2023) · Solar Irradiance Forecasting

💡 Currently

  • 🔭 Building production-grade LLM inference infrastructure at BetaCodes with NVIDIA DGX
  • 🌱 Deep-diving into NVIDIA AI Enterprise stack — NIM, DeepStream, BCM
  • 📖 Exploring SaaS architecture and AI product development
  • 🤝 Open to collaboration on open-source AI/MLOps projects

"The gap between a prototype and production is where real engineering lives."

Let's connect → iammuhammadnoumankhan@gmail.com

Popular repositories

  1. FastAPI-GOT-OCR-2-Transformers (Public)

    Refines GOT-OCR-2.0 into a FastAPI microservice.

    HTML 6 4

  2. Whisper-FastAPI-Transcription-Service (Public)

    A high-performance, production-ready speech-to-text API service based on the faster-whisper library.

    HTML 4 6

  3. OpenMPI-Cluster (Public)

    Learn how to build a professional High-Performance Computing (HPC) cluster using OpenMPI across 5 nodes! This comprehensive tutorial guides you through every step of setting up a multi-node OpenMPI…

    2

  4. OLLAMA_CHAT (Public)

    A ChatGPT-like AI Assistant powered by Ollama, built with FastAPI backend and Streamlit frontend.

    Python 2 4

  5. depthai-experiments (Public)

    Forked from luxonis/oak-examples

    Experimental projects we've done with DepthAI.

    Python 1

  6. Nouva (Public)

    AI-based local personal assistant.

    Python 1 1