Welcome to my GitHub profile!
I work on distributed AI/ML systems, infrastructure tooling, and performance optimization in large-scale training environments.
Currently working with Horovod, LLM training/inference pipelines, and GPU Direct RDMA, and exploring the internals of NCCL, UCX, and GDRCopy.
Interested in open-source projects in AI infrastructure, systems programming, and performance engineering.
Skills: Python, C/C++, Linux, Docker, Jenkins, Bash, Java, Git, AWS EC2, automation, distributed training, and open-source contribution.
My commit messages tell a story: a tragic tale of bugs squashed, features conquered, and the occasional coffee spill. It's a novel in progress.
- Fine-tuned LLMs and optimized inference pipelines for performance and scalability.
- Built and containerized Horovod-based distributed training setups using Docker.
- Debugged NCCL to improve communication efficiency in multi-node setups.
- Integrated UCX into deep learning environments.
- Applied GPU Direct RDMA and GDRCopy to accelerate GPU memory transfers.
- Benchmarked communication performance using `nccl-tests` and `ib_perf`.
- Built CI/CD pipelines with Jenkins, Git, AWS EC2, and Python.
- Developed CLI tools in C and Python.
- Wrote automation scripts in Bash and Python for deployment and monitoring.
- Built packet manipulation tools and custom JSON parsers.
 
- Integrated C/C++ data algorithms into Java via JNI and shared libraries.
- Contributed documentation and internal tooling.
- Advocate for clean, performance-first code and open-source collaboration.
 
- Unveiling the Veil: A Comprehensive Assessment of Privacy and Security in Amazon Alexa
  International Journal of Innovative Science and Research Technology
- Yoga pose classification from images using transfer learning
  International Journal of Innovative Research in Technology
 
Find my Docker images on Docker Hub: sumon2j

- deepcarex: Multi-disease diagnostics AI system. `docker pull sumon2j/deepcarex:latest`
- artventure: Image AI filters and transformations. `docker pull sumon2j/artventure`
