ArjunJagdale/README.md

Portfolio LinkedIn Email LeetCode


👋 About Me

Turning research into production-ready ML systems. I'm an AI engineer who codes at the intersection of deep learning research and production engineering.


🔥 What I'm Working On

  • 🚀 Contributing to Hugging Face: datasets & dataset-viewer libraries (7 merged PRs)
  • 🧠 Research: Published paper on Retrieval-Augmented Systems with Dynamic Learning
  • 🛠️ Building: Production ML pipelines with real-time inference and GPU optimization
  • 📚 Learning: Parameter-efficient methods, vision-language models, cloud-native deployments

🛠️ Tech Arsenal

Languages & Core

Python C++ JavaScript

ML & AI Frameworks

PyTorch HuggingFace scikit-learn TensorFlow LangChain LlamaIndex

Cloud & DevOps

IBM Cloud Google Cloud Docker Kubernetes

Tools & Libraries

Git GitHub Actions Pandas NumPy OpenCV


🌟 Open Source Contributions

Active contributor to Hugging Face, focusing on datasets infrastructure, compatibility fixes, and developer experience.

📦 huggingface/datasets

#7831 • Fix ValueError in train_test_split with NumPy 2.0+

Resolved a compatibility issue with NumPy 2.0+ by wrapping the stratify column's array access in np.asarray(). This fixes the array-copy errors while staying backward compatible with NumPy 1.x.

bug-fix compatibility numpy
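
A rough illustration of the NumPy behavior behind #7831 (a sketch, not the merged patch): NumPy 2.0 made `np.array(obj, copy=False)` raise a ValueError when a copy cannot be avoided, whereas `np.asarray()` copies only when it has to on both 1.x and 2.x. The helper name `to_label_array` and the sample labels are invented for this example.

```python
import numpy as np


def to_label_array(column):
    """Hypothetical helper: turn a stratify column into a NumPy array.

    np.asarray() reuses the input when it is already an ndarray and
    copies otherwise, on both NumPy 1.x and 2.x.
    """
    return np.asarray(column)


labels = [0, 1, 0, 1]            # a plain list, so a copy is required
arr = to_label_array(labels)

existing = np.array([0, 1, 0, 1])
same = to_label_array(existing)  # already an ndarray: no copy is made
```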

#7648 • Fix misleading docstring examples across multiple methods

Updated the docstring examples for add_column(), select_columns(), select(), filter(), shard(), and flatten() to make clear that these methods return new datasets instead of modifying the original in place. Improves the clarity of the API documentation.

documentation api-improvement datasets
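
The "returns a new dataset" behavior that #7648 documents can be pictured with a toy class. `TinyDataset` below is a hypothetical stand-in written for this README, not the `datasets.Dataset` class; it only mirrors the pattern of returning a new object and leaving the original untouched.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TinyDataset:
    """Hypothetical stand-in for a dataset; not the datasets.Dataset class."""
    columns: dict = field(default_factory=dict)

    def add_column(self, name, values):
        # Build and return a new object; self is left untouched.
        return TinyDataset({**self.columns, name: list(values)})


ds = TinyDataset({"text": ["a", "b"]})
ds.add_column("label", [0, 1])        # result discarded: ds is unchanged
ds2 = ds.add_column("label", [0, 1])  # keep the returned object instead
```

The practical consequence is that callers need to assign the result (`ds = ds.add_column(...)`) for the change to be visible.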

#7623 • Fix: Raise error when data_dir and data_files are missing

Added a validation check in FolderBasedBuilder that prevents a silent fallback to the current directory when a folder-based dataset is loaded without either parameter. Improves user experience by catching the error early.

bug-fix validation datasets
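
The kind of early check described in #7623 might look roughly like this. It is a minimal sketch under assumptions: `check_folder_source` and its error message are hypothetical and are not taken from FolderBasedBuilder.

```python
from typing import Optional, Sequence


def check_folder_source(
    data_dir: Optional[str] = None,
    data_files: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical validator: require an explicit data location."""
    if data_dir is None and not data_files:
        # Fail loudly rather than silently scanning the current directory.
        raise ValueError(
            "Provide data_dir or data_files; "
            "refusing to fall back to the current directory."
        )


check_folder_source(data_dir="./images")  # passes: a location was given
```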

🔍 huggingface/dataset-viewer

#3223 • Add support for Date features in Croissant schema

Implemented support for Date, UTCDate, and UTCTime features in Croissant schema generation. The correct dataType (sc:Date, sc:Time, or sc:DateTime) is inferred automatically from the format string.

feature croissant schema
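
One plausible way to infer the dataType from a strftime-style format string is sketched below. The function name `infer_schema_type` and the directive lists are assumptions made for illustration; the rules actually used in #3223 may differ.

```python
# Assumed directive sets; matching is case-sensitive, so %m (month)
# and %M (minute) are kept apart.
DATE_DIRECTIVES = ("%Y", "%y", "%m", "%d", "%j", "%b", "%B")
TIME_DIRECTIVES = ("%H", "%I", "%M", "%S", "%f", "%p")


def infer_schema_type(fmt: str) -> str:
    """Hypothetical helper: map a format string to a schema.org type."""
    has_date = any(d in fmt for d in DATE_DIRECTIVES)
    has_time = any(d in fmt for d in TIME_DIRECTIVES)
    if has_date and has_time:
        return "sc:DateTime"
    if has_date:
        return "sc:Date"
    if has_time:
        return "sc:Time"
    raise ValueError(f"No date or time directive found in {fmt!r}")


date_type = infer_schema_type("%Y-%m-%d")
time_type = infer_schema_type("%H:%M:%S")
datetime_type = infer_schema_type("%Y-%m-%d %H:%M:%S")
```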

#3219 • Refactor: Replace get_empty_str_list with CONSTANT.copy

Eliminated shared mutable default values in dataclass fields by replacing helper functions with explicit copies of constants. This makes configuration behavior more explicit and prevents subtle bugs where instances share one list.

refactor best-practices config
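
The pattern named in the title of #3219 can be sketched with the standard library alone. `DEFAULT_TAGS` and `ViewerConfig` are hypothetical names chosen for this example, not identifiers from dataset-viewer.

```python
from dataclasses import dataclass, field

# Hypothetical module-level constant holding the default values.
DEFAULT_TAGS = ["croissant", "parquet"]


@dataclass
class ViewerConfig:
    # default_factory calls DEFAULT_TAGS.copy() for every new instance,
    # so each instance gets its own list rather than a shared one.
    tags: list = field(default_factory=DEFAULT_TAGS.copy)


first = ViewerConfig()
second = ViewerConfig()
first.tags.append("extra")  # only affects the first instance
```

Passing the bound method `DEFAULT_TAGS.copy` as the factory keeps the default values visible in one place while still giving each instance a fresh list.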

#3218 • Test: Add unit tests for get_previous_step_or_raise

Added unit tests for the cache retrieval function get_previous_step_or_raise, covering successful cache hits, missing cache entries, and error status handling. Improves code coverage and reliability.

testing unit-tests coverage

#3206 • Refactor: Use HfApi.update_repo_settings for gated datasets

Removed redundant custom implementations of update_repo_settings() across test utilities in favor of the official huggingface_hub API. Removed 222 lines of code while maintaining full functionality.

refactor code-cleanup testing

View All Contributions


📚 Research & Publications

Retrieval-Augmented System with Dynamic Learning from Web Content
Published research on RAG systems that dynamically learn from web content, combining retrieval mechanisms with adaptive learning strategies.


💬 Let's Connect

Building something interesting? I'm always open to collaborating on ML research, open source contributions, or production ML systems.

LinkedIn Email Portfolio LeetCode

Pinned

  1. YTRAG (Public, Jupyter Notebook)
  2. SPOOF (Public, Jupyter Notebook)
  3. CHAT (Public, JavaScript)
  4. URAG (Public, Python)
  5. NER (Public, Python)