Sachin Kalsi SachinKalsi

Data Scientist with expertise in LLMs, NLP, Search, Embedding, and AI systems.

Building and optimizing large language model pipelines — training data, constrained inference, NLP at scale.
Currently at Draup, powering sales and talent intelligence with ML.

LLMs · Fine-tuning · Constrained Decoding · NER · RLHF · Embeddings · PyTorch · Hugging Face

Things I've built

constrained-decoding — Forces open-weight LLMs to output exactly one of your taxonomy labels using a trie. No prompt tricks, no post-processing regex — structural guarantee at the token level. Handles multi-label + repeat prevention.
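To illustrate the idea of a token-level guarantee, here is a minimal sketch of trie-constrained decoding. It is not the code from the constrained-decoding repository: the token ids, the `END` sentinel, and the helper names `build_trie`, `allowed_next_tokens`, and `mask_logits` are all assumptions made for the example, and multi-label output and repeat prevention are left out.

```python
from typing import List, Sequence

END = -1  # hypothetical sentinel key: a complete label ends at this node


def build_trie(label_token_ids: Sequence[Sequence[int]]) -> dict:
    """Build a nested-dict trie from labels that are already tokenized."""
    root: dict = {}
    for ids in label_token_ids:
        node = root
        for tok in ids:
            node = node.setdefault(tok, {})
        node[END] = True
    return root


def allowed_next_tokens(trie: dict, prefix: Sequence[int], eos_id: int) -> List[int]:
    """Return the token ids that keep `prefix` on a path to some label."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []  # the prefix has left the trie, so nothing is allowed
        node = node[tok]
    allowed = [tok for tok in node if tok != END]
    if END in node:
        allowed.append(eos_id)  # a full label is complete, so stopping is valid
    return sorted(allowed)


def mask_logits(logits: List[float], allowed: List[int]) -> List[float]:
    """Set every disallowed position to -inf so it can never be sampled."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else float("-inf") for i, x in enumerate(logits)]


# Toy taxonomy with made-up token ids (assumption: 0 is the EOS id).
# "sales" -> [5, 2], "sales ops" -> [5, 2, 7], "talent" -> [9]
trie = build_trie([[5, 2], [5, 2, 7], [9]])
first_step = allowed_next_tokens(trie, [], eos_id=0)
after_sales = allowed_next_tokens(trie, [5, 2], eos_id=0)
masked = mask_logits([0.1, 0.2, 0.3], [1])
```

At each decoding step only the tokens returned by `allowed_next_tokens` keep a finite logit, so the model cannot emit a string outside the taxonomy. With Hugging Face `generate`, a function like this could be wired in through the `prefix_allowed_tokens_fn` argument, after stripping the prompt tokens from the ids it receives.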

LLM-NER-Offset-Generator — NER training data generation using LLM tool-calling. Gets start_offset / end_offset right, which is the part most LLM-based annotation pipelines get wrong.
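One common way to keep offsets honest is to recompute them from the source text instead of trusting numbers produced by the model. The sketch below shows that idea only; it is not the repository's tool-calling pipeline. The function name `attach_offsets`, the `text` / `label` keys, and the choice of half-open Python string indices are assumptions for the example.

```python
from typing import Dict, List


def attach_offsets(text: str, entities: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Attach start_offset / end_offset by locating each entity string in `text`.

    Repeated mentions are matched left to right, so two occurrences of the
    same surface string receive two different spans.
    """
    results: List[Dict[str, object]] = []
    search_from: Dict[str, int] = {}  # next search position per surface string
    for ent in entities:
        surface = ent["text"]
        start = text.find(surface, search_from.get(surface, 0))
        if start == -1:
            continue  # not present in the source: drop it rather than guess
        end = start + len(surface)  # half-open span: text[start:end] == surface
        search_from[surface] = end
        results.append(
            {
                "text": surface,
                "label": ent["label"],
                "start_offset": start,
                "end_offset": end,
            }
        )
    return results


# Hypothetical model output: entity strings and labels, no offsets.
sample_text = "Acme hired Ada. Ada works on NER."
predicted = [
    {"text": "Acme", "label": "ORG"},
    {"text": "Ada", "label": "PERSON"},
    {"text": "Ada", "label": "PERSON"},
    {"text": "Globex", "label": "ORG"},  # hallucinated: not in the text
]
spans = attach_offsets(sample_text, predicted)
```

Every returned span satisfies `sample_text[start_offset:end_offset] == text`, and the entity that never appears in the source is dropped instead of being given invented offsets.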

annotated-research-papers — NLP/LLM papers I've read, with notes. An archive rather than a live project.

html_tag_annotator — Chrome extension + ML tool to create training datasets fast — tag HTML elements directly in the browser.

web-mark — Chrome extension to highlight and annotate web pages, with optional Google Sheets backup.

Stack: Python · PyTorch · Hugging Face · spaCy · LangChain · FastAPI · LM Studio
Focus areas: LLM fine-tuning · constrained decoding · NER · embeddings · RLHF

sachinkalsi.github.io · Blog · LinkedIn · @sachin_kalsi · sachinkalsi15@gmail.com
