Trusted by AI-first companies worldwide
LLM Systems & Fine-Tuning
I build end-to-end LLM pipelines — from fine-tuning foundation models with QLoRA and distillation, to structured extraction, prompt engineering systems, and evaluation frameworks that ensure production reliability.
Search & Retrieval Infrastructure
I specialize in building hybrid search systems that combine BM25 lexical retrieval with dense embeddings, custom ML ranking signals, query expansion, and reranking pipelines that deliver precise results at scale.
Agent Systems & Workflows
I design and build autonomous agent systems using LangGraph state machines, multi-step tool-calling, dynamic task routing, and human-in-the-loop oversight for reliable real-world execution.
AI Infrastructure & MLOps
I architect production inference pipelines with vLLM serving, INT8/INT4 quantization, FastAPI microservices, and Celery + Redis task orchestration — optimized for throughput, latency, and cost.
Strategy & Consulting
I analyze your AI needs, evaluate emerging architectures, and chart a roadmap, from assessing existing ML infrastructure to recommending LLM strategies tailored to your organization.
On-Demand AI Engineering
Get a dedicated channel with me and your team where you can ask anything, from specific model issues, inference optimization, and pipeline architecture to production debugging.
- Architected an end-to-end ML retrieval and ranking platform in Python across 2.7M+ candidate resumes, using Elasticsearch BM25, dense embeddings, and distributed scoring pipelines.
- Built containerized embedding-driven semantic search APIs using vLLM inference and FastAPI, deployed on Docker with semantic scoring, synonym expansion, and guided JSON generation.
- Designed MLOps pipelines for resume and job description extraction at scale, including evaluation frameworks benchmarking field accuracy, reasoning consistency, and hallucination detection across model versions.
- Designed a two-stage retrieval and ranking system combining high-recall search with ML scoring across skill overlap, role similarity, and experience alignment, significantly improving relevance.
- Fine-tuned Qwen2.5 Instruct models using QLoRA (4-bit NF4 quantization) and implemented teacher–student distillation pipelines to improve extraction accuracy while reducing inference cost and latency.
- Built distributed autonomous agent systems in Python for portfolio management, scaling to 5M+ requests and $10M+ managed assets with <200ms latency.
- Engineered LangGraph-based ML pipelines with state-machine orchestration and Apache Airflow for production scheduling, integrating GPT-4, Claude, and fine-tuned Llama models.
- Operated production AI infrastructure with CI/CD pipelines, containerized deployment, and observability across multi-protocol blockchain integrations.
- Built Python-based ETL pipelines for large-scale document processing on AWS, improving throughput by 30% while reducing processing errors by 25%.
- Developed document intelligence pipelines using NLP and computer vision, improving structured data extraction accuracy by 35%, deployed via Docker, AWS Lambda, and SageMaker.
- Designed and deployed fraud detection systems combining supervised anomaly detection, unsupervised clustering, and rule-based pattern analysis, with containerized APIs on AWS.
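
As a rough illustration of the hybrid retrieval idea in the bullets above (BM25 lexical search plus dense embeddings), here is a minimal sketch of reciprocal rank fusion, one common way to merge a lexical ranking with a semantic one. It is not the production implementation: the function name, the document IDs, and the constant `k = 60` are assumptions chosen for illustration, and the two input rankings stand in for real Elasticsearch and embedding-search results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    k is a smoothing constant (60 is a commonly used default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query: one list from BM25, one from dense search.
bm25_ranking = ["resume_17", "resume_04", "resume_42"]
dense_ranking = ["resume_42", "resume_17", "resume_99"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while documents found by only one retriever are kept but ranked lower. In a two-stage setup, a fused list like this would typically feed a separate ML scoring step.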
| # | Project | Client | Year | Tags | Domain |
|---|---|---|---|---|---|
| 01 | Hybrid Resume Search Platform | Apolis | 2026 | #llm #search #fine-tuning #elasticsearch | Enterprise |
| 02 | Long Document LLM Extraction | Apolis | 2025 | #fine-tuning #nlp #llm #qlora | Enterprise |
| 03 | Autonomous DeFi Agent System | Valory | 2024-2025 | #agents #langgraph #on-chain #defi | Autonomous |
| 04 | RAG Intelligence Platform | Smarter | 2022-2024 | #rag #llm #qdrant #reranking | Enterprise |
| 05 | Invoice AI Agents | Apolis | 2025 | #agents #llm #validation #auditing | Healthcare |
| 06 | Fashion Sales Forecasting | Smarter | 2023 | #clip #gpt-3 #embeddings #forecasting | Retail |
| 07 | Document AI Extraction | Pibit.ai | 2020-2022 | #nlp #cv #aws #sagemaker | Fintech |
| 08 | Tax Fraud Detection | Pibit.ai | 2021 | #anomaly-detection #clustering #ml | Fintech |
“Gaurav is the rare engineer who combines deep AI research knowledge with the pragmatism to ship production systems on tight timelines. His work on NexGig — a talent intelligence platform processing 2.7 million resumes with fine-tuned LLMs — was technically ambitious and delivered measurable business impact from day one.
I'd work with him again without hesitation.”
“Gaurav brought a rare combination of LLM depth and systems-level thinking to Valory. He architected our autonomous agent infrastructure from the ground up — workflows that scaled to millions of on-chain operations managing real assets.
His work directly shaped how we build autonomous services at the protocol level.”