I design production AI systems — hybrid search engines, RAG pipelines, LLM inference infrastructure, and autonomous agent workflows.
I build production AI systems at the intersection of language models, retrieval, and backend infrastructure. My work spans LLM-powered applications, semantic search and ranking, RAG pipelines, agentic workflows, and scalable AI backends. I enjoy solving problems where AI must operate reliably at scale.
LLM Systems
Fine-tuning, structured extraction, prompt systems, evaluation pipelines.
- QLoRA / LoRA fine-tuning
- Structured JSON extraction
- Prompt engineering systems
- LLM evaluation frameworks
Retrieval & Search
Hybrid search (BM25 + embeddings), ranking pipelines, semantic retrieval.
- BM25 + dense embeddings
- Custom ML ranking signals
- Query expansion & reranking
- SPLADE sparse vectors
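To make the hybrid search idea concrete, here is a minimal sketch of one common way to merge a BM25 ranking with an embedding ranking: reciprocal rank fusion (RRF). This is an illustration only, not necessarily the fusion method used in the projects below; the document IDs, the `reciprocal_rank_fusion` helper, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists. k=60 is a conventional default,
    assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query from the two retrievers.
bm25_hits = ["r3", "r1", "r7"]    # keyword (BM25) ranking
dense_hits = ["r1", "r9", "r3"]   # embedding (dense) ranking

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

RRF only needs ranks, so it avoids having to normalize BM25 scores and cosine similarities onto a common scale. A learned ranker over custom signals would typically sit after a fusion step like this.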
AI Infrastructure
Inference pipelines, FastAPI microservices, GPU optimization, production monitoring.
- vLLM inference serving
- FastAPI microservices
- INT8/INT4 quantization
- Celery + Redis pipelines
Agent Systems
LangGraph workflows, tool-calling agents, multi-step automation systems.
- LangGraph state machines
- Multi-step tool-calling
- On-chain agent execution
- Human-in-the-loop oversight
- Architected NexGig — AI talent search across 2.7M+ resumes. Elasticsearch BM25 + dense embeddings + custom ML ranking across skill, role, experience, and location signals.
- Fine-tuned Qwen2.5-3B with QLoRA (4-bit NF4) + teacher-student distillation. Deployed invoice AI agents for healthcare/financial auditing — LLM reasoning + rule-based validation.
- Built autonomous DeFi agents for portfolio management across blockchain protocols — 5M+ requests, $10M+ managed assets, 25% portfolio performance improvement.
- LangGraph state-machine agent workflows with dynamic task routing. Fine-tuned Llama + OpenAI / Claude APIs for financial AI at <200ms latency.
- RAG agent with custom Llama3-8B, LangChain, and Qdrant — query expansion + reranking, 30% improved relevance. 4× model compression via INT8, 2× latency improvement.
- Fashion sales forecasting using CLIP embeddings + GPT-3. Improved inventory management and boosted quarterly sales by 18%.
- Custom entity extraction from financial documents using NLP + CV — 35% accuracy improvement. Deployed Document AI to production via AWS Lambda and SageMaker.
- Income tax fraud detection via ensemble anomaly detection, unsupervised clustering, and rule-based backtracking.
Recruiters struggled to search millions of resumes effectively using keyword matching alone.
Architected end-to-end pipeline from ingestion to ranked retrieval. Fine-tuned Qwen2.5-3B with QLoRA for structured extraction.
2.7M+ resumes indexed. Improved candidate-job matching accuracy significantly.
Resumes are long, noisy, and difficult to parse reliably with off-the-shelf models.
Designed fine-tuning pipeline with teacher-student distillation and structured prompt templates.
Consistent structured extraction at scale. Significantly reduced hallucinations vs zero-shot baselines.
Complex DeFi portfolio management requires multi-step reasoning, real-time market data, and on-chain execution.
Built LangGraph state-machine agent workflows with dynamic task routing and blockchain execution layer.
5M+ requests processed. $10M+ in managed assets. 25% portfolio performance improvement.
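The state-machine pattern with dynamic task routing can be sketched in plain Python. This is a simplified illustration and deliberately does not use the LangGraph API; the node names, the `run_graph` helper, the portfolio fields, and the 5% drift threshold are all hypothetical.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
END = "END"


def assess(state: State) -> State:
    # Hypothetical node: measure how far the portfolio drifted from target.
    drift = abs(state["current_weight"] - state["target_weight"])
    return {**state, "drift": drift}


def rebalance(state: State) -> State:
    # A real agent would build and submit a transaction here.
    return {**state, "action": "rebalance"}


def hold(state: State) -> State:
    return {**state, "action": "hold"}


NODES: Dict[str, Callable[[State], State]] = {
    "assess": assess,
    "rebalance": rebalance,
    "hold": hold,
}

# Routers pick the next node from the current state (dynamic routing).
ROUTERS: Dict[str, Callable[[State], str]] = {
    "assess": lambda s: "rebalance" if s["drift"] > 0.05 else "hold",
    "rebalance": lambda s: END,
    "hold": lambda s: END,
}


def run_graph(start: str, state: State, max_steps: int = 10) -> State:
    """Run nodes until a router returns END, with a step limit as a guard."""
    current = start
    for _ in range(max_steps):
        state = NODES[current](state)
        current = ROUTERS[current](state)
        if current == END:
            return state
    raise RuntimeError("step limit reached before END")


result = run_graph("assess", {"current_weight": 0.70, "target_weight": 0.60})
print(result["action"])
```

Keeping routing decisions in separate functions from the nodes makes each transition easy to test on its own, and the step limit guards against routing loops.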
Organizations need AI to answer questions accurately over private knowledge without hallucination.
Built custom RAG with query expansion, reranking, and 4× model compression via INT8 quantization.
30% improved retrieval relevance. 2× latency improvement. Deployed to production on AWS.
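The 4× figure matches the arithmetic of storing each weight in 1 byte instead of 4. The sketch below shows symmetric per-tensor INT8 quantization, assuming an FP32 baseline (the baseline precision is an assumption, as are the sample weights and helper names); production toolchains add per-channel scales and calibration that are omitted here.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]


weights = array("f", [0.5, -1.0, 0.25, 0.0])  # hypothetical FP32 weights
q, scale = quantize_int8(weights)

fp32_bytes = len(weights) * weights.itemsize  # 4 bytes per weight
int8_bytes = len(q) * q.itemsize              # 1 byte per weight
compression = fp32_bytes / int8_bytes         # ignores the single stored scale

max_error = max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))
print(compression, max_error)
```

The rounding error per weight is bounded by about half a quantization step (`scale / 2`), which is why accuracy usually has to be re-checked on an evaluation set after quantizing.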
“Gaurav is the rare engineer who combines deep AI research knowledge with the pragmatism to ship production systems on tight timelines. His work on NexGig — a talent intelligence platform processing 2.7 million resumes with fine-tuned LLMs — was technically ambitious and delivered measurable business impact from day one. I'd work with him again without hesitation.”
“Gaurav brought a rare combination of LLM depth and systems-level thinking to Valory. He architected our autonomous agent infrastructure from the ground up — workflows that scaled to millions of on-chain operations managing real assets. His work directly shaped how we build autonomous services at the protocol level.”
Coursework: Machine Learning, Data Mining, Quantitative Analysis, Financial Mathematics, Statistics, DSA, Databases