I design production AI systems — hybrid search engines, RAG pipelines, LLM inference infrastructure, and autonomous agent workflows.

Available for collaboration · Delhi, India
About

I build production AI systems at the intersection of language models, retrieval, and backend infrastructure. My work spans LLM-powered applications, semantic search and ranking, RAG pipelines, agentic workflows, and scalable AI backends. I enjoy solving problems where AI must operate reliably at scale.

What I build

LLM Systems

Fine-tuning, structured extraction, prompt systems, evaluation pipelines.

  • QLoRA / LoRA fine-tuning
  • Structured JSON extraction
  • Prompt engineering systems
  • LLM evaluation frameworks

Retrieval & Search

Hybrid search (BM25 + embeddings), ranking pipelines, semantic retrieval.

  • BM25 + dense embeddings
  • Custom ML ranking signals
  • Query expansion & reranking
  • SPLADE sparse vectors

AI Infrastructure

Inference pipelines, FastAPI microservices, GPU optimization, production monitoring.

  • vLLM inference serving
  • FastAPI microservices
  • INT8/INT4 quantization
  • Celery + Redis pipelines

Agent Systems

LangGraph workflows, tool-calling agents, multi-step automation systems.

  • LangGraph state machines
  • Multi-step tool-calling
  • On-chain agent execution
  • Human-in-the-loop oversight
Experience
Apolis · Senior AI / ML Engineer
Oct 2025 – Mar 2026 · Remote
  • Architected NexGig — AI talent search across 2.7M+ resumes. Elasticsearch BM25 + dense embeddings + custom ML ranking across skill, role, experience, and location signals.
  • Fine-tuned Qwen2.5-3B with QLoRA (4-bit NF4) + teacher-student distillation. Deployed invoice AI agents for healthcare/financial auditing — LLM reasoning + rule-based validation.
Valory · Senior AI Developer
Jun 2024 – Oct 2025 · Remote
  • Built autonomous DeFi agents for portfolio management across blockchain protocols — 5M+ requests, $10M+ managed assets, 25% portfolio performance improvement.
  • LangGraph state-machine agent workflows with dynamic task routing. Fine-tuned Llama + OpenAI / Claude APIs for financial AI at <200ms latency.
Smarter · Senior ML Engineer
Oct 2022 – May 2024 · London, UK (Remote)
  • RAG agent with custom Llama3-8B, LangChain, and Qdrant — query expansion + reranking, 30% improved relevance. 4× model compression via INT8, 2× latency improvement.
  • Fashion sales forecasting using CLIP embeddings + GPT-3. Improved inventory management, boosted quarterly sales 18%.
Pibit.ai · Founding Team, ML
Jun 2020 – Sep 2022 · Gurugram, India
  • Custom entity extraction from financial documents using NLP + CV — 35% accuracy improvement. Deployed Document AI to production via AWS Lambda and SageMaker.
  • Income tax fraud detection via ensemble anomaly detection, unsupervised clustering, and rule-based backtracking.
Featured projects
Hybrid Resume Search Platform
Apolis · 2026
Enterprise
LLM · Search · Fine-Tuning · Elasticsearch

Recruiters struggled to search millions of resumes effectively using keyword matching alone.

Resume ingestion → LLM extraction → Elasticsearch indexing → Dense embeddings → Hybrid retrieval → ML ranking → FastAPI API
Python · FastAPI · Elasticsearch · OpenAI Embeddings · QLoRA · Qwen2.5-3B

Architected end-to-end pipeline from ingestion to ranked retrieval. Fine-tuned Qwen2.5-3B with QLoRA for structured extraction.

2.7M+ resumes indexed. Improved candidate-job matching accuracy significantly.

Long Document LLM Extraction
Apolis · 2025
Enterprise
Fine-Tuning · NLP · LLM

Resumes are long, noisy, and difficult to parse reliably with off-the-shelf models.

Document input → Prompt extraction → Structured JSON output → Validation layer → Index write
Python · QLoRA · Qwen2.5-3B · HuggingFace · Pydantic

Designed fine-tuning pipeline with teacher-student distillation and structured prompt templates.

Consistent structured extraction at scale. Significantly reduced hallucinations vs zero-shot baselines.
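
The validation layer in the pipeline above can be illustrated with a small sketch. This is not the production code: the project lists Pydantic, but the example uses only the standard library so it runs anywhere, and the `ResumeRecord` fields and the `parse_extraction` helper are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class ResumeRecord:
    # Hypothetical schema; the real extraction schema is not given in the source.
    name: str
    skills: list
    years_experience: float


def parse_extraction(raw: str) -> ResumeRecord:
    """Parse an LLM's JSON output and reject malformed records."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    name = data.get("name")
    skills = data.get("skills")
    years = data.get("years_experience")

    if not isinstance(name, str) or not name.strip():
        raise ValueError("name must be a non-empty string")
    if not isinstance(skills, list) or not all(isinstance(s, str) for s in skills):
        raise ValueError("skills must be a list of strings")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(years, bool) or not isinstance(years, (int, float)) or years < 0:
        raise ValueError("years_experience must be a non-negative number")

    return ResumeRecord(name.strip(), skills, float(years))


record = parse_extraction(
    '{"name": "A. Kumar", "skills": ["Python", "FastAPI"], "years_experience": 4}'
)
```

Records that fail validation raise `ValueError`, so a caller can retry the extraction or route the document for review instead of writing a bad record to the index.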

Autonomous DeFi Agent System
Valory · 2024–2025
Autonomous
Agents · LangGraph · On-Chain · DeFi

Complex DeFi portfolio management requires multi-step reasoning, real-time market data, and on-chain execution.

User task → Agent planning → Tool execution → State management → On-chain write → Result synthesis
Python · LangGraph · OpenAI · Claude APIs · Open Autonomy · Olas

Built LangGraph state-machine agent workflows with dynamic task routing and blockchain execution layer.

5M+ requests processed. $10M+ in managed assets. 25% portfolio performance improvement.
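
The idea of a state-machine workflow with dynamic task routing can be sketched in plain Python. This is an illustration only and does not use the LangGraph API. The node names, the `approval_limit` rule, and the state fields are all hypothetical, and nothing here touches a blockchain.

```python
def plan(state):
    # Route on the task type (hypothetical routing rule).
    state["log"].append("plan")
    return "rebalance" if state["task"] == "rebalance" else "report"


def rebalance(state):
    # Large moves are routed to a human approval step before reporting.
    state["log"].append("rebalance")
    state["needs_approval"] = state["amount"] > state["approval_limit"]
    return "await_human" if state["needs_approval"] else "report"


def await_human(state):
    state["log"].append("await_human")
    return "report"


def report(state):
    state["log"].append("report")
    return None  # None marks the terminal node


NODES = {
    "plan": plan,
    "rebalance": rebalance,
    "await_human": await_human,
    "report": report,
}


def run(state, start="plan", max_steps=10):
    """Run nodes until one returns None; each node picks the next node."""
    node = start
    for _ in range(max_steps):
        node = NODES[node](state)
        if node is None:
            return state
    raise RuntimeError("step limit exceeded")


result = run({"task": "rebalance", "amount": 500, "approval_limit": 100, "log": []})
```

Each node mutates a shared state dictionary and returns the name of the next node, so routing decisions stay local to the node that has the relevant information. The step limit guards against accidental cycles.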

RAG Intelligence Platform
Smarter · 2022–2024
Enterprise
RAG · LLM · Qdrant · Reranking

Organizations need AI to answer questions accurately over private knowledge without hallucination.

Documents → Chunking → Embeddings → Qdrant vector DB → Retrieval → Reranking → LLM answer generation
Python · Llama3-8B · LangChain · Qdrant · FastAPI · AWS

Built custom RAG with query expansion, reranking, and 4× model compression via INT8 quantization.

30% improved retrieval relevance. 2× latency improvement. Deployed to production on AWS.
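
For readers unfamiliar with INT8 quantization, a minimal sketch of symmetric per-tensor quantization follows. It is a toy illustration on a short list of weights, not the method used in the project. The function names are hypothetical, and the 4× ratio shown assumes a 32-bit float baseline, which the source does not state.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage per value: 4 bytes (FP32, assumed baseline) vs 1 byte (INT8)
compression_ratio = 4 / 1
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost paid for the smaller footprint.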

How I architect systems
LLM Pipeline
Data → Extraction → Embeddings → Vector DB → Retrieval → LLM Synthesis → Evaluation
Hybrid Search
Query → BM25 Retrieval → Dense Embedding Retrieval → Reranking → LLM Reasoning
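
One common way to merge a BM25 result list with a dense-embedding result list before reranking is reciprocal rank fusion (RRF). The sketch below is illustrative only: the source does not say which fusion method these systems use, and the function name, the constant `k=60`, and the document IDs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in, and the
    scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["resume_7", "resume_2", "resume_9"]   # hypothetical keyword results
dense_hits = ["resume_2", "resume_4", "resume_7"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Documents that appear near the top of both lists rise to the top of the fused list, which needs no score normalization between the two retrievers because only ranks are used.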
Skills & Stack
LLMs & Agents
LangChain · LangGraph · OpenAI · Anthropic Claude · HuggingFace · vLLM · RAG · QLoRA · RLHF / ORPO · AutoGen
Backend & MLOps
Python · FastAPI · PostgreSQL · Redis · Celery · Docker · Kubernetes · Apache Airflow · CI/CD
Cloud & Search
AWS (Lambda, SageMaker) · Azure · Elasticsearch · Qdrant · BM25 · Sparse Embeddings (SPLADE)
Blockchain & Web3
Open Autonomy · Olas Ecosystem · On-Chain Agents · DeFi Protocols · Multi-Agent Coordination
Kind words

Gaurav is the rare engineer who combines deep AI research knowledge with the pragmatism to ship production systems on tight timelines. His work on NexGig — a talent intelligence platform processing 2.7 million resumes with fine-tuned LLMs — was technically ambitious and delivered measurable business impact from day one. I'd work with him again without hesitation.

Gaurav brought a rare combination of LLM depth and systems-level thinking to Valory. He architected our autonomous agent infrastructure from the ground up — workflows that scaled to millions of on-chain operations managing real assets. His work directly shaped how we build autonomous services at the protocol level.

Education
B.E. in Computer Science and Engineering · USICT Delhi
Aug 2016 – May 2020 · GPA 8.1 / 10

Coursework: Machine Learning, Data Mining, Quantitative Analysis, Financial Mathematics, Statistics, DSA, Databases

Research Paper: International Advanced Computing Conference '22
Mentor at Scaler Academy: Quantitative analysis and ML in finance

I'm available for new projects

Let's build something together — agents, LLM systems, or production ML infrastructure.