Available for work

From Sketch to Scale.

Early-career full-stack AI developer passionate about backend engineering and infrastructure for next-generation agentic LLM systems.

About Me

EVERYTHING ABOUT
VAIBHAV

Hi, I'm Vaibhav Tanwar — an Applied AI Engineer who builds end-to-end ML systems that ship to production, from model training to containerized deployment at scale.

My core stack spans PyTorch, FastAPI, LangChain, Docker, and PostgreSQL. I've built computer vision pipelines with YOLO and CLIP, multi-agent LLM systems, and MLOps platforms handling millions of records — all with rigorous testing and CI/CD.

Whether it's a zero-to-one AI product or scaling existing infrastructure, I focus on reliability, low-latency inference, and clean architecture that teams can build on.

Skills & Tech Stack

MY TECH TOOLBOX

AI & Machine Learning

PyTorch, Transformers, Scikit-Learn, YOLO, CLIP, OpenCV, SpaCy

Training, fine-tuning, and deploying CV & NLP models at scale.

LLM Ops & Agents

LangChain, LangGraph, DSPy, Gemini-ADK, Mem0, Vector DBs

Building multi-agent systems, RAG pipelines, and LLM toolchains.

Backend & APIs

FastAPI, Node.js, NestJS, Kafka, RabbitMQ

High-throughput APIs and async task queues.

Databases

PostgreSQL, MongoDB, Redis, MySQL, pgvector

Relational, document, and vector stores at scale.

Infrastructure

Docker, AWS, GitHub Actions, CI/CD, Prometheus

Containerized deployments with full observability.

Previous Endeavors

EXPERIENCE

Infosys Centre for Artificial Intelligence

ML + Backend Engineer

September 2024 — May 2025

Developed wildlife monitoring capabilities by fine-tuning YOLO and custom Transformer-based architectures, with inference optimized using CUDA and TensorRT. Built a backend (FastAPI, PostgreSQL, Docker) with optimized queries and an end-to-end MLOps pipeline for continual learning from camera trap data. Mitigated annotation bottlenecks using active learning and tracked system health with custom API monitoring tools.

Scale AI

LLM Post Training Contributor

Dec 2024 — February 2025

Enhanced the zero-shot inference capability of LLMs through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). Curated and refined domain-specific datasets for complex reasoning tasks. Optimized reward models using human preference data to better align outputs with user expectations (truthfulness, harmlessness, instruction-following). Collaborated with ML engineers on refining annotation guidelines and feedback mechanisms.

Networked Systems and Security Research Lab

Undergraduate Researcher

May 2024 — July 2024

Improved latency and throughput for live media transfer from semi-autonomous vehicles to edge servers under Dr. Arani Bhattacharya, utilizing state-of-the-art multipath QUIC protocols. Critically analyzed and tested Alibaba's XQUIC and Tencent's TQUIC frameworks to identify solutions for latency bottlenecks. Reported detailed findings on performance and build library inconsistencies.

Ongoing

May 2025 — Present

MIDAS Research Group

Research and Development Associate

New Delhi, India

Working on improving foundation models for self-supervised speech representation learning, such as HuBERT and MS-HuBERT.

Projects

YouTube Multimodal RAG Pipeline

Multimodal Video-to-Knowledge Retrieval System

Python, HuggingFace, Qdrant, LlamaIndex, Gemini SDK, Streamlit

This project implements a multimodal RAG system that turns YouTube videos into queryable knowledge bases through frame extraction and caption analysis.
Using Gemini for inference and Qdrant for vector storage, the system processes both visual and textual content to generate timestamped responses to natural language queries.
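
As a rough illustration of the retrieval step, the sketch below ranks timestamped chunks by cosine similarity to a query vector and returns the best match with its timestamp. The `Chunk` type, the `top_k` helper, and the three-dimensional toy vectors are hypothetical stand-ins; in the project itself, embeddings would come from a model and live in Qdrant rather than in a Python list.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    start_s: float        # timestamp (seconds) where the chunk begins in the video
    text: str             # caption text or frame description
    vector: list          # embedding; toy values here, not real model output


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


# Hypothetical index contents for one video.
CHUNKS = [
    Chunk(0.0, "intro and channel banner", [1.0, 0.0, 0.0]),
    Chunk(42.5, "speaker draws the architecture diagram", [0.0, 1.0, 0.2]),
    Chunk(97.0, "live demo of the search box", [0.1, 0.2, 1.0]),
]

hits = top_k([0.0, 0.9, 0.1], CHUNKS, k=1)
print(f"{hits[0].start_s}s: {hits[0].text}")
```

Keeping the timestamp on every stored chunk is what lets an answer point back to a specific moment in the video.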

WhatsApp Multimodal Memory Bot

Voice, Text & Image Memory Assistant over WhatsApp

Python, Twilio, Groq, Whisper, Mem0, FastAPI, SQLAlchemy

Architected a multimodal WhatsApp memory assistant pipeline (FastAPI + AsyncIO) that ingests text, voice, and images via Twilio webhooks.
It classifies intent in real time via Groq LLM inference, embeds memories in Mem0's vector store for semantic recall, and serves analytics through idempotent SQLite transactions.
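
One way to make webhook-driven writes idempotent is to key each row on a unique message identifier and let the database ignore duplicates. The sketch below shows that idea with Python's built-in `sqlite3` module. The `events` table, its columns, and the `record_event` helper are assumptions made for illustration and are not taken from the project.

```python
import sqlite3


def record_event(conn, message_sid, intent):
    """Insert one analytics row; return True if it was new, False if a duplicate."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT OR IGNORE INTO events (message_sid, intent) VALUES (?, ?)",
            (message_sid, intent),
        )
    # rowcount is 1 when the row was inserted and 0 when the insert was ignored.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (message_sid TEXT PRIMARY KEY, intent TEXT NOT NULL)"
)

first = record_event(conn, "SM123", "save_memory")   # hypothetical identifier
retry = record_event(conn, "SM123", "save_memory")   # simulated webhook redelivery
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(first, retry, total)
```

Because the primary key rejects the second insert, a redelivered webhook leaves the analytics counts unchanged.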

LLM-Powered Resume Analyzer

AI-Driven Resume Scoring & Feedback Platform

React, TypeScript, Puter.js, Tailwind, React Router, Zustand

Developed a full-stack AI-powered resume analyzer using React, TypeScript, and Claude Sonnet integration, featuring real-time PDF processing, a multi-dimensional scoring system (ATS, content, structure, skills), and comprehensive feedback generation for job seekers.
Integrated Zustand state management, Tailwind CSS, React Router, and Puter.js services for authentication, file system operations, and data persistence, delivering a responsive user interface with drag-and-drop functionality and visual score components.

Multi-Agent Tutoring System (Work In Progress)

Orchestrated Domain-Specific Tutoring Agents

Python, FastAPI, Gemini, Agent Development Kit, JavaScript

Developed a tutoring chatbot based on Google's Agent Development Kit (ADK) principles, with orchestration between specialized Math and Physics agents powered by the Gemini API.
Integrated context-aware conversation management and an autonomous query classification pipeline that routes student queries to domain-specific agents, which provide personalized responses through prompt engineering and tool integration.

AI-Powered App Developer

Multi-Agent Code Generation from Natural Language

Python, FastAPI, Groq, LangGraph, LangChain, NiceGUI

A coding assistant built with LangGraph that simulates a multi-agent developer workflow to generate complete projects from natural language prompts.
It utilizes Planner, Architect, and Coder agents to sequentially design, structure, and implement applications, leveraging tools for file I/O and code execution.
The system is deployed with a FastAPI backend and a NiceGUI frontend for user interaction and project management.

Distributed KV Store with Modified Raft Consensus

Fault-Tolerant Key-Value Store with Leader Leases

Python, ZeroMQ

Implemented a database storing string key-value pairs using the Raft consensus algorithm, ensuring consistent data replication and fault recovery across a distributed network of nodes.
Utilized a leader lease mechanism similar to those used by geo-distributed databases such as CockroachDB and YugabyteDB.
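
A leader lease lets the current leader answer reads locally for a bounded time after a majority has acknowledged its heartbeat. The sketch below is a simplified illustration of that bookkeeping, not the project's implementation. The `LeaderLease` class, its method names, and the injected clock are hypothetical, and `acks` is assumed to include the leader's own vote. A real system would also make a newly elected leader wait out the previous lease and would allow for clock drift.

```python
class LeaderLease:
    """Tracks whether a leader may serve reads without a quorum round-trip."""

    def __init__(self, duration_s, clock):
        self.duration_s = duration_s
        self.clock = clock          # callable returning monotonic seconds
        self.expires_at = 0.0       # no lease held initially

    def renew(self, acks, cluster_size, sent_at):
        """Extend the lease if a majority acknowledged a heartbeat sent at sent_at."""
        if acks > cluster_size // 2:
            # Measure from the send time so the leader's estimate stays conservative.
            self.expires_at = max(self.expires_at, sent_at + self.duration_s)
            return True
        return False

    def can_serve_read(self):
        """True while the lease is still valid according to the local clock."""
        return self.clock() < self.expires_at


# A fake clock makes the behavior easy to follow step by step.
now = [0.0]
lease = LeaderLease(duration_s=2.0, clock=lambda: now[0])

before = lease.can_serve_read()
lease.renew(acks=3, cluster_size=5, sent_at=now[0])
now[0] = 1.5
during = lease.can_serve_read()
now[0] = 2.5
after = lease.can_serve_read()
minority = lease.renew(acks=2, cluster_size=5, sent_at=now[0])
print(before, during, after, minority)
```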

Vision-Language Assistant for Navigation Aid in Urban Metro Systems

Assistive Navigation for Visually Impaired Metro Users

Python, PyTorch, Transformers, HuggingFace, LLMs, OnRender

Developed MetroSense, a web-based platform that helps visually impaired individuals navigate the Delhi Metro system, achieving 65.1% mAP@50 for identifying environmental elements from real-time image captures.
Integrated Llama 3.2 90B Vision for VQA, using context-rich few-shot prompting and tuned decoding parameters to achieve a BERTScore F1 of 0.85, delivering semantically accurate, context-aware voice-synthesized responses to user queries for improved safety and autonomy.

Multi Model Analysis for Stock Market Trend Prediction

Neural ODE & GAN-Based Financial Forecasting

PyTorch, Transformers, Scikit-Learn, Pandas

Developed and benchmarked novel models (GAN, Neural ODE VAE, Neural ODE Classifier) for stock market analysis, achieving a 15% F1 improvement and 85% faster training via Neural ODEs.
Implemented a CNN-LSTM architecture delivering high-accuracy regression (R² 0.99, MAE 143.58 on S&P 500) across five major indices on the CNNPred dataset.

Cloud Native Online Commodity Trading Platform

Distributed Marketplace over gRPC & Protobuf

Python, gRPC, Protobuf

Created a distributed online marketplace system, architected to facilitate direct transactions between buyers and sellers through a central platform hosted on Google Cloud VM instances.
The system uses gRPC for communication and Protocol Buffers for efficient data serialization.

K-Means Using a MapReduce Framework

Distributed Clustering with Fault-Tolerant Map-Reduce

Python, gRPC

Implemented a distributed MapReduce framework comprising Master, Mapper, and Reducer components to perform K-Means clustering on a given dataset, ensuring fault tolerance for both components.
Utilized gRPC for communication among the three processes in each iteration.
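
To show how K-Means splits into map and reduce phases, here is a single-process sketch of one iteration: mappers assign each point to its nearest centroid, and reducers average the points in each group. The function names and sample data are hypothetical, and the gRPC transport and fault-tolerance logic of the project are deliberately left out.

```python
from collections import defaultdict


def mapper(points, centroids):
    """Emit (nearest_centroid_index, point) pairs for one input split."""
    for p in points:
        idx = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        yield idx, p


def reducer(idx, assigned):
    """Average the points assigned to one centroid."""
    dims = len(assigned[0])
    mean = tuple(sum(p[d] for p in assigned) / len(assigned) for d in range(dims))
    return idx, mean


def kmeans_iteration(splits, centroids):
    """Run one map, shuffle, and reduce round and return the updated centroids."""
    groups = defaultdict(list)            # shuffle: group mapper output by key
    for split in splits:
        for idx, p in mapper(split, centroids):
            groups[idx].append(p)
    new_centroids = list(centroids)       # a centroid with no points stays put
    for idx, assigned in groups.items():
        _, new_centroids[idx] = reducer(idx, assigned)
    return new_centroids


# Two input splits, as if handled by two mapper processes.
splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
result = kmeans_iteration(splits, [(1.0, 1.0), (9.0, 9.0)])
print(result)
```

In a distributed setting the master would repeat this round, passing the updated centroids to the mappers until they stop moving.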

Cycle-Accurate Simulator for a 5-Stage RISC CPU

RISC-V Pipeline Simulator with Cache & Forwarding

C++

Implemented a simulator for a processor based on the RV32I variant of the RISC-V ISA.
The microarchitecture includes a 5-stage pipeline with forwarding/bypassing, a separate execution unit for Network-on-Chip operations, and a 2-way set-associative cache with a Least Recently Used (LRU) replacement policy.
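
The cache behavior can be sketched briefly. The model below is a hypothetical Python illustration of a 2-way set-associative cache with LRU replacement, not the simulator's C++ code. It tracks only tags and hit/miss counts, and the `num_sets` and `block_size` parameters are assumptions chosen for the example.

```python
class TwoWayLRUCache:
    """Tag-only model of a 2-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets, block_size):
        self.num_sets = num_sets
        self.block_size = block_size
        # Each set holds up to two tags, ordered least- to most-recently used.
        self.sets = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up a byte address; return True on a hit, False on a miss."""
        block = address // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)      # move the tag to the most-recently-used slot
            ways.append(tag)
            self.hits += 1
            return True
        if len(ways) == 2:
            ways.pop(0)           # evict the least recently used way
        ways.append(tag)
        self.misses += 1
        return False


cache = TwoWayLRUCache(num_sets=4, block_size=16)
# Addresses 0, 64, and 128 all map to set 0, so they compete for two ways.
results = [cache.access(a) for a in (0, 64, 0, 128, 0, 64)]
print(results, cache.hits, cache.misses)
```

In the trace, re-reading address 0 makes address 64 the least recently used entry, so loading address 128 evicts 64 rather than 0.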

Let's Build

Something Amazing

Have a project, idea, or collaboration in mind? I'd love to hear from you. Let's create something impactful together.