Case Studies — Rahul Surya

05

Research Profiling New — Published

KVScope: Profiling KV-Cache Memory Dynamics Across Frontier LLMs on H100

The Problem

Memory leaks in LLM inference are hard to catch because PyTorch’s caching allocator acts as a black box. Standard tools like nvidia-smi report only the total GPU memory held by a process, so they miss fragmentation inside the allocator and tensor references that are never released after generation.

The Approach

Built KVScope, a profiling framework that injects forward hooks into attention modules. By intercepting DynamicCache objects, it captures KV-cache tensor snapshots at every generation step without requiring changes to application code, and runs six heuristic detectors over those snapshots to flag specific memory pathologies.
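As an illustration only, one such heuristic could compare memory held before and after a generation call. The sketch below is not KVScope's code: the names `RetentionReport` and `detect_post_generation_retention` and the 64 MiB threshold are assumptions made for this example.

```python
from dataclasses import dataclass

MIB = 1024 ** 2


@dataclass(frozen=True)
class RetentionReport:
    retained_bytes: int  # bytes still held after generation, above the baseline
    flagged: bool        # True when retention exceeds the threshold


def detect_post_generation_retention(
    baseline_bytes: int,
    after_bytes: int,
    threshold_bytes: int = 64 * MIB,  # hypothetical threshold, not from KVScope
) -> RetentionReport:
    """Flag a generation whose memory does not return close to its baseline."""
    # Memory can end below the baseline; treat that as zero retention.
    retained = max(after_bytes - baseline_bytes, 0)
    return RetentionReport(retained_bytes=retained, flagged=retained > threshold_bytes)


report = detect_post_generation_retention(1000 * MIB, 1200 * MIB)
print(report.retained_bytes // MIB, report.flagged)
```

In a real run the two byte counts could come from `torch.cuda.memory_allocated()` sampled before the generation call and after its outputs go out of scope.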

Key Findings

Gemma 4 post-EOS retention: 4.7–5.3 GB retained per generation
gpt-oss-120B allocator gap: 14.5 GiB reserved-but-unused on H100
8-bit quantisation cost: Gemma 4 perplexity rose 4.6%, versus ≤0.25% for the other models
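The allocator gap in the second finding is memory the caching allocator has reserved from the driver but is not currently using for live tensors. A minimal sketch of that arithmetic follows; the helper name `allocator_gap_gib` is an assumption, and the input numbers are illustrative, not measurements from the study.

```python
GIB = 1024 ** 3


def allocator_gap_gib(reserved_bytes: int, allocated_bytes: int) -> float:
    """Return reserved-but-unused allocator memory in GiB."""
    if allocated_bytes > reserved_bytes:
        raise ValueError("allocated memory cannot exceed reserved memory")
    return (reserved_bytes - allocated_bytes) / GIB


# Illustrative values only: 10 GiB reserved, 6 GiB backing live tensors.
print(allocator_gap_gib(10 * GIB, 6 * GIB))
```

With PyTorch, the two inputs would typically be `torch.cuda.memory_reserved()` and `torch.cuda.memory_allocated()`. nvidia-smi sees roughly the reserved figure plus CUDA context overhead, which is why the gap is invisible from outside the process.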

Python PyTorch CUDA Hugging Face NVML Prometheus H100 EPCC


03

Academic Build Log In Progress Active — MSc Dissertation 2026

Early ML Detection of Memory Leaks in Containerised Environments

The Problem

Memory leaks in containerised microservices degrade performance gradually and are hard to catch early. Existing monitoring tools alert only after degradation is visible. Rule-based thresholds miss subtle pre-failure patterns in container memory metrics.

The Approach

Building an ML-based early warning system. cAdvisor and Prometheus collect container metrics; eBPF (memleak-bpfcc) provides ground-truth leak labels. TimescaleDB stores the time-series data with a leak_label schema (0=normal, 1=leak, 2=spike, -1=unknown). Isolation Forest and LSTM are the candidate detection models, and Grafana handles visualisation.
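The leak_label encoding can be sketched as a small enum plus a rule for labelling one metrics window. The four values come from the schema above. Everything else here is a hypothetical illustration rather than the project's pipeline: the `label_window` function, the interval representation, and the rule that untraced windows are marked unknown.

```python
from enum import IntEnum
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # (start, end) timestamps in seconds


class LeakLabel(IntEnum):
    UNKNOWN = -1
    NORMAL = 0
    LEAK = 1
    SPIKE = 2


def _overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share any time."""
    return a[0] < b[1] and b[0] < a[1]


def label_window(
    window: Interval,
    leak_intervals: Iterable[Interval],
    spike_intervals: Iterable[Interval] = (),
    traced: bool = True,
) -> LeakLabel:
    """Label one metrics window from ground-truth intervals (assumed inputs)."""
    if not traced:
        # No ground truth was collected for this window.
        return LeakLabel.UNKNOWN
    if any(_overlaps(window, iv) for iv in leak_intervals):
        return LeakLabel.LEAK
    if any(_overlaps(window, iv) for iv in spike_intervals):
        return LeakLabel.SPIKE
    return LeakLabel.NORMAL


print(int(label_window((0, 10), [(5, 20)])))
```

Because `LeakLabel` is an `IntEnum`, its members convert directly to the integer values stored in the leak_label column.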

Current Status

Working C malloc leak injector for controlled experiments
18 curated Docker Hub images across 5 categories as the test dataset
EIDF VM provisioned (4 vCPU, 7.8 GB RAM) on the University of Edinburgh cluster
Data collection pipeline in progress
Supervised by Ben Carpenter, EPCC Applications Developer

Python eBPF Prometheus TimescaleDB Docker scikit-learn Grafana C

Live dissertation project — updated as work develops.

01

Academic Production ML

ISRO Lightning Prediction System

The Problem

Lightning strike prediction from live meteorological data is a critical operational problem. Manual forecasting at NRSC could not keep pace with real-time WRF model outputs, and existing tools required too much analyst intervention.

The Approach

Built a ConvLSTM model in PyTorch trained on 500 GB of WRF weather data. Used a VAE-based preprocessing step to compress input features and cut training time. Deployed the full pipeline to active meteorologist workflows with a React dashboard for live monitoring.

The Impact

92% accuracy on live meteorological data
Deployed to production at ISRO NRSC, used by working meteorologists
35% training time reduction via VAE optimisation

Python PyTorch ConvLSTM VAE React WRF Data

Full write-up in progress — thesis documentation being reconstructed from memory.

02

Academic Research

GAN-Enhanced Vocal Biomarker Analysis for Respiratory Health Assessment

The Problem

Non-invasive respiratory disease screening at scale requires fast, low-cost alternatives to traditional diagnostics. This study explored whether deep learning on vocal biomarkers — cough audio, breathing patterns, speech — could classify respiratory conditions accurately enough to be clinically useful.

The Approach

Used the COSWARA dataset (IISc Bangalore) — a crowd-sourced collection of cough, breathing, and vowel recordings across healthy and COVID-positive individuals. Explored three GAN architectures (SGAN, WGAN, CGAN) for synthetic data augmentation. Extracted audio features including MFCC, Mel-spectrogram, Chroma, and Zero Crossing Rate. Evaluated multiple classifiers including SVM, LSTM, CNN+RNN, and Transformer. Pivoted from GAN augmentation to SMOTE mid-project due to compute constraints — a practical decision that maintained classification robustness.
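For readers unfamiliar with SMOTE, its core idea is to create synthetic minority-class samples by interpolating between a real minority sample and one of its nearest minority neighbours. The sketch below is a minimal standard-library illustration of that idea, not the code used in the study; the function name `smote_sample` and its parameters are assumptions.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def smote_sample(
    minority: List[Point], k: int = 2, n_new: int = 4, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic points on segments between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(len(smote_sample(features, n_new=5)))
```

In practice the points would be feature vectors such as per-recording MFCC summaries, and a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used instead of a hand-written one.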

What I learned

LSTM architectures consistently outperformed the other classifiers evaluated (SVM, CNN+RNN, Transformer) on sequential cough audio. GAN training on raw audio is compute-prohibitive without dedicated GPU infrastructure. The broader finding, that vocal biomarkers alone cannot reliably generalise across demographics for COVID detection, is now well established in the literature. The methodology for respiratory audio classification transfers cleanly to other conditions.

Python GAN WGAN CGAN LSTM CNN MFCC Mel-spectrogram SMOTE COSWARA

Reimplementation in progress — new application domain, improved architecture.

04

Build Log Product

Career Radar — A Live Job-Fit Analyser Built During My Own Job Hunt

The Problem

Job hunting during an MSc generates a specific question no tool answered well: not just "what jobs are available" but "how well does my actual skill set match what the market is demanding right now, and how is that demand shifting week to week?"

What It Does

Fetches live job listings across target roles (ML Engineer, MLOps, DevOps), extracts skill requirements, and cross-references them against technologies visible in a GitHub profile. Outputs a live fit score and a ranked skill gap list.
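One plausible way to compute the two outputs is sketched below. The scoring rule (the share of skill mentions across listings that the profile covers) and the function name `fit_and_gaps` are assumptions for illustration; the tool's actual formula is not described here.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def fit_and_gaps(
    listings: Iterable[Set[str]], profile: Set[str]
) -> Tuple[float, List[Tuple[str, int]]]:
    """Return (fit score in [0, 1], missing skills ranked by demand)."""
    # Count how many listings mention each skill.
    demand = Counter(skill for listing in listings for skill in listing)
    total = sum(demand.values())
    if total == 0:
        return 0.0, []
    covered = sum(count for skill, count in demand.items() if skill in profile)
    # Most-demanded missing skills first; ties broken alphabetically.
    gaps = sorted(
        ((skill, count) for skill, count in demand.items() if skill not in profile),
        key=lambda item: (-item[1], item[0]),
    )
    return covered / total, gaps


score, gaps = fit_and_gaps(
    [{"python", "docker"}, {"python", "kubernetes"}, {"kubernetes", "terraform"}],
    {"python", "docker"},
)
print(score, gaps)
```

Weighting by mention count means a skill that appears in many listings pulls the score down more when missing, and it rises to the top of the gap list.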

What I Learned

Scraping job boards varies wildly in reliability. Skill extraction from job descriptions requires more normalisation than expected — "PyTorch", "pytorch", and "torch" are the same thing to a human but not to a naive parser. The most useful output ended up being the gap list, not the fit score.
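A minimal sketch of that normalisation step: lower-case the raw token, collapse whitespace, then map known aliases to one canonical name. The `ALIASES` table and the function name `normalise_skill` are illustrative assumptions, not the tool's actual mapping.

```python
import re

# Hypothetical alias table: variant spelling -> canonical skill name.
ALIASES = {
    "torch": "pytorch",
    "sklearn": "scikit-learn",
    "k8s": "kubernetes",
}


def normalise_skill(raw: str) -> str:
    """Map a raw skill mention to a canonical lower-case name."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return ALIASES.get(key, key)


print({normalise_skill(s) for s in ["PyTorch", "pytorch", " torch "]})
```

Skills without an alias entry pass through unchanged in lower case, so the table only needs to list genuine variants.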

Python Scraping NLP CLI GitHub API

