Engineering Deep Dives

Real problems. Real solutions. Measured impact.

01
Academic Production ML

ISRO Lightning Prediction System

The Problem

Lightning strike prediction from live meteorological data is a critical operational problem. Manual forecasting at NRSC could not keep pace with real-time WRF model outputs, and existing tools required too much analyst intervention.

The Approach

Built a ConvLSTM model in PyTorch trained on 500 GB of WRF weather data. Used a VAE-based preprocessing step to compress input features and cut training time. Deployed the full pipeline into working meteorologists' workflows, with a React dashboard for live monitoring.
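
For readers unfamiliar with the architecture, a ConvLSTM cell is commonly written as below. This is the widely used formulation without peephole connections, given as background only. It is not taken from the project code, and the exact variant used there may differ. Here * is convolution, ∘ is the element-wise product, X_t is the input grid at time t, and H_t and C_t are the hidden and cell states.

```latex
\begin{aligned}
i_t &= \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c) \\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}
```

Replacing the matrix products of a standard LSTM with convolutions lets the cell keep the spatial layout of gridded weather fields while still modelling their evolution over time.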

The Impact

  • 92% accuracy on live meteorological data
  • Deployed to production at ISRO NRSC, used by working meteorologists
  • 35% training time reduction via VAE optimisation
Python PyTorch ConvLSTM VAE React WRF Data
Full write-up in progress — thesis documentation being reconstructed from memory.
02
Academic Research

GAN-Enhanced Vocal Biomarker Analysis for Respiratory Health Assessment

The Problem

Non-invasive respiratory disease screening at scale requires fast, low-cost alternatives to traditional diagnostics. This study explored whether deep learning on vocal biomarkers — cough audio, breathing patterns, speech — could classify respiratory conditions accurately enough to be clinically useful.

The Approach

Used the COSWARA dataset (IISc Bangalore), a crowd-sourced collection of cough, breathing, and vowel recordings from healthy and COVID-positive individuals. Explored three GAN architectures (SGAN, WGAN, CGAN) for synthetic data augmentation. Extracted audio features including MFCC, Mel-spectrogram, Chroma, and Zero Crossing Rate. Evaluated multiple classifiers including SVM, LSTM, CNN+RNN, and Transformer. Pivoted from GAN augmentation to SMOTE mid-project due to compute constraints, a practical decision that maintained classification robustness.
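
To show what SMOTE-style oversampling does, here is a minimal standard-library sketch. It is an illustration of the idea only: the function name `smote_sketch`, the toy 2-D feature vectors, and the choice of `k` are assumptions, and a real project would more likely rely on an existing library implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors standing in for the minority class.
covid_positive = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(covid_positive, n_new=6)
```

Because every synthetic point lies on a line between two real minority samples, the approach needs no model training, which is what makes it a cheap substitute when GAN training is out of reach.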

What I learned

LSTM architectures consistently outperformed the other classifiers I evaluated on sequential cough audio. GAN training on raw audio is compute-prohibitive without dedicated GPU infrastructure. The broader finding, that vocal biomarkers alone cannot reliably generalise across demographics for COVID detection, is now well established in the literature. The methodology for respiratory audio classification transfers cleanly to other conditions.

Python GAN WGAN CGAN LSTM CNN MFCC Mel-spectrogram SMOTE COSWARA
Reimplementation in progress — new application domain, improved architecture.
03
Academic Build Log In Progress

Early ML Detection of Memory Leaks in Containerised Environments

The Problem

Memory leaks in containerised microservices degrade performance gradually and are hard to catch early. Existing monitoring tools alert only after degradation is visible. Rule-based thresholds miss subtle pre-failure patterns in container memory metrics.

The Approach

Building an ML-based early warning system. cAdvisor and Prometheus collect container metrics; eBPF (memleak-bpfcc) provides ground-truth leak labels. TimescaleDB stores the time-series data with a leak_label schema (0=normal, 1=leak, 2=spike, -1=unknown). Isolation Forest and LSTM are the candidate detection models. Grafana handles visualisation.
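
A rough sketch of how the label scheme and one candidate input feature might look in code. The label values come from the schema above; the `LeakLabel` and `memory_slope` names, the slope feature itself, and the sample numbers are illustrative assumptions, not the project's actual feature set.

```python
from enum import IntEnum


class LeakLabel(IntEnum):
    """Mirrors the leak_label values stored alongside each sample."""
    UNKNOWN = -1
    NORMAL = 0
    LEAK = 1
    SPIKE = 2


def memory_slope(samples):
    """Least-squares slope of memory usage over evenly spaced samples,
    in memory units per sampling interval."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Hypothetical windows of container memory usage (MB).
steady = [100, 101, 99, 100, 100, 101]
leaking = [100, 104, 108, 112, 116, 120]
steady_slope = memory_slope(steady)
leaking_slope = memory_slope(leaking)
```

A sustained positive slope is the kind of gradual signal a fixed threshold can miss until usage is already high, which is the motivation for feeding windowed features like this to a learned detector.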

Current Status

  • Working C malloc leak injector for controlled experiments
  • 18 curated Docker Hub images across 5 categories as the test dataset
  • EIDF VM provisioned (4 vCPU, 7.8 GB RAM) on the University of Edinburgh cluster
  • Data collection pipeline in progress
  • Supervised by Ben Carpenter, EPCC Applications Developer
Python eBPF Prometheus TimescaleDB Docker scikit-learn Grafana C
Live dissertation project — updated as work develops.
04
Build Log Product

Career Radar — A Live Job-Fit Analyser Built During My Own Job Hunt

The Problem

Job hunting during an MSc generates a specific question no tool answered well: not just "what jobs are available" but "how well does my actual skill set match what the market is demanding right now, and how is that demand shifting week to week?"

What It Does

Fetches live job listings across target roles (ML Engineer, MLOps, DevOps), extracts skill requirements, and cross-references them against technologies visible in a GitHub profile. Outputs a live fit score and a ranked skill gap list.

What I Learned

Job boards vary wildly in how reliably they can be scraped. Skill extraction from job descriptions requires more normalisation than expected: "PyTorch", "pytorch", and "torch" are the same thing to a human but not to a naive parser. The most useful output ended up being the gap list, not the fit score.
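
The normalisation problem can be sketched with a small alias table. Everything here is hypothetical and for illustration: the `ALIASES` entries, the function names, and the sample listings are not taken from the Career Radar source.

```python
import re

# Hypothetical alias table: maps surface forms to one canonical skill name.
ALIASES = {
    "pytorch": "pytorch",
    "torch": "pytorch",
    "k8s": "kubernetes",
    "kubernetes": "kubernetes",
    "docker": "docker",
}


def extract_skills(text):
    """Return the canonical skills mentioned in a job description."""
    tokens = re.findall(r"[a-z0-9+#.]+", text.lower())
    # Strip sentence-ending dots so "k8s." still matches "k8s".
    cleaned = (t.strip(".") for t in tokens)
    return {ALIASES[t] for t in cleaned if t in ALIASES}


def skill_gaps(job_texts, my_skills):
    """Rank skills I lack by how many listings ask for them."""
    counts = {}
    for text in job_texts:
        for skill in extract_skills(text) - my_skills:
            counts[skill] = counts.get(skill, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


listings = [
    "Experience with PyTorch and Docker required.",
    "Must know torch, k8s.",
    "Kubernetes and Docker in production.",
]
gaps = skill_gaps(listings, my_skills={"pytorch"})
```

Counting per listing rather than per mention keeps one keyword-stuffed advert from dominating the ranking.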

Python Scraping NLP CLI GitHub API
05
Build Log Product

EtsyBot — Fully Automated Etsy Digital Products Pipeline

The Problem

Every step of selling digital products on Etsy — research, image generation, SEO copywriting, uploading, tracking analytics — is repetitive and time-consuming if done manually. None of it requires a human decision at runtime.

The Approach

Five-module Python pipeline orchestrated by APScheduler running on an Oracle Cloud Free ARM VM (4 vCPU, 24 GB RAM). Modules: market research, image generation (Leonardo.ai + FLUX.1-dev), SEO copy (Gemini 1.5 Flash), Etsy upload via Open API v3, analytics. A manual approval queue acts as a safety gate before any listing goes live.
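
The safety gate can be pictured as a queue that pipeline modules write to and that only releases items a human has approved. This is a minimal sketch under assumed names (`Listing`, `ApprovalQueue`); it does not call the Etsy API and is not the EtsyBot implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    title: str
    approved: bool = False


@dataclass
class ApprovalQueue:
    pending: list = field(default_factory=list)

    def submit(self, listing):
        """Pipeline modules add drafts here instead of uploading directly."""
        self.pending.append(listing)

    def approve(self, title):
        """A human marks one draft as safe to publish."""
        for listing in self.pending:
            if listing.title == title:
                listing.approved = True

    def release(self):
        """Return approved drafts for upload and keep the rest queued."""
        ready = [item for item in self.pending if item.approved]
        self.pending = [item for item in self.pending if not item.approved]
        return ready


queue = ApprovalQueue()
queue.submit(Listing("Printable planner"))
queue.submit(Listing("Wall art bundle"))
queue.approve("Wall art bundle")
ready = queue.release()
```

Keeping the upload step downstream of `release()` means an unattended scheduler run can generate as many drafts as it likes without anything going live unreviewed.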

Status

  • Live and running on Oracle Cloud Free Tier
  • One-time cost: £15 UK seller fee
  • Ongoing cost: zero (free tier VM, free API quotas)
Python APScheduler Gemini API Leonardo.ai FLUX.1-dev Etsy API Oracle Cloud
06
Build Log Product

ActualMind — Automated Faceless YouTube Shorts Channel

The Problem

Consistent short-form video output for a faceless channel (space, science, technology, history, mythology, gaming) requires daily scripting, voiceover, footage sourcing, and editing. Done manually, this is unsustainable alongside a full-time MSc.

The Approach

Python pipeline using APScheduler for scheduling, Gemini Flash for scripting, Coqui TTS for voiceover, MoviePy for video assembly, and the Pexels and NASA APIs for footage. Hosted on Azure for Students. Every video passes through a manual approval queue before upload, and uploads are capped at 3 to 5 videos per day.
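
A daily upload cap of this kind can be sketched as a small counter keyed by calendar day. The `DailyCap` class and the dates below are hypothetical, and the sketch does not touch the YouTube Data API.

```python
from datetime import date


class DailyCap:
    """Tracks uploads per calendar day and refuses any beyond the cap."""

    def __init__(self, cap=5):
        self.cap = cap
        self.counts = {}

    def try_upload(self, day):
        """Record an upload for the given day, or return False if the
        day's cap has already been reached."""
        used = self.counts.get(day, 0)
        if used >= self.cap:
            return False
        self.counts[day] = used + 1
        return True


limiter = DailyCap(cap=5)
today = date(2025, 1, 1)
results = [limiter.try_upload(today) for _ in range(7)]
next_day_ok = limiter.try_upload(date(2025, 1, 2))
```

Checking the cap at upload time, rather than at generation time, lets the pipeline keep a backlog of approved videos for quieter days.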

Status

  • Live on Azure for Students
  • Automated scripting, voiceover, and video assembly
  • Manual approval queue before YouTube upload
Python MoviePy Coqui TTS Gemini API Pexels API NASA API YouTube Data API v3 Azure

Want Similar Results?

I'm available for challenging engineering roles and freelance projects.