svah-x
Waterloo · Math · ML Systems

Kelvin Peng

Math (Combinatorics & Optimization + Statistics) @ University of Waterloo. Currently focused on world-model reinforcement learning, topology-guided optimization, and efficient LLM fine-tuning.

Focus
World Models · Optimization
Tooling
JAX · PyTorch · GUDHI

About

I like problems that sit between theory and engineering: training stability, evaluation discipline, and building research prototypes that are actually reproducible. Recently, I’ve been working on a DreamerV3-style world-model agent for Geometry Dash and a topology-guided optimizer (TopoAdamW) for non-convex training.

Projects

World Models · Reinforcement Learning
Research prototype
GitHub →

Geometry Dash World-Model Agent (DreamerV3-style)

A DreamerV3-style agent for a 60 Hz, physics-driven game environment with tight failure constraints, built on a custom Gymnasium stack with Windows↔WSL synchronization and high-frequency logging for reproducible evaluation.

JAX · DreamerV3-style · Gymnasium · Windows↔WSL bridge · High-frequency logger
Environment
Custom Gymnasium env + reproducible evaluation harness.
Systems
Windows↔WSL bridge to sync observations/state and actions.
Debugging
High-frequency trajectories for offline analysis and sanity checks.
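
To make the debugging idea concrete, here is a minimal sketch of a per-frame trajectory logger with a timing sanity check. It is an illustration under stated assumptions, not the project's actual code: the `Frame` fields, the 0/1 action encoding, the JSON Lines format, and the helper `find_timing_gaps` are all hypothetical. Only the 60 Hz step comes from the description above.

```python
import io
import json
from dataclasses import dataclass, asdict

FRAME_DT = 1.0 / 60.0  # 60 Hz physics step, taken from the project description


@dataclass
class Frame:
    step: int
    t: float       # timestamp in seconds
    action: int    # hypothetical encoding: 0 = release, 1 = jump
    reward: float
    done: bool


class TrajectoryLogger:
    """Buffers frames in memory and writes them out as JSON Lines."""

    def __init__(self, sink, flush_every=256):
        self.sink = sink                # any object with a .write(str) method
        self.flush_every = flush_every
        self.buffer = []

    def log(self, frame):
        self.buffer.append(frame)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        for frame in self.buffer:
            self.sink.write(json.dumps(asdict(frame)) + "\n")
        self.buffer.clear()


def find_timing_gaps(frames, dt=FRAME_DT, tolerance=0.5):
    """Return the steps whose gap to the previous frame is off by more than tolerance * dt."""
    gaps = []
    for prev, cur in zip(frames, frames[1:]):
        if abs((cur.t - prev.t) - dt) > tolerance * dt:
            gaps.append(cur.step)
    return gaps


# Usage: four on-time frames, then one that arrives three ticks late.
sink = io.StringIO()
logger = TrajectoryLogger(sink, flush_every=2)
frames = [Frame(i, i * FRAME_DT, i % 2, 0.0, False) for i in range(4)]
frames.append(Frame(4, 6 * FRAME_DT, 0, -1.0, True))
for frame in frames:
    logger.log(frame)
logger.flush()
print(find_timing_gaps(frames))  # steps where frames were likely dropped
```

Buffering writes keeps per-frame overhead low, and a check like `find_timing_gaps` can flag desynchronization between the game and the agent before it contaminates an evaluation run.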
Optimization · Topological Data Analysis
GitHub →

TopoAdamW: TDA-Guided Meta-Optimizer

A PyTorch optimizer that uses GUDHI-based TDA features to probe local loss-landscape geometry (e.g., sharp vs. flat regions) and adapt update behavior, with stability safeguards built in.

PyTorch · GUDHI · Loss landscape · Reproducible eval
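
As a rough illustration of the kind of topological feature such an optimizer could use, the sketch below computes 0-dimensional sublevel-set persistence for a 1D loss profile (for example, losses sampled along one direction in parameter space). It deliberately uses a small pure-Python union-find instead of GUDHI so that it runs without dependencies. The function names and the `lr_scale` rule are hypothetical and are not TopoAdamW's actual update logic.

```python
def sublevel_persistence_1d(values):
    """Finite 0-dimensional persistence pairs (birth, death) of a 1D profile.

    Each local minimum starts a component. When two components meet, the one
    born later dies (elder rule). The component of the global minimum never
    dies and is therefore not reported.
    """
    n = len(values)
    parent = [None] * n  # None means the vertex has not been added yet
    birth = {}           # vertex index -> birth value of the component it started

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    # Add vertices from lowest to highest loss value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    return pairs


def lr_scale(pairs, sensitivity=1.0):
    """Hypothetical rule: shrink the step size as total persistence grows."""
    total = sum(death - birth for birth, death in pairs)
    return 1.0 / (1.0 + sensitivity * total)


# Usage: a rugged profile with two extra basins besides the global minimum.
profile = [3.0, 1.0, 2.5, 0.0, 4.0, 2.0, 5.0]
pairs = sublevel_persistence_1d(profile)
print(pairs)            # [(1.0, 2.5), (2.0, 4.0)]
print(lr_scale(pairs))  # below 1.0 for a rugged profile, exactly 1.0 for a flat one
```

Long-lived pairs indicate deep, well-separated basins along the probed direction, while a flat or monotone profile produces no finite pairs, which is one simple way to turn "sharp vs. flat" into a number.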
LLM Systems

Efficient Fine-Tuning (Dream-7B, GPT-OSS-20B)

Memory-efficient fine-tuning pipelines using QLoRA (4-bit), gradient checkpointing, and DeepSpeed, targeting both single-GPU (16 GB) and multi-GPU setups with stable, reproducible evaluation.

DeepSpeed · QLoRA 4-bit · Benchmarking
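
A back-of-the-envelope sketch of why 4-bit quantization matters for a 16 GB card. All figures are rough assumptions: parameter counts are the nominal 7B and 20B from the model names, quantization constants and activations are ignored, the adapter shape (rank 16, hidden size 4096, 128 adapted matrices) is hypothetical, and 16 bytes per trainable parameter assumes fp32 weights, gradients, and two Adam moments.

```python
GB = 1e9  # decimal gigabytes


def quantized_weight_gb(n_params, bits=4):
    """Memory for frozen base weights stored at `bits` bits per parameter."""
    return n_params * bits / 8 / GB


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer."""
    return rank * (d_in + d_out)


def adapter_training_gb(n_trainable, bytes_per_param=16):
    """Weights + gradients + optimizer state for the trainable adapter parameters."""
    return n_trainable * bytes_per_param / GB


# Hypothetical adapter setup: rank 16 on 128 square 4096 x 4096 projections.
n_trainable = 128 * lora_params(4096, 4096, rank=16)

for name, n_params in [("7B", 7e9), ("20B", 20e9)]:
    base = quantized_weight_gb(n_params)
    total = base + adapter_training_gb(n_trainable)
    print(f"{name}: base {base:.1f} GB, base + adapter state {total:.2f} GB")
```

Under these assumptions the frozen weights dominate and the adapter state is a fraction of a gigabyte, so the remaining budget goes mostly to activations, which is where gradient checkpointing helps.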

Contact