Zain Fathoni's public lab notebook for AI inference runtime engineering — from Triton kernels to deployable inference systems.
The short-term forcing function is readiness for Netra Runtime-style inference puzzles: six technical tasks covering Triton kernels, quantization/dequantization, torch.compile, QLoRA/FSDP2, and benchmark-driven explanations.
The long-term direction is AI runtime engineering: understanding how models move from PyTorch code to fast, observable, deployable inference systems.
This is a teaching lab first and a polished portfolio later. Lessons are allowed; mastery is tested through recall, correction, execution, and measurement.
ai.zainf.dev
github.com/zainfathoni/ai-inference-lab