Zain's AI Inference Lab

Zain Fathoni's public lab notebook for AI inference runtime engineering — from Triton kernels to deployable inference systems.

Start here

The short-term forcing function is readiness for Netra Runtime-style inference puzzles: six technical tasks covering Triton kernels, quantization/dequantization, torch.compile, QLoRA/FSDP2, and benchmark-driven explanations.

The long-term direction is AI runtime engineering: understanding how models move from PyTorch code to fast, observable, deployable inference systems.

Lessons

Artifact types

Near-term plan

  1. Launch this lab framing and point ai.zainf.dev here.
  2. Run Experiment 0001: a first Triton kernel on a free T4 via Kaggle or Colab.
  3. Publish the correctness check, benchmark table, notebook link, and failure notes.
  4. Use that loop to attack Netra Task A-style NF4 dequantization.
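The loop in steps 2 to 4 (reference implementation, candidate, correctness check, benchmark row) can be sketched on CPU before any GPU work. This is a minimal pure-Python illustration, not the lab's actual code. The codebook is the 16-level NF4 table as commonly published with QLoRA and bitsandbytes. The packing order (high nibble first), the block size of 64, and every function name below are assumptions made for this sketch. The lookup-table variant only stands in for a future Triton kernel.

```python
import random
import time

# 16 NF4 levels as commonly published (QLoRA / bitsandbytes); index = 4-bit code.
NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def dequantize_nf4(packed, absmax, block_size=64):
    """Reference: two 4-bit codes per byte (high nibble first, assumed),
    each scaled by the absmax of the block it belongs to."""
    out = []
    for byte in packed:
        for code in (byte >> 4, byte & 0x0F):
            scale = absmax[len(out) // block_size]
            out.append(NF4_CODEBOOK[code] * scale)
    return out


# Hypothetical "candidate": precompute both decoded values for every byte.
BYTE_LUT = [(NF4_CODEBOOK[b >> 4], NF4_CODEBOOK[b & 0x0F]) for b in range(256)]


def dequantize_nf4_lut(packed, absmax, block_size=64):
    """Candidate: iterate block by block and decode each byte via BYTE_LUT."""
    out = []
    bytes_per_block = block_size // 2
    for block, scale in enumerate(absmax):
        start = block * bytes_per_block
        for byte in packed[start:start + bytes_per_block]:
            hi, lo = BYTE_LUT[byte]
            out.append(hi * scale)
            out.append(lo * scale)
    return out


def max_abs_error(a, b):
    """Correctness check: largest elementwise difference between two outputs."""
    return max(abs(x - y) for x, y in zip(a, b))


def benchmark(fn, *args, repeats=5):
    """Best wall-clock time in milliseconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, (time.perf_counter() - start) * 1e3)
    return best


# Synthetic input: 4096 packed bytes = 8192 values = 128 blocks of 64.
rng = random.Random(0)
packed = bytes(rng.randrange(256) for _ in range(4096))
absmax = [rng.uniform(0.5, 2.0) for _ in range(len(packed) * 2 // 64)]

reference = dequantize_nf4(packed, absmax)
candidate = dequantize_nf4_lut(packed, absmax)

print("max abs error:", max_abs_error(reference, candidate))
for name, fn in [("reference", dequantize_nf4), ("lut", dequantize_nf4_lut)]:
    print(f"{name:>10}  {benchmark(fn, packed, absmax):8.3f} ms")
```

On a GPU, the same shape would apply with the candidate replaced by a Triton kernel, the comparison done against a PyTorch reference within a tolerance, and the timing taken with GPU-aware synchronization rather than `time.perf_counter` alone.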

Learning records

Reference

💬 This is a teaching lab first and a polished portfolio later. Lessons are allowed; mastery is tested through recall, correction, execution, and measurement.

github.com/zainfathoni/ai-inference-lab