Zain's AI Inference Lab

Zain Fathoni's public lab notebook for AI inference runtime engineering — from Triton kernels to deployable inference systems.

Start here

The short-term forcing function is readiness for Netra Runtime-style inference puzzles: six technical tasks covering Triton kernels, quantization/dequantization, torch.compile, QLoRA/FSDP2, and benchmark-driven explanations.

The long-term direction is AI runtime engineering: understanding how models move from PyTorch code to fast, observable, deployable inference systems.

Mission · the learning and career strategy behind this lab.
Resources · the sources, codebases, and constraints guiding the work.

Lessons

Lesson 1 — Triton deletes a level · how the CUDA thread→block→grid model re-draws itself in Triton.
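The thread→block→grid remapping can be sketched in plain Python. This is a hypothetical, serial simulation, not Triton itself, so it runs without a GPU. `add_program` plays the role of one Triton program instance, and `launch` plays the grid. The comments name the real Triton calls each line stands in for (`tl.program_id`, `tl.arange`, masked `tl.load`/`tl.store`, `triton.cdiv`), following the shape of Triton's vector-add tutorial.

In CUDA, each thread computes its own element index, `blockIdx.x * blockDim.x + threadIdx.x`. In Triton, each program instance instead gets a vector of offsets, so the per-thread level disappears from the code.

```python
# Serial, CPU-only model of Triton's execution model (illustrative helpers).

def cdiv(a, b):
    """Ceiling division, mirroring triton.cdiv, used to size the grid."""
    return (a + b - 1) // b


def add_program(pid, x, y, out, n, block_size):
    """One Triton-style program instance: handles a whole BLOCK_SIZE tile.

    pid stands in for tl.program_id(axis=0).
    """
    # offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    offsets = [pid * block_size + i for i in range(block_size)]
    # mask = offsets < n_elements  (guards the ragged last block)
    mask = [off < n for off in offsets]
    for off, keep in zip(offsets, mask):
        if keep:  # masked-off lanes neither load nor store
            out[off] = x[off] + y[off]


def launch(x, y, block_size):
    """Plays the grid: one program instance per block of elements."""
    n = len(x)
    out = [0] * n
    grid = (cdiv(n, block_size),)
    # On a GPU, program instances may run in parallel and in any order;
    # here they run one after another.
    for pid in range(grid[0]):
        add_program(pid, x, y, out, n, block_size)
    return out, grid


out, grid = launch(list(range(10)), [1] * 10, block_size=4)
print(grid)  # (3,)
print(out)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With 10 elements and `block_size=4`, the grid has three program instances. The third instance covers offsets 8 to 11, and the mask keeps only 8 and 9.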

Artifact types

Lessons teach one concept with the AI-assisted /teach style.
Learning records capture my recall, corrections, and what actually clicked.
Experiments run code, capture output, measure behavior, and document failures.
Portfolio artifacts are curated later from the strongest records and experiments.

Near-term plan

Launch the lab with this framing and point ai.zainf.dev at it.
Run Experiment 0001: a first Triton kernel on a free T4 GPU via Kaggle or Colab.
Publish the correctness check, benchmark table, notebook link, and failure notes.
Use that loop to attack Netra Task A-style NF4 dequantization.
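The correctness-check-plus-benchmark loop in the plan can be sketched as a small harness. This is a hypothetical, CPU-only sketch: `check_close`, `bench`, and `report` are illustrative names, not code from the lab's repo. The harness compares a candidate against a reference within tolerances (similar in spirit to `torch.testing.assert_close`). It then times the candidate after warm-up runs and emits one row of a benchmark table.

```python
import math
import statistics
import time


def check_close(actual, expected, rel_tol=1e-5, abs_tol=1e-6):
    """Correctness check: element-wise comparison within tolerances."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


def bench(fn, *args, warmup=3, iters=20):
    """Median wall-clock time in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn(*args)
    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        times_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times_ms)


def report(name, candidate, reference, *args):
    """One benchmark-table row: name, pass/FAIL, median milliseconds."""
    ok = check_close(candidate(*args), reference(*args))
    ms = bench(candidate, *args)
    return f"| {name} | {'pass' if ok else 'FAIL'} | {ms:.3f} |"


reference = lambda xs: [2.0 * v for v in xs]
candidate = lambda xs: [v + v for v in xs]  # stand-in for a kernel under test
print("| impl | correct | median ms |")
print(report("candidate", candidate, reference, [0.5 * i for i in range(1000)]))
```

The harness reports the median rather than the mean so that a few slow outlier runs do not skew the table. On a real T4, GPU launches are asynchronous. Wall-clock timing like this therefore needs a `torch.cuda.synchronize()` before each clock read, or Triton's own `triton.testing.do_bench` helper.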

Learning records

Record 0001 — Triton program instance recall · waiting for learner recall and Experiment 0001 output.

Reference

CUDA ↔ Triton cheat-sheet · the execution-model vocabulary, compressed.

💬 This is a teaching lab first and a polished portfolio later. Lessons are allowed; mastery is tested through recall, correction, execution, and measurement.

github.com/zainfathoni/ai-inference-lab