Experiment 0001 — Triton vector add on a free T4

Status: prepared, not yet run on T4.

Purpose

Close the first real lab loop after Lesson 0001:

  1. Run a simple Triton kernel on an NVIDIA GPU.
  2. Check correctness against PyTorch.
  3. Capture a small benchmark against PyTorch's built-in addition.
  4. Record what broke or surprised me.

This is not meant to prove performance skill yet. It proves the workflow: local notes → free GPU run → captured output → learning record.
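
For orientation, here is a minimal sketch of what a script covering steps 1 to 3 might look like. It is not the actual vector_add_benchmark.py, whose contents are not shown in these notes. The names add_kernel, triton_add, time_ms and num_blocks, the block size of 1024, and the vector length are all illustrative assumptions. It follows the usual Triton vector-add pattern (one program instance per block, masked loads and stores) and skips cleanly when PyTorch, Triton, or a CUDA GPU is unavailable.

```python
# Hypothetical sketch, NOT the real vector_add_benchmark.py.
try:
    import torch
    import triton
    import triton.language as tl
    HAVE_TRITON = True
except ImportError:  # PyTorch or Triton is not installed in this runtime
    HAVE_TRITON = False

DEFAULT_BLOCK = 1024  # assumed block size; tl.arange needs a power of two


def num_blocks(n_elements, block=DEFAULT_BLOCK):
    """Program instances needed to cover n_elements (ceiling division)."""
    return (n_elements + block - 1) // block


if HAVE_TRITON:

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one contiguous block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the ragged final block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)


def triton_add(x, y):
    """Launch add_kernel over two same-shaped, contiguous CUDA tensors."""
    out = torch.empty_like(x)
    n = out.numel()
    add_kernel[(num_blocks(n),)](x, y, out, n, BLOCK_SIZE=DEFAULT_BLOCK)
    return out


def time_ms(fn, warmup=10, reps=100):
    """Average milliseconds per call, measured with CUDA events."""
    for _ in range(warmup):  # the first call also triggers Triton compilation
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(reps):
        fn()
    end.record()
    torch.cuda.synchronize()  # wait for the GPU before reading the timer
    return start.elapsed_time(end) / reps


def main():
    if not HAVE_TRITON or not torch.cuda.is_available():
        print("Skipping: needs PyTorch, Triton and a CUDA GPU.")
        return
    print("GPU:", torch.cuda.get_device_name(0))
    n = 1 << 24  # assumed vector length (about 16.8M float32 elements)
    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")
    out = triton_add(x, y)
    print("matches torch:", torch.allclose(out, x + y))
    print(f"triton: {time_ms(lambda: triton_add(x, y)):.4f} ms")
    print(f"torch:  {time_ms(lambda: x + y):.4f} ms")


if __name__ == "__main__":
    main()
```

Timing uses CUDA events rather than wall-clock time because kernel launches are asynchronous; without the synchronize calls the measurement would mostly reflect launch overhead.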

Target environment

A free NVIDIA T4 GPU runtime on Colab or Kaggle.

How to run

In Colab/Kaggle, upload or paste vector_add_benchmark.py, then run the following (prefix each command with ! when running from a notebook cell):

python vector_add_benchmark.py

If Triton is missing:

pip install triton
python vector_add_benchmark.py
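
Before running, it may help to confirm that the runtime actually has a GPU attached and that the needed packages are importable. The sketch below is an optional pre-flight check using only the standard library; the helper names module_available and gpu_name are hypothetical, and it assumes nvidia-smi is on the PATH whenever an NVIDIA GPU runtime is active.

```python
import importlib.util
import shutil
import subprocess


def module_available(name):
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def gpu_name():
    """First GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    print("GPU:", gpu_name() or "none detected (is a GPU runtime selected?)")
    for mod in ("torch", "triton"):
        status = "ok" if module_available(mod) else "missing (try: pip install " + mod + ")"
        print(f"{mod}: {status}")
```

If the GPU line reports none detected, the notebook is probably still on a CPU runtime, which is worth noting in the learning record as something that broke.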

What to capture after running

Paste the terminal/notebook output below.

TODO: paste actual output here after T4 run.
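
Alongside the raw timings, one derived number that may be worth recording is effective memory bandwidth, since vector addition is limited by memory traffic rather than arithmetic. The sketch below assumes float32 data and counts three transfers per element (read x, read y, write the result); the function name effective_gbps is hypothetical, and the timing in the usage line is a made-up input to show the arithmetic, not a measured result.

```python
def effective_gbps(n_elements, elapsed_ms, bytes_per_element=4):
    """Effective bandwidth in GB/s for out = x + y.

    Assumes each element is read twice (x and y) and written once.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    bytes_moved = 3 * n_elements * bytes_per_element
    seconds = elapsed_ms * 1e-3
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    # Made-up timing purely to illustrate the calculation.
    print(f"{effective_gbps(1_000_000, 0.1):.1f} GB/s")
```

NVIDIA's T4 datasheet lists a peak memory bandwidth of 320 GB/s, so for vectors much larger than the GPU caches, a figure well above that would suggest a timing or byte-counting mistake rather than a fast kernel.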

Notes after run

Link to learning record