Record 0001 — Triton program instance recall

Status: waiting for learner recall and Experiment 0001 output.

Prompt

Without looking at Lesson 0001, explain:

  1. What is a Triton program instance?
  2. How does it map to CUDA's grid/block/thread picture?
  3. In the vector-add kernel, what does tl.program_id(axis=0) identify?
  4. Why does the kernel need a mask?
  5. What does Triton hide from me, and what performance questions remain my responsibility?

My cold-recall answer

TODO: write this before rereading the lesson.

Correction

TODO: after review, note what was wrong, missing, or fuzzy.

Link to experiment

TODO: link the Experiment 0001 output once it is available.