Record 0001 — Triton program instance recall

Status: waiting for learner recall and Experiment 0001 output.

Prompt

Without looking at Lesson 0001, explain:

1. What is a Triton program instance?
2. How does it map to CUDA's grid/block/thread picture?
3. In the vector-add kernel, what does tl.program_id(axis=0) identify?
4. Why does the kernel need a mask?
5. What does Triton hide from me, and what performance questions remain my responsibility?

TODO: write this before rereading the lesson.

TODO: after review, note what was wrong, missing, or fuzzy.