Record 0001 — Triton program instance recall
Status: waiting for learner recall and Experiment 0001 output.
Prompt
Without looking at Lesson 0001, explain:
- What is a Triton program instance?
- How does it map to CUDA's grid/block/thread picture?
- In the vector-add kernel, what does tl.program_id(axis=0) identify?
- Why does the kernel need a mask?
- What does Triton hide from me, and what performance questions remain my responsibility?
My cold-recall answer
TODO: write this before rereading the lesson.
Correction
TODO: after review, note what was wrong, missing, or fuzzy.