多次元Block sizeのWarp idについて
消えたCUDA関連の旧ブログ記事を復元するひとり Advent Calendar 2024の記事です。
概要
CUDAのカーネル関数の呼び出しで,
const dim3 block_size(4, 4, 4);
kernel<<<1, block_size>>>(...);
としたときに起動されるスレッドのWarp idがどうなっているか,という話です.
調べ方
素直にWarp idを調べます. CUDAにはWarp idを得るためのレジスタがあるので、それよ読めばOKです.
結果
の場合,
[threadIdx.x, threadIdx.y, threadIdx.z] : warpid = X [ 0, 0, 0] : warpid = 0 [ 0, 0, 1] : warpid = 0 [ 0, 0, 2] : warpid = 1 [ 0, 0, 3] : warpid = 1 [ 0, 1, 0] : warpid = 0 [ 0, 1, 1] : warpid = 0 [ 0, 1, 2] : warpid = 1 [ 0, 1, 3] : warpid = 1 [ 0, 2, 0] : warpid = 0 [ 0, 2, 1] : warpid = 0 [ 0, 2, 2] : warpid = 1 [ 0, 2, 3] : warpid = 1 [ 0, 3, 0] : warpid = 0 [ 0, 3, 1] : warpid = 0 [ 0, 3, 2] : warpid = 1 [ 0, 3, 3] : warpid = 1 [ 1, 0, 0] : warpid = 0 [ 1, 0, 1] : warpid = 0 [ 1, 0, 2] : warpid = 1 [ 1, 0, 3] : warpid = 1 [ 1, 1, 0] : warpid = 0 [ 1, 1, 1] : warpid = 0 [ 1, 1, 2] : warpid = 1 [ 1, 1, 3] : warpid = 1 [ 1, 2, 0] : warpid = 0 [ 1, 2, 1] : warpid = 0 [ 1, 2, 2] : warpid = 1 [ 1, 2, 3] : warpid = 1 [ 1, 3, 0] : warpid = 0 [ 1, 3, 1] : warpid = 0 [ 1, 3, 2] : warpid = 1 [ 1, 3, 3] : warpid = 1 [ 2, 0, 0] : warpid = 0 [ 2, 0, 1] : warpid = 0 [ 2, 0, 2] : warpid = 1 [ 2, 0, 3] : warpid = 1 [ 2, 1, 0] : warpid = 0 [ 2, 1, 1] : warpid = 0 [ 2, 1, 2] : warpid = 1 [ 2, 1, 3] : warpid = 1 [ 2, 2, 0] : warpid = 0 [ 2, 2, 1] : warpid = 0 [ 2, 2, 2] : warpid = 1 [ 2, 2, 3] : warpid = 1 [ 2, 3, 0] : warpid = 0 [ 2, 3, 1] : warpid = 0 [ 2, 3, 2] : warpid = 1 [ 2, 3, 3] : warpid = 1 [ 3, 0, 0] : warpid = 0 [ 3, 0, 1] : warpid = 0 [ 3, 0, 2] : warpid = 1 [ 3, 0, 3] : warpid = 1 [ 3, 1, 0] : warpid = 0 [ 3, 1, 1] : warpid = 0 [ 3, 1, 2] : warpid = 1 [ 3, 1, 3] : warpid = 1 [ 3, 2, 0] : warpid = 0 [ 3, 2, 1] : warpid = 0 [ 3, 2, 2] : warpid = 1 [ 3, 2, 3] : warpid = 1 [ 3, 3, 0] : warpid = 0 [ 3, 3, 1] : warpid = 0 [ 3, 3, 2] : warpid = 1 [ 3, 3, 3] : warpid = 1となり,threadIdx.z = 0, 1の32スレッドがWarp 0, threadIdx.z = 2, 3の32スレッドがWarp 1に割り当てられていることが分かります. あくまで32 threadsで1 warpを作り、その中のスレッドにthreadIdx.x, y, zを割り当てているような挙動ですね. など,きれいな数字の場合きれいにWarp idが割り振られるようです.
少し気持ち悪目に の場合,
$ ./a.out | sort [ 0, 0, 0] : warpid = 0 [ 0, 0, 1] : warpid = 0 [ 0, 0, 2] : warpid = 0 [ 0, 0, 3] : warpid = 1 [ 0, 0, 4] : warpid = 1 [ 0, 1, 0] : warpid = 0 [ 0, 1, 1] : warpid = 0 [ 0, 1, 2] : warpid = 1 [ 0, 1, 3] : warpid = 1 [ 0, 1, 4] : warpid = 1 [ 0, 2, 0] : warpid = 0 [ 0, 2, 1] : warpid = 0 [ 0, 2, 2] : warpid = 1 [ 0, 2, 3] : warpid = 1 [ 0, 2, 4] : warpid = 2 [ 0, 3, 0] : warpid = 0 [ 0, 3, 1] : warpid = 0 ...:thinking_face:
おまけ
あまり一般に呼ばれることは少ないですが,1 warp内のthread idのことをlane id、thread block内のwarpのidのことをwarp idと呼びます(少なくともregister名的には).