three.js
This example compares the performance of several simple parallel reduction kernels.
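As a rough illustration of what a single reduction kernel does, the sketch below simulates a workgroup-level tree reduction (sum) on the CPU. Each of the hypothetical `WORKGROUP_SIZE` threads loads one element into "shared memory", and the number of active threads halves on every pass. This is a minimal sketch under those assumptions, not the code used by any of the implementations in this example; `workgroupReduce` and `reduceAll` are illustrative names.

```typescript
// Hypothetical workgroup size; real kernels tune this per device.
const WORKGROUP_SIZE = 8;

// Simulates one workgroup's tree reduction over shared memory.
function workgroupReduce(values: number[]): number {
  // Load into "shared memory", padding with the additive identity (0).
  const shared = new Array<number>(WORKGROUP_SIZE).fill(0);
  for (let t = 0; t < Math.min(values.length, WORKGROUP_SIZE); t++) {
    shared[t] = values[t];
  }
  // Each pass halves the number of active threads (sequential addressing).
  // On a GPU the inner loop runs in parallel, with a barrier between passes;
  // reads come from indices >= stride and writes go to indices < stride,
  // so running it sequentially here gives the same result.
  for (let stride = WORKGROUP_SIZE / 2; stride > 0; stride >>= 1) {
    for (let t = 0; t < stride; t++) {
      shared[t] += shared[t + stride];
    }
  }
  return shared[0];
}

// Splits the input into workgroup-sized chunks, reduces each chunk,
// then sums the per-workgroup partial results.
function reduceAll(input: number[]): number {
  const partials: number[] = [];
  for (let i = 0; i < input.length; i += WORKGROUP_SIZE) {
    partials.push(workgroupReduce(input.slice(i, i + WORKGROUP_SIZE)));
  }
  return partials.reduce((a, b) => a + b, 0);
}
```

For example, `reduceAll(Array.from({ length: 20 }, (_, i) => i + 1))` reduces three chunks (36, 100 and 74) and returns 210. A real GPU implementation would usually combine the partials with a second kernel dispatch or atomics rather than on the CPU.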
Reference implementations are translated from the CUDA/WGSL code in the following books and repositories:
Impl. 0-2: Programming in Parallel with CUDA by Richard Ansorge
Impl. 3: betann reduce_all kernel by zcbenz
Impl. 4: GPUPrefixSums reduction approach by b0nes164
Subgroup Reduction Explanation
Each workgroup uses subgroupAdd() to compute the reduction of each of its subgroups.
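The sketch below simulates that idea on the CPU. WGSL's `subgroupAdd(e)` returns the sum of `e` across all active invocations in the subgroup, so every lane receives the subgroup total. Here one lane per subgroup writes that total to shared memory, and after a barrier the first subgroup reduces those partials with another `subgroupAdd`. The sizes `SUBGROUP_SIZE = 4` and `WG_SIZE = 16` are hypothetical (real subgroup sizes depend on the hardware), the two-step scheme is one common pattern rather than necessarily the one used here, and the function names are illustrative.

```typescript
// Hypothetical sizes; real subgroup sizes are hardware-dependent.
const SUBGROUP_SIZE = 4;
const WG_SIZE = 16;

// CPU stand-in for WGSL subgroupAdd(): every lane receives the subgroup's total.
function subgroupAdd(lanes: number[]): number[] {
  const total = lanes.reduce((a, b) => a + b, 0);
  return lanes.map(() => total);
}

// Two-level workgroup reduction built on subgroupAdd.
function workgroupReduceWithSubgroups(values: number[]): number {
  // Each invocation loads one value; pad with the additive identity.
  const local = Array.from({ length: WG_SIZE }, (_, i) => values[i] ?? 0);
  const numSubgroups = WG_SIZE / SUBGROUP_SIZE;

  // Step 1: each subgroup reduces its lanes; lane 0 stores the result in shared memory.
  const shared: number[] = [];
  for (let s = 0; s < numSubgroups; s++) {
    const lanes = local.slice(s * SUBGROUP_SIZE, (s + 1) * SUBGROUP_SIZE);
    shared[s] = subgroupAdd(lanes)[0];
  }

  // Step 2: after a barrier, the first subgroup reduces the per-subgroup partials.
  // Assumes numSubgroups <= SUBGROUP_SIZE so one subgroup can hold all of them.
  const firstSubgroup = Array.from({ length: SUBGROUP_SIZE }, (_, i) => shared[i] ?? 0);
  return subgroupAdd(firstSubgroup)[0];
}
```

Compared with the shared-memory tree reduction, this replaces most of the barrier-separated passes with a single subgroup operation per level, which is the main reason subgroup variants tend to need fewer synchronization steps.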