three.js
This example demonstrates the performance of various simple parallel reduction kernels.
The reference implementations are translated from CUDA/WGSL code in the following books and repositories:
Impl. 0 - 2: Programming in Parallel with CUDA by Richard Ansorge
Impl. 3: betann reduce_all kernel by zcbenz
Impl. 4: GPUPrefixSums reduction approach by b0nes164

Subgroup Reduction Explanation

Use subgroupAdd() to reduce each subgroup within a workgroup to a single sum (Hover for animation)
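
The subgroup step can be sketched on the CPU in plain JavaScript. This is only an illustrative simulation, not the example's actual shader code: simulateSubgroupAdd is a hypothetical stand-in for the WGSL subgroupAdd() built-in, and SUBGROUP_SIZE and WORKGROUP_SIZE are assumed values chosen to keep the sketch small (real subgroup sizes depend on the GPU).

```javascript
// CPU-side simulation of a subgroup-based workgroup reduction.
// SUBGROUP_SIZE and WORKGROUP_SIZE are assumed values for illustration only.
const SUBGROUP_SIZE = 4;
const WORKGROUP_SIZE = 16;

// Hypothetical stand-in for WGSL subgroupAdd(): every lane in a subgroup
// receives the sum of all lane values in that subgroup.
function simulateSubgroupAdd(laneValues) {
  const sum = laneValues.reduce((acc, v) => acc + v, 0);
  return laneValues.map(() => sum);
}

// Reduce one workgroup's slice of the input to a single value.
function reduceWorkgroup(values) {
  // Step 1: each subgroup reduces its own lanes.
  const partials = []; // plays the role of workgroup shared memory
  for (let start = 0; start < values.length; start += SUBGROUP_SIZE) {
    const lanes = values.slice(start, start + SUBGROUP_SIZE);
    const reduced = simulateSubgroupAdd(lanes);
    // Lane 0 of each subgroup stores the subgroup total.
    partials.push(reduced[0]);
  }
  // Step 2: combine the per-subgroup partial sums into one workgroup total.
  return simulateSubgroupAdd(partials)[0];
}

// Sum the per-workgroup totals to get the final result.
function reduceAll(input) {
  let total = 0;
  for (let start = 0; start < input.length; start += WORKGROUP_SIZE) {
    total += reduceWorkgroup(input.slice(start, start + WORKGROUP_SIZE));
  }
  return total;
}

const input = Array.from({ length: 64 }, (_, i) => i + 1);
console.log(reduceAll(input)); // 2080
```

On a GPU, the lanes of a subgroup run together and the per-subgroup totals are exchanged through workgroup memory; the loops here only mimic that data flow sequentially.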