three.js
This example demonstrates the performance of various simple parallel reduction kernels.
Reference implementations are translated from the CUDA code in the following books and repositories:
Programming in Parallel with CUDA, by Richard Ansorge
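As a rough illustration of what a simple parallel reduction computes, the sketch below simulates a tree-style sum on the CPU. It is not one of the kernels from this example or from the book: the names treeReduceSum, stride and tid are hypothetical, and the loops only stand in for GPU threads and synchronization steps.

```javascript
// CPU-side sketch of a tree-style parallel reduction (sum).
// Each pass of the outer loop stands in for one synchronized step of a kernel;
// each iteration of the inner loop stands in for the work of one thread.
function treeReduceSum(values) {
  // Pad to the next power of two with the additive identity (0),
  // so every step can halve the active range cleanly.
  let size = 1;
  while (size < values.length) size *= 2;
  const data = new Float64Array(size);
  data.set(values);

  // Halve the stride each step: "thread" tid adds the element that sits
  // one stride away into its own slot.
  for (let stride = size / 2; stride >= 1; stride /= 2) {
    for (let tid = 0; tid < stride; tid++) {
      data[tid] += data[tid + stride];
    }
  }

  // After log2(size) steps the total has accumulated in slot 0.
  return data[0];
}

// Usage: sum the integers 1..1000.
const input = Array.from({ length: 1000 }, (_, i) => i + 1);
const total = treeReduceSum(input);
console.log(total); // 500500
```

On a GPU the inner loop would run as concurrent threads with a barrier between steps, so n elements need about log2(n) steps rather than n sequential additions. The variants of such kernels typically differ in how they arrange memory accesses and how many elements each thread handles.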