Thrust has the thrust::reduce_by_key algorithm, which works well for a problem of mine. I wanted to try CUB for finer control of memory and streams, as well as for interaction with my own kernels, but there doesn't seem to be a reduce_by_key equivalent in CUB. There is cub::DeviceSegmentedReduce, but it doesn't seem to output the keys alongside the reduced values, and I would need to find the start of each consecutive segment of keys beforehand, which is its own problem.
The problem is to reduce an array of key-value pairs into an array where all keys are unique and each output value is the reduction of all input values that shared that key.
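To make the intended semantics concrete, here is a minimal CPU sketch of what I want the GPU version to compute. It assumes int keys, float values, and a sum as the reduction, and it collapses runs of consecutive equal keys the way thrust::reduce_by_key does, so the output keys are only unique if equal keys are already adjacent (for example, sorted). The name reduce_by_key_reference is a hypothetical helper for illustration, not a Thrust or CUB function.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// CPU reference for the desired result (assumed: int keys, float values, sum).
// Each run of consecutive equal keys collapses to one (key, sum) pair.
// reduce_by_key_reference is a hypothetical name, not part of Thrust or CUB.
std::vector<std::pair<int, float>> reduce_by_key_reference(
    const std::vector<int>& keys, const std::vector<float>& values)
{
    std::vector<std::pair<int, float>> out;
    for (std::size_t i = 0; i < keys.size() && i < values.size(); ++i) {
        if (!out.empty() && out.back().first == keys[i]) {
            // Same key as the previous element: accumulate into the open run.
            out.back().second += values[i];
        } else {
            // Key changed (or first element): start a new output pair.
            out.emplace_back(keys[i], values[i]);
        }
    }
    return out;
}
```

For keys {1, 1, 2, 3, 3, 3} and values {1, 2, 3, 4, 5, 6}, this yields the pairs (1, 3), (2, 3), (3, 15).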
Is there an easy way to do this with CUB that I am not seeing?