Because GPU code is generally harder to debug than a regular C++ program, verifying the results is essential. Since we are dealing with floating-point math, the CPU and GPU results might not match bit for bit, so we use a flt_eq() function that treats minor differences as equal:
auto test_kernel(bc::context& ctx, bc::command_queue& q, bc::kernel& k) {
  // Treat two floats as equal if they differ by at most epsilon
  auto flt_eq = [](float a, float b) {
    auto epsilon = 0.00001f;
    return std::abs(a - b) <= epsilon;
  };
  auto cpu = box_filter_test_cpu(2000, 1000, 2);
  auto gpu = box_filter_test_gpu(2000, 1000, 2, ctx, q, k);
  // Exact comparison: may be false even when the kernel is correct
  auto is_equal = cpu == gpu;
  // Tolerant, element-wise comparison using flt_eq
  auto is_almost_equal = std::equal(
    cpu.begin(), cpu.end(), gpu.begin(), flt_eq
  );
  std::cout
    << "is_equal: " << is_equal << '\n'
    << "is_almost_equal: ...
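To try the tolerance-based comparison without a GPU, the same idea can be sketched as a small standalone helper. The free function flt_eq() with an epsilon parameter and the almost_equal() wrapper are hypothetical names introduced here for illustration; they are not part of the listing above or of Boost.Compute. The sketch also assumes the two result buffers are std::vector<float>:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: true when a and b differ by at most epsilon.
inline bool flt_eq(float a, float b, float epsilon = 0.00001f) {
  return std::abs(a - b) <= epsilon;
}

// Hypothetical helper: element-wise tolerant comparison of two buffers.
inline bool almost_equal(const std::vector<float>& lhs,
                         const std::vector<float>& rhs,
                         float epsilon = 0.00001f) {
  // The size check guards the three-iterator std::equal overload, which
  // assumes the second range is at least as long as the first.
  return lhs.size() == rhs.size() &&
         std::equal(lhs.begin(), lhs.end(), rhs.begin(),
                    [epsilon](float a, float b) {
                      return flt_eq(a, b, epsilon);
                    });
}
```

An absolute epsilon like this works well when the compared values have similar magnitudes, as averaged pixel values typically do. If the values span several orders of magnitude, a tolerance relative to the size of the operands may be a better fit.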