(Following up on the discussion in #4, cc @BobBane)
I avoided double-precision in the past because Apple's OpenCL compiler was very buggy with doubles (crashing, etc.). The compiler quality has improved considerably since 2011, and my other OpenCL project (not open source, and not geo-related) uses double-precision exclusively now.
I see a few paths forward here:
- Migrate everything to double-precision
- Maintain separate kernels for single-precision and double-precision
- Use typedefs / #defines with a single set of kernels
I'm hesitant to adopt the #define KFLOAT float approach because the algorithms themselves may need to differ between single and double-precision. That is, I use a lot of tricks to avoid round-off in the single-precision world that wouldn't be necessary with double-precision. Then there are the various tolerance levels and iteration counts that would need to differ between the two precisions, as well as the annoyance that literals must carry an "f" suffix in the single-precision world (i.e. 0.5 has to be written 0.5f).
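For reference, a minimal sketch of what the single-source approach would look like, compiled here as plain C rather than OpenCL C. The names `kfloat` and `KF` are hypothetical; the `KF` macro handles the literal-suffix annoyance via token pasting, though it does nothing about the algorithmic differences mentioned above:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical single-source scheme: one body of kernel code,
 * parameterized on precision by the preprocessor. */
#ifdef USE_DOUBLE
typedef double kfloat;
#define KF(x) x        /* double: literal used as-is */
#else
typedef float kfloat;
#define KF(x) x##f     /* float: paste an 'f' suffix onto the literal */
#endif

/* Toy "kernel": the same source text serves either precision. */
static kfloat half_angle(kfloat theta) {
    return KF(0.5) * theta;
}
```

Every literal has to be wrapped in the macro, which is exactly the kind of clutter that makes this approach unappealing once tolerances and iteration counts diverge too.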
The only real reason to maintain single-precision is for applications that (strongly) prefer speed over accuracy, or to support GPUs that lack double-precision support. With Magic Maps, the slowdown might be an acceptable trade-off, but I won't really know until I try it out. So I'm hesitant to rip out the single-precision code willy-nilly.
For now I'm leaning toward a two-kernel world, implementing (porting) double-precision versions of projections as needed. I imagine an extra argument to pl_context_init would specify the desired computation precision — which would later be passed to pl_find_kernel — and I think that a wrapper function (or several) around clSetKernelArg should mean we won't have to duplicate many host-side functions. I'm envisioning two separate folders kernel/float/ and kernel/double/, with all OpenCL functions prefixed with either plf_ or pld_ depending on the precision. Generally only one set of kernels would be compiled for a given PLContext.
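A rough host-side sketch of that plan, with hypothetical names throughout (`PLPrecision`, the `precision` field, and `pl_kernel_name` are illustrative, not existing API): the precision is fixed once at context creation, and kernel lookup later maps a projection name to its plf_/pld_ variant.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: precision chosen once, at context initialization. */
typedef enum { PL_PRECISION_SINGLE, PL_PRECISION_DOUBLE } PLPrecision;

typedef struct {
    PLPrecision precision;  /* would be set by pl_context_init */
} PLContext;

/* Hypothetical helper used by kernel lookup: builds the prefixed
 * kernel name, e.g. "mercator" -> "plf_mercator" or "pld_mercator". */
static void pl_kernel_name(const PLContext *ctx, const char *proj,
                           char *out, size_t outlen) {
    const char *prefix =
        (ctx->precision == PL_PRECISION_DOUBLE) ? "pld_" : "plf_";
    snprintf(out, outlen, "%s%s", prefix, proj);
}
```

Since only one set of kernels is compiled per PLContext, the lookup never has to disambiguate at call time; the prefix is fully determined by the context.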
To keep a unified C API, I am fine with requiring all input/output buffers to be double-precision; for my application, copying matters much less than raw computation speed, so I'm okay with sending in and reading out only double and double *. So the user would only need to think about the precision choice when initializing the context, and thus could easily switch between computation precisions without having to rework all the client code.
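The narrowing step that this implies is cheap and local: when the context is single-precision, the host would copy the caller's double buffer into a temporary float buffer before uploading it. A sketch (the helper name `pl_narrow_to_float` is made up for illustration):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: the public API always takes double*, and the
 * host narrows to float only when the context was initialized for
 * single-precision computation. Caller frees the result. */
static float *pl_narrow_to_float(const double *src, size_t n) {
    float *dst = malloc(n * sizeof *dst);
    if (!dst) return NULL;
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)src[i];  /* per-transfer copy cost */
    return dst;
}
```

The copy is paid once per transfer, which is consistent with the trade-off above: copying matters much less than raw computation speed.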
What do you think?