(Following up on the discussion in #4, cc @BobBane)
I avoided double-precision in the past because Apple's OpenCL compiler was very buggy with doubles (crashing, etc.). The compiler quality has improved considerably since 2011, and my other OpenCL project (not open source, and not geo-related) uses double-precision exclusively now.
I see a few paths forward here:
- Migrate everything to double-precision
- Maintain separate kernels for single-precision and double-precision
- Use typedefs / #defines with a single set of kernels
I'm hesitant to adopt the #define KFLOAT float approach because the algorithms themselves may need to differ between single and double-precision. That is, I use a lot of tricks to avoid round-off in the single-precision world that wouldn't be necessary with double-precision. Then there are the various tolerance levels and iteration counts that would need to differ between the two precisions, as well as the annoyance that literals must carry an "f" suffix in the single-precision world (i.e. 0.5 has to be written 0.5f).
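For reference, a minimal sketch of what the single-source approach would look like, compiled here as plain C rather than OpenCL C. The names `kfloat` and `KF` are hypothetical; the `KF` macro handles the literal-suffix annoyance via token pasting, though it does nothing about the algorithmic differences mentioned above:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical single-source scheme: one body of kernel code,
 * parameterized on precision by the preprocessor. */
#ifdef USE_DOUBLE
typedef double kfloat;
#define KF(x) x        /* double: literal used as-is */
#else
typedef float kfloat;
#define KF(x) x##f     /* float: paste an 'f' suffix onto the literal */
#endif

/* Toy "kernel": the same source text serves either precision. */
static kfloat half_angle(kfloat theta) {
    return KF(0.5) * theta;
}
```

Every literal has to be wrapped in the macro, which is exactly the kind of clutter that makes this approach unappealing once tolerances and iteration counts diverge too.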
The only real reason to maintain single-precision is for applications that (strongly) prefer speed over accuracy, or to support GPUs that lack double-precision support. With Magic Maps, the slowdown might be an acceptable trade-off, but I won't really know until I try it out. So I'm hesitant to rip out the single-precision code willy-nilly.
For now I'm leaning toward a two-kernel world, implementing (porting) double-precision versions of projections as needed. I imagine an extra argument to pl_context_init would specify the desired computation precision — which would later be passed to pl_find_kernel — and I think that a wrapper function (or several) around clSetKernelArg should mean we won't have to duplicate many host-side functions. I'm envisioning two separate folders kernel/float/ and kernel/double/, with all OpenCL functions prefixed with either plf_ or pld_ depending on the precision. Generally only one set of kernels would be compiled for a given PLContext.
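A rough host-side sketch of that plan, with hypothetical names throughout (`PLPrecision`, the `precision` field, and `pl_kernel_name` are illustrative, not existing API): the precision is fixed once at context creation, and kernel lookup later maps a projection name to its plf_/pld_ variant.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: precision chosen once, at context initialization. */
typedef enum { PL_PRECISION_SINGLE, PL_PRECISION_DOUBLE } PLPrecision;

typedef struct {
    PLPrecision precision;  /* would be set by pl_context_init */
} PLContext;

/* Hypothetical helper used by kernel lookup: builds the prefixed
 * kernel name, e.g. "mercator" -> "plf_mercator" or "pld_mercator". */
static void pl_kernel_name(const PLContext *ctx, const char *proj,
                           char *out, size_t outlen) {
    const char *prefix =
        (ctx->precision == PL_PRECISION_DOUBLE) ? "pld_" : "plf_";
    snprintf(out, outlen, "%s%s", prefix, proj);
}
```

Since only one set of kernels is compiled per PLContext, the lookup never has to disambiguate at call time; the prefix is fully determined by the context.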
To keep a unified C API, I am fine with requiring all input/output buffers to be double-precision; for my application, copying matters much less than raw computation speed, so I'm okay with sending in and reading out only double and double *. So the user would only need to think about the precision choice when initializing the context, and thus could easily switch between computation precisions without having to rework all the client code.
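The narrowing step that this implies is cheap and local: when the context is single-precision, the host would copy the caller's double buffer into a temporary float buffer before uploading it. A sketch (the helper name `pl_narrow_to_float` is made up for illustration):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: the public API always takes double*, and the
 * host narrows to float only when the context was initialized for
 * single-precision computation. Caller frees the result. */
static float *pl_narrow_to_float(const double *src, size_t n) {
    float *dst = malloc(n * sizeof *dst);
    if (!dst) return NULL;
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)src[i];  /* per-transfer copy cost */
    return dst;
}
```

The copy is paid once per transfer, which is consistent with the trade-off above: copying matters much less than raw computation speed.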
What do you think?