Diffstat (limited to 'README.md')
-rw-r--r-- README.md 15
1 files changed, 8 insertions, 7 deletions
diff --git a/README.md b/README.md
index 7c1d41f..1e7601e 100644
--- a/README.md
+++ b/README.md
@@ -2,14 +2,15 @@
This is a work-in-progress port of my simple fractal-gen software to OpenCL.
That software was an experiment of mine to generate mandelbrot (and some of
-its cousin) fractals on CPU. This is my attempt at porting that software to
-OpenCL so it can be used on a multitude of computation devices, including GPUs.
+its cousin) fractals on the CPU. This is my attempt at porting that software to
+OpenCL so it can be used on a multitude of computation devices, chiefly GPUs.
I had started to port it to CUDA in October 2016, but changed to CL because
of its portability and open nature.
-Software is still in early days and needs more CL kernels for such fractals as
-tricorn, burning ship, and julia sets to name a few.
+The software is working with basic mandelbrot fractals, but needs more CL
+kernels for such fractals as tricorn, burning ship, and julia sets to name a
+few.
Below is a simple demo image produced with the software. It is a simple
mandelbrot fractal using only 75 iterations, at 768 square pixels. The
@@ -21,9 +22,9 @@ Such a small, low-detail image will not provide a case for using GPU rather
than CPU, but once you start upping the image size and detail, a modern GPU
will provide endless benefit over a modern CPU.
-For example, using the CPU-based fractal-gen, a 10240x10240 pixel image at
+For example, using the CPU-based fractal-gen, a 10240x10240 pixel image at a
10000 iteration cutout per pixel, the image will complete in about 2200 seconds
when running on all 32 threads of a dual-Xeon E5-2670 setup. Compare this to
the runtime of this software on a (much cheaper) NVIDIA GTX 1070; 1 second.
-
-This is using single-precision floats on the GPU.
+This is using single-precision floats on the GPU, but changing to `double` data
+types only slows the 1070 to about 8 seconds.