- Project Description
- Coarse Grain Approach
- Fine Grain Approach
- Algorithm Inputs and Outputs
- Results
- More Information
This project originated in a High Performance Computing class at Iowa State University, with the main objective of demonstrating parallelism on the HPC Clusters at ISU. The focal point was the parallelization of an algorithm for the Discrete Fourier Transform authored by Tim Burke. Rather than solely replicating the algorithm, the project's contribution lay in developing both coarse-grained and fine-grained techniques, enhancing the existing algorithm's ability to transform randomly generated signal data into real and imaginary components.
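For reference, the Discrete Fourier Transform maps N signal samples x_n to N complex coefficients, whose real and imaginary parts are the two outputs the algorithm produces:

```math
X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i kn/N}
    = \underbrace{\sum_{n=0}^{N-1} x_n \cos\!\left(\frac{2\pi kn}{N}\right)}_{\text{real part}}
    \; - \; i \underbrace{\sum_{n=0}^{N-1} x_n \sin\!\left(\frac{2\pi kn}{N}\right)}_{\text{imaginary part}}
```

Computing all N coefficients directly costs O(N²) operations, which is why parallelizing the loops pays off at the input sizes used below.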
In this section of the project, I implemented a coarse-grained parallelism approach to enhance the computation of the Discrete Fourier Transform (DFT). The strategy involved dividing the DFT calculation into independent segments and distributing them across multiple threads for simultaneous processing. Using OpenMP directives in the C code, I parallelized the primary loop responsible for calculating DFT coefficients with #pragma omp parallel for. This directive allocated iterations of the loop across available threads, with each thread managing a distinct subset of computations. Leveraging coarse-grained parallelism, each thread independently calculated segments of the DFT coefficients, reducing the need for communication or synchronization between threads during computation and resulting in enhanced performance for large-scale DFT computations.
In this section of the project, I employed a fine-grained parallelism technique to enhance the Discrete Fourier Transform (DFT) computation. Unlike the coarse-grained method, this approach dissects the calculation into smaller, concurrent tasks within each iteration of the DFT loop. By leveraging OpenMP directives to parallelize the computation of both real and imaginary parts independently, I subdivided the inner loops into smaller tasks executed by separate threads. This finer granularity maximized resource utilization and improved performance, particularly beneficial for handling large datasets or complex computations.
As mentioned previously, the goal was to run these algorithms over randomly generated signal data, such as this generated set:
Plotted over time, this looks like:
After a successful run of the code, I was able to extract the real and imaginary components from the signal and plot them:
In addition, the spectral amplitude was also calculated and plotted:
In the context of this code, the spectral amplitude represents the magnitude of individual frequency components in the Discrete Fourier Transform (DFT) output. It is calculated as the square root of the sum of squares of the real and imaginary parts of each DFT coefficient. Essentially, higher spectral amplitudes indicate stronger contributions from corresponding frequencies in the original signal.
For each run on the cluster, a standard input of 10,000 randomly generated signal data points was passed into the algorithm.
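An input of that shape could be produced with a generator like the following; this is a hypothetical sketch, since the original project's generator (range, distribution, seeding) is not shown here.

```c
#include <stdlib.h>

/* Fill x with n pseudo-random samples, uniform in [0, 1].
   Illustrative only; the project's actual generator may differ. */
void generate_signal(double *x, int n, unsigned int seed)
{
    srand(seed);
    for (int i = 0; i < n; i++)
        x[i] = (double)rand() / (double)RAND_MAX;
}
```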
Before applying parallelism, I first tested the code in serial on the cluster. Here we see a run time of more than 8 seconds.
In implementing the coarse-grained approach, I progressively scaled up the number of threads:
- 4 -> 8 -> 16 -> 32

Here are those results. On 32 threads, the run time was reduced to approximately 0.446 seconds.
For the fine-grained approach, using the same scale-up, the run time was reduced to 0.477 seconds.
- For a comprehensive tutorial on the Discrete Fourier Transform (DFT), check out Steve Brunton's tutorial.
- For more in-depth reading, consider: