CUDA Accelerated Computational Fluid Simulation by mazacar8

Summary

We are going to create a high-resolution dynamic fluid simulator with the aim of efficiently using the large amount of parallelism available in the CUDA-enabled NVIDIA GTX 780 GPU. The simulator is based on the Navier-Stokes equations, which describe the motion of viscous fluids. Another goal of this project is to compare its performance against a 2.2 GHz Intel Core i7 processor.

Background

We need a quantitative representation of a fluid’s state to simulate its behavior. Velocity and pressure are the most important quantities to represent, since they describe how a fluid moves and how it interacts with its surroundings. Thus we represent a fluid as a velocity vector field and a pressure field across the space the fluid occupies.

Grid Image

If the velocity and pressure are known for the initial time t = 0, then the state of the fluid over time can be described by the Navier-Stokes equations for incompressible flow.
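For reference, a common way to write the incompressible Navier-Stokes equations is given here. This is the standard textbook form (it matches the GPU Gems chapter listed under Resources); the symbol names are our assumption: u is the velocity field, p the pressure field, ρ the constant density, ν the kinematic viscosity, and F any external forces.

```latex
\begin{align}
\frac{\partial \mathbf{u}}{\partial t} &= -(\mathbf{u} \cdot \nabla)\mathbf{u} - \frac{1}{\rho}\nabla p + \nu \nabla^{2}\mathbf{u} + \mathbf{F} \\
\nabla \cdot \mathbf{u} &= 0
\end{align}
```

The four terms on the right-hand side of the first equation correspond to advection, pressure, diffusion, and external forces. The second equation is the incompressibility constraint.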

These equations are derived from applying Newton’s second law to fluid motion with the assumption that the stress in the fluid is the sum of a diffusing viscous term and a pressure term. The key to fluid simulation is to determine the current velocity and pressure field at each time step using these equations.

Our application is compute-intensive since we need to solve the Navier-Stokes equations for all points across the Cartesian grid at every time step. A time step consists of the following steps in code (ref: http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html):

    u = advect(u);
    u = diffuse(u);
    u = addForces(u);
    p = computePressure(u);
    u = subtractPressureGradient(u, p);

Thus, this problem will benefit from parallelism, since these computations can be carried out for all the Cartesian points in parallel. Also, a lot of temporary storage is needed, and the memory model of a GPU should help reduce access latency for the application.
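To illustrate why the work at each grid point is independent, here is a minimal sequential C++ sketch of one Jacobi relaxation sweep, the kind of iteration the referenced GPU Gems chapter uses for its diffusion and pressure solves. It is not necessarily what our implementation does: the Field type, the jacobiSweep name, and the alpha and rBeta parameters are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major scalar field on an n x n grid (illustrative only).
struct Field {
    std::size_t n;
    std::vector<float> v;
    explicit Field(std::size_t n_, float init = 0.0f) : n(n_), v(n_ * n_, init) {}
    float& at(std::size_t i, std::size_t j) { return v[i * n + j]; }
    float at(std::size_t i, std::size_t j) const { return v[i * n + j]; }
};

// One Jacobi sweep over the interior cells. Every output cell reads only the
// previous iterate `x` and the right-hand side `b`, never `out`, so all cells
// can be computed independently of each other. Boundary cells of `out` are
// left untouched in this sketch.
void jacobiSweep(const Field& x, const Field& b, Field& out,
                 float alpha, float rBeta) {
    assert(x.n == b.n && x.n == out.n);
    for (std::size_t i = 1; i + 1 < x.n; ++i) {
        for (std::size_t j = 1; j + 1 < x.n; ++j) {
            const float neighbors = x.at(i - 1, j) + x.at(i + 1, j) +
                                    x.at(i, j - 1) + x.at(i, j + 1);
            out.at(i, j) = (neighbors + alpha * b.at(i, j)) * rBeta;
        }
    }
}
```

Because no iteration of the two loops depends on another, a CUDA version could assign one (i, j) cell to each thread, with `x` and `out` kept as separate buffers.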

The Challenge

The primary challenge of our project is to deal with the dependencies between the different steps in the sequential algorithm. The sequential algorithm first applies advection, followed by diffusion and external forces, and then subtracts the pressure gradient to obtain the final result for that time step. Finding a way to eliminate dependencies between these steps, so as to maximize performance while maintaining correctness, is going to be one of the main focuses of our work.
A limitation of the numerical steps of the Navier-Stokes algorithm is that most of them cannot be done in place. This requires us to store temporary values at each step for each pixel and then access each of these values in the next step. Finding a way to manage the lack of in-place algorithms, or coming up with in-place algorithms, will have a major impact on the memory access latency and cache locality of our implementation.
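One common way to manage steps that cannot run in place is to keep two copies of each field and swap them after every step (often called ping-pong buffering). The sketch below is an illustration only, not our final design: PingPong, smooth, and runSteps are hypothetical names, and the 1D three-point average stands in for a real simulation step.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical double-buffered field: a step reads `front` and writes `back`.
struct PingPong {
    std::vector<float> front, back;
    explicit PingPong(std::size_t cells) : front(cells, 0.0f), back(cells, 0.0f) {}
    // After a step has filled `back`, make it the input of the next step.
    // Swapping two vectors exchanges their storage without copying elements.
    void swap() { std::swap(front, back); }
};

// Stand-in for an out-of-place step: a 1D three-point average that reads
// only `in`. The two end cells are copied through unchanged.
void smooth(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    if (in.empty()) return;
    out.front() = in.front();
    out.back() = in.back();
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
}

// Run `steps` passes using two buffers in total, regardless of step count.
void runSteps(PingPong& field, int steps) {
    for (int s = 0; s < steps; ++s) {
        smooth(field.front, field.back);
        field.swap();
    }
}
```

With this layout the temporary storage stays at one extra copy per field, and each step always reads from a buffer that no other cell is writing to.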
In terms of workload, instruction stream divergence at the boundaries of the fluid, or between parts of the fluid moving at different velocities, will affect the efficiency of our code through work imbalance between SIMD execution units. Since we'll be working over tens of thousands of pixels, there should be enough parallel work to utilize the full capability of the GPU.

Resources

The ghc41 machine has the NVIDIA GTX 780 GPU, and machines ghc[47-84] have the i7 processors that we need.
We plan to start from scratch without starter code.
We will be using CUDA and the OpenMP API.
Also, we will need to use libraries for graphics support in C++ and will require guidance for this.
Two articles we have so far referenced are:
- http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html
- http://cse.mathe.uni-jena.de/pub/diplom/fritzsche.pdf

Goals & Deliverables

Plan To Achieve

A parallel implementation that solves the Navier-Stokes differential equations and produces a significant speedup on the GPU in comparison to a CPU implementation.
A simple and clear visual representation of this fluid simulation in real time.
Graphs or timing analysis to show speedup from the sequential implementation.
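As a sketch of how the timing data behind these graphs could be collected, the helper below measures wall-clock time with std::chrono and computes speedup as baseline time divided by accelerated time. The names secondsFor and speedup are hypothetical, and this is only one possible way to set up the measurement.

```cpp
#include <cassert>
#include <chrono>

// Wall-clock seconds taken by a single call to `work` (hypothetical helper).
template <typename Work>
double secondsFor(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of an accelerated run relative to a baseline run.
double speedup(double baselineSeconds, double acceleratedSeconds) {
    assert(acceleratedSeconds > 0.0);
    return baselineSeconds / acceleratedSeconds;
}
```

When timing the CUDA version this way, the timed work would need to end with cudaDeviceSynchronize(), since kernel launches return before the kernel has finished.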

Update (Checkpoint): We still think we can achieve the above goals effectively.

Hope To Achieve

A parallel CPU implementation of the algorithm.
A detailed comparison between the CPU and GPU implementation with support of graphs and timing data.
Modify the fluid simulator to take into account external, real time forces.

Update (Checkpoint):

We are still unsure whether we will be able to implement a parallel version of the algorithm for the CPU. At this point we might focus our efforts on comparing the CUDA version to a sequential implementation. We feel that this will help us achieve the best results we can using the GPU and also help us test and improve our code more thoroughly.

Demo

For the demo, we hope to showcase a few examples of visual fluid simulation and explain the speedup obtained from using the GPU, supported by the timing data we collect.

Update (04/19/16): At our demo, we hope to show the following:

An animation that uses the CUDA accelerated version of our fluid simulator.
Graphs and statistics detailing the speedup achieved for different animations using the CUDA version as opposed to a sequential version.
Graphs comparing GPU and CPU performance (if we implement the CPU version).

Platform Choice

We will be implementing the simulator in C++ using the CUDA platform to work with the NVIDIA GTX 780 GPU. The ghc41 lab machine contains this GPU. For the CPU implementation, we will use the quad-core 2.2 GHz Intel Core i7 processors in ghc[47-84]. The OpenMP API will be used to support parallelism on the CPU.
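As a rough idea of what the OpenMP version of a per-cell step could look like, the sketch below parallelizes a hypothetical addForces loop with `#pragma omp parallel for`. The function name, the flat array layout, and the static schedule are assumptions made for this example. The pragma is guarded so the code still builds as sequential C++ when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical addForces step: u[c] += dt * f[c] for every grid cell c.
// Each iteration touches a different element, so iterations are independent
// and can be split across CPU threads.
void addForces(std::vector<float>& u, const std::vector<float>& f, float dt) {
    assert(u.size() == f.size());
    const long long n = static_cast<long long>(u.size());
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (long long c = 0; c < n; ++c) {
        const std::size_t idx = static_cast<std::size_t>(c);
        u[idx] += dt * f[idx];
    }
}
```

With GCC, OpenMP is enabled by compiling with -fopenmp. Without that flag the loop simply runs sequentially.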

These systems are a good choice for our fluid simulator since the GTX 780 is the fastest GPU available to us and is well suited to a compute-intensive project with a high scope for parallelism. The i7 processor is a good baseline for analyzing CPU performance and comparing it with the GPU's performance.

Updated Detailed Schedule (Checkpoint)

Date	Goals	Status	Lead
April 10, Sunday	Complete a sequential implementation of the Navier-Stokes equations that works on small grid dimensions.	Completed	Abhishek and Preetam
April 15, Friday	Set up a graphical representation of our implementation to visually see the fluid simulation.	Completed	Abhishek and Preetam
April 21, Thursday	Produce a simple parallelization of the sequential implementation for the NVIDIA GTX 780 GPU.	In Progress	Abhishek
April 24, Sunday	Explore ideas related to how more aspects of the project can be parallelized.	In Progress	Preetam
April 28, Thursday	Optimize the parallel implementation to achieve a higher speedup.	Not Started	TBA
May 1, Sunday	Parallelize for the i7 processor and compare results with the GPU.	Not Started	TBA
May 5, Thursday	Work on improving the graphical interface and fixing bugs in the project.	Not Started	TBA
May 9, Monday	Explore further possibilities and additions to the project.	Not Started	TBA

Work Completed So Far (Checkpoint)

We completed the sequential implementation of the Navier-Stokes equations. To accomplish this, we studied the algorithm and divided it into separate parts. For each part of the algorithm, we implemented a separate function. Applying these functions in order to the grid state performs one time step of the simulation.
For the graphical representation, we began by going through the starter code of Assignment 2 and used it to understand how graphics work in C++. We still have to work on effectively showing fluid simulation graphically.

Concerning Issues (Checkpoint)

The most concerning issue right now is that we are unsure whether we can effectively employ 3D graphics for our simulations.

Note: Since we are still looking for an effective way to show the fluid simulation, we do not have preliminary results at this time.