One of the real joys of people adopting blogs is the pure serendipity of discovering useful info – sometimes new and sometimes old. An old article on Steve Jenks's parallel computing blog gives a great summary of different approaches to parallel simulation, and of how an algorithmic optimization can sometimes make a problem run faster on fewer processors than a purely distributed version running on many more. The discussion of parallelizing an N-body simulation is directly related to some of the work we are doing here on integrating micro and macro simulations.

In an N-body problem, a number of bodies (n) interact by exerting a force (e.g. gravity or electromagnetism) on one another. During the simulation, the forces on each body (from all the other bodies) are computed for each time step, and the resultant acceleration and movement can then be calculated for that step. This means O(n^2) force calculations are needed per time step, with the simulation usually running for many time steps (typically hundreds or thousands). So for complex systems comprising a lot of bodies, the number of calculations quickly gets very large. In Jenks's example, 50,000 bodies require roughly 2.5 billion force calculations per time step.
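To make the cost concrete, here is a minimal direct-summation time step in Python (a sketch of ours, not code from Steve's article; gravity-only forces and a simple Euler integrator are illustrative choices):

```python
import math

G = 6.674e-11  # gravitational constant

def step(positions, masses, velocities, dt):
    """One time step of a direct-summation N-body simulation.

    For each body, accumulate the gravitational force from every other
    body: n*(n-1) pairwise evaluations, i.e. O(n^2) per time step.
    """
    n = len(positions)
    forces = [(0.0, 0.0, 0.0)] * n
    for i in range(n):
        fx = fy = fz = 0.0
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz
            r = math.sqrt(r2)
            f = G * masses[i] * masses[j] / r2
            fx += f * dx / r
            fy += f * dy / r
            fz += f * dz / r
        forces[i] = (fx, fy, fz)
    # integrate: a = F/m, then a naive Euler position/velocity update
    new_pos, new_vel = [], []
    for i in range(n):
        ax, ay, az = (c / masses[i] for c in forces[i])
        vx, vy, vz = velocities[i]
        vx, vy, vz = vx + ax * dt, vy + ay * dt, vz + az * dt
        x, y, z = positions[i]
        new_vel.append((vx, vy, vz))
        new_pos.append((x + vx * dt, y + vy * dt, z + vz * dt))
    return new_pos, new_vel

# cost check: 50,000 bodies -> ~2.5 billion pairwise evaluations per step
n = 50_000
print(n * (n - 1))  # 2,499,950,000
```

The double loop is the whole story: every extra body adds a force evaluation to every other body's inner loop, which is why the count blows up quadratically.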

These types of simulations map well to parallelization and can be implemented effectively on architectures such as CUDA (hence the great physics demos using PhysX). With CUDA, each thread can compute the force on a single body by stepping through all the other bodies in the simulation and accumulating the resulting interactions. Most articles on CUDA and N-body parallelization stop there; Steve's article, however, goes further and explains several approaches that can be faster than a pure-play parallel processing approach. Custom computing architectures such as the Japanese MDGRAPE petascale supercomputer at the RIKEN research institute are designed specifically for N-body type problems. MDGRAPE-3 was developed specifically for molecular dynamics simulations such as predicting protein structure, where the interacting forces between molecules dictate protein formation and folding. MDGRAPE-3 is of general interest for its use of parallel processing hardware coupled with standard PC processors for data loading and node management (not dissimilar to how we are using modern game GPU hardware for dedicated parallel processing and the CPU for loading data and user interface apps).
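The one-thread-per-body decomposition can be sketched in plain Python as a pool of independent work items (a CPU stand-in for a CUDA kernel launch; the function names are hypothetical, and a 2D force is used for brevity). The key property is that each body's force reads only the immutable positions from the previous step, so no work item depends on any other:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def force_on_body(i, positions, masses, G=6.674e-11):
    """The work one CUDA thread would do: loop over all other bodies
    and accumulate the force on body i. Only previous-step state is
    read, so concurrent work items never conflict."""
    fx = fy = 0.0
    xi, yi = positions[i]
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        dx, dy = xj - xi, yj - yi
        r2 = dx * dx + dy * dy
        r = math.sqrt(r2)
        f = G * masses[i] * masses[j] / r2
        fx += f * dx / r
        fy += f * dy / r
    return fx, fy

def all_forces(positions, masses):
    # launch one independent task per body -- the analogue of a CUDA
    # kernel launch with one thread per body
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: force_on_body(i, positions, masses),
                             range(len(positions))))
```

On a GPU the pool would be a grid of thousands of hardware threads; the structure of the per-body work item is the same.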

Another approach is to modify the underlying algorithm used to solve the problem. In many N-body systems, distant bodies exert little influence, so it is sometimes possible to treat a group of distant objects as a single entity with the corresponding aggregate mass and center of mass, and compute against that aggregate in the simulation. This approach is generally known as the Barnes-Hut optimization, with the simulation volume divided into cubic cells via an octree. A recursive algorithm builds the tree, and the force calculations can then be performed against the largest portions of the tree that meet a distance criterion. This can dramatically reduce the number of object-pair interactions that must be computed: in the example discussed by Steve Jenks, the Barnes-Hut optimization reduces the number of force calculations from 2.5 billion to fewer than 3 million. Steve also discusses the implementation challenges the optimization brings. The final results Steve and his students came up with showed that CUDA provides an order-of-magnitude improvement over OpenMP on a Core2Quad for the O(n^2) case, but the O(n log n) Barnes-Hut version is an order of magnitude faster again than CUDA.
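To illustrate the idea, here is a compact Barnes-Hut sketch in Python using a 2D quadtree rather than the 3D octree described above (the class names, the θ = 0.5 opening-angle criterion, and the test data are our own illustrative choices, not taken from Steve's implementation):

```python
import math
import random

class Node:
    """A square cell of the quadtree, tracking total mass and
    (mass-weighted) center of mass of everything inside it."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0   # mass-weighted coordinate sums
        self.body = None          # leaf payload: a single (x, y, m) tuple
        self.kids = None          # four children once subdivided

    def insert(self, b):
        x, y, m = b
        self.mass += m
        self.mx += x * m
        self.my += y * m
        if self.kids is None:
            if self.body is None and self.mass == m:
                self.body = b          # empty cell becomes a leaf
                return
            old, self.body = self.body, None
            self._split()
            self._child(old).insert(old)  # push old occupant down
        self._child(b).insert(b)

    def _split(self):
        h = self.half / 2
        self.kids = [Node(self.cx + dx * h, self.cy + dy * h, h)
                     for dx in (-1, 1) for dy in (-1, 1)]

    def _child(self, b):
        return self.kids[2 * (b[0] > self.cx) + (b[1] > self.cy)]

def force(node, b, theta=0.5, G=6.674e-11, stats=None):
    """Force on body b from the subtree at node. If the cell is far
    enough away (cell size / distance < theta), use its aggregate mass
    at its center of mass instead of recursing into it."""
    if node.mass == 0.0 or node.body is b:
        return 0.0, 0.0
    cmx, cmy = node.mx / node.mass, node.my / node.mass
    dx, dy = cmx - b[0], cmy - b[1]
    d = math.hypot(dx, dy)
    if node.kids is None or (2 * node.half) / d < theta:
        if stats is not None:
            stats[0] += 1              # count one aggregate evaluation
        f = G * node.mass * b[2] / (d * d)
        return f * dx / d, f * dy / d
    fx = fy = 0.0
    for k in node.kids:
        kx, ky = force(k, b, theta, G, stats)
        fx += kx
        fy += ky
    return fx, fy

# compare evaluation counts against brute force for random bodies
random.seed(1)
n = 2000
bodies = [(random.random(), random.random(), 1.0) for _ in range(n)]
root = Node(0.5, 0.5, 0.5)
for b in bodies:
    root.insert(b)
stats = [0]
for b in bodies:
    force(root, b, stats=stats)
print(stats[0], "evaluations vs", n * (n - 1), "for brute force")
```

Setting theta to 0 disables the aggregation and recovers the exact direct sum, which is a handy correctness check; larger theta trades accuracy for fewer evaluations.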

Why is this of interest to us? When we bridge micro and macro simulations we need an effective means of structuring the simulation. Our micro simulations comprise thousands (and eventually millions) of interacting objects that need to feed into macro-level emergent behaviors and simulation structures. The existing work on N-body simulations provides a framework we can hang micro-agent and macro-agent simulation optimization around. The developments highlighted by Steve Jenks indicate that while CUDA and similar technologies give us powerful processing capabilities, careful attention to optimizing the algorithms that structure how our agents and entities interact offers the potential to significantly improve simulation performance.
