Optimizations to improve performance for reaction rate tallies
Created by: paulromano
The changes in this PR originate from @salcedop's internship at ANL last summer. The main focus of his work was on optimizing the active batch performance of OpenMC, primarily for depletion simulations where reaction rates for each nuclide in fuel regions need to be determined. I've taken a few branches he had, cherry-picked relevant commits, and cleaned them up a bit.
Despite the fact that we often discuss MC reactor simulations as being dominated by cross section lookups, for problems with depletion tallies, a lot of time gets spent tallying; profiling that we had done as part of the ECP project revealed that tallying was a major bottleneck. This branch implements three optimizations/improvements for tallies:
When nuclides are specified in a tally, it's necessary to determine at tally-time whether a given nuclide is in the current material. The way this is currently done is using a linear search over the `Material % nuclide` array, which as you can imagine is quite inefficient if there are hundreds of nuclides present. In this branch, we've added a "direct address table", which is basically a hash table where the keys are integers that can be used as indices into an array directly, that maps an index in the global `nuclides` array into the corresponding position in the `Material % nuclide` array. So, if `nuclides(5)` corresponds to U235 and U235 is the 10th nuclide listed in `materials(4)` then we'd have `materials(4) % mat_nuclide_index(5) = 10`. Empty slots in the table are just stored as zeros.
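To make the lookup concrete, here is a minimal sketch of the direct address table next to the linear search it replaces. It is written in Python purely for illustration (the actual code is Fortran); `build_mat_nuclide_index`, `find_linear`, and the sample nuclide list are hypothetical names and data, and slot 0 is left unused to mimic 1-based indexing.

```python
# Illustrative sketch only, not the OpenMC implementation.
NOT_PRESENT = 0  # empty slots are stored as zero, as described above


def build_mat_nuclide_index(material_nuclides, n_nuclides_total):
    """Map global nuclide index -> 1-based position in the material (0 if absent)."""
    table = [NOT_PRESENT] * (n_nuclides_total + 1)  # slot 0 unused (1-based indices)
    for position, global_index in enumerate(material_nuclides, start=1):
        table[global_index] = position
    return table


def find_linear(material_nuclides, global_index):
    """The search being replaced: O(number of nuclides in the material)."""
    for position, candidate in enumerate(material_nuclides, start=1):
        if candidate == global_index:
            return position
    return NOT_PRESENT


# Hypothetical material in which global nuclide 5 (U235 in the example above)
# is the 10th nuclide listed.
material_nuclides = [12, 7, 31, 2, 18, 44, 9, 27, 3, 5, 21]
mat_nuclide_index = build_mat_nuclide_index(material_nuclides, n_nuclides_total=50)

position_direct = mat_nuclide_index[5]                  # single array access
position_linear = find_linear(material_nuclides, 5)     # scans the list
```

The table costs one integer per global nuclide per material, which is the price paid for turning the per-tally search into a single array access.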
For each nuclide, `reaction_index` was a dictionary that mapped MT values to indices in the `nuclide % reactions` array. This is used at tally-time to calculate reaction rates for reactions that are not pre-calculated (like fission). In this branch, we've changed reaction_index from a hash table to a direct address table.
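For the MT lookup, the change amounts to swapping a dictionary for an array indexed by MT number. A rough Python sketch under stated assumptions: the table size `MAX_MT` and the short reaction list are made up for the example, and the real structure lives in the Fortran nuclide type.

```python
# Illustrative sketch only, not the OpenMC implementation.
MAX_MT = 1000  # assumed upper bound on MT numbers, chosen for this example

# Hypothetical reactions present for one nuclide, in storage order:
# MT=2 elastic, MT=16 (n,2n), MT=18 fission, MT=102 (n,gamma)
reaction_mts = [2, 16, 18, 102]

# Before: hash table from MT value to 1-based position in the reactions array
reaction_index_dict = {mt: i for i, mt in enumerate(reaction_mts, start=1)}

# After: direct address table, where the MT value itself is the array index
reaction_index_table = [0] * (MAX_MT + 1)
for i, mt in enumerate(reaction_mts, start=1):
    reaction_index_table[mt] = i

i_gamma = reaction_index_table[102]    # position of (n,gamma) in the reactions array
i_missing = reaction_index_table[103]  # 0 means this nuclide has no MT=103 reaction
```

Most slots stay empty because MT numbers are sparse, so the table trades a little memory per nuclide for a lookup with no hashing.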
One of my ideas was to pre-calculate (cache) cross sections for depletion reactions like (n,2n), (n,gamma), etc. rather than calculating them at tally-time. This can be beneficial because if a single track crosses multiple fuel materials containing the same nuclide, there's no reason we should be recalculating the same cross section. This branch now calculates all depletion reaction cross sections in cross_section.F90 (but only for active batches) and uses the calculated values at tally-time.
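The benefit of caching can be sketched as follows. This is a simplified Python illustration, not the code in cross_section.F90: `DepletionXSCache`, `evaluate_xs`, the set of MT values, and the materials are all hypothetical, and the cache here is filled lazily and keyed on the particle energy, which is an assumption about how reuse could work rather than a description of the branch.

```python
# Illustrative sketch only, not the OpenMC implementation.
DEPLETION_MTS = (16, 17, 37, 102, 103, 107)  # assumed set of depletion reactions


def evaluate_xs(nuclide, mt, energy):
    """Hypothetical stand-in for an expensive cross section evaluation."""
    evaluate_xs.calls += 1
    return 1.0 / (1.0 + nuclide + mt) / energy ** 0.5


evaluate_xs.calls = 0


class DepletionXSCache:
    """Per-nuclide cache that is valid only for the energy it was filled at."""

    def __init__(self, n_nuclides):
        self.energy = [None] * n_nuclides
        self.xs = [None] * n_nuclides

    def get(self, nuclide, energy):
        if self.energy[nuclide] != energy:  # stale or empty: recompute once
            self.xs[nuclide] = {
                mt: evaluate_xs(nuclide, mt, energy) for mt in DEPLETION_MTS
            }
            self.energy[nuclide] = energy
        return self.xs[nuclide]


# One track at a fixed energy crossing three fuel materials that share nuclides.
cache = DepletionXSCache(n_nuclides=3)
energy = 1.0e6
fuel_materials = [[0, 1], [0, 2], [0, 1, 2]]
for material in fuel_materials:
    for nuclide in material:
        xs = cache.get(nuclide, energy)  # tally would multiply by flux and density

calls_with_cache = evaluate_xs.calls
calls_without_cache = sum(len(m) for m in fuel_materials) * len(DEPLETION_MTS)
```

In this toy case each of the three nuclides is evaluated once per energy, no matter how many materials the track crosses.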
Altogether, these changes make for a substantial improvement in active batch performance for depletion problems. @cjosey sent me a reference pin-cell problem a while ago that has 10 rings in fuel and something like 400 nuclides in each fuel region, i.e., it's a worst-case scenario for depletion tallies. His measurements (as well as mine) showed that the develop branch had about a 10x slowdown in active batches relative to inactive batches on this problem. I just repeated these measurements on a node with two Intel Xeon Platinum 8176 processors and obtained the following performance results on this benchmark problem:
develop:
Calculation Rate (inactive) = 12951.0 neutrons/second
Calculation Rate (active) = 974.623 neutrons/second
This branch:
Calculation Rate (inactive) = 12651.1 neutrons/second
Calculation Rate (active) = 2635.39 neutrons/second
We see that there is a very slight drop in inactive batch performance, on the order of 2% here. However, the active batch performance increases by about 170% (a factor of 2.7)! Again, this problem is a worst case, so the effect won't be as dramatic for other problems. One of our ECP benchmark problems is a full core SMR model with depleted fuel materials; for this problem, I get the following results:
develop:
Calculation Rate (inactive) = 68794.9 neutrons/second
Calculation Rate (active) = 15149.1 neutrons/second
This branch:
Calculation Rate (inactive) = 66865.7 neutrons/second
Calculation Rate (active) = 24057.6 neutrons/second
In this model, we see a 3% drop in inactive batch performance but a 59% increase in active batch performance -- not too shabby.
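For anyone who wants to recompute the relative changes from the reported calculation rates, a short Python snippet is enough. It assumes that in each pair of results the first set of rates is from develop and the second is from this branch; the dictionary layout and the `percent_change` helper are just for this example.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# (develop, this branch) calculation rates in neutrons/second
rates = {
    "pin cell": {"inactive": (12951.0, 12651.1), "active": (974.623, 2635.39)},
    "SMR": {"inactive": (68794.9, 66865.7), "active": (15149.1, 24057.6)},
}

changes = {
    problem: {batch: percent_change(*pair) for batch, pair in batches.items()}
    for problem, batches in rates.items()
}

for problem, batches in changes.items():
    for batch, change in batches.items():
        print(f"{problem:8s} {batch:8s} {change:+.1f}%")
```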