Moving CMFD to Python through C API: replication and performance tests
Created by: shikhar413
This PR pushes the CMFD code I have rewritten in Python through the C API, with the purpose of showing performance results and demonstrating that it replicates the code currently written in Fortran. The purpose of this PR is not to merge this branch into the develop branch, but to provide a way to compare the Fortran and Python code side-by-side and to figure out any design changes that need to be made to `openmc/openmc/cmfd.py`. Once `openmc/openmc/cmfd.py` is reviewed, I will create a separate PR that removes the Fortran CMFD source code completely and updates the test suite to call CMFD through the C API instead of through Fortran.
While I will be describing at length some of the design details and performance results, here is a summary of the main observations from this implementation:
- The CMFD code written in Python now relies on all of the CMFD input parameters being defined directly through Python via the `CMFDRun` class instead of through a `cmfd.xml` file. It should work with both OpenMP-based and MPI-based parallelism.
- CMFD equations are now solved through the scipy sparse matrix solver, which results in faster performance for medium-sized problems but a noticeable performance hit for larger problems when compared to a simple Fortran-based red-black Gauss-Seidel solver. In the future, it may be instructive to include a C++-based matrix solver to potentially speed up CMFD calculations, since this portion of the code is where the majority of the time in CMFD is spent.
- The majority of variables used to populate the CMFD matrices are numpy arrays. To the best of my ability, I tried to define functions that populate/modify these arrays in a vectorized manner, but this form of implementation results in complicated code that runs faster yet is not easily decipherable to someone unfamiliar with numpy vectorization or CMFD. To get around this, I have included both vectorized and unvectorized (for-loop based) implementations of functions such as `compute_dhat`, `compute_dtilde`, `build_matrices`, and `calc_fission_source` for easier readability.
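As a rough illustration of the scipy-based solve mentioned above, here is a minimal sketch of a power iteration using `scipy.sparse` matrices and `spsolve`. The operators below are simple diffusion-like stand-ins, not the actual CMFD loss/production matrices:

```python
# Hedged sketch: solving a CMFD-style eigenvalue problem with scipy's sparse
# machinery instead of a hand-written red-black Gauss-Seidel solver.
# The matrices here are illustrative stand-ins, not the real CMFD operators.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
# "Loss" matrix: a simple 1-D tridiagonal diffusion-like operator
loss = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format='csr')
# "Production" matrix: a diagonal fission-source operator
prod = sp.diags([1.0], [0], shape=(n, n), format='csr')

# Power iteration: repeatedly solve loss * phi_new = (1/k) * prod * phi
phi = np.ones(n)
k = 1.0
for _ in range(200):
    b = prod @ phi / k
    phi_new = spla.spsolve(loss, b)
    k = k * phi_new.sum() / phi.sum()   # simple k update (illustrative)
    phi = phi_new / np.linalg.norm(phi_new)
```

At convergence, `k` approaches the dominant eigenvalue of the production/loss system and `phi` the corresponding flux shape.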
General C API-based CMFD algorithm
The core of the CMFD algorithm can be found in the `run()` function of the `CMFDRun` class. To run CMFD the traditional way, `cmfd_run` should be set to true in `settings.xml` and a corresponding `cmfd.xml` with CMFD parameters should be defined. To run CMFD through the C API, `cmfd_run` should be false, and all CMFD parameters are defined in a separate Python file with the following template:
```python
import openmc
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Define CMFD mesh
cmfd_mesh = openmc.CMFDMesh()
# Define CMFD mesh parameters
...

# Initialize CMFDRun object
cmfd_run = openmc.CMFDRun()

# Set all runtime parameters (cmfd_mesh, tolerances, tally_resets, etc.)
# through the CMFDRun instance
cmfd_run.cmfd_mesh = cmfd_mesh
...

# Run CMFD
cmfd_run.run()
```
Examples of these different run configurations can be found in the `openmc/tests/cmfd_testing` directory. It should be noted that error checking for CMFD parameters happens when setting the instance variables of the `CMFDRun` class. In the final PR, the CMFD class in `openmc/cmfd.py` will be removed completely, but it is included for now so that CMFD xml files can still be created through the Python API. All other parts of the code that will be removed in the following PR are marked with `TODO remove`.
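The set-at-assignment error checking mentioned above can be sketched with Python's property mechanism. This is a minimal illustration only; the class and parameter names here are hypothetical, not the real `CMFDRun` attributes:

```python
# Hedged sketch of validating a CMFD parameter at assignment time via a
# property setter, so bad input fails immediately rather than inside run().
# `norm` is an illustrative parameter name, not necessarily the real one.
class CMFDRunSketch:
    def __init__(self):
        self._norm = 1.0

    @property
    def norm(self):
        return self._norm

    @norm.setter
    def norm(self, value):
        # Reject invalid input before any simulation work is done
        if not isinstance(value, (int, float)):
            raise TypeError('norm must be a number')
        if value <= 0:
            raise ValueError('norm must be positive')
        self._norm = value
```

With this pattern, `cmfd_run.norm = -1` raises a `ValueError` at the assignment itself.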
Numpy vectorization tradeoffs
In order to populate the CMFD matrices that will be passed into the matrix solver, intermediate multidimensional numpy arrays that define the multigroup cross sections and diffusion parameters are created. These numpy arrays also need to be populated based on the CMFD tallies, and they are defined with bulk vectorized operations wherever possible. For functions such as `neutron_balance`, vectorized definitions simplify the function so that it no longer relies on for loops, but other functions are much more difficult to understand because the numpy arrays being defined do not lend themselves to straightforward vectorized definitions. In `compute_dhat` and `compute_dtilde`, the dhat and dtilde arrays are populated first by defining values at each array boundary, and then by defining all inner values, since these quantities are defined differently where cell boundaries do and don't exist. While I tried to create some intermediate numpy arrays to clarify these vectorized calculations, I also tried to balance this against memory cost, since each intermediate array definition creates an additional copy of a numpy array in memory.

In many cases, these vectorized functions become difficult to understand/follow, so I included a non-vectorized form of these definitions that relies on for loops and looks very similar to the current Fortran definitions. CMFD calls that utilize vectorized numpy definitions can be toggled on using the optional `vectorized=True` input argument (set to true by default) when calling `CMFDRun.run()`. As shown in the next section, vectorized runs perform faster than non-vectorized runs, but are much less readable. `calc_fission_source` and `build_matrices` are two more examples of such functions. In `calc_fission_source`, the one-dimensional phi vector that results from solving the CMFD equations has to be mapped back to a multidimensional cmfd_flux vector that accounts for the non-accelerated regions in a vectorized manner (`phi` contains only the regions that are defined as "accelerated" in the CMFD coremap). `build_matrices` builds the CMFD matrices diagonal by diagonal in a vectorized manner as csr (compressed sparse row) matrices. I am very open to any suggestions to make these vectorized functions more readable.
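As a small illustration of the phi-to-flux mapping described for `calc_fission_source`, here is a sketch (with made-up array shapes and values) of the vectorized boolean-mask scatter next to its loop-based equivalent:

```python
# Hedged sketch: scattering the 1-D solver output phi back into a full mesh
# array using a coremap mask (1 = accelerated, 0 = non-accelerated).
# The coremap and phi values below are illustrative only.
import numpy as np

coremap = np.array([[0, 1, 1],
                    [1, 1, 1],
                    [0, 1, 0]])          # 6 accelerated cells
phi = np.arange(1.0, 7.0)                # one solver value per accelerated cell

# Vectorized scatter: boolean indexing assigns phi in C (row-major) order
cmfd_flux = np.zeros(coremap.shape)
cmfd_flux[coremap == 1] = phi

# Equivalent loop-based form, closer in style to the Fortran code
flux_loop = np.zeros(coremap.shape)
idx = 0
for i in range(coremap.shape[0]):
    for j in range(coremap.shape[1]):
        if coremap[i, j] == 1:
            flux_loop[i, j] = phi[idx]
            idx += 1

assert np.array_equal(cmfd_flux, flux_loop)
```

The vectorized form is one line, but only if the reader already knows that boolean-mask assignment follows row-major ordering, which is exactly the readability trade-off discussed above.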
Performance results
I have created test cases for 1-group problems to see the effect of problem size on runtime. This is done for 1D, 2D, and 3D problems, with 100,000 particles, 15 inactive cycles, 20 total cycles, and CMFD feedback turned on at batch 5. Vacuum boundary conditions are imposed on significant dimensions. The scripts can be found in `openmc/examples/cmfd_testing/`. All of these tests are conducted on one process, since CMFD runs on a single process. While these tests also analyze the effect of the number of threads on performance, there was no significant change in CMFD runtimes with this approach, since multithreading is not supported for CMFD calculations. I am currently gathering simulation results by running CMFD on the 2D BEAVRS problem to illustrate that 2-group CMFD also works, but these results are still pending. Here are the plots that compare performance when running CMFD using Fortran and Python, where the Python tests are further distinguished between using vectorized numpy definitions and using traditional for loops:

*These plots compare the total simulation run time for the 1D, 2D, and 3D cases respectively*

*These plots compare the total time in CMFD for the 1D, 2D, and 3D cases respectively*

*These plots compare the time building CMFD matrices for the 1D, 2D, and 3D cases respectively*

*These plots compare the time solving CMFD matrices for the 1D, 2D, and 3D cases respectively*

*Finally, runtimes for `calc_fission_source`, as well as the cumulative time in `compute_dhat` and `compute_dtilde`, are compared in Python when running in both vectorized and non-vectorized manner*
It should be noted that all times are plotted on log scale. From these plots, the following conclusions can be drawn:

- Solving the CMFD matrices takes the bulk of the time in CMFD for more difficult problems.
- The Python implementation is faster than Fortran for smaller problems, but the default Python matrix solver runs much slower for larger problems, making the current Python implementation slower overall in that regime. Working on a C++-based matrix solver that couples with CMFD could potentially speed up calculations, since solving CMFD matrices constitutes the majority of the time spent in CMFD.
- A vectorized approach in numpy is faster than a non-vectorized for-loop-based implementation, but the code is messier and more difficult to follow.
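To make the vectorized-vs-loop comparison concrete, here is a generic (deliberately non-CMFD) sketch of how such a timing comparison can be run with `timeit`; the functions and array sizes are made up for illustration:

```python
# Hedged sketch: timing a vectorized numpy definition against its
# for-loop equivalent. The operation (sum of squares) is illustrative only.
import timeit
import numpy as np

x = np.random.rand(100_000)

def loop_sum_sq(a):
    # Python-level loop, similar in spirit to the Fortran-style definitions
    total = 0.0
    for v in a:
        total += v * v
    return total

def vec_sum_sq(a):
    # Vectorized equivalent: one call into compiled numpy code
    return float(np.dot(a, a))

t_loop = timeit.timeit(lambda: loop_sum_sq(x), number=10)
t_vec = timeit.timeit(lambda: vec_sum_sq(x), number=10)
# The vectorized form is typically orders of magnitude faster
```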
Miscellaneous Implementation Details
- With the conversion of MPI code to C++, I am running into segfault issues when running 2D BEAVRS with a large number of particles. This issue is avoided by dynamically allocating the variable `temp_sites` in `eigenvalue.cpp` instead of statically allocating it, and I will address this in the next PR once I figure out how best to handle it. I decided to include a temporary fix here just so that all test cases would run.
- Currently, all CMFD functions are contained within a single `CMFDRun` class, for ease of implementation instead of having to worry about passing a large set of variables from one subclass to another. Do let me know if there is opposition to such an approach.
- Indexing of the CMFD coremap is switched from 1 as non-accelerated and 2 as accelerated in Fortran to 0 as non-accelerated and 1 as accelerated in Python.
- With all matrix equations solved using Python's scipy sparse solver, the Gauss-Seidel tolerance variables (rtoli, atoli) are no longer needed.
- Instead of exposing the function `count_bank_sites` through the C API, which relies on MPI calls that could result in erratic behavior through the C API, this function was recreated in Python, resulting in a minor duplication of code.
- The variable `cmfd_src` is broadcast to all processes in Fortran, but this is not necessary and is not done on the Python side.
- MPI calls are made through Python using mpi4py. If mpi4py is not installed on the local machine, the `have_mpi` variable defined at the beginning of `openmc/cmfd.py` is set to False and MPI calls are not made.
- Since there is currently no way to control the output columns through the C API, there is no way to append the columns related to CMFD output, so all CMFD variables that need to be output at the end of each batch are displayed on separate lines.
- When `write_matrices` is set to true in the `CMFDRun` class, the production and loss matrices as well as the CMFD flux vector are written both to file and as .npz/.npy files. These variables can be loaded in Python through `numpy.load()`.
- Currently, numpy divide warnings are suppressed with `np.seterr(divide='ignore', invalid='ignore')`. These warnings appear as a result of having zero values in the divisor vector; while I do handle this case by setting the output to zero whenever the divisor is zero, `numpy.divide` will still carry out the divide, and hence the warning is thrown.
- Since the Fortran version of CMFD uses the k-effective value at each generation as an initial guess when entering the power iteration step of solving the CMFD equations, I created a `keff_temp` C API function that always returns the k-effective at each generation instead of the average k-effective. While this does not necessarily impact the results, since the final CMFD flux vector is normalized, there will be minor differences in the flux vector. I wanted to address this change here for exact replication, but in the final PR I will be using the keff function defined through the C API.
- All CMFD-related errors throw an OpenMCError. This could instead be changed to a CMFDError type.
- While the Fortran version of CMFD supports running OpenMC in MG mode, this will result in an error when running the Python version, since supporting such an implementation would require exposing additional variables, which seems unnecessary for such a specific use case.
- The current Python implementation does not support plotting CMFD meshes, saving CMFD variables to statepoint, or running CMFD from statepoint; this is all part of future work.
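On the numpy divide point above, one possible alternative to `np.seterr` is `np.divide`'s `out`/`where` arguments, which skip the zero-divisor elements entirely so no warning is raised. A minimal sketch with illustrative values:

```python
# Hedged sketch: zero-divisor handling without suppressing warnings globally.
# Elements where the divisor is zero are never computed; they keep the value
# pre-filled in `out` (zero here), so no divide warning is emitted.
import numpy as np

num = np.array([1.0, 2.0, 3.0])
den = np.array([2.0, 0.0, 4.0])

result = np.divide(num, den, out=np.zeros_like(num), where=(den != 0))
# result -> [0.5, 0.0, 0.75], with no divide warning
```

This keeps the "output is zero where the divisor is zero" behavior described above while leaving numpy's error state untouched for the rest of the code.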