Moving CMFD to Python through C API - Replication and performance tests
Created by: shikhar413
This PR pushes the CMFD code I have rewritten in Python through the C API, with the purpose of showing performance results and demonstrating that it replicates the code currently written in Fortran. The purpose of this PR is not to merge this branch into the develop branch, but to provide a way to compare the Fortran and Python code side by side and to identify any design changes that need to be made to openmc/openmc/cmfd.py. Once that file is reviewed, I will create a separate PR that removes the Fortran CMFD source code completely and updates the test suite to call CMFD through the C API instead of through Fortran.
While I will be describing at length some of the design details and performance results, here is a summary of the main observations from this implementation:
- The CMFD code written in Python now relies on all of the CMFD input parameters being defined directly through Python via the `CMFDRun` class instead of through a `cmfd.xml` file. It should work with both OpenMP-based and MPI-based parallelism.
- CMFD equations are now solved through the scipy sparse matrix solver, which results in faster performance for medium-sized problems but a noticeable performance hit for larger problems when compared to the simple Fortran-based red-black Gauss-Seidel solver. In the future, it may be worthwhile to include a C++-based matrix solver to potentially speed up CMFD calculations, since this portion of the code is where the majority of the CMFD time is spent.
- The majority of variables used to populate the CMFD matrices are numpy arrays. To the best of my ability, I defined the functions that populate or modify these arrays in a vectorized manner, but this results in code that runs faster yet is not easily decipherable to someone unfamiliar with numpy vectorization or CMFD. To get around this, I have included both vectorized and un-vectorized (for-loop-based) implementations of functions such as `calc_fission_source` for easier readability.
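To illustrate the tradeoff, here is a toy sketch (not the actual `calc_fission_source` implementation; the array names are hypothetical) comparing a loop-based and a vectorized computation of a fission-source-like quantity:

```python
import numpy as np

def fission_source_loop(nfissxs, flux):
    """Loop-based version: accumulate the nu-fission rate cell by cell."""
    nx, ng = flux.shape
    source = np.zeros(nx)
    for i in range(nx):
        for g in range(ng):
            source[i] += nfissxs[i, g] * flux[i, g]
    return source

def fission_source_vectorized(nfissxs, flux):
    """Vectorized version: the same result in a single numpy expression."""
    return np.sum(nfissxs * flux, axis=1)

rng = np.random.default_rng(0)
nfissxs = rng.random((50, 2))   # hypothetical nu-fission cross sections
flux = rng.random((50, 2))      # hypothetical multigroup flux

assert np.allclose(fission_source_loop(nfissxs, flux),
                   fission_source_vectorized(nfissxs, flux))
```

For a reduction this simple the two read about equally well; the readability gap shows up in functions like `compute_dtilde`, where the vectorized form needs boundary masks and intermediate arrays.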
General C API-based CMFD algorithm
The core of the CMFD algorithm can be found in the `run()` function of the `CMFDRun` class. To run CMFD the traditional way, `cmfd_run` should be set to true in `settings.xml` and a corresponding `cmfd.xml` with CMFD parameters should be defined. To run CMFD through the C API, `cmfd_run` should be false, and all CMFD parameters are defined in a separate Python file with the following template:
```python
import openmc
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Define CMFD mesh
cmfd_mesh = openmc.CMFDMesh()
# Define CMFD mesh parameters
...

# Initialize CMFDRun object
cmfd_run = openmc.CMFDRun()

# Set all runtime parameters (cmfd_mesh, tolerances, tally_resets, etc.)
# through CMFDRun instance
cmfd_run.cmfd_mesh = cmfd_mesh
...

# Run CMFD
cmfd_run.run()
```
Examples of these different run configurations can be found in the openmc/tests/cmfd_testing directory. It should be noted that error checking for CMFD parameters happens when setting the instance variables of the `CMFDRun` class. In the final PR, the CMFD class in openmc/cmfd.py will be removed completely, but it is included for now so that CMFD xml files can still be created through the Python API. All other parts of the code that will be removed in the following PR are marked with
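As a sketch of how setter-time error checking can work, here is an illustrative property-based pattern; the class and attribute names below are made up for the example and are not the exact `CMFDRun` attributes:

```python
class CMFDRunSketch:
    """Illustrative sketch of setter-time validation, not the real CMFDRun."""

    def __init__(self):
        self._cmfd_ktol = 1.e-3  # hypothetical default tolerance

    @property
    def cmfd_ktol(self):
        return self._cmfd_ktol

    @cmfd_ktol.setter
    def cmfd_ktol(self, value):
        # Invalid parameters are rejected immediately, at assignment time,
        # rather than partway through a simulation
        if not isinstance(value, float) or value <= 0.0:
            raise ValueError('CMFD k-effective tolerance must be a '
                             'positive float')
        self._cmfd_ktol = value

run = CMFDRunSketch()
run.cmfd_ktol = 1.e-4   # accepted
try:
    run.cmfd_ktol = -1.0  # rejected before the run ever starts
except ValueError as err:
    print(err)
```

The advantage of validating in the setter is that a bad parameter fails fast in the driver script, before any Monte Carlo batches have been run.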
Numpy vectorization tradeoffs
In order to populate the CMFD matrices that are passed into the matrix solver, intermediate multidimensional numpy arrays that define the multigroup cross sections and diffusion parameters are created. These numpy arrays also need to be populated based on the CMFD tallies, and they are defined with bulk vectorized operations wherever possible. For functions such as `neutron_balance`, the vectorized definitions simplify the functions so that they no longer rely on for loops, but other functions are much more difficult to understand because the numpy arrays being defined do not lend themselves to straightforward vectorized definitions. In `compute_dtilde`, the dhat and dtilde arrays are populated by first defining values at each array boundary and then defining all inner values, since these quantities are defined differently depending on whether cell boundaries exist. While I tried to create some intermediate numpy arrays to clarify the vectorized calculations, I also tried not to create too many of them in order to limit memory costs, since each intermediate definition creates an additional copy of a numpy array in memory. In many cases these vectorized functions become difficult to understand and follow, so I included a non-vectorized form of these definitions that relies on for loops and looks very similar to the current Fortran definitions. CMFD calls that use vectorized numpy definitions can be toggled with the optional `vectorized` input argument (set to `True` by default) when calling `CMFDRun.run()`. As shown in the next section, vectorized runs perform faster than non-vectorized runs, but are much less readable.
`calc_fission_source` and `build_matrices` are two more examples of such functions. In `calc_fission_source`, the one-dimensional phi vector that results from solving the CMFD equations has to be mapped back, in a vectorized manner, to a multidimensional cmfd_flux array that accounts for the non-accelerated regions (`phi` contains only the regions that are defined as "accelerated" in the CMFD coremap). `build_matrices` builds the CMFD matrices diagonal by diagonal in a vectorized manner as CSR (compressed sparse row) matrices. I am very open to any suggestions to make these vectorized functions more readable.
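To picture the phi-to-flux mapping, here is a minimal sketch with made-up array names and a made-up sentinel value: the coremap numbers the accelerated cells consecutively and marks non-accelerated cells with a sentinel, so the scatter becomes a single masked fancy-indexing assignment:

```python
import numpy as np

_CMFD_NOACCEL = -1  # hypothetical sentinel for non-accelerated regions

# Hypothetical 3x3 coremap: accelerated cells are numbered 0..4 in order
coremap = np.array([[_CMFD_NOACCEL, 0, _CMFD_NOACCEL],
                    [1,             2, 3],
                    [_CMFD_NOACCEL, 4, _CMFD_NOACCEL]])

phi = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # solution on accelerated cells only

# Vectorized scatter: one boolean mask, one fancy-indexing assignment
accel = coremap != _CMFD_NOACCEL
cmfd_flux = np.zeros(coremap.shape)
cmfd_flux[accel] = phi[coremap[accel]]
print(cmfd_flux)
```

Non-accelerated cells keep a flux of zero, and no Python-level loop over mesh cells is needed.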
I have created test cases for one-group problems to see the effect of problem size on runtime. This is done for 1D, 2D, and 3D problems, with 100,000 particles, 15 inactive cycles, 20 total cycles, and CMFD feedback turned on at batch 5. Vacuum boundary conditions are imposed on the significant dimensions. The scripts can be found in openmc/examples/cmfd_testing/. All of these tests were conducted on one process, since CMFD runs on a single process. While these tests also varied the number of threads, there was no significant change in CMFD runtimes, since multithreading is not supported for CMFD calculations. I am currently gathering simulation results by running CMFD on the 2D BEAVRS problem to illustrate that two-group CMFD also works, but these results are still pending. Here are the plots that compare performance when running CMFD in Fortran and Python, where the Python tests are further split between vectorized numpy definitions and traditional for loops:
It should be noted that all times are plotted on a log scale. From these plots, the following conclusions can be drawn:
- Solving CMFD matrices takes the bulk of the time in CMFD for more difficult problems
- The Python implementation is faster than Fortran for smaller problems, but the default scipy matrix solver becomes much slower for larger problems, making the overall Python implementation slower there. A C++-based matrix solver coupled with CMFD could potentially speed up calculations, since solving the CMFD matrices constitutes the majority of the time spent in CMFD.
- A vectorized approach in Numpy is faster than a non-vectorized for loop-based implementation, but the code is messier and more difficult to follow.
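As a rough sketch of the solver step being timed here, the scipy-based approach amounts to repeated sparse solves inside a power iteration. The matrices below are a generic one-dimensional, one-group diffusion-like system assembled diagonal by diagonal, not the actual CMFD matrices:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 20
# Hypothetical loss matrix M (tridiagonal) and production matrix F (diagonal),
# assembled diagonal by diagonal as CSR, as build_matrices does for CMFD
M = sp.diags([-1.0, 2.1, -1.0], offsets=[-1, 0, 1], shape=(n, n), format='csr')
F = sp.diags([1.0], offsets=[0], shape=(n, n), format='csr')

# Power iteration: solve M phi = (1/k) F phi until k converges
phi = np.ones(n)
k = 1.0  # in the real code, the Monte Carlo k-effective seeds this guess
for _ in range(500):
    b = F @ phi / k
    phi_new = spla.spsolve(M, b)
    k_new = k * np.sum(F @ phi_new) / np.sum(F @ phi)
    phi, k_prev, k = phi_new, k, k_new
    if abs(k - k_prev) < 1.e-8:
        break
phi /= np.linalg.norm(phi)  # final flux vector is normalized
print(k)
```

Each iteration requires a full sparse solve, which is why the time spent in this step dominates for larger problems.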
Miscellaneous Implementation Details
- With the conversion of MPI code to C++, I am running into segfault issues when running 2D BEAVRS with a large number of particles. This issue is avoided by dynamically allocating the relevant variable in eigenvalue.cpp instead of statically allocating it; I will address this properly in the next PR once I figure out how best to handle it. I decided to include a temporary fix here just so that all test cases would run.
- Currently, all CMFD functions are contained within a single CMFDRun class, for ease of implementation instead of having to worry about passing a large set of variables from one subclass to another. Do let me know if there is opposition to such an approach.
- Indexing of the CMFD coremap is switched from 1 as non-accelerated and 2 as accelerated in Fortran to 0 as non-accelerated and 1 as accelerated in Python.
- With all matrix equations solved using Python's scipy sparse solver, the Gauss-Seidel tolerance variables (rtoli, atoli) are no longer needed.
- Instead of exposing the function `count_bank_sites` through the C API, which relies on MPI calls that could result in erratic behavior through the C API, this function was recreated in Python, resulting in a minor duplication of code.
- The variable `cmfd_src` is broadcast to all processes in Fortran, but this is not necessary and is not done on the Python side.
- MPI calls are made through Python using mpi4py. If mpi4py is not installed on the local machine, the `have_mpi` variable defined at the beginning of openmc/cmfd.py is set to False and MPI calls are not made.
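The guarded import might look something like this; it is a sketch of the pattern, not necessarily the exact code in openmc/cmfd.py, and the `broadcast` helper is hypothetical:

```python
# Fall back gracefully when mpi4py is not installed
try:
    from mpi4py import MPI
    have_mpi = True
except ImportError:
    have_mpi = False

def broadcast(data, comm=None):
    """Broadcast data from rank 0 if MPI is available, else pass through."""
    if have_mpi and comm is not None:
        return comm.bcast(data, root=0)
    return data

print(broadcast([1, 2, 3]))  # without MPI, data passes through unchanged
```

Guarding every MPI call behind `have_mpi` keeps serial runs working on machines without an MPI installation.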
- Since there is currently no way to control the output columns through the C API, the columns related to CMFD output cannot be appended, so all CMFD variables that need to be output at the end of each batch are displayed on separate lines.
- When `write_matrices` is set to true in the `CMFDRun` class, the production and loss matrices as well as the CMFD flux vector are written both to file and as .npz/.npy files. These variables can be loaded in Python through numpy.load().
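For reference, the round trip for such a dump might look like the following; the file names and matrix values are hypothetical, and scipy.sparse handles the sparse-matrix side while numpy handles the flux vector:

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Hypothetical stand-ins for a CMFD loss matrix and the CMFD flux vector
loss = sp.csr_matrix(np.array([[2.0, -1.0], [-1.0, 2.0]]))
flux = np.array([0.5, 0.5])

tmpdir = tempfile.mkdtemp()
loss_path = os.path.join(tmpdir, 'loss_matrix.npz')
flux_path = os.path.join(tmpdir, 'flux.npy')

sp.save_npz(loss_path, loss)   # sparse matrix -> .npz
np.save(flux_path, flux)       # dense vector  -> .npy

loss_loaded = sp.load_npz(loss_path)  # sparse matrices reload via scipy
flux_loaded = np.load(flux_path)      # dense arrays reload via numpy.load
```

Note that .npz files holding scipy sparse matrices are most conveniently reloaded with scipy.sparse.load_npz, while plain .npy arrays load with numpy.load.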
- Currently, numpy divide warnings are suppressed with `np.seterr(divide='ignore', invalid='ignore')`. These warnings appear as a result of zero values in the divisor vector; while I do handle this case by setting the output to zero wherever the divisor is zero, numpy.divide still carries out the division, hence the warning.
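An alternative to suppressing the warnings globally, for what it's worth, is numpy's `where`/`out` arguments to `np.divide`, which skip the flagged divisions entirely; a minimal sketch:

```python
import numpy as np

num = np.array([1.0, 2.0, 0.0, 4.0])
den = np.array([2.0, 0.0, 0.0, 8.0])

# Divide only where den != 0; elsewhere keep the 0.0 from `out`.
# No divide-by-zero warning is emitted because masked elements never compute.
result = np.divide(num, den, out=np.zeros_like(num), where=(den != 0))
print(result)  # [0.5 0.  0.  0.5]
```

This keeps the "zero where the divisor is zero" behavior without touching the global error state.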
- Since the Fortran version of CMFD uses the k-effective value at each generation as an initial guess when entering the power-iteration step of solving the CMFD equations, I created a `keff_temp` C API function that always returns the k-effective at each generation instead of the average k-effective. While this does not necessarily impact the results, since the final CMFD flux vector is normalized, there will be minor differences in the flux vector. I wanted to make this change here for exact replication, but in the final PR I will use the keff function defined through the C API.
- All CMFD-related errors throw an OpenMCError. This could be changed to a CMFDError type instead.
- While the Fortran version of CMFD supports running OpenMC in multi-group mode, this will result in an error when running the Python version, since supporting such an implementation would require exposing additional variables, which seems unnecessary for such a specific use case.
- The current Python implementation does not support plotting CMFD meshes, saving CMFD variables to statepoints, or running CMFD from a statepoint; this is all part of future work.