Ensure mutable objects are not hashable
Created by: paulromano
As discussed in #872, we are violating a basic contract in Python regarding equality and hashing for many of our classes. The problem is summarized in Python's documentation of __hash__
:
If a class defines mutable objects and implements an
__eq__()
method, it should not implement__hash__()
, since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
As a demonstration of this:
>>> c = openmc.Cell()
>>> s = {c}
>>> c.id = 10
>>> c in s
False
So, the proper solution to this is to either make our types immutable or to get rid of the __hash__
methods. A second question is whether we should get rid of the __eq__
methods too. In this pull request, I am proposing that we mostly get rid of __eq__
for the following reason: by default, when you override __eq__
, the type becomes unhashable and thus you can't use it in a set or as a dictionary key. I do find it useful to have sets and dictionaries of our types; in fact, we use this in our Plot
class right now:
p = openmc.Plot()
p.colors = {fuel: 'yellow', water: 'blue'}
where fuel
and water
are instances of Cell
(or Material
).
For most classes, removing __hash__
and __eq__
doesn't cause any problems. There are a few cases where we have to be careful though. In some cases, we have been relying on the fact that two mutable objects that are different can actually compare equal and hash to the same value. Namely, filters are cloned often in the openmc.mgxs module and this results in filters that are the same but with different IDs, which currently hash to the same value. If we removed that, then each of the copies of the filters will get written to XML separately and will require a separate search at runtime. So, filters are going to require a little more work; I actually haven't touched them in this PR.
The same thing sort of happens with meshes (we check at XML-writing time to make sure the same mesh doesn't get written twice), but in this case, we really are just trying to avoid multiple references to the same mesh resulting in it getting written multiple times. I made a small change so that we just check mesh IDs when writing XML so that the same mesh isn't written twice.
The only classes that I've substantially changed are Nuclide
, Element
, and Macroscopic
, which I've changed into immutable subclasses of str
. There's no good reason for these to be mutable and in fact they are really not providing much for us over a normal string. The only extra attribute we were using is the iso-in-lab stuff for nuclides; I've changed this to be an attribute of a material instead (I've also made a memory optimization on the Fortran side where the material % p0
array is not allocated unless it's actually being used). Eventually, we should just be able to get rid of these three classes altogether.
One related change I made in Material
is to have add_element
immediately add individual isotopes rather than deferring this to XML-writing time. This was done for performance reasons (namely, if a material with elements is cloned, then each of the clones will have to have their elements expanded at XML writing time). This simplifies the internal representation of materials and makes it more-or-less match the Fortran side.