darshan_common_val_counter segfaults when common_val_count >= DARSHAN_COMMON_VAL_MAX_RUNTIME_COUNT
I have an application that segfaults inside darshan_common_val_counter(), apparently as a result of *common_val_count reaching DARSHAN_COMMON_VAL_MAX_RUNTIME_COUNT. In that case the found pointer is never assigned, so by the time the following code is reached at the bottom of darshan_common_val_counter():
    /* update common access counters as we go */
    DARSHAN_COMMON_VAL_COUNTER_INC(common_val_p, common_cnt_p,
        found->val, found->freq, 1);
found is still uninitialized, and dereferencing it causes the segfault.
I'm not clear on how the condition *common_val_count == DARSHAN_COMMON_VAL_MAX_RUNTIME_COUNT is reached, but when it is, found is definitely dereferenced before it is assigned.
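To illustrate the suspected failure mode, here is a minimal sketch. It is not Darshan's code: it assumes the function looks up a value, inserts it only while the table is below the limit, and then uses found unconditionally. The names MAX_RUNTIME_COUNT, val_counter, val_table and count_value are hypothetical, and a flat array stands in for whatever lookup structure Darshan actually uses. The point is the third case, where the value is new and the table is full, so found must be initialized to NULL and checked before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DARSHAN_COMMON_VAL_MAX_RUNTIME_COUNT. */
#define MAX_RUNTIME_COUNT 4

/* Hypothetical per-value counter: a value and how often it was seen. */
struct val_counter {
    int64_t val;
    int freq;
};

/* Hypothetical fixed-capacity table of tracked values. */
struct val_table {
    struct val_counter entries[MAX_RUNTIME_COUNT];
    int count;
};

/* Returns 1 if val was counted, 0 if it was dropped because the table is full. */
static int count_value(struct val_table *t, int64_t val)
{
    struct val_counter *found = NULL; /* initialized, so the full-table case is detectable */
    int i;

    assert(t != NULL);

    /* Case 1: the value is already tracked. */
    for (i = 0; i < t->count; i++) {
        if (t->entries[i].val == val) {
            found = &t->entries[i];
            found->freq++;
            break;
        }
    }

    /* Case 2: the value is new and the table is still below the limit. */
    if (found == NULL && t->count < MAX_RUNTIME_COUNT) {
        found = &t->entries[t->count++];
        found->val = val;
        found->freq = 1;
    }

    /* Case 3: the value is new and the table is full. found was never
     * assigned, so skip the update instead of dereferencing it. */
    if (found == NULL)
        return 0;

    /* The real function would update the common access counters here,
     * reading found->val and found->freq. */
    return 1;
}
```

Whether silently dropping the value in the third case is the right behavior for Darshan is a separate question; the sketch only shows that the update has to be guarded.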
I can provide my application source if it would help. This bug happened using the 3.1.1 tag on both Cori/KNL and Cori/Haswell at NERSC.