darshan randomly fails to detect usernames on cori/edison
This ticket was split off from issue #201 (closed) so that we could troubleshoot and track the issue separately for Cori/Edison systems at NERSC. See the original ticket for more details.
In summary, Darshan randomly fails to detect the username associated with a job on these systems. For certain jobs, it instead generates a log file named with the user's EUID. This issue occurs with both the 2.x and 3.x versions of Darshan. Darshan attempts to determine the username using the following three methods, in order:
1.) cuserid()
2.) getenv("LOGNAME")
3.) geteuid() (which returns the user's numeric EUID rather than a string username)
The cuserid() option has typically been disabled on Cray systems because it caused strange errors in the past, although this has not been retested in roughly five years. As a result, the only way to get a string username is for the environment to define LOGNAME as the username.
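The fallback order listed above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not Darshan's actual source. The function name `lookup_username()` and the `ENABLE_CUSERID` macro are hypothetical. `cuserid()` is compiled out by default to mirror how it is disabled on Cray systems, so in practice only LOGNAME stands between the job and a numeric EUID.

```c
/* Hypothetical sketch of the username fallback order; not Darshan's code. */
#define _GNU_SOURCE /* glibc: exposes the cuserid() prototype */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes a user identifier into buf (of size len).
 * Returns 1 if a string username was found, or 0 if it had to fall
 * back to the numeric effective UID. */
int lookup_username(char *buf, size_t len)
{
    const char *name = NULL;

#ifdef ENABLE_CUSERID
    /* 1.) cuserid(): compiled out unless the hypothetical ENABLE_CUSERID
     * macro is defined, mirroring how it is disabled on Cray systems.
     * With a NULL argument it returns a pointer to a static buffer. */
    const char *cu = cuserid(NULL);
    if (cu != NULL && cu[0] != '\0')
        name = cu;
#endif

    /* 2.) LOGNAME from the environment; absent if it was not propagated */
    if (name == NULL) {
        const char *env = getenv("LOGNAME");
        if (env != NULL && env[0] != '\0')
            name = env;
    }

    if (name != NULL) {
        snprintf(buf, len, "%s", name);
        return 1;
    }

    /* 3.) geteuid(): yields only a numeric ID, not a string username */
    snprintf(buf, len, "%lu", (unsigned long)geteuid());
    return 0;
}
```

With LOGNAME unset, the function returns 0 and `buf` holds only the numeric EUID. That value is what ends up in the log file name.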
It turns out that if a user explicitly passes environment variables to srun using the --export switch, much of the environment normally propagated to the compute nodes is wiped, including LOGNAME. Users who use the --export switch will therefore get log file names containing their EUIDs rather than their usernames.