Run_output.txt 7.97 KB
Newer Older
Richard Berger's avatar
Richard Berger committed
1
"Higher level section"_Run_head.html - "LAMMPS WWW Site"_lws - "LAMMPS
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
Documentation"_ld - "LAMMPS Commands"_lc :c

:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Commands_all.html)

:line

Screen and logfile output :h3

As LAMMPS reads an input script, it prints information to both the
screen and a log file about significant actions it takes to setup a
simulation.  When the simulation is ready to begin, LAMMPS performs
various initializations, and prints info about the run it is about to
perform, including the amount of memory (in MBytes per processor) that
the simulation requires.  It also prints details of the initial
thermodynamic state of the system.  During the run itself,
thermodynamic information is printed periodically, every few
timesteps.  When the run concludes, LAMMPS prints the final
thermodynamic state and a total run time for the simulation.  It also
appends statistics about the CPU time and storage requirements for the
simulation.  An example set of statistics is shown here:

Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre

Performance: 18.436 ns/day  1.302 hours/ns  106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre

MPI task timings breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 1.9808     | 2.0134     | 2.0318     |   1.4 | 71.60
Bond    | 0.0021894  | 0.0060319  | 0.010058   |   4.7 |  0.21
Kspace  | 0.3207     | 0.3366     | 0.36616    |   3.1 | 11.97
Neigh   | 0.28411    | 0.28464    | 0.28516    |   0.1 | 10.12
Comm    | 0.075732   | 0.077018   | 0.07883    |   0.4 |  2.74
Output  | 0.00030518 | 0.00042665 | 0.00078821 |   1.0 |  0.02
Modify  | 0.086606   | 0.086631   | 0.086668   |   0.0 |  3.08
Other   |            | 0.007178   |            |       |  0.26 :pre

Nlocal:    501 ave 508 max 490 min
Histogram: 1 0 0 0 0 0 1 1 0 1
Nghost:    6586.25 ave 6628 max 6548 min
Histogram: 1 0 1 0 0 0 1 0 0 1
Neighs:    177007 ave 180562 max 170212 min
Histogram: 1 0 0 0 0 0 0 1 1 1 :pre

Total # of neighbors = 708028
Ave neighs/atom = 353.307
Ave special neighs/atom = 2.34032
Neighbor list builds = 26
Dangerous builds = 0 :pre

:line

The first section provides a global loop timing summary. The {loop
time} is the total wall-clock time for the simulation to run.  The
{Performance} line is provided for convenience to help predict how
long it will take to run a desired physical simulation.  The {CPU use}
line provides the CPU utilization per MPI task; it should be close to
100% times the number of OpenMP threads (or 1 of not using OpenMP).
Lower numbers correspond to delays due to file I/O or insufficient
thread utilization.

:line

The {MPI task} section gives the breakdown of the CPU run time (in
seconds) into major categories:

{Pair} = non-bonded force computations
{Bond} = bonded interactions: bonds, angles, dihedrals, impropers
{Kspace} = long-range interactions: Ewald, PPPM, MSM
{Neigh} = neighbor list construction
{Comm} = inter-processor communication of atoms and their properties
{Output} = output of thermodynamic info and dump files
{Modify} = fixes and computes invoked by fixes
{Other} = all the remaining time :ul

For each category, there is a breakdown of the least, average and most
amount of wall time any processor spent on this category of
computation.  The "%varavg" is the percentage by which the max or min
varies from the average.  This is an indication of load imbalance.  A
percentage close to 0 is perfect load balance.  A large percentage is
imbalance.  The final "%total" column is the percentage of the total
loop time is spent in this category.

When using the "timer full"_timer.html setting, an additional column
is added that also prints the CPU utilization in percent. In addition,
when using {timer full} and the "package omp"_package.html command are
active, a similar timing summary of time spent in threaded regions to
monitor thread utilization and load balance is provided. A new {Thread
timings} section is also added, which lists the time spent in reducing
the per-thread data elements to the storage for non-threaded
computation. These thread timings are measured for the first MPI rank
only and and thus, because the breakdown for MPI tasks can change from
MPI rank to MPI rank, this breakdown can be very different for
individual ranks. Here is an example output for this section:

Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.5127     | 0.5147     | 0.5167     |   0.3 | 75.18
Bond    | 0.0043139  | 0.0046779  | 0.0050418  |   0.5 |  0.68
Kspace  | 0.070572   | 0.074541   | 0.07851    |   1.5 | 10.89
Neigh   | 0.084778   | 0.086969   | 0.089161   |   0.7 | 12.70
Reduce  | 0.0036485  | 0.003737   | 0.0038254  |   0.1 |  0.55 :pre

:line

The third section above lists the number of owned atoms (Nlocal),
ghost atoms (Nghost), and pair-wise neighbors stored per processor.
The max and min values give the spread of these values across
processors with a 10-bin histogram showing the distribution. The total
number of histogram counts is equal to the number of processors.

:line

The last section gives aggregate statistics (across all processors)
for pair-wise neighbors and special neighbors that LAMMPS keeps track
of (see the "special_bonds"_special_bonds.html command).  The number
of times neighbor lists were rebuilt is tallied, as is the number of
potentially {dangerous} rebuilds.  If atom movement triggered neighbor
list rebuilding (see the "neigh_modify"_neigh_modify.html command),
then dangerous reneighborings are those that were triggered on the
first timestep atom movement was checked for.  If this count is
non-zero you may wish to reduce the delay factor to insure no force
interactions are missed by atoms moving beyond the neighbor skin
distance before a rebuild takes place.

:line

If an energy minimization was performed via the
"minimize"_minimize.html command, additional information is printed,
e.g.

Minimization stats:
  Stopping criterion = linesearch alpha is zero
  Energy initial, next-to-last, final =
         -6372.3765206     -8328.46998942     -8328.46998942
  Force two-norm initial, final = 1059.36 5.36874
  Force max component initial, final = 58.6026 1.46872
  Final line search alpha, max atom move = 2.7842e-10 4.0892e-10
  Iterations, force evaluations = 701 1516 :pre

The first line prints the criterion that determined minimization was
converged. The next line lists the initial and final energy, as well
as the energy on the next-to-last iteration.  The next 2 lines give a
measure of the gradient of the energy (force on all atoms).  The
2-norm is the "length" of this 3N-component force vector; the largest
component (x, y, or z) of force (infinity-norm) is also given.  Then
information is provided about the line search and statistics on how
many iterations and force-evaluations the minimizer required.
Multiple force evaluations are typically done at each iteration to
perform a 1d line minimization in the search direction.  See the
"minimize"_minimize.html doc page for more details.

:line

If a "kspace_style"_kspace_style.html long-range Coulombics solver
that performs FFTs was used during the run (PPPM, Ewald), then
additional information is printed, e.g.

FFT time (% of Kspce) = 0.200313 (8.34477)
FFT Gflps 3d 1d-only = 2.31074 9.19989 :pre

The first line is the time spent doing 3d FFTs (several per timestep)
and the fraction it represents of the total KSpace time (listed
above).  Each 3d FFT requires computation (3 sets of 1d FFTs) and
communication (transposes).  The total flops performed is 5Nlog_2(N),
where N is the number of points in the 3d grid.  The FFTs are timed
with and without the communication and a Gflop rate is computed.  The
3d rate is with communication; the 1d rate is without (just the 1d
FFTs).  Thus you can estimate what fraction of your FFT time was spent
in communication, roughly 75% in the example above.