LuaJIT memory profiler
Starting from version 2.7.1, Tarantool
has a built‑in module called misc.memprof
that implements a LuaJIT memory
profiler (which we will just call the profiler in this section). The profiler provides
a memory allocation report that helps analyze Lua code and find the places
that put the most pressure on the Lua garbage collector (GC).
Working with the profiler
Usage of the profiler involves two steps:
- Collecting a binary profile of allocations,
reallocations, and deallocations in memory related to Lua
(further, binary memory profile or binary profile for short).
- Parsing the collected binary profile to get
a human-readable profiling report.
Collecting binary profile
To collect a binary profile for a particular part of the Lua code,
you need to place this part between two misc.memprof
functions,
namely, misc.memprof.start()
and misc.memprof.stop()
, and then execute
the code under Tarantool.
Below is a chunk of Lua code named test.lua
to illustrate this.
1-- Prevent allocations on traces.
2jit.off()
3local str, err = misc.memprof.start("memprof_new.bin")
4-- Lua doesn't create a new frame to call string.rep, and all allocations
5-- are attributed not to the append() function but to the parent scope.
6local function append(str, rep)
7 return string.rep(str, rep)
8end
9
10local t = {}
11for i = 1, 1e4 do
12 -- table.insert is the built-in function and all corresponding
13 -- allocations are reported in the scope of the main chunk.
14 table.insert(t,
15 append('q', i)
16 )
17end
18local stp, err = misc.memprof.stop()
The Lua code for starting the profiler – as in line 3 in the test.lua example above – is:
local str, err = misc.memprof.start(FILENAME)
where FILENAME
is the name of the binary file where profiling events are written.
If the operation fails,
for example if it is not possible to open a file for writing or if the profiler is already running,
misc.memprof.start()
returns nil
as the first result,
an error-message string as the second result,
and a system-dependent error code number as the third result.
If the operation succeeds, misc.memprof.start()
returns true
.
The Lua code for stopping the profiler – as in line 18 in the test.lua example above – is:
local stp, err = misc.memprof.stop()
If the operation fails,
for example if there is an error when the file descriptor is being closed
or if there is a failure during reporting,
misc.memprof.stop()
returns nil
as the first result,
an error-message string as the second result,
and a system-dependent error code number as the third result.
If the operation succeeds, misc.memprof.stop()
returns true
.
To generate the file with memory profile in binary format
(in the test.lua code example above
the file name is memprof_new.bin
), execute the code under Tarantool:
$ tarantool test.lua
Tarantool collects the allocation events in memprof_new.bin
, puts
the file in its working directory, and closes
the session.
The test.lua code example above also illustrates the memory
allocation logic in some cases that are important to understand for
reading and analyzing
a profiling report:
- Line 2: It is recommended to switch the JIT compilation off by calling
jit.off()
before the profiler start. Refer to the following
note about jitoff for more details.
- Lines 6-8: Tail call optimization doesn’t create a new call frame, so all
allocations inside the function called via the
CALLT/CALLMT
bytecodes
are attributed to the function’s caller. See also the comments preceding these lines.
- Lines 14-16: Usually the information about allocations inside Lua built‑ins
is not really
useful for developers. That’s why if a Lua built‑in function is called from
a Lua function, the profiler attributes all allocations to the Lua function.
Otherwise, this event is attributed to a C function.
See also the comments preceding these lines.
Parsing binary profile and generating profiling report
After getting the memory profile in binary format, the next step is
to parse it to get a human-readable profiling report. You can do this
via Tarantool by using the following command
(mind the hyphen -
before the filename):
$ tarantool -e 'require("memprof")(arg)' - memprof_new.bin
where memprof_new.bin
is the binary profile
generated earlier by tarantool test.lua
.
(Warning: there is a slight behavior change here, the tarantool -e ...
command was slightly different in Tarantool versions prior to Tarantool 2.8.1.)
Tarantool generates a profiling report and displays it on the console before closing
the session:
ALLOCATIONS
@test.lua:14: 10000 events +50240518 bytes -0 bytes
@test.lua:9: 1 events +32 bytes -0 bytes
@test.lua:8: 1 events +20 bytes -0 bytes
@test.lua:13: 1 events +24 bytes -0 bytes
REALLOCATIONS
@test.lua:13: 13 events +262216 bytes -131160 bytes
Overrides:
@test.lua:13
@test.lua:14: 11 events +49536 bytes -24768 bytes
Overrides:
@test.lua:14
INTERNAL
INTERNAL: 3 events +8448 bytes -16896 bytes
Overrides:
@test.lua:14
DEALLOCATIONS
INTERNAL: 1723 events +0 bytes -483515 bytes
@test.lua:14: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@test.lua:14 holds 50248326 bytes: 10010 allocs, 10 frees
@test.lua:13 holds 131080 bytes: 14 allocs, 13 frees
INTERNAL holds 8448 bytes: 3 allocs, 3 frees
@test.lua:9 holds 32 bytes: 1 allocs, 0 frees
@test.lua:8 holds 20 bytes: 1 allocs, 0 frees
Note
On macOS, a report will be different for the same chunk of code because
Tarantool and LuaJIT are built with the GC64 mode enabled for macOS.
Let’s examine the report structure. A report has four sections:
- ALLOCATIONS
- REALLOCATIONS
- DEALLOCATIONS
- HEAP SUMMARY (described later in The heap summary and the –leak-only option)
Each section contains event records that are sorted from the most frequent
to the least frequent.
An event record has the following format:
@<filename>:<line_number>: <number_of_events> events +<allocated> bytes -<freed> bytes
- <filename>—a name of the file containing Lua code.
- <line_number>—the line number where the event is detected.
- <number_of_events>—a number of events for this code line.
- +<allocated> bytes—amount of memory allocated during all the events on this line.
- -<freed> bytes—amount of memory freed during all the events on this line.
The Overrides
label shows what allocation has been overridden.
See the test.lua chunk above
with the explanation in the comments for some examples.
The INTERNAL
label indicates that this event is caused by internal LuaJIT
structures.
Note
Important note regarding the INTERNAL
label and the recommendation
of switching the JIT compilation off (jit.off()
): this version of the
profiler doesn’t support verbose reporting for allocations on
traces.
If memory allocations are made on a trace,
the profiler can’t associate the allocations with the part of Lua code
that generated the trace. In this case, the profiler labels such allocations
as INTERNAL
.
So, if the JIT compilation is on,
new traces will be generated and there will be a mixture of events labeled
INTERNAL
in the profiling report: some of them are really caused by
internal LuaJIT structures, but some of them are caused by allocations on
traces.
If you want to have a more definite report without JIT compiler allocations,
call jit.off() before starting the profiling.
And if you want to completely exclude the trace allocations from the report,
remove also the old traces by additionally calling jit.flush()
after
jit.off()
.
Nevertheless, switching the JIT compilation off before the profiling is not
“a must”. It is rather a recommendation, and in some cases,
for example in a production environment, you may need to keep JIT compilation
on to see the full picture of all the memory allocations.
In this case, the majority of the INTERNAL
events
are most probably caused by traces.
As for investigating the Lua code with the help of profiling reports,
it is always code-dependent and there can’t be hundred per cent definite
recommendations in this regard. Nevertheless, you can see some of the things
in the Profiling report analysis example later.
Also, below is the FAQ section with the questions that
most probably can arise while using profiler.
FAQ
In this section, some profiler-related points are discussed in
a Q&A format.
Question (Q): Is the profiler suitable for C allocations or allocations
inside C code?
Answer (A): The profiler reports only allocation events caused by the Lua
allocator. All Lua-related allocations, like table or string creation
are reported. But the profiler doesn’t report allocations made by malloc()
or other non-Lua allocators. You can use valgrind
to debug them.
Q: Why are there so many INTERNAL
allocations in my profiling report?
What does it mean?
A: INTERNAL
means that these allocations/reallocations/deallocations are
related to the internal LuaJIT structures or are made on traces.
Currently, the profiler doesn’t verbosely report allocations of objects
that are made during trace execution. Try adding jit.off()
before the profiler start.
Q: Why are there some reallocations/deallocations without an Overrides
section?
A: These objects can be created before the profiler starts. Adding
collectgarbage()
before the profiler’s start enables collecting all
previously allocated objects that are dead when the profiler starts.
Q: Why are some objects not collected during profiling? Is it
a memory leak?
A: LuaJIT uses incremental Garbage Collector (GC). A GC cycle may not be
finished at the moment the profiler stops. Add collectgarbage()
before
stopping the profiler to collect all the dead objects for sure.
Q: Can I profile not just a current chunk but the entire running application?
Can I start the profiler when the application is already running?
A: Yes. Here is an example of code that can be inserted in the Tarantool
console for a running instance.
1local fiber = require "fiber"
2local log = require "log"
3
4fiber.create(function()
5 fiber.name("memprof")
6
7 collectgarbage() -- Collect all objects already dead
8 log.warn("start of profile")
9
10 local st, err = misc.memprof.start(FILENAME)
11 if not st then
12 log.error("failed to start profiler: %s", err)
13 end
14
15 fiber.sleep(TIME)
16
17 collectgarbage()
18 st, err = misc.memprof.stop()
19
20 if not st then
21 log.error("profiler on stop error: %s", err)
22 end
23
24 log.warn("end of profile")
25end)
where
FILENAME
—the name of the binary file where profiling events are written
TIME
—duration of profiling, in seconds.
Also, you can directly call misc.memprof.start()
and misc.memprof.stop()
from a console.
Profiling report analysis example
In the example below, the following Lua code named format_concat.lua
is
investigated with the help of the memory profiler reports.
1-- Prevent allocations on new traces.
2jit.off()
3
4local function concat(a)
5 local nstr = a.."a"
6 return nstr
7end
8
9local function format(a)
10 local nstr = string.format("%sa", a)
11 return nstr
12end
13
14collectgarbage()
15
16local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
17
18local st, err = misc.memprof.start(binfile)
19assert(st, err)
20
21-- Payload.
22for i = 1, 10000 do
23 local f = format(i)
24 local c = concat(i)
25end
26collectgarbage()
27
28local st, err = misc.memprof.stop()
29assert(st, err)
30
31os.exit()
When you run this code under Tarantool and
then parse the binary memory profile
in /tmp/memprof_format_concat.bin,
you will get the following profiling report:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 1 events +65536 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19996 events +0 bytes -558778 bytes
Overrides:
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 65536 bytes: 1 allocs, 0 frees
Reasonable questions regarding the report can be:
- Why are there no allocations related to the
concat()
function?
- Why is the number of allocations not a round number?
- Why are there about 20K allocations instead of 10K?
First of all, LuaJIT doesn’t create a new string if the string with the same
payload exists (see details on lua-users.org/wiki).
This is called string interning.
So, when a string is
created via the format()
function, there is no need to create the same
string via the concat()
function, and LuaJIT just uses the previous one.
That is also the reason why the number of allocations is not a round number
as could be expected from the cycle operator for i = 1, 10000...
:
Tarantool creates some
strings for internal needs and built‑in modules, so some strings already exist.
But why are there so many allocations? It’s almost twice as big as the expected
amount. This is because the string.format()
built‑in function creates
another string necessary for the %s
identifier, so there are two allocations
for each iteration: for tostring(i)
and for string.format("%sa", string_i_value)
.
You can see the difference in behavior by adding the line
local _ = tostring(i)
between lines 22 and 23.
To profile only the concat()
function, comment out line 23 (which is
local f = format(i)
) and run the profiler. Now the output looks like this:
ALLOCATIONS
@format_concat.lua:5: 10000 events +284411 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 10000 events +0 bytes -218905 bytes
Overrides:
@format_concat.lua:5
@format_concat.lua:5: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@format_concat.lua:5 holds 65536 bytes: 10000 allocs, 9999 frees
Q: But what will change if JIT compilation is enabled?
A: In the code, comment out line 2 (which is
jit.off()
) and run
the profiler . Now there are only 56 allocations in the report, and all other
allocations are JIT-related (see also the related
dev issue):
ALLOCATIONS
@format_concat.lua:5: 56 events +1112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 2 events +382 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 58 events +0 bytes -1164 bytes
Overrides:
@format_concat.lua:5
INTERNAL
HEAP SUMMARY:
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
INTERNAL holds 360 bytes: 2 allocs, 1 frees
This happens because a trace has been compiled after 56 iterations (the default
value of the hotloop
compiler parameter). Then, the
JIT-compiler removed the unused variable c
from the trace, and, therefore,
the dead code of the concat()
function is eliminated.
Next, let’s profile only the format()
function with JIT enabled.
For that, comment out lines 2 and 24 (jit.off()
and
local c = concat(i)
), do not comment out line 23
(local f = format(i)
), and run the profiler.
Now the output will look like this:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 4 events +66928 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19997 events +0 bytes -559034 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 66928 bytes: 4 allocs, 0 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
Q: Why are there so many allocations in comparison to the concat()
function?
A: The answer is simple: the string.format()
function with the %s
identifier is not yet compiled via LuaJIT. So, a trace can’t be recorded and
the compiler doesn’t perform the corresponding optimizations.
If we change the format()
function in lines 9-12 of the
Profiling report analysis example
in the following way
local function format(a)
local nstr = string.format("%sa", tostring(a))
return nstr
end
the profiling report becomes much prettier:
ALLOCATIONS
@format_concat.lua:10: 109 events +2112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 3 events +1206 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 112 events +0 bytes -2460 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
INTERNAL
HEAP SUMMARY:
INTERNAL holds 1144 bytes: 3 allocs, 1 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
The heap summary and the –leak-only option
This feature was added in version 2.8.1.
The end of each display is a HEAP SUMMARY section which looks like this:
@<filename>:<line number> holds <number of still reachable bytes> bytes:
<number of allocation events> allocs, <number of deallocation events> frees
Sometimes a program can cause many deallocations, so
the DEALLOCATION section can become large, so the display is not easy to read.
To minimize output, start the parsing with an extra flag: --leak-only
,
for example
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
When --leak-only`
is used, only the HEAP SUMMARY section is displayed.
Usage of the profiler involves two steps:
- Collecting a binary profile of allocations, reallocations, and deallocations in memory related to Lua (further, binary memory profile or binary profile for short).
- Parsing the collected binary profile to get a human-readable profiling report.
Collecting binary profile
To collect a binary profile for a particular part of the Lua code,
you need to place this part between two misc.memprof
functions,
namely, misc.memprof.start()
and misc.memprof.stop()
, and then execute
the code under Tarantool.
Below is a chunk of Lua code named test.lua
to illustrate this.
1-- Prevent allocations on traces.
2jit.off()
3local str, err = misc.memprof.start("memprof_new.bin")
4-- Lua doesn't create a new frame to call string.rep, and all allocations
5-- are attributed not to the append() function but to the parent scope.
6local function append(str, rep)
7 return string.rep(str, rep)
8end
9
10local t = {}
11for i = 1, 1e4 do
12 -- table.insert is the built-in function and all corresponding
13 -- allocations are reported in the scope of the main chunk.
14 table.insert(t,
15 append('q', i)
16 )
17end
18local stp, err = misc.memprof.stop()
The Lua code for starting the profiler – as in line 3 in the test.lua example above – is:
local str, err = misc.memprof.start(FILENAME)
where FILENAME
is the name of the binary file where profiling events are written.
If the operation fails,
for example if it is not possible to open a file for writing or if the profiler is already running,
misc.memprof.start()
returns nil
as the first result,
an error-message string as the second result,
and a system-dependent error code number as the third result.
If the operation succeeds, misc.memprof.start()
returns true
.
The Lua code for stopping the profiler – as in line 18 in the test.lua example above – is:
local stp, err = misc.memprof.stop()
If the operation fails,
for example if there is an error when the file descriptor is being closed
or if there is a failure during reporting,
misc.memprof.stop()
returns nil
as the first result,
an error-message string as the second result,
and a system-dependent error code number as the third result.
If the operation succeeds, misc.memprof.stop()
returns true
.
To generate the file with memory profile in binary format
(in the test.lua code example above
the file name is memprof_new.bin
), execute the code under Tarantool:
$ tarantool test.lua
Tarantool collects the allocation events in memprof_new.bin
, puts
the file in its working directory, and closes
the session.
The test.lua code example above also illustrates the memory
allocation logic in some cases that are important to understand for
reading and analyzing
a profiling report:
- Line 2: It is recommended to switch the JIT compilation off by calling
jit.off()
before the profiler start. Refer to the following
note about jitoff for more details.
- Lines 6-8: Tail call optimization doesn’t create a new call frame, so all
allocations inside the function called via the
CALLT/CALLMT
bytecodes
are attributed to the function’s caller. See also the comments preceding these lines.
- Lines 14-16: Usually the information about allocations inside Lua built‑ins
is not really
useful for developers. That’s why if a Lua built‑in function is called from
a Lua function, the profiler attributes all allocations to the Lua function.
Otherwise, this event is attributed to a C function.
See also the comments preceding these lines.
Parsing binary profile and generating profiling report
After getting the memory profile in binary format, the next step is
to parse it to get a human-readable profiling report. You can do this
via Tarantool by using the following command
(mind the hyphen -
before the filename):
$ tarantool -e 'require("memprof")(arg)' - memprof_new.bin
where memprof_new.bin
is the binary profile
generated earlier by tarantool test.lua
.
(Warning: there is a slight behavior change here, the tarantool -e ...
command was slightly different in Tarantool versions prior to Tarantool 2.8.1.)
Tarantool generates a profiling report and displays it on the console before closing
the session:
ALLOCATIONS
@test.lua:14: 10000 events +50240518 bytes -0 bytes
@test.lua:9: 1 events +32 bytes -0 bytes
@test.lua:8: 1 events +20 bytes -0 bytes
@test.lua:13: 1 events +24 bytes -0 bytes
REALLOCATIONS
@test.lua:13: 13 events +262216 bytes -131160 bytes
Overrides:
@test.lua:13
@test.lua:14: 11 events +49536 bytes -24768 bytes
Overrides:
@test.lua:14
INTERNAL
INTERNAL: 3 events +8448 bytes -16896 bytes
Overrides:
@test.lua:14
DEALLOCATIONS
INTERNAL: 1723 events +0 bytes -483515 bytes
@test.lua:14: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@test.lua:14 holds 50248326 bytes: 10010 allocs, 10 frees
@test.lua:13 holds 131080 bytes: 14 allocs, 13 frees
INTERNAL holds 8448 bytes: 3 allocs, 3 frees
@test.lua:9 holds 32 bytes: 1 allocs, 0 frees
@test.lua:8 holds 20 bytes: 1 allocs, 0 frees
Note
On macOS, a report will be different for the same chunk of code because
Tarantool and LuaJIT are built with the GC64 mode enabled for macOS.
Let’s examine the report structure. A report has four sections:
- ALLOCATIONS
- REALLOCATIONS
- DEALLOCATIONS
- HEAP SUMMARY (described later in The heap summary and the –leak-only option)
Each section contains event records that are sorted from the most frequent
to the least frequent.
An event record has the following format:
@<filename>:<line_number>: <number_of_events> events +<allocated> bytes -<freed> bytes
- <filename>—a name of the file containing Lua code.
- <line_number>—the line number where the event is detected.
- <number_of_events>—a number of events for this code line.
- +<allocated> bytes—amount of memory allocated during all the events on this line.
- -<freed> bytes—amount of memory freed during all the events on this line.
The Overrides
label shows what allocation has been overridden.
See the test.lua chunk above
with the explanation in the comments for some examples.
The INTERNAL
label indicates that this event is caused by internal LuaJIT
structures.
Note
Important note regarding the INTERNAL
label and the recommendation
of switching the JIT compilation off (jit.off()
): this version of the
profiler doesn’t support verbose reporting for allocations on
traces.
If memory allocations are made on a trace,
the profiler can’t associate the allocations with the part of Lua code
that generated the trace. In this case, the profiler labels such allocations
as INTERNAL
.
So, if the JIT compilation is on,
new traces will be generated and there will be a mixture of events labeled
INTERNAL
in the profiling report: some of them are really caused by
internal LuaJIT structures, but some of them are caused by allocations on
traces.
If you want to have a more definite report without JIT compiler allocations,
call jit.off() before starting the profiling.
And if you want to completely exclude the trace allocations from the report,
remove also the old traces by additionally calling jit.flush()
after
jit.off()
.
Nevertheless, switching the JIT compilation off before the profiling is not
“a must”. It is rather a recommendation, and in some cases,
for example in a production environment, you may need to keep JIT compilation
on to see the full picture of all the memory allocations.
In this case, the majority of the INTERNAL
events
are most probably caused by traces.
As for investigating the Lua code with the help of profiling reports,
it is always code-dependent and there can’t be hundred per cent definite
recommendations in this regard. Nevertheless, you can see some of the things
in the Profiling report analysis example later.
Also, below is the FAQ section with the questions that
most probably can arise while using profiler.
FAQ
In this section, some profiler-related points are discussed in
a Q&A format.
Question (Q): Is the profiler suitable for C allocations or allocations
inside C code?
Answer (A): The profiler reports only allocation events caused by the Lua
allocator. All Lua-related allocations, like table or string creation
are reported. But the profiler doesn’t report allocations made by malloc()
or other non-Lua allocators. You can use valgrind
to debug them.
Q: Why are there so many INTERNAL
allocations in my profiling report?
What does it mean?
A: INTERNAL
means that these allocations/reallocations/deallocations are
related to the internal LuaJIT structures or are made on traces.
Currently, the profiler doesn’t verbosely report allocations of objects
that are made during trace execution. Try adding jit.off()
before the profiler start.
Q: Why are there some reallocations/deallocations without an Overrides
section?
A: These objects can be created before the profiler starts. Adding
collectgarbage()
before the profiler’s start enables collecting all
previously allocated objects that are dead when the profiler starts.
Q: Why are some objects not collected during profiling? Is it
a memory leak?
A: LuaJIT uses incremental Garbage Collector (GC). A GC cycle may not be
finished at the moment the profiler stops. Add collectgarbage()
before
stopping the profiler to collect all the dead objects for sure.
Q: Can I profile not just a current chunk but the entire running application?
Can I start the profiler when the application is already running?
A: Yes. Here is an example of code that can be inserted in the Tarantool
console for a running instance.
1local fiber = require "fiber"
2local log = require "log"
3
4fiber.create(function()
5 fiber.name("memprof")
6
7 collectgarbage() -- Collect all objects already dead
8 log.warn("start of profile")
9
10 local st, err = misc.memprof.start(FILENAME)
11 if not st then
12 log.error("failed to start profiler: %s", err)
13 end
14
15 fiber.sleep(TIME)
16
17 collectgarbage()
18 st, err = misc.memprof.stop()
19
20 if not st then
21 log.error("profiler on stop error: %s", err)
22 end
23
24 log.warn("end of profile")
25end)
where
FILENAME
—the name of the binary file where profiling events are written
TIME
—duration of profiling, in seconds.
Also, you can directly call misc.memprof.start()
and misc.memprof.stop()
from a console.
Profiling report analysis example
In the example below, the following Lua code named format_concat.lua
is
investigated with the help of the memory profiler reports.
1-- Prevent allocations on new traces.
2jit.off()
3
4local function concat(a)
5 local nstr = a.."a"
6 return nstr
7end
8
9local function format(a)
10 local nstr = string.format("%sa", a)
11 return nstr
12end
13
14collectgarbage()
15
16local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
17
18local st, err = misc.memprof.start(binfile)
19assert(st, err)
20
21-- Payload.
22for i = 1, 10000 do
23 local f = format(i)
24 local c = concat(i)
25end
26collectgarbage()
27
28local st, err = misc.memprof.stop()
29assert(st, err)
30
31os.exit()
When you run this code under Tarantool and
then parse the binary memory profile
in /tmp/memprof_format_concat.bin,
you will get the following profiling report:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 1 events +65536 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19996 events +0 bytes -558778 bytes
Overrides:
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 65536 bytes: 1 allocs, 0 frees
Reasonable questions regarding the report can be:
- Why are there no allocations related to the
concat()
function?
- Why is the number of allocations not a round number?
- Why are there about 20K allocations instead of 10K?
First of all, LuaJIT doesn’t create a new string if the string with the same
payload exists (see details on lua-users.org/wiki).
This is called string interning.
So, when a string is
created via the format()
function, there is no need to create the same
string via the concat()
function, and LuaJIT just uses the previous one.
That is also the reason why the number of allocations is not a round number
as could be expected from the cycle operator for i = 1, 10000...
:
Tarantool creates some
strings for internal needs and built‑in modules, so some strings already exist.
But why are there so many allocations? It’s almost twice as big as the expected
amount. This is because the string.format()
built‑in function creates
another string necessary for the %s
identifier, so there are two allocations
for each iteration: for tostring(i)
and for string.format("%sa", string_i_value)
.
You can see the difference in behavior by adding the line
local _ = tostring(i)
between lines 22 and 23.
To profile only the concat()
function, comment out line 23 (which is
local f = format(i)
) and run the profiler. Now the output looks like this:
ALLOCATIONS
@format_concat.lua:5: 10000 events +284411 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 10000 events +0 bytes -218905 bytes
Overrides:
@format_concat.lua:5
@format_concat.lua:5: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@format_concat.lua:5 holds 65536 bytes: 10000 allocs, 9999 frees
Q: But what will change if JIT compilation is enabled?
A: In the code, comment out line 2 (which is
jit.off()
) and run
the profiler . Now there are only 56 allocations in the report, and all other
allocations are JIT-related (see also the related
dev issue):
ALLOCATIONS
@format_concat.lua:5: 56 events +1112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 2 events +382 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 58 events +0 bytes -1164 bytes
Overrides:
@format_concat.lua:5
INTERNAL
HEAP SUMMARY:
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
INTERNAL holds 360 bytes: 2 allocs, 1 frees
This happens because a trace has been compiled after 56 iterations (the default
value of the hotloop
compiler parameter). Then, the
JIT-compiler removed the unused variable c
from the trace, and, therefore,
the dead code of the concat()
function is eliminated.
Next, let’s profile only the format()
function with JIT enabled.
For that, comment out lines 2 and 24 (jit.off()
and
local c = concat(i)
), do not comment out line 23
(local f = format(i)
), and run the profiler.
Now the output will look like this:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 4 events +66928 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19997 events +0 bytes -559034 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 66928 bytes: 4 allocs, 0 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
Q: Why are there so many allocations in comparison to the concat()
function?
A: The answer is simple: the string.format()
function with the %s
identifier is not yet compiled via LuaJIT. So, a trace can’t be recorded and
the compiler doesn’t perform the corresponding optimizations.
If we change the format()
function in lines 9-12 of the
Profiling report analysis example
in the following way
local function format(a)
local nstr = string.format("%sa", tostring(a))
return nstr
end
the profiling report becomes much prettier:
ALLOCATIONS
@format_concat.lua:10: 109 events +2112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 3 events +1206 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 112 events +0 bytes -2460 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
INTERNAL
HEAP SUMMARY:
INTERNAL holds 1144 bytes: 3 allocs, 1 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
The heap summary and the –leak-only option
This feature was added in version 2.8.1.
The end of each display is a HEAP SUMMARY section which looks like this:
@<filename>:<line number> holds <number of still reachable bytes> bytes:
<number of allocation events> allocs, <number of deallocation events> frees
Sometimes a program can cause many deallocations, so
the DEALLOCATION section can become large, so the display is not easy to read.
To minimize output, start the parsing with an extra flag: --leak-only
,
for example
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
When --leak-only`
is used, only the HEAP SUMMARY section is displayed.
To collect a binary profile for a particular part of the Lua code,
you need to place this part between two misc.memprof
functions,
namely, misc.memprof.start()
and misc.memprof.stop()
, and then execute
the code under Tarantool.
Below is a chunk of Lua code named test.lua
to illustrate this.
1-- Prevent allocations on traces.
2jit.off()
3local str, err = misc.memprof.start("memprof_new.bin")
4-- Lua doesn't create a new frame to call string.rep, and all allocations
5-- are attributed not to the append() function but to the parent scope.
6local function append(str, rep)
7 return string.rep(str, rep)
8end
9
10local t = {}
11for i = 1, 1e4 do
12 -- table.insert is the built-in function and all corresponding
13 -- allocations are reported in the scope of the main chunk.
14 table.insert(t,
15 append('q', i)
16 )
17end
18local stp, err = misc.memprof.stop()
The Lua code for starting the profiler – as in line 3 in the test.lua example above – is:
local str, err = misc.memprof.start(FILENAME)
where FILENAME
is the name of the binary file where profiling events are written.
If the operation fails,
for example if it is not possible to open a file for writing or if the profiler is already running,
misc.memprof.start()
returns nil
as the first result,
an error-message string as the second result,
and a system-dependent error code number as the third result.
If the operation succeeds, misc.memprof.start()
returns true
.
The Lua code for stopping the profiler – as in line 18 in the test.lua example above – is:
local stp, err = misc.memprof.stop()
If the operation fails,
for example if there is an error when the file descriptor is being closed
or if there is a failure during reporting,
misc.memprof.stop()
returns nil
as the first result,
an error-message string as the second result,
and a system-dependent error code number as the third result.
If the operation succeeds, misc.memprof.stop()
returns true
.
To generate the file with memory profile in binary format
(in the test.lua code example above
the file name is memprof_new.bin
), execute the code under Tarantool:
$ tarantool test.lua
Tarantool collects the allocation events in memprof_new.bin
, puts
the file in its working directory, and closes
the session.
The test.lua code example above also illustrates the memory allocation logic in some cases that are important to understand for reading and analyzing a profiling report:
- Line 2: It is recommended to switch the JIT compilation off by calling
jit.off()
before the profiler start. Refer to the following note about jitoff for more details. - Lines 6-8: Tail call optimization doesn’t create a new call frame, so all
allocations inside the function called via the
CALLT/CALLMT
bytecodes are attributed to the function’s caller. See also the comments preceding these lines. - Lines 14-16: Usually the information about allocations inside Lua built‑ins is not really useful for developers. That’s why if a Lua built‑in function is called from a Lua function, the profiler attributes all allocations to the Lua function. Otherwise, this event is attributed to a C function. See also the comments preceding these lines.
Parsing binary profile and generating profiling report
After getting the memory profile in binary format, the next step is
to parse it to get a human-readable profiling report. You can do this
via Tarantool by using the following command
(mind the hyphen -
before the filename):
$ tarantool -e 'require("memprof")(arg)' - memprof_new.bin
where memprof_new.bin
is the binary profile
generated earlier by tarantool test.lua
.
(Warning: there is a slight behavior change here, the tarantool -e ...
command was slightly different in Tarantool versions prior to Tarantool 2.8.1.)
Tarantool generates a profiling report and displays it on the console before closing
the session:
ALLOCATIONS
@test.lua:14: 10000 events +50240518 bytes -0 bytes
@test.lua:9: 1 events +32 bytes -0 bytes
@test.lua:8: 1 events +20 bytes -0 bytes
@test.lua:13: 1 events +24 bytes -0 bytes
REALLOCATIONS
@test.lua:13: 13 events +262216 bytes -131160 bytes
Overrides:
@test.lua:13
@test.lua:14: 11 events +49536 bytes -24768 bytes
Overrides:
@test.lua:14
INTERNAL
INTERNAL: 3 events +8448 bytes -16896 bytes
Overrides:
@test.lua:14
DEALLOCATIONS
INTERNAL: 1723 events +0 bytes -483515 bytes
@test.lua:14: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@test.lua:14 holds 50248326 bytes: 10010 allocs, 10 frees
@test.lua:13 holds 131080 bytes: 14 allocs, 13 frees
INTERNAL holds 8448 bytes: 3 allocs, 3 frees
@test.lua:9 holds 32 bytes: 1 allocs, 0 frees
@test.lua:8 holds 20 bytes: 1 allocs, 0 frees
Note
On macOS, a report will be different for the same chunk of code because
Tarantool and LuaJIT are built with the GC64 mode enabled for macOS.
Let’s examine the report structure. A report has four sections:
- ALLOCATIONS
- REALLOCATIONS
- DEALLOCATIONS
- HEAP SUMMARY (described later in The heap summary and the –leak-only option)
Each section contains event records that are sorted from the most frequent
to the least frequent.
An event record has the following format:
@<filename>:<line_number>: <number_of_events> events +<allocated> bytes -<freed> bytes
- <filename>—a name of the file containing Lua code.
- <line_number>—the line number where the event is detected.
- <number_of_events>—a number of events for this code line.
- +<allocated> bytes—amount of memory allocated during all the events on this line.
- -<freed> bytes—amount of memory freed during all the events on this line.
The Overrides
label shows what allocation has been overridden.
See the test.lua chunk above
with the explanation in the comments for some examples.
The INTERNAL
label indicates that this event is caused by internal LuaJIT
structures.
Note
Important note regarding the INTERNAL
label and the recommendation
of switching the JIT compilation off (jit.off()
): this version of the
profiler doesn’t support verbose reporting for allocations on
traces.
If memory allocations are made on a trace,
the profiler can’t associate the allocations with the part of Lua code
that generated the trace. In this case, the profiler labels such allocations
as INTERNAL
.
So, if the JIT compilation is on,
new traces will be generated and there will be a mixture of events labeled
INTERNAL
in the profiling report: some of them are really caused by
internal LuaJIT structures, but some of them are caused by allocations on
traces.
If you want to have a more definite report without JIT compiler allocations,
call jit.off() before starting the profiling.
And if you want to completely exclude the trace allocations from the report,
remove also the old traces by additionally calling jit.flush()
after
jit.off()
.
Nevertheless, switching the JIT compilation off before the profiling is not
“a must”. It is rather a recommendation, and in some cases,
for example in a production environment, you may need to keep JIT compilation
on to see the full picture of all the memory allocations.
In this case, the majority of the INTERNAL
events
are most probably caused by traces.
As for investigating the Lua code with the help of profiling reports,
it is always code-dependent and there can’t be hundred per cent definite
recommendations in this regard. Nevertheless, you can see some of the things
in the Profiling report analysis example later.
Also, below is the FAQ section with the questions that
most probably can arise while using profiler.
FAQ
In this section, some profiler-related points are discussed in
a Q&A format.
Question (Q): Is the profiler suitable for C allocations or allocations
inside C code?
Answer (A): The profiler reports only allocation events caused by the Lua
allocator. All Lua-related allocations, like table or string creation
are reported. But the profiler doesn’t report allocations made by malloc()
or other non-Lua allocators. You can use valgrind
to debug them.
Q: Why are there so many INTERNAL
allocations in my profiling report?
What does it mean?
A: INTERNAL
means that these allocations/reallocations/deallocations are
related to the internal LuaJIT structures or are made on traces.
Currently, the profiler doesn’t verbosely report allocations of objects
that are made during trace execution. Try adding jit.off()
before the profiler start.
Q: Why are there some reallocations/deallocations without an Overrides
section?
A: These objects can be created before the profiler starts. Adding
collectgarbage()
before the profiler’s start enables collecting all
previously allocated objects that are dead when the profiler starts.
Q: Why are some objects not collected during profiling? Is it
a memory leak?
A: LuaJIT uses incremental Garbage Collector (GC). A GC cycle may not be
finished at the moment the profiler stops. Add collectgarbage()
before
stopping the profiler to collect all the dead objects for sure.
Q: Can I profile not just a current chunk but the entire running application?
Can I start the profiler when the application is already running?
A: Yes. Here is an example of code that can be inserted in the Tarantool
console for a running instance.
1local fiber = require "fiber"
2local log = require "log"
3
4fiber.create(function()
5 fiber.name("memprof")
6
7 collectgarbage() -- Collect all objects already dead
8 log.warn("start of profile")
9
10 local st, err = misc.memprof.start(FILENAME)
11 if not st then
12 log.error("failed to start profiler: %s", err)
13 end
14
15 fiber.sleep(TIME)
16
17 collectgarbage()
18 st, err = misc.memprof.stop()
19
20 if not st then
21 log.error("profiler on stop error: %s", err)
22 end
23
24 log.warn("end of profile")
25end)
where
FILENAME
—the name of the binary file where profiling events are written
TIME
—duration of profiling, in seconds.
Also, you can directly call misc.memprof.start()
and misc.memprof.stop()
from a console.
Profiling report analysis example
In the example below, the following Lua code named format_concat.lua
is
investigated with the help of the memory profiler reports.
1-- Prevent allocations on new traces.
2jit.off()
3
4local function concat(a)
5 local nstr = a.."a"
6 return nstr
7end
8
9local function format(a)
10 local nstr = string.format("%sa", a)
11 return nstr
12end
13
14collectgarbage()
15
16local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
17
18local st, err = misc.memprof.start(binfile)
19assert(st, err)
20
21-- Payload.
22for i = 1, 10000 do
23 local f = format(i)
24 local c = concat(i)
25end
26collectgarbage()
27
28local st, err = misc.memprof.stop()
29assert(st, err)
30
31os.exit()
When you run this code under Tarantool and
then parse the binary memory profile
in /tmp/memprof_format_concat.bin,
you will get the following profiling report:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 1 events +65536 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19996 events +0 bytes -558778 bytes
Overrides:
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 65536 bytes: 1 allocs, 0 frees
Reasonable questions regarding the report can be:
- Why are there no allocations related to the
concat()
function?
- Why is the number of allocations not a round number?
- Why are there about 20K allocations instead of 10K?
First of all, LuaJIT doesn’t create a new string if the string with the same
payload exists (see details on lua-users.org/wiki).
This is called string interning.
So, when a string is
created via the format()
function, there is no need to create the same
string via the concat()
function, and LuaJIT just uses the previous one.
That is also the reason why the number of allocations is not a round number
as could be expected from the cycle operator for i = 1, 10000...
:
Tarantool creates some
strings for internal needs and built‑in modules, so some strings already exist.
But why are there so many allocations? It’s almost twice as big as the expected
amount. This is because the string.format()
built‑in function creates
another string necessary for the %s
identifier, so there are two allocations
for each iteration: for tostring(i)
and for string.format("%sa", string_i_value)
.
You can see the difference in behavior by adding the line
local _ = tostring(i)
between lines 22 and 23.
To profile only the concat()
function, comment out line 23 (which is
local f = format(i)
) and run the profiler. Now the output looks like this:
ALLOCATIONS
@format_concat.lua:5: 10000 events +284411 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 10000 events +0 bytes -218905 bytes
Overrides:
@format_concat.lua:5
@format_concat.lua:5: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@format_concat.lua:5 holds 65536 bytes: 10000 allocs, 9999 frees
Q: But what will change if JIT compilation is enabled?
A: In the code, comment out line 2 (which is
jit.off()
) and run
the profiler . Now there are only 56 allocations in the report, and all other
allocations are JIT-related (see also the related
dev issue):
ALLOCATIONS
@format_concat.lua:5: 56 events +1112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 2 events +382 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 58 events +0 bytes -1164 bytes
Overrides:
@format_concat.lua:5
INTERNAL
HEAP SUMMARY:
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
INTERNAL holds 360 bytes: 2 allocs, 1 frees
This happens because a trace has been compiled after 56 iterations (the default
value of the hotloop
compiler parameter). Then, the
JIT-compiler removed the unused variable c
from the trace, and, therefore,
the dead code of the concat()
function is eliminated.
Next, let’s profile only the format()
function with JIT enabled.
For that, comment out lines 2 and 24 (jit.off()
and
local c = concat(i)
), do not comment out line 23
(local f = format(i)
), and run the profiler.
Now the output will look like this:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 4 events +66928 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19997 events +0 bytes -559034 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 66928 bytes: 4 allocs, 0 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
Q: Why are there so many allocations in comparison to the concat()
function?
A: The answer is simple: the string.format()
function with the %s
identifier is not yet compiled via LuaJIT. So, a trace can’t be recorded and
the compiler doesn’t perform the corresponding optimizations.
If we change the format()
function in lines 9-12 of the
Profiling report analysis example
in the following way
local function format(a)
local nstr = string.format("%sa", tostring(a))
return nstr
end
the profiling report becomes much prettier:
ALLOCATIONS
@format_concat.lua:10: 109 events +2112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 3 events +1206 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 112 events +0 bytes -2460 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
INTERNAL
HEAP SUMMARY:
INTERNAL holds 1144 bytes: 3 allocs, 1 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
The heap summary and the –leak-only option
This feature was added in version 2.8.1.
The end of each display is a HEAP SUMMARY section which looks like this:
@<filename>:<line number> holds <number of still reachable bytes> bytes:
<number of allocation events> allocs, <number of deallocation events> frees
Sometimes a program can cause many deallocations, so
the DEALLOCATION section can become large, so the display is not easy to read.
To minimize output, start the parsing with an extra flag: --leak-only
,
for example
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
When --leak-only`
is used, only the HEAP SUMMARY section is displayed.
After getting the memory profile in binary format, the next step is
to parse it to get a human-readable profiling report. You can do this
via Tarantool by using the following command
(mind the hyphen -
before the filename):
$ tarantool -e 'require("memprof")(arg)' - memprof_new.bin
where memprof_new.bin
is the binary profile
generated earlier by tarantool test.lua
.
(Warning: there is a slight behavior change here, the tarantool -e ...
command was slightly different in Tarantool versions prior to Tarantool 2.8.1.)
Tarantool generates a profiling report and displays it on the console before closing the session:
ALLOCATIONS
@test.lua:14: 10000 events +50240518 bytes -0 bytes
@test.lua:9: 1 events +32 bytes -0 bytes
@test.lua:8: 1 events +20 bytes -0 bytes
@test.lua:13: 1 events +24 bytes -0 bytes
REALLOCATIONS
@test.lua:13: 13 events +262216 bytes -131160 bytes
Overrides:
@test.lua:13
@test.lua:14: 11 events +49536 bytes -24768 bytes
Overrides:
@test.lua:14
INTERNAL
INTERNAL: 3 events +8448 bytes -16896 bytes
Overrides:
@test.lua:14
DEALLOCATIONS
INTERNAL: 1723 events +0 bytes -483515 bytes
@test.lua:14: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@test.lua:14 holds 50248326 bytes: 10010 allocs, 10 frees
@test.lua:13 holds 131080 bytes: 14 allocs, 13 frees
INTERNAL holds 8448 bytes: 3 allocs, 3 frees
@test.lua:9 holds 32 bytes: 1 allocs, 0 frees
@test.lua:8 holds 20 bytes: 1 allocs, 0 frees
Note
On macOS, a report will be different for the same chunk of code because Tarantool and LuaJIT are built with the GC64 mode enabled for macOS.
Let’s examine the report structure. A report has four sections:
- ALLOCATIONS
- REALLOCATIONS
- DEALLOCATIONS
- HEAP SUMMARY (described later in The heap summary and the –leak-only option)
Each section contains event records that are sorted from the most frequent to the least frequent.
An event record has the following format:
@<filename>:<line_number>: <number_of_events> events +<allocated> bytes -<freed> bytes
- <filename>—a name of the file containing Lua code.
- <line_number>—the line number where the event is detected.
- <number_of_events>—a number of events for this code line.
- +<allocated> bytes—amount of memory allocated during all the events on this line.
- -<freed> bytes—amount of memory freed during all the events on this line.
The Overrides
label shows what allocation has been overridden.
See the test.lua chunk above with the explanation in the comments for some examples.
The INTERNAL
label indicates that this event is caused by internal LuaJIT
structures.
Note
Important note regarding the INTERNAL
label and the recommendation
of switching the JIT compilation off (jit.off()
): this version of the
profiler doesn’t support verbose reporting for allocations on
traces.
If memory allocations are made on a trace,
the profiler can’t associate the allocations with the part of Lua code
that generated the trace. In this case, the profiler labels such allocations
as INTERNAL
.
So, if the JIT compilation is on,
new traces will be generated and there will be a mixture of events labeled
INTERNAL
in the profiling report: some of them are really caused by
internal LuaJIT structures, but some of them are caused by allocations on
traces.
If you want to have a more definite report without JIT compiler allocations,
call jit.off() before starting the profiling.
And if you want to completely exclude the trace allocations from the report,
remove also the old traces by additionally calling jit.flush()
after
jit.off()
.
Nevertheless, switching the JIT compilation off before the profiling is not
“a must”. It is rather a recommendation, and in some cases,
for example in a production environment, you may need to keep JIT compilation
on to see the full picture of all the memory allocations.
In this case, the majority of the INTERNAL
events
are most probably caused by traces.
As for investigating the Lua code with the help of profiling reports, it is always code-dependent and there can’t be hundred per cent definite recommendations in this regard. Nevertheless, you can see some of the things in the Profiling report analysis example later.
Also, below is the FAQ section with the questions that most probably can arise while using profiler.
FAQ
In this section, some profiler-related points are discussed in
a Q&A format.
Question (Q): Is the profiler suitable for C allocations or allocations
inside C code?
Answer (A): The profiler reports only allocation events caused by the Lua
allocator. All Lua-related allocations, like table or string creation
are reported. But the profiler doesn’t report allocations made by malloc()
or other non-Lua allocators. You can use valgrind
to debug them.
Q: Why are there so many INTERNAL
allocations in my profiling report?
What does it mean?
A: INTERNAL
means that these allocations/reallocations/deallocations are
related to the internal LuaJIT structures or are made on traces.
Currently, the profiler doesn’t verbosely report allocations of objects
that are made during trace execution. Try adding jit.off()
before the profiler start.
Q: Why are there some reallocations/deallocations without an Overrides
section?
A: These objects can be created before the profiler starts. Adding
collectgarbage()
before the profiler’s start enables collecting all
previously allocated objects that are dead when the profiler starts.
Q: Why are some objects not collected during profiling? Is it
a memory leak?
A: LuaJIT uses incremental Garbage Collector (GC). A GC cycle may not be
finished at the moment the profiler stops. Add collectgarbage()
before
stopping the profiler to collect all the dead objects for sure.
Q: Can I profile not just a current chunk but the entire running application?
Can I start the profiler when the application is already running?
A: Yes. Here is an example of code that can be inserted in the Tarantool
console for a running instance.
1local fiber = require "fiber"
2local log = require "log"
3
4fiber.create(function()
5 fiber.name("memprof")
6
7 collectgarbage() -- Collect all objects already dead
8 log.warn("start of profile")
9
10 local st, err = misc.memprof.start(FILENAME)
11 if not st then
12 log.error("failed to start profiler: %s", err)
13 end
14
15 fiber.sleep(TIME)
16
17 collectgarbage()
18 st, err = misc.memprof.stop()
19
20 if not st then
21 log.error("profiler on stop error: %s", err)
22 end
23
24 log.warn("end of profile")
25end)
where
FILENAME
—the name of the binary file where profiling events are written
TIME
—duration of profiling, in seconds.
Also, you can directly call misc.memprof.start()
and misc.memprof.stop()
from a console.
Profiling report analysis example
In the example below, the following Lua code named format_concat.lua
is
investigated with the help of the memory profiler reports.
1-- Prevent allocations on new traces.
2jit.off()
3
4local function concat(a)
5 local nstr = a.."a"
6 return nstr
7end
8
9local function format(a)
10 local nstr = string.format("%sa", a)
11 return nstr
12end
13
14collectgarbage()
15
16local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
17
18local st, err = misc.memprof.start(binfile)
19assert(st, err)
20
21-- Payload.
22for i = 1, 10000 do
23 local f = format(i)
24 local c = concat(i)
25end
26collectgarbage()
27
28local st, err = misc.memprof.stop()
29assert(st, err)
30
31os.exit()
When you run this code under Tarantool and
then parse the binary memory profile
in /tmp/memprof_format_concat.bin,
you will get the following profiling report:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 1 events +65536 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19996 events +0 bytes -558778 bytes
Overrides:
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 65536 bytes: 1 allocs, 0 frees
Reasonable questions regarding the report can be:
- Why are there no allocations related to the
concat()
function?
- Why is the number of allocations not a round number?
- Why are there about 20K allocations instead of 10K?
First of all, LuaJIT doesn’t create a new string if the string with the same
payload exists (see details on lua-users.org/wiki).
This is called string interning.
So, when a string is
created via the format()
function, there is no need to create the same
string via the concat()
function, and LuaJIT just uses the previous one.
That is also the reason why the number of allocations is not a round number
as could be expected from the cycle operator for i = 1, 10000...
:
Tarantool creates some
strings for internal needs and built‑in modules, so some strings already exist.
But why are there so many allocations? It’s almost twice as big as the expected
amount. This is because the string.format()
built‑in function creates
another string necessary for the %s
identifier, so there are two allocations
for each iteration: for tostring(i)
and for string.format("%sa", string_i_value)
.
You can see the difference in behavior by adding the line
local _ = tostring(i)
between lines 22 and 23.
To profile only the concat()
function, comment out line 23 (which is
local f = format(i)
) and run the profiler. Now the output looks like this:
ALLOCATIONS
@format_concat.lua:5: 10000 events +284411 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 10000 events +0 bytes -218905 bytes
Overrides:
@format_concat.lua:5
@format_concat.lua:5: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@format_concat.lua:5 holds 65536 bytes: 10000 allocs, 9999 frees
Q: But what will change if JIT compilation is enabled?
A: In the code, comment out line 2 (which is
jit.off()
) and run
the profiler . Now there are only 56 allocations in the report, and all other
allocations are JIT-related (see also the related
dev issue):
ALLOCATIONS
@format_concat.lua:5: 56 events +1112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 2 events +382 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 58 events +0 bytes -1164 bytes
Overrides:
@format_concat.lua:5
INTERNAL
HEAP SUMMARY:
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
INTERNAL holds 360 bytes: 2 allocs, 1 frees
This happens because a trace has been compiled after 56 iterations (the default
value of the hotloop
compiler parameter). Then, the
JIT-compiler removed the unused variable c
from the trace, and, therefore,
the dead code of the concat()
function is eliminated.
Next, let’s profile only the format()
function with JIT enabled.
For that, comment out lines 2 and 24 (jit.off()
and
local c = concat(i)
), do not comment out line 23
(local f = format(i)
), and run the profiler.
Now the output will look like this:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 4 events +66928 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19997 events +0 bytes -559034 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 66928 bytes: 4 allocs, 0 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
Q: Why are there so many allocations in comparison to the concat()
function?
A: The answer is simple: the string.format()
function with the %s
identifier is not yet compiled via LuaJIT. So, a trace can’t be recorded and
the compiler doesn’t perform the corresponding optimizations.
If we change the format()
function in lines 9-12 of the
Profiling report analysis example
in the following way
local function format(a)
local nstr = string.format("%sa", tostring(a))
return nstr
end
the profiling report becomes much prettier:
ALLOCATIONS
@format_concat.lua:10: 109 events +2112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 3 events +1206 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 112 events +0 bytes -2460 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
INTERNAL
HEAP SUMMARY:
INTERNAL holds 1144 bytes: 3 allocs, 1 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
The heap summary and the –leak-only option
This feature was added in version 2.8.1.
The end of each display is a HEAP SUMMARY section which looks like this:
@<filename>:<line number> holds <number of still reachable bytes> bytes:
<number of allocation events> allocs, <number of deallocation events> frees
Sometimes a program can cause many deallocations, so
the DEALLOCATION section can become large, so the display is not easy to read.
To minimize output, start the parsing with an extra flag: --leak-only
,
for example
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
When --leak-only`
is used, only the HEAP SUMMARY section is displayed.
In this section, some profiler-related points are discussed in a Q&A format.
Question (Q): Is the profiler suitable for C allocations or allocations inside C code?
Answer (A): The profiler reports only allocation events caused by the Lua
allocator. All Lua-related allocations, like table or string creation
are reported. But the profiler doesn’t report allocations made by malloc()
or other non-Lua allocators. You can use valgrind
to debug them.
Q: Why are there so many INTERNAL
allocations in my profiling report?
What does it mean?
A: INTERNAL
means that these allocations/reallocations/deallocations are
related to the internal LuaJIT structures or are made on traces.
Currently, the profiler doesn’t verbosely report allocations of objects
that are made during trace execution. Try adding jit.off()
before the profiler start.
Q: Why are there some reallocations/deallocations without an Overrides
section?
A: These objects can be created before the profiler starts. Adding
collectgarbage()
before the profiler’s start enables collecting all
previously allocated objects that are dead when the profiler starts.
Q: Why are some objects not collected during profiling? Is it a memory leak?
A: LuaJIT uses incremental Garbage Collector (GC). A GC cycle may not be
finished at the moment the profiler stops. Add collectgarbage()
before
stopping the profiler to collect all the dead objects for sure.
Q: Can I profile not just a current chunk but the entire running application? Can I start the profiler when the application is already running?
A: Yes. Here is an example of code that can be inserted in the Tarantool console for a running instance.
1local fiber = require "fiber"
2local log = require "log"
3
4fiber.create(function()
5 fiber.name("memprof")
6
7 collectgarbage() -- Collect all objects already dead
8 log.warn("start of profile")
9
10 local st, err = misc.memprof.start(FILENAME)
11 if not st then
12 log.error("failed to start profiler: %s", err)
13 end
14
15 fiber.sleep(TIME)
16
17 collectgarbage()
18 st, err = misc.memprof.stop()
19
20 if not st then
21 log.error("profiler on stop error: %s", err)
22 end
23
24 log.warn("end of profile")
25end)
where
FILENAME
—the name of the binary file where profiling events are writtenTIME
—duration of profiling, in seconds.
Also, you can directly call misc.memprof.start()
and misc.memprof.stop()
from a console.
Profiling report analysis example
In the example below, the following Lua code named format_concat.lua
is
investigated with the help of the memory profiler reports.
1-- Prevent allocations on new traces.
2jit.off()
3
4local function concat(a)
5 local nstr = a.."a"
6 return nstr
7end
8
9local function format(a)
10 local nstr = string.format("%sa", a)
11 return nstr
12end
13
14collectgarbage()
15
16local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
17
18local st, err = misc.memprof.start(binfile)
19assert(st, err)
20
21-- Payload.
22for i = 1, 10000 do
23 local f = format(i)
24 local c = concat(i)
25end
26collectgarbage()
27
28local st, err = misc.memprof.stop()
29assert(st, err)
30
31os.exit()
When you run this code under Tarantool and
then parse the binary memory profile
in /tmp/memprof_format_concat.bin,
you will get the following profiling report:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 1 events +65536 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19996 events +0 bytes -558778 bytes
Overrides:
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 65536 bytes: 1 allocs, 0 frees
Reasonable questions regarding the report can be:
- Why are there no allocations related to the
concat()
function?
- Why is the number of allocations not a round number?
- Why are there about 20K allocations instead of 10K?
First of all, LuaJIT doesn’t create a new string if the string with the same
payload exists (see details on lua-users.org/wiki).
This is called string interning.
So, when a string is
created via the format()
function, there is no need to create the same
string via the concat()
function, and LuaJIT just uses the previous one.
That is also the reason why the number of allocations is not a round number
as could be expected from the cycle operator for i = 1, 10000...
:
Tarantool creates some
strings for internal needs and built‑in modules, so some strings already exist.
But why are there so many allocations? It’s almost twice as big as the expected
amount. This is because the string.format()
built‑in function creates
another string necessary for the %s
identifier, so there are two allocations
for each iteration: for tostring(i)
and for string.format("%sa", string_i_value)
.
You can see the difference in behavior by adding the line
local _ = tostring(i)
between lines 22 and 23.
To profile only the concat()
function, comment out line 23 (which is
local f = format(i)
) and run the profiler. Now the output looks like this:
ALLOCATIONS
@format_concat.lua:5: 10000 events +284411 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 10000 events +0 bytes -218905 bytes
Overrides:
@format_concat.lua:5
@format_concat.lua:5: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@format_concat.lua:5 holds 65536 bytes: 10000 allocs, 9999 frees
Q: But what will change if JIT compilation is enabled?
A: In the code, comment out line 2 (which is
jit.off()
) and run
the profiler . Now there are only 56 allocations in the report, and all other
allocations are JIT-related (see also the related
dev issue):
ALLOCATIONS
@format_concat.lua:5: 56 events +1112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 2 events +382 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 58 events +0 bytes -1164 bytes
Overrides:
@format_concat.lua:5
INTERNAL
HEAP SUMMARY:
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
INTERNAL holds 360 bytes: 2 allocs, 1 frees
This happens because a trace has been compiled after 56 iterations (the default
value of the hotloop
compiler parameter). Then, the
JIT-compiler removed the unused variable c
from the trace, and, therefore,
the dead code of the concat()
function is eliminated.
Next, let’s profile only the format()
function with JIT enabled.
For that, comment out lines 2 and 24 (jit.off()
and
local c = concat(i)
), do not comment out line 23
(local f = format(i)
), and run the profiler.
Now the output will look like this:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 4 events +66928 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19997 events +0 bytes -559034 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 66928 bytes: 4 allocs, 0 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
Q: Why are there so many allocations in comparison to the concat()
function?
A: The answer is simple: the string.format()
function with the %s
identifier is not yet compiled via LuaJIT. So, a trace can’t be recorded and
the compiler doesn’t perform the corresponding optimizations.
If we change the format()
function in lines 9-12 of the
Profiling report analysis example
in the following way
local function format(a)
local nstr = string.format("%sa", tostring(a))
return nstr
end
the profiling report becomes much prettier:
ALLOCATIONS
@format_concat.lua:10: 109 events +2112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 3 events +1206 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 112 events +0 bytes -2460 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
INTERNAL
HEAP SUMMARY:
INTERNAL holds 1144 bytes: 3 allocs, 1 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
The heap summary and the –leak-only option
This feature was added in version 2.8.1.
The end of each display is a HEAP SUMMARY section which looks like this:
@<filename>:<line number> holds <number of still reachable bytes> bytes:
<number of allocation events> allocs, <number of deallocation events> frees
Sometimes a program can cause many deallocations, so
the DEALLOCATION section can become large, so the display is not easy to read.
To minimize output, start the parsing with an extra flag: --leak-only
,
for example
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
When --leak-only`
is used, only the HEAP SUMMARY section is displayed.
In the example below, the following Lua code named format_concat.lua
is
investigated with the help of the memory profiler reports.
1-- Prevent allocations on new traces.
2jit.off()
3
4local function concat(a)
5 local nstr = a.."a"
6 return nstr
7end
8
9local function format(a)
10 local nstr = string.format("%sa", a)
11 return nstr
12end
13
14collectgarbage()
15
16local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
17
18local st, err = misc.memprof.start(binfile)
19assert(st, err)
20
21-- Payload.
22for i = 1, 10000 do
23 local f = format(i)
24 local c = concat(i)
25end
26collectgarbage()
27
28local st, err = misc.memprof.stop()
29assert(st, err)
30
31os.exit()
When you run this code under Tarantool and then parse the binary memory profile in /tmp/memprof_format_concat.bin, you will get the following profiling report:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 1 events +65536 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19996 events +0 bytes -558778 bytes
Overrides:
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 65536 bytes: 1 allocs, 0 frees
Reasonable questions regarding the report can be:
- Why are there no allocations related to the
concat()
function? - Why is the number of allocations not a round number?
- Why are there about 20K allocations instead of 10K?
First of all, LuaJIT doesn’t create a new string if the string with the same
payload exists (see details on lua-users.org/wiki).
This is called string interning.
So, when a string is
created via the format()
function, there is no need to create the same
string via the concat()
function, and LuaJIT just uses the previous one.
That is also the reason why the number of allocations is not a round number
as could be expected from the cycle operator for i = 1, 10000...
:
Tarantool creates some
strings for internal needs and built‑in modules, so some strings already exist.
But why are there so many allocations? It’s almost twice as big as the expected
amount. This is because the string.format()
built‑in function creates
another string necessary for the %s
identifier, so there are two allocations
for each iteration: for tostring(i)
and for string.format("%sa", string_i_value)
.
You can see the difference in behavior by adding the line
local _ = tostring(i)
between lines 22 and 23.
To profile only the concat()
function, comment out line 23 (which is
local f = format(i)
) and run the profiler. Now the output looks like this:
ALLOCATIONS
@format_concat.lua:5: 10000 events +284411 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 10000 events +0 bytes -218905 bytes
Overrides:
@format_concat.lua:5
@format_concat.lua:5: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@format_concat.lua:5 holds 65536 bytes: 10000 allocs, 9999 frees
Q: But what will change if JIT compilation is enabled?
A: In the code, comment out line 2 (which is
jit.off()
) and run
the profiler . Now there are only 56 allocations in the report, and all other
allocations are JIT-related (see also the related
dev issue):
ALLOCATIONS
@format_concat.lua:5: 56 events +1112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 2 events +382 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 58 events +0 bytes -1164 bytes
Overrides:
@format_concat.lua:5
INTERNAL
HEAP SUMMARY:
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
INTERNAL holds 360 bytes: 2 allocs, 1 frees
This happens because a trace has been compiled after 56 iterations (the default
value of the hotloop
compiler parameter). Then, the
JIT-compiler removed the unused variable c
from the trace, and, therefore,
the dead code of the concat()
function is eliminated.
Next, let’s profile only the format()
function with JIT enabled.
For that, comment out lines 2 and 24 (jit.off()
and
local c = concat(i)
), do not comment out line 23
(local f = format(i)
), and run the profiler.
Now the output will look like this:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 4 events +66928 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19997 events +0 bytes -559034 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 66928 bytes: 4 allocs, 0 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
Q: Why are there so many allocations in comparison to the concat()
function?
A: The answer is simple: the string.format()
function with the %s
identifier is not yet compiled via LuaJIT. So, a trace can’t be recorded and
the compiler doesn’t perform the corresponding optimizations.
If we change the format()
function in lines 9-12 of the
Profiling report analysis example
in the following way
local function format(a)
local nstr = string.format("%sa", tostring(a))
return nstr
end
the profiling report becomes much prettier:
ALLOCATIONS
@format_concat.lua:10: 109 events +2112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 3 events +1206 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 112 events +0 bytes -2460 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
INTERNAL
HEAP SUMMARY:
INTERNAL holds 1144 bytes: 3 allocs, 1 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
The heap summary and the –leak-only option
This feature was added in version 2.8.1.
The end of each display is a HEAP SUMMARY section which looks like this:
@<filename>:<line number> holds <number of still reachable bytes> bytes:
<number of allocation events> allocs, <number of deallocation events> frees
Sometimes a program can cause many deallocations, so
the DEALLOCATION section can become large, so the display is not easy to read.
To minimize output, start the parsing with an extra flag: --leak-only
,
for example
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
When --leak-only`
is used, only the HEAP SUMMARY section is displayed.
This feature was added in version 2.8.1.
The end of each display is a HEAP SUMMARY section which looks like this:
@<filename>:<line number> holds <number of still reachable bytes> bytes:
<number of allocation events> allocs, <number of deallocation events> frees
Sometimes a program can cause many deallocations, so
the DEALLOCATION section can become large, so the display is not easy to read.
To minimize output, start the parsing with an extra flag: --leak-only
,
for example
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
When --leak-only`
is used, only the HEAP SUMMARY section is displayed.