2018/09/24

Devel::MAT investigation into C - part 2

In the previous part we discovered and started investigating a memory leak bug in a program, which got as far as needing a custom-built perl binary to look further into where the leak comes from. We built that with perlbrew and some specific compiler flags.

This new memory-logging feature is controlled by setting an environment variable (PERL_MEM_LOG) to specify the file descriptor to write the log to, and some flags to say what kinds of events will cause logs to be written. This log won't be immediately helpful to us on its own, but when combined with the analysis from Devel::MAT will let us see the history of each individual SV, and specifically the ones that pmat-leakreport tells us are our leak candidates.

To see the logging in action, first lets check that it is working on a test run of perl -V. We'll ask it to output information about SV allocations and deletions to file descriptor 3 (so as not to disturb the process's normal output on the standard output or error streams), and send that to a file:

$ PERL_MEM_LOG=3s debugperl5.26.2 -V 3>perlmem.log
Summary of my perl5 (revision 5 version 26 subversion 2) configuration:
   
...

$ wc -l perlmem.log
8170 perlmem.log

$ head -5 perlmem.log 
new_SV: sv.c:9377:Perl_newSVpvn: 564a933f1f78
new_SV: sv.c:9678:Perl_newSV_type: 564a933f1f90
new_SV: sv.c:9678:Perl_newSV_type: 564a933f1fa8
new_SV: sv.c:9678:Perl_newSV_type: 564a933f1fc0
new_SV: sv.c:9377:Perl_newSVpvn: 564a933f1fd8

$ tail -5 perlmem.log 
del_SV: sv.c:7075:Perl_sv_free2: 564a93407e90
del_SV: sv.c:7075:Perl_sv_free2: 564a93407d88
del_SV: sv.c:7075:Perl_sv_free2: 564a93407de8
del_SV: sv.c:6805:Perl_sv_clear: 564a934077e8
del_SV: sv.c:7075:Perl_sv_free2: 564a933f2038

Here we've got a nice long file, with one line per memory allocation. We can now run our leaky process again, this time with the memory log added as well. This time, when pmat-leakreport gives us some candidates we'll be able to see in more detail where they came from, at the C level:

$ PERL_MEM_LOG=3s debugperl5.26.2 ... 3>perlmem.log
...

$ pmat-leakreport -1 perl-{4,5,6,7,8}.pmat
LEAK[1] UNDEF() at 0x564e10091f80
LEAK[1] UNDEF() at 0x564e107ae740
LEAK[1] UNDEF() at 0x564e10794008

$ grep 564e10091f80 perlmem.log
...
new_SV: sv.c:9248:Perl_sv_newmortal: 564e10091f80
del_SV: sv.c:7075:Perl_sv_free2: 564e10091f80
new_SV: sv.c:9678:Perl_newSV_type: 564e10091f80
del_SV: sv.c:6805:Perl_sv_clear: 564e10091f80
new_SV: sv.c:9753:Perl_newSVsv: 564e10091f80

Well that's partly useful - we can see that the final memory operation at that address was indeed an allocation, with some previous history there which we can ignore. The program did allocate an SV at that address, that wasn't further touched until the end of the program. Problem is, this report itself isn't all that helpful - all we learn from it is that the function that allocated this SV was Perl_newSVsv. That's not saying much - about half of the SVs ever allocated by a program will be through that function. If we want to dig deeper even than this, we'll have to find a way to get the C-level stack trace at the time that allocation was made. To do that we have no alternative but to go editing perl core source code and rebuilding it once more.

We'll abuse the build directory left behind by perlbrew, as it provides a convenient place where the source is already unpacked and configured ready to build. Lets go find it, and edit the source code to add a little extra tracing.

We start by configuring the source tree to include the C backtrace feature, and then apply a patch to util.c file, to have it call the backtrace function (the source of the patch is attached at the end of this blog post):

$ rm config.sh
$ ./Configure -DDEBUGGING -Accflags=-DPERL_MEM_LOG -Dusecbacktrace
$ patch -p1 < print-c-backtrace-on-memlog.patch
$ make

Now when we run this and pass the new c flag as part of PERL_MEM_LOG, we'll get 3 layers of C stack trace along with each line in the memory log. This will tell us what C code called Perl_newSVsv to create our leaking scalars:

$ PERL_MEM_LOG=3sc debugperl5.26.2 ... 3>perlmem.log
...

$ pmat-leakreport -1 perl-{4,5,6,7,8}.pmat
LEAK[1] UNDEF() at 0x55f72bd4b4f0
LEAK[1] UNDEF() at 0x55f72c015088
LEAK[1] UNDEF() at 0x55f72bd57238

$ grep -A3 55f72bd4b4f0 perlmem.log
...
new_SV: sv.c:9753:Perl_newSVsv: 55f72bd4b4f0
  frame[0]: 0x55f72a04d4c3 Perl_newSVsv at /home/leo/perl5/perlbrew/build/perl-5.26.2/perl-5.26.2/perl +0x16d4c3
  frame[1]: 0x7f4644e5af73 - at /home/leo/perl5/perlbrew/perls/perl-5.26.2/lib/site_perl/5.26.2/x86_64-linux/auto/Net/SSLeay/SSLeay.so +0x78f73
  frame[2]: 0x55f72a019082 Perl_pp_entersub at /home/leo/perl5/perlbrew/build/perl-5.26.2/perl-5.26.2/perl +0x139082

Now we are definitely getting somewhere. We know the call to Perl_newSVsv was made from Net::SSLeay, and we have the path to the object file, and the offset address within it. The helpful addr2line utility can convert that into something more meaningful:

$ addr2line -fe /home/leo/perl5/perlbrew/perls/perl-5.26.2/lib/site_perl/5.26.2/x86_64-linux/auto/Net/SSLeay/SSLeay.so 0x78f73
XS_Net__SSLeay_CTX_set_tlsext_status_cb
/home/leo/.cpan/build/Net-SSLeay-1.85-1/SSLeay.xs:6453

The leaked scalars were created within the XS function behind the Net::SSLeay::CTX::set_tlsext_status_cb function, and we even get a source file and line number. We can go and inspect the source code:

  6450              SSL_CTX_set_tlsext_status_cb(ctx, NULL);
  6451          } else if (SvROK(callback) && (SvTYPE(SvRV(callback)) == SVt_PVCV)) {
  6452              cb_data_advanced_put(ctx, "tlsext_status_cb!!func", newSVsv(callback));
  6453              cb_data_advanced_put(ctx, "tlsext_status_cb!!data", newSVsv(data));
  6454              SSL_CTX_set_tlsext_status_cb(ctx, tlsext_status_cb_invoke);
  6455          } else {
  6456              croak("argument must be code reference");

Sure enough, there is indeed our call to newSVsv().

So now we have a definite location in the code where we can see we are creating new SVs that don't get cleaned up again. This is a start, but is only the first step of fixing the bug. We also need to find out why it isn't being released again afterwards - for that we'll continue in part 3...


The patch to util.c to create this logging is:

No comments:

Post a Comment