2018/09/27

Devel::MAT investigation into C - part 3

In the previous part we investigated a memory leak in a Perl program, and found the line of C code in a module responsible for creating the SVs that are leaking. This of course is only half of the problem - the other half is the fact that these SVs aren't getting reclaimed again when they're no longer needed, so to answer that question we'll have to investigate a little more. We'll attempt to go looking around the code at where this SV pointer is stored, and look for other parts of code that would try to reclaim it later on, and see if we can see why they don't.

Taking a look again at the source surrounding the line that is creating these leaking SVs, we see an interesting call to a function called cb_data_advanced_put() which appears to be taking ownership of it.

cb_data_advanced_put(ctx, "tlsext_status_cb!!func", newSVsv(callback));
cb_data_advanced_put(ctx, "tlsext_status_cb!!data", newSVsv(data));
SSL_CTX_set_tlsext_status_cb(ctx, tlsext_status_cb_invoke);

This cb_data_advanced_put() is defined by the XS code itself, storing the data under the named key in a hash associated with the SSL context stored in ctx. The code implementing that looks like this:

int cb_data_advanced_put(void *ptr, const char* data_name, SV* data)
{
    HV * L2HV;
    SV ** svtmp;
    int len;
    char key_name[500];
    dMY_CXT;

    len = my_snprintf(key_name, sizeof(key_name), "ptr_%p", ptr);
    if (len == sizeof(key_name)) return 0; /* error  - key_name too short*/

    /* get or create level-2 hash */
    svtmp = hv_fetch(MY_CXT.global_cb_data, key_name, strlen(key_name), 0);
    if (svtmp == NULL) {
        L2HV = newHV();
        hv_store(MY_CXT.global_cb_data, key_name, strlen(key_name), newRV_noinc((SV*)L2HV), 0);
    }
    else {
        if (!SvOK(*svtmp) || !SvROK(*svtmp)) return 0;
        L2HV = (HV*)MUTABLE_PTR(SvRV(*svtmp));
    }

    /* first delete already stored value */
    hv_delete(L2HV, data_name, strlen(data_name), G_DISCARD);
    if (data!=NULL)
        if (SvOK(data))
            hv_store(L2HV, data_name, strlen(data_name), data, 0);

    return 1;
}

This function takes ownership of the SV pointer given in data by storing it in a hash - an HV. That HV is a second-level hash, itself stored keyed by the given context pointer ptr into a hash maintained by the toplevel context in MY_CXT.global_cb_data. To understand the intended code path to release this, we'll have to find where that gets reclaimed. Inspection of the code that manages this top-level hash, and surrounding functions in the file, soon finds another function which appears to be doing that:

int cb_data_advanced_drop(void *ptr)
{
    int len;
    char key_name[500];
    dMY_CXT;

    len = my_snprintf(key_name, sizeof(key_name), "ptr_%p", ptr);
    if (len == sizeof(key_name)) return 0; /* error  - key_name too short*/

    hv_delete(MY_CXT.global_cb_data, key_name, strlen(key_name), G_DISCARD);
    return 1;
}

The important line here is the call to hv_delete(), which is deleting a key from the shared HV stored in MY_CXT.global_cb_data. When that HV is deleted, Perl should recurse into it and reclaim all of the values stored by it, including the data we put there earlier. But yet it appears not to be - or else we wouldn't observe our memory leak in the first place - so we must dive a little deeper to understand why.

Lets start by dumping the contents of the hash that hv_delete() is discarding, to see if it contains the keys we expect - remember, from the first block of code we found, we're expecting an entry under the key named tlsext_status_cb!!data. We'll do that by adjusting the code and adding a call to the handy debugging function sv_dump() so we can see what's inside it:

SV *tmp = hv_delete(MY_CXT.global_cb_data, key_name, strlen(key_name), 0);
sv_dump(tmp);

When we run this we get a nice detailed snapshot of what was inside the HV at the time it was cleaned up. The full output is quite long, but here is a cut-down version of the most relevant detail:

SV = IV(0x55c769db3358) at 0x55c769db3368
  FLAGS = (TEMP,ROK)
  RV = 0x55c769db2f78
    SV = PVHV(0x55c769dd9280) at 0x55c769db2f78
      KEYS = 1
        Elt "tlsext_status_cb!!func" HASH = 0x567bfc0c

We have a reference to a hash, containing just one key (because KEYS = 1), being the tlsext_status_cb!!func part, but it seems we're missing the related !!data key, which is the leaking one. But this seems unusual - the code path we encountered earlier would always unconditionally insert them both. Either the data key went missing sometime, or never got inserted in the first place.

Perhaps it would help to add some further logging to the cb_data_advanced_put function, having it log what the HV looks like after it's added the key, so we can see if it was never added or goes missing later on. We'll add this to the end of the function:

fprintf(stderr, "After cb_data_advanced_put key=<%s>:\n", data_name);
sv_dump(sv_2mortal(newRV_inc(L2HV)));

(The added complication of calling sv_2mortal on newRV_inc is simply to get around an odd quirk of sv_dump(), that it will dump the contents of a hash given by indirect RV reference, but not by HV pointer directly. We have to make and dump an RV referencing it, just so we can see those keys.)

Running this, we find some more logs from its behaviour:

After cb_data_advanced_put key=:
SV = IV(0x55932b203ec0) at 0x55932b203ed0
  FLAGS = (TEMP,ROK)
  RV = 0x55932b1db110
    SV = PVHV(0x55932b20c4d0) at 0x55932b1db110
      KEYS = 2
        Elt "ssleay_verify_callback!!func" HASH = 0x4101f92c
        Elt "tlsext_status_cb!!func" HASH = 0x462de35f

After cb_data_advanced_put key=:
SV = IV(0x55932b204130) at 0x55932b204140
  FLAGS = (TEMP,ROK)
  RV = 0x55932b1db110
    SV = PVHV(0x55932b20c4d0) at 0x55932b1db110
      KEYS = 2
        Elt "ssleay_verify_callback!!func" HASH = 0x4101f92c
        Elt "tlsext_status_cb!!func" HASH = 0x462de35f

Well this is odd. After adding the func key, it was indeed there (along with another relating to the verify callback, which we can ignore for our purposes), leaving it with KEYS = 2. But after attempting to put the data data in, it still only has those two and no further key was added. It looks like the function didn't attempt to add the key to the HV.

To understand why, a further look at a small section of the code in cb_data_advanced_put() might help here:

if (data!=NULL)
    if (SvOK(data))
        hv_store(L2HV, data_name, strlen(data_name), data, 0);

The hv_store() function is only called if data points to a real SV, but contains some defined value (i.e. SvOK() is true). But recalling the leaking SV candidates we found in part 1 - they were all UNDEF. It seems perhaps we've found the reason. If the data for the tlsext_status_cb is an undefined scalar value, a new SV is allocated for it (by calling newSVsv()) which is never stored in the HV, and thus is never reclaimed when the HV itself is released. Instead, the SV just sits around in memory, unreferenced by anything else, with nothing to clean it up afterwards.

If this is the case, then we can easily fix it by just throwing away the SV immediately if it is undefined. We can change the code to:

if (data!=NULL) {
    if (SvOK(data))
        hv_store(L2HV, data_name, strlen(data_name), data, 0);
    else
        SvREFCNT_dec(data);
}

Applying this to the code, rebuilding, and testing again shows this does seem to fix this bug, as now we no longer get a constantly-increasing SV count:

SVs 49656 (+49656)
SVs 56515 (+6859)
SVs 56509 (-6)
SVs 56509 (+0)
SVs 56509 (+0)
SVs 56508 (-1)
SVs 56508 (+0)

All that remains now is to send it upstream in the form of a bug report - which I have done in RT#127131.

No comments:

Post a Comment