This post is archived and probably outdated.

References - Still bad in PHP 7

2016-02-18 23:12:47

I'm known for telling people "Don't use references" (also as video), as they cause various problems (e.g. with foreach) and hurt performance. The reason for the performance loss is that references disable copy-on-write, while most places in PHP assume copy-on-write. Meanwhile we have PHP 7. In PHP 7 the internal variable handling changed a lot; among other things, the reference counting moved from the zval, the container representing a variable, to the actual element. So I decided to run a little test to verify that my performance assumption is still valid.
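
As a reminder of the kind of foreach problem meant here, this is a minimal sketch of the well-known dangling-reference pitfall. The variable names are made up for illustration and are not taken from the talk or the video.

```php
<?php
// Hypothetical illustration; names are not from the original post.
$values = [1, 2, 3];

// After this loop, $v is still a reference to the last element, $values[2].
foreach ($values as &$v) {
}

// This by-value loop assigns each element to $v in turn, and every
// assignment writes through the leftover reference into $values[2].
foreach ($values as $v) {
}

// Without an unset($v) between the loops, the last element was overwritten.
print_r($values); // the array now holds 1, 2, 2
```

Calling unset($v) right after the first loop avoids the surprise, but it is easy to forget, which is one more reason to avoid the reference in the first place.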

In my test code I'm calling a function which calls strlen (one of the cheapest functions in PHP - PHP strings carry their length, so it simply returns that property) from a loop. In one variant the function takes its parameter by reference, in the other by value. Here's the code:

<?php
function takeRef(&$string) {
    strlen($string);
}
function takeVal($string) {
    strlen($string);
}

function loopRef() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeRef($s);
    }
    return microtime(true) - $start;
}
function loopVal() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeVal($s);
    }
    return microtime(true) - $start;
}

$ref = $val = PHP_INT_MAX;
for ($i = 0; $i < 10; ++$i) {
    $ref = min($ref, loopRef());
    $val = min($val, loopVal());
}

echo "Ref: $ref\nVal: $val\n";
?>

If I run this in PHP 5, in order to have a baseline, I get this result:

Ref: 10.679290056229
Val: 9.5635061264038

So using a reference costs about 10% throughput (the by-reference loop takes roughly 12% longer).

Now let's try PHP 7:

Ref: 10.631688117981
Val: 9.0047070980072

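Overall we see a small performance improvement, as we expect with PHP 7, although only the by-value loop got noticeably faster. Using a reference still costs throughput, by these numbers about 15% (the by-reference loop takes roughly 18% longer). So I still stand by my mantra: Don't use references in PHP!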
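
For anyone who wants to check the percentages, here is a small sketch that derives them from the timings quoted above. The function name overhead() and the array layout are my own choices for illustration; only the four timings come from the runs.

```php
<?php
// Timings copied from the runs quoted above (seconds per 50,000,000 calls).
$runs = [
    'PHP 5' => ['ref' => 10.679290056229, 'val' => 9.5635061264038],
    'PHP 7' => ['ref' => 10.631688117981, 'val' => 9.0047070980072],
];

// Hypothetical helper: expresses the cost of the by-reference variant
// relative to the by-value variant in two common ways.
function overhead($ref, $val) {
    return [
        'extra_time'      => ($ref - $val) / $val * 100, // % more wall time
        'throughput_loss' => (1 - $val / $ref) * 100,    // % fewer calls per second
    ];
}

foreach ($runs as $version => $t) {
    $o = overhead($t['ref'], $t['val']);
    printf(
        "%s: %.1f%% more time, %.1f%% less throughput\n",
        $version,
        $o['extra_time'],
        $o['throughput_loss']
    );
}
```

The two figures differ because one is measured against the by-value time and the other against the by-reference time, so it is worth stating which one is meant when quoting a percentage.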


If you wonder about the second loop at the bottom and the min() call: the code takes multiple samples and keeps the fastest one. My system isn't 100% idle, and there might be unrelated events I don't want to measure, so the fastest run has the least noise and is closest to raw system performance.
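
The same fastest-of-N idea can be pulled out into a reusable helper. This is only a sketch: fastestRun() and its parameters are hypothetical names, not part of the benchmark above, and the workload in the usage example is deliberately much smaller.

```php
<?php
// Hypothetical helper (not from the benchmark above): calls $fn $samples
// times and returns the smallest elapsed wall time in seconds.
function fastestRun($fn, $samples = 10) {
    $best = INF;
    for ($i = 0; $i < $samples; ++$i) {
        $start = microtime(true);
        $fn();
        // Keep the minimum: background noise can only make a run slower.
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

// Usage: time a small loop five times and report the fastest sample.
$best = fastestRun(function () {
    for ($i = 0; $i < 100000; ++$i) {
        strlen("hello world");
    }
}, 5);

printf("Fastest of 5: %.6f s\n", $best);
```

Taking the minimum rather than the mean is a deliberate choice here: it estimates the best the machine can do, not the typical case on a busy system.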