Entries tagged as php references
Feb 23: More on references

In a few different places I saw comments about my last blog post about references and performance where commentators noted that my example was pointless. Which of course is true and to some degree the point.
I read a lot of PHP code and from time to time I see people with a non-PHP background (or otherwise influenced) putting references everywhere they pass arrays or such in order to prevent copies. I knew this was a bad practice in PHP 5 and wanted to verify this in PHP 7. For readers with a stronger PHP background this doesn't come to mind and so comments are like "what if I want to modify the data?" which might lead to something like this:
function modify(&$data) {
    $data["foo"] = "bar";
}

$data = [ /* huuuuuge array */ ];
modify($data);
In this code, from a performance perspective, the reference likely works out and this is "fast." My primary criticism here is that references aren't idiomatic in PHP. Therefore most people reading this code wouldn't expect $data to be changed by this function call. Luckily the name of the function gives this away, to some degree. The more idiomatic way might be along these lines:
function modify($data) {
    $data["foo"] = "bar";
    return $data;
}

$data = [ /* huuuuuge array */ ];
$data = modify($data);
I consider this more readable and clearer, while it will create a (temporary) copy, leading to more CPU and peak memory load. Now we have to decide how much clarity we want to take out of the code as a compromise for a performance gain. After that decision has been made and we have decided to go for the approach with references, we fix an issue or add a new feature to our code, make a slight change, and suddenly lose what we've gained before. Maybe we do something like this:
function modify(&$data) {
    if (!in_array("bar", $data)) { // A copy happens here
        $data["foo1"] = "bar";
    }
    if (!in_array("baz", $data)) { // Yet another copy here
        $data["foo2"] = "baz";
    }
}

$data = [ /* huuuuuge array */ ];
$data2 = $data;
modify($data); // A copy happens here, to split $data and $data2
So the performance gain we once carefully produced backfired massively and we even got three copies. In this short example this is quite obvious, but in a larger application context with real-life changes, tracking this is really hard.
If we had written this in the (in my opinion) more idiomatic way this would look like this:
function modify($data) {
    if (!in_array("bar", $data)) {
        $data["foo1"] = "bar"; // Maybe a copy here
    }
    if (!in_array("baz", $data)) {
        $data["foo2"] = "baz"; // Maybe copy here, but only if not copied above already
    }
    return $data;
}

$data = [ /* huuuuuge array */ ];
$data2 = $data;
$data = modify($data);
So depending on the conditions we might end up with either no or at most one copy, compared to the three copies from above. Of course this example is constructed but the point is: If you use references for performance you have to be extremely careful and know exactly what you're doing and think about each coming modification.
Now let's take a step back and think a bit more about this code. Isn't there yet another way? - We have data and we have functions operating on them. Wasn't there another construct which we might use? - Yes, we could go object-oriented!
class DataManager {
    private $data;

    public function __construct() {
        $this->data = [ /* huuuuuge array */ ];
    }

    public function modify() {
        if (!in_array("bar", $this->data)) {
            $this->data["foo1"] = "bar";
        }
        if (!in_array("baz", $this->data)) {
            $this->data["foo2"] = "baz";
        }
    }
}

$dm = new DataManager();
$dm2 = $dm;
$dm->modify();
Suddenly we have a higher degree of abstraction, encapsulation and all those other OO benefits and no copy of the data at all. Ok, yes I cheated: I didn't remember the purpose of the $dm2 = $dm assignment any more. So maybe we need to clone there and create an explicit copy. (While then again - for the $data property we'd probably benefit from copy-on-write making even the cloning quite cheap)
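A minimal sketch of the difference between the plain assignment and an explicit clone. SmallDataManager and getData() are hypothetical names introduced only for this illustration; they are a reduced stand-in for the class above:

```php
<?php
// Hypothetical, reduced stand-in for the DataManager above.
class SmallDataManager
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function modify()
    {
        if (!in_array("bar", $this->data)) {
            $this->data["foo1"] = "bar";
        }
    }

    public function getData()
    {
        return $this->data;
    }
}

$dm = new SmallDataManager(["x" => "y"]);
$alias = $dm;       // copies the handle: both names refer to the same object
$copy = clone $dm;  // explicit second object; its array is only separated on write

$dm->modify();

$aliasSees = array_key_exists("foo1", $alias->getData());
$copySees = array_key_exists("foo1", $copy->getData());
var_dump($aliasSees); // bool(true)
var_dump($copySees);  // bool(false)
```

The clone is the place where the intent to have an independent copy becomes visible in the code.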
In summary: Yes, when careful you can be slightly more performant in both CPU and memory usage, but in real life that gain is often lost again and eventually backfires in maintenance cost and performance loss.
Now aren't there cases where references might be a good thing? - The only reason I found in recent times (apart from an extremely carefully crafted tree structure I've seen, for which I'd usually suggest an OO way) is around anonymous functions/closures. Take this example:
$data = [ /* ... */ ];
$oldsum = 0;
$doubled = array_map(function ($element) use (&$oldsum) {
    $oldsum += $element;
    return $element * 2;
}, $data);
Again, the example in itself might be bad, but in such a context, where we provide a closure as callback and want to keep some "trivial" state, references are an acceptable way. If the state we want to keep becomes more complex than a counter, however, it might be worthwhile to think about using an object to keep it, or to find some other code structure.
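As a sketch of the object-based alternative, assuming an invokable class is acceptable as the callback: SummingDoubler and getSum() are hypothetical names made up for this example, and the sample data is invented.

```php
<?php
// Hypothetical helper class: the running sum lives in the object
// instead of in a variable captured by reference.
class SummingDoubler
{
    private $sum = 0;

    public function __invoke($element)
    {
        $this->sum += $element;
        return $element * 2;
    }

    public function getSum()
    {
        return $this->sum;
    }
}

$data = [1, 2, 3];
$doubler = new SummingDoubler();
$doubled = array_map($doubler, $data);

echo implode(", ", $doubled), "\n"; // 2, 4, 6
echo $doubler->getSum(), "\n";      // 6
```

For a single counter this is more ceremony than the closure, which is why the reference is tolerable there; the object starts to pay off once the state grows.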
Feb 18: References - Still bad in PHP 7

I'm known for telling people "Don't use references" (also as video) as those cause different problems (e.g. with foreach) and hurt performance. The reason for the performance loss is that references disable copy-on-write while most places in PHP assume copy-on-write. Meanwhile we have PHP 7. In PHP 7 the internal variable handling changed a lot. Among other things the reference counting moved from the zval, the container representing a variable, to the actual element. So I decided to run a little test to verify that my performance assumption was still valid.
In my test code I'm calling a function which calls strlen (one of the cheapest functions in PHP - PHP strings carry their length, so it simply returns that property) from a loop. Once the function takes its parameter by reference, once by value. Here's the code:
<?php
function takeRef(&$string) {
    strlen($string);
}

function takeVal($string) {
    strlen($string);
}

function loopRef() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeRef($s);
    }
    return microtime(true) - $start;
}

function loopVal() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeVal($s);
    }
    return microtime(true) - $start;
}

$ref = $val = PHP_INT_MAX;
for ($i = 0; $i < 10; ++$i) {
    $ref = min($ref, loopRef());
    $val = min($val, loopVal());
}

echo "Ref: $ref\nVal: $val\n";
?>
If I run this in PHP 5, in order to have a baseline, I get this result:
Ref: 10.679290056229
Val: 9.5635061264038
So using a reference costs a bit more than 10%.
Now let's try PHP 7:
Ref: 10.631688117981
Val: 9.0047070980072
Overall we saw a small performance improvement, as we expect with PHP 7, but using a reference still costs more than 10% throughput. So I still stand by my mantra: Don't use references in PHP!
If you wonder about the second loop at the bottom and the min() call: the code takes multiple samples and keeps the fastest one. My system isn't 100% idle and there might be unrelated events I don't want to measure, so the fastest run is the one with the least noise and the closest to raw system performance.
Aug 20: References and foreach

References in PHP are bad, as I mentioned before, and you certainly should avoid using them. Now there is one use case which leads to behavior that is unexpected at first. I didn't see it as a real-life issue when I first stumbled over it, but then there were a few bug reports about it and recently a friend asked me about it ... so here it goes:
What is the output of this code:
<?php
$a = array('a', 'b', 'c', 'd');

foreach ($a as &$v) { }
foreach ($a as $v) { }

print_r($a);
?>
We are iterating two times over an array without doing anything. So the result should be no change. Right? - Wrong! The actual result looks like this:
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => c
)
For understanding why this happens let's take a step back and look at the way PHP variables are implemented and what references are:
A PHP variable basically consists of two things: a "label" and a "container." The label is an entry in a hash table (there are a few optimizations in the engine, so it is not always in a hash table, but well) which may represent the symbol table of a function, an array or an object's property table. So we have a name and a pointer to the container. The container, internally called "zval", stores the value and some meta information. This container can also be a new hash table with a set of labels pointing to other containers. If we now create a reference, this causes a second label to point to the same container as another label. From then on both labels have the same power over the container.
Now let's look at the situation from above. In a picture it looks like this:
So we have six containers (the global symbol table on the top, a container holding the array called $a on the left and one container for each element on the right). Now we start the first iteration. So the global symbol table gets a new entry for $v and $v is made a reference to the container of the first array element.
So a change to either $a[0] or $v goes to the same container and therefore affects the other. When the iteration continues the reference is broken and $v is made a reference to each of the following elements in turn. So after the iteration ends $v is a reference to the last element.
Remember: $v being a reference means that any change to $v affects the other references, in this situation $a[3]. Up till now nothing special happened. But now the second iteration begins. This one assigns the value of the current element to $v at each step. Now $v is a reference to the same container as $a[3], so by assigning a value to $v, $a[3] is changed, too:
This continues for the next steps, too.
And now we can easily guess what will happen at the last step: $v is being assigned the value of the last element, $a[3], and as $a[3] is a reference to the same container as $v it assigns itself to itself, so effectively nothing happens.
And this is the result we saw above.
So to make this story full of pictures short: Be careful about references! They can have really strange effects.
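If the by-reference loop cannot be avoided, a common way to defuse the effect described above is to break the leftover reference with unset() before $v is reused. A minimal sketch based on the example from this post:

```php
<?php
$a = array('a', 'b', 'c', 'd');

foreach ($a as &$v) {
}
unset($v); // drop the leftover reference to $a[3]

foreach ($a as $v) {
}

print_r($a); // the array is unchanged: a, b, c, d
```

After unset($v) the second loop creates a fresh, ordinary $v, so its assignments no longer write into $a[3].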
Jan 10: Do not use PHP references

Last year I spoke at eight conferences and attended a few more. At most of them I found myself in discussions about references and PHP, as many users seem to have a wrong understanding of them. Before going too deep into the subject let's start with a quick reminder of what references are and clear up some confusion about objects which are "passed by reference."
References are a way to have multiple variables referencing the same variable container using different names -- so whatever name you're using an operation on that variable will always have an effect on the others.
Let's look into it with some code to make this all clearer. For a start we simply do a regular assignment from one variable to the other and change it:
<?php
$a = 23;
$b = $a;
$b = 42;
var_dump($a);
var_dump($b);
?>
This script will tell us that $a still is 23 and $b equals 42. So what happened here is that we got a copy (more on what actually happened later...). Now let's do the same with a reference:
<?php
$a = 23;
$b = &$a;
$b = 42;
var_dump($a);
var_dump($b);
?>
Now suddenly $a changes to 42, too. In fact there is no difference between $a and $b and both are using the same internal variable container (aka. zval). The only way to separate these two is by invalidating one of the variables using unset().
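A minimal sketch of separating the two names again with unset(), continuing the example above:

```php
<?php
$a = 23;
$b = &$a;   // $a and $b now share one container
unset($b);  // removes only the name $b; $a keeps its value
$b = 42;    // a fresh, independent variable

var_dump($a); // int(23)
var_dump($b); // int(42)
```

Note that unset() removes the label, not the container, so the remaining name continues to work as before.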
References in PHP can't only be created in regular assignments but also for function parameters or return values:
<?php
function &foo(&$param) {
    $param = 42;
    return $param;
}

$a = 23;
echo "\$a before calling foo(): $a\n";
$b = foo($a);
echo "\$a after the call to foo(): $a\n";
$b = 23;
echo "\$a after touching the returned variable: $a\n";
?>
The result from this is, well what do you expect? Right - it looks like this:
$a before calling foo(): 23
$a after the call to foo(): 42
$a after touching the returned variable: 42
So we initialize a variable, pass it to a function as referenced parameter. The function changes it and it has the new value. The function returns the same variable, we change the returned variable and the original value ... wait it didn't change!? - Yes references are mean. What happened is the following: The function returned a reference, referencing the same zval as $a and the = assignment operator creates a copy of it.
To fix this we have to add one & more:
$b = &foo($a);
Then the result is what one would expect:
$a before calling foo(): 23
$a after the call to foo(): 42
$a after touching the returned variable: 23
Summary so far: PHP references are aliases for the same variable and using them properly can be hard. For details on the reference counting, which is the basis for this, check the corresponding section in the manual.
When PHP 5 came to life one of the big changes was how objects are handled. The general explanation is something like this:
In PHP 4 objects are treated like other variables so when using them as function parameters or doing assignments they are copied. In PHP 5 they are always passed by reference.
Which isn't entirely correct. The issue to solve was about object-oriented patterns: Objects are passed as parameters to some function or method, this function sends a signal to the object (aka calls a method) which then might change the object's state (aka its properties). For this to work the object has to be the same. PHP 4 OO users therefore always passed explicit references, which is, as we saw above, tricky to do correctly. To make this nicer, PHP 5 introduced an object storage which is independent from the variable container. So inside the variable we don't store the whole object anymore (which basically means the properties table plus class information) but a reference to an object inside the object storage. If we create a copy of the variable we don't copy the object but this reference (or: handle). So it feels like a reference, but be aware that it is not a reference but a different concept. The difference can be seen by directly changing the variable:
<?php
// create an object and a copy as well as a reference to the variable
$a = new stdclass;
$b = $a;
$c = &$a;

// Do something with the object
$a->foo = 42;
var_dump($a->foo);
var_dump($b->foo);
var_dump($c->foo);

// Now change the variable itself
$a = 42;
var_dump($a);
var_dump($b);
var_dump($c);
?>
When running this you can see that the access to the property really affects the copy, too, but in the last assignment you can see the difference to a reference, as $b is not affected by it. This is the behavior most (all?) people with OO experience expect.
So OO was one valid reason for using references, but as PHP 4 has been dead for over a year now, old code using this should really be cleaned up!
Another reason people use references is that they think it makes the code faster. But this is wrong. It is even worse: References mostly make the code slower!
Yes, references often make the code slower - Sorry, I just had to repeat this to make it clear.
When coming from other languages, people have read in style guides that passing copies of large structures or strings should be avoided, as creating a copy takes time. In some environments complex structures have to be passed as pointers, which is a fundamentally different model from references, and people carry this over to PHP references. But PHP is not that other language. PHP is PHP, with PHP's runtime, and in PHP we do copy-on-write.
With copy-on-write we don't copy on an assignment or function call but just note that there are multiple independent variables pointing at one and the same variable container. Only when there is a write operation do we separate the variable which is written to from the others. This means that even though a variable looks like a copy, it is in fact no copy, and a function call takes no penalty due to big parameters. The problem with references now is that they disable the copy-on-write mechanism, so any following non-reference assignment using this variable will create an immediate copy. This in itself wouldn't be bad, since you could simply use references everywhere. Well, not really: PHP is built around the availability of copy-on-write, so most internal functions expect copies.
Somewhere I found code which looks something like this:
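A rough way to watch copy-on-write happen is to compare memory usage around an assignment and around the first write. This is only a sketch: the array size is arbitrary and the exact byte counts depend on the PHP version, so no numbers are claimed here.

```php
<?php
$a = range(1, 100000);           // a reasonably large array, built at runtime

$before = memory_get_usage();
$b = $a;                         // plain assignment: both names share one array
$afterAssign = memory_get_usage();
$b[] = 0;                        // first write to $b: now the array is separated
$afterWrite = memory_get_usage();

echo "assignment:  ", $afterAssign - $before, " bytes\n";
echo "first write: ", $afterWrite - $afterAssign, " bytes\n";
```

The assignment itself should cost next to nothing, while the first write pays for the actual copy.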
<?php
function foo(&$data) {
    for ($i = 0; $i < strlen($data); $i++) {
        do_something($data[$i]);
    }
}

$string = "... looooong string with lots of data .....";
foo($string);
?>
Now the first issue with this code is obvious: it calls strlen() on every loop iteration although the length never changes. So that's strlen($data) function calls where a single one would be enough. Now with strlen() it won't be too bad as, unlike in a language like C, strings in PHP directly carry their length, so in general no calculation is needed. But in this case the developer tried to be smart and save time by passing a reference. But well, strlen() expects a copy. Copy-on-write can't be done on references, so $data will be copied for calling strlen(), strlen() will do an absolutely simple operation - in fact strlen() is one of the most trivial functions in PHP - and the copy will be destroyed immediately.
If no reference is used, no copy is needed, which makes the code way faster. And even if strlen() took the reference you wouldn't have won anything.
Summary so far:
- Do not use references for OO but get rid of PHP 4 legacy.
- Do not use references for performance.
Now a third thing which is done with references is bad API design by returning via reference parameters. The issue here is, again, that people forget that PHP is PHP and not another language.
In PHP you can return multiple types from the same function - so if the function was successful you could return a string and a boolean false in case of an error. PHP also allows to return complex structures like arrays and objects, so if multiple things are to be returned they can be packed together. Additionally there are exceptions as a way to return from a function.
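A minimal sketch of these options, using a hypothetical divide() function invented for this illustration: both results come back packed into one array, and the error case is signalled with an exception instead of an "output" parameter.

```php
<?php
// Hypothetical example function, not taken from any library.
function divide($dividend, $divisor)
{
    if ($divisor === 0) {
        throw new InvalidArgumentException("Division by zero");
    }
    // Pack both results into one return value.
    return [intdiv($dividend, $divisor), $dividend % $divisor];
}

list($quotient, $remainder) = divide(7, 2);
echo "$quotient remainder $remainder\n"; // 3 remainder 1

$message = null;
try {
    divide(1, 0);
} catch (InvalidArgumentException $e) {
    $message = $e->getMessage();
    echo "Error: $message\n"; // Error: Division by zero
}
```

The caller sees everything the function produces on the left-hand side of the call, and none of its own variables change behind its back.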
Using referenced parameters is a bad thing. In addition to the fact that references are bad and cause performance penalties, using references in this way makes code hard to maintain. Take a function call like this:
do_something($var);
Would you expect that $var will change? - No. But if do_something() takes it as a reference it could happen.
Another problem with such APIs is that function calls can't be nested; you always have to use a temporary variable. Now, nesting function calls can also reduce readability, but there are enough situations where nesting makes the code clearer.
My personal favorite example of a bad design decision with regard to references is PHP's own sort() function. sort() takes an array as a reference parameter, which will be returned in sorted order by reference. It would be way nicer to return the sorted array as a regular return value. The reason for this is history: sort() is older than copy-on-write. Copy-on-write was introduced with PHP 4, while sort() is way older and dates from a time when PHP wasn't really its own language yet but a shortcut to do some things on the Web.
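What such an API could look like can be sketched with a small wrapper. The name sorted() is hypothetical and not part of PHP; it relies only on the built-in sort() and on pass-by-value semantics:

```php
<?php
// Hypothetical wrapper; sorted() does not exist in PHP itself.
function sorted(array $values)
{
    sort($values);   // sorts the local variable; the caller's array is untouched
    return $values;
}

$input = [3, 1, 2];
$result = sorted($input);

echo implode(", ", $result), "\n"; // 1, 2, 3
echo implode(", ", $input), "\n";  // 3, 1, 2
```

Because the result is a regular return value, a call like implode(", ", sorted($input)) can be nested without a temporary variable.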
To sum it up: References in PHP are bad. Do not use them. They hurt and will just mess with things and do not expect to be able to outsmart the engine with references!