Last year I spoke at eight conferences and attended a few more. At most of them I found myself in discussions about references in PHP, as many users seem to misunderstand them. Before going too deep into the subject, let's start with a quick reminder of what references are and clear up some confusion about objects being "passed by reference."
References are a way to have multiple variables referencing the same variable container under different names, so whichever name you use, an operation on that variable always affects the others.
Let's look into it with some code to make this all clearer. For a start we simply do a regular assignment from one variable to the other and change it:
<?php
$a = 23;
$b = $a;
$b = 42;
var_dump($a);
var_dump($b);
?>
This script tells us that $a is still 23 and $b equals 42. What happened here is that we got a copy (more on what actually happened later...). Now let's do the same with a reference:
<?php
$a = 23;
$b = &$a;
$b = 42;
var_dump($a);
var_dump($b);
?>
Now suddenly $a changes to 42, too. In fact there is no difference between $a and $b and both are using the same internal variable container (aka. zval). The only way to separate these two is by invalidating one of the variables using unset().
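As a small sketch of that last point (reusing the variable names from above), unset() removes only the name it is given. The other name keeps the value, and a later assignment to the unset name creates a fresh, independent variable:

```php
<?php
$a = 23;
$b = &$a;   // $a and $b now share one variable container
$b = 42;    // writes through to $a as well

unset($b);  // removes the name $b only; $a keeps the value
$b = 7;     // a brand-new, independent variable

var_dump($a); // int(42)
var_dump($b); // int(7)
```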
References in PHP can be created not only in regular assignments but also for function parameters and return values:
<?php
function &foo(&$param) {
$param = 42;
return $param;
}
$a = 23;
echo "\$a before calling foo(): $a\n";
$b = foo($a);
echo "\$a after the call to foo(): $a\n";
$b = 23;
echo "\$a after touching the returned variable: $a\n";
?>
The result from this is, well what do you expect? Right - it looks like this:
$a before calling foo(): 23
$a after the call to foo(): 42
$a after touching the returned variable: 42
So we initialize a variable and pass it to a function as a referenced parameter. The function changes it and it has the new value. The function returns the same variable, we change the returned variable and the original value ... wait, it didn't change!? Yes, references are mean. What happened is the following: the function returned a reference, referencing the same zval as $a, and the = assignment operator created a copy of it.
To fix this we have to add one & more:
$b = &foo($a);
Then the result is what one would expect:
$a before calling foo(): 23
$a after the call to foo(): 42
$a after touching the returned variable: 23
Summary so far: PHP references are aliases for the same variable, and using them properly can be hard. For details on reference counting, which is the basis for this, check the corresponding section in the manual.
When PHP 5 came to life, one of the big changes was how objects were handled. The general explanation is something like this:
In PHP 4 objects are treated like other variables so when using them
as function parameters or doing assignments they are copied. In PHP 5
they are always passed by reference.
Which isn't entirely correct. The issue to solve was about object-oriented patterns: objects are passed as parameters to some function or method, and this function sends a signal to the object (i.e. calls a method), which then might change the object's state (i.e. its properties). For this to work the object has to be the same. PHP 4 OO users therefore always passed explicit references, which is, as we saw above, tricky to do correctly. To make this nicer, PHP 5 introduced an object storage which is independent of the variable container. So inside the variable we no longer store the whole object (which basically means the properties table plus class information) but a reference to an object inside an object storage. If we create a copy of the variable we don't copy the object but this reference (or: handle), so it feels like a reference, but be aware that it is not a reference but a different concept. The difference can be seen by directly changing the variable:
<?php
// create an object and a copy as well as a reference to the variable
$a = new stdclass;
$b = $a;
$c = &$a;
// Do something with the object
$a->foo = 42;
var_dump($a->foo);
var_dump($b->foo);
var_dump($c->foo);
// Now change the variable itself
$a = 42;
var_dump($a);
var_dump($b);
var_dump($c);
?>
When running this you can see that the access to the property really affects the copy, too, but in the last assignment you can see the difference from a reference, as $b is not affected by it. This is the behavior most (all?) people with OO experience expect.
So OO was one valid reason for using references, but as PHP 4 has been dead for over a year now, old code using this should really be cleaned up!
Another reason people use references is that they think it makes the code faster. But this is wrong. It is even worse: references mostly make the code slower!
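The same distinction shows up with function parameters. The following is only a sketch; the three helper functions are hypothetical names made up for this illustration. A function that receives an object by value can change the object's properties, but it can only replace the caller's variable if the parameter is declared as a reference:

```php
<?php
// Hypothetical helpers, named only for this illustration.
function change_property($obj) {
    $obj->foo = 42;        // same object behind the copied handle
}
function replace_variable($obj) {
    $obj = new stdClass;   // only rebinds the local variable
    $obj->foo = 'new';
}
function replace_by_reference(&$obj) {
    $obj = new stdClass;   // rebinds the caller's variable, too
    $obj->foo = 'new';
}

$a = new stdClass;

change_property($a);
$afterChange = $a->foo;       // 42: the property change is visible

replace_variable($a);
$afterReplace = $a->foo;      // still 42: the caller's variable is untouched

replace_by_reference($a);
$afterRefReplace = $a->foo;   // 'new': a real reference replaces the variable

var_dump($afterChange, $afterReplace, $afterRefReplace);
```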
Yes, references often make the code slower - Sorry, I just had to repeat this to make it clear.
When coming from other languages, people have read in style guides that passing copies of large structures or strings should be avoided, as creating a copy takes time. In some environments complex structures have to be passed as pointers, which is a fundamentally different model from references, and people carry this over to PHP references. But PHP is not that other language; it is PHP with PHP's runtime, and in PHP we do copy-on-write.
With copy-on-write we don't copy on an assignment or function call but just note that there are multiple independent variables pointing at one and the same variable container. Only when there is a write operation do we separate the variable being written to from the others. This means that even though a variable looks like a copy, it is in fact no copy, and the function call incurs no penalty due to big parameters. The problem with references is that they disable the copy-on-write mechanism, so any following non-reference assignment using this variable creates an immediate copy. This in itself wouldn't be bad if you could simply use references everywhere, but you can't: PHP is built around the availability of copy-on-write, so most internal functions expect copies.
Somewhere I found code which looks something like this:
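One rough way to watch copy-on-write at work is to compare memory_get_usage() before and after an assignment and again after the first write. This is a sketch only: the variable names are made up, and the exact byte counts depend on the PHP version and build, so none are shown here.

```php
<?php
// Build the string at run time so it is a real heap allocation.
$a = str_repeat('x', 1000000);

$start = memory_get_usage();

$b = $a;                                     // no copy yet: both names share the data
$afterAssign = memory_get_usage() - $start;  // expected to stay close to zero

$b .= 'y';                                   // first write: $b is separated and copied
$afterWrite = memory_get_usage() - $start;   // expected to grow by roughly the string size

echo "after assignment: $afterAssign bytes\n";
echo "after write:      $afterWrite bytes\n";
```

The assignment itself should cost next to nothing; the memory for a second string is only needed once $b is actually modified.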
<?php
function foo(&$data) {
for ($i = 0; $i < strlen($data); $i++) {
do_something($data[$i]);
}
}
$string = "... looooong string with lots of data .....";
foo($string);
?>
Now the first issue with this code is obvious: it calls strlen() on every loop iteration although the length never changes. So that's strlen($data) function calls where a single one would be enough. With strlen() this isn't too bad since, unlike in a language like C, strings in PHP carry their length directly, so in general no calculation is needed. But in this case the developer tried to be smart and save time by passing a reference. Well, strlen() expects a copy. Copy-on-write can't be done on references, so $data is copied for each call to strlen(); strlen() then does an absolutely simple operation (in fact strlen() is one of the most trivial functions in PHP) and the copy is destroyed immediately.
If no reference is used, no copy is needed, which makes the code way faster. And even if strlen() took the reference, you wouldn't have won anything.
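A sketch of how such a function could look without the reference follows. The function name count_spaces() and the space counting are hypothetical stand-ins for whatever the per-character work really was. The parameter is taken by value and the length is read once before the loop:

```php
<?php
// count_spaces() is a hypothetical stand-in for the original per-character work.
function count_spaces($data) {      // by value: copy-on-write means no copy is made
    $length = strlen($data);        // read the length once, outside the loop
    $spaces = 0;
    for ($i = 0; $i < $length; $i++) {
        if ($data[$i] === ' ') {
            $spaces++;
        }
    }
    return $spaces;
}

$string = "... looooong string with lots of data .....";
echo count_spaces($string), "\n";
```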
Summary so far:
- Do not use references for OO but get rid of PHP 4 legacy.
- Do not use references for performance.
Now a third thing done with references is bad API design: returning values via reference parameters. The issue here is, again, that people forget that PHP is PHP and not another language.
In PHP you can return multiple types from the same function, so a function could return a string on success and a boolean false in case of an error. PHP also allows returning complex structures like arrays and objects, so if multiple things are to be returned they can be packed together. Additionally, there are exceptions as a way to leave a function.
Using referenced parameters is a bad thing. In addition to the fact that references are bad and cause performance penalties, using references in this way makes code hard to maintain. Given a function call like this:
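As an illustration of these alternatives, here is a minimal sketch. The function parse_range() and its input format are invented for this example and are not part of PHP. It packs two results into one array and reports failure with an exception, so no reference parameter is needed:

```php
<?php
// parse_range() is a hypothetical example function, not part of PHP.
function parse_range($text) {
    $parts = explode('-', $text);
    if (count($parts) !== 2 || !is_numeric($parts[0]) || !is_numeric($parts[1])) {
        throw new InvalidArgumentException("Not a range: $text");
    }
    // Several results packed into one regular return value.
    return array('min' => (int) $parts[0], 'max' => (int) $parts[1]);
}

$range = parse_range('3-9');
echo $range['min'], ' to ', $range['max'], "\n";

try {
    parse_range('oops');
} catch (InvalidArgumentException $e) {
    echo 'Error: ', $e->getMessage(), "\n";
}
```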
do_something($var);
Would you expect that $var will change? - No. But if do_something() takes it as a reference it could happen.
Another problem with such APIs is that function calls can't be nested; you always have to use a temporary variable. Nesting function calls can also reduce readability, but there are enough situations where nesting makes the code clearer.
My personal favorite example of a bad design decision regarding references is PHP's own sort() function. sort() takes an array as a reference parameter and hands the sorted array back through that reference. It would be much nicer to return the sorted array as a regular return value. The reason for this is history: sort() is older than copy-on-write. Copy-on-write was introduced with PHP 4, while sort() is far older and dates from a time when PHP was not really its own language but a shortcut for doing some things on the Web.
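To make the nesting problem concrete, here is a sketch with a small wrapper. sorted() is a hypothetical helper written for this example, not a built-in function. It takes the array by value, sorts its local variable and returns it, which allows the call to be nested:

```php
<?php
// sorted() is a hypothetical wrapper, not a built-in function.
function sorted($array) {   // by value: the caller's array is left untouched
    sort($array);           // sorts the local variable in place
    return $array;
}

$input = array(3, 1, 2, 3);

// With sort() alone a temporary variable is required:
$tmp = array_unique($input);
sort($tmp);
echo implode(',', $tmp), "\n";

// With a function that returns its result, the calls nest:
echo implode(',', sorted(array_unique($input))), "\n";
```

Both echo statements print the same list, and $input keeps its original order.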
To sum it up: References in PHP are bad. Do not use them. They hurt and will just mess with things and do not expect to be able to outsmart the engine with references!
Sunday, January 10. 2010 at 16:28
[geshi lang=php]
$aVeryLongNameToExplain=new Foo();
$aVE=&$aVeryLongNameToExplain;
$aVE->bar(35);
$aVE->bat("x");
[/geshi]
If you don't know how references behave, because they are "bad", you're missing something.
Sunday, January 10. 2010 at 16:35
Sunday, January 10. 2010 at 16:43
Monday, January 11. 2010 at 00:15
Monday, January 11. 2010 at 05:43
Of course, with 5.3 we have a native linked list data type, so it will reasonably quickly become a moot point for this PEAR class - thankfully.
Monday, January 11. 2010 at 09:44
And yes, I'm aware of the fact that libraries have to keep compatibility ...
Monday, January 11. 2010 at 11:13
Monday, January 11. 2010 at 12:10
$a = array(1, 2, 3, 4);
foreach ($a as &$b) {
$b *= $b;
}
print_r($a);
Monday, January 11. 2010 at 14:02
$a = array(1, 2, 3, 4);
$b = array_map(
function($v) { return $v*$v; },
$a);
print_r($b);
The foreach with references has too many side-effects as in
$ii = array(1, 2, 3);
foreach ($ii as &$i) echo $i;
foreach ($ii as $i) echo $i;
Which prints 123122, not 123123, as the reference stays alive after the loop has finished.
Tuesday, January 12. 2010 at 01:59
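For code that does use foreach with a reference, a common way to avoid this side effect is to unset() the loop variable right after the loop. A short sketch, reusing the variable names from the example above:

```php
<?php
$ii = array(1, 2, 3);

foreach ($ii as &$i) {
    // $i is an alias for the current element here
}
unset($i);   // break the alias to the last element

foreach ($ii as $i) {
    // $i is a plain variable again; $ii is not modified
}

var_dump($ii === array(1, 2, 3)); // bool(true)
```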
For this to work:
$registry->someArray['foo'] = 'bar';
Registry::__get() needs to return a reference.
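A minimal sketch of what that could look like follows. The class body is hypothetical; only the name Registry, the property access and __get() come from the comment above.

```php
<?php
// Minimal hypothetical Registry; the storage layout is an assumption.
class Registry {
    private $data = array('someArray' => array());

    // The & makes __get() hand back a reference into $data,
    // so writes to array elements reach the stored array.
    public function &__get($name) {
        return $this->data[$name];
    }
}

$registry = new Registry();
$registry->someArray['foo'] = 'bar';

var_dump($registry->someArray);
```

Without the & in front of __get(), PHP reports an indirect modification of an overloaded property and the stored array stays unchanged.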
Tuesday, January 12. 2010 at 02:52
Tuesday, January 12. 2010 at 13:56
- You pass a parameter by reference only when you expect the parameter to be changed inside the function. If you only need to read the parameter, you should never pass it by reference. So when the function writes to it, it's always good to use a reference instead of making a new copy, especially when your object/data structure is huge.
- In given example
function &foo(&$param) {
$param = 42;
return $param;
}
Why would you return param? That defeats the whole purpose of passing the parameter by reference; that's wrong.
- I agree that performance would be slightly better if you pass by value, as PHP does copy-on-write. I think the difference is negligible, as your reference will be passed by value instead of the data being passed directly by value.
Using reference is good if you know when to use it.
Tuesday, January 12. 2010 at 22:16
Wednesday, January 13. 2010 at 18:31
I have a class member which is a deep multidimensional array and I want to process one part of this array. Is it safe to use a reference to that part of the array within a class function?
Example:
class foo {
public $array = array(
'one' => array('one' => array(...)),
'two' => array('one' => array(...)),
'three' => array('one' => array(...))
);
public function process() {
$ref = $this->array['one']['one']['two']...
return $ref['one'] > $ref['two'] && $ref['one'] > $ref['three'];
}
}
Also what if instead of return I used array_walk to echo the elements? Am I safe?
And just to make sure I understand correctly: I am definitely not safe if I were to use a function such as strlen to echo the length of each member, since strlen uses a copy and not a reference... right?
And finally, what about array_map on that reference to change the values? Is that safe?
I know, many questions... Really, thanks though, this post has been very helpful...
Thursday, January 14. 2010 at 22:01
Greetz
twitter.com/Jon_G
Thursday, January 21. 2010 at 00:54
I have one object, let's say "$panel", and I have panel elements in an array "$panel->_ELEMENTS". Each of them is a Panel_Element object. Now I need to allow my Panel_Element objects to call methods on my global panel. That is, I need to go in the reverse direction. Currently I do this by passing the following at element creation:
$panel->_ELEMENTS[$iterator]->panel = &$this;
And it's working. I even tested it: after that command I did: $this->foo = 1;
And then I checked in the element whether foo was set. It all worked perfectly.
Now, how could I replace it so that I can still work on my main object?
Wednesday, February 10. 2010 at 16:46
which functions are still a reference killer.
i think strlen is not the only one right?
Wednesday, February 17. 2010 at 05:38
Thursday, February 18. 2010 at 20:19
I reread Zend php5 architects book and found this:
The use of by-reference variables is a sometimes-useful, but always very risky technique,
because PHP variables tend to stay in scope for a long time, even within a
single function. Additionally, unlike what happens in many other languages, by-reference activity is often slower than its by-value counterpart, because PHP uses a clever "deferred-copy" mechanism that actually optimizes by-value assignments.
Saturday, March 13. 2010 at 00:42
Take for example my BBCode-Parser: http://site.svn.dasprids.de/trunk/application/library/App/BBCode/Parser.php
I'm creating a token tree in there (using arrays, not objects, as they are known to be faster and less expensive), and I always need to be able to point to the current stack level element. For this, references are ideal, if used wisely.
Friday, October 15. 2010 at 10:04
So far, all I got is a pretty fat “DON'T! YOU! EVER!”
Thursday, October 6. 2011 at 19:46
Are you considering "memory usage" in the performances ?
If I have a ~2 mb array, and have a function that return a chunk of that array, shouldn't I return that chunk by reference to avoid duplicating lots of data in memory ?
Friday, October 7. 2011 at 23:00
Friday, October 7. 2011 at 23:37
Example:
function foo($arr){
return $arr['a'];
}
$arr = array('a' =>1, 'b' => 2);
$chunk = foo($arr);
In the memory, I'll store twice the value '1' ?
Friday, October 7. 2011 at 23:45