Jan 10: Do not use PHP references

Last year I spoke at eight conferences and attended a few more. At most of them I found myself in discussions about references in PHP, as many users seem to misunderstand them. Before going too deep into the subject, let's start with a quick reminder of what references are and clear up some confusion about objects which are "passed by reference."
References are a way to have multiple variables, under different names, refer to the same variable container. Whichever name you use, an operation on that variable always affects the others.
Let's look into it with some code to make this all clearer. For a start we simply do a regular assignment from one variable to the other and change it:
<?php
$a = 23;
$b = $a;
$b = 42;
var_dump($a);
var_dump($b);
?>
This script tells us that $a is still 23 and $b equals 42. What happened here is that we got a copy (more on what actually happened later...). Now let's do the same with a reference:
<?php
$a = 23;
$b = &$a;
$b = 42;
var_dump($a);
var_dump($b);
?>
Now suddenly $a changes to 42, too. In fact there is no difference between $a and $b; both use the same internal variable container (a.k.a. zval). The only way to separate the two is to destroy one of the variables using unset().
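A minimal sketch of that separation, reusing the variables from the example above: unset() removes only the name $b, so a later assignment to $b creates a fresh variable and no longer touches $a.

```php
<?php
$a = 23;
$b = &$a;    // $a and $b are now two names for one container
unset($b);   // removes only the name $b; $a keeps its value
$b = 42;     // creates a brand-new, independent variable
var_dump($a);
var_dump($b);
```

After the unset() the two names behave like ordinary, unrelated variables again.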
References in PHP can be created not only in regular assignments but also for function parameters and return values:
<?php
function &foo(&$param) {
    $param = 42;
    return $param;
}

$a = 23;
echo "\$a before calling foo(): $a\n";
$b = foo($a);
echo "\$a after the call to foo(): $a\n";
$b = 23;
echo "\$a after touching the returned variable: $a\n";
?>
The result from this is, well what do you expect? Right - it looks like this:
$a before calling foo(): 23
$a after the call to foo(): 42
$a after touching the returned variable: 42
So we initialize a variable and pass it to a function as a referenced parameter. The function changes it and it has the new value. The function returns the same variable, we change the returned variable and the original value ... wait, it didn't change!? - Yes, references are mean. What happened is the following: the function returned a reference to the same zval as $a, but the = assignment operator created a copy of it.
To fix this we have to add one more &:
$b = &foo($a);
Then the result is what one would expect:
$a before calling foo(): 23
$a after the call to foo(): 42
$a after touching the returned variable: 23
Summary so far: PHP references are aliases for the same variable, and using them properly can be hard. For details on reference counting, which is the basis for this, check the corresponding section in the manual.
When PHP 5 came to life, one of the big changes was how objects were handled. The general explanation is something like this:
In PHP 4 objects are treated like other variables so when using them as function parameters or doing assignments they are copied. In PHP 5 they are always passed by reference.
Which isn't entirely correct. The issue to solve came from object-oriented patterns: objects are passed as parameters to some function or method, and this function sends a signal to the object (a.k.a. calls a method), which then might change the object's state (a.k.a. its properties). For this to work the object has to be the same one. PHP 4 OO users therefore always passed explicit references, which is, as we saw above, tricky to do correctly. To make this nicer, PHP 5 introduced an object storage which is independent of the variable container. So inside the variable we no longer store the whole object (which basically means the properties table plus class information) but a reference to an object inside the object storage. If we create a copy of the variable we don't copy the object but this reference (or: handle). It feels like a reference, but be aware that it is not a reference but a different concept. The difference can be seen by directly changing the variable:
<?php
// create an object and a copy as well as a reference to the variable
$a = new stdclass;
$b = $a;
$c = &$a;

// Do something with the object
$a->foo = 42;
var_dump($a->foo);
var_dump($b->foo);
var_dump($c->foo);

// Now change the variable itself
$a = 42;
var_dump($a);
var_dump($b);
var_dump($c);
?>
When running this you can see that the property access really affects the copy, too, but the last assignment shows the difference from a reference, as $b is not affected by it. This is the behavior most (all?) people with OO experience expect.
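The same distinction shows up with function parameters. The following is only a sketch; the three function names are made up for this illustration. Changing a property through a by-value parameter affects the caller's object, while re-assigning the parameter only affects the caller when it was declared with &.

```php
<?php
// Hypothetical helpers, named for this illustration only.
function change_state($obj) {
    $obj->foo = 42;           // same object, reached through a copied handle
}
function replace_by_value($obj) {
    $obj = new stdClass;      // re-binds the local variable only
    $obj->foo = 'new';
}
function replace_by_reference(&$obj) {
    $obj = new stdClass;      // re-binds the caller's variable as well
    $obj->foo = 'new';
}

$a = new stdClass;
$a->foo = 1;

change_state($a);
$afterState = $a->foo;        // the property change is visible to the caller

replace_by_value($a);
$afterValue = $a->foo;        // unchanged: the caller still holds the old object

replace_by_reference($a);
$afterReference = $a->foo;    // the caller's variable now holds the new object

var_dump($afterState, $afterValue, $afterReference);
```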
So OO was one valid reason for using references, but as PHP 4 has been dead for over a year now, old code using this should really be cleaned up!
Another reason people use references is that they think it makes the code faster. But this is wrong. It is even worse: references mostly make the code slower!
Yes, references often make the code slower - Sorry, I just had to repeat this to make it clear.
When coming from other languages, people have read in style guides that passing copies of large structures or strings should be avoided, as creating a copy takes time. In some environments complex structures have to be passed as pointers, which is a fundamentally different model from references, and people carry this over to PHP references. But PHP is not that other language. PHP is PHP, with PHP's runtime, and in PHP we do copy-on-write.
With copy-on-write we don't copy on an assignment or function call but just note that there are multiple independent variables pointing at one and the same variable container. Only when there is a write operation do we separate the variable being written to from the others. This means that even though a variable looks like a copy it is in fact no copy, and the function call takes no penalty due to big parameters. The problem with references is that they disable the copy-on-write mechanism, so any following non-reference assignment using this variable creates an immediate copy. This in itself wouldn't be bad if you could simply use references everywhere, but you can't: PHP is built around copy-on-write being available, so most internal functions expect copies.
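A rough way to watch copy-on-write at work is to compare memory usage around an assignment and around the first write. This is a sketch only: it assumes memory_get_usage() tracks array allocations closely enough, and the exact numbers vary with PHP version and build.

```php
<?php
$before = memory_get_usage();
$big = range(1, 200000);             // one large array
$afterCreate = memory_get_usage();

$copy = $big;                        // plain assignment, nothing written yet
$afterAssign = memory_get_usage();

$copy[] = 0;                         // first write: now the array is separated
$afterWrite = memory_get_usage();

printf("creating the array: %d bytes\n", $afterCreate - $before);
printf("assigning it:       %d bytes\n", $afterAssign - $afterCreate);
printf("first write:        %d bytes\n", $afterWrite - $afterAssign);
```

The assignment should cost next to nothing, while the first write costs roughly as much as creating the array did.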
Somewhere I found code which looks something like this:
<?php
function foo(&$data) {
    for ($i = 0; $i < strlen($data); $i++) {
        do_something($data[$i]);
    }
}

$string = "... looooong string with lots of data .....";
foo($string);
?>
Now the first issue with this code is obvious: it calls strlen() on every loop iteration although the length never changes. So that's strlen($data) function calls where a single one would be enough. With strlen() this isn't too bad since, unlike in a language like C, strings in PHP carry their length directly, so in general no calculation is needed. But in this case the developer tried to be smart and save time by passing a reference. Well, strlen() expects a copy. Copy-on-write can't be done on references, so $data is copied for each call to strlen(), strlen() does an absolutely simple operation - in fact strlen() is one of the most trivial functions in PHP - and the copy is destroyed immediately.
If no reference is used, no copy is needed, which makes the code way faster. And even if strlen() took a reference you wouldn't have won anything.
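For comparison, here is a sketch of the same loop without the reference and with strlen() called only once. do_something() is not shown in the original, so the version below is a made-up stand-in that merely counts its calls.

```php
<?php
// Hypothetical stand-in for do_something(): counts how often it is called.
function do_something(string $char): int
{
    static $calls = 0;
    return ++$calls;
}

// Takes the string by value; copy-on-write means nothing is copied here.
function foo(string $data): int
{
    $calls = 0;
    $length = strlen($data);              // computed once, outside the loop
    for ($i = 0; $i < $length; $i++) {
        $calls = do_something($data[$i]);
    }
    return $calls;
}

$string = "... looooong string with lots of data .....";
$processed = foo($string);
echo "processed $processed characters\n";
```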
Summary so far:
- Do not use references for OO, but get rid of PHP 4 legacy.
- Do not use references for performance.
Now a third thing which is done with references is bad API design: returning values via reference parameters. The issue here is, again, that people forget that PHP is PHP and not another language.
In PHP you can return multiple types from the same function, so a function could return a string if it was successful and a boolean false in case of an error. PHP also allows returning complex structures like arrays and objects, so if multiple things are to be returned they can be packed together. Additionally there are exceptions as a way to leave a function.
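As a sketch of these alternatives: the function below is invented for this example and is not from any library. It packs two results into an array and reports failure with an exception instead of writing into reference parameters.

```php
<?php
// Hypothetical function for illustration: splits "host:port" into its parts.
function parse_endpoint(string $endpoint): array
{
    $parts = explode(':', $endpoint);
    if (count($parts) !== 2 || !ctype_digit($parts[1])) {
        throw new InvalidArgumentException("Not a host:port pair: $endpoint");
    }
    // Several results travel back together in one regular return value
    return ['host' => $parts[0], 'port' => (int) $parts[1]];
}

$result = parse_endpoint('localhost:3306');
var_dump($result);

try {
    parse_endpoint('localhost');
    $failed = false;
} catch (InvalidArgumentException $e) {
    $failed = true;   // the error path needs no output parameter either
}
```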
Using referenced parameters is a bad thing. In addition to the fact that references are bad and cause performance penalties, using references in this way makes code hard to maintain. Take a function call like this:
do_something($var);
Would you expect that $var will change? - No. But if do_something() takes it as a reference it could happen.
Another problem with such APIs is that function calls can't be nested; you always have to use a temporary variable. Nesting function calls can also reduce readability, but there are enough situations where nesting makes the code clearer.
My personal favorite example of a bad design decision with regard to references is PHP's own sort() function. sort() takes an array as a reference parameter and returns it in sorted order through that reference. It would be way nicer to return the sorted array as a regular return value. The reason for this is history: sort() is older than copy-on-write. Copy-on-write was introduced with PHP 4, while sort() is way older and dates from times when PHP was not really its own language but a shortcut for doing some things on the Web.
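A value-returning variant is easy to sketch in userland. The wrapper name sorted() is made up here and is not part of PHP; it leaves the caller's array alone and can be nested inside other calls.

```php
<?php
// Hypothetical wrapper around sort(); not part of PHP itself.
function sorted(array $values): array
{
    sort($values);      // sorts the function's own variable, not the caller's
    return $values;
}

$input = [3, 1, 2];
$smallestTwo = array_slice(sorted($input), 0, 2);   // nesting works

var_dump($input);        // still in its original order
var_dump($smallestTwo);
```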
To sum it up: References in PHP are bad. Do not use them. They hurt and will just mess with things and do not expect to be able to outsmart the engine with references!
Oct 21: PHP 5.3.1RC2

Just a quick heads-up: quite some time after RC1, PHP 5.3.1RC2 has finally been packaged and released. The PHP bug tracker welcomes reports about issues; I also welcome positive feedback.
Downloads:
- Source:
http://downloads.php.net/johannes/php-5.3.1RC2.tar.bz2
http://downloads.php.net/johannes/php-5.3.1RC2.tar.gz
- Windows binaries:
http://windows.php.net/qa/
(This release candidate is not meant to be used in production systems; wait for the final release for that, but please test this version.)
Jun 16: PHP BBQ Munich

Yesterday we held our PHP BBQ event in Munich. Well, it was no BBQ, as the weather forecast predicted rain (which came in the evening), but a nice evening in a beer garden.
We had more than 30 people there, some leaving early, some arriving late, covering quite different kinds of participants: PHP core developers, professional PHP users, people doing PHP stuff as a hobby, friends, and PHP community veterans like Till Gerken. Many people didn't know each other or hadn't seen each other for some time, so we had lots of discussions. Most of them weren't even about PHP, and many non-IT topics were covered, which I always find great. If you want an impression, check Ulf's photos. I really hope this makes a good foundation for more regular PHP meetups.
There will be a few more events of this kind this week in Germany, so go there if you can, don't be shy and have fun.
PHP BBQ dates:
- Tuesday, 16.06.2009: Frankfurt
- Wednesday, 17.06.2009: Karlsruhe
- Thursday, 18.06.2009: Berlin
- Friday, 19.06.2009: Dortmund
- Saturday, 20.06.2009: Hamburg
- Sunday, 21.06.2009: Kiel
May 7: PHP 5.3.0 RC 2 released

5.3 is a rather big release including support for namespaces, closures, phar archives, internationalization support via the new intl extension, improved SQLite support, mysqlnd as backend for the MySQL extensions, impressive performance improvements, ... and tons of other bigger and minor things.
Even though this server is already running 5.3, it's not yet recommended for production environments, but I'd really like to encourage everybody to test it and give feedback! I'm also interested in positive feedback, not only bug reports, to support my good feeling about this release.
Nov 18: SQL completion in PHP strings

NetBeans 6.5 is soon to be released. After 10 years of NetBeans that's the first version of Sun's open source IDE featuring PHP support. While 6.5 is waiting to be packaged, development didn't stop and the first features for the successor, NetBeans.next, are already being developed. David Van Couvering just showed a preview of a cool new feature: SQL completion in PHP strings. If it does what the screenshot promises, that's a damn great addition in my opinion....
Nov 3: Direct MySQL Stream Access


Ever wondered what your PHP application and MySQL actually do? An experimental mysqlnd branch will give you full access to the network communication stream. Using a custom PHP stream filter you can then intercept the communication ... but let's start at the beginning:
When talking about mysqlnd - the mysql native driver for PHP - we always mention the fact it's native in a way that we're, when possible, using PHP infrastructure. The most common example here is the memory management. By directly using PHP's memory we can avoid unnecessary copies of data from the MySQL Client Library's memory into PHP memory.
<?php
$mysqli = mysqli_connect("localhost", "root", "", "test");
$stream = mysqli_conn_to_stream($mysqli);
stream_filter_append($stream, "mysql.server", STREAM_FILTER_READ);
?>
But there's more to what we're doing. We're also using PHP's stream abstraction layer. From a development perspective the benefit is that we're using a tested abstraction over the different stream implementations of different operating systems instead of writing our own. But, again, there's more to it: we can export the communication stream to PHP userland. We hesitated about exporting it for some time, as it can be quite dangerous and you might easily corrupt the client-server communication.
As Ulf mentioned during his IPC talk, I recently pushed a mysqlnd branch to launchpad which adds a userspace function to mysqli that returns a PHP stream for a connection. Using that stream you can now send your own requests to the server and wait for the response. That might be nice in a way, but I guess you most likely won't have a use for it. PHP streams allow you to do more: they give you the possibility to add filters to a stream. These filters allow you to intercept packets which are sent or received, read them, change them or do whatever you like. A very simple filter can be found on the launchpad site mentioned above. That filter simply prints the information after replacing unprintable (binary) characters with dots.
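To give an idea of what such a filter looks like, here is a minimal sketch built on PHP's regular php_user_filter API. It is not the filter from the branch: the class name and the filter name demo.printable are made up, and because mysqli_conn_to_stream() only exists in the experimental branch, the sketch attaches the filter to a php://memory stream instead of a MySQL connection.

```php
<?php
// Hypothetical filter for illustration; not the one shipped with the branch.
class printable_filter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            // Replace every byte outside printable ASCII with a dot
            $bucket->data = preg_replace('/[^\x20-\x7E]/', '.', $bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('demo.printable', 'printable_filter');

// Stand-in for the connection stream: an in-memory stream with binary data
$stream = fopen('php://memory', 'w+');
fwrite($stream, "\x03SELECT 1\x00");
rewind($stream);

stream_filter_append($stream, 'demo.printable', STREAM_FILTER_READ);
$filtered = stream_get_contents($stream);
fclose($stream);

echo $filtered, "\n";
```

With the branch, the stream returned for the connection would presumably take the place of the php://memory handle.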
Once again: Just a small step, the next one is to decode the MySQL protocol. For that I've written a simple decoder for the MySQL protocol, not complete, but enough to give an idea. The script, including the decoder and some sample code using it, is, as a sample, part of the branch. When running you will get some output like
Query:
-> 0 59: QUERY: SELECT TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES LIMIT 1
<- 1 1: DATA
<- 2 52: FIELD INFO
<- 3 5: EOF
<- 4 19: DATA
<- 5 5: EOF
Invalid Query:
-> 0 29: QUERY: ghdfjtgfdrs tztr ttgdszthtdr
<- 1 183: ERROR: 2000You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ghdfjtgfdrs tztr ttgdszthtdr' at line 1
Prepare:
-> 0 61: PREPARE: SELECT TABLE_NAME FROM INFORMATION_SCHEMA.STATISTICS LIMIT 2
<- 1 12: OK
<- 2 52: FIELD INFO
<- 3 5: EOF
Ping:
-> 0 1: PING
<- 1 7: OK
Execute:
-> 0 11: EXECUTE
<- 1 1: DATA
<- 2 52: FIELD INFO
<- 3 5: EOF
<- 4 15: OK
<- 5 15: OK
<- 6 5: EOF
EOF
-> 0 5: CLOSE_STMT
-> 0 1: QUIT
As one can see: The protocol isn't fully decoded yet so this all might be extended but for me it served the purpose well enough. For making real use out of this we're thinking about exporting the protocol decoder which exists within mysqlnd to PHP userland.
What are your ideas for such a feature? - Sending different queries to different servers? Rewriting queries? Sharding? Replication? Easy scaling of your application while refactoring your application? Let us know!
Oct 27: Namespaces, decisions, wasting time

Thank you!
Oct 24: International PHP Conference 2008

On Thursday morning I'll give a presentation about PHP 5.3, which will be quite interesting as one of the biggest features, namespaces, is still undergoing heavy discussions and the final syntax probably won't be clear when presenting - fortunately PHP 5.3 is much more than namespaces!
Sun will also be present at the conference. If you're looking for an open source PHP IDE, you might talk to Petr and Wen about the upcoming NetBeans 6.5 release, which will feature PHP support. If you're running a startup company, you might talk to Stefan Schneider, who will represent the Startup Essential Program, which has interesting discounts on Sun products.
Unfortunately I won't be able to attend Brian Aker's keynote about Drizzle or Ulf's session about new and hot stuff in mysqlnd, one of the new features in PHP 5.3.
Oct 12: DTraceing around

Over the past few weeks I annoyed the people around me by praising DTrace whenever possible. Yesterday, during a break at the Barcamp Munich, I gave Wolfram a short introduction on his Mac and decided to put some stuff here:
DTrace is a toolkit available on Solaris (Solaris 10 or OpenSolaris), recent MacOS versions and FreeBSD. It is far mightier than tools like truss or strace but has way less impact. DTrace lets you hook into the system at so-called "probes" and then do some analysis there.
I guess all that works best by showing an example first: PHP uses a wrapper over the system's memory allocation using a function called _emalloc (which is wrapped by a CPP macro called emalloc) so it might be interesting to see how often that function is being called. For doing that we can use a D-script (D being the DTrace scripting language, not DigitalMars's D) like that:
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
{
    printf("_emalloc was called!\n");
}
We can now simply call that script and tell DTrace to start a PHP interpreter and run a PHP script. DTrace will then change the running program in memory so that the message is printed whenever the system for the process, with the PID $target, enters the function _emalloc. $target is a special variable referring to a process started by DTrace using -c or a PID provided using -p.
$ ./script1.d -c "php script.php"
_emalloc was called!
_emalloc was called!
_emalloc was called!
...
That's nice but not really useful in any way, yet. We'd like to at least know the size of the allocated memory area, which is the first parameter to _emalloc. The pid provider helps us by providing the function's parameters as D variables, so we can simply change our action to print that variable:
printf("_emalloc called, allocating %i bytes\n", arg0);
Running the script now gives us the sizes:
$ ./script2.d -c "php script.php"
_emalloc was called, allocating 5 bytes
_emalloc was called, allocating 6 bytes
_emalloc was called, allocating 5 bytes
...
The output is quite long and still rather useless. To make use of this information we need at least some aggregation, but DTrace helps there, too, so let's create an aggregation variable collecting the data in a usable way:
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
{
    @mallocsize["emalloc"] = quantize(arg0);
}
mallocsize and emalloc are freely chosen identifiers. Depending on your script the output now looks something like the following:
  emalloc
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |                                         83
               2 |@@                                       1122
               4 |@@@@@@@@                                 5141
               8 |@@@@@@                                   4032
              16 |@@@@@@@@@@@@@@@@@@                       11881
              32 |@@@@@@                                   3694
              64 |@                                        806
             128 |                                         27
             256 |                                         66
             512 |                                         1
            1024 |                                         1
            2048 |                                         1
            4096 |                                         1
            8192 |                                         4
           16384 |                                         0
           32768 |                                         1
           65536 |                                         0
          131072 |                                         1
          262144 |                                         0
This tells us that the most common allocation size is between 16 and 31 bytes and that the largest allocation is somewhere between 131072 and 262143 bytes. (Each row counts the values from its label up to, but not including, the next row's label.)
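When reading such a histogram it helps to remember that each row label is the lower bound of a power-of-two bucket. The following PHP sketch only mimics that labelling for non-negative values, to show which row a given allocation size lands in; quantize_row() is a made-up helper and not how DTrace implements it.

```php
<?php
// Hypothetical helper: returns the quantize() row label a non-negative
// value would be counted under (the largest power of two <= $value).
function quantize_row(int $value): int
{
    if ($value < 1) {
        return 0;
    }
    $row = 1;
    while ($row * 2 <= $value) {
        $row *= 2;
    }
    return $row;
}

foreach ([5, 16, 31, 32, 261900] as $size) {
    printf("%6d bytes -> row %d\n", $size, quantize_row($size));
}
```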
For a deeper analysis we can now add a predicate to our probe so the action triggers only for that large allocation. Predicates are written between slashes, between the probe name and the action. Additionally I'm adding a ustack() call to the action; this prints the userspace backtrace, which is at the C level, not PHP space.
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
/ arg0 > 131072 /
{
    printf("_emalloc(%i)\n", arg0);
    ustack();
}
$ ./script4.d -c "php script.php"
emalloc(261900)
              php`_emalloc
              php`zend_vm_stack_new_page+0x19
              php`zend_vm_stack_init+0xf
              php`init_executor+0xf5
              php`zend_activate+0x12
              php`php_request_startup+0x7a
              php`main+0xd86
              php`_start+0x7d
So we see we're in the startup of PHP, allocating some space on its stack. One question now might be about the cost of an _emalloc call; one important factor there is syscalls to the operating system. As DTrace is made for observing the whole system, that can be done quite easily using the syscall provider. We might now use syscall:::entry as the probe, to be triggered on every call, but that would be quite a lot. As we're only interested in syscalls from _emalloc, we'll use a thread-local variable as a flag and check that flag in the predicate:
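#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Set a thread-local flag while inside a large _emalloc call */
pid$target::_emalloc:entry
/ arg0 > 131072 /
{
    self->inemalloc = 1;
}

/* In a return probe arg0 is the offset of the return instruction, not the
   function's first argument, so test the flag instead of the size */
pid$target::_emalloc:return
/ self->inemalloc /
{
    self->inemalloc = 0;
}

/* Print the name of every syscall made while the flag is set */
syscall:::entry
/ self->inemalloc /
{
    printf("%s\n", probefunc);
}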
$ ./script4.d -c "php script.php"
brk
brk
So we're calling brk twice. brk is the syscall to "change the amount of space allocated for the calling process's data segment", which is exactly what we expect, but why is it called twice? Adding a ustack() call to the syscall action can tell us where it happens; using the source, this can then probably be optimized. That's left as an exercise for the interested reader.
In summary: no need to change the code, and lots of information. I plan to write an additional article showing how to get interesting facts system-wide, not only for a specific process but for all running ones, which is especially interesting when searching for a problem on production systems (DTrace is made to be used on production systems!) or problems related to concurrent processes/threads.
Aug 2: PHP 5.3 reached its first major milestone

As most of you might have seen, we recently announced the first alpha of PHP 5.3.0. The major changes are listed in the announcement. Source tarballs can be found on PHP's downloads site. As it's the first release I packaged, I'm especially interested in feedback on whether it works or not. But we didn't only have changes on the source builds: a new team took over the creation of Windows builds. As they also did a major update to the Windows build architecture (supporting newer compiler versions on Windows and adding experimental support for 64-bit platforms), the build process got a bit delayed, but builds should be available soon.
Remember: it's a release we packaged to get feedback from our users. So please test it. If you find issues, please report them - the sooner we know about issues, the sooner they can be addressed. Although our test coverage has increased, we can't cover all cases in our tests, so please test it now and don't wait for the stable release. Once the stable release is out, your edge case might trigger a bug on a production system; bugs reported now can be fixed before we release a final version.
If you want to give back to the PHP project - hey, you probably make a living out of software you get for free - and can't fix bugs or assist in a similar way, you could probably help our documentation team with improving the documentation. There are still a few features without proper documentation.
Mar 25: PHP 5.3: Up to 30% performance win

Dmitry posted the results of a performance test comparing PHP 5.2 and 5.3 to internals, and the numbers are impressive:
- Drupal 20% faster
- Qdig 2% faster
- typo3 30% faster
- wordpress 15% faster
- xoops 10% faster
Up to 30% performance win by simply updating PHP! Please help us with the quality assurance and test your applications using a 5.3 snapshot - we can't cover every usecase, you can!
And again as a sidenote: 5.2.6RC2 is out, please test that, too
Mar 18: PHP Unconference, Hamburg

Short notice: I plan to attend the PHP Unconference at Hamburg, April 26th/27th. I also plan to offer two topics there:
- An introduction to .phpt testing as preparation for the to-be-announced TestFest at php.net and
- some ranting and Q&A session in the hope to destroy some FUD and myths about PHP.
Jan 7: Importing PHP into git

