Entries tagged as php
Oct 1: Destructing PHP

Last year I already gave a session at the fabulous FrOSCon conference about "Destructing PHP". No, this wasn't a flaming/trolling talk but an attempt to teach a bit about a lesser-known language feature of PHP: deterministic destructors. To explain this, let's take a look at a simple example:
<?php
require('autoload.php');
function foo($db, $data) {
$db->beginTransaction();
foreach ($data as $row) {
$db->insert($row);
}
$db->commit();
}
$db = db::factory();
$data = data::factory();
try {
foo($db, $data);
} catch (Exception $e) {}
$db->insert(data::finalData()); ?>
Even though the syntax is correct, this program is incorrect: it is not exception safe. If an exception is thrown, the transaction state is leaked. An exception might, for instance, be thrown by the $db->insert() call or, if $data is an iterator, by the iteration.
According to the program, data::finalData() should be stored regardless of whether foo() succeeds or not. Unfortunately this isn't the case: if something in the loop inside foo() throws, there will be an open transaction, and the final data becomes part of that transaction. As there is no further handling, PHP will clean up at the end of the program and automatically roll back the transaction, discarding the final data along with it. So let's fix this. A typical solution looks something like this:
function foo($db, $data) {
$db->beginTransaction();
try {
foreach ($data as $row) {
$db->insert($row);
}
} catch(\Exception $e) {
$db->rollback();
throw $e;
}
$db->commit();
}
So we catch the exception, and in the error case we roll back and re-throw the exception. Now this program is correct, but admit it, this is pretty verbose and ugly.
The form I would like to see is this:
function foo($db, $data) {
$transaction = new Transaction($db);
foreach ($data as $row) {
$db->insert($row);
}
$transaction->commit();
}
Now this is correct and exception safe, while being clean and free of noise. You might have to look closely to see the difference from the initial version: we simply introduced a transaction object. The reason this works is that PHP's memory management is based on reference counting. With reference counting (which I explained in more detail in a recorded talk) PHP keeps track of how many variables refer to an object, and when the last reference is gone, the object is cleaned up and its destructor is called. PHP is also function scoped, which means that when a function ends, whether by reaching its end, by a return statement or by an exception, the variables of that function are cleaned up. In the code above we have exactly one reference to the transaction object, so at the end of the function it will be cleaned up. This is massively different from garbage-collected languages like Java or Go, where objects are cleaned up at more or less arbitrary times. In PHP this is deterministic. The only case where PHP falls back on garbage collection is when you have cyclic references. As long as you don't have cycles, you can figure out exactly when a destructor will be called by reading the code; admittedly, if you pass an object around a lot and store it in multiple places, this can get complicated.
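To make the destruction timing tangible, here is a small sketch. The Tracer class and the leaveByException() function are hypothetical, made up for this illustration; apart from plain language features it only uses gc_collect_cycles(), which triggers PHP's cycle collector. It shows a destructor running while an exception unwinds a function, and a self-referencing object whose destructor only runs once the cycle collector kicks in.

```php
<?php
// Hypothetical helper class: records construction and destruction in a log.
class Tracer {
    public static $log = [];
    private $name;

    public function __construct($name) {
        $this->name = $name;
        self::$log[] = "construct {$this->name}";
    }

    public function __destruct() {
        self::$log[] = "destruct {$this->name}";
    }
}

// Hypothetical function whose only local reference dies when the exception leaves it.
function leaveByException() {
    $t = new Tracer("local");
    throw new RuntimeException("boom");
}

try {
    leaveByException();
} catch (RuntimeException $e) {
    // By now the destructor of "local" has already run during unwinding.
    Tracer::$log[] = "caught";
}

// A cycle: reference counting alone cannot free this object graph.
$a = new stdClass();
$a->tracer = new Tracer("cycle");
$a->self = $a;          // the object now refers to itself
unset($a);              // refcount stays above zero, so no destructor yet
Tracer::$log[] = "after unset";
gc_collect_cycles();    // the cycle collector finds the garbage and destructs it
```

Reading Tracer::$log afterwards gives the order construct, destruct, caught for the first part, while "destruct cycle" only appears after "after unset", once gc_collect_cycles() has run.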
Now let's take a look at the implementation of our Transaction class:
class Transaction {
    private $db;
    private $success = false;
    function __construct($db) {
        $this->db = $db;
        $db->beginTransaction();
    }
    function commit() {
        $this->db->commit();
        $this->success = true;
    }
    function __destruct() {
        // No explicit commit happened before the object died: undo the transaction.
        if (!$this->success) {
            $this->db->rollback();
        }
    }
}
The key here is that we track the state: if the destructor is called without an explicit commit beforehand, a rollback is enforced.
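A quick way to convince yourself that the Transaction class behaves as described is to run it against a fake database. In this sketch FakeDb is a hypothetical stand-in that merely records which methods were called; the Transaction class is repeated (calling beginTransaction(), as the first foo() example does) so the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for a database handle: it only records calls.
class FakeDb {
    public $calls = [];
    public function beginTransaction() { $this->calls[] = "begin"; }
    public function insert($row) {
        if ($row === null) {
            throw new InvalidArgumentException("cannot insert null");
        }
        $this->calls[] = "insert $row";
    }
    public function commit() { $this->calls[] = "commit"; }
    public function rollback() { $this->calls[] = "rollback"; }
}

// The RAII-style transaction guard, repeated so the sketch is self-contained.
class Transaction {
    private $db;
    private $success = false;
    public function __construct($db) {
        $this->db = $db;
        $db->beginTransaction();
    }
    public function commit() {
        $this->db->commit();
        $this->success = true;
    }
    public function __destruct() {
        if (!$this->success) {
            $this->db->rollback();
        }
    }
}

function foo($db, $data) {
    $transaction = new Transaction($db);
    foreach ($data as $row) {
        $db->insert($row);
    }
    $transaction->commit();
}

$ok = new FakeDb();
foo($ok, [1, 2]);

$failing = new FakeDb();
try {
    foo($failing, [1, null]); // the second insert throws
} catch (InvalidArgumentException $e) {
    // The rollback already happened while foo() was unwinding.
}
```

The successful run ends in a commit, the failing run in a rollback, and foo() itself contains no try/catch at all.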
Now I have to admit: this pattern is not my invention. It's a common pattern in C++, one of the very few other languages with deterministic destructors. C++'s father Bjarne Stroustrup introduced the name RAII for it: Resource Acquisition Is Initialisation. So whenever one acquires a resource, in our example a database transaction, one also initialises an object whose lifetime controls the resource's lifetime. The critical part is not to pass this object around without thought. Using this pattern needs some training initially, but once you are used to it, it is a very good way to write exception-safe code in a clean way.
Now, for fun, in my talk I showed another trick you can play with deterministic destructors: ensuring that a return value is actually used. Let's assume you have a very expensive function that calculates a value, and you want to ensure that nobody refactors the code in a way that ignores the return value. Thus
echo expensiveCalculation();
should work, while
$a = expensiveCalculation(); unset($a);
should report an error. To achieve this, our expensiveCalculation() function won't return the value directly but wrap it in an EnforceUsage object, which might be defined like this:
class EnforceUsage {
private $value;
private $used = false;
function __construct($value) {
$this->value = $value;
}
function __toString() {
$this->used = true;
return (string)$this->value;
}
function __destruct() {
if (!$this->used) {
Logger::notice("Return value not used");
}
}
}
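To see both cases in action, here is a self-contained sketch. Since Logger is not defined in the snippet above, this variant records the complaint in a static array instead, and expensiveCalculation() is a hypothetical function that simply wraps the number 42.

```php
<?php
// Variant of EnforceUsage that collects notices in a static array
// instead of calling the (undefined here) Logger class.
class EnforceUsage {
    public static $notices = [];
    private $value;
    private $used = false;

    public function __construct($value) {
        $this->value = $value;
    }

    public function __toString() {
        $this->used = true;
        return (string)$this->value;
    }

    public function __destruct() {
        if (!$this->used) {
            self::$notices[] = "Return value not used";
        }
    }
}

// Hypothetical expensive function: wraps its result instead of returning it directly.
function expensiveCalculation() {
    return new EnforceUsage(42);
}

ob_start();
echo expensiveCalculation();   // __toString() marks the value as used
$printed = ob_get_clean();

$a = expensiveCalculation();
unset($a);                     // destroyed without being used: a notice is recorded here
```
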
I admit that, unlike the RAII pattern from above, this is hardly useful in PHP, but it shows the power we have in our hands.
For completeness, here are the slides of the talk I mentioned at the beginning:
Sep 3: Types in PHP and MySQL


Since PHP 7.0 was released, scalar types have received more attention. Keeping types for data from within your application is relatively simple. But when talking to external systems, like a database, things aren't always as one might initially expect.
For MySQL the type we see is, as a first approximation, defined by the network protocol. The MySQL network protocol by default converts all data into strings. So if we fetch an integer from the database and use PHP 7's typing feature, we get an error:
<?php
declare(strict_types=1);

function getInteger() : int {
    $mysqli = new mysqli(...);
    return $mysqli->query("SELECT 1")->fetch_row()[0];
}
var_dump(getInteger());
?>
Fatal error: Uncaught TypeError: Return value of getInteger() must be of the type integer, string returned in t.php:6
Of course the solution is easy: either we cast the value ourselves, or we disable strict mode and let PHP convert it for us.
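As a sketch of the first option (casting ourselves), assuming we'd rather fail loudly than have a plain (int) cast silently turn a malformed value like "abc" into 0, the hypothetical helper toInt() validates with filter_var() before returning an int. No database is involved here; the string "1" stands in for what fetch_row() returns on a direct query.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: convert a string from the text protocol into an int, or fail.
function toInt(string $value): int {
    // FILTER_VALIDATE_INT returns the int, or false if $value isn't an integer string.
    $int = filter_var($value, FILTER_VALIDATE_INT);
    if ($int === false) {
        throw new UnexpectedValueException("Not an integer: " . $value);
    }
    return $int;
}

function getInteger(): int {
    $fetched = "1"; // stand-in for $mysqli->query("SELECT 1")->fetch_row()[0]
    return toInt($fetched);
}

var_dump(getInteger()); // int(1)
```
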
Now let's take a look at another case. Assume we have an application where we fetch an integer ID from the database. We know MySQL will send us a string, and we treat the ID as opaque data anyway, so we declare the return type as string. Now we refactor the code slightly and make use of prepared statements. What will the result be?
<?php
declare(strict_types=1);

function getId() : string {
    $mysqli = new mysqli(...);
    $stmt = $mysqli->prepare("SELECT 1");
    $stmt->execute();
    return $stmt->get_result()->fetch_row()[0];
}
var_dump(getId());
?>
Fatal error: Uncaught TypeError: Return value of getId() must be of the type string, integer returned in t.php:8
Wait! What's going on here!? Didn't I just say that the MySQL protocol always sends strings, so we retrieve a string in PHP!? Yes, I did, and that's true for "direct queries." It's not true for results from prepared statements. With prepared statements the MySQL protocol uses a binary encoding of the data, and therefore mysqlnd and mysqli try to find the matching PHP type. This isn't always possible, especially when we get into the range of big values. So let's query for PHP_INT_MAX and PHP_INT_MAX+1 and look at the types:
<?php
$mysqli = new mysqli(...);
$stmt = $mysqli->prepare("SELECT 9223372036854775807, 9223372036854775808");
$stmt->execute();
var_dump($stmt->get_result()->fetch_row());
?>
array(2) {
  [0]=>
  int(9223372036854775807)
  [1]=>
  string(19) "9223372036854775808"
}
Here 9223372036854775807 is the largest value a PHP integer can represent (on a 64-bit system) and thus is returned as an integer. 9223372036854775808, however, is too large to fit in a signed 64-bit integer, so it is converted to a string, as this keeps all the information and can be handled at least to some degree.
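Why a string rather than a float for such values? This short sketch, assuming a 64-bit PHP build where PHP_INT_MAX is 9223372036854775807, shows that a double cannot tell PHP_INT_MAX and PHP_INT_MAX+1 apart, while the string keeps every digit.

```php
<?php
// Assumes a 64-bit build: PHP_INT_MAX === 9223372036854775807.
$max = PHP_INT_MAX;
$next = PHP_INT_MAX + 1;   // integer overflow: PHP silently switches to float

var_dump(is_int($max));    // bool(true)
var_dump(is_float($next)); // bool(true)

// As doubles, both neighbours round to the same value (2^63).
var_dump((float)"9223372036854775807" === (float)"9223372036854775808"); // bool(true)

// As a string, the exact digits survive.
$asString = "9223372036854775808";
var_dump(strlen($asString)); // int(19)
```
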
Similar things happen to other types which can't be properly represented in PHP:
<?php
$mysqli = new mysqli(...);
$stmt = $mysqli->prepare("SELECT 1.23");
$stmt->execute();
var_dump($stmt->get_result()->fetch_row());
?>
array(1) {
  [0]=>
  string(4) "1.23"
}
Yay, yet another WTF! So what is going on this time? Well, in SQL a literal with a decimal point, like 1.23, is treated as DECIMAL. A DECIMAL value is supposed to be exact. If it were converted into a PHP float (aka double), we would probably lose precision, so treating it as a string again makes sure we're not losing information. If we had a FLOAT or DOUBLE field, this could safely be represented as a float in PHP:
<?php
$mysqli = new mysqli(...);
$stmt = $mysqli->prepare("SELECT RAND()");
$stmt->execute();
var_dump($stmt->get_result()->fetch_row());
?>
array(1) {
  [0]=>
  float(0.16519711461402206)
}
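The precision argument for DECIMAL can be checked without a database. In this sketch the string "0.12345678901234567890" is an assumed example of a high-precision DECIMAL value as it would arrive as a string; converting it to a float keeps only about 17 significant digits.

```php
<?php
// Binary doubles can't represent most decimal fractions exactly.
$sum = 0.1 + 0.2;
var_dump($sum == 0.3); // bool(false)

// Assumed example of a high-precision DECIMAL value delivered as a string.
$exact = "0.12345678901234567890";
$asFloat = (float)$exact;

// Formatting the float back with 20 decimals does not reproduce the original digits.
var_dump(sprintf("%.20f", $asFloat) === $exact); // bool(false)
```
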
So to summarize:
- For a direct query the MySQL server sends strings, and PHP returns all data as strings
- For prepared statements MySQL sends data in binary form, and PHP uses a corresponding type
- If a value can only be represented in PHP with potential data loss, PHP converts it to a string, even with prepared statements
Now we might expect the same when using PDO. Let's check:
<?php
$pdo = new PDO("mysql:host=localhost", "...", "...");
$stmt = $pdo->prepare("SELECT 1, RAND()");
$stmt->execute();
var_dump($stmt->fetch(PDO::FETCH_NUM));
?>
array(2) {
  [0]=>
  string(1) "1"
  [1]=>
  string(18) "0.3217373297752229"
}
This example uses prepared statements but returns strings!? The reason is that PDO by default doesn't use prepared statements on the network layer but an emulation within PHP. This means PHP replaces potential placeholders and then runs a direct query. As mentioned above, with a direct query the MySQL server sends strings, so PHP represents all data as strings. However, we can easily ask PDO to disable the emulation:
<?php
$pdo = new PDO("mysql:host=localhost", "...", "...");
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$stmt = $pdo->prepare("SELECT 1, RAND()");
$stmt->execute();
var_dump($stmt->fetch(PDO::FETCH_NUM));
?>
array(2) {
  [0]=>
  int(1)
  [1]=>
  float(0.24252333421495)
}
This leaves the question of whether you should disable the emulation in order to get the correct types. Doing this has some impact on performance characteristics: with native prepared statements there is one client-server round trip for the prepare and another for the execute; with emulation there is only one, during the execute. Native prepared statements also require some server resources to store the handle. However, if a single statement is executed multiple times, there might be some savings. Also, the type representation means that different type conversions happen and a different amount of data is transferred. For most cases this shouldn't have a notable impact, but in the end only a benchmark will tell.
Hope this helps to give a better understanding, or more confusion.
Feb 23: More on references

In a few different places I saw comments on my last blog post about references and performance, where commenters noted that my example was pointless. Which, of course, is true and, to some degree, the point.
I read a lot of PHP code, and from time to time I see people with a non-PHP background (or otherwise influenced) putting references everywhere they pass arrays or the like, in order to prevent copies. I knew this was bad practice in PHP 5 and wanted to verify this in PHP 7. Readers with a stronger PHP background don't think of this, so their comments are like "what if I want to modify the data?", which might lead to something like this:
function modify(&$data) {
    $data["foo"] = "bar";
}
$data = [ /* huuuuuge array */ ];
modify($data);
In this code, from a performance perspective, the reference likely works out and this is "fast." My primary criticism here would be that references aren't idiomatic in PHP. Therefore most people reading this code wouldn't expect $data to be changed by this function call. Luckily the name of the function gives this away, to some degree. The more idiomatic way might be along these lines:
function modify($data) {
    $data["foo"] = "bar";
    return $data;
}
$data = [ /* huuuuuge array */ ];
$data = modify($data);
I consider this more readable and clearer, although it creates a (temporary) copy, leading to more CPU and peak memory load. Now we have to decide how much clarity we are willing to sacrifice for a performance gain. Suppose we decided to go for the approach with references. Later we fix an issue or add a new feature, make a slight change, and suddenly lose what we gained before. Maybe we do something like this:
function modify(&$data) {
    if (!in_array("bar", $data)) { // A copy happens here
        $data["foo1"] = "bar";
    }
    if (!in_array("baz", $data)) { // Yet another copy here
        $data["foo2"] = "baz";
    }
}
$data = [ /* huuuuuge array */ ];
$data2 = $data;
modify($data); // A copy happens here, to split $data and $data2
So the performance gain we once carefully achieved backfired massively, and we even ended up with three copies. In this short example this is quite obvious, but in a larger application with real-life changes, tracking this is really hard.
If we had written this in the (in my opinion) more idiomatic way, it would look like this:
function modify($data) {
    if (!in_array("bar", $data)) {
        $data["foo1"] = "bar"; // Maybe a copy here
    }
    if (!in_array("baz", $data)) {
        $data["foo2"] = "baz"; // Maybe a copy here, but only if not copied above already
    }
    return $data;
}
$data = [ /* huuuuuge array */ ];
$data2 = $data;
$data = modify($data);
So depending on the conditions, we might end up with either no copy or at most one copy, compared to the three copies from above. Of course this example is constructed, but the point is: if you use references for performance, you have to be extremely careful, know exactly what you're doing and think about every future modification.
Now let's take a step back and think a bit more about this code. Isn't there yet another way? We have data and we have functions operating on it. Wasn't there another construct we might use? Yes, we could go object-oriented!
class DataManager {
    private $data;
    public function __construct() {
        $this->data = [ /* huuuuuge array */ ];
    }
    public function modify() {
        if (!in_array("bar", $this->data)) {
            $this->data["foo1"] = "bar";
        }
        if (!in_array("baz", $this->data)) {
            $this->data["foo2"] = "baz";
        }
    }
}
$dm = new DataManager();
$dm2 = $dm;
$dm->modify();
Suddenly we have a higher degree of abstraction, encapsulation and all those other OO benefits, and no copy of the data at all. OK, yes, I cheated: I forgot about the purpose of the $dm2 = $dm assignment. So maybe we need to clone there and create an explicit copy. (Then again, the $data property would probably benefit from copy-on-write, making even the cloning quite cheap.)
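Here is a sketch of the clone variant. To keep it short, this DataManager takes its (tiny, made-up) array through the constructor instead of building a huge one, and getData() is an added accessor so the result can be inspected.

```php
<?php
class DataManager {
    private $data;
    public function __construct(array $data) {
        $this->data = $data;
    }
    public function modify() {
        if (!in_array("bar", $this->data)) {
            $this->data["foo1"] = "bar";
        }
        if (!in_array("baz", $this->data)) {
            $this->data["foo2"] = "baz";
        }
    }
    // Added accessor for inspecting the state.
    public function getData() {
        return $this->data;
    }
}

$dm = new DataManager(["x" => "y"]);
$dm2 = clone $dm;  // explicit copy of the object; the inner array is shared (copy-on-write)
$dm->modify();     // the array is only duplicated here, on the first write
```

After modify(), $dm holds the extended array while $dm2 still holds the original one.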
In summary: yes, when careful you can be slightly more performant in both CPU and memory usage, but in real life that gain is often lost again and eventually backfires in maintenance cost and performance loss.
Now, aren't there cases where references might be a good thing? The only reason I found recently (except for an extremely carefully crafted tree structure I've seen, for which I'd usually suggest an OO approach) is around anonymous functions/closures. Take this example:
$data = [ /* ... */ ];
$oldsum = 0;
$doubled = array_map(function ($element) use (&$oldsum) {
    $oldsum += $element;
    return $element * 2;
}, $data);
Again, the example in itself might be bad, but in a context like this, where we provide a closure as a callback and want to keep some "trivial" state, references are an acceptable approach. If the state we want to keep becomes more complex than a counter, however, it might be worthwhile to use an object to hold it or to find some other code structure.
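One way the object-based alternative might look: RunningTotal is a hypothetical class holding the state. Because objects are passed around as handles, the closure can update it through a plain use clause without any &, and adding more fields later doesn't change the callback.

```php
<?php
// Hypothetical state holder for the callback.
final class RunningTotal {
    public $sum = 0;
    public $count = 0;
}

$data = [1, 2, 3, 4];
$total = new RunningTotal();

// The object handle is captured by value; its properties are shared.
$doubled = array_map(function ($element) use ($total) {
    $total->sum += $element;
    $total->count++;
    return $element * 2;
}, $data);
```
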
Feb 18: References - Still bad in PHP 7

I'm known for saying "Don't use references" (also on video), as they cause various problems (e.g. with foreach) and hurt performance. The reason for the performance loss is that references disable copy-on-write, while most places in PHP assume copy-on-write. Meanwhile we have PHP 7. In PHP 7 the internal variable handling changed a lot; among other things, the reference counting moved from the zval, the container representing a variable, to the actual element. So I decided to run a little test to verify that my performance assumption is still valid.
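As a quick aside, the foreach problem mentioned above is easy to reproduce. In this minimal sketch a by-reference loop leaves $value bound to the last array element, so a later by-value loop over the same array silently overwrites that element; calling unset() after the first loop breaks the reference.

```php
<?php
$values = [1, 2, 3];
foreach ($values as &$value) {
    // by-reference loop; afterwards $value still references $values[2]
}
foreach ($values as $value) {
    // each assignment to $value now writes into $values[2]
}
// $values is now [1, 2, 2]: the last element was silently overwritten

$fixed = [1, 2, 3];
foreach ($fixed as &$item) {
    // same by-reference loop
}
unset($item); // break the reference before reusing the variable
foreach ($fixed as $item) {
    // plain by-value loop, no side effects on $fixed
}
// $fixed is still [1, 2, 3]
```
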
In my test code I'm calling, from a loop, a function which calls strlen (one of the cheapest functions in PHP: PHP strings carry their length, so it simply returns that property). In one variant the function takes its parameter by reference, in the other by value. Here's the code:
<?php
function takeRef(&$string) {
    strlen($string);
}
function takeVal($string) {
    strlen($string);
}
function loopRef() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeRef($s);
    }
    return microtime(true) - $start;
}
function loopVal() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeVal($s);
    }
    return microtime(true) - $start;
}
// Take the fastest of ten runs for each variant.
$ref = $val = PHP_INT_MAX;
for ($i = 0; $i < 10; ++$i) {
    $ref = min($ref, loopRef());
    $val = min($val, loopVal());
}
echo "Ref: $ref\nVal: $val\n";
?>
If I run this in PHP 5, in order to have a baseline, I get this result:
Ref: 10.679290056229
Val: 9.5635061264038
So using a reference costs about 10% of throughput.
Now let's try PHP 7:
Ref: 10.631688117981
Val: 9.0047070980072
Overall we see a small performance improvement, as we'd expect with PHP 7, but using a reference still costs throughput; in this run even about 15%. So I still stand by my mantra: Don't use references in PHP!
If you wonder about the second loop at the bottom and the min() call: the code takes multiple samples and then uses the measurement with the least noise. My system isn't 100% idle and there might be unrelated events I don't want to measure, so the fastest run is closest to raw system performance.
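The same sampling idea can be packaged as a reusable helper. bestOf() is a hypothetical name, not part of PHP; it runs a callable several times and keeps the shortest wall-clock time, on the assumption that noise only ever adds time.

```php
<?php
// Hypothetical helper: run $benchmark $runs times and return the fastest duration in seconds.
function bestOf(int $runs, callable $benchmark): float {
    $best = INF; // returned unchanged if $runs is 0
    for ($i = 0; $i < $runs; ++$i) {
        $start = microtime(true);
        $benchmark();
        // Unrelated system activity can only make a run slower,
        // so the minimum is the closest estimate of the code's own cost.
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

$fastest = bestOf(5, function () {
    strlen("hello world");
});
```
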
Aug 14: PHP 5.3 - Thanks for all the Fish

A few moments ago I pushed the buttons and PHP 5.3.29 came out. As this is the final release of 5.3, it is a good time to look back. PHP 5.3's history starts somewhere in 2005. We knew what a pressure point of PHP was: a language made for solving The Web Problem needs a good Unicode story. So some developers went deep into that complex area and created a prototype version of PHP with Unicode support from deep within the engine. As this was a big and pressing issue, the need was obvious and the solution looked promising, it was quickly agreed to make that the base for a future PHP 6. And then time passed, the initial enthusiasm faded and the sheer amount of work became obvious. Two years in, we noticed that the ongoing PHP 6 work blocked other work: new features couldn't be added to 5.2, the current version at that time, and adding them to CVS HEAD, which at that time was PHP 6, meant they would only ship with PHP 6.
To solve the blocking issue, we decided to create an intermediate release, packing in all the things that had piled up, so on 2007-09-26 we created the CVS branch PHP_5_3.
Branching off PHP 5.3 unleashed lots of enthusiasm, and people started creating features and getting into heated debates about the direction we should take. So I was happy when Lukas volunteered to assist in release management as Co-RM, playing a big role in making PHP 5.3, one of the most feature-rich PHP releases, a huge success, which was declared stable two years after branching off, on June 30th 2009!
In those two years of development, from branching off until releasing 5.3.0 stable, we saw 5,338 commits by 83 committers (also committing work by other contributors without direct commit access), with 10,125 files changed, 1,089,600 insertions and 270,921 deletions (including tests and generated files like parsers etc.). PHP 5.3 introduced many things that PHP developers see as normal and can hardly remember not using, like namespaces or anonymous functions. It also introduced goto, late static binding, nowdoc, ?:, exception linking, fileinfo, intl, mysqlnd, ... while also bringing a massive boost in performance. A massive release.
While trying to release 5.3.0 we noticed issues in our process. Notably, for a long time we didn't have a fixed cut-off date and couldn't promise when the next release would come. As a consequence, people tried hard to push features in, as they feared having to wait a few years for the next release. In response, a stricter release process with yearly releases etc. was created. This led to PHP 5.4 and 5.5 being almost on time and the upcoming PHP 5.6 being well on track.
Development of 5.3 didn't stop with 5.3.0, though: it saw 29 bugfix releases with 7,554 commits from 152 committers (due to the move to git in between, a single committer might be counted multiple times; on the other hand, more "external" contributors' names are kept), with 4,862 files changed, 376,187 insertions and 207,314 deletions.
On the personal side, being the release master of PHP 5.3 gave me the opportunity to travel between Moscow and California and to teach different audiences in multiple languages about the great work, which was done mostly by others. (Check the ChangeLog to see whom to thank for your favorite feature!)
But now it's time to close that chapter: as of now PHP 5.3 is no longer supported, and the different RM teams and contributors are making PHP even better than PHP 5.3 ever was, as we can see in the existing releases and in previews of future ones.
Thank You All, it was a great time!
Feb 24: On rumors of "PHP dropping MySQL"


Over the last few days, different people asked me for comments about PHP dropping MySQL support. These questions confused me, but in the meantime I figured out where these rumors come from and what they mean.
The simple facts are: No, PHP is not dropping MySQL support and we, Oracle's MySQL team, continue working with the PHP community.
For the long story, we first have to remember what "PHP's MySQL support" includes. The key parts are four extensions which are part of the main PHP tree:
- ext/mysql
- ext/mysqli
- ext/pdo_mysql
- ext/mysqlnd
The first one, ext/mysql, provides the mysql_* functions. This is the classic interface taught in many (old) books and used by lots of (old) software. mysqli is "mysql improved"; this is a younger extension providing access to all MySQL features. pdo_mysql contains the driver for PDO, PHP's database API abstraction layer. mysqlnd is the MySQL native driver; this module goes mostly unseen and provides a common set of functionality for the three other modules in order to talk to the MySQL server.
Additionally we maintain a bunch of PECL extensions plugging into mysqlnd, which I won't describe in detail; the PHP documentation has information on most of them.
Now, that's a lot of code, so let's look at what is actually happening:
The old mysql extension, ext/mysql, is old. The code goes back to the early days of PHP and MySQL and embodies some bad design decisions. For example, if no explicit connection resource is passed, all functions will try to use the last connection that was used. So take a simple example like this:
<?php
mysql_connect("mysql.example.com", "user", "password");
mysql_select_db("test");
$result = mysql_query("DELETE FROM t WHERE id = 23");
?>
This might do weird things. Let's assume the connect fails: as the error is not handled, the script continues to run and calls mysql_select_db(). This won't fail directly but will guess and try to connect to a server (most likely on localhost). If that fails, the script still won't terminate, but mysql_query() will again guess and try to connect. If everything comes together, this will suddenly work and the script will operate on a completely different database than expected, which can have really bad consequences.
But that's not all. As said, the code goes way back; it grew along with PHP and MySQL and tries to be compatible with all versions of MySQL since at least 3.23. All this makes the code hard to maintain.
When PHP 5.0, which added mysqli, came along in 2004, it was decided that maintaining this was troublesome and that we wouldn't add new features to the old extension but only to mysqli (as well as to pdo_mysql, which came along a bit later in PHP 5.1, where it makes sense). We also started to advertise these newer extensions over the old one.
So we lived on for a while, added features to mysqli and fixed a few bugs in mysql: normal operations. Over time we noticed that people still use the old extension even for new projects, which cuts them off from features (e.g. prepared statements or support for multiple result sets as needed for stored procedures), but we also knew that we can't simply deprecate and remove the extension, as it is way too commonly used. So in 2012 we started a "soft deprecation" process, which meant adding deprecation warnings to the documentation and suggesting alternatives using mysqli or PDO.
A bit later, with PHP 5.5, which was released in June 2013, it was decided to add such a deprecation notice to the code, so each time a script connects to a MySQL server using the mysql extension, a deprecation notice is triggered.
That's the state we are in, and there is no date by which the old mysql extension will be removed from PHP. It will happen at some point in the future, but certainly not in the upcoming PHP 5.6.
Why not? Because we are aware of many projects with a long history which can't simply swap this out. One of these projects is Wordpress. And this brings us to the current discussion:
Wordpress is an old project, going back to the days of PHP 4, when there was only the old mysql extension and nothing else. Wordpress also doesn't live on its own but with tons of plugins extending all kinds of features. Some of these go back equally long, many need database access, and so many make more or less direct use of ext/mysql. After quite some discussion and heated debate in different channels, the Wordpress developers have now decided to make the switch. As they are aware of the trouble this causes for the plugin ecosystem, they are careful, though: they actually allow switching between both extensions, mysql and mysqli.
As always, discussions about such major changes become heated, imprecise statements lose their context, and wrong messages circulate. So there's nothing to worry about, while I'd like to encourage all users of the old mysql extension to follow the example of Wordpress and others and make the switch.
I hope this helped to clear things up!
Oct 9: Sharding PHP with MySQL Fabric


PHP users who attended any of my recent PHP&MySQL related talks or read Ulf's blog will know our mysqlnd_ms plugin. This plugin hooks into PHP's mysqlnd library and provides transparent support for replication and load-balancing features. Without changing your application, you get transparent load balancing and read-write splitting, so all your reading queries will be sent to a slave while the writes go to a master server. The exact strategies for that can be defined in mysqlnd_ms's configuration, so quite often no, or only a few, application changes are needed. But we have one limitation: the MySQL servers have to be listed in the configuration on each of your PHP servers, which can be annoying when you're changing your environment, like adding a new slave or promoting a machine to master in case the original master fails. But there's help coming!
At this year's MySQL Connect conference we announced the initial labs release of MySQL Fabric. MySQL Fabric aims to be "an integrated environment for managing a farm of MySQL server supporting high-availability and sharding." Part of this framework is an RPC interface to query the available servers managed by MySQL Fabric, which delivers the missing piece for mysqlnd_ms.
As this release of Fabric puts the focus on sharding, this is what I want to show here, too. A general introduction to MySQL Fabric and its sharding features can be found on VN's blog, so I'll go through some areas quickly; for details please refer to the documentation and the mentioned blogs.
The first thing we need is the MySQL Utilities package with Fabric support which is available from labs.mysql.com. After installing this package you have to locate the main.cfg configuration file and configure the storage instance.
[storage]
database = fabric
user = fabric
address = localhost:3300
connection_timeout = 6
password = i do no now
This is a MySQL server where Fabric will store its configuration and such. After that we can initialize Fabric and start the daemon.
$ mysqlfabric setup
$ mysqlfabric start
The setup step creates tables in the configured database, and the start step starts the daemon process. Now we can go and configure our server groups. A server group contains a master server, to which the group's data is written, and a number of slaves to which MySQL will replicate the data. For our sample sharding setup I plan to create two shards and a global group. The purpose of the global group is to hold table definitions and data which Fabric will make available on all systems in our environment. Each of these groups will, in this example, have one master and one slave. This means we need six MySQL server instances running. These six instances should all be running MySQL 5.6. Apart from having binary logging enabled and having different server IDs, no replication configuration is needed before running these commands. In my example setup I'm running all of them on one machine; obviously that's only useful for tests:
$ mysqlfabric group create global
$ mysqlfabric group add global 127.0.0.1:3301 root secret
$ mysqlfabric group promote global
$ mysqlfabric group add global 127.0.0.1:3302 root secret
$ mysqlfabric group create shard1
$ mysqlfabric group add shard1 127.0.0.1:3303 root secret
$ mysqlfabric group promote shard1
$ mysqlfabric group add shard1 127.0.0.1:3304 root secret
$ mysqlfabric group create shard2
$ mysqlfabric group add shard2 127.0.0.1:3305 root secret
$ mysqlfabric group promote shard2
$ mysqlfabric group add shard2 127.0.0.1:3306 root secret
So this creates the three groups and configures replication between the servers as needed. With this setup the server on port 3301 will be the global master. 3302, 3303 and 3305 will be 3301's direct slaves, and 3304 will be configured as a slave of 3303, as will 3306 of 3305.
Now we define our sharding rules. I'm going to use range-based sharding with two shards. The first shard, which will be assigned to the server group shard1 created above, will hold data for shard key values 1 to 9999, and the second shard, in group shard2, will hold data for shard key values 10000 and above. We also define the table fabrictest in the test schema as our sharding table and id as the shard column.
$ mysqlfabric sharding define RANGE global
$ mysqlfabric sharding add_mapping 1 test.fabrictest id
$ mysqlfabric sharding add_shard 1 shard1 ENABLED 1
$ mysqlfabric sharding add_shard 1 shard2 ENABLED 10000
Note that for range-based sharding we don't have to define the upper bound as that is defined by the lower bound of the next shard.
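To make the lower-bound rule concrete, here is a minimal PHP sketch of a range lookup: the owning shard is the one with the largest lower bound that is still less than or equal to the shard key. The function name `findShardGroup` and the mapping array are hypothetical and only mirror the two shards defined above; this is an illustration of the idea, not Fabric's own implementation.

```php
<?php
// Hypothetical sketch of range-based shard lookup. Each shard is identified
// by its lower bound only; its upper bound is implied by the next shard.

/**
 * @param array<int, string> $lowerBounds lower bound => server group
 * @return string|null the group owning $key, or null if $key is below all bounds
 */
function findShardGroup(array $lowerBounds, int $key): ?string
{
    ksort($lowerBounds);                 // make sure bounds are ascending
    $group = null;
    foreach ($lowerBounds as $lower => $candidate) {
        if ($key < $lower) {
            break;                       // all remaining bounds are larger
        }
        $group = $candidate;             // last bound <= $key seen so far
    }
    return $group;
}

// Mirrors the add_shard calls above: shard1 starts at 1, shard2 at 10000.
$shards = [1 => 'shard1', 10000 => 'shard2'];

echo findShardGroup($shards, 10), "\n";     // shard1
echo findShardGroup($shards, 10010), "\n";  // shard2
```

Adding a third shard later only means adding one more lower bound; the ranges of the existing shards shrink automatically.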
Now we have MySQL Fabric and our MySQL servers configured and can move on to PHP. As mentioned in the beginning, we need mysqlnd_ms, to be precise the 1.6.0 alpha release, which we can install using pecl:
$ sudo pecl install mysqlnd_ms-alpha
To configure PHP we first need a mysqlnd_ms configuration file. mysqlnd_ms uses JSON, and a simple configuration using Fabric might look like this:
fabric.json:
{
  "test": {
    "fabric": {
      "hosts": [
        { "host": "localhost", "port": 8080 }
      ]
    }
  }
}
This configures the application test to use a MySQL Fabric based setup where MySQL Fabric's RPC daemon runs on the local machine. Again: we put everything on one machine for testing; this is not what one would do in a production setup.
Next we locate our system's php.ini file and enable mysqlnd_ms to use our config.
php.ini:
extension=mysqlnd_ms.so
mysqlnd_ms.enable=1
mysqlnd_ms.config_file=/path/to/fabric.json
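Before running the test it can help to sanity-check the configuration. The following is a minimal sketch that is not part of mysqlnd_ms: the function name `checkFabricConfig` is hypothetical, and the structure it expects simply mirrors the fabric.json above. The usage part reads the `mysqlnd_ms.config_file` setting via `ini_get()`, which returns false when the extension is not loaded.

```php
<?php
// Hypothetical pre-flight check: does the mysqlnd_ms configuration contain
// a Fabric section with usable hosts for the given application name?

function checkFabricConfig(string $json, string $app): array
{
    $config = json_decode($json, true);
    if (!is_array($config)) {
        return ['config is not valid JSON: ' . json_last_error_msg()];
    }
    if (!isset($config[$app])) {
        return ["no section for application '$app'"];
    }
    $problems = [];
    $hosts = $config[$app]['fabric']['hosts'] ?? [];
    if (!is_array($hosts) || $hosts === []) {
        $problems[] = "no Fabric hosts configured for '$app'";
    }
    foreach ((array) $hosts as $i => $host) {
        // Every RPC daemon entry needs both a host and a port.
        if (!isset($host['host'], $host['port'])) {
            $problems[] = "Fabric host #$i needs 'host' and 'port'";
        }
    }
    return $problems;
}

// Usage: check the file configured in php.ini, if the extension is loaded.
$file = ini_get('mysqlnd_ms.config_file');
if (!extension_loaded('mysqlnd_ms') || $file === false || !is_readable($file)) {
    echo "mysqlnd_ms not loaded or config file not readable\n";
} else {
    print_r(checkFabricConfig(file_get_contents($file), 'test'));
}
```

An empty array means the expected structure is present; it does not prove that the Fabric daemon is actually reachable.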
And now we are finally done and can run a test script.
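<?php
// "test" is not a hostname: mysqlnd_ms treats it as the application name
// from fabric.json and routes the connection via MySQL Fabric.
$c = new mysqli("test", "root", "", "test");

echo "Creating global table:\n";
// Send the next statement to the global group.
mysqlnd_ms_fabric_select_global($c, "test.fabrictest");
var_dump($c->query("CREATE TABLE fabrictest (id INT NOT NULL)"));
echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n";

echo "Inserting with ID 10:\n";
// Send the next statement to the shard responsible for shard key 10.
mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10);
var_dump($c->query("INSERT INTO fabrictest VALUES (10)"));
echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n";

echo "Trying to read id 10 from 10010:\n";
// Select the shard for key 10010, which does not hold the row with id 10.
mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10010);
$r = $c->query("SELECT * FROM fabrictest WHERE id = 10");
// query() returns false on error; only fetch from a valid result.
var_dump($r ? $r->fetch_row() : $r);
echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n";
?>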
With this script we do a few things. The first thing to observe is the hostname test: the mysqlnd_ms plugin will recognize it as an application name and refer to its configuration. Second are the mysqlnd_ms_* functions. First we pick the global group and execute a CREATE TABLE operation there. mysqlnd_ms will detect that this is a write operation and therefore connect to the global group's master. This should be 127.0.0.1:3301, which hopefully is printed by the echo call. Then we select the shard responsible for id 10 in the fabrictest table and insert data. mysqlnd_ms will, again, detect that this is a write operation and will therefore figure out where writes to that shard have to go, which is 127.0.0.1:3303. Finally we do an operation which will not really succeed: We select the servers for shard key 10010, which belongs to shard2 in our setup, and then query for id 10, the data we stored in shard1. This will query 127.0.0.1:3306 (the slave of 3305 in the shard2 group), which will return an empty result set.
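The routing decisions described above can be replayed without any servers. The following is a simplified model, not mysqlnd_ms code: it assumes that a statement counts as a read when it starts with SELECT (the plugin's real read/write detection is more involved) and that reads go to a group's slave. The group table mirrors the example setup; `isRead` and `pickServer` are hypothetical names.

```php
<?php
// Simplified model of the routing in the test script above. The group
// layout mirrors the example setup; the read/write rule is an assumption.

const GROUPS = [
    'global' => ['master' => '127.0.0.1:3301', 'slaves' => ['127.0.0.1:3302']],
    'shard1' => ['master' => '127.0.0.1:3303', 'slaves' => ['127.0.0.1:3304']],
    'shard2' => ['master' => '127.0.0.1:3305', 'slaves' => ['127.0.0.1:3306']],
];

function isRead(string $sql): bool
{
    // Treat statements starting with SELECT as reads, everything else as writes.
    return preg_match('/^\s*SELECT\b/i', $sql) === 1;
}

function pickServer(string $group, string $sql): string
{
    if (!array_key_exists($group, GROUPS)) {
        throw new InvalidArgumentException("unknown group $group");
    }
    $servers = GROUPS[$group];
    // Writes must go to the master; reads may use a slave (here: the first one).
    return isRead($sql) ? $servers['slaves'][0] : $servers['master'];
}

echo pickServer('global', 'CREATE TABLE fabrictest (id INT NOT NULL)'), "\n"; // 127.0.0.1:3301
echo pickServer('shard1', 'INSERT INTO fabrictest VALUES (10)'), "\n";        // 127.0.0.1:3303
echo pickServer('shard2', 'SELECT * FROM fabrictest WHERE id = 10'), "\n";    // 127.0.0.1:3306
```

The third call shows why the final query comes back empty: the read is routed to shard2's slave, while id 10 lives in shard1.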
I hope this is enough to get you started. You can now add shards, migrate them, or take servers down and promote current slaves to masters, and see how the system reacts.
In a future post we will combine this with Doctrine, so stay tuned.
Note: This blog post features labs/alpha releases, which aim at demonstrating functionality. They are not for production use. There might be stability issues, and there certainly are performance restrictions we're working on. However, we'd like to receive feedback.
Jun 20: PHP 5.5 is out, what's up with 5.4 and 5.3?

Yay, we finally released PHP 5.5, which is a new big release for PHP. In preparation for this, I sent out a mail to the PHP core developers yesterday stating that the "PHP-5.3 BRANCH IS CLOSED NOW". After seeing this quoted on Twitter and on various websites, I want to make a few things clear for users of PHP:
- The mail informs core developers that all changes for 5.3 now have to go through the release master and our security group
- We won't do regular bug fixes anymore
- We will continue doing security fixes for a year where needed
What this means for users of PHP is that they can continue using PHP 5.3, and when updates come they carry a very low risk of breaking anything (we always try not to break anything, but one person's bug might be another person's feature), so they should be easy to apply and applied quickly. So when you are a happy PHP 5.3 user and don't want to touch too many things, there is no immediate need to upgrade to 5.4 or 5.5 - for a year.
So if you don't have to migrate, why should you? Besides the new features, I see two major reasons:
- Newer versions of PHP are generally more performant and efficient than older versions, meaning your users get faster responses and you need fewer hosting/cloud resources to run your system.
- You get all bug fixes
So when migrating, where should you go? PHP 5.4 or 5.5? In my personal opinion the answer is quite easy: Go to 5.5! 5.5 will live longer than 5.4, so it is the more future-proof path, and as we try hard to keep backwards compatibility, migration should be fairly simple. PHP 5.5 also mostly uses the same code as 5.4, with a few extra features, so everything PHP 5.4 does, PHP 5.5 does just as stably.
So go and fetch PHP 5.5, use it for new projects and work on your migration from 5.3, but don't panic.
Apr 2: Making use of PHP mysqlnd statistics

One of the great things about mysqlnd as the base library for PHP's MySQL support is the statistics it collects. mysqlnd collects about 160 different statistical values about everything that is going on. With such an amount of raw data it is, obviously, quite hard to draw conclusions. Therefore I recently created a PHP library sitting on top of this feature which collects all the data, runs some analysis and then provides some guidance. It is available from the JSMysqlndAnalytics GitHub repo (see there also for instructions on using Composer).
Using the library is relatively simple, as the short instructions show. The library consists of two main parts. On the one side there is the "Collector", a wrapper around mysqli_get_client_stats() (even though this function is part of mysqli, it also works for applications using ext/mysql or PDO_mysql) which produces the raw statistics, which could then be stored away or similar. On the other side there is the actual Analytics engine, which compares the values to predefined rules. The current set of rules is a bit limited, so I'm looking for ideas.
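To illustrate the Collector/Analytics split, here is a minimal sketch. It is not the JSMysqlndAnalytics API: the function `analyzeClientStats`, its two rules and the 50% threshold are hypothetical. The input is shaped like the associative array returned by `mysqli_get_client_stats()`; the keys used (`no_index_used`, `bad_index_used`, `rows_buffered_from_client_normal`, `rows_fetched_from_client_normal_buffered`) are mysqlnd statistics, and their values arrive as numeric strings.

```php
<?php
// Hypothetical analysis rules on top of mysqlnd's client statistics.
// $stats is shaped like the array mysqli_get_client_stats() returns.

function analyzeClientStats(array $stats): array
{
    $n = function (string $key) use ($stats): int {
        return (int) ($stats[$key] ?? 0);   // values are numeric strings
    };
    $hints = [];

    $noIndex = $n('no_index_used') + $n('bad_index_used');
    if ($noIndex > 0) {
        $hints[] = "$noIndex queries ran without a (good) index; check EXPLAIN output";
    }

    $buffered = $n('rows_buffered_from_client_normal');
    $used = $n('rows_fetched_from_client_normal_buffered');
    // Assumed threshold: flag when less than half of the buffered rows were read.
    if ($buffered > 0 && $used / $buffered < 0.5) {
        $hints[] = "only $used of $buffered buffered rows were fetched; consider LIMIT or a tighter WHERE";
    }
    return $hints;
}

// Usage: analyze the statistics of the current process (empty if mysqlnd is absent).
$stats = function_exists('mysqli_get_client_stats') ? (mysqli_get_client_stats() ?: []) : [];
print_r(analyzeClientStats($stats));
```

Keeping collection and analysis apart means the raw array can be stored and analyzed later, or with a different rule set.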
If you're a Symfony user, life is quite easy: Some time ago I provided a Symfony bundle with a Symfony Profiling Toolbar plugin showing the statistics. This JSMysqlndBundle has been extended to make use of these Analytics. The screenshot might give a rough idea of how this looks.
Hope this helps creating better applications! Happy hacking!
Sep 30: MySQL, Memcache, PHP revised


Some time ago I already wrote about the InnoDB Memcache Daemon plugin for the MySQL server. Back then we had a labs release with only a little preview. Meanwhile quite some time has passed and new developments were made, just in time for the MySQL 5.6 RC announced this weekend by Tomas.
The innodb_memcache daemon plugin is a plugin for the MySQL Server and contains an embedded memcached. This embedded memcached is configured to use MySQL's InnoDB engine as its storage backend. This way, data stored inside an InnoDB table can be accessed using memcache's key-value protocol. Back at the time of the previous blog post this was limited to data from a single table, which maps easily to the key-value nature of memcache but is a clear limitation. The InnoDB team obviously knows that and improved it:
A user may now define multiple configurations at the same time and thereby access different tables at the same time, or the same table using different key columns as the memcache key. To access the data, the memcache key names are then prefixed with specially formatted configuration names.
Let's take a look at a simple example. Assume we have this configuration inside innodb_memcache.containers:
Name: prefix_test
schema: test
table: test
key column: id
key name: PRIMARY
value columns: lastname
We can then access this configuration using a key like this:
set @@prefix_test.1 Schlüter
get @@prefix_test.1
The first call will store my last name with id=1 in the test.test table, and the second call reads it back. For accessing multiple configurations we simply add entries to the containers list.
Of course we can still access multiple columns, as in the previous version:
Name: test_first_last
schema: test
table: test
key columns: id
key name: PRIMARY
value columns: firstname,lastname
And then we add my first name:
set @@test_first_last.1 Johannes,Schlüter
get @@test_first_last.1
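Whatever client is used, these conventions boil down to building prefixed keys and joining or splitting multi-column values. A small sketch with hypothetical helper names; the resulting strings could be passed to pecl/memcached's `Memcached::set()` and `Memcached::get()`. The comma separator is taken from the example above; the separator is configurable on the server side, so treat it as an assumption and check your configuration.

```php
<?php
// Hypothetical helpers for the InnoDB memcache conventions shown above:
// "@@<container name>.<key>" keys and separator-joined column values.

function containerKey(string $container, string $key): string
{
    return "@@{$container}.{$key}";
}

function joinColumns(array $values, string $separator = ','): string
{
    return implode($separator, $values);
}

function splitColumns(string $value, array $columns, string $separator = ','): array
{
    if ($columns === []) {
        return [];
    }
    // Split into at most one part per column.
    $parts = explode($separator, $value, count($columns));
    // Pad missing trailing values with null so every column is present.
    return array_combine($columns, array_pad($parts, count($columns), null));
}

$key = containerKey('test_first_last', '1');
$value = joinColumns(['Johannes', 'Schlüter']);
print_r(splitColumns($value, ['firstname', 'lastname']));
```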
The configurations above are, obviously, just a short introduction. For full information please check the documentation.
Now this blog entry is tagged as PHP. How does PHP come into play? Well, on the one side we have this fast memcache interface, which allows accessing almost arbitrary data from the database. On the other side we have the PHP mysqlnd plugin interface, where we can add special features, like query caching or load balancing, transparently to any PHP application. Why not combine those two things? Good question. That's what we have done in the PECL mysqlnd_memcache PHP extension. This PHP extension is a plugin to mysqlnd that intercepts queries sent to the server. In a quick analysis it tries to identify whether an SQL statement can, transparently, be converted into memcache requests. We thereby trade some computing power on the PHP server for more performance from the MySQL server. As SQL is a rather complex language and memcache is a quite limited key-value protocol, this will only work for a limited subset of common queries, though.
So let's take a look at some PHP code:
<?php
$mysqli = new mysqli("localhost", "usr", "pass", "test");
$memcache = new Memcached();
$memcache->addServer("localhost", 11211);
mysqlnd_memcache_set($mysqli, $memcache);
?>
Here we create a MySQL connection using mysqli as well as a memcache connection using the pecl/memcached extension. Instead of mysqli we could, as with any mysqlnd plugin, use ext/mysql or the MySQL PDO driver. We then associate the MySQL connection with the memcache connection. As a consequence, the mysqlnd_memcache plugin will query the MySQL server for the current memcache configuration. Subsequently it will analyse SQL queries sent to the server:
<?php
$q1 = $mysqli->query("SELECT firstname, lastname FROM test WHERE id = 1");
$q2 = $mysqli->query("SELECT * FROM test");
?>
These are two normal queries, nothing special at first sight. In case there's no error, $q1 and $q2 will hold mysqli_result instances whose rows can be read using fetch_row() or the other provided methods. But there are things going on in the background: The PHP extension will see that the first query can be translated into a memcache request and then fetch the data using this shortcut. The second call tries to read all data from the table. The memcache protocol provides no way of doing that, so this query will take the "classic" way of sending the SQL to the MySQL server.
In order to be fast and limit the overhead (mind: we have to check every query) we didn't add a full SQL parser to this plugin; instead the check is done using a regular expression which is enriched with data collected from the MySQL server. In case this regular expression causes trouble, it can be overridden when the initial association is established. There are a few other caveats in the initial 1.0.0-beta release available from PECL, so we'd love to hear from you to see what you need and how we can improve your MySQL experience!
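To give an idea of the kind of check involved, here is a simplified sketch. It is not the extension's actual regular expression: the pattern, the `$container` array layout and the function name `sqlToMemcacheKey` are hypothetical. It only recognizes a single-table SELECT of known columns with an equality condition on the key column, and uses the `@@<container>.<key>` convention from above.

```php
<?php
// Simplified sketch of mapping an SQL query onto a memcache key lookup.

function sqlToMemcacheKey(string $sql, array $container): ?string
{
    $pattern = '/^\s*SELECT\s+(?<cols>[\w\s,]+?)\s+FROM\s+(?<table>\w+)'
             . '\s+WHERE\s+(?<keycol>\w+)\s*=\s*(?<value>\d+|\'[^\']*\')\s*;?\s*$/i';
    if (!preg_match($pattern, $sql, $m)) {
        return null;   // e.g. SELECT * FROM test: needs the classic SQL path
    }
    $cols = array_map('trim', explode(',', $m['cols']));
    // Only map if table, key column and requested columns match the container.
    if (strcasecmp($m['table'], $container['table']) !== 0
        || strcasecmp($m['keycol'], $container['key_column']) !== 0
        || array_diff($cols, $container['value_columns']) !== []) {
        return null;
    }
    $value = trim($m['value'], "'");
    return '@@' . $container['name'] . '.' . $value;
}

// Mirrors the test_first_last container from the configuration above.
$container = [
    'name' => 'test_first_last',
    'table' => 'test',
    'key_column' => 'id',
    'value_columns' => ['firstname', 'lastname'],
];

var_export(sqlToMemcacheKey('SELECT firstname, lastname FROM test WHERE id = 1', $container));
echo "\n";
var_export(sqlToMemcacheKey('SELECT * FROM test', $container));
echo "\n";
```

A regular expression like this is cheap to evaluate on every query, at the price of rejecting anything slightly unusual, which then simply falls back to plain SQL.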
Apr 4: Quick setup for PHP development trees

As PHP has recently moved to git, everybody who works on the PHP source has to recreate their work environment. When working on PHP I have a few requirements for my working dirs. For one, I want to be able to use different branches (like 5.3, 5.4 and master) at the same time, and I want to quickly test different PHP configurations, like builds with thread-safety or debug mode on or off.
A simple approach for this is to use out-of-tree builds, something like this:
$ git clone ....php-src.git
$ (cd php-src && ./buildconf)
$ mkdir build-master-minimal
$ cd build-master-minimal
$ ../php-src/configure --disable-all
$ make
This fulfills both requirements, as you can have build dirs for each branch and each configuration. Nice, but in the long run quite confusing, as you always have to make sure php-src has the correct branch checked out, matching the build dir you're currently building in; otherwise you will create a mess.
Thankfully there is a nice solution for having multiple checkouts: git-new-workdir. With it one can easily set up the branches and build dirs. It is still quite some repetitive work to create a structure with different branches and a set of different build dirs for each branch, though. Therefore I've created a simple shell script to do this quickly on my different machines and pushed it to GitHub in case anybody wants a similar structure, and maybe to improve the script. But be warned: The script is really an ad hoc thing for me to get started.
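The script mentioned above is a shell script on GitHub; the following is not that script but a hypothetical PHP sketch of the same layout: one git-new-workdir checkout per branch and one out-of-tree build dir per configuration. The branch names, directory names and configure flags are assumptions (`--enable-maintainer-zts` is the PHP 5.x-era name of the thread-safety flag), and the function only returns commands so they can be reviewed before running anything.

```php
<?php
// Hypothetical sketch: plan the commands for a branch x configuration
// build tree. Nothing is executed; the commands are only collected.

function planBuildTree(string $repo, array $branches, array $configs): array
{
    $commands = [];
    foreach ($branches as $branch) {
        $src = "src-$branch";
        // One working directory per branch, sharing the repository's objects.
        $commands[] = "git-new-workdir $repo $src $branch";
        $commands[] = "(cd $src && ./buildconf)";
        foreach ($configs as $name => $flags) {
            $build = "build-$branch-$name";
            // Out-of-tree build: configure is run from inside the build dir.
            $commands[] = "mkdir $build && (cd $build && ../$src/configure $flags && make)";
        }
    }
    return $commands;
}

$commands = planBuildTree('php-src', ['PHP-5.4', 'master'], [
    'minimal' => '--disable-all',
    'debug'   => '--disable-all --enable-debug',
    'zts'     => '--disable-all --enable-maintainer-zts',
]);
echo implode("\n", $commands), "\n";
```

Because each build dir is tied to its own workdir, there is no longer any risk of building against the wrong branch.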
Apr 2: Some videos

Over the years a few videos of my presentations and some interviews have been published. I've collected the ones I found and put them on a single web page. The oldest is from 2009, the latest from February this year. Enjoy.
If you have another video which I missed: Please let me know!
Mar 20: Testing persistent connection and thread-safety features in PHP

By default PHP provides shared-nothing environments to ensure that whatever happens to PHP's state in one request has no effect on other requests, so all function tables are cleaned up, all file handles are closed etc. In a few rare cases this is not what people want; for that PHP introduced "persistent connections" of different kinds. Testing those is a bit annoying, as you have to configure a web server, make sure to hit the same instance over the course of a test and then use a load generator, ideally one which can detect failures. Additionally, having a web server in the game means more code is executed, which might be an additional source of trouble while debugging. An alternative might be using FastCGI, though that adds its own issues for such a test.
To solve this for myself, I wrote a PHP SAPI module called pconn some time ago and pushed it to GitHub. (A SAPI is the component in PHP which implements the communication with the web server or whatever triggers PHP requests.) The general idea was to have a lightweight SAPI which does nothing but emulate a bunch of requests. I had it somewhere on my list of things to blog about, but well, low priority.
Now, some time later, it seems Derick was doing some work with persistent connections too, and figured that the new built-in web server is a good tool for such tests as well. He didn't know about my solution, as one could see in a short discussion we had on Twitter:
In other news, the new CLI web server in PHP 5.4 is brilliant for debugging issues with extensions that span more than one request.
— Derick Rethans (@derickr) March 15, 2012
@derickr for that you could also use github.com/johannes/pconn… which also does multithreading
— Johannes Schlüter (@phperror) March 15, 2012
@phperror: You need to write about that stuff :-þ
— Derick Rethans (@derickr) March 15, 2012
Now I've contradicted myself: Above I proudly wrote about this being lightweight and easy to debug, but in the tweet I mentioned threading. And well, threading always brings lots of trouble in code. But over time I figured out that this was a good foundation to solve a second issue which has to be dealt with in PHP: PHP can be run in threaded environments, which in general is not advised. When doing that, the old party rule applies: What happens in a thread stays in the thread. Different threads should not impact the requests handled in other threads. Testing for race conditions is even harder than testing persistent connections, and additional web server code hurts even more. So my little SAPI became a lot bigger and can now be compiled in two modes: either simple and short in non-threaded mode, or with all the extra stuff in threaded mode, which allows running PHP requests in parallel threads in loops.
In case you find yourself working on some PHP extensions where this might help: Check the github repository and the README and drop me a line if anything is unclear.
Jan 12: Upcoming talks


Over the last few weeks I have been quite silent, but that's about to change: Over the next few weeks I'll give a few presentations. Feel free to join any of them.
- January 18th: Erstellung hochperformanter PHP-Anwendungen mit MySQL (German; "Building high-performance PHP applications with MySQL")
  MySQL Webinar, Online
- February 9th: MySQL Konnectoren (German; "MySQL connectors")
  OTN Developer Day: MySQL, Frankfurt, Germany
- February 24th/25th: PHP under the hood (English)
  PHP UK Conference, London, UK
Nov 17: High Performance PHP Session Storage on Scale


One of the great things about the HTTP protocol, besides status code 418, is that it's stateless. A web server therefore is not required to store any information about the user or allocate resources for a user after the individual request is done. Thus a single web server can handle many, many, many different users easily, and if it can't anymore, one can add a new server, put a simple load balancer in front and scale out. Each of those web servers then handles its requests without the need for communication, which leads to linear scaling (assuming the network provides enough bandwidth etc.).
Now the Web isn't used only for serving static documents anymore; we have all these fancy web apps. And those applications often need state. The most trivial piece of information they need is the current user. HTTP is a great protocol and provides a way to do authentication which works well with its stateless nature; unfortunately this authentication is implemented badly in current clients. Ugly popups, no logout button, ... I don't think I have to say more. For nicer login systems people want web forms. Now the stateless nature of HTTP is a problem: The user may log in and then browse around. On later requests it should still be known who that user is, and with a custom HTML form based login alone this is not possible. A solution might be cookies. At least one might think so for a second. But setting a cookie saying "this is an authorized user" alone doesn't make sense, as it could easily be faked. Better is to simply store a random identifier in a cookie and keep the state information on the server. Then all session data is protected and only the user who knows this random identifier is authenticated. If this identifier is wisely chosen and hard to guess, this works quite well. Luckily this is a mostly PHP- and MySQL-focused blog, and as PHP is a system for building web applications, this functionality is part of the core language: The PHP session module.
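What "wisely chosen and hard to guess" means can be shown in a few lines. This is a minimal sketch with a hypothetical function name: PHP's session module generates its own IDs, so this only illustrates the idea of drawing the identifier from a cryptographically secure random source rather than, say, a counter or a timestamp.

```php
<?php
// Sketch: generate a hard-to-guess session identifier from the CSPRNG.
// PHP's session module creates its own IDs; this only illustrates the idea.

function newSessionId(int $bytes = 16): string
{
    // random_bytes() (PHP 7+) draws from the OS's cryptographically secure source.
    return bin2hex(random_bytes($bytes));   // 16 bytes -> 32 hex characters
}

$id = newSessionId();
echo $id, "\n";
```

With 16 random bytes there are 2^128 possible identifiers, so guessing a valid one is not a practical attack.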
The session module, which was introduced in PHP 4, partly based on work in the famous phplib library, is quite a fascinating piece of code. It is open and extensible in so many directions but still so simple to use that everybody uses it; newcomers often learn about it on their first day in PHP land. Of course you can store not only whether the user is logged in but also cache some user-specific data or keep the state of some transactions by the user, like multi-page forms.
In its default configuration, session state is stored on the web server's file system, with each session's data in its own file in serialized form. If the file system does some caching, or one uses a ramdisk or something similar, this can be quite efficient. But as we suddenly have state on the web server, we can't scale as easily as before anymore: If we add a new server and then route a user with an existing session to the new server, none of the session data will be there. That is bad. This is often solved by configuring the load balancer to route all requests from the same user to the same web server. In some cases this works quite well, but it often causes problems. Let's assume you want to take a machine down for maintenance: All sessions there will die. Or imagine there's a bunch of users who do complex and expensive tasks: then one of your servers will have a hard time, giving these users bad response times, which feels like bad service, even though your other systems are mostly idle.
A nice solution for this would be to store the sessions in a central repository which can be accessed from all web servers.
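One generic way to plug such a central repository into the session module is a custom save handler implementing PHP's `SessionHandlerInterface`, registered with `session_set_save_handler()`. The following is an illustration under stated assumptions, not necessarily the approach taken in the rest of this post: `KeyValueStore`, `ArrayStore` and `CentralSessionHandler` are hypothetical, and `ArrayStore` only lives inside one PHP process, standing in for a shared backend that all web servers can reach.

```php
<?php
// Sketch: a session save handler delegating to a central key-value store.
// KeyValueStore and ArrayStore are hypothetical; ArrayStore is a
// single-process stand-in for a shared backend.

interface KeyValueStore
{
    public function get(string $key): ?array;
    public function set(string $key, array $value): void;
    public function delete(string $key): void;
    public function keys(): array;
}

final class ArrayStore implements KeyValueStore
{
    private $items = [];
    public function get(string $key): ?array { return $this->items[$key] ?? null; }
    public function set(string $key, array $value): void { $this->items[$key] = $value; }
    public function delete(string $key): void { unset($this->items[$key]); }
    public function keys(): array { return array_keys($this->items); }
}

final class CentralSessionHandler implements SessionHandlerInterface
{
    private $store;

    public function __construct(KeyValueStore $store) { $this->store = $store; }

    public function open($savePath, $sessionName): bool { return true; }
    public function close(): bool { return true; }

    public function read($id): string
    {
        $entry = $this->store->get("sess:$id");
        return $entry === null ? '' : $entry['data'];   // '' means "no session yet"
    }

    public function write($id, $data): bool
    {
        // Remember the write time so gc() can expire old sessions.
        $this->store->set("sess:$id", ['data' => $data, 'time' => time()]);
        return true;
    }

    public function destroy($id): bool
    {
        $this->store->delete("sess:$id");
        return true;
    }

    public function gc($maxLifetime): int
    {
        $cutoff = time() - $maxLifetime;
        $removed = 0;
        foreach ($this->store->keys() as $key) {
            $entry = $this->store->get($key);
            if ($entry !== null && $entry['time'] < $cutoff) {
                $this->store->delete($key);
                $removed++;
            }
        }
        return $removed;   // number of expired sessions deleted
    }
}
```

In an application one would call `session_set_save_handler(new CentralSessionHandler($store), true)` before `session_start()`, with `$store` backed by a service shared by all web servers.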