Sep 27: Efficient Node.js Buffer usage
When building network libraries for Node.js, as we do at work, one quickly comes across Node's Buffer type. A Buffer provides fairly raw access to a memory region, making it possible to handle raw data and interpret binary streams. The Buffer interface predates ES6 TypedArrays and comes with some notable optimizations.
Two optimisations are notable:
First, the slice() method does not copy data but returns a view on the underlying memory. This makes it very efficient to work on a window of the data, but one has to be careful when writing. A simple example:
const buffer = Buffer.from("hello");
const slice = buffer.slice(1, 2);
slice[0] = 97;
console.log(buffer.toString('utf8')); // will print 'hallo'
Second, allocating a small buffer won't actually go to the operating system to allocate a memory area; instead Node.js maintains a pre-allocated memory region from which small buffers can be carved out quickly. This actually works by reusing the slicing logic.
const buffer1 = Buffer.from("hello");
const buffer2 = Buffer.from("world");
console.log(buffer1.byteOffset); // 0
console.log(buffer2.byteOffset); // 8
This indicates that both buffers use the same memory region with an alignment of 8 bytes.
So how does this work? In modern versions of Node, a Buffer is backed by an ArrayBuffer, which we can reach through the .buffer property. One thing to be careful about is that for a slice the returned ArrayBuffer will be the full underlying buffer, not only the sliced part. Given the two Buffers from above we can see this.
const buffer3 = Buffer.from(buffer2.buffer);
console.log(buffer3.length); // 8192
const buffer4 = buffer3.slice(0, 5);
console.log(buffer4.toString('utf8')); // hello
A raw ArrayBuffer doesn't provide many operations, but we can create a new Buffer on top of it. This won't copy the data but will use the same memory as above. We can see that the block pre-allocated by Node (in the version I'm using for this test) is apparently 8192 bytes. 8k is a common size for such buffers. One factor in this choice is that many filesystems use 512 byte blocks, and 8k is a manageable multiple of that. Additionally, CPU caches are often multiples of 8k. So this is not an entirely arbitrary choice.
Since most of that memory most likely contains garbage, we slice it and look only at the beginning of the data. Notably we see not "world" but "hello", confirming that these Buffers indeed share the same underlying memory.
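The pooling behaviour described above is also directly observable through Node's public API: Buffer.poolSize reports the size of the shared pool, and Buffer.allocUnsafe() hands out slices of it. A small sketch of my own (not from the text above) to illustrate:

```javascript
// Node pools small allocations: Buffer.allocUnsafe() carves slices out of a
// shared, pre-allocated ArrayBuffer instead of asking the OS each time.
console.log(Buffer.poolSize); // 8192 by default

const a = Buffer.allocUnsafe(5);
const b = Buffer.allocUnsafe(5);

// Both small buffers are views into the same pool ArrayBuffer...
console.log(a.buffer === b.buffer); // true
// ...at different offsets within it.
console.log(a.byteOffset, b.byteOffset);
```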
As said, Buffer is a Node-specific type, and some libraries can't handle it. One of those libraries we use is the Google Protocol Buffers library. In simple terms, protobuf is a format for serializing data, for example for exchange over a network. You can think of it as a typed and more efficient JSON alternative.
Protobuf's deserializeBinary() function won't work with Buffer instances but requires a TypedArray like Uint8Array. A Uint8Array is the ES6 counterpart to Node's Buffer: a layer on top of ArrayBuffer for accessing bytes.
The easy way to make it work is converting a Buffer to Uint8Array like this:
const buf = stream.read(length);
const u8 = new Uint8Array(buf);
const msg = Message.deserializeBinary(u8);
However, this is inefficient, as the Uint8Array constructor will copy the data. But as we learned, both Buffer and Uint8Array are just views on top of an ArrayBuffer. The Buffer gives us access to its ArrayBuffer, and if we are careful about the slicing offsets we can ask the Uint8Array to reuse the same memory without copying:
const buf = stream.read(length);
const u8 = new Uint8Array(buf.buffer, buf.byteOffset, buf.length);
const msg = Message.deserializeBinary(u8);
A tiny bit more to type, but a notable gain in performance, which we can verify with a trivial benchmark:
console.time("A");
for (let i = 0; i < 1000; ++i) {
    const b = Buffer.alloc(10000); // Buffer.alloc() returns a zero-filled buffer
    const u = new Uint8Array(b);
}
console.timeEnd("A");

console.time("B");
for (let i = 0; i < 1000; ++i) {
    const b = Buffer.alloc(10000);
    const u = new Uint8Array(b.buffer, b.byteOffset, b.length);
}
console.timeEnd("B");
On my machine I see results like
A: 9.895ms B: 5.216ms
The difference in a real application will of course vary.
Oct 1: Destructing PHP

Already last year I had a session at the fabulous FrOSCon conference about "Destructing PHP". No, this wasn't a flaming/trolling talk, but an attempt to teach a bit about a lesser-known language feature of PHP: deterministic destructors. To explain this, let's take a look at a simple example:
<?php
require('autoload.php');

function foo($db, $data) {
    $db->beginTransaction();
    foreach ($data as $row) {
        $db->insert($row);
    }
    $db->commit();
}

$db = db::factory();
$data = data::factory();
try {
    foo($db, $data);
} catch (Exception $e) {}
$db->insert(data::finalData());
?>
Even though the syntax is correct, this program is incorrect: it fails at exception safety. If an exception is thrown, the transaction state is leaked. An exception might for instance be thrown by the $db->insert() call or, if $data is an iterator, by the iteration itself.
According to the program, data::finalData() should be stored independently of whether foo() succeeds or not. Unfortunately this isn't the case: if something in the loop inside foo() throws, there will be an open transaction, and the final data becomes part of that transaction. As there is no further handling, PHP will clean up at the end of the program and automatically roll back the transaction. So let's fix this. A typical solution looks something like this:
function foo($db, $data) {
    $db->beginTransaction();
    try {
        foreach ($data as $row) {
            $db->insert($row);
        }
    } catch (\Exception $e) {
        $db->rollback();
        throw $e;
    }
    $db->commit();
}
So we catch the exception, roll back in the error case and re-throw. Now this program is correct, but admit it: this is pretty verbose and ugly.
The form I would like to see is this:
function foo($db, $data) {
    $transaction = new Transaction($db);
    foreach ($data as $row) {
        $db->insert($row);
    }
    $transaction->commit();
}
Now this is correct and exception safe, while being clean and free of noise. You might have to look closely to see the difference from the initial version: we simply introduced a transaction object.

The reason this works is that PHP's memory management is based on reference counting. With reference counting (which I explained in more detail in this recorded talk) PHP keeps track of how many variables refer to an object, and when the last reference is gone the object is cleaned up and its destructor is called. PHP is also function scoped, which means that when a function ends, whether by reaching the end of the function body, a return statement or an exception, the variables from that function are cleaned up. In the code above we hold a single reference to the transaction object, so at the end of the function it will be destroyed. This is massively different from garbage collected languages like Java or Go, where objects are cleaned up at, more or less, random times. In PHP this is deterministic. The only case where PHP falls back on garbage collection is when you have cyclic references. As long as you don't have cycles, you can figure out exactly when a destructor will be called just by reading the code - admittedly, if you pass an object around a lot and store it in multiple places this can get complicated.
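This determinism is easy to observe in isolation. The following standalone script (my own illustration, not from the talk) prints its messages in a fixed, predictable order:

```php
<?php
class Probe {
    private $name;
    function __construct($name) { $this->name = $name; }
    function __destruct() { echo "destroyed {$this->name}\n"; }
}

function work() {
    $p = new Probe("inner");
    echo "leaving work()\n";
    // $p's refcount drops to zero when the function returns,
    // so the destructor runs right here, not at some later GC cycle.
}

work();
echo "after work()\n";
```

This always prints "leaving work()", then "destroyed inner", then "after work()": the destructor fires exactly at function exit.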
Now let's take a look at the implementation of our Transaction class:
class Transaction {
    private $db;
    private $success = false;

    function __construct($db) {
        $this->db = $db;
        $db->beginTransaction();
    }

    function commit() {
        $this->db->commit();
        $this->success = true;
    }

    function __destruct() {
        if (!$this->success) {
            $this->db->rollback();
        }
    }
}
The key here is that we track the state: if the destructor is called without an explicit commit having happened before, a rollback is enforced.
Now I have to admit: this pattern is no invention of mine. It's a common pattern in C++, one of the very few other languages with deterministic destructors. C++'s father Bjarne Stroustrup coined the name RAII for it - Resource Acquisition Is Initialisation. Whenever one acquires a resource, in our example a database transaction, one also initialises an object whose lifetime controls the resource's lifetime. The critical part is not to pass this object around without thought. Using this pattern takes some getting used to, but once you are used to it, it is a very good way to write exception safe code in a clean way.
Now, for fun, in my talk I showed another trick you can play with deterministic destructors: ensuring that a return value is actually being used. Let's assume you have a function which performs a very expensive calculation, and you want to make sure that nobody refactors the code in a way that ignores the return value. Thus
echo expensiveCalculation();
should work, while
$a = expensiveCalculation(); unset($a);
should throw an error. To achieve this our expensiveCalculation() function won't return the value directly but wrap it in an EnforceUsage object which might be defined like this:
class EnforceUsage {
    private $value;
    private $used = false;

    function __construct($value) {
        $this->value = $value;
    }

    function __toString() {
        $this->used = true;
        return (string)$this->value;
    }

    function __destruct() {
        if (!$this->used) {
            Logger::notice("Return value not used");
        }
    }
}
I admit: unlike the RAII pattern from above, this is hardly useful in PHP, but it shows the power we have at our hands.
For completeness here are the slides of the talk I mentioned in the beginning:
Sep 15: MySQL 8.0: JSON Aggregation functions

In MySQL 5.7 new JSON support was added. The JSON support consists of three features:
- a native JSON datatype
- a set of functions for creating and handling JSON documents
- storage and indexing options for JSON data
All three features combined allow building very powerful applications without committing to a fixed data structure in every part, but I was missing one thing a lot: aggregation.
A topic I'm thinking a lot about is finding ways to select nested data. In MySQL we typically have two ways to do this: either we send multiple queries to retrieve the different nesting levels, or we build JOINs, which tend to deliver repetitive responses. A tool we've had for a while is GROUP_CONCAT. With GROUP_CONCAT we can get some of our nested information:
mysql> SELECT blog.*, GROUP_CONCAT(blog_comment.id) AS comments
    ->   FROM blog
    ->   JOIN blog_comment ON blog.id = blog_comment.blog_id
    ->  GROUP BY blog_id;
+----+-------------+----------------------+----------+
| id | title       | body                 | comments |
+----+-------------+----------------------+----------+
|  1 | Hello World | This is my new Blog! | 1,2      |
|  2 | spam trap   | spam goes here       | 3        |
+----+-------------+----------------------+----------+
2 rows in set (0.07 sec)
This gives us the ids for all comments to a specific blog post. But it's not really nice to process. To improve this we can try to abuse GROUP_CONCAT like this:
mysql> SELECT blog.*,
    ->        CONCAT('[', GROUP_CONCAT(
    ->            JSON_OBJECT("author", blog_comment.author, "body", blog_comment.body)), ']') AS comments
    ->   FROM blog
    ->   JOIN blog_comment ON blog.id = blog_comment.blog_id
    ->  GROUP BY blog_id;
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
| id | title       | body                 | comments                                                                                               |
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
|  1 | Hello World | This is my new Blog! | [{"body": "Great blog!", "author": "Your Friend"},{"body": "Great new blog", "author": "Best Friend"}] |
|  2 | spam trap   | spam goes here       | [{"body": "Buy these pills to enlarge your blog", "author": "Visitor"}]                                |
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
But this has multiple issues: not only is it ugly and error-prone to build our JSON string manually using CONCAT() like this, but GROUP_CONCAT also uses an internal buffer with a default size of only 1024 bytes, which, while configurable, can easily be too small for our use case.
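As a side note, that buffer limit can be raised through the group_concat_max_len system variable; the value below is just an illustrative choice:

```sql
-- Raise the per-session GROUP_CONCAT buffer (value in bytes)
SET SESSION group_concat_max_len = 1048576;
```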
Now in the new development milestone release for MySQL 8.0 we can see two new aggregation functions: JSON_ARRAYAGG() and JSON_OBJECTAGG() which build proper JSON arrays or objects without any need for a hack:
mysql> SELECT blog.*,
    ->        JSON_ARRAYAGG(
    ->            JSON_OBJECT("author", blog_comment.author, "body", blog_comment.body)) AS comments
    ->   FROM blog
    ->   JOIN blog_comment ON blog.id = blog_comment.blog_id
    ->  GROUP BY blog_id;
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
| id | title       | body                 | comments                                                                                               |
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
|  1 | Hello World | This is my new Blog! | [{"body": "Great blog!", "author": "Your Friend"},{"body": "Great new blog", "author": "Best Friend"}] |
|  2 | spam trap   | spam goes here       | [{"body": "Buy these pills to enlarge your blog", "author": "Visitor"}]                                |
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
This is now quite nice and by using the proper aggregation function we can be sure that our result will always be valid JSON, which allows further easy processing.
Sep 3: Types in PHP and MySQL


Since PHP 7.0 has been released there's more attention on scalar types. Keeping track of types for data within your application is relatively simple. But when talking to external systems, like a database, things aren't always as one might initially expect.
For MySQL the type we see -- in the first approximation -- is defined by the network protocol. The MySQL network protocol by default converts all data into strings. So if we fetch an integer from the database and use PHP 7's typing feature we get an error:
<?php
declare(strict_types=1);

function getInteger() : int {
    $mysqli = new mysqli(...);
    return $mysqli->query("SELECT 1")->fetch_row()[0];
}

var_dump(getInteger());
?>

Fatal error: Uncaught TypeError: Return value of getInteger() must be of the type integer, string returned in t.php:6
Of course the solution is easy: either we cast the value ourselves or we disable strict mode and PHP will cast for us.
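A minimal sketch of the casting variant (connection parameters elided, as in the example above):

```php
<?php
declare(strict_types=1);

function getInteger() : int {
    $mysqli = new mysqli(/* connection parameters */);
    // The protocol delivered "1" as a string; cast it explicitly.
    return (int)$mysqli->query("SELECT 1")->fetch_row()[0];
}
```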
Now let's take a look at another case. Assume we have an application where we fetch an integer ID from the database. We know MySQL will send us a string, and we treat the ID as opaque data anyway, so we type-check for a string. Now we refactor the code slightly to use prepared statements. What will the result be?
<?php
declare(strict_types=1);

function getId() : string {
    $mysqli = new mysqli(...);
    $stmt = $mysqli->prepare("SELECT 1");
    $stmt->execute();
    return $stmt->get_result()->fetch_row()[0];
}

var_dump(getId());
?>

Fatal error: Uncaught TypeError: Return value of getId() must be of the type string, integer returned in t.php:8
Wait - what's up there!? Didn't I just say that the MySQL protocol always sends strings, so we retrieve a string in PHP? Yes, I did, and that's true for "direct queries". It's not true for results from prepared statements. With prepared statements the MySQL protocol uses a binary encoding of the data, and therefore mysqlnd and mysqli will try to find the matching PHP type. This isn't always possible, especially when we get into the range of big values. So let's query for PHP_INT_MAX and PHP_INT_MAX+1 and look at the types:
<?php
$mysqli = new mysqli(...);
$stmt = $mysqli->prepare("SELECT 9223372036854775807, 9223372036854775808");
$stmt->execute();
var_dump($stmt->get_result()->fetch_row());
?>

array(2) {
  [0]=>
  int(9223372036854775807)
  [1]=>
  string(19) "9223372036854775808"
}
Here 9223372036854775807 is the largest value a PHP integer can represent and thus arrives as an integer. 9223372036854775808, however, is too large to fit in a signed 64-bit integer, so it is converted to a string, as this keeps all information and can still be handled at least to some degree.
Similar things happen with other types which can't be properly represented in PHP:
<?php
$mysqli = new mysqli(...);
$stmt = $mysqli->prepare("SELECT 1.23");
$stmt->execute();
var_dump($stmt->get_result()->fetch_row());
?>

array(1) {
  [0]=>
  string(4) "1.23"
}
Yay, yet another WTF! So what is going on this time? Well, a numeric literal in SQL is treated as DECIMAL. A DECIMAL field is supposed to be precise. If this were converted into a PHP float (a C double), we would probably lose precision, so treating it as a string again makes sure we're not losing information. If we had a FLOAT or DOUBLE field, the value could safely be represented as a float in PHP:
<?php
$mysqli = new mysqli(...);
$stmt = $mysqli->prepare("SELECT RAND()");
$stmt->execute();
var_dump($stmt->get_result()->fetch_row());
?>

array(1) {
  [0]=>
  float(0.16519711461402206)
}
So to summarize:
- For a direct query the MySQL server sends strings, PHP returns all data as string
- For prepared statements MySQL sends data in binary form and PHP will use a corresponding type
- If the value could only be represented with a potential data loss in PHP it is converted to a string by PHP, even with prepared statements
Now we might expect the same when using PDO. Let's check:
<?php
$pdo = new PDO("mysql:host=localhost", "...", "...");
$stmt = $pdo->prepare("SELECT 9223372036854775808, RAND()");
$stmt->execute();
var_dump($stmt->fetch(PDO::FETCH_NUM));
?>

array(2) {
  [0]=>
  string(19) "9223372036854775808"
  [1]=>
  string(18) "0.3217373297752229"
}
This example uses prepared statements, but returns strings!? The reason is that PDO by default doesn't use prepared statements on the network layer but emulates them within PHP. This means PHP replaces any placeholders and then runs a direct query. As mentioned above, with a direct query the MySQL server sends strings, so PHP represents all data as strings. However, we can easily ask PDO to disable the emulation:
<?php
$pdo = new PDO("mysql:host=localhost", "...", "...");
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$stmt = $pdo->prepare("SELECT 1, RAND()");
$stmt->execute();
var_dump($stmt->fetch(PDO::FETCH_NUM));
?>

array(2) {
  [0]=>
  int(1)
  [1]=>
  float(0.24252333421495)
}
This leaves the question whether you should disable the emulation in order to get the correct types. Doing so has some impact on performance characteristics: with native prepared statements there is a client-server round-trip during the prepare and another during the execute; with emulation, only during the execute. Native prepared statements also require some server resources to store the statement handle, but if a single statement is executed multiple times there may be some savings. The different type representation also means that different type conversions happen and a different amount of data is transferred. For most cases this shouldn't have a notable impact, but in the end only a benchmark will tell.
Hope this helps to give a better understanding - or more confusion.
Feb 23: More on references

In a few different places I saw comments on my last blog post about references and performance, where commenters noted that my example was pointless. Which of course is true, and to some degree the point.
I read a lot of PHP code, and from time to time I see people with a non-PHP background (or otherwise influenced) putting references everywhere they pass arrays and the like, in order to prevent copies. I knew this was bad practice in PHP 5 and wanted to verify that in PHP 7. For readers with a stronger PHP background this doesn't come to mind, and so comments were like "what if I want to modify the data?", which might lead to something like this:
function modify(&$data) {
    $data["foo"] = "bar";
}

$data = [ /* huuuuuge array */ ];
modify($data);
In this code, from a performance perspective, the reference likely works out and this is "fast". My primary criticism would be that references aren't idiomatic in PHP, so most people reading this code wouldn't expect $data to be changed by this function call. Luckily the name of the function gives it away, to some degree. The more idiomatic way might be along these lines:
function modify($data) {
    $data["foo"] = "bar";
    return $data;
}

$data = [ /* huuuuuge array */ ];
$data = modify($data);
I consider this more readable and clearer, while it will create a (temporary) copy, leading to more CPU and peak memory load. Now we have to decide how much clarity we are willing to trade away for a performance gain. Then, after we have made that decision and gone for the approach with references, we fix an issue or add a new feature, make a slight change, and suddenly lose what we had gained before. Maybe we do something like this:
function modify(&$data) {
    if (!in_array("bar", $data)) { // A copy happens here
        $data["foo1"] = "bar";
    }
    if (!in_array("baz", $data)) { // Yet another copy here
        $data["foo2"] = "baz";
    }
}

$data = [ /* huuuuuge array */ ];
$data2 = $data;
modify($data); // A copy happens here, to split $data and $data2
So the performance gain we once carefully engineered has backfired massively, and we even end up with three copies. In this short example this is quite obvious, but in a larger application context, with real-life changes over time, tracking this is really hard.
If we had written this in the (in my opinion) more idiomatic way this would look like this:
function modify($data) {
    if (!in_array("bar", $data)) {
        $data["foo1"] = "bar"; // Maybe a copy here
    }
    if (!in_array("baz", $data)) {
        $data["foo2"] = "baz"; // Maybe a copy here, but only if not copied above already
    }
    return $data;
}

$data = [ /* huuuuuge array */ ];
$data2 = $data;
$data = modify($data);
So depending on the conditions we end up with either no copy or at most one, compared to the three copies from above. Of course this example is constructed, but the point stands: if you use references for performance you have to be extremely careful, know exactly what you're doing, and think about every future modification.
Now let's take a step back and think a bit more about this code. Isn't there yet another way? We have data and we have functions operating on it. Wasn't there another construct we might use? Yes, we could go object-oriented!
class DataManager {
    private $data;

    public function __construct() {
        $this->data = [ /* huuuuuge array */ ];
    }

    public function modify() {
        if (!in_array("bar", $this->data)) {
            $this->data["foo1"] = "bar";
        }
        if (!in_array("baz", $this->data)) {
            $this->data["foo2"] = "baz";
        }
    }
}

$dm = new DataManager();
$dm2 = $dm;
$dm->modify();
Suddenly we have a higher degree of abstraction, encapsulation and all those other OO benefits, and no copy of the data at all. OK, yes, I cheated: I didn't carry over the purpose of the $dm2 = $dm assignment. So maybe we need to clone there and create an explicit copy. (Then again, for the $data property we'd probably benefit from copy-on-write, making even the cloning quite cheap.)
In summary: yes, when careful you can be slightly more efficient in both CPU and memory usage, but in real life that gain is often lost again and eventually comes back as maintenance cost and performance loss.
Now, aren't there cases where references might be a good thing? The only one I found in recent times (apart from an extremely carefully crafted tree structure I've seen, for which I'd usually suggest an OO approach) is around anonymous functions/closures. Take this example:
$data = [ /* ... */ ];
$oldsum = 0;
$doubled = array_map(function ($element) use (&$oldsum) {
    $oldsum += $element;
    return $element * 2;
}, $data);
Again, the example in itself might be bad, but in a context where we provide a closure as a callback and want to keep some "trivial" state, references are an acceptable way. If the state we want to keep becomes more complex than a counter, however, it might be worthwhile to use an object to hold it or to find some other code structure.
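A sketch of that object-based variant (my own variation, not from the original example), using an anonymous class to hold the state:

```php
<?php
$data = [1, 2, 3];

// Objects are passed by handle, so the closure can mutate shared state
// without a by-reference capture.
$state = new class { public $sum = 0; };

$doubled = array_map(function ($element) use ($state) {
    $state->sum += $element;
    return $element * 2;
}, $data);

// $state->sum is now 6, $doubled is [2, 4, 6]
```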
Feb 18: References - Still bad in PHP 7

I'm known for telling "Don't use references" (also as video), as those cause various problems (i.e. with foreach) and hurt performance. The reason for the performance loss is that references disable copy-on-write, while most places in PHP assume copy-on-write. Meanwhile we have PHP 7. In PHP 7 the internal variable handling changed a lot; among other things, the reference counting moved from the zval, the container representing a variable, to the actual element. So I decided to run a little test to verify that my performance assumption was still valid.
In my test code I'm calling a function which calls strlen() (one of the cheapest functions in PHP - PHP strings carry their length, so it simply returns that property) from a loop. One function takes its parameter by reference, the other by value. Here's the code:
<?php
function takeRef(&$string) {
    strlen($string);
}

function takeVal($string) {
    strlen($string);
}

function loopRef() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeRef($s);
    }
    return microtime(true) - $start;
}

function loopVal() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeVal($s);
    }
    return microtime(true) - $start;
}

$ref = $val = PHP_INT_MAX;
for ($i = 0; $i < 10; ++$i) {
    $ref = min($ref, loopRef());
    $val = min($val, loopVal());
}

echo "Ref: $ref\nVal: $val\n";
?>
If I run this in PHP 5, in order to have a baseline, I get this result:
Ref: 10.679290056229
Val: 9.5635061264038
So using a reference costs 10%.
Now let's try PHP 7:
Ref: 10.631688117981
Val: 9.0047070980072
Overall we see a small performance improvement, as expected with PHP 7, but using a reference still costs roughly 10% throughput. So I still stand by my mantra: don't use references in PHP!
If you wonder about the second loop at the bottom and the min() call: the code takes multiple samples and uses the measurement with the least noise, as my system isn't 100% idle and there might be unrelated events I don't want to measure; the fastest run is closest to raw system performance.
Jan 24: MySQL Tour Munich


In February my colleague Morgan Tocker will travel through Europe and talk about MySQL 5.7, which recently went GA.
In Munich we're having a meeting for PHPers and other Web developers, where Morgan will talk about MySQL and JSON and I'll also be around.
Summary of the talk:
With the newly added JSON support in MySQL, you can combine the flexibility of NoSQL with the strength of a relational database. In this session, Morgan will explain the new JSON datatype, and the new set of functions for handling JSON documents, available storage and indexing options. In addition, he will present benchmarks showing how much one can expect from the MySQL server in different use case scenarios.
About Morgan: Morgan rejoined the MySQL team at Oracle in 2013 as MySQL Community Manager, having previously worked for MySQL and Sun Microsystems. He blogs from tocker.ca, and especially likes the new features in MySQL that make operations life easier.
The meeting will be held at Oracle's Munich Office, Meeting Room Van Gogh, Ground floor building B, Riesstr. 25 (close to underground station Olympiaeinkaufszentrum) on February 4th at 18:30. Open for anybody interested.
Aug 14: PHP 5.3 - Thanks for all the Fish

A few moments ago I pushed the buttons and PHP 5.3.29 came out. As this is the final release for 5.3, it is a good time to look back. PHP 5.3's history starts somewhere in 2005. We knew what a pressure point of PHP was: a language made for solving The Web Problem needs a good Unicode story. So some developers went deep into that complex area and created a prototype version of PHP with Unicode support deep within the engine. As this was a big and pressing issue, the need obvious and the solution promising, it was quickly agreed to make that the base for a future PHP 6. And then time passed, the initial enthusiasm faded, and the sheer amount of work became obvious. Two years in we noticed that the ongoing PHP 6 work blocked other work: new features couldn't be added to 5.2, the current version at that time, and adding them to (at that time) CVS's HEAD meant tying them to the stalled PHP 6 effort.
To solve the blocking issue we decided to create an intermediate release, packing in all the things that had piled up, so on 2007-09-26 we branched off a CVS branch PHP_5_3.
Branching off PHP 5.3 set lots of enthusiasm free, and people started creating features and going into heated debates about the direction we should take, so I was happy when Lukas volunteered to assist in the release management as Co-RM, playing a big role in making PHP 5.3, one of the most feature-rich PHP releases, a huge success - declared stable two years after branching off, on June 30th 2009!
In those two years of development, from branching off until releasing 5.3.0 stable, we saw 5,338 commits by 83 committers (also committing work by other contributors without direct commit access), with 10,125 files changed, 1,089,600 insertions and 270,921 deletions (including tests and generated files like parsers etc.). PHP 5.3 introduced many things many PHP developers now see as normal and can hardly remember not using - things like namespaces or anonymous functions. It also introduced goto, late static binding, nowdoc, ?:, exception linking, fileinfo, intl, mysqlnd, ... while also being a massive boost in performance. A massive release.
While trying to release 5.3.0 we noticed issues in our process. Notably, for a long time we didn't have a fixed cut-off date and couldn't promise when the next release would come. As a consequence people tried hard to push features in, as they feared having to wait a few years for the next release. In consequence a stricter release process with yearly releases etc. was created, which led to PHP 5.4 and 5.5 being almost on time and the upcoming PHP 5.6 being well on track.
Now, development of 5.3 didn't stop with 5.3.0: it saw 29 bugfix releases with 7,554 commits from 152 committers (due to the move to git in between, a single committer might be counted multiple times; on the other hand more "external" contributors' names are being kept), with 4,862 files changed, 376,187 insertions and 207,314 deletions.
On the personal side, being the release master of PHP 5.3 gave me the opportunity to travel between Moscow and California, teaching different audiences in multiple languages about the great work, which was done mostly by others. (Check the ChangeLog to see whom to thank for your favorite feature!)
But now it's time to close that chapter: as of now PHP 5.3 is not supported anymore, and the different RM teams and contributors are making PHP even better than PHP 5.3 ever was, as we can see in existing releases and previews of future ones.
Thank You All, it was a great time!
Feb 24: On rumors of "PHP dropping MySQL"


Over the last few days different people asked me for comments about PHP dropping MySQL support. These questions confused me, but meanwhile I figured out where these rumors come from and what they mean.
The simple facts are: No, PHP is not dropping MySQL support and we, Oracle's MySQL team, continue working with the PHP community.
For the long story we first have to remember what "PHP's MySQL support" includes. The key part are four extensions which are part of the main PHP tree:
- ext/mysql
- ext/mysqli
- ext/pdo_mysql
- ext/mysqlnd
The first one, ext/mysql, provides the mysql_* functions. This is the classic interface taught in many (old) books and used by lots of (old) software. mysqli is "MySQL improved"; this younger extension provides access to all MySQL features. pdo_mysql contains the driver for PDO, PHP's database API abstraction layer. mysqlnd is the MySQL native driver; this module goes mostly unseen and provides a common set of functionality for the three other modules in order to talk to the MySQL server.
Additionally we maintain a bunch of PECL extensions plugging into mysqlnd, which I won't mention in detail, the PHP documentation has info on most.
Now that's a lot of code, so let's look at what is actually happening:
The old mysql extension, ext/mysql, is old. The code goes back to the early days of PHP and MySQL and embodies some bad design decisions. For example, if no explicit connection resource is passed, all functions will try to use the last connection which was used. So take a simple example like this:
<?php
mysql_connect("mysql.example.com", "user", "password");
mysql_select_db("test");
$result = mysql_query("DELETE FROM t WHERE id = 23");
?>
This might do weird things. Let's assume the connect fails. As the error is not handled, the script continues to run and calls mysql_select_db(). This won't directly fail either, but will guess and try to connect to a server (most likely on localhost); if that fails, the script still won't terminate, and mysql_query() will again guess and try to connect. If all things come together, this will suddenly work and the script will operate on a completely different database than expected, which can have really bad consequences.
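To make the contrast concrete, here is a small sketch (not from the original post) of the same delete using mysqli with explicit error handling. The function name is made up for illustration; host, credentials and table are placeholders just like in the example above:

```php
<?php
// Sketch: the same delete with mysqli and explicit error handling. The
// connection is an explicit argument, so a failed connect can no longer
// silently fall through to a guessed default server as the mysql_*
// functions do. deleteRow() is a hypothetical helper name.
function deleteRow(mysqli $c, int $id): bool
{
    // Prepared statement: one of the features ext/mysql never got
    $stmt = $c->prepare("DELETE FROM t WHERE id = ?");
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param("i", $id);
    return $stmt->execute();
}

// Usage (connect errors surface immediately instead of being deferred):
//   $c = new mysqli("mysql.example.com", "user", "password", "test");
//   if ($c->connect_errno) { die("Connect failed: " . $c->connect_error); }
//   deleteRow($c, 23);
```

Here a failed connect or a failed query is visible at the call site, instead of a later function quietly opening a connection to the wrong server.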
But that's not all. As said, the code goes way back; it has grown with PHP and MySQL and tries to be compatible with all versions of MySQL since at least 3.23. All this makes the code hard to maintain.
When PHP 5.0, which added mysqli, came along in 2004, it was decided that maintaining this was troublesome and that we wouldn't add new features to the old extension but only to mysqli (as well as to pdo_mysql, which came along a bit later in PHP 5.1, as long as it made sense). We also started to advertise these newer extensions over the old one.
So we lived on for a while: added features to mysqli, fixed a few bugs in mysql, normal operations. Over time we noticed that people still use the old extension even for new projects, which deprives them of features (e.g. prepared statements, or support for multiple result sets as needed for stored procedures), but we also knew that we couldn't simply deprecate and remove the extension as it is way too commonly used. So in 2012 we started a "soft deprecation" process, which meant adding deprecation warnings to the documentation and suggesting alternatives using mysqli or PDO.
A bit later, with PHP 5.5 which was released in June 2013, it was decided to add such a deprecation notice to the code, so each time a script is connecting to a MySQL server using the mysql extension a deprecation notice would be triggered.
That's the state we are in, and there is no date by which the old mysql extension will be removed from PHP. At some point in the future it will be. Certainly not in the upcoming PHP 5.6, though.
Why not? - Because we are aware of many projects with a long history that can't simply swap this out. One of these projects is WordPress. And this brings us to the current discussion:
WordPress is an old project, going back to the days of PHP 4 when there was only the old mysql extension and nothing else. WordPress also doesn't live on its own but with tons of plugins extending all kinds of features. Some of these go equally far back, and many need database access, so many make more or less direct use of ext/mysql. After quite some discussion and heated debate in different channels, the WordPress developers have now decided to do the switch. As they are aware of the trouble this causes in the plugin ecosystem, they are careful, though: they actually allow switching between both extensions, mysql and mysqli.
As always with such major changes, things become heated, imprecise statements lose their context, and thus wrong messages circulate. So there is nothing to worry about, while I'd like to encourage all users of the old mysql extension to follow WordPress and other examples and do the switch.
I hope this helped to clear things up!
Oct 9: What's next for the U-Bahn in Munich?
December 2010 marked a turning point for Munich's U-Bahn network. With the opening of the final section to the Moosach station, for the first time since 1965 no construction work on line extensions is taking place. At the same time, however, Munich and its region keep growing, and public transport ridership in Munich is rising. So one has to think about how the public transport system should develop further.
Munich's transport network is essentially based on a central star structure around the triangle Hauptbahnhof - Marienplatz - Sendlinger Tor. Whether tram, U-Bahn or S-Bahn, hardly any trip avoids these nodes. This worked well with earlier passenger numbers, since the central layout is easy to grasp and provides a high degree of integration of the different lines into one system. However, the utilization of the inner-city lines and stations shows that the capacity limits are near, if they are not already exceeded during stadium traffic and the like. So an extension has to happen.
A much-discussed concept is the second S-Bahn trunk line ("Stammstrecke"). This line would run, with fewer stops, parallel to the existing east-west S-Bahn trunk line. The plan is that this relieves the existing trunk line. In my opinion, however, it has a massive problem: it cements the centralized structure of the whole system, especially by reducing the stops to Laim - Hauptbahnhof - Marienplatz - Ostbahnhof.
This is, in my opinion, not a holistic solution.
Some time ago I discussed other plans for peripheral work on the U-Bahn network with my family. During that discussion and the research around it, I came across a document by the Greens in the city council proposing an extension of the U-Bahn network. In motion 08-14 / A 04451 the Greens propose "branching the U1 off at Kolumbusplatz or Candidplatz and routing it to Hauptbahnhof (variants 2 and 3) with new stations Baldeplatz and Goetheplatz (interchange to U3/U6) and merging into the existing Hauptbahnhof station." I visualized this on the city map (click to enlarge, new line in light blue/turquoise). This proposal sparked my imagination.
My first thought was how the Kolumbusplatz station would have to be rebuilt for this (I immediately discarded the idea of branching off at Candidplatz, since then there would be no connection from the new southern U1 to the southern U2), and I concluded that this would be quite a large reconstruction. Then my eye fell on the Südring and wandered to the Ostbahnhof, and two things occurred to me: the connection from the southern U1 and U2 to the Ostbahnhof is poor, and with the Paulaner brewery moving away, a new residential quarter is emerging in the Au/Nockherberg area. So why not modify the Greens' proposal and, instead of merging into the U1/U2 at Kolumbusplatz, route the line along the Südring to the Ostbahnhof?
And there was a picture. After a few iterative steps it quickly became clear what this was turning into: a U-Bahn ring! So I kept drawing, did a bit of fine-tuning, and there it is: the solution to a whole stack of problems.
Important points in the layout of the plan are the connections to the various tram and U-Bahn lines leading into the city, which are linked tangentially on routes that are still worthwhile compared to trips through the city center. As a side effect, it offers an alternative to the "tram through the Englischer Garten" discussion that has been going on for years.
What it does not do, directly, is offer an alternative to the second S-Bahn trunk line. That works on individual branches (e.g. for trips from the northern U3/U6 to Hauptbahnhof, which currently go via Marienplatz). What it does do, however, is increase the attractiveness of the Südring, which thus becomes an alternative to the second trunk line.
The new Regerplatz stop makes it possible to connect the new residential quarter on the Paulaner site attractively to the public transport network from early on.
The connection Hauptbahnhof - Goetheplatz - Kolumbusplatz additionally offers, if built accordingly, another relief route for Oktoberfest traffic, both for visitors from the surrounding region traveling via Hauptbahnhof and towards the south of the city. At Goetheplatz I envision a station with three tracks, similar to the S-Bahn at Stachus and Hauptbahnhof, where during Oktoberfest the center platform is divided so that arriving and departing passengers, as well as north- and southbound travelers, are separated.
The stops Chinesischer Turm / Tivolistraße and Sternwartstraße intercept tram passengers from Effnerplatz before they reach the city center.
Of course this is quite a large project that dwarfs many a construction effort, and it raises the interesting question of whether a U-Bahn driver can stand going in a circle of roughly 15 minutes travel time for hours, or whether he'll get tunnel fever. But well, one needs visions.
(All graphics are based on OpenStreetMap, © OpenStreetMap contributors, with my own layout and additions rendered with Maperitive)
Oct 9: Sharding PHP with MySQL Fabric


PHP users who attended any of my recent PHP & MySQL related talks or read Ulf's blog will know our mysqlnd_ms plugin. This plugin hooks into PHP's mysqlnd library and provides transparent support for replication and load-balancing features. Without changing your application you get transparent load-balancing and read-write splitting, so all your reading queries will be sent to a slave while the writes go to a master server. The exact strategies can be defined in mysqlnd_ms's configuration, so quite often no, or only few, application changes are needed. But we have one limitation: the MySQL servers have to be listed in each configuration on all your PHP servers, which can be annoying when you're changing your environment, like adding a new slave or promoting a machine to master in case the original master fails. But there's help coming!
At this year's MySQL Connect conference we announced the initial labs release of MySQL Fabric. MySQL Fabric aims to be "an integrated environment for managing a farm of MySQL servers supporting high-availability and sharding." Part of this framework is an RPC interface to query the available servers managed by MySQL Fabric, which delivers the missing piece for mysqlnd_ms.
As this release of Fabric puts the focus on sharding, this is what I want to show here, too. A general introduction to MySQL Fabric and its sharding features can be found on VN's blog, so I'll be quite fast in some areas; for details please refer to the documentation and the mentioned blogs.
The first thing we need is the MySQL Utilities package with Fabric support which is available from labs.mysql.com. After installing this package you have to locate the main.cfg configuration file and configure the storage instance.
[storage]
database = fabric
user = fabric
address = localhost:3300
connection_timeout = 6
password = i do no now
This is a MySQL server where Fabric will store its configuration and such. After that we can initialize Fabric and start the daemon.
$ mysqlfabric setup
$ mysqlfabric start
The setup step creates tables in the configured database, and start launches the daemon process. Now we can go and configure our server groups. A server group contains a master server, where the group's data is being written to, and a number of slaves to which MySQL will replicate the data. For our sample sharding setup I plan to create two shards and a global group. The purpose of the global group is to hold table definitions and data which Fabric will make available on all systems in our environment. Each of these groups will, in this example, have one master and one slave. This means we need six MySQL server instances running. These six instances should all be running MySQL 5.6, and except for having binary logging enabled and different server ids, there is no replication configuration needed before running these commands. In my example setup I'm running all of those on one machine; obviously that's only useful for tests:
$ mysqlfabric group create global
$ mysqlfabric group add global 127.0.0.1:3301 root secret
$ mysqlfabric group promote global
$ mysqlfabric group add global 127.0.0.1:3302 root secret
$ mysqlfabric group create shard1
$ mysqlfabric group add shard1 127.0.0.1:3303 root secret
$ mysqlfabric group promote shard1
$ mysqlfabric group add shard1 127.0.0.1:3304 root secret
$ mysqlfabric group create shard2
$ mysqlfabric group add shard2 127.0.0.1:3305 root secret
$ mysqlfabric group promote shard2
$ mysqlfabric group add shard2 127.0.0.1:3306 root secret
So this creates the three groups and configures replication between the servers as needed. With this setup the server on port 3301 will be the global master; 3302, 3303 and 3305 will be 3301's direct slaves, and 3304 will be configured as a slave of 3303, as will 3306 of 3305.
Now we can define our sharding rules. I'm going to use range-based sharding with two shards. The first shard, which will be assigned to the server group shard1 created above, will cover shard key values 1 to 9999, and the second shard, in group shard2, will hold data for shard key values 10000 and up. We also define the table fabrictest in the test schema as our sharding table and id as the shard column.
$ mysqlfabric sharding define RANGE global
$ mysqlfabric sharding add_mapping 1 test.fabrictest id
$ mysqlfabric sharding add_shard 1 shard1 ENABLED 1
$ mysqlfabric sharding add_shard 1 shard2 ENABLED 10000
Note that for range-based sharding we don't have to define the upper bound, as that is given by the lower bound of the next shard.
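The lookup rule this implies can be sketched in a few lines of PHP. This is an illustration of the range rule, not Fabric's actual code: a key belongs to the shard with the greatest lower bound not exceeding it.

```php
<?php
// Illustrative sketch of RANGE shard resolution; not MySQL Fabric's code.
// Keys of $lowerBounds are the lower bounds passed to "sharding add_shard",
// values are the group names.
function shard_for_key(array $lowerBounds, int $key): string
{
    krsort($lowerBounds); // highest lower bound first
    foreach ($lowerBounds as $bound => $group) {
        if ($key >= $bound) {
            return $group;
        }
    }
    throw new OutOfRangeException("No shard covers key $key");
}
```

With the bounds from the commands above, `shard_for_key([1 => "shard1", 10000 => "shard2"], 10)` resolves to shard1, while key 10010 resolves to shard2.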
Now we have MySQL Fabric and our MySQL servers configured and can move on to PHP. As mentioned in the beginning we need mysqlnd_ms, to be precise the 1.6.0 alpha release, which we can install using pecl:
$ sudo pecl install mysqlnd_ms-alpha
To configure PHP we first need a mysqlnd_ms configuration file. mysqlnd_ms uses JSON, and a simple configuration using Fabric might look like this:
fabric.json:
{
    "test": {
        "fabric": {
            "hosts": [
                { "host": "localhost", "port": 8080 }
            ]
        }
    }
}
This configures the application test to use a MySQL Fabric based setup where MySQL Fabric's RPC daemon runs on the local machine. Again: we put everything on one machine for this test; this is not what one would do in a production setup.
Next we locate our system's php.ini file and enable mysqlnd_ms to use our config.
php.ini:
extension=mysqlnd_ms.so
mysqlnd_ms.enable=1
mysqlnd_ms.config_file=/path/to/fabric.json
And now we are finally done and can run a test script.
<?php
$c = new mysqli("test", "root", "", "test");

echo "Creating global table:\n";
mysqlnd_ms_fabric_select_global($c, "test.fabrictest");
var_dump($c->query("CREATE TABLE fabrictest (id INT NOT NULL)"));
echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n";

echo "Inserting with ID 10:\n";
mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10);
var_dump($c->query("INSERT INTO fabrictest VALUES (10)"));
echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n";

echo "Trying to read id 10 from 10010:\n";
mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10010);
$r = $c->query("SELECT * FROM fabrictest WHERE id = 10");
var_dump($r->fetch_row());
echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n";
?>
With this script we do a few things. The first observation is the hostname test: the mysqlnd_ms plugin will recognize it as the application name and refer to its configuration. Second are the mysqlnd_ms_* functions. First we pick the global group and execute a CREATE TABLE operation there. mysqlnd_ms will detect that this is a write operation and therefore connect to the global group's master. This should be 127.0.0.1:3301, which hopefully is printed by the echo call. Then we select the shard responsible for id 10 in the fabrictest table and insert data. mysqlnd_ms will, again, detect that this is a write operation and figure out where writes to that shard have to go, which is 127.0.0.1:3303. Finally we do an operation which will not really succeed: we select the servers for shard key 10010, which maps to shard2 in our setup, and then query for id 10, the data we stored in shard1. This will query 127.0.0.1:3306 (slave of 3305 in the shard2 group), which will return an empty result set.
I hope this is enough to get you started, you can now add shards or migrate them or take servers down and promote current slaves to masters etc. and see how the system reacts.
In a future post we will combine this with Doctrine, stay tuned.
Note: This blog post features labs/alpha releases, which aim at demonstrating functionality. They are not for production use. There might be stability issues, and there certainly are performance restrictions we're working on. We would, however, like to receive feedback.
Sep 11: Referendums (Volksentscheide)
I need some help. On Sunday we have referendums in Bavaria on 5 constitutional amendments. In the state parliament CSU, SPD, FDP and FW voted for them; only the Greens did not. The amendments go back to an Ash Wednesday speech by Seehofer in which he wanted to put the German "Leitkultur" (guiding culture) into the constitution. Since that requires a two-thirds majority in parliament plus a referendum, the CSU started negotiating. The result is 5 constitutional amendments:
- Promotion of equivalent living and working conditions
- Promotion of volunteer engagement
- Affairs of the European Union
- Debt brake
- Adequate financial resources for the municipalities
As mentioned at the start, the Greens are the only ones who voted against the package. For the first two and the last one, the Greens' argument is that these are empty statements and achieve nothing in the constitution. On the third amendment, concerning the EU, they argue that it is legally questionable (more on that below); on the debt brake they argue that it is already regulated in the Basic Law and therefore not needed in the Bavarian constitution.
I can't find really good media articles on these topics, so I ask myself: how should I vote on each of them? So let's go through them, and I'm hoping for input!
The first amendment wants to add the sentence "It [the Free State of Bavaria] promotes and secures equivalent living and working conditions in all of Bavaria, in city and countryside" to the state objectives. Sounds nice, but what effect is it supposed to have? The Greens say "none", which sounds convincing. And anyway, I live in the city, which is rather favored; selfishly, don't I want to keep it that way?
The second is just as inconsequential: "State and municipalities promote voluntary commitment to the common good." I do think it's good when people get involved, but does that have to be codified in the constitution like this?
The third is more complicated. It says that the state government has to give "substantial" consideration to the parliament's guidance on EU questions. Specifically, it is about laws in the Bundesrat on EU topics. The construct of the Bundesrat has always fascinated me: federal legislation is made there by the executive of the states. Is that separation of powers? As a fan of parliaments I naturally like the idea of binding the Bundesrat more closely to the parliaments, but I also see the fundamental problem with the binding proposed here, where the delimitation is unclear.
Number four is the good old debt brake, as it also exists in the Basic Law, according to which no new debt may be taken on from 2020. Sounds good at first, but it means that the state has trouble financing large infrastructure projects that may only pay off after decades, if infrastructure can pay off at all. The Bavarian government is already proud of its budget as it is. For now this rule has no consequence, since it is already in the Basic Law. It would get interesting if a federal majority came to power that recognizes the problem and abolishes it again. It would then take a while until it is also abolished in Bavaria, since that only works via referendum, and "we want to take on debt" doesn't exactly go down well. I, as a fan of parliaments, would rather have members of parliament who handle the budget right responsibly.
The last proposed amendment again inserts such a sentence: "The state guarantees the municipalities adequate financial resources within the limits of its financial capacity." I would consider that self-evident. Given the restriction to its "capacity", which can be defined rather freely, the practical consequences should be limited.
So, how does one vote on this, and why?
Jun 20: PHP 5.5 is out, what's up with 5.4 and 5.3?

Yay, we finally released PHP 5.5, a big new release for PHP. In preparation for this I yesterday sent out a mail to the PHP core developers stating that the "PHP-5.3 BRANCH IS CLOSED NOW". After seeing this quoted on Twitter and different websites I want to make a few things clear for users of PHP:
- The mail is information for core developers that all changes for 5.3 should go through the release master and our security group
- We won't do normal bug fixes anymore
- We will continue doing security fixes for a year where needed
What this means for users of PHP is that they can continue using PHP 5.3, and when upgrades come they carry a very low risk of breaking anything (we always try not to break anything, but one person's bug might be another person's feature), so they should be easy to apply and applied fast. So if you are a happy PHP 5.3 user and don't want to touch too many things, there is no immediate need to upgrade to 5.4 or 5.5 - for a year.
So even if you don't have to migrate, why should you? Besides the new features I see two major reasons:
- Newer versions of PHP are generally more performant and efficient than older versions, meaning your users get faster response and you need less hosting/cloud resources to run your system.
- You get all bug fixes
So when migrating, where should you go: PHP 5.4 or 5.5? There the answer is, in my personal opinion, quite easy: go to 5.5! 5.5 will live longer than 5.4, so it is the more future-safe path, and as we try hard to keep backwards compatibility, the migration should be fairly simple. PHP 5.5 also mostly uses the same code as 5.4 with a few extra features, so everything PHP 5.4 does, PHP 5.5 does just as stably.
So go and fetch PHP 5.5, use it for new projects and work on your migration from 5.3, but don't panic.
Apr 2: Making use of PHP mysqlnd statistics

One of the great things about mysqlnd as the base library for PHP's MySQL support is the collected statistics: mysqlnd collects about 160 different statistical values about all the things going on. With such an amount of raw data it is, obviously, quite hard to draw conclusions from it. Therefore I recently created a PHP library sitting on top of this feature to collect all the data, run some analysis and then provide some guidance, and made it available in the JSMysqlndAnalytics GitHub repo (see there also for instructions on using Composer).
Using the library is relatively simple, as the short instructions show. The library consists of two main parts. On one side there is the "Collector", a wrapper around mysqli_get_client_stats() (even though this function is part of mysqli, it will also work for applications using ext/mysql or PDO_mysql), which retrieves the raw statistics, which could be stored away or such. On the other side there is the actual Analytics engine comparing the values to predefined rules. The current set of rules is a bit limited, so I'm looking for ideas.
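To give a rough idea of the Collector/Analytics split, here is a hypothetical rule in the same spirit. This is not the library's actual API; the rule and its wording are made up for illustration, while "buffered_sets" and "unbuffered_sets" are real mysqlnd statistics names:

```php
<?php
// Hypothetical analytics rule in the spirit of JSMysqlndAnalytics (not its
// actual API): take the array returned by mysqli_get_client_stats() and
// return a list of human-readable hints.
function analyse(array $stats): array
{
    $hints = [];
    // "buffered_sets" and "unbuffered_sets" are real mysqlnd counters;
    // the rule itself is invented for this sketch.
    if (($stats['buffered_sets'] ?? 0) > 0
            && ($stats['unbuffered_sets'] ?? 0) == 0) {
        $hints[] = 'All result sets are buffered; consider unbuffered '
                 . 'queries when fetching large results.';
    }
    return $hints;
}

// With live data one would feed it the real statistics:
//   $hints = analyse(mysqli_get_client_stats());
```

The actual engine works on the same principle: compare the raw counters against predefined thresholds and emit guidance.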
In case you're a Symfony user, life is quite easy: some time ago I already provided a Symfony bundle with a Symfony profiler toolbar plugin showing the statistics. This JSMysqlndBundle has been extended to make use of these analytics. The screenshot might give a rough idea of how this looks.
Hope this helps creating better applications! Happy hacking!
Oct 12: Analysing WHERE-clauses in INFORMATION_SCHEMA table implementations

The MySQL server has a quite simple interface for plugins to create tables inside INFORMATION_SCHEMA. A minimal plugin creating a table with nothing but a counter might look like this:
static int counter_fill_table(THD *thd, TABLE_LIST *tables, Item *cond)
{
  TABLE *table= tables->table;
  ulonglong value= 0;
  while (1)
  {
    table->field[0]->store(value++, true);
    /* hand the row over to the server; aborts on error */
    if (schema_table_store_record(thd, table))
      return 1;
  }
  return 0;
}

static ST_FIELD_INFO counter_table_fields[]=
{
  {"COUNT", 20, MYSQL_TYPE_LONGLONG, 0, MY_I_S_UNSIGNED, 0, 0},
  {0, 0, MYSQL_TYPE_NULL, 0, 0, 0, 0}
};

static int counter_table_init(void *ptr)
{
  ST_SCHEMA_TABLE *schema_table= (ST_SCHEMA_TABLE*)ptr;
  schema_table->fields_info= counter_table_fields;
  schema_table->fill_table= counter_fill_table;
  return 0;
}

static struct st_mysql_information_schema counter_table_info =
{ MYSQL_INFORMATION_SCHEMA_INTERFACE_VERSION };

mysql_declare_plugin(counter)
{
  MYSQL_INFORMATION_SCHEMA_PLUGIN,
  &counter_table_info,           /* type-specific descriptor */
  "COUNTER",                     /* plugin and table name */
  "My Name",                     /* author */
  "An I_S table with a counter", /* description */
  PLUGIN_LICENSE_GPL,            /* license type */
  counter_table_init,            /* init function */
  NULL,                          /* deinit function */
  0x10,                          /* version */
  NULL,                          /* no status variables */
  NULL,                          /* no system variables */
  NULL,                          /* no reserved information */
  0                              /* no flags */
}
mysql_declare_plugin_end;
This is quite straightforward and documented in the MySQL documentation. It also has an obvious issue: it will run forever (at least if we assume we don't run into an out-of-memory situation). Luckily we might have a user who foresees this issue and adds a WHERE clause like here:
SELECT COUNT FROM INFORMATION_SCHEMA.COUNTER WHERE COUNT < 10