Sep 26: Direct MySQL Stream Access - Revised


Roughly three years ago I was writing about Direct MySQL Stream Access - a way to access the low-level stream PHP's mysqlnd library is using. Back then this had been a patch against PHP's mysqli extension. As such a feature is quite dangerous (you can easily mess with the connection state which confuses mysqlnd and/or the MySQL server) we didn't push it into the main PHP tree. Now three years later it's time to look at this again as we don't need to patch PHP anymore.
Since the mentioned patch was written mysqlnd got a plugin interface about which I was talking before. This plugin-interface, especially in the version of PHP 5.4, makes it trivial to implement this feature.
PHP_FUNCTION(mysqlnd_to_stream) { zval *conn_zv; MYSQLND *conn; if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &conn_zv) == FAILURE) { return; } if (!(conn = zval_to_mysqlnd(conn_zv))) { php_error_docref(NULL TSRMLS_CC, E_WARNING, "Passed variable is no mysqlnd based connection"); RETURN_FALSE; } php_stream_to_zval(conn->net->stream, return_value); }
If you take a function like the one shown above and add some general PHP infrastructure you are done. The key function here is the function MYSQLND* zval_to_mysqlnd(zval *connection) which takes a PHP variable as parameter and in case it is a MySQL connection (ext/mysql, mysqli or pdo_mysql) will return the corresponding MYSQLND pointer which gives access to the stream which then has to be packed into a PHP variable, again. The nice thing, compared to the old version is not only that it is a plugin which can be loaded into PHP as shared extension via php.ini but also that it works with all MySQL extensions, not only mysqli as the one before.
You can download the complete source, but be warned: This is experimental stuff and not supported in any way, but I hope good enough to get a feeling what's possible with mysqlnd.
Sep 14: mysqlnd plugins and json


Some time ago I was already writing about the power included with mysqlnd plugins and how they can they can be used transparently to help you with your requirements without changing your code. But well, as mysqlnd plugins in fact are regular PHP extensions they can export functions to the PHP userland and providing complete new functionality.
In my spare time I'm currently writing a shiny Web 2.0 application where I'm heavily using AJAX-like things, so what I do quite often in this application is, basically this: Check some pre-conditions (permissions etc.) then select some data from the database, do a fetch_all to get the complete result set as an array and run it through json_encode; or to have it in code:
<?php $m = new MySQLi(/*...*/); check_whether_the_user_is_checked_in_and_allowed_to_see_this(); $result = $m->query("SELECT a,b,c,d FROM t WHERE e=23"); echo json_encode($result->fetch_all()); ?>
Of course that example is simplified as I'm using the Symfony 2 framework for this project. When writing a similar function for the 5th time I wondered whether I really need to create the temporary array and all these temporary elements in it.
So I wrote a mysqlnd plugin.
The mysqlnd_query_to_json plugin (hey what a name!) provides a single function, mysqlnd_query_to_json(), which takes two parameters, a connection identifier and an SQL query, and returns a JSON string containing the result set. The connection identifier can be a mysql resource, a mysqli object or even a PDO object. The resulting JSON string will be created directly from the network buffer without the need of temporary complex structures. Using the above example would create code like this:
<?php $m = new MySQLi(/*...*/); check_whether_the_user_is_checked_in_and_allowed_to_see_this(); echo mysqlnd_query_to_json($m, "SELECT a,b,c,d FROM t WHERE e=23"); ?>
The plugin, which you can find here, requires PHP 5.4 and has a few limitations as it knows nothing about MySQL bitfields or escaping of unicode characters for creating fully valid JSON data and Andrey called it, for good reasons, a hack. Neither did I benchmark it, yet as I merely share it to show what's possible and maybe start some discussion on what is actually needed.
If you want to learn more on these topics I also suggest to check the MySQL Webinar page frequently as Ulf is going to hold a Webinar on myslqnd plugins in October!
Jul 1: MySQLi result set iteration - recursive


PHP 5.3 is released and after the release stress is over my mind is open for new ideas. While relaxing yesterday I thought about many things, among them was the Resultset iterator I recently discussed.
Now I wondered where to go next with this and had the idea that an individual Resultset is a child of the whole result and this might be wrapped in an Recursive Iterator. For doing so we don't implement the Iterator interface but RecursiveIterator. RecursiveIterator extends a typical Iterator with two methods: hasChildren() and getChildren(). But now we have a problem: The Iterator returned by getChildren() has to be a RecursiveIterator, too, which makes sense, in general. But I want to return a MySQLi Resultset which isn't recursive - so making this a RecursiveIterator is wrong. My solution now is to introduce yet another Iterator which goes by the name of MySQLi_PseudoRecursiveResultIterator and is implemented by extending IteratorIterator which will wrap the MySQLi_Result and implements RecursiveIterator telling the caller that there are no children.
As a sidenote: In our experimental tree Andrey made MySQLi_Result an iterator but that's not yet in php.net's CVS (might need some more testing, and probably we might change the design there...) so I'm emulating this with MySQLi_Result::fetch_all() combined with an ArrayIterator, using the experimental code the constructor can be dropped.
So let's finally look at the code of these two classes:
<?php class MySQLi_ResultsetIterator implements RecursiveIterator { private $mysqli; private $counter = 0; private $current = null; private $rewinded = false; public function __construct(mysqli $mysqli) { $this->mysqli = $mysqli; } private function freeCurrent() { if ($this->current) { $this->current->free(); $this->current = null; } } public function rewind() { if ($this->rewinded) { throw new Exception("Already rewinded"); } $this->freeCurrent(); $this->counter = 0; $this->rewinded = true; } public function valid() { $this->current = $this->mysqli->store_result(); return (bool)$this->current; } public function next() { $this->freeCurrent(); $this->counter++; $this->mysqli->next_result(); } public function key() { return $this->counter; } public function current() { if (!$this->current) { throw new Exception("valid() not called"); } return $this->current; } public function hasChildren() { return true; } public function getChildren() { return new MySQLi_PseudoRecursiveResultIterator($this->current); } } class MySQLi_PseudoRecursiveResultIterator extends IteratorIterator implements RecursiveIterator { public function __construct(MySQLi_Result $result) { // This ctor can be dropped with the experimental bzr sources // as IteratorIterator::__construct() directly works with // MySQLi_Result parent::__construct(new ArrayIterator($result->fetch_all())); } public function hasChildren() { return false; } public function getChildren() { throw new Exception("This should never be called"); } } ?>
Now we can use this code. For properly using a RecursiveIterator one should use a RecursiveIteratorIterator (RII). To get some nice labels I'm extending the RII and then have a single foreach:
<?php class MyRecursive_IteratorIterator extends RecursiveIteratorIterator { public function __construct(MySQLi $mysqli, $flags = 0) { parent::__construct( new Mysqli_ResultSetIterator($mysqli), $flags | RecursiveIteratorIterator::LEAVES_ONLY); } public function beginChildren() { echo "Next ResultSet:\n"; } } $mysqli = new MySQLi("localhost", "root", "", "test"); $query = "SELECT 1,2 UNION SELECT 3, 4;". "SELECT 'hi world' UNION SELECT 'foobar'"; if ($mysqli->multi_query($query)) { foreach (new MyRecursive_IteratorIterator($mysqli) as $key => $row) { printf(" %s\n", $row[0]); } } ?>
Now calling this code gives us a result similar to the following:
Next ResultSet: 1 3 Next ResultSet: hi world foobar
Isn't that nice? - I think that's a cool API! What do you think? Do you have use cases for such an API? Should we implement this in C and bundle it with PHP? Any feedback welcome!
Jun 19: MySQLi Resultset Iterator


Over at phpdeveloper.org I was pointed to a blog post talking about MySQLi and stored procedures. That reminded me about a small thing I recently did: When using MySQLi's multi_query to send queries which return multiple result sets you have to use a rather unintuitive API which can certainly be improved.
Recently I sat down and cooked up a small improvement for that, being an iterator fan I, of course, had to use an iterator for that and implemented the following class:
<?php class MySQLi_ResultsetIterator implements Iterator { private $mysqli; private $counter = 0; private $current = null; private $rewinded = false; public function __construct(mysqli $mysqli) { $this->mysqli = $mysqli; } private function freeCurrent() { if ($this->current) { $this->current->free(); $this->current = null; } } public function rewind() { if ($this->rewinded) { throw new Exception("Already rewinded, rewinding multiple times is not allowed!"); } $this->freeCurrent(); $this->counter = 0; $this->rewinded = true; } public function valid() { $this->current = $this->mysqli->store_result(); return (bool)$this->current; } public function next() { $this->freeCurrent(); $this->counter++; $this->mysqli->next_result(); } public function key() { return $this->counter; } public function current() { if (!$this->current) { throw new Exception("valid() not called"); } ?>
This iterator is wrapping all that's needed an then can be used like that:
<php $mysqli = new MySQLi("localhost", "root", "", "test"); $query = "SELECT 1,2 UNION SELECT 3, 4;". "SELECT 'hi world' UNION SELECT 'foobar'"; if ($mysqli->multi_query($query)) { foreach (new MySQLi_ResultsetIterator($mysqli) as $key => $result) { echo 'MySQL Resultset #'.(1+$key).":\n"; while ($row = $result->fetch_row()) { printf(" %s\n", $row[0]); } } } ?>
The output will be something like
MySQL Resultset #1: 1 3 MySQL Resultset #2: hi world foobar
And is, in my opinion, way nicer than the classical way, which you can see on the multi_query docs page.
That code is the first revision of that idea, I'll try to improve it and port it over to C so that some future version of PHP will include it. As a disclaimer: If you plan on using this class be aware that a future PHP might bundle a class having that exact name so use your own name Feedback welcome.
Mar 24: Google Summer of Code 2009


Dec 30: MySQL Storage Engine based on PHP


Sometimes one has weird ideas, or am I the only one? - This specific one is at least a year old, now, during the Christmas days, waiting for New Year's Eve I had the time and mood to finally try it out: MySQL 5.1 has a plugin interface to easily add storage engines. PHP can easily embedded into other applications. So why not combine these two things? - Writing a MySQL Storage Engine which reads data by calling a PHP script.
Let's start with a simple example first:
<?phpfunction create_table($table, $data) {
    return true;
}
function open_table($table) {
    return new ArrayIterator(array(
        array('id' => 1, 'dat' => 'foo'),
        array('id' => 2, 'a' => 'bar')
    ));
}
?>
This is the bare minimum storage engine my plugin supports. create_table() is called for creating the table, open_table() to access it, the later one then returns an iterator which is used for a full table scan. This example uses an ArrayIterator, which implements the SeekableIterator and the Countable interfaces, the first one provides a seek() method, which is called to read specific rows after sorting for instance, the later provides a method count() which gives the optimizer a hint.
Let's use this table:
mysql> CREATE TABLE php_test (id int, val CHAR(3)) ENGINE=PHP;
Query OK, 0 rows affected (0.04 sec)
mysql> SELECT * FROM php_test;
+------+------+
| id | val |
+------+------+
| 1 | foo |
| 2 | bar |
+------+------+
2 rows in set (0.00 sec)
Ok, of course that's nice and shiny but well, it's read only. To solve that you can implement a few interfaces provided by the plugin to handle writes:
<?php
class Test extends ArrayIteratorÂ
implements MySQLStorage_Writable, MySQLStorage_Updatable, MySQLStorage_Deletable {
    public function write($data) {
        $this[] = $data;
    }
    public function update($data) {
        $this[$this->key()] = $data;
    }
    public function delete() {
        unset($this[$this->key()]);
    }
}
function create_table($table, $data) {
    return true;
}
function open_table($table) {
    return new Test(array(
        array('id' => 1, 'dat' => 'foo'),
        array('id' => 2, 'a' => 'bar')
    ));
}
?>
Again, we can test it:
mysql> UPDATE php_test SET val = 'baz' WHERE id = 1;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> DELETE FROM php_test WHERE id = 2;
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO php_test VALUES(3, 'bar');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM php_write;
+------+------+
| id | val |
+------+------+
| 1 | baz |
| 3 | bar |
+------+------+
2 rows in set (0.00 sec)
As a reminder: This is calling PHP for all these operations.
So what might real life use cases be, once the major issues in the code are fixed? I have a few ideas like
- A live-logfile query tool, not sure that's really need, see the primitive Apache httpd access_log parser which is provided with the code as an example
- Combine it with the embedded MySQL server and use this storage engine for your unit tests, "mock tables" ...
Any other ideas? - Leave a comment
Oh, and like most MySQL stuff nowadays: There's a launchpad project for this plugin.
Nov 3: Direct MySQL Stream Access


Ever wondered what your PHP application and MySQL actually do? An experimental mysqlnd branch will give you full access to the network communication stream. Using a custom PHP stream filter you can then intercept the communication ... but let's start at the beginning:
When talking about mysqlnd - the mysql native driver for PHP - we always mention the fact it's native in a way that we're, when possible, using PHP infrastructure. The most common example here is the memory management. By directly using PHP's memory we can avoid unnecessary copies of data from the MySQL Client Library's memory into PHP memory.
<?php
$mysqli = mysqli_connect("localhost", "root", "", "test");
$stream = mysqli_conn_to_stream($mysqli);
stream_filter_append($stream, "mysql.server", STREAM_FILTER_READ);
?>
But there's more what we're doing. We're also using PHP's stream abstraction layer. From a development perspective the benefit is that we're using a tested abstraction from different stream implementations by different operating systems instead of writing our own. But, again, there's more to it: We can export the communication stream to PHP userland. We hesitated about exporting it for some time as it can be quite dangerous and you might easily corrupt the client-server- communication.
As Ulf mentioned during his IPC talk I recently pushed a mysqlnd branch to launchpad which adds a userspace function to mysqli which returns a PHP stream for a connection. Using that stream you can now send your own requests to the server and wait for the response. That might be nice in a way, but I guess you most likely won't have use for that. PHP streams allow you to do more: PHP streams give you the possibility to add filters to a stream. These filters allow you to intercept packages which are sent or received , read them, change them or do whatever you like. A very simple filter can be found on the launchpad site, mentioned above. That filter simply prints the information after replacing unprintable (binary) characters by dots.
Once again: Just a small step, the next one is to decode the MySQL protocol. For that I've written a simple decoder for the MySQL protocol, not complete, but enough to give an idea. The script, including the decoder and some sample code using it, is, as a sample, part of the branch. When running you will get some output like
Query:
-> 0 59: QUERY: SELECT TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES LIMIT 1
<- 1 1: DATA
<- 2 52: FIELD INFO
<- 3 5: EOF
<- 4 19: DATA
<- 5 5: EOF
Invalid Query:
-> 0 29: QUERY: ghdfjtgfdrs tztr ttgdszthtdr
<- 1 183: ERROR: 2000You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ghdfjtgfdrs tztr ttgdszthtdr' at line 1
Prepare:
-> 0 61: PREPARE: SELECT TABLE_NAME FROM INFORMATION_SCHEMA.STATISTICS LIMIT 2
<- 1 12: OK
<- 2 52: FIELD INFO
<- 3 5: EOF
Ping:
-> 0 1: PING
<- 1 7: OK
Execute:
-> 0 11: EXECUTE
<- 1 1: DATA
<- 2 52: FIELD INFO
<- 3 5: EOF
<- 4 15: OK
<- 5 15: OK
<- 6 5: EOF
EOF
-> 0 5: CLOSE_STMT
-> 0 1: QUIT
As one can see: The protocol isn't fully decoded yet so this all might be extended but for me it served the purpose well enough. For making real use out of this we're thinking about exporting the protocol decoder which exists within mysqlnd to PHP userland.
What are your ideas for such a feature? - Sending different queries to different servers? Rewriting queries? Sharding? Replication? Easy scaling of your application while refactoring your application? Let us know!
Oct 26: More on DTrace ... and MySQL

Angelo recently showed an easy way to dump SQL queries using DTrace, while reading the articles I felt that some important information is missing: The name of the user executing the query and the selected database. So I sat down a few minutes and tried to collect that data.
For the database name I found a quite simple solution: It is passed as parameter to the check_user() function to MySQL so we can easily add a thread-local variable to keep that name. Simple script for that:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$1::*check_user*:entry
{
self->db = arg4 ? copyinstr(arg4) : "(no schema)";
}
pid$1::*dispatch_command*:entry
{
printf("%s: %s\n", self->db, copyinstr(arg2));
}
Getting the username is a bit harder, for the PID-provider we need a function where it is passed as parameter, best would be if it was passed as char*, but there's no such function reliably called. (For instance there are functions for checking max connections per user where all we need is passed in a proper way, but it's only called when a limit was set) The only way would be to take the user_connect property of the THD object which is passed to dispatch_command and then access the username (and hostname). But getting that working from within DTrace is quite some work. I prepared some scripts doing this with simple C structures for the second part of my DTrace article series, which is ready in my head and is waiting to be typed, so in theory it should be possible, anybody wants to try?
The best solution, of course, would be to add proper probes into the MySQL server code which provide all that information.Feb 26: The deal is closed - MySQL is part of Sun


In January Sun announced to buy MySQL, today the deal was closed and MySQL is now part of probably the worlds biggest Open Source company. For a full coverage visit mysql.com, Kaj's Blog or Sun's new product site. I'll continue with my tasks now.