Nov 17: High Performance PHP Session Storage on Scale


One of the great things about the HTTP protocol, besides status code 418, is that it's stateless. A web server therefore is not required to store any information on the user or allocate resources for a user after the individual request is done. By that a single web server can handle many many many different users easily, and well if it can't anymore one can add a new server, put a simple load balancer in front and scale out. Each of those web servers then handles its requests without the need for communication which leads to linear scaling (assuming network provides enough bandwidth etc.).
Now the Web isn't used for serving static documents only anymore but we have all these fancy web apps. And those applications often have the need for a state. The most trivial information they need is the current user. HTTP is a great protocol and provides a way to do authentication which works well with its stateless nature - unfortunately this authentication is implemented badly in current clients. Ugly popups, no logout button, ... I don't have to tell more I think. For having nicer login systems people want web forms. Now the stateless nature of HTTP is a problem: The user may login and then browse around. On later requests it should still be known who that user is - with a custom HTML form based login alone this is not possible. A solution might be cookies. At least one might think so for a second. But setting a cookie "this is an authorized user" alone doesn't make sense as it could easily be faked. Better is to simply store a random identifier in a cookie and then keep a state information on the server. Then all session data is protected and only the user who knows this random identifier is authenticated. If this identifier is wisely chosen and hard to guess this works quite well. Luckily this is a mostly PHP- and MySQL-focused blog and as PHP is a system for building web applications this functionality is part of the core language: The PHP session module.
The session module, which was introduced in PHP 4, partly based on work on the famous phplib library, is quite a fascinating piece of code. It is open and extendable in so many directions but still so simple to use that everybody uses it, often newcomers learn about it on their first day in PHP land. Of course you can not only store the information whether the user is logged in but cache some user-specific data or keep the state on some transactions by the user, like multi-page forms or such.
In its default configuration session state will be stored on the web server's file system. Each session's data in its own file in serialized form. If the filesystem does some caching or one uses a ramdisk or something this can be quite efficient. But as we suddenly have a state on the web server we can't scale as easily as before anymore: If we add a new server and then route a user with an existing session to the new server all the session data won't be there. That is bad. This is often solved by a configuration of the load balancer to route all requests from the same user to the same web server. In some cases this works quite ok, but it is often seen that this might cause problems. Let's assume you want to take a machine down for maintenance. All sessions there will die. Or imagine there's a bunch of users who do complex and expensive tasks - then one of your servers will have a hard time, giving these users bad response times which feels like bad service, even though your other systems are mostly idle.
A nice solution for this would be to store the sessions in a central repository which can be accessed from all web servers.
Read MoreNov 14: mysqli_result iterations


For the last few months I had quite a few MySQL blog posts and didn't have anything from my "new features in PHP [trunk|5.4]" series. This article is a bridge between those two. The PHP 5.4 NEWS file has a small entry:
MySQLi: Added iterator support in MySQLi. mysqli_result implements Traversable. (Andrey, Johannes)
From the outside it is a really small change and easy to miss. The mentioned class, mysqli_result, implements an interface which adds no new methods. What once can't see is that this relates to some internal C-level functions which can be called by the engine for doing a foreach iteration on objects of this class. So with PHP 5.4 you don't have to use an "ugly" while construct anymore to fetch rows from a mysqli result but can simply do a foreach:
mysqli_report(MYSQLI_REPORT_STRICT); try { $mysqli = new mysqli(/* ... */); foreach ($myslqi->query("SELECT a, b, c FROM t") as $row) { /* Process $row which is an associative array */ } } catch (mysqli_sql_exception $e) { /* an error happened ... */ }
I'm configuring mysqli in a way to throw exceptions on error. This is useful in this case as mysqli::query() might return false in the case of an error. Passing false to a foreach will give a fatal error, so I'd need a temporary variable and a check in front of the foreach loop, with exceptions I simply do the error handling in the catch block.
One thing to note is that mysqli is using buffered results ("store result") by default. If you want to use unbuffered result sets ("use result") you can easily do that by setting the flag accordingly:
foreach ($myslqi->query("SELECT a, b, c FROM t", MYSQLI_USE_RESULT) as $row) { /* ... */ }
People who are advanced with iterators in PHP might ask "Why did you implement Traversable only, not Iterator?" - the main reason is that we simply didn't want to. The mysqli_result class already has quite a few methods and we didn't want to make the interface confusing. If you need an Iterator class for some purpose you can simply wrap mysqli_result in an IteratoIterator.