Nov 19: Using Terraform to configure MySQL Database Service

Recently the MySQL Database Service (MDS) was launched in Oracle Cloud Infrastructure (OCI). Since cloud is all about automation, you don't have to use the Web Console to configure your instances; you can do it via the API, for instance with the oci command line tool or your favorite programming language. Often, however, it is nicer to define the world in a declarative way ("I want a network like this, and a MySQL database like that") and let a tool figure out how to align the reality in the cloud with your wish. One tool doing this is Terraform.
With Terraform you declare the desired state in description files; the tool builds a dependency graph and then applies whatever has to be applied. Of course it supports OCI, and the default OCI provider even includes direct MDS support.
So let's build a set of description files for a cloud environment. I assume you have a tenancy and want to create a new compartment with its own VCN and a Compute instance with a client, like MySQL Shell, to access the database.
For this configuration create a new empty directory. In there, the first thing we need is to tell Terraform that we want to use the OCI provider to access OCI. We will come back to it, but for now this will be quite short. Whether we put everything in one file or split it up, and what our files are called, doesn't matter to Terraform: it scans for all files ending in .tf and builds its graph from them. I like relatively small files, one for each aspect of the configuration, but you are of course free to organize them as you like. I start with oci.tf for my general configuration:
provider "oci" { version = "~> 3.95" region = var.region }
Here we say that we want version 3.95 or a newer 3.x release of the OCI provider and configure our cloud region using a variable. All variables I use, and which can be set or overridden, go into a file called variables.tf, where I declare region like this:
variable "region" {}
As said, the first thing I want to create is a Compartment. A Compartment in OCI is a grouping of different instances from services you are using. You can use Compartments, for instance, to group the services different departments of your company are using and give them different resource limits, to keep development and production systems separated, or whatever else you might need. By using a compartment here, we won't get in conflict with other services you are already using.
This is my compartment.tf:
resource "oci_identity_compartment" "mds_terraform" { name = "mds_terraform" description = "Compartment to house the MySQL Database and Terraform experiment" compartment_id = var.compartment_ocid enable_delete = true }
In the first line we declare that what follows describes a resource of type oci_identity_compartment, which inside our other Terraform resources will be referred to as mds_terraform. Then we define the name the compartment should have inside OCI (here I use the same name for both), followed by a description, which might help your colleagues or your later self to understand its purpose. The compartment_id property refers to the parent, as Compartments can be nested hierarchically. Finally, setting the property enable_delete means that Terraform will actually delete the Compartment when we tell it to delete things.
As the parent Compartment is a variable again, we need to declare it, thus let's extend variables.tf:
variable "compartment_ocid" {}
With the compartment in place, the first thing we need is our network. This is my network.tf:
resource "oci_core_vcn" "mds_terraform_vcn" { cidr_block = "10.0.0.0/16" dns_label = "mdsterraform" compartment_id = oci_identity_compartment.mds_terraform.id display_name = "mds_terraform_vcn" } resource "oci_core_internet_gateway" "internet_gateway" { compartment_id = oci_identity_compartment.mds_terraform.id vcn_id = oci_core_vcn.mds_terraform_vcn.id display_name = "gateway" } resource "oci_core_default_route_table" "default-route-table-options" { manage_default_resource_id = oci_core_vcn.mds_terraform_vcn.default_route_table_id route_rules { network_entity_id = oci_core_internet_gateway.internet_gateway.id cidr_block = "0.0.0.0/0" } } resource "oci_core_subnet" "test_subnet" { cidr_block = "10.0.2.0/24" display_name = "mds_tf_subnet" dns_label = "mdssubnet" security_list_ids = [oci_core_security_list.securitylist1.id] compartment_id = oci_identity_compartment.mds_terraform.id vcn_id = oci_core_vcn.mds_terraform_vcn.id route_table_id = oci_core_vcn.mds_terraform_vcn.default_route_table_id dhcp_options_id = oci_core_vcn.mds_terraform_vcn.default_dhcp_options_id } resource "oci_core_security_list" "securitylist1" { display_name = "securitylist1" compartment_id = oci_identity_compartment.mds_terraform.id vcn_id = oci_core_vcn.mds_terraform_vcn.id egress_security_rules { protocol = "all" destination = "0.0.0.0/0" } ingress_security_rules { protocol = "6" source = "0.0.0.0/0" tcp_options { min = 22 max = 22 } } ingress_security_rules { protocol = "6" source = "0.0.0.0/0" tcp_options { min = 3306 max = 3306 } } ingress_security_rules { protocol = "6" source = "0.0.0.0/0" tcp_options { min = 33060 max = 33060 } } }
This is quite a lot and I won't go through everything here. In short, it declares a VCN with a single subnet, adds an internet gateway so that we can expose services to the internet and reach the internet from our VCN, and sets ingress and egress firewall rules that only allow traffic to MDS (ports 3306 and 33060) and SSH (port 22). What you might notice is how we refer to the id of the Compartment we created before, using oci_identity_compartment.mds_terraform.id, and how the different network resources refer to each other in similar ways.
Now it's time to create our MDS instance! Here is my mysql.tf:
data "oci_mysql_mysql_configurations" "shape" { compartment_id = oci_identity_compartment.mds_terraform.id
type = ["DEFAULT"] shape_name = var.mysql_shape } resource "oci_mysql_mysql_db_system" "mds_terraform" { display_name = "Terraform Experiment" admin_username = var.mysql_admin_user admin_password = var.mysql_admin_password shape_name = var.mysql_shape configuration_id =data.oci_mysql_mysql_configurations.shape.configurations[0].id subnet_id = oci_core_subnet.test_subnet.id compartment_id = oci_identity_compartment.mds_terraform.idm.id availability_domain = data.oci_identity_availability_domain.ad.name data_storage_size_in_gb = var.mysql_data_storage_in_gb } output "mysql_url" { value = "mysqlx://${var.mysql_admin_user}:${var.mysql_admin_password}@${oci_mysql_mysql_db_system.mds_terraform.ip_address}:${oci_mysql_mysql_db_system.mds_terraform.port_x}" }
The actual MySQL Database instance is declared in the second block, where we give it a name, configure the administrative user account, assign the subnet and so on. Again we introduced some variables, so let's declare them in variables.tf:
variable "mysql_admin_user" { default = "root" } variable "mysql_admin_password" { } variable "mysql_shape" { default = "VM.Standard.E2.1" } variable "mysql_data_storage_in_gb" { default = 50 }
A few fields might need some extra explanation:
The shape is the machine type we want; it defines the CPU type, whether we get a VM, the amount of memory and so on. Here we default to VM.Standard.E2.1, which is the smallest type and good enough for an experiment. On a production system you probably want to override this and use a larger shape.
MDS also allows you to use different Configurations, so you can tune MySQL configuration variables for your application's needs. If you have a custom configuration you can provide its ID, but I want to use the default for that shape, so I use a data source to look it up.
In many cloud regions there are multiple Availability Domains (ADs), separate data centers close to each other. The resources we created before span ADs, but the MDS instance has to live in a specific AD. To look up the AD based on its number we can put this in oci.tf:
data "oci_identity_availability_domain" "ad" { compartment_id = var.compartment_ocid ad_number = var.availability_domain }
And, again, I add another variable to variables.tf.
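A minimal declaration for that could look like this (defaulting to the first AD is my choice here; adjust as needed):

variable "availability_domain" {
  # assumption: use the first availability domain unless overridden
  default = 1
}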
Now there's one more thing in mysql.tf: an output block. This asks Terraform to give us a summary once it is done.
With all these things ready we can execute it! For a start I want to use the Web Console and OCI's Resource Manager. For that I have to package my files, which I do from my command line:
$ zip mds-terraform.zip *.tf
  adding: compartment.tf (deflated 38%)
  adding: mysql.tf (deflated 61%)
  adding: network.tf (deflated 75%)
  adding: oci.tf (deflated 35%)
  adding: variables.tf (deflated 50%)
With that file we can login to the Console, and navigate to the Resource Manager.
After clicking the "Create Stack" button we can use the checkbox to tell the system that we have zip file and then either drag the file from a file manager or browse for the file.
Now we are asked to fill in the configuration variables we defined previously. It is no surprise that our defaults are pre-filled, but the system also identified our region and Compartment ID! The Compartment ID suggested is the one in which the Stack is being created, which probably is the root compartment, i.e. the tenancy's ID.
Now you could pick a password for the MySQL user and continue. However MDS has specific password security requirements and we would eventually fail later, so let's take a quick side tour and make this form a bit nicer. This can be done by providing a schema.yaml file:
title: "MySQL Terraform Experiment" description: "An experimental Terraform setup to create MySQL Database Service Instances" schemaVersion: 1.1.0 version: "20190304" locale: "en" groupings: - title: "Basic Hidden" visible: false variables: - compartment_ocid - tenancy_ocid - region - title: "General Configuration" variables: - mysql_admin_user - mysql_admin_password variables: compartment_ocid: type: oci:identity:compartment:id # type: string required: true title: "Compartment" description: "The compartment in which to create compute instance(s)" mysql_admin_user: type: string required: true title: "MySQL Admin User" description: "Username for MySQL Admin User" minLength: 1 maxLength: 14 pattern: "^[a-zA-Z][a-zA-Z0-9]+$" mysql_admin_password: type: string required: true title: "MySQL Password" description: "Password for MySQL Admin User" pattern: "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*()_+\\-=\\[\\]{};':\\\"\\\\|,.<>\\/?]).{8,32}$" outputGroups: - title: "MySQL Database Service" outputs: - mysql_url outputs: mysql_url: type: string title: "MySQL Connection String" visible: true
And then packing the zip file again:
$ zip mds-terraform.zip *.tf schema.yaml
updating: compartment.tf (deflated 38%)
updating: mysql.tf (deflated 61%)
updating: network.tf (deflated 75%)
updating: oci.tf (deflated 35%)
updating: variables.tf (deflated 50%)
  adding: schema.yaml (deflated 57%)
In the stack configuration I now click Back, upload the new file and get a nicer form.
So, let's enter a password and we can continue. (IMPORTANT: The password will be stored insecurely in the stack, for production usage you should secure it)
After completing the Wizard I come to an overview page for the Stack and can then pick the Terraform Apply Action. This will take about 15 minutes and create our resources.
After the process is done I browse to the MySQL Database Service page.
But oh wait, there is no DB System in the list!? That's because it lives in the newly created compartment, so on the left I select the mds_terraform Compartment. If that compartment doesn't appear in the list yet, my browser has an outdated version cached and I simply reload the page.
Now we have a MySQL Database Service instance inside a VCN, but we can't reach it. Not so good, so I add one more service to my Terraform configuration: a Compute instance with MySQL Shell pre-installed. Here's my compute.tf:
data "oci_core_images" "images_for_shape" { compartment_id = oci_identity_compartment.mds_terraform.id operating_system = "Oracle Linux" operating_system_version = "7.8" shape = var.compute_shape sort_by = "TIMECREATED" sort_order = "DESC" } resource "oci_core_instance" "compute_instance" { availability_domain = data.oci_identity_availability_domain.ad.name compartment_id = oci_identity_compartment.mds_terraform.id display_name = "MySQL Database Service and Terraform Test" shape = var.compute_shape source_details { source_type = "image" source_id = data.oci_core_images.images_for_shape.images[0].id } create_vnic_details { assign_public_ip = true display_name = "primaryVnic" subnet_id = oci_core_subnet.test_subnet.id hostname_label = "compute" } metadata = { ssh_authorized_keys = var.public_key user_data = filebase64("init-scripts/compute-init.sh") } } output "compute_public_ip" { value = oci_core_instance.compute_instance.public_ip }
This creates a VM using the latest Oracle Linux 7.8 image and asks for a public IP address, so we can reach it from the outside. I also reference a script called init-scripts/compute-init.sh. This script looks like this and simply installs MySQL Shell from MySQL's yum repository:
#!/bin/sh
cd /tmp
wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
sudo rpm -i mysql80-community-release-el7-3.noarch.rpm
sudo yum update
sudo yum install -y mysql-shell
In variables.tf two new variables have to be added: one asking for an SSH public key, so we can log in to the machine, and one configuring the shape, with a sensible default:
variable "compute_shape" { default ="VM.Standard2.1" } variable "public_key" { }
To add the new configuration field and the new output to our form in the Resource Manager, schema.yaml needs a few minor updates; for simplicity here is the complete file:
title: "MySQL Terraform Experiment" description: "An experimental Terraform setup to create MySQL Database Service Instances" schemaVersion: 1.1.0 version: "20190304" locale: "en" groupings: - title: "Basic Hidden" visible: false variables: - compartment_ocid - tenancy_ocid - region - title: "General Configuration" variables: - mysql_admin_user - mysql_admin_password - public_key variables: compartment_ocid: type: oci:identity:compartment:id # type: string required: true title: "Compartment" description: "The compartment in which to create compute instance(s)" mysql_admin_user: type: string required: true title: "MySQL Admin User" description: "Username for MySQL Admin User" minLength: 1 maxLength: 14 pattern: "^[a-zA-Z][a-zA-Z0-9]+$" mysql_admin_password: type: password required: true title: "MySQL Password" description: "Password for MySQL Admin User" pattern: "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*()_+\\-=\\[\\]{};':\\\"\\\\|,.<>\\/?]).{8,32}$" public_key: type: string title: "SSH Public Key" description: "An OpenSSH public key for accessing your compute instance" outputGroups: - title: "MySQL Database Service" outputs: - mysql_url - title: "Compute Instance" outputs: - compute_public_ip outputs: mysql_url: type: string title: "MySQL Connection String" visible: true compute_public_ip: type: string title: "Public IP" visible: true
Now I can package it up, again:
$ zip mds-terraform.zip *.tf schema.yaml init-scripts/compute-init.sh
updating: compartment.tf (deflated 38%)
updating: mysql.tf (deflated 62%)
updating: network.tf (deflated 76%)
updating: oci.tf (deflated 35%)
updating: variables.tf (deflated 55%)
updating: schema.yaml (deflated 57%)
  adding: compute.tf (deflated 54%)
  adding: init-scripts/compute-init.sh (deflated 39%)
And go back to the Resource Manager ... oh wait ... the list is empty ... hah, I'm in the wrong Compartment again. Once that hurdle is cleared I can select the Stack I created previously, click Edit and upload the new file. The wizard now asks for the SSH key, which I copy from my $HOME/.ssh/id_rsa.pub before completing the wizard. Then I again pick the Terraform Apply action and can observe how Terraform notices that most things already exist and only the Compute instance is missing, and creates it. A few minutes later the task is completed.
At the top of the page a new tab, Application Information, appeared, which, based on the information from the schema.yaml file, gives me a mysqlx URL and an IP address. I then use that IP address to connect to the machine, using my SSH key and the username opc. I have to confirm the server identity by typing yes and I am on my Compute instance, which is inside my VCN. From there I can use MySQL Shell with the URL from the Terraform summary to connect to the MySQL instance.
MySQL Shell starts in JavaScript mode by default. If I'm not in the mood for that I can type \sql and switch to SQL mode. I can also install other programs as I like, including my own, and connect to the MDS instance just like to any other MySQL server.
[opc@compute ~]$ mysqlsh 'mysqlx://root:...@10.0.2.5:33060'
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
WARNING: Using a password on the command line interface can be insecure.
Creating an X protocol session to 'root@10.0.2.5:33060'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 13 (X protocol)
Server version: 8.0.22-u2-cloud MySQL Enterprise - Cloud
No default schema selected; type \use <schema> to set one.
 MySQL  10.0.2.5:33060+ ssl  JS > \sql
Switching to SQL mode... Commands end with ;
 MySQL  10.0.2.5:33060+ ssl  SQL > CREATE DATABASE foo;
Query OK, 1 row affected (0.0052 sec)
 MySQL  10.0.2.5:33060+ ssl  SQL > use foo
Default schema set to `foo`.
Fetching table and column names from `foo` for auto-completion... Press ^C to stop.
 MySQL  10.0.2.5:33060+ ssl  foo  SQL > CREATE TABLE t (id INT);
Query OK, 0 rows affected (0.0236 sec)
 MySQL  10.0.2.5:33060+ ssl  foo  SQL > INSERT INTO t VALUES (1);
Query OK, 1 row affected (0.0139 sec)
 MySQL  10.0.2.5:33060+ ssl  foo  SQL > SELECT * FROM t;
+----+
| id |
+----+
|  1 |
+----+
1 row in set (0.0007 sec)
Once you are done you can go back to the Web Console and the Stack's page and pick the Terraform Destroy Action and all things will be removed again.
Note: It can happen that the Cloud Init script hasn't finished yet and MySQL Shell isn't installed. In that case wait a few moments and try again. You might also see an error like mysqlsh: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory. If that happens, log out and back in. If the error persists, run

export LD_LIBRARY_PATH=/usr/lib/mysqlsh/
mysqlsh

as a workaround.
Now, didn't I initially say that I want to automate things and not click around in a Web Console? Yes, I did. So I install the terraform tool locally by downloading it from terraform.io and then change my oci.tf file. Previously I was running inside OCI and could use my web session for authentication and to gather data; from my local machine I have to configure a bit more. The provider entry now looks like this:
provider "oci" { version = "~> 3.95" region = var.region tenancy_ocid = var.tenancy_ocid user_ocid = var.user_ocid fingerprint = var.fingerprint private_key_path = var.private_key_path }
There are new variables, so I add them to variables.tf:
variable "user_ocid" {} variable "fingerprint" {} variable "private_key_path" {}
Now I can run terraform init, which will read the files and download the OCI provider. If I then run terraform apply it will ask me about all those variables. The best way to gather them is to install the OCI command line tool and run oci setup bootstrap, which guides you through setting up a client and puts the relevant information into your $HOME/.oci/config file.
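To avoid typing all of them on every run, the values can also be placed in a terraform.tfvars file next to the other files. The following is only a sketch with placeholder values; take the real OCIDs, fingerprint and key path from your $HOME/.oci/config:

# terraform.tfvars -- placeholder values, replace with your own
region               = "eu-frankfurt-1"
tenancy_ocid         = "ocid1.tenancy.oc1..aaaa..."
compartment_ocid     = "ocid1.tenancy.oc1..aaaa..."   # parent compartment, e.g. the tenancy root
user_ocid            = "ocid1.user.oc1..aaaa..."
fingerprint          = "12:34:56:78:90:ab:cd:ef:12:34:56:78:90:ab:cd:ef"
private_key_path     = "~/.oci/oci_api_key.pem"
mysql_admin_password = "..."
public_key           = "ssh-rsa AAAA..."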
All these files are available on GitHub at https://github.com/johannes/mysql-database-service-terraform-example
Happy MySQLing.
If you want to see how to use a similar setup for running a serverless application using Node.js on OCI you can look at this Hands-on-Lab and I also suggest reserving some time to attend the Oracle Live event with a big MySQL announcement on December 2nd.
Sep 17: Using JSON features to restructure results

Recently, in the MySQL Community Slack, somebody asked how to find out which clients are connected to a server. The relevant information is available from the performance schema, as most connectors send information about themselves when connecting:
select * from performance_schema.session_connect_attrs;
+----------------+-----------------+------------------------+------------------+
| PROCESSLIST_ID | ATTR_NAME       | ATTR_VALUE             | ORDINAL_POSITION |
+----------------+-----------------+------------------------+------------------+
|            130 | _pid            | 17412                  |                0 |
|            130 | _platform       | x86_64                 |                1 |
|            130 | _os             | Linux-5.4.0            |                2 |
|            130 | _source_host    | maniacmansion          |                3 |
|            130 | _client_name    | mysql-connector-nodejs |                4 |
|            130 | _client_version | 8.0.22                 |                5 |
|            130 | _client_license | GPL-2.0                |                6 |
|            130 | program_name    | mysqlx-shell           |                7 |
|            129 | _client_name    | libmysqlxclient        |                0 |
|            129 | _client_version | 8.0.21                 |                1 |
|            129 | _os             | Linux                  |                2 |
|            129 | _platform       | x86_64                 |                3 |
|            129 | _client_license | GPL                    |                4 |
|            129 | _pid            | 17257                  |                5 |
|            129 | program_name    | mysqlsh                |                6 |
|            131 | _pid            | 17510                  |                0 |
|            131 | _platform       | x86_64                 |                1 |
|            131 | _os             | Linux                  |                2 |
|            131 | _client_name    | libmysql               |                3 |
|            131 | os_user         | johannes               |                4 |
|            131 | _client_version | 8.0.22                 |                5 |
|            131 | program_name    | mysql                  |                6 |
+----------------+-----------------+------------------------+------------------+
22 rows in set (0.0027 sec)
Now this gives us quite some information, but the person asking wanted it presented in a nice, transposed way, where for instance _client_name and _client_version appear in their own columns. As I like MySQL's JSON features I decided to use those for doing this. (There are other ways, which in many cases might be better: window functions, CTEs, sub-selects, ...; this here is only to show the JSON features.)
The first feature I needed is JSON aggregation. JSON aggregation works similarly to GROUP_CONCAT or other aggregate functions: if we have a GROUP BY clause we can build a JSON object, taking one column as key and one column as value:
SELECT PROCESSLIST_ID,
       JSON_PRETTY(JSON_OBJECTAGG(ATTR_NAME, ATTR_VALUE)) attribs
  FROM performance_schema.session_connect_attrs
 GROUP BY PROCESSLIST_ID;

| PROCESSLIST_ID | attribs |
| 129 | { "_os": "Linux", "_pid": "17257", "_platform": "x86_64", "_client_name": "libmysqlxclient", "program_name": "mysqlsh", "_client_license": "GPL", "_client_version": "8.0.21" } |
| 130 | { "_os": "Linux-5.4.0", "_pid": "17412", "_platform": "x86_64", "_client_name": "mysql-connector-nodejs", "_source_host": "maniacmansion", "program_name": "mysqlx-shell", "_client_license": "GPL-2.0", "_client_version": "8.0.22" } |
| 131 | { "_os": "Linux", "_pid": "17510", "os_user": "johannes", "_platform": "x86_64", "_client_name": "libmysql", "program_name": "mysql", "_client_version": "8.0.22" } |
With this all data is properly aggregated. Now we can use my friend JSON_TABLE to convert the JSON data back into a table:
SELECT PROCESSLIST_ID, i.*
  FROM (
        SELECT PROCESSLIST_ID,
               JSON_OBJECTAGG(ATTR_NAME, ATTR_VALUE) j
          FROM performance_schema.session_connect_attrs
         GROUP BY PROCESSLIST_ID
       ) s,
       JSON_TABLE(
         s.j,
         '$' COLUMNS (
           client_name    VARCHAR(100) PATH "$._client_name",
           client_version VARCHAR(100) PATH "$._client_version"
         )
       ) AS i;

+----------------+------------------------+----------------+
| PROCESSLIST_ID | client_name            | client_version |
+----------------+------------------------+----------------+
|            129 | libmysqlxclient        | 8.0.21         |
|            130 | mysql-connector-nodejs | 8.0.22         |
|            131 | libmysql               | 8.0.22         |
+----------------+------------------------+----------------+
And yay, we got a readable result about which active session is using which connector and can identify outdated ones. Of course we have all SQL features available, thus adding a WHERE i.client_version != '8.0.22' would filter the result accordingly, as shown below.
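For illustration, the complete query with such a filter might look like this (the version string is just an example):

SELECT PROCESSLIST_ID, i.*
  FROM (
        SELECT PROCESSLIST_ID,
               JSON_OBJECTAGG(ATTR_NAME, ATTR_VALUE) j
          FROM performance_schema.session_connect_attrs
         GROUP BY PROCESSLIST_ID
       ) s,
       JSON_TABLE(
         s.j,
         '$' COLUMNS (
           client_name    VARCHAR(100) PATH "$._client_name",
           client_version VARCHAR(100) PATH "$._client_version"
         )
       ) AS i
 WHERE i.client_version != '8.0.22';   -- keep only sessions from outdated clients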
Jun 26: UTC isn't always the answer for storing dates
When looking through online forums at date related questions a typical advice is like this:
Store dates in UTC and convert to a local timezone on display.
In many cases this is absolutely the right choice. If this is, say, the date an article was published, it's useful for users to see it in their local time.
However we all know it - there never is a single true answer and the correct answer always is "it depends." An example:
You are building a system for booking a concert hall. Such tickets can be sold quite a bit in advance and have a time printed on them. Also, other processes depend on that time (catering staff and security have to be there in time, etc.).
When storing this date in UTC all will be fine at first: browsing through the calendar will look right, and if somebody comes from across a timezone border they can see their local time. Nice.
Now politics come into play and change the rules for daylight saving times. Suddenly your event is one hour off. Sad.
And yes, these things happen. For instance:
- The EU is debating getting rid of daylight saving time in the future. (As this was an initiative of the previous Commission and the current one hasn't touched it yet, that's probably not going to happen soon, but you never know.)
- Russia in 2011 decided to stay on DST and not switch back, but in 2014, after protests, it switched to permanent "normal" time.
- Samoa introduced DST only in 2010; it had planned to do so in 2009 but delayed the change due to a tsunami.
- Crimea switched its timezone when Russia took control.
- One can dream of the world at least unifying the DST switchover dates some time in the future.
Thus, if the local time is what matters, the advice to store UTC might not be so good; rather store the date, the time and the timezone. The best form for storing the timezone is not a common abbreviation like "CEST" or "ET", which can be ambiguous (AST could be Arabian Standard Time, UTC+3h, or Atlantic Standard Time, UTC-4h), but the form used by the IANA tz database, such as Europe/Berlin or America/New_York. A sketch of what that can look like as a table is shown below.
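As a small sketch (table and column names are made up for illustration, and CONVERT_TZ with named zones requires the MySQL time zone tables to be loaded):

-- store the local wall-clock time plus the IANA zone name
CREATE TABLE concert_event (
  id        INT PRIMARY KEY AUTO_INCREMENT,
  title     VARCHAR(200) NOT NULL,
  starts_at DATETIME NOT NULL,        -- local wall-clock time, e.g. '2022-06-01 20:00:00'
  timezone  VARCHAR(64) NOT NULL      -- IANA name, e.g. 'Europe/Berlin'
);

-- convert only at read/display time, using whatever tz rules are current then
SELECT title,
       CONVERT_TZ(starts_at, timezone, 'America/New_York') AS starts_at_new_york
  FROM concert_event;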
Sep 27: Efficient Node.js Buffer usage
When building network libraries for Node.js, as we do at work, one quite quickly comes across Node's Buffer type. A Buffer gives access to a memory region in quite a raw form, allowing one to handle raw data and to interpret binary streams. The Buffer interface predates ES6 TypedArrays and has some optimizations.
Two optimisations are notable:
For one the slice() method does not copy data, but returns a view on the underlying data. This makes it quite efficient to work on a window of the data, but when writing one has to be careful. Simple example:
const buffer = Buffer.from("hello"); const slice = buffer.slice(1,2); slice[0] = 97; console.log(buffer.toString('utf8')); // will print 'hallo'
The second one is that allocating a small buffer won't actually go to the operating system and allocate a memory area, but Node.js has a memory region from which small buffers can be derived quickly. This actually works by using the slicing logic.
const buffer1 = Buffer.from("hello");
const buffer2 = Buffer.from("world");
console.log(buffer1.byteOffset); // 0
console.log(buffer2.byteOffset); // 8
This indicates that both buffers use the same memory region with an alignment of 8 bytes.
So how does this work? Underlying the Buffer, in modern versions of Node, is an ArrayBuffer. We can ask the Buffer to provide us with the underlying ArrayBuffer using the .buffer property. One thing to be careful about is that for a slice the returned ArrayBuffer will be the full buffer and not only the sliced part. Given the two Buffers from above we can see this.
const buffer3 = Buffer.from(buffer2.buffer);
console.log(buffer3.length); // 8192
const buffer4 = buffer3.slice(0, 5);
console.log(buffer4.toString('utf8')); // hello
A raw ArrayBuffer doesn't provide many operations, but we can create a new Buffer on top of it. This won't copy the data but use the same memory as above. We can also see that the block pre-allocated by Node (in the version I'm using for this test) apparently is 8192 bytes. 8k is a common size for such buffers. One factor in this choice is that many filesystems use 512 byte blocks and 8k is a handy multiple of that; additionally, CPU caches are often multiples of 8k. So this is not an entirely arbitrary choice.
Since most of that data is most likely rubbish, we slice it and look only at the beginning; notably we're not seeing "world" but "hello", confirming that both Buffers indeed share the same underlying memory.
As said, Buffer is a Node-specific type and some Node.js libraries can't handle it. One of those libraries we use is the Google Protocol Buffer library. In simple terms, protobuf is a format for serializing data, for example for exchange over a network. We can call it a typed and more efficient JSON alternative.
Protobuf's deserializeBinary() function won't work with Buffer instances, but requires a TypedArray, like Uint8Array. A Uint8Array is the ES6 counterpart to Node's Buffer. It is a layer on top of ArrayBuffer for accessing bytes.
The easy way to make it work is converting a Buffer to Uint8Array like this:
const buf = stream.read(length);
const u8 = new Uint8Array(buf);
const msg = Message.deserializeBinary(u8);
However this is inefficient as the Uint8Array constructor will copy the data. But as we learned, both Buffer and Uint8Array are just views on top of an ArrayBuffer. The Buffer gives us access to the ArrayBuffer and if we are careful about our slicing offsets we can ask the Uint8Array to re-use the same memory without copying:
const buf = stream.read(length);
const u8 = new Uint8Array(buf.buffer, buf.byteOffset, buf.length);
const msg = Message.deserializeBinary(u8);
A tiny bit more to type, but a notable gain in performance, which we can verify with a trivial benchmark:
console.time("A"); for (let i = 0; i < 1000; ++i) { const b = new Buffer(10000); const u = new Uint8Array(b); } console.timeEnd("A"); console.time("B"); for (let i = 0; i < 1000; ++i) { const b = new Buffer(10000); const u = new Uint8Array(b.buffer, b.byteOffset, b.length); } console.timeEnd("B");
On my machine I see results like
A: 9.895ms
B: 5.216ms

The difference in a real application will of course vary.
Oct 1: Destructing PHP

Already last year I had a session at the fabulous FrOSCon conference about "Destructing PHP". No, this wasn't a flaming/trolling talk, but an attempt to teach a bit about a lesser-known language feature of PHP: deterministic destructors. To explain this let's take a look at a simple example:
<?php
require('autoload.php');

function foo($db, $data) {
    $db->beginTransaction();
    foreach ($data as $row) {
        $db->insert($row);
    }
    $db->commit();
}

$db = db::factory();
$data = data::factory();

try {
    foo($db, $data);
} catch (Exception $e) {}

$db->insert(data::finalData());
?>
Even if the syntax is correct this program is incorrect: It fails at exception safety. If an exception is thrown the transaction state is leaked. An exception might for instance be thrown by the $db->insert() call or, if $data is an iterator, by the iteration.
According to the program, data::finalData() should be stored independently of whether foo() succeeds or not. Unfortunately this isn't the case: if something in the loop inside foo() throws, there will be an open transaction and the final data becomes part of it. As there is no further handling, PHP will clean up at the end of the program and the transaction will automatically be rolled back. So let's fix this. A typical solution looks something like this:
function foo($db, $data) {
    $db->beginTransaction();
    try {
        foreach ($data as $row) {
            $db->insert($row);
        }
    } catch (\Exception $e) {
        $db->rollback();
        throw $e;
    }
    $db->commit();
}
So we catch the exception and in the error case we rollback and re-throw the exception. Now this program is correct, but admit it, this is pretty verbose and ugly.
The form I would like to see is this:
function foo($db, $data) {
    $transaction = new Transaction($db);
    foreach ($data as $row) {
        $db->insert($row);
    }
    $transaction->commit();
}
Now this is correct and exception safe, while being clean and without noise. You might have to look closely to see the difference to the initial version: we simply introduced a transaction object. The reason this works is that PHP's memory management is based on reference counting. With reference counting (which I explained in more detail in this recorded talk) PHP keeps track of how many variables refer to an object, and when the last reference is gone the object is cleaned up and its destructor is called. PHP is also function scoped, which means that when a function ends, whether by reaching the end of the function, by a return statement or by an exception, the variables from that function will be cleaned up. In the code above we have one reference to the transaction object, so at the end of the function it will be cleaned up. This is massively different from garbage collected languages like Java or Go, where objects are cleaned up at, more or less, random times. In PHP this is deterministic. The only case where PHP falls back on garbage collection is when you have cyclic references. As long as you don't have cycles you can figure out exactly when a destructor will be called just by reading the code; admittedly, if you pass an object around a lot and store it in multiple places this can get complicated. A small, self-contained example of this determinism follows below.
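As a quick, made-up illustration of that determinism: the destructor below runs exactly when the function scope is left, even when it is left via an exception, before the catch block is executed:

<?php
class Probe {
    public function __destruct() {
        echo "destructed\n";
    }
}

function work() {
    $p = new Probe();
    echo "before throw\n";
    throw new Exception("boom");
}

try {
    work();
} catch (Exception $e) {
    echo "caught\n";
}
// Output:
// before throw
// destructed   <-- destructor ran while unwinding out of work()
// caught
?>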
Now let's take a look at the implementation of our Transaction class:
class Transaction {
    private $db;
    private $success = false;

    function __construct($db) {
        $this->db = $db;
        $db->begin();
    }

    function commit() {
        $this->db->commit();
        $this->success = true;
    }

    function __destruct() {
        if (!$this->success) {
            $this->db->rollback();
        }
    }
}
The key here is that we track the state. If the destructor is called without an explicit commit having happened before, a rollback is enforced.
Now I have to admit: this pattern is no invention of mine. It's a common pattern in C++, one of the very few other languages with deterministic destructors. C++'s father Bjarne Stroustrup coined the name RAII for it: Resource Acquisition Is Initialization. Whenever one acquires a resource, in our example a database transaction, one also initializes an object whose lifetime controls the resource's lifetime. The critical part is not to pass this object around without thought. Using this pattern needs some training initially, but once you are used to it, it is a very good way to write exception safe code in a clean way.
Now, for fun, in my talk I showed another trick you can play with deterministic destructors: ensuring that a return value is actually being used. So let's assume you have a function which does a very expensive calculation and you want to ensure that nobody refactors the code in a way that ignores the return value, thus
echo expensiveCalculation();
should work, while
$a = expensiveCalculation(); unset($a);
should throw an error. To achieve this our expensiveCalculation() function won't return the value directly but wrap it in an EnforceUsage object which might be defined like this:
class EnforceUsage {
    private $value;
    private $used = false;

    function __construct($value) {
        $this->value = $value;
    }

    function __toString() {
        $this->used = true;
        return (string)$this->value;
    }

    function __destruct() {
        if (!$this->used) {
            Logger::notice("Return value not used");
        }
    }
}
I admit: unlike the RAII pattern from above, this is hardly useful in PHP, but it shows the power we have at our hands.
For completeness here are the slides of the talk I mentioned in the beginning:
Sep 15: MySQL 8.0: JSON Aggregation functions

In MySQL 5.7 new JSON support was added. The JSON support consists of three features: the JSON data type, a set of functions to work with JSON documents, and the ability to index JSON data via generated columns.
All three features combined allow building very powerful applications without committing to a fixed data structure in every part, but I was missing one thing a lot: aggregation.
A topic I'm thinking a lot about is finding ways to select nested data. In MySQL we typically have two ways for this: either one sends multiple queries to retrieve different nesting levels, or one builds JOINs, which tend to deliver repetitive responses. A tool we have had for a while is GROUP_CONCAT. With GROUP_CONCAT we can get some of our nested information:
mysql> SELECT blog.*, GROUP_CONCAT(blog_comment.id) AS comments
         FROM blog
         JOIN blog_comment ON blog.id = blog_comment.blog_id
        GROUP BY blog_id;
+----+-------------+----------------------+----------+
| id | title       | body                 | comments |
+----+-------------+----------------------+----------+
|  1 | Hello World | This is my new Blog! | 1,2      |
|  2 | spam trap   | spam goes here       | 3        |
+----+-------------+----------------------+----------+
2 rows in set (0.07 sec)
This gives us the ids for all comments to a specific blog post. But it's not really nice to process. To improve this we can try to abuse GROUP_CONCAT like this:
mysql> SELECT blog.*,
              CONCAT('[',
                     GROUP_CONCAT(
                         JSON_OBJECT("author", blog_comment.author,
                                     "body", blog_comment.body)),
                     ']') AS comments
         FROM blog
         JOIN blog_comment ON blog.id = blog_comment.blog_id
        GROUP BY blog_id;
+----+-------------+----------------------+---------------------------------------------------------------------------------------------------------+
| id | title       | body                 | comments                                                                                                |
+----+-------------+----------------------+---------------------------------------------------------------------------------------------------------+
|  1 | Hello World | This is my new Blog! | [{"body": "Great blog!", "author": "Your Friend"},{"body": "Great new blog", "author": "Best Friend"}]  |
|  2 | spam trap   | spam goes here       | [{"body": "Buy these pills to enlarge your blog", "author": "Visitor"}]                                 |
+----+-------------+----------------------+---------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
But this has multiple issues: not only is it kind of ugly and error-prone to build our JSON string manually using CONCAT() like this, but GROUP_CONCAT also uses an internal buffer with a default size of only 1024 bytes, which, while configurable, can easily be too small for our use case. A quick sketch of raising that limit follows below.
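For reference, that buffer is controlled by the group_concat_max_len system variable and can be raised per session; the value below is just an example:

-- the default is 1024 bytes; raise it for the current session
SET SESSION group_concat_max_len = 1024 * 1024;
SHOW VARIABLES LIKE 'group_concat_max_len';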
Now in the new development milestone release for MySQL 8.0 we can see two new aggregation functions: JSON_ARRAYAGG() and JSON_OBJECTAGG() which build proper JSON arrays or objects without any need for a hack:
mysql> SELECT blog.*,
              JSON_ARRAYAGG(
                  JSON_OBJECT("author", blog_comment.author,
                              "body", blog_comment.body)) AS comments
         FROM blog
         JOIN blog_comment ON blog.id = blog_comment.blog_id
        GROUP BY blog_id;
+----+-------------+----------------------+---------------------------------------------------------------------------------------------------------+
| id | title       | body                 | comments                                                                                                |
+----+-------------+----------------------+---------------------------------------------------------------------------------------------------------+
|  1 | Hello World | This is my new Blog! | [{"body": "Great blog!", "author": "Your Friend"},{"body": "Great new blog", "author": "Best Friend"}]  |
|  2 | spam trap   | spam goes here       | [{"body": "Buy these pills to enlarge your blog", "author": "Visitor"}]                                 |
+----+-------------+----------------------+---------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
This is now quite nice and by using the proper aggregation function we can be sure that our result will always be valid JSON, which allows further easy processing.
Sep 3: Types in PHP and MySQL


Since PHP 7.0 has been released there's more attention on scalar types. Keeping track of types for data within your application is relatively simple. But when talking to external systems, like a database, things aren't always as one might initially expect.
For MySQL the type we see -- in the first approximation -- is defined by the network protocol. The MySQL network protocol by default converts all data into strings. So if we fetch an integer from the database and use PHP 7's typing feature we get an error:
<?php
declare(strict_types=1);

function getInteger() : int {
    $mysqli = new mysqli(...);
    return $mysqli->query("SELECT 1")->fetch_row()[0];
}
var_dump(getInteger());
?>

Fatal error: Uncaught TypeError: Return value of getInteger() must be of the type integer, string returned in t.php:6
Of course the solution is easy: either we cast the value ourselves or we disable strict mode and PHP will cast for us. A quick sketch of the first option is shown below.
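A minimal sketch of the explicit cast (the connection parameters are placeholders, as above):

<?php
declare(strict_types=1);

function getInteger() : int {
    $mysqli = new mysqli(...);  // placeholder connection parameters
    // the protocol gives us a string, so cast explicitly
    return (int) $mysqli->query("SELECT 1")->fetch_row()[0];
}
var_dump(getInteger()); // int(1)
?>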
Now let's take a look at another case. Assume we have an application where we fetch an integer ID from the database. We know MySQL will send us a string, and we treat the ID as opaque data anyway, so we type the return value as a string. Now we refactor the code slightly and make use of prepared statements. What will the result be?
<?php
declare(strict_types=1);

function getId() : string {
    $mysqli = new mysqli(...);
    $stmt = $mysqli->prepare("SELECT 1");
    $stmt->execute();
    return $stmt->get_result()->fetch_row()[0];
}
var_dump(getId());
?>

Fatal error: Uncaught TypeError: Return value of getId() must be of the type string, integer returned in t.php:8
Wait! What's up there!? Didn't I just say that the MySQL protocol always sends strings, so we retrieve a string in PHP!? Yes, I did, and that's true for "direct queries". It's not true for results from prepared statements. With prepared statements the MySQL protocol uses a binary encoding of the data, and therefore mysqlnd and mysqli will try to find the matching PHP type. This isn't always possible, especially when we get into the range of big values. So let's query for PHP_INT_MAX and PHP_INT_MAX+1 and look at the types:
<?php $mysqli = new mysqli(...); $stmt = $mysqli->prepare("SELECT 9223372036854775807, 9223372036854775808"); $stmt->execute(); var_dump($stmt->get_result()->fetch_row()); ?> array(2) { [0]=> int(9223372036854775807) [1]=> string(19) "9223372036854775808" }
Here 9223372036854775807 is the largest value a PHP integer can represent and thus becomes an integer. 9223372036854775808 however is too large and can't fit in a signed 64-bit integer, thus it is converted into a string, as this keeps all information and can still be handled at least to some degree.
Similar things happen with other types which can't be properly represented in PHP:
<?php $mysqli = new mysqli(...); $stmt = $mysqli->prepare("SELECT 1.23"); $stmt->execute(); var_dump($stmt->get_result()->fetch_row()); ?> array(2) { [0]=> string(4) "1.23" }
Yay, yet another WTF! So what is going on this time? Well, a numeric literal in SQL is treated as a DECIMAL. A DECIMAL field is supposed to be precise. If it were converted into a PHP float, a.k.a. double, we would probably lose precision, thus treating it as a string again makes sure we're not losing information. If we had a FLOAT or DOUBLE field, it could safely be represented as a float in PHP:
<?php $mysqli = new mysqli(...); $stmt = $mysqli->prepare("SELECT RAND()"); $stmt->execute(); var_dump($stmt->get_result()->fetch_row()); ?> array(2) { [0]=> float(0.16519711461402206) }
So to summarize:
- For a direct query the MySQL server sends strings, PHP returns all data as string
- For prepared statements MySQL sends data in binary form and PHP will use a corresponding type
- If the value could only be represented with a potential data loss in PHP it is converted to a string by PHP, even with prepared statements
Now we might expect the same when using PDO. Let's check:
<?php
$pdo = new PDO("mysql:host=localhost", "...", "...");
$stmt = $pdo->prepare("SELECT 9223372036854775808, RAND()");
$stmt->execute();
var_dump($stmt->fetch(PDO::FETCH_NUM));
?>

array(2) {
  [0]=>
  string(1) "1"
  [1]=>
  string(18) "0.3217373297752229"
}
This example uses prepared statements, but returns strings!? The reason is that PDO by default doesn't use prepared statements on the network layer but emulates them within PHP. This means PHP will replace potential placeholders and then run a direct query. As mentioned above, with a direct query the MySQL server sends strings, thus PHP represents all data as strings. However, we can easily ask PDO to disable the emulation:
<?php
$pdo = new PDO("mysql:host=localhost", "...", "...");
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$stmt = $pdo->prepare("SELECT 1, RAND()");
$stmt->execute();
var_dump($stmt->fetch(PDO::FETCH_NUM));
?>

array(2) {
  [0]=>
  int(1)
  [1]=>
  float(0.24252333421495)
}
This leaves the question whether you should disable the emulation in order to get the correct types. Doing so has some impact on performance characteristics: with native prepared statements there is a client-server round-trip during the prepare and another one for the execute; with emulation there is only one during the execute. Native prepared statements also require some server resources to store the statement handle. However, if a single statement is executed multiple times there might be some savings. Also, the different type representation means that different type conversions happen and a different amount of data is transferred. For most cases this shouldn't have a notable impact, but in the end only a benchmark will tell; a rough sketch of such a measurement is below.
I hope this helps to give a better understanding - or causes more confusion.
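A rough sketch of such a micro-benchmark (credentials are placeholders and the absolute numbers will depend entirely on your setup):

<?php
function bench(PDO $pdo, bool $emulate) : float {
    $pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, $emulate);
    $start = microtime(true);
    $stmt = $pdo->prepare("SELECT ?, RAND()");
    for ($i = 0; $i < 10000; ++$i) {
        $stmt->execute([$i]);
        $stmt->fetch(PDO::FETCH_NUM);
    }
    return microtime(true) - $start;
}

$pdo = new PDO("mysql:host=localhost", "...", "...");   // placeholder credentials
printf("emulated: %.4fs\n", bench($pdo, true));
printf("native:   %.4fs\n", bench($pdo, false));
?>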
Feb 23: More on references

In a few different places I saw comments on my last blog post about references and performance where commenters noted that my example was pointless. Which of course is true, and to some degree was the point.
I read a lot of PHP code, and from time to time I see people with a non-PHP background (or otherwise influenced) putting references everywhere they pass arrays or similar in order to prevent copies. I knew this was a bad practice in PHP 5 and wanted to verify this in PHP 7. For readers with a stronger PHP background this usage doesn't come to mind, so their comments were along the lines of "what if I want to modify the data?", which might lead to something like this:
function modify(&$data) {
    $data["foo"] = "bar";
}

$data = [ /* huuuuuge array */ ];
modify($data);
In this code, from a performance perspective, the reference likely works out and this is "fast." My primary criticism would be that references aren't idiomatic in PHP, so most people reading this code wouldn't expect $data to be changed by the function call. Luckily the name of the function gives this away, to some degree. The more idiomatic way might be along these lines:
function modify($data) { $data["foo"] = "bar"; return $data; } $data = [ /* huuuuuge array */ ]; $data = modify($data);
I consider this more readable and clearer, while it will create a (temporary) copy, leading to more CPU and peak memory usage. Now we have to decide how much clarity we want to trade for a performance gain. Say that decision has been made and we go for the approach with references. Then we fix an issue or add a new feature, make a slight change, and suddenly lose what we gained before. Maybe we do something like this:
function modify(&$data) {
    if (!in_array("bar", $data)) { // A copy happens here
        $data["foo1"] = "bar";
    }
    if (!in_array("baz", $data)) { // Yet another copy here
        $data["foo2"] = "baz";
    }
}

$data = [ /* huuuuuge array */ ];
$data2 = $data;
modify($data); // A copy happens here, to split $data and $data2
So the performance gain we once carefully produced has massively backfired, and we even got three copies. In this short example this is quite obvious, but in a larger application context, with real-life changes, tracking this is really hard.
If we had written this in the (in my opinion) more idiomatic way this would look like this:
function modify($data) {
    if (!in_array("bar", $data)) {
        $data["foo1"] = "bar"; // Maybe a copy here
    }
    if (!in_array("baz", $data)) {
        $data["foo2"] = "baz"; // Maybe copy here, but only if not copied above already
    }
    return $data;
}

$data = [ /* huuuuuge array */ ];
$data2 = $data;
$data = modify($data);
So depending on the conditions we might end up with either no or at most one copy, compared to the three copies from above. Of course this example is constructed but the point is: If you use references for performance you have to be extremely careful and know exactly what you're doing and think about each coming modification.
Now let's take a step back and think a bit more about this code. Isn't there yet another way? - We have data and we have functions operating on them. Wasn't there another construct which we might use? - Yes, we could go object-oriented!
class DataManager {
    private $data;

    public function __construct() {
        $this->data = [ /* huuuuuge array */ ];
    }

    public function modify() {
        if (!in_array("bar", $this->data)) {
            $this->data["foo1"] = "bar";
        }
        if (!in_array("baz", $this->data)) {
            $this->data["foo2"] = "baz";
        }
    }
}

$dm = new DataManager();
$dm2 = $dm;
$dm->modify();
Suddenly we have a higher degree of abstraction, encapsulation and all those other OO benefits, and no copy of the data at all. Ok, yes, I cheated: I don't remember the purpose of the $dm2 = $dm assignment any more. So maybe we need to clone there and create an explicit copy. (Then again, for the $data property we'd probably benefit from copy-on-write, making even the cloning quite cheap.)
In summary: yes, when careful you can be slightly more performant in both CPU and memory usage, but in real life that gain is often lost again and eventually comes back as maintenance cost and performance loss.
Now, aren't there cases where references might be a good thing? The only reason I found in recent times (aside from an extremely carefully crafted tree structure I've seen, for which I'd usually suggest an OO way) is around anonymous functions/closures. Take this example:
$data = [ /* ... */ ];
$oldsum = 0;
$doubled = array_map(function ($element) use (&$oldsum) {
    $oldsum += $element;
    return $element * 2;
}, $data);
Again, the example in itself might be bad, but in a context where we provide a closure as a callback and want to keep some "trivial" state, references are an acceptable way. If the state we want to keep becomes more complex than a counter, however, it might be worthwhile to think about using an object to keep it, or to find some other code structure; a small sketch of that follows below.
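A minimal sketch of the object-based variant (the class is made up for illustration; since objects are passed as handles, no reference is needed to share the state):

class SumCollector {
    public $sum = 0;

    public function add($value) {
        $this->sum += $value;
    }
}

$data = [1, 2, 3];
$collector = new SumCollector();
$doubled = array_map(function ($element) use ($collector) {
    $collector->add($element);    // state is shared via the object handle
    return $element * 2;
}, $data);

echo $collector->sum; // 6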
Feb 18: References - Still bad in PHP 7

I'm known for telling people "Don't use references" (also as video), as references cause various problems (i.e. with foreach) and hurt performance. The reason for the performance loss is that references disable copy-on-write, while most places in PHP assume copy-on-write. Meanwhile we have PHP 7. In PHP 7 the internal variable handling changed a lot; among other things the reference counting moved from the zval, the container representing a variable, to the actual element. So I decided to run a little test to verify that my performance assumption is still valid.
In my test code I'm calling a function which calls strlen (one of the cheapest functions in PHP: PHP strings carry their length, so it simply returns that property) from a loop. Once the function takes the parameter by reference, once by value. Here's the code:
<?php
function takeRef(&$string) {
    strlen($string);
}

function takeVal($string) {
    strlen($string);
}

function loopRef() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeRef($s);
    }
    return microtime(true) - $start;
}

function loopVal() {
    $start = microtime(true);
    for ($i = 0; $i < 50000000; ++$i) {
        $s = "hello world";
        takeVal($s);
    }
    return microtime(true) - $start;
}

$ref = $val = PHP_INT_MAX;
for ($i = 0; $i < 10; ++$i) {
    $ref = min($ref, loopRef());
    $val = min($val, loopVal());
}

echo "Ref: $ref\nVal: $val\n";
?>
If I run this in PHP 5, in order to have a baseline, I get this result:
Ref: 10.679290056229
Val: 9.5635061264038
So using a reference costs 10%.
Now let's try PHP 7:
Ref: 10.631688117981
Val: 9.0047070980072

Overall we see a small performance improvement, as we expect with PHP 7, but using a reference still costs about 10% throughput. So I still stand by my mantra: don't use references in PHP!
If you wonder about the second loop at the bottom and the min() call: the code takes multiple samples and then uses the measurement with the least noise. My system isn't 100% idle and there might be unrelated events I don't want to measure, so the fastest run is closest to raw system performance.
Jan 24: MySQL Tour Munich


In February my colleague Morgan Tocker will travel through Europe and talk about MySQL 5.7, which recently went GA.
In Munich we're having a meeting for PHPers and other Web developers, where Morgan will talk about MySQL and JSON and I'll also be around.
Summary of the talk:
With the newly added JSON support in MySQL, you can combine the flexibility of NoSQL with the strength of a relational database. In this session, Morgan will explain the new JSON datatype, and the new set of functions for handling JSON documents, available storage and indexing options. In addition, he will present benchmarks showing how much one can expect from the MySQL server in different use case scenarios.
About Morgan: Morgan rejoined the MySQL team at Oracle in 2013 as MySQL Community Manager, having previously worked for MySQL and Sun Microsystems. He blogs from tocker.ca, and especially likes the new features in MySQL that make operations life easier.
The meeting will be held at Oracle's Munich Office, Meeting Room Van Gogh, Ground floor building B, Riesstr. 25 (close to underground station Olympiaeinkaufszentrum) on February 4th at 18:30. Open for anybody interested.
Aug 14: PHP 5.3 - Thanks for all the Fish

A few moments ago I pushed the buttons and PHP 5.3.29 came out. As this is the final release for 5.3, it is a good time to look back. PHP 5.3's history starts somewhere in 2005. We knew what a pressure point of PHP was: a language made for solving The Web Problem needs a good Unicode story. So some developers went deep into that complex area and created a prototype version of PHP with Unicode support deep within the engine. As this was a big and pressing issue, the need was obvious and the solution looked promising, it was quickly agreed to make that the base for a future PHP 6. And then time passed, the initial enthusiasm passed, and the sheer amount of work became obvious. Two years in we noticed that the ongoing PHP 6 work blocked other work: new features couldn't be added to 5.2, the current version at the time, and adding them to (what was then) CVS's HEAD tied them to the stalled PHP 6 work.
To solve the blocking issue we decided to create an intermediate release, packing in all the things that had piled up, so on 2007-09-26 we branched off a CVS branch PHP_5_3.
Branching off PHP 5.3 set lots of enthusiasm free, and people started creating features and going into heated debates about the direction we should take, so I was happy when Lukas volunteered to assist in the release management as Co-RM, playing a big role in making PHP 5.3, one of the most feature-rich PHP releases, a huge success. It was declared stable two years after branching off, on June 30th 2009!
In those two years of development, from branching off until releasing 5.3.0 stable, we saw 5,338 commits by 83 committers (also committing work by other contributors without direct commit access), with 10,125 files being changed, 1,089,600 insertions and 270,921 deletions (including tests and generated files like parsers etc.). PHP 5.3 introduced many things many PHP developers see as normal and can hardly remember not using - things like namespaces or anonymous functions. It also introduced goto, late static binding, nowdoc, ?:, exception linking, fileinfo, intl, mysqlnd, ... while also being a massive boost in performance. A massive release.
While trying to release 5.3.0 we noticed issues in our process. Notably, for a long time we didn't have a fixed cut-off date and couldn't promise when the next release would come. As a consequence people tried hard to push features in, as they feared having to wait a few years for the next release. In consequence a stricter release process with yearly releases etc. was created, which led to PHP 5.4 and 5.5 being almost on time and the upcoming PHP 5.6 being well on track.
Now, development of 5.3 didn't stop with 5.3.0 but saw 29 bugfix releases with 7,554 commits from 152 committers (due to the move to git in between, a single committer might be counted multiple times; on the other hand more "external" contributors' names are being kept), with 4,862 files being changed, 376,187 insertions and 207,314 deletions.
On the personal side, being the release master of PHP 5.3 gave me the opportunity to travel between Moscow and California, teaching different audiences in multiple languages about the great work, which was done mostly by others. (Check the ChangeLog to see whom to thank for your favorite feature!)
But now it's time to close that chapter. As of now PHP 5.3 is not supported anymore, and the different RM teams and contributors are making PHP even better than PHP 5.3 ever was, as we can see in current releases and in previews of future ones.
Thank You All, it was a great time!
Feb 24: On rumors of "PHP dropping MySQL"


Over the last few days different people asked me for comments about PHP dropping MySQL support. These questions confused me, but meanwhile I figured out where these rumors come from and what they mean.
The simple facts are: No, PHP is not dropping MySQL support and we, Oracle's MySQL team, continue working with the PHP community.
For the long story we first have to remember what "PHP's MySQL support" includes. The key parts are four extensions which are part of the main PHP tree:
- ext/mysql
- ext/mysqli
- ext/pdo_mysql
- ext/mysqlnd
The first one, ext/mysql, provides the mysql_* functions. This is the classic interface taught in many (old) books and used by lots of (old) software. mysqli is "mysql improved"; this is a younger extension providing access to all MySQL features. pdo_mysql contains the driver for PDO, PHP's database API abstraction layer. mysqlnd is the MySQL native driver; this module goes mostly unseen and provides a common set of functionality for the three other modules in order to talk to the MySQL Server.
Additionally we maintain a bunch of PECL extensions plugging into mysqlnd, which I won't mention in detail, the PHP documentation has info on most.
Now that's a lot of code and we can look at what is actually happening:
The old mysql extension, ext/mysql, is old. The code goes back to the early days of PHP and MySQL and carries some bad design decisions. For example, if no explicit connection resource is passed, all functions will try to use the last connection that was used. So take a simple example like this:
<?php mysql_connect("mysql.eample.com", "user", "password"); mysql_select_db("test"); $result = mysql_query("DELETE FROM t WHERE id = 23"); ?>
This might do weird things. Let's assume the connect fails; as the error is not handled the script continues to run and will call mysql_select_db(). This won't directly fail but will guess and try to connect to a server (most likely on localhost); if that fails the script still won't terminate, mysql_query() will again guess and try to connect. If all things come together this will suddenly work and the script will operate on a completely different database than expected, which can have really bad consequences.
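For comparison, here is a minimal sketch of the same snippet with the error actually checked (hostname and credentials are placeholders); once the connect fails the script stops instead of letting later calls guess and reconnect somewhere else:
<?php
// Minimal sketch with error handling added; host and credentials are placeholders.
$link = mysql_connect("mysql.example.com", "user", "password");
if (!$link) {
    die("Connect failed: " . mysql_error());
}
// The connection resource is passed explicitly everywhere.
mysql_select_db("test", $link) or die(mysql_error($link));
$result = mysql_query("DELETE FROM t WHERE id = 23", $link);
?>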
But that's not all. As said, the code goes way back; it has grown with PHP and MySQL and tries to be compatible with all versions of MySQL since at least 3.23. All of this makes the code hard to maintain.
When PHP 5.0, which added mysqli, came along in 2004 it was decided that maintaining this is troublesome, so we wouldn't add new features to that old extension but only to mysqli (as well as to pdo_mysql, which came along a bit later in PHP 5.1, where it makes sense). We also started to advertise these newer extensions over the old one.
So we lived on for a while, added features to mysqli, fixed a few bugs in mysql - normal operations. Over time we noticed that people still use the old extension even for new projects, which keeps them away from features (i.e. prepared statements or support for multiple result sets as needed for stored procedures etc.), but we also knew that we can't simply deprecate and remove the extension as it is way too commonly used. So in 2012 we started a "soft deprecation" process, which meant adding deprecation warnings to the documentation and suggesting alternatives using mysqli or PDO.
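To illustrate what such a switch looks like, here is a rough sketch of the earlier snippet rewritten with mysqli; host, credentials and table are again placeholders, not a recipe for any particular project:
<?php
// Rough mysqli version of the earlier example; the connection object is
// always used explicitly, so a query can never silently run on a different server.
$mysqli = new mysqli("mysql.example.com", "user", "password", "test");
if ($mysqli->connect_errno) {
    die("Connect failed: " . $mysqli->connect_error);
}
// Prepared statements are one of the features the old extension never got.
$id = 23;
$stmt = $mysqli->prepare("DELETE FROM t WHERE id = ?");
$stmt->bind_param("i", $id);
$stmt->execute();
?>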
A bit later, with PHP 5.5 which was released in June 2013, it was decided to add such a deprecation notice to the code, so each time a script is connecting to a MySQL server using the mysql extension a deprecation notice would be triggered.
That's the state we are in and there is no date by which the old mysql extension will be removed from PHP. At some point in the future it will be - certainly not in the upcoming PHP 5.6, though.
Why not? - Because we are aware of many projects with a long history which can't simply swap this out. One of these projects is Wordpress. And this brings us to the current discussion:
Wordpress is an old project, going back to the days of PHP 4 when there was only the old mysql extension and nothing else. Wordpress also doesn't live on its own but comes with tons of plugins extending all kinds of features. Some of these go equally far back, many need database access, so many make more or less direct use of ext/mysql. After quite some discussion and heated debate in different channels the Wordpress developers have now decided to do the switch. As they are aware of the trouble this causes to the plugin ecosystem they are careful, though - they actually allow switching between both extensions, mysql and mysqli.
As always with such major changes the debate becomes heated, imprecise statements lose their context and thus wrong messages circulate. So there is nothing to worry about, while I'd like to encourage all users of the old mysql extension to follow Wordpress and other examples and do the switch.
I hope this helped to clear things up!
Oct 9: Sharding PHP with MySQL Fabric


PHP users who attended any of my recent PHP&MySQL related talks or read Ulf's blog will know our mysqlnd_ms plugin. This plugin hooks into PHP's mysqlnd library and provides transparent support for replication and load-balancing features. Without changing your application you get transparent load-balancing and read-write splitting, so all your reading queries will be sent to a slave while the writes go to a master server. The exact strategies for that can be defined in mysqlnd_ms's configuration, so quite often no, or only few, application changes are needed. But we have one limitation: the MySQL servers have to be configured in each configuration file on all your PHP servers, which can be annoying when you're changing your environment, like adding a new slave or promoting a machine to master in case the original master fails. But there's help coming!
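To give an idea of how transparent this is, here is a minimal sketch of application code using mysqlnd_ms with a static (non-Fabric) configuration; "myapp" is not a real hostname but the name of a section in the mysqlnd_ms JSON config that lists the master and slave servers, and the credentials are made up:
<?php
// Minimal sketch, assuming a mysqlnd_ms config section called "myapp" with
// one master and one or more slaves configured; names are placeholders.
$c = new mysqli("myapp", "user", "password", "test");
// mysqlnd_ms routes this write to the configured master ...
$c->query("INSERT INTO t (msg) VALUES ('hello')");
// ... and this read to one of the slaves, without any application changes.
$res = $c->query("SELECT msg FROM t");
?>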
At this year's MySQL Connect conference we've announced the initial labs release for MySQL Fabric. MySQL Fabric aims to be "an integrated environment for managing a farm of MySQL servers supporting high-availability and sharding." Part of this framework is an RPC interface to query the available servers managed by MySQL Fabric, which delivers the missing piece for mysqlnd_ms.
As this release of Fabric puts the focus on sharding, this is what I want to show here, too. A general introduction to MySQL Fabric and its sharding features can be found on VN's blog, so I'll be quite fast in some areas; for details please refer to the documentation and the mentioned blogs.
The first thing we need is the MySQL Utilities package with Fabric support which is available from labs.mysql.com. After installing this package you have to locate the main.cfg configuration file and configure the storage instance.
[storage]
database = fabric
user = fabric
address = localhost:3300
connection_timeout = 6
password = i do no now
This is a MySQL server where Fabric will store its configuration and such. After that we can initialize Fabric and start the daemon.
$ mysqlfabric setup
$ mysqlfabric start
The setup step creates tables in the configured database and start launches the daemon process. Now we can go and configure our server groups. A server group contains a master server, where the group's data is being written to, and a number of slaves to which MySQL will replicate the data. For our sample sharding setup I plan to create two shards and a global group. The purpose of the global group is to hold table definitions and data which Fabric will make available on all systems in our environment. Each of these groups will, in this example, have one master and one slave. This means we need six MySQL server instances running. These six instances should all be running MySQL 5.6 and, except for having binary logging enabled and different server ids, there is no replication configuration needed before running these commands. In my example setup I'm running all of those on one machine; obviously that's only useful for tests:
$ mysqlfabric group create global
$ mysqlfabric group add global 127.0.0.1:3301 root secret
$ mysqlfabric group promote global
$ mysqlfabric group add global 127.0.0.1:3302 root secret
$ mysqlfabric group create shard1
$ mysqlfabric group add shard1 127.0.0.1:3303 root secret
$ mysqlfabric group promote shard1
$ mysqlfabric group add shard1 127.0.0.1:3304 root secret
$ mysqlfabric group create shard2
$ mysqlfabric group add shard2 127.0.0.1:3305 root secret
$ mysqlfabric group promote shard2
$ mysqlfabric group add shard2 127.0.0.1:3306 root secret
So this creates the three groups and configures replication between the servers as needed. With this setup the server on port 3301 will be the global master; 3302, 3303 and 3305 will be 3301's direct slaves, and 3304 will be configured as a slave of 3303, as will 3306 of 3305.
Now we can define our sharding rules. I'm going to use range-based sharding with two shards. The first shard, which will be assigned to the server group shard1 created above, will hold shard key values 1 to 9999, and the second shard, in group shard2, will hold data for shard key values 10000 and up. We also define the table fabrictest in the test schema as our sharding table and id as the shard column.
$ mysqlfabric sharding define RANGE global
$ mysqlfabric sharding add_mapping 1 test.fabrictest id
$ mysqlfabric sharding add_shard 1 shard1 ENABLED 1
$ mysqlfabric sharding add_shard 1 shard2 ENABLED 10000
Note that for range-based sharding we don't have to define the upper bound, as that is given by the lower bound of the next shard - shard1 ends at 9999 because shard2 starts at 10000.
Now we have MySQL Fabric and our MySQL servers configured and can move on to PHP. As mentioned in the beginning we need mysqlnd_ms, to be precise the 1.6.0 alpha release, which we can install using pecl:
$ sudo pecl install mysqlnd_ms-alpha
To configure PHP we first need a mysqlnd_ms configuration file. mysqlnd_ms uses JSON, and a simple configuration using Fabric might look like this:
fabric.json:
{ "test" : { "fabric":{ "hosts": [ { "host": "localhost", "port": 8080 } ] } } }
This configures the application test to use a MySQL Fabric based setup where MySQL Fabric's RPC daemon runs on the local machine. Again: we put everything on one machine for a test, not what one would do in a production setup.
Next we locate our system's php.ini file and enable mysqlnd_ms to use our config.
php.ini:
extension=mysqlnd_ms.so
mysqlnd_ms.enable=1
mysqlnd_ms.config_file=/path/to/fabric.json
And now we are finally done and can run a test script.
<?php $c = new mysqli("test", "root", "", "test"); echo "Creating global table:\n"; mysqlnd_ms_fabric_select_global($c, "test.fabrictest"); var_dump($c->query("CREATE TABLE fabrictest (id INT NOT NULL)")); echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n"; echo "Inserting with ID 10:\n"; mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10); var_dump($c->query("INSERT INTO fabrictest VALUES (10)")); echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n"; echo "Trying to read id 10 from 10010:\n"; mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10010); $r = $c->query("SELECT * FROM fabrictest WHERE id = 10"); var_dump($r->fetch_row()); echo "Server used: ".mysqlnd_ms_get_last_used_connection($c)['scheme']."\n\n"; ?>
With this script we do a few things. The first observation is the hostname test: the mysqlnd_ms plugin will recognize that as the application name and will refer to its configuration. Second are the mysqlnd_ms_* functions. First we pick the global group and execute a CREATE TABLE operation there; mysqlnd_ms will detect that this is a write operation and therefore connect to the global group's master. This should be 127.0.0.1:3301, which hopefully is printed by the echo call. Then we select the shard responsible for id 10 in the fabrictest table and insert data. mysqlnd_ms will, again, detect that this is a write operation and will therefore figure out where writes to that shard have to go, which is 127.0.0.1:3303. Finally we do an operation which will not really succeed: we select the servers for shard key 10010, which is shard2 in our setup, and then query for id 10 - the data we stored in shard1. This will query 127.0.0.1:3306 (slave of 3305 in the shard2 group), which will return an empty result set.
I hope this is enough to get you started, you can now add shards or migrate them or take servers down and promote current slaves to masters etc. and see how the system reacts.
In a future post we will combine this with Doctrine, so stay tuned.
Note: This blog post features labs/alpha releases, which aim at demonstrating functionality. They are not for production use. There might be stability issues and there certainly are performance restrictions we're working on. We'd like, however, to receive feedback.
Jun 20: PHP 5.5 is out, what's up with 5.4 and 5.3?

Yay, finally we released PHP 5.5, which is a big new release for PHP. In preparation for this I yesterday sent out a mail to the PHP core developers stating that the "PHP-5.3 BRANCH IS CLOSED NOW". After I saw this quoted on twitter and different websites I want to make a few things clear for users of PHP:
- The mail is information for core developers that all changes for 5.3 should go via the release master and our security group
- We won't do normal bug fixes
- We will continue doing security fixes for a year where needed
What this means for users of PHP is that they can continue using PHP 5.3, and when updates come they carry very low risk of breaking anything (we always try not to break anything, but one person's bug might be another person's feature), so they should be easy to apply and applied fast. So if you are a happy PHP 5.3 user and don't want to touch too many things there is no immediate need to upgrade to 5.4 or 5.5 - for a year.
So even when you don't have to migrate why should you migrate? - Besides the new features I see two major reasons:
- Newer versions of PHP are generally more performant and efficient than older versions, meaning your users get faster response and you need less hosting/cloud resources to run your system.
- You get all bug fixes
So when migrating, where should you go? PHP 5.4 or 5.5? - The answer, in my personal opinion, is quite easy: go to 5.5! 5.5 will live longer than 5.4, so it is the more future-safe path, and as we try hard to keep backwards compatibility the migration should be fairly simple. PHP 5.5 also mostly uses the same code as 5.4 with a few extra features, so everything PHP 5.4 does, PHP 5.5 does just as stably.
So go and fetch PHP 5.5, use it for new projects and work on your migration from 5.3, but don't panic.
Apr 2: Making use of PHP mysqlnd statistics

One of the great things about mysqlnd as the base library for PHP's MySQL support is the collected statistics. mysqlnd collects about 160 different statistical values about all the things going on. With such an amount of raw data it is, obviously, quite hard to draw conclusions from it. Therefore I recently created a PHP library sitting on top of this feature to collect all data, run some analysis and then provide some guidance, and made it available in the JSMysqlndAnalytics GitHub repo (see there also for instructions on using Composer).
Using the library is relatively simple, as the short instructions show. The library consists of two main parts. On one side there is the "Collector", a wrapper around mysqli_get_client_stats() (even though this function is part of mysqli it will also work for applications using ext/mysql or PDO_mysql), which gathers the raw statistics that could then be stored away or such; on the other side is the actual Analytics engine comparing the values to predefined rules. The current set of rules is a bit limited, so I'm looking for input and ideas.
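To get an idea of what the Collector wraps, here is a rough sketch using only the stock mysqli API; the library does this (and the rule matching) for you:
<?php
// Rough sketch of the underlying idea, using only mysqli_get_client_stats();
// JSMysqlndAnalytics wraps this and adds the rule-based analysis.
$before = mysqli_get_client_stats();

// ... run the application's queries here ...

$after = mysqli_get_client_stats();

// Show only the counters that changed while the queries ran.
$diff = array();
foreach ($after as $name => $value) {
    if ($value != $before[$name]) {
        $diff[$name] = $value - $before[$name];
    }
}
print_r($diff);
?>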
In case you're a Symfony user life is quite easy: some time ago I already provided a Symfony bundle adding a Symfony Profiling Toolbar plugin showing the statistics. This JSMysqlndBundle has been extended to make use of these Analytics. The screenshot might give a rough idea of how this looks.
Hope this helps creating better applications! Happy hacking!