Oct 27: Namespaces, decisions, wasting time
Thank you!
Oct 26: More on DTrace ... and MySQL
Angelo recently showed an easy way to dump SQL queries using DTrace, while reading the articles I felt that some important information is missing: The name of the user executing the query and the selected database. So I sat down a few minutes and tried to collect that data.
For the database name I found a quite simple solution: It is passed as parameter to the check_user() function to MySQL so we can easily add a thread-local variable to keep that name. Simple script for that:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$1::*check_user*:entry
{
self->db = arg4 ? copyinstr(arg4) : "(no schema)";
}
pid$1::*dispatch_command*:entry
{
printf("%s: %s\n", self->db, copyinstr(arg2));
}
Getting the username is a bit harder, for the PID-provider we need a function where it is passed as parameter, best would be if it was passed as char*, but there's no such function reliably called. (For instance there are functions for checking max connections per user where all we need is passed in a proper way, but it's only called when a limit was set) The only way would be to take the user_connect property of the THD object which is passed to dispatch_command and then access the username (and hostname). But getting that working from within DTrace is quite some work. I prepared some scripts doing this with simple C structures for the second part of my DTrace article series, which is ready in my head and is waiting to be typed, so in theory it should be possible, anybody wants to try?
The best solution, of course, would be to add proper probes into the MySQL server code which provide all that information.Oct 24: International PHP Conference 2008
On Thursday morning I'll give a presentation about PHP 5.3, which will be quite interesting as one of the biggest features, namespaces, is still undergoing heavy discussions and the final syntax probably won't be clear when presenting - fortunately PHP 5.3 is much more than namespaces!
Sun will also be present at the conference, so if you're looking for an open source PHP IDE you might talk to Petr and Wen about the upcoming NetBeans 6.5 release which will feature PHP support, if you're running a startup company you might talk to Stefan Schneider who will represent the Startup Essential Program which has interesting discounts on Sun products.
Unfortunately I won't be able to attend Brian Aker's keynote about Drizzle or Ulf's session about new and hot stuff in mysqlnd, one of the new feature in PHP 5.3.
Oct 12: DTraceing around
Over the past few weeks I annoyed my environment with praising DTrace whenever possible. Yesterday, during a break at the Barcamp Munich, I gave Wolfram a short introduction on his Mac and decided to put some stuff here:
DTrace is a toolkit available on Solaris (Solaris 10 or OpenSolaris), recent MacOS versions and FreeBSD for mightier than tools like truss or strace but with way less impact. DTrace allows you to "hook" (called "probes") into the system and allows to do some analysis then.
I guess all that works best by showing an example first: PHP uses a wrapper over the system's memory allocation using a function called _emalloc (which is wrapped by a CPP macro called emalloc) so it might be interesting to see how often that function is being called. For doing that we can use a D-script (D being the DTrace scripting language, not DigitalMars's D) like that:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$target::_emalloc:entry
{
printf("_emalloc was called!\n");
}
We can now simply call that script and tell DTrace to start a PHP interpreter and run a PHP script. DTrace will then change the running program in memory so that the message is printed whenever the system for the process, with the PID $target, enters the function _emalloc. $target is a special variable referring to a process started by DTrace using -c or a PID provided using -p.
$ ./script1.d -c "php script.php" _emalloc was called! _emalloc was called! _emalloc was called! ...
That's nice but not really useful in any way, yet. As we'd like to at least know the size of the allocated memory area, which is the first parameter to _emalloc. The pid-provider helps us by providing the parameters to the functions as D-variables, so we can simply change our action to print that variable:
printf("_emalloc called, allocating %i bytes\n", arg0);
running the script now gives us the sizes:
./script2.d -c "php script.php" _emalloc was called, allocating 5 bytes _emalloc was called, allocating 6 bytes _emalloc was called, allocating 5 bytes ...
The output is quite long and still rather useless, for making use from this information we at least need some aggregation, but DTrace helps there, too, so let's create an aggregation variable collecting the data in a usable way:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$target::_emalloc:entry
{
@mallocsize["emalloc"] = quantize(arg0);
}
mallocsize and emalloc are there freely chosen identifiers. Depending on your script the output now looks something like the following:
emalloc
value ------------- Distribution ------------- count
0 | 0
1 | 83
2 |@@ 1122
4 |@@@@@@@@ 5141
8 |@@@@@@ 4032
16 |@@@@@@@@@@@@@@@@@@ 11881
32 |@@@@@@ 3694
64 |@ 806
128 | 27
256 | 66
512 | 1
1024 | 1
2048 | 1
4096 | 1
8192 | 4
16384 | 0
32768 | 1
65536 | 0
131072 | 1
262144 | 0
This tells us that the most used allocation size is between 9 and 16 bytes and the largest space allocated is somewhere between 65536 and 131072 bytes.
For a deeper analysis we can now add a predicate to our probe so the action triggers only for that allocation. Such predicates are writing between slashes between the probe name and the action. Additionally I'm adding a ustack() call to the action, this will print the systems userspace backtrace -- which is C level, not PHP space.
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$target::_emalloc:entry
/ arg0 > 131072 /
{
printf("_emalloc(%i)\n", arg0);
ustack();
}
$ ./script4.d -c "php script.php"
emalloc(261900)
php`_emalloc
php`zend_vm_stack_new_page+0x19
php`zend_vm_stack_init+0xf
php`init_executor+0xf5
php`zend_activate+0x12
php`php_request_startup+0x7a
php`main+0xd86
php`_start+0x7d
So we see we're in the startup of PHP allocating some space on it's stack. One question now might be about the costs of an _emalloc call, one important factor there are syscalls to the operating system. As DTrace is made for utilizing the whole system that can be done quite easy using the syscall provider. Me might now use syscall:::entry as probe to be triggered on every call, but that will be quite a lot. As we're only interested in syscalls from _emalloc we'll use a thread-local variable as a flag and check that flag in the predicate condition:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$target::_emalloc:entry
/ arg0 > 131072 /
{
self->inemalloc = 1;
}
pid$target::_emalloc:return
/ arg0 > 131072 /
{
self->inemalloc = 0;
}
syscall:::entry
/ self->inemalloc /
{
printf("%s", probefunc);
}
$ ./script4.d -c "php script.php" brk brk
So we're calling brk two times. brk is the syscall to "change the amount of space allocated for the calling process's data segment" which is exactly what we expect, but why is it called two times? Adding a ustack call to the syscall action can tell us where it happens, using the source this can then probably be optimized. That's left as an exercise to the interested reader.
In summary: No need to change the code and lots of information, I plan to write an additional article showing how to get interesting facts system-wide, not only for a specific process but all running ones, which is especially interesting when searching for a problem on production systems (DTrace is made to be used on productive systems!) or problems related to concurrent processes/threads.
Oct 5: DTrace, PID provider and rights
DTrace is a damn cool debugging tool, unfortunately only available for Solaris and different BSD flavors. If you want to learn about it watch the quite entertaining video (well, I guess you should be a true geek to be entertained, ...) from Bryan Cantrill's talk.
The reason for me writing this is that I had some problems with the PID provider and wanted to note the solution for myself:
$ dtrace -n 'pid1005:::entry { printf("Hello"); }'
probe description pid1005:::entry does not match any probes
The reason is that my user only had the dtrace_user, not the dtrace_proc right. Setting the attributes correctly solved the issue. Time for more DTrace'ing.

