Entries tagged as DTrace
May 10: ZFS
So, I use OpenSolaris as the native operating system on most of my boxes; VMs run other OSs. My choice of OpenSolaris was driven by the availability of DTrace, one of the greatest tools for system and program analysis ever created. By running OpenSolaris I also got ZFS, Oracle's über file system. I never really cared about ZFS, at least not until I missed it. ZFS integrates all the different storage layers in one system: RAID controller, logical volume manager, POSIX file system layer, and so on. It's really nice to have that integrated; it eases management. Now, I don't change my disks that often and the file system silently runs underneath. From time to time I looked into my auto snapshots to restore some stuff, and I got used to snapshotting my VMs (running on ZFS-backed "virtual" zvol devices) before updating them, which over time became a habit I didn't really think about.
Then I got myself a netbook, a cheap, up-to-date ASUS EeePC. On that system I chose to install Ubuntu, which was troublesome enough (I had to compile my own manually patched wireless driver), so I didn't bother trying OpenSolaris. It works like a charm, even without ZFS. Some time after I configured the netbook, a new Ubuntu release came out, and since then I've been in trouble: I've read on too many sites that things broke with this release, so I don't dare to update the system. On my OpenSolaris boxes, updating to a new release, even to a dev build, is a no-brainer: the packaging system automatically creates a ZFS snapshot and configures the boot loader so that both the old and the new system can be booted. So I can click the update button, reboot, and either it works (the typical case) or I can revert. Really nice.
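On systems without that integrated boot-environment handling, a similar safety net can be approximated by hand. A minimal sketch, assuming a root pool named rpool with the usual OpenSolaris layout (the pool, dataset and snapshot names here are illustrative):

```shell
# take a recursive snapshot of the root pool before updating
zfs snapshot -r rpool@pre-update

# ... run the update and test the system ...

# if something broke, roll the root dataset back to the snapshot
zfs rollback rpool/ROOT/opensolaris@pre-update
```

Since the snapshot is copy-on-write it is essentially free to take, so there is no reason to skip it before a risky update.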
Now back to Ubuntu: if I press the button and something goes wrong, I'd have to reinstall the system (or restore a backup), which I don't want. I just want to use the netbook as a mobile browser, presentation system, etc. There are other systems I use to play and experiment with...
At the recent PHP Barcamp Salzburg we got into a discussion about ZFS, too. Auto snapshotting came up, and one claim was: "Well, I won't need it, I have everything in version control and I know what I delete." That might be true, but once you have ZFS you change the way you operate, and you don't keep the whole system in version control. It's great to be able to clone a VM in less than a second to play with some stuff. It's cool to be able to enable compression with one short shell command. It's fantastic to have a fully checksummed file system with RAID-Z. Man, how did we live in the old days? Nice to be aware of the luxury I'm used to.
P.S. This blog is running on ZFS, too - of course. It gave a good feeling to be able to revert during today's update, too.
Nov 28: Shooting with Crossbows into Zones
Ok, so this site (and some other stuff) is now running on OpenSolaris. The previous article was mostly a test entry for me to see whether the DNS update was through, but as some people wonder why I'm using this system that "fails while trying to copy Linux", I decided to discuss some of the reasons in more detail.
Some people already know that my main system meanwhile runs OpenSolaris. The reason there is DTrace, a great way to see what the system is doing - from the kernel, through userspace programs, into a VM like the JVM or PHP's Zend VM - which is a big help while debugging and developing applications. Even though DTrace is meant for such analysis on live machines, it wasn't the main reason for this choice on the server. For the server I actually didn't plan a change; ok, the old Linux box wasn't maintained well, but it worked well enough for the few things it does. But then David came along with the idea to share a server, so I started thinking about dropping the old contract and getting a new machine for us both - and possibly some other friends. And there we find the actual reasons for the OS choice:
Zones
We were planning to share a box, and as both of us do Web/PHP-related work it was clear that each of us might need special versions and configurations of some software components, which would then conflict with each other. Additionally, I want to be able to do a killall apache in case I configured something wrong, and I don't want the others to be affected too much while I configure my web servers the way I need or want them. The obvious solution these days? Virtualization.
Now, virtualization comes in many flavors. The simple one most people know is desktop virtualization: you take software like VirtualBox, which runs as a regular userspace application and holds a complete operating system stack. In there, the kernel of the virtualized system thinks it's running directly on physical hardware. The big benefit is that one can run any operating system in the VM, but there are also negative effects in areas like disk buffers (the virtualized and the host kernel buffer independently), overall process scheduling (the VM is scheduled by the host and then schedules itself again), or syscalls (an application running in the VM does a syscall to the VM's kernel, which calls a hypervisor-provided hardware emulation function, which then triggers a syscall on the host system).
Another approach is operating system virtualization like Solaris Zones. Here the operating system itself handles the virtualization. With zones this works in a way where one has a single kernel and multiple userland instances. So there is one kernel with one scheduler (ok, Solaris allows using different schedulers and so on - let's ignore this and look at the default) and one disk I/O layer. Inside a zone one has a zone-specific userland with service management, its own network device (more on this below), its own user database (/etc/passwd, LDAP, ...) and so on. But at the syscall interface it all runs on one kernel, which also means that all processes are handled equally by the kernel (unless configured otherwise).
The result of using Solaris Zones is a lightweight isolation of independent userland environments. Now, as said, the virtualization has its boundary at the syscall layer, so the userland has to be Solaris - one might think. But that's not true: there are Branded Zones which emulate another syscall interface, so one can run a Linux userland on a Solaris kernel and Linux-only apps benefit from stuff like ZFS and DTrace - but that's not relevant for me here.
So to summarize: Zones are great for lightweight isolation (and other stuff)
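To give an impression of how little ceremony a zone needs, here is a minimal sketch of creating and booting one; the zone name "web" and the path are illustrative:

```shell
# configure a new zone called "web" non-interactively
zonecfg -z web 'create; set zonepath=/zones/web; commit'

# install the zone's userland and boot it
zoneadm -z web install
zoneadm -z web boot

# get a shell inside the running zone
zlogin web
```

After that, the zone behaves like its own Solaris system with its own services and users, while sharing the host's kernel.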
Crossbow
Now, I mentioned that each zone can have its own network interface assigned. This is nice if you have a box with many network devices - but a typical server you get as a root server for little money usually has just one. Traditionally you can assign multiple IPs to that one device and share it over multiple zones. That works, but it's inconvenient: you can't really check the status (which device/zone is producing how much traffic?) or add bandwidth limitations (I want to be able to reduce one zone's bandwidth in case an article is slashdotted, without digging too deep into everything, to keep other parts of the system running). Additionally, IP addresses are limited and I don't want all zones to be publicly accessible - for instance, my MySQL zone can't be reached from the outside.
Now, Crossbow - that's the name of the Solaris network virtualization layer introduced with OpenSolaris 2009.06 - was always a "so what" thing for me until I started using it. Well, yes, you can create virtual switches and virtual network interfaces. So what? Well, combined with zones I can achieve what I described in the paragraph above.
So let's build a network:
dladm create-etherstub mystub0
dladm create-vnic -l mystub0 vnic0
dladm create-vnic -l mystub0 vnic1
That's all that's needed to create an internal Ethernet with two devices. The next step is to assign them to zones and configure IP for this network. In my current setup I have a zone for this web site and one zone for the MySQL server. The MySQL zone has a vnic for an internal network; the web zone has two vnics - one is used for the internal network and the second is configured on top of the physical network device so it can talk to the outside using its own public IP address. For limiting resources there's the flowadm tool, which gives simple access to network resource limits and service priorities (ssh connections get a higher priority so the system can be controlled in case the network is busy).
And even for me, who tries to stay above the TCP layer, this is quite trivial to setup.
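As a sketch of what such a bandwidth cap looks like - flow name, link and limit are illustrative, picked for the slashdotting scenario above:

```shell
# define a flow for the web zone's HTTP traffic on its vnic
flowadm add-flow -l vnic0 -a transport=tcp,local_port=80 httpflow

# cap that flow at 50 Mbit/s so other services stay responsive
flowadm set-flowprop -p maxbw=50M httpflow

# list the configured flows and their properties
flowadm show-flow
```

The limit can be changed or removed at runtime without touching the zone itself.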
ZFS
Now, one of the most cited features of Solaris is the ZFS file system. ZFS is more than just a file system - it's a combination of volume manager, RAID controller and other related things. The key feature for me is snapshotting: ZFS uses a copy-on-write mechanism, so creating a snapshot has barely any cost. Only when data is changed is a new block written, while the old one is kept untouched; thus a snapshot costs only the space the difference needs. Additionally this allows clones, so one gets a copy of a directory which costs space only as data is changed - that's of special interest with zones. As said, each zone is its own userspace system. By using ZFS clones they share the same blocks on disk. Really useful. In the next version this will get even better thanks to deduplication in ZFS ...
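Creating such a cheap copy takes two commands; a sketch, with dataset names chosen purely for illustration:

```shell
# snapshot the dataset holding an existing zone's filesystem
zfs snapshot rpool/zones/web@template

# clone it into a dataset for a second zone; the clone shares
# all unchanged blocks with the original, so it starts at ~zero size
zfs clone rpool/zones/web@template rpool/zones/web2
```

Both commands return almost instantly regardless of how much data the zone holds.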
Problems
Coming from Linux there are - of course - different problems. As I've been using OpenSolaris on other boxes for some time now, I'm used to many of the administration tools, but I learn new things every time I work on the system.
A bit more problematic is that the main OpenSolaris package repository doesn't offer as much software as typical Linux distributions, though most software packages can be found in other repositories, too. This is a bit annoying, but given the repository's growth and the access to the features mentioned above, it's no big problem - especially on a server, where most of the needed tools exist for Solaris, too.
Oh, and for the German speakers: David and I discussed some of our experiences while installing the server in the latest HELDENFunk podcast.
Nov 24: Now running on OpenSolaris

So, this website moved. It isn't a citizen of a Linux box anymore but is running inside a zone on an OpenSolaris host. The only non-default software powering this server that I compiled myself is a current svn snapshot of PHP 5.3.2-dev. Let's see whether I can keep this system clean or whether it becomes as messy as the old Linux box. For now I'm happy about the isolation using zones, snapshots with ZFS before playing around, and DTrace in case something goes wrong.
Mar 22: What's so special about Sun engineers?
Feb 28: Upcoming talks

I forgot to announce some talks here in the past, but the next batch is already scheduled:
- Next week I'll attend PHP Quebec and talk about PHP 5.3 and, together with my friend Marcus, PHP Worst Practices. Really looking forward to this conference.
- In April I'll give an introduction to DTrace at the OpenSource Datacenter Conference in Nuremberg, Germany.
- DTrace, with a focus on the AMP stack, will also be a topic at the Spring edition of the International PHP conference in Berlin. There I'll also present some Hidden Gems in PHP 5.3.
Oct 26: More on DTrace ... and MySQL

Angelo recently showed an easy way to dump SQL queries using DTrace. While reading the article I felt that some important information was missing: the name of the user executing the query and the selected database. So I sat down for a few minutes and tried to collect that data.
For the database name I found a quite simple solution: it is passed as a parameter to MySQL's check_user() function, so we can easily add a thread-local variable to keep that name. A simple script for that:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$1::*check_user*:entry
{
self->db = arg4 ? copyinstr(arg4) : "(no schema)";
}
pid$1::*dispatch_command*:entry
{
printf("%s: %s\n", self->db, copyinstr(arg2));
}
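Assuming the script is saved as, say, queries.d and made executable, it can be pointed at a running server by PID (the pid$1 in the probes picks up the first script argument; the pgrep usage is just one way to find the PID):

```shell
# make the D script executable and attach it to the running mysqld
chmod +x queries.d
./queries.d $(pgrep -x mysqld)
```

From then on every dispatched query is printed with the schema it runs against.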
Getting the username is a bit harder: for the pid provider we need a function where it is passed as a parameter, ideally as a char*, but there's no such function that is reliably called. (For instance, there are functions for checking max connections per user where everything we need is passed in a usable way, but they're only called when a limit was set.) The only way would be to take the user_connect property of the THD object which is passed to dispatch_command and then access the username (and hostname) from there. But getting that working from within DTrace is quite some work. I've prepared some scripts doing this kind of thing with simple C structures for the second part of my DTrace article series, which is ready in my head and waiting to be typed, so in theory it should be possible - anybody want to try?
The best solution, of course, would be to add proper probes to the MySQL server code which provide all that information.
Oct 12: DTracing around

Over the past few weeks I annoyed my environment by praising DTrace whenever possible. Yesterday, during a break at Barcamp Munich, I gave Wolfram a short introduction on his Mac and decided to put some of the material here:
DTrace is a toolkit available on Solaris (Solaris 10 or OpenSolaris), recent MacOS versions and FreeBSD. It is mightier than tools like truss or strace, but has far less impact. DTrace allows you to hook into the system at so-called "probes" and run analysis actions when they fire.
I guess all that works best by showing an example first: PHP wraps the system's memory allocator in a function called _emalloc (which is itself wrapped by a CPP macro called emalloc), so it might be interesting to see how often that function is called. For that we can use a D script (D being the DTrace scripting language, not Digital Mars's D) like this:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$target::_emalloc:entry
{
    printf("_emalloc was called!\n");
}
We can now simply call that script and tell DTrace to start a PHP interpreter and run a PHP script. DTrace will then change the running program in memory so that the message is printed whenever the process with the PID $target enters the function _emalloc. $target is a special variable referring to a process started by DTrace using -c, or to a PID provided using -p.
$ ./script1.d -c "php script.php"
_emalloc was called!
_emalloc was called!
_emalloc was called!
...
That's nice but not really useful in any way, yet. We'd at least like to know the size of the allocated memory area, which is the first parameter to _emalloc. The pid provider helps us by providing the parameters to the functions as D variables, so we can simply change our action to print that value:
printf("_emalloc was called, allocating %i bytes\n", arg0);
Running the script now gives us the sizes:
$ ./script2.d -c "php script.php"
_emalloc was called, allocating 5 bytes
_emalloc was called, allocating 6 bytes
_emalloc was called, allocating 5 bytes
...
The output is quite long and still rather useless; to make use of this information we at least need some aggregation. DTrace helps there, too, so let's create an aggregation variable collecting the data in a usable way:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$target::_emalloc:entry
{
    @mallocsize["emalloc"] = quantize(arg0);
}
mallocsize and emalloc are freely chosen identifiers here. Depending on your script the output now looks something like the following:
  emalloc
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |                                         83
               2 |@@                                       1122
               4 |@@@@@@@@                                 5141
               8 |@@@@@@                                   4032
              16 |@@@@@@@@@@@@@@@@@@                       11881
              32 |@@@@@@                                   3694
              64 |@                                        806
             128 |                                         27
             256 |                                         66
             512 |                                         1
            1024 |                                         1
            2048 |                                         1
            4096 |                                         1
            8192 |                                         4
           16384 |                                         0
           32768 |                                         1
           65536 |                                         0
          131072 |                                         1
          262144 |                                         0
This tells us that the most common allocation size is between 9 and 16 bytes and that the largest allocation is somewhere between 65536 and 131072 bytes.
For a deeper analysis we can now add a predicate to our probe so the action triggers only for that large allocation. Such predicates are written between slashes, between the probe name and the action. Additionally I'm adding a ustack() call to the action; this will print the userspace backtrace - which is C level, not PHP space.
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$target::_emalloc:entry
/ arg0 > 131072 /
{
    printf("_emalloc(%i)\n", arg0);
    ustack();
}
$ ./script4.d -c "php script.php"
_emalloc(261900)
              php`_emalloc
              php`zend_vm_stack_new_page+0x19
              php`zend_vm_stack_init+0xf
              php`init_executor+0xf5
              php`zend_activate+0x12
              php`php_request_startup+0x7a
              php`main+0xd86
              php`_start+0x7d
So we see we're in the startup of PHP, allocating some space for its stack. One question now might be about the cost of an _emalloc call; one important factor there is syscalls to the operating system. As DTrace is made for observing the whole system, that can be done quite easily using the syscall provider. We might now use syscall:::entry as the probe, triggered on every syscall, but that would be quite a lot of output. As we're only interested in syscalls made from within _emalloc, we'll use a thread-local variable as a flag and check that flag in the predicate:
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$target::_emalloc:entry
/ arg0 > 131072 /
{
    self->inemalloc = 1;
}
pid$target::_emalloc:return
/ self->inemalloc /
{
    self->inemalloc = 0;
}
syscall:::entry
/ self->inemalloc /
{
    printf("%s\n", probefunc);
}
$ ./script4.d -c "php script.php"
brk
brk
So we're calling brk twice. brk is the syscall to "change the amount of space allocated for the calling process's data segment", which is exactly what we expect - but why is it called twice? Adding a ustack() call to the syscall action can tell us where it happens; using the source this can then probably be optimized. That's left as an exercise for the interested reader.
In summary: no need to change the code, and lots of information. I plan to write an additional article showing how to get interesting facts system-wide - not only for a specific process but for all running ones - which is especially interesting when searching for a problem on production systems (DTrace is made to be used on production systems!) or for problems related to concurrent processes/threads.
Oct 5: DTrace, PID provider and rights
DTrace is a damn cool debugging tool, unfortunately only available for Solaris and a few BSD-like flavors. If you want to learn about it, watch the quite entertaining video (well, I guess you have to be a true geek to be entertained, ...) of Bryan Cantrill's talk.
The reason for me writing this is that I had some problems with the PID provider and wanted to note the solution for myself:
$ dtrace -n 'pid1005:::entry { printf("Hello"); }'
probe description pid1005:::entry does not match any probes
The reason was that my user only had the dtrace_user privilege, not dtrace_proc. Setting the attributes correctly solved the issue. Time for more DTrace'ing.
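For the record, on Solaris the privileges can be granted roughly like this (replace "username" with the actual account; a fresh login is needed for the change to take effect):

```shell
# grant the DTrace privileges needed for the pid and syscall providers
usermod -K defaultpriv=basic,dtrace_proc,dtrace_user username

# from a new shell, verify which privileges are in effect
ppriv $$
```

dtrace_proc covers tracing one's own processes with the pid provider, while dtrace_user enables the user-level view of providers like syscall and profile.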