So, I use OpenSolaris as the native operating system on most of my boxes. VMs run other OSs. My choice of OpenSolaris was driven by the availability of DTrace, one of the greatest tools for system/program analysis ever created. By running OpenSolaris I also got ZFS, which is Oracle's über file system. I never really cared about ZFS, at least not until I missed it. ZFS integrates all the different storage layers in one system - RAID controller, logical volume manager, POSIX file system layer, ... It's really nice to have that integrated, as it eases management.
Now I don't change my disks that often and the file system silently runs underneath. From time to time I looked into my auto snapshots to restore some stuff, and I got used to snapshotting my VMs (running on ZFS-powered "virtual" zvol devices) before updating them, which over time became a habit I didn't really think about.
Then I got myself a netbook, a cheap, up-to-date ASUS EeePC. On that system I chose to install Ubuntu - which was troublesome enough (I had to compile my own manually patched wireless driver), so I didn't bother to try OpenSolaris. Works like a charm, even without ZFS. Some time after I configured the netbook a new Ubuntu release came out, and since then I'm in trouble. I read on too many sites that things broke with this release, so I don't dare to update the system. On my OpenSolaris boxes updating to a new release, even to a dev build, is a no-brainer: the packaging system automatically creates a ZFS snapshot and configures the boot loader in a way that the old as well as the new system can be booted. So I can click the update button, reboot, and either it works (the typical case) or I can revert. Really nice.
Now back to Ubuntu: If I press the button and something goes wrong I have to reinstall the system (or use a backup) which I don't want. I just want to use the netbook as a mobile browser, presentation system etc. There are other systems I use to play/experiment with...
At the recent PHP Barcamp Salzburg we got into a discussion about ZFS, too. There was talk about the auto snapshotting, and one claim was "well, I won't need it, I have everything in a version control software and I know what I delete". That might be true, but once you have ZFS you change the way you operate, and you don't have the whole system in version control. It's so great to be able to clone a VM in less than a second to play with some stuff. It is cool to be able to enable compression with one short shell command. It's fantastic to have a fully checksummed filesystem with RAID-Z. Man, how did we live in the old days? Nice to be aware of the luxury I'm used to.
P.S. This blog is running on ZFS, too - of course. It gave me a good feeling to be able to revert during today's update, too.
Ok, so this site (and some other stuff) is now running on OpenSolaris. The previous article was mostly a test entry for me to see whether the DNS update was through, but as some people wonder why I'm using this system that "fails while trying to copy Linux" I decided to discuss some of the reasons in more detail.
Some people already know that my main system meanwhile runs OpenSolaris. The reason there is DTrace - a great way to see what the system is doing, from the kernel, over userspace programs, into a VM like the JVM or PHP's Zend VM, which is a big help while debugging and developing applications. Even though DTrace is meant to do such analysis on live machines, this wasn't the main reason for this choice on the server. For the server I actually didn't plan a change. Ok, the old Linux box wasn't maintained well, but it worked well enough for the few things it does. Then David came along and had the idea to share a server, so I started thinking about dropping the old contract and getting a new machine for us both - and possibly some other friends. And there we find the actual reasons for the OS choice:
So we were planning to share a box. As both of us are doing Web/PHP-related stuff, it was clear that both of us might need special versions and configurations of some software components, which would then conflict with each other. Additionally I want to be able to do a killall apache in case I configured something wrong, and I don't want the others to be affected too much while I configure my web servers as I need/want them. The obvious solution these days? - Virtualization.
Now virtualization comes in many flavors. The simple one most people know is Desktop Virtualization: you take a piece of software like VirtualBox, which runs as a regular userspace application and holds a complete operating system stack. In there one has the kernel of the virtualized system, which thinks it's running directly on physical hardware. The big benefit is that one can run any operating system in the VM, but it also has negative effects in areas like disk buffers (the virtualized and the host kernel buffer independently), overall process scheduling (the VM is scheduled by the host and then schedules itself again) or syscalls (an application running in the VM does a syscall to the VM's kernel, which then calls a hypervisor-provided hardware emulation function, which then triggers a syscall on the host system).
Another approach is Operating System Virtualization like Solaris Zones. Here the operating system handles the virtualization. With zones this works in a way where one has a single kernel and multiple userland instances. So one has one kernel with one scheduler (ok, Solaris allows using different schedulers and so on - let's ignore this and look at the default) and one disk IO layer. Inside a Zone one has a Zone-specific userland with service management, its own network device (see more on this below), its own user database (/etc/passwd, LDAP, ...) and so on. But as of the syscall interface it all runs on one kernel, which also means that all processes are handled equally by the kernel (unless configured otherwise).
The result of using Solaris Zones is a lightweight isolation of independent userland environments. Now as said, the virtualization has one boundary at the syscall layer, so the userland has to be Solaris - one thinks. But that's not true: there are Branded Zones which emulate another syscall interface, so one can run a Linux userland on a Solaris kernel, and Linux-only apps benefit from stuff like ZFS and DTrace - but that's not relevant for me here.
So to summarize: Zones are great for lightweight isolation (and other stuff)
Now I was mentioning that each Zone can have its own network interface assigned. This is nice if you have a box with many network devices - but a typical server you get as a root server for little money usually has just one. What you traditionally can do is assign multiple IPs to that device and then share the single device across multiple zones. That works but is inconvenient, as you can't really check the status (which device/zone is producing how much traffic?) or add bandwidth limitations (I want to be able to reduce one zone's bandwidth in case an article is slashdotted, without going too deep into everything, to keep other parts of the system running). Additionally IP addresses are limited and I don't want all zones to be publicly accessible - for instance my MySQL zone can't be reached from the outside.
Now Crossbow - that's the name of the Solaris network virtualization layer introduced with OpenSolaris 2009.06 - always was a "so what" thing for me till I started using it. Well yes, you can create virtual switches and virtual network interfaces. So what? Well, combined with zones I can achieve what I described in the paragraph above.
So let's build a network:
dladm create-etherstub mystub0
dladm create-vnic -l mystub0 vnic0
dladm create-vnic -l mystub0 vnic1
That's all that's needed to create an internal Ethernet with two devices. The next step is to assign them to zones and configure IP for this network. In my current setup I have a zone for this web site and one zone for the MySQL server. The MySQL zone has a vnic for an internal network; the web zone has two vnics - one is used for the internal network and the second is configured to work on top of the physical networking device, so it can talk to the outside using its own public IP address. For limiting resources there's the flowadm tool, which gives simple access to network resource limits or service priorities (ssh connections have a higher priority so the system can be controlled in case the network is busy).
And even for me, who tries to stay above the TCP layer, this is quite trivial to set up.
Now one of the most cited features of Solaris is the ZFS filesystem. ZFS is more than just a filesystem - it's a combination of volume manager, RAID controller and other related things. The key feature there for me is snapshotting: ZFS uses a copy-on-write mechanism, so creating a snapshot in itself costs barely anything. Only when data is changed is a new block written, while the old one is kept untouched, so a snapshot costs only the space the difference needs. Additionally this allows clones: one gets a writable copy of a dataset (think of a directory tree), and it will cost space only if data is changed - that's of special interest with zones. As said, each zone is its own userspace system. By using ZFS clones they share the same blocks on disk. Really useful. In the next version this will be even better thanks to deduplication in ZFS ...
Coming from Linux there are - of course - different problems. As I've been using OpenSolaris on other boxes for some time now I'm used to many of the administration tools, but I learn new things every time I work on the system.
A bit more problematic is that the main OpenSolaris package repository doesn't offer as much software as typical Linux distributions, but most software packages can be found in other repositories. This is a bit annoying, but as one can see the growth and has access to the above-mentioned features this is no big problem - especially on a server, where most of the tools exist for Solaris, too.
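To make the copy-on-write idea a bit more tangible, here is a minimal toy model in Python. It is only a sketch of the principle, not how ZFS is implemented: the class CowDataset, its methods and the file names are made up for illustration, and blocks are never freed.

```python
import itertools


class CowDataset:
    """Toy copy-on-write dataset (hypothetical, not the ZFS on-disk design)."""

    def __init__(self):
        self._ids = itertools.count()
        self.blocks = {}     # block id -> data; existing blocks are never overwritten
        self.live = {}       # file name -> block id for the live file system
        self.snapshots = {}  # snapshot name -> {file name: block id}

    def write(self, name, data):
        # Copy on write: always allocate a new block instead of changing the old one.
        block_id = next(self._ids)
        self.blocks[block_id] = data
        self.live[name] = block_id

    def snapshot(self, snap_name):
        # A snapshot copies only the references, never the data blocks.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        table = self.live if snap_name is None else self.snapshots[snap_name]
        return self.blocks[table[name]]

    def snapshot_only_blocks(self, snap_name):
        # Blocks referenced by the snapshot but no longer by the live system:
        # this is the space the snapshot actually costs.
        return set(self.snapshots[snap_name].values()) - set(self.live.values())

    def rollback(self, snap_name):
        self.live = dict(self.snapshots[snap_name])


ds = CowDataset()
ds.write("php.ini", "memory_limit=128M")
ds.write("index.php", "<?php echo 'hi';")
ds.snapshot("before-update")
print(len(ds.snapshot_only_blocks("before-update")))  # 0: a fresh snapshot costs nothing
ds.write("php.ini", "memory_limit=256M")
print(len(ds.snapshot_only_blocks("before-update")))  # 1: only the changed block
ds.rollback("before-update")
print(ds.read("php.ini"))  # memory_limit=128M
```

A clone works the same way in this model: it starts as another table of references to the same blocks, which is why several zones cloned from one template share almost all of their space.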
Oh, and for the German speakers: David and I discussed some experience while installing the server in the latest HELDENFunk podcast.
So, this website moved. It isn't the citizen of a Linux box anymore but is running inside a zone on an OpenSolaris host. The only non-default software powering this server, which I compiled myself, is a current svn snapshot of PHP 5.3.2-dev. Let's see if I can keep this system clean or whether it becomes as much of a mess as the old Linux box. For now I'm happy about the isolation using zones, snapshots with ZFS before playing around and DTrace in case something goes wrong.
I recently wrote a few postings about great Sun stuff. The reason for that? - Well since joining Sun I get all the information about the things Sun does and there are tons of cool things, I enjoy OpenSolaris, especially DTrace and zfs (zfs snapshots!) - these are great pieces of technology. Ok, you can feel that OpenSolaris isn't finished yet, but it's a good step from a classic Unix to a quite usable system.
Now Sun is famous for another product family: Java. For me Java has always been a synonym for ugly and annoying applets and over-engineered "enterprise" applications which are close to being unusable. Being at Sun I learned about a new technology for fighting the RIA wars against Microsoft's Silverlight and Adobe's Flex: JavaFX. After browsing the different sites for a while I found out that a key part of JavaFX is a declarative scripting language for creating user interfaces. That sounds quite cool - no annoying and over-verbose XML and no procedural coding for describing a GUI, but a syntax which looks quite sane for that. So I wanted to give it a shot. And that's where the trouble began ...
Angelo recently showed an easy way to dump SQL queries using DTrace. While reading the articles I felt that some important information was missing: the name of the user executing the query and the selected database. So I sat down for a few minutes and tried to collect that data.
For the database name I found a quite simple solution: it is passed as a parameter to MySQL's check_user() function, so we can easily add a thread-local variable to keep that name. A simple script for that:
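#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Remember the schema name passed to check_user() in a thread-local variable. */
pid$1::*check_user*:entry
{
    self->db = arg4 ? copyinstr(arg4) : "(no schema)";
}

/* Print the remembered schema together with the query text. */
pid$1::*dispatch_command*:entry
{
    printf("%s: %s\n", self->db, copyinstr(arg2));
}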
Getting the username is a bit harder. For the PID provider we need a function where it is passed as a parameter, ideally as a char*, but there's no such function that is reliably called. (For instance, there are functions for checking max connections per user where all we need is passed in a proper way, but they are only called when a limit was set.) The only way would be to take the user_connect property of the THD object, which is passed to dispatch_command, and then access the username (and hostname). But getting that working from within DTrace is quite some work. I prepared some scripts doing this with simple C structures for the second part of my DTrace article series, which is ready in my head and is waiting to be typed. So in theory it should be possible. Does anybody want to try?
The best solution, of course, would be to add proper probes into the MySQL server code which provide all that information.
Over the past few weeks I annoyed my environment by praising DTrace whenever possible. Yesterday, during a break at the Barcamp Munich, I gave Wolfram a short introduction on his Mac and decided to put some stuff here:
DTrace is a toolkit available on Solaris (Solaris 10 or OpenSolaris), recent MacOS versions and FreeBSD. It is far mightier than tools like truss or strace, but with way less impact. DTrace allows you to "hook" into the system (the hooks are called "probes") and then do some analysis there.
I guess all that works best by showing an example first: PHP uses a wrapper over the system's memory allocation using a function called _emalloc (which is wrapped by a CPP macro called emalloc) so it might be interesting to see how often that function is being called. For doing that we can use a D-script (D being the DTrace scripting language, not DigitalMars's D) like that:
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
{
    printf("_emalloc was called!\n");
}
We can now simply call that script and tell DTrace to start a PHP interpreter and run a PHP script. DTrace will then change the running program in memory so that the message is printed whenever the process with the PID $target enters the function _emalloc. $target is a special variable referring to a process started by DTrace using -c or a PID provided using -p.
$ ./script1.d -c "php script.php"
_emalloc was called!
_emalloc was called!
_emalloc was called!
...
That's nice but not really useful in any way, yet. We'd like to at least know the size of the allocated memory area, which is the first parameter to _emalloc. The pid provider helps us by providing the parameters to the functions as D variables (arg0, arg1, ...), so we can simply change our action to print that variable:
printf("_emalloc called, allocating %i bytes\n", arg0);
Running the script now gives us the sizes:
$ ./script2.d -c "php script.php"
_emalloc called, allocating 5 bytes
_emalloc called, allocating 6 bytes
_emalloc called, allocating 5 bytes
...
The output is quite long and still rather useless. To make use of this information we at least need some aggregation, but DTrace helps there, too, so let's create an aggregation variable collecting the data in a usable way:
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
{
    @mallocsize["emalloc"] = quantize(arg0);
}
mallocsize and emalloc are freely chosen identifiers here. Depending on the PHP script you run, the output now looks something like the following:
  emalloc
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |                                         83
               2 |@@                                       1122
               4 |@@@@@@@@                                 5141
               8 |@@@@@@                                   4032
              16 |@@@@@@@@@@@@@@@@@@                       11881
              32 |@@@@@@                                   3694
              64 |@                                        806
             128 |                                         27
             256 |                                         66
             512 |                                         1
            1024 |                                         1
            2048 |                                         1
            4096 |                                         1
            8192 |                                         4
           16384 |                                         0
           32768 |                                         1
           65536 |                                         0
          131072 |                                         1
          262144 |                                         0
Each row counts the values that are greater than or equal to the row's value and less than the value of the next row. So this tells us that the most used allocation size is between 16 and 31 bytes and the largest space allocated is somewhere between 131072 and 262143 bytes.
For a deeper analysis we can now add a predicate to our probe so the action triggers only for that largest allocation. Such predicates are written between slashes, between the probe name and the action. Additionally I'm adding a ustack() call to the action; this will print the system's userspace backtrace -- which is C level, not PHP space.
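As an aside, the bucketing that quantize() does can be imitated in a few lines of Python. This is only a sketch for non-negative values (the real quantize() also has buckets for negative values), and the function names are made up for this illustration:

```python
from collections import Counter


def quantize_bucket(value):
    """Lower bound of the power-of-two bucket that a non-negative value falls into."""
    if value <= 0:
        return 0
    # The highest set bit gives the largest power of two that is <= value.
    return 1 << (value.bit_length() - 1)


def quantize(values):
    """Count how many values fall into each power-of-two bucket."""
    return Counter(quantize_bucket(v) for v in values)


sizes = [5, 6, 5, 16, 31, 32, 261900]
histogram = quantize(sizes)
for lower in sorted(histogram):
    print(f"{lower:>8} | {histogram[lower]}")
```

With these sample sizes, 16 and 31 both land in the row labelled 16, while 32 opens the next row, and an allocation of 261900 bytes is counted in the row labelled 131072.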
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
/ arg0 > 131072 /
{
    printf("_emalloc(%i)\n", arg0);
    ustack();
}
$ ./script4.d -c "php script.php"
_emalloc(261900)
    php`_emalloc
    php`zend_vm_stack_new_page+0x19
    php`zend_vm_stack_init+0xf
    php`init_executor+0xf5
    php`zend_activate+0x12
    php`php_request_startup+0x7a
    php`main+0xd86
    php`_start+0x7d
So we see we're in the startup of PHP, allocating some space for its stack. One question now might be about the costs of an _emalloc call; one important factor there are syscalls to the operating system. As DTrace is made for analyzing the whole system, that can be done quite easily using the syscall provider. We might now use syscall:::entry as the probe, to be triggered on every call, but that will be quite a lot. As we're only interested in syscalls from _emalloc, we'll use a thread-local variable as a flag and check that flag in the predicate condition:
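#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Set a thread-local flag while a large _emalloc call is running. */
pid$target::_emalloc:entry
/ arg0 > 131072 /
{
    self->inemalloc = 1;
}

/* In a return probe arg0 is the return offset, not the size, so test the flag. */
pid$target::_emalloc:return
/ self->inemalloc /
{
    self->inemalloc = 0;
}

/* Report only syscalls issued while the flag is set. */
syscall:::entry
/ self->inemalloc /
{
    printf("%s\n", probefunc);
}

$ ./script5.d -c "php script.php"
brk
brk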
So we're calling brk two times. brk is the syscall to "change the amount of space allocated for the calling process's data segment", which is exactly what we expect, but why is it called two times? Adding a ustack() call to the syscall action can tell us where it happens; using the source this can then probably be optimized. That's left as an exercise to the interested reader.
In summary: no need to change the code, and lots of information. I plan to write an additional article showing how to get interesting facts system-wide, not only for a specific process but for all running ones, which is especially interesting when searching for a problem on production systems (DTrace is made to be used on production systems!) or for problems related to concurrent processes/threads.
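For those who want to see the flag technique without a DTrace-capable system at hand, here is a small Python model of it. It is a sketch under made-up assumptions: the event tuples, the trace() function and the thread IDs are invented for this illustration and have nothing to do with the DTrace API. The dictionary keyed by thread ID plays the role of the thread-local self->inemalloc variable.

```python
from collections import defaultdict


def trace(events, min_size=131072):
    """Return the syscalls made while a large _emalloc call is active.

    events is an iterable of (thread_id, kind, name, arg) tuples, where kind
    is "entry", "return" or "syscall" (a made-up format for this sketch).
    """
    in_emalloc = defaultdict(bool)  # one flag per thread, like self->inemalloc
    reported = []
    for tid, kind, name, arg in events:
        if kind == "entry" and name == "_emalloc" and arg > min_size:
            in_emalloc[tid] = True        # predicate on the size matched
        elif kind == "return" and name == "_emalloc" and in_emalloc[tid]:
            in_emalloc[tid] = False       # leaving the flagged call
        elif kind == "syscall" and in_emalloc[tid]:
            reported.append((tid, name))  # only syscalls inside the call
    return reported


events = [
    (1, "entry", "_emalloc", 261900),
    (2, "syscall", "read", None),    # other thread: flag not set, ignored
    (1, "syscall", "brk", None),
    (1, "syscall", "brk", None),
    (1, "return", "_emalloc", None),
    (1, "syscall", "write", None),   # after the return: ignored
    (1, "entry", "_emalloc", 16),    # small allocation: flag stays off
    (1, "syscall", "brk", None),
    (1, "return", "_emalloc", None),
]
print(trace(events))  # [(1, 'brk'), (1, 'brk')]
```

The point of keeping the flag per thread is visible in the second event: a syscall from another thread that happens at the same time is not attributed to the allocation.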