Entries tagged as solaris
Nov 28: Shooting with Crossbows into Zones
Ok, so this site (and some other stuff) is now running on OpenSolaris. The previous article was mostly a test entry for me to see whether the DNS update was through, but as some people wonder why I'm using this system that "fails while trying to copy Linux", I decided to discuss some of the reasons in more detail.
Some people already know that my main system runs OpenSolaris by now. The reason there is DTrace: a great way to see what the whole system is doing, from the kernel through userspace programs into a VM like the JVM or PHP's Zend VM, which is a big help while debugging and developing applications. Even though DTrace is meant for exactly this kind of analysis on live machines, it wasn't the main reason for this choice on the server. For the server I actually didn't plan a change. OK, the old Linux box wasn't maintained well, but it worked well enough for the few things it does. Then David came along with the idea of sharing a server, so I started thinking about dropping the old contract and getting a new machine for the two of us, and possibly some other friends. And there we find the actual reasons for the OS choice:
Zones
So we were planning to share a box. As both of us are doing Web/PHP-related stuff, it was likely that each of us would need special versions and configurations of some software components, which would then conflict with each other. Additionally I want to be able to do a killall apache in case I configured something wrong, and I don't want the others to be affected too much while I configure my web servers the way I need/want them. The obvious solution these days? Virtualization.
Now virtualization comes in many flavors. The simple one most people know is Desktop Virtualization: you take software like VirtualBox, which runs as a regular userspace application and holds a complete operating system stack. In there one has the kernel of the virtualized system, which thinks it's running directly on physical hardware. The big benefit is that one can run any operating system in the VM. The downside shows in areas like disk buffers (the virtualized and the host kernel buffer independently), overall process scheduling (the VM is scheduled by the host and then schedules its own processes again) and syscalls (an application running in the VM does a syscall to the VM's kernel, which then calls a hypervisor-provided hardware emulation function, which in turn triggers a syscall on the host system).
Another approach is Operating System Virtualization like Solaris Zones. Here the operating system itself handles the virtualization. With zones this works in a way where one has a single kernel and multiple userland instances. So there is one kernel with one scheduler (OK, Solaris allows using different schedulers and so on - let's ignore this and look at the default) and one disk IO layer. Inside a zone one has a zone-specific userland with its own service management, its own network device (more on this below), its own user database (/etc/passwd, LDAP, ...) and so on. But below the syscall interface it all runs on one kernel, which also means that all processes are handled equally by the kernel (unless configured otherwise).
The result of using Solaris Zones is a lightweight isolation of independent userland environments. Now, as said, the virtualization has its boundary at the syscall layer, so the userland has to be Solaris - one might think. But that's not true: there are Branded Zones which emulate another syscall interface. That way one can run a Linux userland on a Solaris kernel, so Linux-only apps benefit from stuff like ZFS and DTrace - but that's not relevant for me here.
So to summarize: Zones are great for lightweight isolation (and other stuff).
Crossbow
Now I was mentioning that each zone can have its own network interface assigned. This is nice if you have a box with many network devices, but the typical server you rent as a root server for little money usually has just one. What you traditionally can do is assign multiple IPs to that device and then share the single device among multiple zones. That works but is inconvenient, as you can't really check the status (which device/zone is producing how much traffic?) or add bandwidth limits (I want to be able to reduce one zone's bandwidth in case an article is slashdotted, without going too deep into everything, to keep the other parts of the system running). Additionally, IP addresses are limited and I don't want all zones to be publicly accessible - for instance my MySQL zone can't be reached from the outside.
Now Crossbow - that's the name of the Solaris network virtualization layer introduced with OpenSolaris 2009.06 - was always a "so what?" thing for me until I started using it. Well yes, you can create virtual switches and virtual network interfaces. So what? Well, combined with zones I can achieve exactly what I described in the paragraph above.
So let's build a network:
dladm create-etherstub mystub0
dladm create-vnic -l mystub0 vnic0
dladm create-vnic -l mystub0 vnic1
That's all that's needed to create an internal Ethernet with two devices. The next step is to assign them to zones and configure IP for this network. In my current setup I have one zone for this web site and one zone for the MySQL server. The MySQL zone has a vnic for the internal network; the web zone has two vnics - one is used for the internal network and the second is configured to work on top of the physical networking device, so it can talk to the outside using its own public IP address. For limiting resources there's the flowadm tool, which gives simple control over network resource limits and service priorities (ssh connections have a higher priority, so the system can still be controlled when the network is busy).
And even for me, who tries to stay above the TCP layer, this is quite trivial to set up.
ZFS
Now one of the most cited features of Solaris is the ZFS filesystem. ZFS is more than just a filesystem - it's a combination of volume manager, RAID controller and other related things. The key feature for me is snapshotting: ZFS uses a copy-on-write mechanism, so creating a snapshot costs barely anything in itself. Only when data is changed is a new block written, while the old one is kept untouched, so a snapshot costs only the space that the difference needs. Additionally this allows clones: one gets a copy of a filesystem and it costs space only where data is changed - that's of special interest with zones. As said, each zone is its own userspace system. By using ZFS clones they share the same blocks on disk. Really useful. In the next version this will get even better thanks to deduplication in ZFS ...
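To make the "a snapshot only costs the difference" idea a bit more tangible, here is a toy model in Python. It is only a sketch of the copy-on-write principle and not how ZFS is actually implemented; the class CowVolume and all its method names are made up for this illustration.

```python
import itertools


class CowVolume:
    """Toy copy-on-write volume (illustration only, not the ZFS design)."""

    def __init__(self):
        self._ids = itertools.count()
        self.store = {}      # physical block id -> data
        self.live = {}       # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen copy of the block map

    def write(self, blockno, data):
        # Never overwrite in place: put the data into a fresh physical block
        # and point the live map at it. The old block stays untouched.
        block_id = next(self._ids)
        self.store[block_id] = data
        self.live[blockno] = block_id
        self._free_unreferenced()

    def snapshot(self, name):
        # A snapshot copies only the block map, never the data itself.
        self.snapshots[name] = dict(self.live)

    def read(self, blockno, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[table[blockno]]

    def physical_blocks(self):
        return len(self.store)

    def _free_unreferenced(self):
        # Drop blocks that neither the live view nor any snapshot refers to.
        used = set(self.live.values())
        for table in self.snapshots.values():
            used.update(table.values())
        for block_id in list(self.store):
            if block_id not in used:
                del self.store[block_id]


vol = CowVolume()
vol.write(0, "config v1")
vol.write(1, "data v1")
vol.snapshot("before")        # costs no additional data blocks
vol.write(1, "data v2")       # only now a third block is needed
print(vol.physical_blocks())            # 3
print(vol.read(1), "/", vol.read(1, "before"))
```

The snapshot keeps seeing the old block while the live view sees the new one, and block 0 is stored only once for both. A clone works on the same principle, except that the copied block map stays writable.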
Problems
Coming from Linux there are - of course - some problems. As I've been using OpenSolaris on other boxes for some time now, I'm used to many of the administration tools, but I still learn new things every time I work on the system.
A bit more problematic is that the main OpenSolaris package repository doesn't offer as much software as typical Linux distributions, although most software packages can be found in other repositories. This is a bit annoying, but as one can see the growth and has access to the features mentioned above, it's no big problem - especially on a server, where most of the tools exist for Solaris, too.
Oh, and for the German speakers: David and I discussed some experience while installing the server in the latest HELDENFunk podcast.
Nov 24: Now running on OpenSolaris

So, this website moved. It isn't the citizen of a Linux box anymore but is running inside a zone on an OpenSolaris host. The only non-default software powering this server, which I compiled myself, is a current svn snapshot of PHP 5.3.2-dev. Let's see if I can keep this system clean or whether it becomes as much of a mess as the old Linux box. For now I'm happy about the isolation using zones, snapshots with ZFS before playing around, and DTrace in case something goes wrong.
Mar 22: What's so special about Sun engineers?
Dec 15: OpenSolaris ....
Oct 12: DTraceing around

Over the past few weeks I annoyed everyone around me by praising DTrace whenever possible. Yesterday, during a break at the Barcamp Munich, I gave Wolfram a short introduction on his Mac and decided to put some stuff here:
DTrace is a toolkit available on Solaris (Solaris 10 or OpenSolaris), recent MacOS versions and FreeBSD. It is far mightier than tools like truss or strace but has way less impact. DTrace allows you to hook into the system at defined points (called "probes") and do some analysis there.
I guess all that works best by showing an example first: PHP uses a wrapper around the system's memory allocation, a function called _emalloc (which in turn is wrapped by a CPP macro called emalloc), so it might be interesting to see how often that function is called. For doing that we can use a D script (D being the DTrace scripting language, not Digital Mars's D) like this:
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
{
    printf("_emalloc was called!\n");
}
We can now simply call that script and tell DTrace to start a PHP interpreter and run a PHP script. DTrace will then change the running program in memory so that the message is printed whenever the process with the PID $target enters the function _emalloc. $target is a special variable referring to a process started by DTrace using -c or a PID provided using -p.
$ ./script1.d -c "php script.php"
_emalloc was called!
_emalloc was called!
_emalloc was called!
...
That's nice but not really useful in any way yet. We'd at least like to know the size of the allocated memory area, which is the first parameter to _emalloc. The pid provider helps us by providing the function's parameters as D variables (arg0, arg1, ...), so we can simply change our action to print that variable:
printf("_emalloc called, allocating %i bytes\n", arg0);
Running the script now gives us the sizes:
$ ./script2.d -c "php script.php"
_emalloc called, allocating 5 bytes
_emalloc called, allocating 6 bytes
_emalloc called, allocating 5 bytes
...
The output is quite long and still rather useless. To make use of this information we at least need some aggregation, but DTrace helps there, too, so let's create an aggregation variable collecting the data in a usable way:
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
{
    @mallocsize["emalloc"] = quantize(arg0);
}
mallocsize and emalloc are freely chosen identifiers. Depending on your PHP script the output now looks something like the following:
  emalloc
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |                                         83
               2 |@@                                       1122
               4 |@@@@@@@@                                 5141
               8 |@@@@@@                                   4032
              16 |@@@@@@@@@@@@@@@@@@                       11881
              32 |@@@@@@                                   3694
              64 |@                                        806
             128 |                                         27
             256 |                                         66
             512 |                                         1
            1024 |                                         1
            2048 |                                         1
            4096 |                                         1
            8192 |                                         4
           16384 |                                         0
           32768 |                                         1
           65536 |                                         0
          131072 |                                         1
          262144 |                                         0
This tells us that the most used allocation size is between 16 and 31 bytes (each row counts the values from its label up to, but not including, the next row's label) and that the largest allocation is somewhere between 131072 and 262143 bytes.
For a deeper analysis we can now add a predicate to our probe so the action triggers only for that largest allocation. Such predicates are written between slashes, between the probe description and the action. Additionally I'm adding a ustack() call to the action; this will print the userspace backtrace, which is C level, not PHP space.
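Reading these rows is easy to get wrong, so here is a small Python sketch that mimics how quantize() sorts non-negative values into power-of-two buckets. The row label is the lower bound of the bucket. The function names quantize_bucket and histogram are made up for this illustration, and only 5, 6 and 261900 in the sample list come from the outputs in this article; the other values are invented.

```python
def quantize_bucket(value):
    """Return the label of the row a value is counted in.

    The row labelled N counts the values v with N <= v < 2 * N.
    This sketch handles non-negative values only.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    if value == 0:
        return 0
    # Largest power of two that is not greater than value.
    return 1 << (value.bit_length() - 1)


def histogram(values):
    """Count values per bucket, like the count column of the output."""
    counts = {}
    for v in values:
        bucket = quantize_bucket(v)
        counts[bucket] = counts.get(bucket, 0) + 1
    return dict(sorted(counts.items()))


sizes = [5, 6, 5, 16, 31, 32, 261900]
print(histogram(sizes))
# {4: 3, 16: 2, 32: 1, 131072: 1}
```

So an allocation of 31 bytes still lands in the row labelled 16, and an allocation of 261900 bytes lands in the row labelled 131072.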
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
/ arg0 > 131072 /
{
    printf("_emalloc(%i)\n", arg0);
    ustack();
}
$ ./script4.d -c "php script.php"
_emalloc(261900)

              php`_emalloc
              php`zend_vm_stack_new_page+0x19
              php`zend_vm_stack_init+0xf
              php`init_executor+0xf5
              php`zend_activate+0x12
              php`php_request_startup+0x7a
              php`main+0xd86
              php`_start+0x7d
So we see we're in the startup of PHP, allocating some space for its stack. One question now might be about the cost of an _emalloc call; one important factor there are syscalls to the operating system. As DTrace is made for instrumenting the whole system, that can be done quite easily using the syscall provider. We might now use syscall:::entry as the probe, to be triggered on every call, but that would be quite a lot. As we're only interested in syscalls from _emalloc, we'll use a thread-local variable as a flag and check that flag in the predicate:
#!/usr/sbin/dtrace -s
#pragma D option quiet

pid$target::_emalloc:entry
/ arg0 > 131072 /
{
    self->inemalloc = 1;
}

/* In a return probe arg0 is not the size argument, so check the flag instead. */
pid$target::_emalloc:return
/ self->inemalloc /
{
    self->inemalloc = 0;
}

syscall:::entry
/ self->inemalloc /
{
    printf("%s\n", probefunc);
}
$ ./script5.d -c "php script.php"
brk
brk
So we're calling brk twice. brk is the syscall to "change the amount of space allocated for the calling process's data segment", which is exactly what we expect, but why is it called twice? Adding a ustack() call to the syscall action can tell us where it happens; with the help of the source this can then probably be optimized. That's left as an exercise to the interested reader.
In summary: no need to change the code, and lots of information. I plan to write an additional article showing how to get interesting facts system-wide, not only for a specific process but for all running ones. That is especially interesting when searching for a problem on production systems (DTrace is made to be used on production systems!) or for problems related to concurrent processes/threads.
Oct 5: DTrace, PID provider and rights
DTrace is a damn cool debugging tool, unfortunately only available for Solaris and different BSD flavors. If you want to learn about it watch the quite entertaining video (well, I guess you should be a true geek to be entertained, ...) from Bryan Cantrill's talk.
The reason for me writing this is that I had some problems with the PID provider and wanted to note the solution for myself:
$ dtrace -n 'pid1005:::entry { printf("Hello"); }'
probe description pid1005:::entry does not match any probes
The reason is that my user only had the dtrace_user, not the dtrace_proc right. Setting the attributes correctly solved the issue. Time for more DTrace'ing.