Ok, so this site (and some other stuff) is now running on OpenSolaris. The previous previous article was mostly a test entry for me to see whether the DNS update was through but as some people wonder why I'm using this system that "fails while trying to copy Linux" I decided to discuss some of the reasons in more detail.
Some people already know that my main system meanwhile runs OpenSolaris. The reason there is DTrace - a great way to see what the system, from the kernel, over userspaces programs, into a VM like the JVM or PHP's Zend VM, ... is doing which is a big help while debugging and developing applications. Even though DTrace is meant to do such analysis on live machines this wasn't the main reason for this choice on the server. For the server I actually didn't plan a change, ok, the old Linux box wasn't maintained well but it worked good enough for the few things it does, but then David came along and had the idea to share a server so I started thinking about dropping the old contract and getting a new machine for us both - and possible some other friends. And there we find the actual reasons for the OS choice:
So we were planing to share a box as both of us are doing Web/PHP-related stuff it was clear that it's likely that both of us would might need special versions and configurations of some software components which will then conflict with each other. Additionally I want to be able to do a killall apache in case I configured something wrong and I don't want the others to be affected too much while configuring my web servers as I need/want them. The obious solution these days? - Virtualization.
Now virtualization comes in many flavors. The simple one most people know is Desktop Virtualization, so you take a software like VirtualBox, which is running as a regular userspace application and holds a complete operating stack. In there one has a kernel of the virtualized system which thinks it's running directly on physical hardware. The big benefit is that one can run any operating system in the VM but also has negative effects in areas like disk buffers (the virtualized and the host kernel buffer independently) or overall process scheduling (the VM is scheduled by the host and then schedules itself again..) or syscalls (an application running in the VM does a syscall to the VM's kernel which then calls a Hypervisor-provided hardware emulation function which then triggers a syscall on the hostsystem)
Another approach is Operating System Virtualization like Solaris Zones. Here the operating system handles the virtualiztion. With zones this works in a way were one has a single kernel and multiple userland instances. By this one has one kernel with one scheduler (ok, Solaris allows using different schedulers and so on - let's ignore this and look at the default) and one disk IO layer. Inside a Zone one has Zone-specific userland with service management an own network device (see more on this below), an own user database (/etc/passwd, LDAP, ...) and so on. But as of the syscall interface it all runs on one kernel which also means that all processes are handled equally by the kernel (unless configured otherwise)
The result of using Solaris Zones is that one has a lightweight isolation of independent userland environments. Now as said the virtualisation has one boundary at the syscall layer, so the userland has to be Solaris - one thinks. But that's not true: There are Branded Zones which emulate another syscall interface,by that one can run a Linux userland on a Solaris kernel so Linux-only apps benefit from stuff like ZFS and DTrace - but that's not relevant for me here.
So to summarize: Zones are great for lightweight isolation (and other stuff)
Now I was mentioning that each Zone can have it's own network interface assigned.This is nice if you have a box with many network devices - now a typical server you get as a root-server for little money usually has just one. Now what you traditionally can do is assigning multiple IPs to that device and then use the single device shared over multiple zones. That works but is inconvenient as you can't really check the status (which device/zone is producing how much traffic?) or add bandwidth limitations (I want to be able to reduce one zones bandwidth in case an article is slashdotted without going to deep into everything to keep other parts of the system running) and additionally IP addresses are limited and I don't want all zones to be publicly accessible - for instance my MySQL zone can't be reached from the outside.
Now crossbow - that's the name of the Solaris network virtualization layer introduced with OpenSolaris 2009.06 - for me always was a so what thing till I started using it. Well yes you can create virtual switches and virtual network interfaces. So what? Well combined with zones I can achieve what I described in the above paragraph.
So let's build a network:
dladm create-etherstub mystub0
dladm create-vnic -l mystub0 vnic0
dladm create-vnic -l mystub0 vinc1
That's all that's needed to create an internal ethernet with two devices. Next step is to assign them to zones and configure IP for this network. In my current setup I have a zone for this web site and one zone for the MySQL server. The MySQL zone has a vnic for an internal network, the web-zone has two vnics - one is used for the internal network and the second is configured to work on top of the physical networking device so it can talk to the outside using its own public IP address. For limiting resources and stuff there's the flowadm tool for simple access to control network resource limits or service priorities (ssh connections have higher priorities so the system can be controlled in case the network is busy)
And even for me, who tries to stay above the TCP layer, this is quite trivial to setup.
Now one of the most cited features of Solaris is the zfs filesystem. While zfs is more than just a filesystem - it's a combination of volume manager, raid controller and other related things. The key feature there for me is snapshotting: zfs is using a copy on write mechanism so zfs can create snapshot which in itself has barely no costs. Only if data is changed a new block is being written and the old one is kept untouched by that the snapshots costs only the space the difference needs. Additionally this allows clones so one gets a copy of a directory and it will cost space only if data is changed - that's of special interest with zones. As said each zone is it's own userspace system. By using zfs clones they share the same blocks on disk. Really useful. In the next version this will even be better thanks to deduplication in zfs ...
Coming from Linux there are - of course - different problems, as I'm using OpenSolaris on other boxes for sometime now I'm used to many administration tools but I learn new things every time i work on the system.
A bit more problematic is that the main OpenSolaris package repository doesn't offer as much software as typical linux distributions, but for most software packages can be found in other repositories, too. This is a bit annoying but as one can see the growth and has access to above mentioned features this is no big problem - especially on a server where most of the tools exist for Solaris, too.
Oh, and for the German speakers: David and I discussed some experience while installing the server in the latest HELDENFunk podcast.