Mar 28: Mind the encodings!

Dealing with character sets and encodings is tough. As long as you're dealing only with English texts you are in a luxury situation: you can mix utf-8 and iso-8859-1 encoded texts and most (all?) of your tests will still pass. Some of your users, like me, with strange names ("Schlüter") will be annoyed as your application breaks them ("SchlÃ¼ter"), but these will be edge cases. There are bigger issues with mixing encodings, but that's not what I want to talk about now.
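The breakage above is easy to reproduce. The following is a minimal sketch, written in Python purely for illustration (the helper name misread_utf8_as_latin1 is made up for this example), of what happens when one part of an application writes utf-8 and another part reads the same bytes as iso-8859-1:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as utf-8, then wrongly decode the bytes as iso-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# Pure ASCII text has the same bytes in both encodings, so nothing breaks.
print(misread_utf8_as_latin1("Smith"))     # Smith

# "ü" is two bytes in utf-8 (0xC3 0xBC); read as iso-8859-1, each byte
# becomes a character of its own.
print(misread_utf8_as_latin1("Schlüter"))  # SchlÃ¼ter

# The opposite mix-up fails differently: the single iso-8859-1 byte 0xFC
# is not valid utf-8, so a strict decoder rejects it.
try:
    "Schlüter".encode("iso-8859-1").decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid utf-8:", exc.reason)
```

This is also why English-only test data tends to hide the problem: ASCII characters are encoded identically in both encodings.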
Handling these encodings correctly in PHP is tough. PHP currently has a quite simple approach to the problem: PHP doesn't care about encodings. A string is simply a sequence of bytes. Well, in general. The details are difficult. Core features like JSON handling or XML processing expect your PHP strings to be encoded in utf-8, which is a sane choice: for JSON it is part of the JSON specification, and for XML it's the only way to work with all documents without requiring additional information from you on every XML operation.
On the other side we have browsers. A browser has to read documents from all over the world, which can be encoded in any encoding you like. For whatever reason browser developers decided that iso-8859-1 would make a great default encoding, which means that if a response to a browser's request doesn't specify anything else, the browser assumes the document is encoded using iso-8859-1. The document's encoding will also be used when sending form data back to the server. To handle this, PHP has a php.ini setting, default_charset, which, if set, puts the selected encoding into the HTTP header. The default value of this setting is empty - so browsers fall back on their default, iso-8859-1.
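The fallback described above can be sketched in a few lines. This is only an illustration in Python, not how any real browser is implemented (real browsers also look at meta tags, byte order marks and user settings); the function name effective_charset and the constant BROWSER_DEFAULT are made up for this example:

```python
from email.message import Message

BROWSER_DEFAULT = "iso-8859-1"  # the default described in this post


def effective_charset(content_type: str) -> str:
    """Return the charset named in a Content-Type value, else the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the fallback when no charset is given.
    return msg.get_content_charset(BROWSER_DEFAULT)


print(effective_charset("text/html"))                 # iso-8859-1
print(effective_charset("text/html; charset=utf-8"))  # utf-8
```

The first call corresponds to a response sent with an empty default_charset, the second to a response where the charset was set explicitly.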
Recently Rasmus made a commit to PHP trunk which changes the default to utf-8.
In the long run this change is good: utf-8 works for all languages, whereas iso-8859-1 only works for a limited set of European languages, and it fits better with the outside environment, where utf-8 adoption is growing (JSON and XML were examples).
In the short term this might cause trouble for applications depending on the default. The good thing is that development in trunk has just started, the release of PHP.next is still some time away, and you can easily prepare your application - which is a good idea anyway, to protect against administrators making mistakes with current versions.
To set the encoding from within your application you can, for example, call ini_set('default_charset', $enc); or header("Content-type: text/html; charset=$enc"); at the beginning of your script, where $enc is your preferred encoding. Please mind that this has no effect on the script itself, which means, for instance, that you have to configure your database connection accordingly, too!
Mar 12: Future of PHP 6

Yesterday was quite a thrilling day for the PHP development team and led to some imprecise news articles, so let's take a look at what happened: Over the last months many of the core contributors had agreed that the current approach to bringing Unicode into PHP's engine wasn't the right one and that it would be good to rethink it from the start. A provocative move by one contributor got the stalled situation moving again, and Rasmus declared the current implementation discontinued so that work could restart.
The past
When the foundation of what was supposed to become PHP 6 was laid, a decision was made to use UTF-16 as the internal encoding for "everything" inside the engine. UTF-16 was chosen because PHP 6 would use the ICU library, which is focused on offering UTF-16 string functions. With UTF-16 as the default encoding we'd have to convert the script code and all data passed to or from the script (request data, database results, output, ...) from another encoding, usually UTF-8, to UTF-16 or back. This conversion doesn't only cost CPU time and memory (a UTF-16 string takes double the memory of a UTF-8 string in many cases), it also makes the implementation rather complex, as we always have to figure out which encoding is the right one for a given situation. From the userspace point of view the implementation brought some backwards compatibility breaks which would require manual review of existing code. All of this is a lot of pain for very little gain for many users, many of whom would be happy with a tighter integration of some mbstring-like functionality. It led to a situation where many contributors were unwilling to use "trunk" as their main development tree and instead either developed on the stable 5.2/5.3 trees or did no development at all.
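The memory remark is easy to check. The following is a small sketch, in Python for illustration only (the helper name encoded_sizes is made up for this example), comparing the byte counts of the same text in both encodings:

```python
from typing import Tuple


def encoded_sizes(text: str) -> Tuple[int, int]:
    """Return (utf-8 byte count, utf-16 byte count) for text."""
    # "utf-16-le" is used so that no byte order mark is added to the count.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


print(encoded_sizes("hello"))     # (5, 10)
print(encoded_sizes("Schlüter"))  # (9, 16)
print(encoded_sizes("日本語"))     # (9, 6)
```

For ASCII-heavy data such as script code or markup, UTF-16 needs about twice the space; the last line shows why this only holds "in many cases", since some scripts are more compact in UTF-16.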
The present
Yesterday the stagnation created by this situation was resolved: it was decided that our trunk in svn will be based on 5.3, and that features from the old trunk as well as new features will be merged there, so that 5.3 will be a truly stable branch. The EOL for 5.2 has not been defined yet, but I suggest you migrate to 5.3 as soon as possible, which can usually be done with very little work.
The future
Right now we're starting different discussions to see what kind of Unicode support we really want. Many contributors react positively to a proposed "string class" which wraps string operations in Unicode and binary forms without going deep into the engine. In my opinion such an approach might also be a way to solve some of the often criticized inconsistencies in PHP's string APIs without the need to break old code (new code uses the new class, old code the old functions). But that idea is far from a proper proposal, let alone an implementation; the current status is about refocusing the development and getting the requirement and design discussions going. So it's a bit early to decide whether the next version of PHP will be called PHP 5.4, PHP 6 or maybe even PHP 7.
PHP is alive and kicking!
Mar 2: Will I ever understand it?
Today the German internet community is celebrating, and yes, the ruling of the BVerfG (Germany's Federal Constitutional Court) is to be welcomed. But I still wonder why people voluntarily give considerably more data to a private company.
I've written it before: I don't understand how people can hand all their data to one company and at the same time object when the state gets a fraction of that data by law. I don't know whether Google itself does anything evil with the data today; from everything one hears, probably not. But for one thing, government agencies already have access to the data at Google today, and for another, nobody knows what Google will do if the advertising business ever goes less well. By now they have a large pile of employees to feed and shareholders to keep happy.
Yes, Google hands data over to authorities. In the USA this is regulated by the USA PATRIOT Act of 2001, and Eric Schmidt, CEO of Google, recently emphasized this in an interview. In Germany we fortunately have no (not yet?) comparable law that grants intelligence services and police authorities rights as extensive as the Patriot Act does, but here too there are ways that can lead to the seizure of data from Google .. or things like cooperation between intelligence services, which do not make access by German authorities impossible (though then the question is whether the CIA or the BKA is worse).
But just as a reminder of what data Google has: it's not only the search queries you happen to enter, but also the information about which websites you visit (thanks to Google Ads and Google Analytics everywhere); it's the mails you write (even if only because the recipient uses GMail); it's information about whom you phone and when (the Android phone of the person called is tied to GMail & Co.). In the USA it also includes the contents of voicemail messages (which are sent as mail via speech recognition). Thanks to Picasa and photo recognition they also know your face and whereabouts (that is actually the interesting part of the Streetview discussion - Google has the uncensored pictures and automatic face recognition ...) and much more data. Combined with the queries on Google Search, product search, Google Maps, ... that yields an extremely detailed personality profile. No. I don't understand why anyone would want to have something like this in a single database that they don't control.