Destructing PHP

Last year I gave a session at the fabulous FrOSCon conference about "Destructing PHP". No, this wasn't a flaming/trolling talk, but an attempt to teach a bit about a lesser-known language feature of PHP: deterministic destructors. To explain this, let's take a look at a simple example:

<?php
require('autoload.php');

function foo($db, $data) {
    $db->beginTransaction();

    foreach ($data as $row) {
        $db->insert($row);
    }

    $db->commit();
}

$db = db::factory();
$data = data::factory();

try {
    foo($db, $data);
} catch (Exception $e) {}

$db->insert(data::finalData());
?>

Even though the syntax is correct, this program is incorrect: it is not exception safe. If an exception is thrown, the transaction state is leaked. An exception might for instance be thrown by the $db->insert() call or, if $data is an iterator, by the iteration.
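To make the leak visible, here is a minimal sketch. FakeDb is a hypothetical stand-in, not a real database class: it only remembers whether a transaction is open and notes that state for every insert. The null row that triggers the exception is likewise made up for illustration.

```php
<?php
// FakeDb is a hypothetical stand-in that tracks whether a transaction is open.
class FakeDb {
    public $inTransaction = false;
    public $log = [];

    function beginTransaction() { $this->inTransaction = true; }
    function commit()           { $this->inTransaction = false; }
    function insert($row) {
        if ($row === null) {
            throw new InvalidArgumentException('cannot insert null');
        }
        $state = $this->inTransaction ? 'inside transaction' : 'autocommit';
        $this->log[] = "$row ($state)";
    }
}

// Same shape as foo() from the first example.
function foo($db, $data) {
    $db->beginTransaction();
    foreach ($data as $row) {
        $db->insert($row);
    }
    $db->commit();   // never reached when insert() throws
}

$db = new FakeDb();
try {
    foo($db, ['a', null]);
} catch (Exception $e) {}

$db->insert('final');
print_r($db->log);
// 'final' is logged as "inside transaction": the transaction opened in foo() is still open.
```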

According to the program, the data::finalData() should be stored regardless of whether foo() succeeds or not. Unfortunately this isn't the case: if something in the loop inside foo() throws, there will be an open transaction. The final data now becomes part of that transaction. As there is no further handling, PHP will clean up at the end of the program and automatically roll back the transaction. So let's fix this. A typical solution looks something like this:

function foo($db, $data) {
    $db->beginTransaction();

    try {
        foreach ($data as $row) {
            $db->insert($row);
        }
    } catch(\Exception $e) {
        $db->rollback();
        throw $e;
    }
    $db->commit();
}

So we catch the exception, and in the error case we roll back and re-throw it. Now this program is correct, but admit it, this is pretty verbose and ugly.

The form I would like to see is this:

function foo($db, $data) {
    $transaction = new Transaction($db);

    foreach ($data as $row) {
        $db->insert($row);
    }

    $transaction->commit();
}

Now this is correct and exception safe, while being clean and free of noise. You might have to look closely to see the difference from the initial version: we simply introduced a transaction object.

The reason this works is that PHP's memory management is based on reference counting. With reference counting (which I explained in more detail in this recorded talk) PHP keeps track of how many variables refer to an object; when the last reference is gone the object is cleaned up and its destructor is called. PHP is also function scoped, which means that when a function ends, whether by reaching its end, by a return statement or by an exception, the variables of that function are cleaned up. In the code above we have exactly one reference to the transaction object, so it is cleaned up at the end of the function.

This is massively different from garbage collected languages like Java or Go, where objects are cleaned up at more or less random times. In PHP this is deterministic. The only case where PHP falls back on garbage collection is when you have cyclic references. As long as you don't have cycles you can figure out exactly when a destructor will be called by reading the code. Admittedly, if you pass an object around a lot and store it in multiple places this can get complicated.
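A small sketch can make the timing tangible. Tracer is a hypothetical helper class, invented here only to record when constructors and destructors run; the two functions leave their scope once normally and once through an exception.

```php
<?php
// Tracer is a hypothetical helper that records when it is created and destroyed.
class Tracer {
    public static $log = [];
    private $name;

    function __construct($name) {
        $this->name = $name;
        self::$log[] = "construct {$this->name}";
    }
    function __destruct() {
        self::$log[] = "destruct {$this->name}";
    }
}

function normalExit() {
    $t = new Tracer('a');
    Tracer::$log[] = 'body a';
}   // the only reference to the object is gone here, so the destructor runs

function exceptionalExit() {
    $t = new Tracer('b');
    throw new RuntimeException('boom');
}   // $t is released while the exception leaves the function

normalExit();
Tracer::$log[] = 'after a';

try {
    exceptionalExit();
} catch (RuntimeException $e) {
    Tracer::$log[] = 'caught';
}

print_r(Tracer::$log);
// Order: construct a, body a, destruct a, after a, construct b, destruct b, caught
```

In both cases the destructor has run by the time control is back in the calling code, which is exactly the property the Transaction object relies on.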

Now let's take a look at the implementation of our Transaction class:

class Transaction {
    private $db;
    private $success = false;

    function __construct($db) {
        $this->db = $db;
        $db->beginTransaction(); // same method name as in the earlier examples
    }
    function commit() {
        $this->db->commit();
        $this->success = true;
    }
    function __destruct() {
        if (!$this->success) {
            $this->db->rollback();
        }
    }
}

The key here is that we track the state. If the destructor is called without an explicit commit beforehand, a rollback is enforced.
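To watch the class at work without a real database, here is a hedged sketch. FakeDb is a hypothetical stand-in that merely records the calls it receives, and the null row that makes insert() throw is made up for the demonstration. The Transaction class is repeated (calling beginTransaction(), as in the first example) so that the snippet runs on its own.

```php
<?php
// FakeDb is a hypothetical stand-in that only records the calls it receives.
class FakeDb {
    public $calls = [];

    function beginTransaction() { $this->calls[] = 'begin'; }
    function commit()           { $this->calls[] = 'commit'; }
    function rollback()         { $this->calls[] = 'rollback'; }
    function insert($row) {
        if ($row === null) {
            throw new InvalidArgumentException('cannot insert null');
        }
        $this->calls[] = "insert $row";
    }
}

class Transaction {
    private $db;
    private $success = false;

    function __construct($db) {
        $this->db = $db;
        $db->beginTransaction();
    }
    function commit() {
        $this->db->commit();
        $this->success = true;
    }
    function __destruct() {
        // Roll back unless commit() was called explicitly.
        if (!$this->success) {
            $this->db->rollback();
        }
    }
}

function foo($db, $data) {
    $transaction = new Transaction($db);
    foreach ($data as $row) {
        $db->insert($row);
    }
    $transaction->commit();
}

$ok = new FakeDb();
foo($ok, ['a', 'b']);
echo implode(', ', $ok->calls), "\n";        // begin, insert a, insert b, commit

$failing = new FakeDb();
try {
    foo($failing, ['a', null]);
} catch (InvalidArgumentException $e) {
    // The Transaction destructor has already rolled back at this point.
}
echo implode(', ', $failing->calls), "\n";   // begin, insert a, rollback
```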

Now I have to admit: this pattern is no invention of mine. It's a common pattern in C++, one of the very few other languages with deterministic destructors. C++'s father Bjarne Stroustrup introduced the name RAII for this: Resource Acquisition Is Initialization. So whenever one acquires a resource, in our example a database transaction, one also initialises an object whose lifetime controls the resource's lifetime. The critical part is not to pass this object around without thought. Using this pattern needs some training initially, but once you are used to it, it is a very good way to write exception safe code in a clean way.

Now, for fun, in my talk I showed another trick you can play with deterministic destructors: ensuring that a return value is actually being used. So let's assume you have a function which calculates a value at great expense, and you want to ensure that nobody refactors the calling code so that it no longer uses the return value. Thus

echo expensiveCalculation();

should work, while

$a = expensiveCalculation();
unset($a);

should be reported as an error. To achieve this our expensiveCalculation() function won't return the value directly but wrap it in an EnforceUsage object, which might be defined like this:

class EnforceUsage {
    private $value;
    private $used = false;

    function __construct($value) {
        $this->value = $value;
    }
    function __toString() {
        $this->used = true;
        return (string)$this->value;
    }
    function __destruct() {
        if (!$this->used) {
            Logger::notice("Return value not used");
        }
    }
}

I admit - unlike the RAII pattern from above - this is hardly useful in PHP, but it shows the power we have at our hands.
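For those who want to try it, here is a runnable sketch of the two call sites. The Logger class is a hypothetical stand-in that collects messages in memory, and expensiveCalculation() simply returns a placeholder value; neither is taken from a real library.

```php
<?php
// Hypothetical Logger stand-in: it only collects messages in memory.
class Logger {
    public static $messages = [];
    public static function notice($message) {
        self::$messages[] = $message;
    }
}

class EnforceUsage {
    private $value;
    private $used = false;

    function __construct($value) {
        $this->value = $value;
    }
    function __toString() {
        $this->used = true;
        return (string)$this->value;
    }
    function __destruct() {
        if (!$this->used) {
            Logger::notice("Return value not used");
        }
    }
}

// Placeholder for the expensive function; 42 is an arbitrary value.
function expensiveCalculation() {
    return new EnforceUsage(42);
}

echo expensiveCalculation(), "\n";   // __toString() marks the value as used
$afterEcho = count(Logger::$messages);

$a = expensiveCalculation();
unset($a);                           // destroyed without ever being read
$afterUnset = count(Logger::$messages);

print_r(Logger::$messages);          // contains one "Return value not used" entry
```

Note that this sketch only counts a conversion to string as "use"; a real variant would need an explicit accessor for non-string values.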

For completeness here are the slides of the talk I mentioned in the beginning: