Escaping from the statement mess

One of the issues Web Developers face is making their application robust to prevent SQL injection attacks. Different approaches exist which help. Sometimes people use large abstraction layers (which, sometimes, don't make anything safe ...) and sometimes people use prepared statements as a way to secure queries. Now prepared statements were a nice invention some 30 years ago abut they weren't meant for making things secure and so they do have some shortcomings: One issue is that preparing and executing a query adds a round-trip to the server where it then requires resources. In a classic application this is no issue. The users starts the application up early in the morning and processes data multiple times so the prepared statement handle is re-used quite some time. The system benefits from early optimisations. In a typical PHP Web application this isn't the case. A request and therefore a database connection with its associated statement handles lives way less than a second before being thrown away. The PDO MySQL driver, by default, tries to improve that by emulating the prepared statement on the client side. This emulation faces issues as it is lacking the knowledge of what's valid SQL which can lead to strange behaviour (The simple example is $pdo->prepare("SELECT * FROM t LIMIT ?")->execute(array($_GET['count'])); which will emit an SQL syntax error) and inherits limitations from prepared statements. A second issue with prepared statements is that queries are being built dynamically. A common case which is hard to do with prepared statements is the IN() clause with a dynamic amount of values. With prepared statements you first have to build the list of place holders (the exact amount of place holders (?) separated by a comma, without trailing comma) and then bind the values and mind the offsets when having other values - this typically becomes ugly code.

So why not take a step back. - Let's not try to emulate prepared statements but try to make it simpler to construct queries while escaping data?

An API for doing this might follow the sprintf() semantics and look like this;

$sql = mysqli_format_query($mysqli, "SELECT * FROM t WHERE f1 = %s AND f2 = %i", "foobar", 23);

which would return a string

SELECT * FROM t WHERE fi = 'foobar' AND f2 = 23

which can safely be send to the database. As said the IN clause should work. as we're in PHP we might simply extend it to do this:

$sql = mysqli_format_query($mysqli, "SELECT * FROM t WHERE f1 IN (%s)", array("foobar", 23));

SELECT * FROM t WHERE f1 IN ('foobar', '23')

Well doesn't look fancy? - But there's more: By not pretending to emulate prepared statements we can easily work with more dynamic queries. Something along the lines of

$sql = mysqli_format_query($mysqli, "SELECT * FROM t WHERE uid = %i", $_SESSION['uid']);
if (isset($option['option1']) {
    $sql .= mysqli_format_query($mysqli, "AND option1 = %s", $option['option1']);
}
if (isset($option['option2']) {
    $sql .= mysqli_format_query($mysqli, "OR option2 = %s", $option['option2']);
}

Doing such a thing using prepared statements or in some classic way becomes way harder to maintain. For playing with this approach I quickly cooked up a simple implementation of that logic which should work well with PHP 5.3 and mysqli:

function mysqli_format_query(mysqli $conn, $query /*, ... */) {
	$args = func_get_args();
	array_shift($args);
	array_shift($args);

	return mysqli_format_query_v($conn, $query, $args);
}

function mysqli_format_query_v(mysqli $conn, $query, $args) {
	$modify_funcs = array(
		's' => function($v) use ($conn) { return "'".$conn->real_escape_string($v)."'"; },
		'i' => function($v) { return (int)$v; },
		'f' => function($v) { return (float)$v; }
	);

	return preg_replace_callback(
		'/%(['.preg_quote(implode(array_keys($modify_funcs))).'%])/',
		function ($matches) use ($conn, &$args, $modify_funcs) {
			if ($matches[1] == '%') {
				return '%';
			}
			if (!count($args)) {
				throw new Exception("Missing values!");
			}
			$arg = array_shift($args);

			if ($arg instanceof Traversable) {
				$arg = iterator_to_array($arg);
				$arg = array_map($modify_funcs[$matches[1]], $arg);
				return implode(', ', $arg);
			} elseif (is_array($arg)) {
				$arg = array_map($modify_funcs[$matches[1]], $arg);
				return implode(', ', $arg);
			} else {
				$func = $modify_funcs[$matches[1]];
				return $func($arg);
			}


		},
		$query
	);
}

Which leads to questions like these:

Is this useful at all?
Is it useful to have in the driver? (aka. in C as part of the PHP distribution)
What are the exact requirements?

This might have to much magic in there (and older version required %as for binding and array of strings, now just %s and passing an array is enough)
Should it have special handling for NULL?

Opinions? :-)