This post is archived and probably outdated.

Analysing WHER-clauses in INFORMATION_SCHEMA table implemtations

2012-10-12 17:06:23

The MySQL Server has a quite simple interface for plugins to create tables inside INFORMATION_SCHEMA. A minimal plugin for creating a table with nothing but a counter might look like this:

static int counter_fill_table(THD *thd, TABLE_LIST *tables, Item *cond)
{
  ulonglong value= 0;
  
  while (1)
  {
    table->field[0]->store(value++, true);
  }
  
  return 0;
}

static ST_FIELD_INFO counter_table_fields[]=
{
  {"COUNT", 20, MYSQL_TYPE_LONGLONG, 0, MY_I_S_UNSIGNED, 0, 0},
  {0, 0, MYSQL_TYPE_NULL, 0, 0, 0, 0}
};

static int counter_table_init(void *ptr)
{
  ST_SCHEMA_TABLE *schema_table= (ST_SCHEMA_TABLE*)ptr;

  schema_table->fields_info= counter_table_fields;
  schema_table->fill_table= counter_fill_table;
  return 0;
}

static struct st_mysql_information_schema counter_table_info =
{ MYSQL_INFORMATION_SCHEMA_INTERFACE_VERSION };

mysql_declare_plugin(counter)
{
  MYSQL_INFORMATION_SCHEMA_PLUGIN,
  &counter_table_info,          /* type-specific descriptor */
  "COUNTER",                    /* plugin and table name */
  "My Name",                    /* author */
  "An I_S table with a counter",/* description */
  PLUGIN_LICENSE_GPL,           /* license type */
  counter_table_init,           /* init function */
  NULL,                         /* deinit function */
  0x10,                         /* version */
  NULL,                         /* no status variables */
  NULL,                         /* no system variables */
  NULL,                         /* no reserved information */
  0                             /* no flags */
}
mysql_declare_plugin_end;

This is quite straight forward and documented inside the MySQL documentation. It also has an obvious issue: It will run forever (at least if we assume we don't run in an out of memory situation). Luckily we might have a user who foresees this issue and added a WHERE clause like here:

SELECT COUNT FROM INFORMATION_SCHEMA.COUNTER WHERE COUNT < 10

Looking at the definition of our counter_fill_table function we see that MySQL apparently provides with a pointer to an Item instance, which we call cond, as in condition. When digging through the MySQL code we can also find out that indeed this is where our WHERE condition is hiding. Item is part of a whole class hierarchy for all the items which we might find in an SQL statement. digging a it deeper we also see that depending on the type of the specific item we might have a full tree of operations.

In the query above our WHERE condition will be a quite small tree: We have the lower than operator where the first operand is a field reference and the second a numeric constant. For checking those we declare a new function trying to identify the max value and adapt counter_fill_table accordingly

static ulonglong get_max_value(Item *cond);

static int counter_fill_table(THD *thd, TABLE_LIST *tables, Item *cond)
{
  ulonglong value= 0;
  ulonglong max_value= cond ? get_max_value(cond) : -1;
  
  for (value= 0; value < max_value; ++value)
  {
    table->field[0]->store(value, true);
  }
  
  return 0;
}

Mind: This changed the behavior a bit as it now will always terminate, even if no condition was used, but gives simpler code. So far so easy. Now, as said cond is a tree and our root is the lower than operation. So let's verify we have a lower than operation:

static ulonglong get_max_value(Item *cond)
{
    ulonglong retval= -1;

    if (item->type() == Item::FUNC_ITEM)
    {
        Item_func *item_func= dynamic_cast<Item_func*>(item);
        if (item_func->functype() == Item_func::LT_FUNC)
        {

        }
     }
     return retval;
}

Here we can see that the lower than operator is treated like a function, so after verification we can cast down to Item_func and figure out whether we have a lower than. (Note about the dynamic_cast usage: MySQL is typically compiled without RTTI and in most parts of the Server code uses C-style casts)

The next step then is to look at the operands and verify we're comparing a field to a numeric literal:

if (dynamic_cast<Item_func*>(item)->functype() == Item_func::LT_FUNC)
{
   if (item_func->arguments()[0]->type() == Item::FIELD_ITEM &&
       item_func->arguments()[1]->type() == Item::INT_ITEM)
{
   }
}

Now we're almost there. Only thing left is to check whether we're looking at the correct column and verify the value:

Item_field *item_field= dynamic_cast<Item_field*>(item_func->arguments()[0]);
if (!strcmp(item_field->field_name, "COUNT"))
{
    retval= item_func->arguments()[1]->val_uint();
}

Now if we put all these things together the query from above will run incredibly faster. But, once again, we have some issues. Some quite notable:

We are only checking the field name to be COUNT. We don't check whether that refers the correct table. For that one can look into item_field->table.

Also we are ignoring charsets. To handle those properly we proably might use system_charset_info->coll->strnncollsp instead of strcmp.

Then our accepted SQL is quite limited. We're supporting lower than only. Adding support for greater than or greater/lower or equal should be trivial, once you look into the declarations for Item_func.

Another limitation in our SQL support is that we check for something the SQL parser identified as integer. but there areother expressions which in the end can be treated like a constant value. So replace item_func->arguments()[1]->type() == Item::INT_ITEM with item_func->arguments()[1]->const_item() and it will work with anything that can be interpreted as a constant value easily. Now we're quite far.

But there's one extension I want to look into a bit deeper: How can we support a lower bound, too?

SELECT COUNT FROM INFORMATION_SCHEMA.COUNTER WHERE COUNT > 5 AND COUNT < 10

Supporting AND seems like a good extensions. Well, AND, is an COND_ITEM and can have multiple arguments so we have to iterate over them. And while being at it we can also do this recursively. An implementation then might look like this:

ulonglong get_max_value_with_and(Item* item)
{
  if (item->type() == Item::COND_ITEM)
  {
    ulonglong retval= -1;
    
    if (dynamic_cast<Item_cond*>(item)->functype() == Item_func::COND_AND_FUNC)
    {
      List_iterator<Item> li(*(dynamic_cast<Item_cond*>(item))->argument_list());
      Item *operand;
      while ((operand= li++))
      {
        ulonglong value;
        if (operand->type() == Item::FUNC_ITEM)
          value= get_max_value_with_and(operand);
        else
          value= get_max_value(operand);

        if (value > retval)
          retval= value;
      }
    }
    return retval;
  }
  else if (item->type() == Item::FUNC_ITEM)
    return get_max_value(item);
  
  return -1;
}

This implementation only returns the highest value but it should be a simple exercise for the reader to build on top of it. As are all examples shown here. The examples were developed on a recent 5.6 development version but while copying over, reformatting or simplifying I might have made mistakes. For further,and up to date, examples take a look at sql/sql_show.cc in the MySQL source tree.