This post is archived and probably outdated.

MySQL 8.0: JSON Aggregation functions

2016-09-15 16:54:02

In MySQL 5.7 the new JSON support had been added. The JSON support consists out of three features:

All three features combined allow building very powerful applications without committing to a fixed data structure on every part, but I was missing one thing a lot: Aggregation.

A topic I'm thinking a lot about is finding ways to select nested data. In MySQL we typically have two ways for this. Either one sends multiple queries to retrieve different nesting levels or one builds JOINs which tend to deliver repetitive responses. A tool we got for a while is GROUP_CONCAT. With GROUP_CONCAT we can get some of our nested information:

mysql> SELECT blog.*, 
              GROUP_CONCAT(blog_comment.id) AS comments
         FROM blog
         JOIN blog_comment ON blog.id = blog_comment.blog_id
     GROUP BY blog_id;
+----+-------------+----------------------+----------+
| id | title       | body                 | comments |
+----+-------------+----------------------+----------+
|  1 | Hello World | This is my new Blog! | 1,2      |
|  2 | spam trap   | spam goes here       | 3        |
+----+-------------+----------------------+----------+
2 rows in set (0.07 sec)

This gives us the ids for all comments to a specific blog post. But it's not really nice to process. To improve this we can try to abuse GROUP_CONCAT like this:

mysql> SELECT blog.*, 
              CONCAT('[', GROUP_CONCAT( JSON_OBJECT("author", blog_comment.author, "body", blog_comment.body)), ']') AS comments
         FROM blog
         JOIN blog_comment ON blog.id = blog_comment.blog_id
     GROUP BY blog_id;
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
| id | title       | body                 | comments                                                                                               |
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
|  1 | Hello World | This is my new Blog! | [{"body": "Great blog!", "author": "Your Friend"},{"body": "Great new blog", "author": "Best Friend"}] |
|  2 | spam trap   | spam goes here       | [{"body": "Buy these pills to enlarge your blog", "author": "Visitor"}]                                |
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)


But this has multiple issues, not only is it kind of ugly and error-prone to build our JSON string manually using CONCAT() like this, but also GROUP_CONCAT uses an internal buffer with a default size of only 1024, which, while configurable, can easily be too little for our use case.

Now in the new development milestone release for MySQL 8.0 we can see two new aggregation functions: JSON_ARRAYAGG() and JSON_OBJECTAGG() which build proper JSON arrays or objects without any need for a hack:

mysql> SELECT blog.*,
              JSON_ARRAYAGG( JSON_OBJECT("author", blog_comment.author, "body", blog_comment.body) ) AS comments
         FROM blog
         JOIN blog_comment ON blog.id = blog_comment.blog_id
     GROUP BY blog_id;           
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
| id | title       | body                 | comments                                                                                               |
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
|  1 | Hello World | This is my new Blog! | [{"body": "Great blog!", "author": "Your Friend"},{"body": "Great new blog", "author": "Best Friend"}] |
|  2 | spam trap   | spam goes here       | [{"body": "Buy these pills to enlarge your blog", "author": "Visitor"}]                                |
+----+-------------+----------------------+--------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)


This is now quite nice and by using the proper aggregation function we can be sure that our result will always be valid JSON, which allows further easy processing.