Creating a JSON column type for SQLAlchemy

This is part one of a two-part post on storage of JSON using SQLAlchemy. This post will touch on the basics of creating a JSON column type and tracking mutability, and is mostly a rehash of the SQLAlchemy documentation. The second post will cover the tracking of mutability in arbitrarily nested JSON structures, and goes beyond what is covered in the documentation.

The past weeks have been pretty busy. We moved from Leeuwarden to Hoofddorp, which in and of itself went pretty smooth, but the new apartment is still very much unfinished. In between work and getting the apartment in order, there hasn’t been a lot of time to spend on side projects, but this seemed interesting enough.

A while ago I needed to store a relatively small amount of variable-format data. The requirements are only to store the data, not search in parts of it or anything else. Additionally, it’s of such a small volume that it doesn’t warrant setting up a document database like MongoDB next to the existing SQLA+Postgres setup. The data itself consisted of a flat mapping of strings to basic scalar types. Something roughly like this:

{
    'title': 'Example data object',
    'description': 'Information about this example object',
    'storage_location': 'Top shelf in the back',
}

Serializing the data

There are a number of ways to serialize the above, but for sheer ease of use and simplicity, there’s nothing that really beats JSON. If the data structures had contained custom types, pickling would have been the likely choice. JSON has the additional benefit that because of its simplicity, every programming language in use has a (de)serialization function for it. This makes it extremely portable and allows other tools to work with this data as well.

Creating a type that serializes and deserializes JSON automatically is pretty straight forward and common enough to warrant a section in the custom types documentation.

For completeness, I’ve included the type definition. The highlighted line associates the type with the class MutableDict which keeps track of changes to the dictionary. This is covered in the mutable extension documentation.

import json
import sqlalchemy as sqla
from sqlalchemy.ext import mutable

class JsonEncodedDict(sqla.TypeDecorator):
  """Enables JSON storage by encoding and decoding on the fly."""
  impl = sqla.String

  def process_bind_param(self, value, dialect):
    return simplejson.dumps(value)

  def process_result_value(self, value, dialect):
    return simplejson.loads(value)

mutable.MutableDict.associate_with(JsonEncodedDict)

Using the JsonEncodedDict type

To experiment with the newly defined JSON type, we’ll have to set up an initial model and database. For this we’ll use an in-memory SQLite database. This is ideal for quick testing because it requires very little effort to set up, and even less to clean up after use.

The following provides just that, or a starting point for something more involved:

import sqlalchemy as sqla
from sqlalchemy import orm
from sqlalchemy.ext import declarative

Base = declarative.declarative_base()

class FlexibleStorage(Base):
  __tablename__ = 'flexible_storage'
  id = sqla.Column(sqla.Integer, primary_key=True)
  data = sqla.Column(JsonEncodedDict)

  def __init__(self, content):
    self.content = content

# We set query echoing to True for demonstration purposes
engine = sqla.create_engine('sqlite://', echo=True)
Base.metadata.bind = engine
Base.metadata.create_all()
session = orm.sessionmaker(bind=engine)()

With the flexible_storage table defined and a session created, we can start exploring the possibilities of the JSON type and the included mutation tracking.

Adding and editing records

We create an initial record and provide it with a name.

bob = FlexibleStorage({'name': 'Bobby'})
session.add(bob)
session.commit()

Given we’re defining a person, it seems only fair to provide a name and age:

bob.data['surname'] = 'Selbat'
bob.data['age'] = 5
session.commit()

After committing, accessing the record again will trigger a refresh from the database (due to the expire_on_commit setting, which defaults to True). We’ll see that the two fields we added to the record have been persisted to the database (query echoing shows this as well).

This is because the class MutableDict we have associated with class JsonEncodedDict marked the field as modified as soon as we changed the dictionary. This marking as changed will happen this for __setitem__ and __delitem__ methods only. Other methods that change the dictionary in place (like clear, pop, or update) will not mark the dictionary as having changed.

Changing the age or removing a surname will both trigger updates of the record (you can see them happen if you set engine.echo = True.

del bob.data['surname']
session.flush()
bob.data['age'] = 19
session.commit()

Nested mutable structures

The structure we have now is fairly flexible, but also fairly basic. It allows us to store all sorts of information, but what if you have multiple of the same. Storing them as incrementally numbered fields is hardly elegant; we should store them as a list of values. Let’s start with that right away:

bob.data['interests'] = ['computers']
session.commit()

So far so good, the list was persisted to the database, much as expected. Let’s add a second interest to the list and store that:

bob.data['interests'].append('databases')
session.commit()
print bob.data['interests'] # will show only ['computers']

This, unfortunately, is because the change tracking of class MutableDict only goes so far. When we alter the interests lists in place, nothing changes on the dictionary. It still contains the same reference to the same list. The latter has just grown a bit. In the next post, we’ll have a look at how to track changes throughout arbitrarily nested structures.

Serializing the data

Using the JsonEncodedDict type

Adding and editing records

Nested mutable structures

Related Posts:

Comments