Aggregating relationships into JSON objects

In day to day use and operation, we strive for a certain degree of normalization of the data in our database. In reporting though, this normalization causes some friction: we often want the output to contain these duplicates of the data, for sake of row to row completeness. If we’re reporting on top-grossing films in the last decade, it’s easy enough to join the director table and list their name and year of birth. Collecting a set of properties for which we don’t know the names and quantity ahead of time is a little more challenging and interesting.

In our example, we’ll work with a schema that’s a little bit like an Entity–attribute–value model, but with strings for both the attribute and the value to keep things a little simpler. Of course, we’ll pick everyone’s favourite tool for bundling arbitrary data: JSON, and collect these properties into a single object.

Before we get into the actual SQL syntax, let’s describe the schema we’ll be working with:

CREATE TABLE item (
    id SERIAL NOT NULL,
    description TEXT,
    PRIMARY KEY (id))

CREATE TABLE item_property (
    id SERIAL NOT NULL,
    item_id INTEGER,
    label TEXT,
    value TEXT,
    PRIMARY KEY (id),
    CONSTRAINT uq_property_label UNIQUE (item_id, label),
    FOREIGN KEY(item_id) REFERENCES item (id))
more ...

Subset checking in PostgreSQL with SQLAlchemy

Let’s assume we have a system where events are stored for multiple services and tenants. Let’s also assume that our fictional system has a means of updating many of these events at a time, for instance to mark them as unimportant. And for the sake of keeping things relevant, let’s assume that this service is available via some authenticated public API.

Given all of the above, and the knowledge that we can’t just trust anyone to limit themselves to events that are theirs to edit, we’ll have to verify that all of the events selected for editing are within the scope of editing for the user.

The simplest way to do this would be to load every item from the database and check whether it’s eligible for modification. However, this is something that scales terribly past a few dozen records, so let’s not even consider that.

Set theory to the rescue

If the phrase “verify a set of given IDs are all part of a set of valid IDs” makes you think of sets and subset checking, you already know what this is about. Python has a set type that provides a bunch of useful operations that allow us to check whether a given set A ({1, 4}) has all of its values present in set B ({2, 4, 6}; it does not). We can use this to solve our problem:

user_selected_ids = {1, 4}
permissible_ids_q = session.query(Event.id).filter_by(
    service_id=relevant_service,
    tenant_id=current_tenant)
permissible_event_ids = {row.id for row in permissible_ids_q}
assert user_selected_ids <= permissible_event_ids
more ...