Storing and verifying passwords with SQLAlchemy

I really enjoy abstractions.

Abstractions are the lifeblood of programming. They take complex operations and make them easy to work with through accessible interfaces. This article will be about doing that with the way we store and verify passwords (or rather their cryptographic hashes) in (web-)applications based on SQLAlchemy. And in such a way that we can upgrade the security of passwords as old encryption schemes are broken or proven insufficient and new ones get introduced.

The ways of verifying passwords

There are many different ways to deal with password verification, but they can be classified in three rough categories:

Ad-hoc verification

This approach is often used for one-off scripts or trivial applications that have a single verification and have very little to gain from reusable components. The following shows a very basic User model and password verification code:

import bcrypt
from sqlalchemy import Column, Integer, Text

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(Text)
    password = Column(Text)

pwhash = bcrypt.hashpw(login_data['password'], user.password)
if user.password == pwhash:
    print 'Access granted'

The snippet above uses the Python bcrypt package for key derivation, but you may well use another key derivation method, like PBKDF2 or scrypt. To be sure: It should not have a static salt, and it should definitely not be a single round of optimized-for-speed SHA1 or SHA2 functions. [1] In short, use proven and existing methods.

Assuming that the hashing algorithm is secure, this code is perfectly fine for the stated goal. However, most applications are not trivial and most one-off scripts tend to find their way into repeated use. And over time you find yourself confronted with more and more repeats of the same code to verify a user’s password. Something needs to be done, it’s time to ...

more ...

Mordac the (query) preventer

A small update, a much delayed update, but still an update.

A few days ago, cwillu asked the following question in #sqlalchemy [1]:

<cwillu_> is there a way to entirely disable autocommit?
<cwillu_> i.e., any insert outside of an explicit transaction
          will either error out, or just immediately rollback?

What eventually worked for this case was simply disabling autocommit on the connection level. Explicit rollbacks were issued at the end of each test and all worked fine. But given that SQLAlchemy features a pretty nifty event system, I was pretty sure there was a better solution available. Also, I was waiting for an excuse to experiment with that part of SQLAlchemy, and this offered exactly that.

As it turns out, preventing queries from happening is about as straightforward as it gets. A quick look at the list of SQLAlchemy Core events identifies two likely candidates: 'before_cursor_execute' and 'before_execute'. Both events are triggered when queries are executed, but the latter is documented to work on a higher level: it’s signaled before the query is compiled for the appropriate dialect. Given that we want to entirely prevent queries outside of transactions, stopping them before doing any unnecessary work seems best, so we’ll use that one.

When the event triggers, our function is called and the caller provides us with the connection and a few other arguments, which we’ll collect in *args. The connection has a method in_transaction(), which indicates whether a transaction has explicitly been started for it. That is, even with autocommit turned off, it will return False after the first execute() call. This is exactly what we need to know and the last thing to do is raise an appropriate error. And so Mordac the Preventer [2] is born:

more ...

Mutation tracking in nested JSON structures using SQLAlchemy

This is part two of a two-part post on storage of JSON using SQLAlchemy. The first post covered the basics of creating a JSON column type and tracking mutations. In this post, we will continue from there to cover mutation tracking in arbitrarily nested structures.

In the previous post we ended with an example of appending to an existing list. Upon committing the changes in the session and reloading the object, it was shown the appended string had not been stored. This happened because changing the list in-place did not trigger the changed() method of the class MutableDict. Only setting or deleting a key from the dictionary marks it as changed, and marking it as changed upon access (which is all we did on the dictionary itself) would cause far too many updates of the database.

What we wanted (and perhaps expected) is behavior where changing the list marks the dictionary it’s part of as changed. And for completeness, if the dictionary contained a number of nested dictionaries, changing any of them at any level should mark the class MutableDict as changed. To achieve this, we need a solution that consists of the following parts:

  1. Replacement types for list and dict where all methods that change the object in-place flag it as having changed.
  2. A means to propagate the notification of change up to the top so that it reaches the class MutableDict.
  3. Conversion of all mutable types to the defined replacement types. Both when they are added to the existing structure, as well as on load from the database.
more ...

Creating a JSON column type for SQLAlchemy

This is part one of a two-part post on storage of JSON using SQLAlchemy. This post will touch on the basics of creating a JSON column type and tracking mutability, and is mostly a rehash of the SQLAlchemy documentation. The second post will cover the tracking of mutability in arbitrarily nested JSON structures, and goes beyond what is covered in the documentation.

The past weeks have been pretty busy. We moved from Leeuwarden to Hoofddorp, which in and of itself went pretty smooth, but the new apartment is still very much unfinished. In between work and getting the apartment in order, there hasn’t been a lot of time to spend on side projects, but this seemed interesting enough.

A while ago I needed to store a relatively small amount of variable-format data. The requirements are only to store the data, not search in parts of it or anything else. Additionally, it’s of such a small volume that it doesn’t warrant setting up a document database like MongoDB next to the existing SQLA+Postgres setup. The data itself consisted of a flat mapping of strings to basic scalar types. Something roughly like this:

    'title': 'Example data object',
    'description': 'Information about this example object',
    'storage_location': 'Top shelf in the back',
more ...