
Bookie weekly status report: Sept 18 2011

Updates this week

Slow week. The changes to the fulltext search introduced a small bug in updating the tag_str property of the bookmark. I got this fixed and added a migration to rebuild the tag_str for all bookmarks in the system. The big sign of the bug was that the edit ui wasn't showing the tags on the bookmark for editing.

Part of fixing that was a full update to SqlAlchemy 0.7 and moving all post/pre hooks into the new events system SqlAlchemy provides.
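For reference, a minimal sketch of what the move looks like under SqlAlchemy 0.7's events API (the Bookmark model and the handler body here are illustrative, not Bookie's actual code):

[sourcecode lang="python"]
from sqlalchemy import event

def update_tag_str(mapper, connection, target):
    # rebuild the denormalized tag string before the row is written
    # (illustrative handler; the real hook does more than this)
    target.tag_str = ' '.join(tag.name for tag in target.tags)

# replaces the old pre/post hook style (e.g. MapperExtension methods)
event.listen(Bookmark, 'before_insert', update_tag_str)
event.listen(Bookmark, 'before_update', update_tag_str)
[/sourcecode]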

Along with that I also updated things to use the new release of Pyramid 1.2. The big thing with this is that the new debug toolbar is available for development. It's definitely pretty cool and provides some timing/insight when testing requests.
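If you want to try the toolbar in your own app, enabling it is roughly one include call once pyramid_debugtoolbar is installed (a sketch, assuming the standard app factory setup):

[sourcecode lang="python"]
from pyramid.config import Configurator

def main(global_config, **settings):
    config = Configurator(settings=settings)
    # pull in the debug toolbar for development only
    config.include('pyramid_debugtoolbar')
    return config.make_wsgi_app()
[/sourcecode]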

What's next?

There's a ton to work on: the celery task runner, the backbone.js in-place edit ui, and investigating using tagger to provide tag suggestions based on the content.

Alpha Testing

We have a signup form if you're interested in alpha testing the hosted install at https://bmark.us. If you'd like to try it out, fill out the signup form here.

Taking Part

If you care to help or follow along make sure to follow the project on github: http://github.com/mitechie/Bookie

  • Current Chrome Extension: 0.3.16
  • Most recent code updates: develop branch
  • Current working branch: develop

In search of better enums

I've been fighting to deal with a pretty common use case in my Pylons applications. I often have a db model (with the marvelous SqlAlchemy) where the object in question has some limited set of options for a field. It's basically an enum field, but I don't want to deal with an enum on the DB end; it makes it hard to support across different databases and such.

One solution is to actually create a table for the options. For instance you might have a state table that is joined to your object. It has all the US states in there and whenever you query you have to do a join on the state table to get the pretty name. The use cases I'm talking about are sets of 2-5 options, and the models might have 5 or more of these fields each. So I don't want to deal with creating tables, migrations, and joins for them all.

I had a few failed attempts, but I ended up learning some great new things about SqlAlchemy and have something working. First up I need to create an object that contains the options. Forgive the cheesy names, but let's say we have a table with a Severity field in it.

[sourcecode lang="python"] # first we want to inherit from a common base for all of these enums

class MyEnum(object): """Enum-like object type

Each implementation need only extend MyEnum and define the dict _enum

""" def __getattribute__(self, name): if name in object.__getattribute__(self, '_enum'): return object.__getattribute__(self, '_enum')[name] else: # Default behaviour return object.__getattribute__(self, name)

def to_list(self): """Generate a list of tuples we use to quickly create <select> elements""" return [(val, string) for val, string in self._enum.iteritems()]

class Severity(MyEnum):

_enum = { 'low': 'Low', 'med': 'Medium', 'high': 'High', } [/sourcecode]

So the base MyEnum object gives us a method of accessing the values via an object-like interface.

[sourcecode lang="python"] my_instance = SomeModel() my_instance.severity = Severity().low [/sourcecode]

Now this is cool because it gives us nice refactorable code to reference these strings we're storing into the database for the column. Yep, I know that storing lots of strings is more resource intensive on the database. In most of these low scale tables, though, I'd rather have the table easy to read. Not only that, when I get the values out they're actually the prettified version of the string, so output is much easier.

Also just to note, in order for the MyEnum.__getattribute__ stuff to work you have to have an instance of the object. That's why you have to write Severity().high; Severity.high won't hit __getattribute__ and you'll get an error that there's no attribute by the name 'high'.

I also added a nice to_list() method because we use the webhelpers package to generate html elements and it'll accept a list of value tuples. I can just hand Severity().to_list() to the webhelper for generation now.
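A rough sketch of that in practice (assuming webhelpers' select helper and a hypothetical some_ticket instance):

[sourcecode lang="python"]
from webhelpers.html.tags import select

# build the <select> for the form, pre-selecting the ticket's current value
severity_dropdown = select("severity", some_ticket.severity, Severity().to_list())
[/sourcecode]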

This was a good start and works well, but the one other thing I get nervous about is making sure the database doesn't take in garbage; it should be able to verify that the option I'm setting is in fact in the Severity list of options. In order to do this I added a custom SqlAlchemy column type using the TypeDecorator. In doing this I basically have access to when the value is pulled from the database and placed upon the instance of the model, as well as when the value gets put onto the model from the code side. A database getter/setter in practice.

[sourcecode lang="python"] class MyEnumType(types.TypeDecorator): """SqlAlchemy Custom Column Type MyEnumType

This works in partnership with a MyEnum object

"""

# so this is basically usable anywhere I have a column type of Unicode impl = types.Unicode

def __init__(self, enum_obj, *args, **kwargs): """Init method

:param enum_obj: The enum class that this column should be limited to

e.g. severity = Column(MyEnumType(Severity, 255)) for a unicode column that has a length allowance of 255 characters

""" types.TypeDecorator.__init__(self, *args, **kwargs) self.enum_obj = enum_obj

def process_bind_param(self, value, dialect): """When setting the value make sure it's valid for our MyEnum

# allow setting to None for empty value if value is None or value in self.enum_obj._enum.itervalues(): return value else: assert False, "%s -- %s" % (value, self._enum)

def process_result_value(self, value, dialect): """Going out to the database just return our clean value""" return value

# brief usage example class SupportTicket(meta.Base): __tablename__ == 'tickets'

severity = Column(MyEnumType(Severity, 255)) [/sourcecode]

So now I have things tied into my database model as well. Hopefully this holds up as a valid method for doing this. It seems like it'd be a common use case. What I love is that my code never has to get down to specifying strings. If I want to limit a column or check a value I can use somewhat cleaner (imho, that is) comparisons:

[sourcecode lang="python"] if some_ticket.severity == Severity().low: ... do something

# or in a query results = SupportTicket.query.\ filter(SupportTicket.severity == Severity().high).\ all()

[/sourcecode]

Anyway, food for thought. Let me know if you've got an alternative I'm missing that I should be using.

SqlAlchemy Migrations for mysql and sqlite and testing

I really want to be a unit tester, I really do. Unfortunately it's one of those things I can't seem to get going. I start and end up falling short before I get over the initial setup hurdle. Or I get a couple of tests working, but then I have a hard time trying to test parts of things and I fade. So here goes my latest attempt. It's for a web app I'm working on at work and I REALLY want to have this under at least basic unit tests. Since it's a database-talking web application, my first step is to get a test db up and running to run my tests against. With that in place I can start some actual web tests that add and alter objects via some ajax api calls.

In order to get a test db I first had to figure out how to setup a database for the tests. For speed and ease purposes I'd rather be able to use sqlite. This way I don't need to setup/reset a mysql db on each host I end up trying to run tests on.

Of course this is complicated because I'm using sqlalchemy-migrate for my application. This means part of the testing setup should be to init a new sqlite db and then bring it up to the latest version. In order to do this I had to convert my existing migrations to work in both MySQL and Sqlite.

Step 1: I need a way to tell the migrations code to use the sqlite db vs the live mysql db. I've setup a manage.py script in my project root so I hacked it up to check for a --sqlite flag. Not that great, but it works.

[sourcecode lang="python"]
"""
In order to support unit testing with sqlite we need to add a flag for
specifying that db:

    python manage.py version_control --sqlite
    python manage.py upgrade --sqlite

Otherwise it will default to using the mysql connection string.

"""
import sys
from migrate.versioning.shell import main

if '--sqlite' in sys.argv:
    main(url='sqlite:///apptesting.db', repository='app/migrations')
else:
    main(url='mysql://connection_string', repository='app/migrations')
[/sourcecode]

Step 2: Not all of my existing migrations were sqlite friendly. I had cheated and added some columns with straight sql like:

[sourcecode lang="python"] from sqlalchemy import * from migrate import *

def upgrade(): sql = "ALTER TABLE jobs ADD COLUMN created TIMESTAMP DEFAULT CURRENT_TIMESTAMP;" migrate_engine.execute(sql);

def downgrade(): sql = = "ALTER TABLE jobs DROP created;" migrate_engine.execute(sql); [/sourcecode]

This worked great with MySQL, but sqlite didn't like it. In order to get things to work both ways I moved to using the changeset tools to make these more SA happy.

[sourcecode lang="python"] from sqlalchemy import * from migrate import * from migrate.changeset import * from datetime import datetime

meta = MetaData(migrate_engine) jobs_table = Table('jobs', meta)

def upgrade(): col = Column('created', DateTime, default=datetime.now) col.create(jobs_table)

def downgrade(): sql = "ALTER TABLE jobs DROP created;" migrate_engine.execute(sql); [/sourcecode]

A couple of notes: this abstracted the column creation so that both sqlite and mysql would take it. Notice I did NOT update the drop command. Sqlite won't drop columns, and I honestly didn't care, because the goal is for my unit tests to be able to bring up a database instance for testing; I'm not going to run through the downgrade commands on the sqlite database.

Step 3: So with all of the migrations moved to do everything via SA abstracted code vs SQL strings, I was in business. My final problem was one migration in particular. I had changed a field from a varchar to an int field. Sqlite won't let you do a simple 'ALTER TABLE...' for that, and even when I had the command turned into SA based changeset code my db upgrade failed due to sqlite tossing an exception.

What did I do? I cheated. First, I updated the migration with the original column definition to an Integer field. I mean, any new installs could walk that migration up just fine. I happen to know that the two deployments right now have already made the change from varchar to int. So for them, the change won't break anything for upgrade/downgrade.

I then kept the migration with the change, but tossed it in a try/except block so that I could trap it nicely and just output a message: "If this fails, it's probably sqlite choking." It's hackish, but it works for all the use cases I need.
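The hack looks roughly like this (a sketch only; the 'attempts' column and the alter call are illustrative, not the exact migration):

[sourcecode lang="python"]
from sqlalchemy import *
from migrate import *
from migrate.changeset import *

meta = MetaData(migrate_engine)
jobs_table = Table('jobs', meta, autoload=True)

def upgrade():
    try:
        # fine on mysql; sqlite throws when asked to change the column type
        jobs_table.c.attempts.alter(type=Integer)
    except Exception:
        print "If this fails, it's probably sqlite choking."

def downgrade():
    pass
[/sourcecode]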

Now I can create a new test database with the commands:

[code]
python manage.py version_control --sqlite
python manage.py upgrade --sqlite
[/code]

Now I can start building my test suite to use this as the bootstrap to create a test db. I'll have to then remove the file on teardown so that I don't get any errors, but that'll be part of the testing setup. Not in memory, but oh well...it works.
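A rough sketch of what that bootstrap and teardown might look like (the manage.py calls are the ones above; the test case itself is illustrative):

[sourcecode lang="python"]
import os
import subprocess
import unittest

TEST_DB = 'apptesting.db'

class DBTestCase(unittest.TestCase):

    def setUp(self):
        # build a fresh sqlite db from the migrations
        subprocess.call(['python', 'manage.py', 'version_control', '--sqlite'])
        subprocess.call(['python', 'manage.py', 'upgrade', '--sqlite'])

    def tearDown(self):
        # remove the file so the next test run starts clean
        if os.path.exists(TEST_DB):
            os.remove(TEST_DB)
[/sourcecode]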

A follow up, more dict to SqlAlchemy fun

Just a quick follow up to my last post on adding some tools to help serialize SqlAlchemy instances. I needed to do the reverse: I want to take the POST'd values from a form submission and tack them onto one of my models. So I now also add a fromdict() method onto Base that looks like:

[sourcecode lang="python"]
def fromdict(self, values):
    """Merge in items in the values dict into our object if it's one of our columns

    """
    for c in self.__table__.columns:
        if c.name in values:
            setattr(self, c.name, values[c.name])

Base.fromdict = fromdict
[/sourcecode]

So in my controllers I can start doing:

[sourcecode lang="python"]
def controller(self):
    obj = SomeObj()
    obj.fromdict(request.POST)
    Session.add(obj)
[/sourcecode]

Hacking the SqlAlchemy Base class

I'm not even a month into my new job. I've started working for a market research company here locally. It's definitely new territory, since I don't think I've ever found myself really reading or talking about 'market research' before. In a way it's a lot like my last job in advertising: you get in and there's a whole world of new terms, processes, etc. you need to get your head around. The great thing about my new position is that it's a Python web development gig. I'm finally able to concentrate on learning the ins and outs of the things I've been trying to learn and gain experience with in my spare time.

So hopefully as I figure things out I'll be posting updates to put it down to 'paper' so I can think it through one last time.

So I started with some more SqlAlchemy hacking. At my new place we use Pylons, SqlAlchemy (SA), and Mako as the standard web stack. I've started on my own first 'ground up' project and I've been trying to bend SqlAlchemy into the way I like doing things.

So first up, I like the instances of my models to be serializable. I like to have json output available for most of my controllers. We all want pretty ajax functionality, right? But the json library can't serialize a SqlAlchemy model by default. And if you just try to iterate over sa_obj.__dict__ it won't work, since you've got all kinds of SA specific fields and related objects in there.

So what's a guy to do? Run to the mapper. I've not spent much time poring over the details of SA's internals, and the mapper is something I need to read up on more.

Side Notes: all these examples are from code using declarative base style SA definitions.

The mapper does this great magic of tying a Python object to a SA table definition. So in the end you get a nice combo you do all your work with. In the declarative syntax case you normally have all your models extend the declarative base. So the trick is to add a method of serializing SA objects to the declarative base and boom, magic happens.

The model has a __table__ instance in it that contains the list of columns. Those are the hard columns of data in my table. These are the things I want to pull out into a serialized version of my object.

My first try at this looked something like

[sourcecode lang="python"] def todict(self): d = {} for c in self.__table__.columns: value = getattr(self, c.name) d[c.name] = value

return d [/sourcecode]

This is great and all, but I ran into a problem. The first object I tried had a DateTime column in it that choked, since the json library was trying to serialize a DateTime instance. A quick hack to check if the column was DateTime, and if so convert it to a string, got me up and running again.

[sourcecode lang="python"] if isinstance(c.type, sqlalchemy.DateTime): value = getattr(self, c.name).strftime("%Y-%m-%d %H:%M:%S") [/sourcecode]

This was great and all. I attached this to the SA Base class and I was in business. Any model now had a todict() function I could call.

[sourcecode lang="python"] Base = declarative_base(bind=engine) metadata = Base.metadata Base.todict = todict [/sourcecode]

This is great for my needs, but it does miss a few things. This just skips over any relations that are tied to this instance. It's pretty basic. I'll also run into more fields that need to be converted. I figure that whole part will need a refactor in the future.

Finally I got thinking, "You know, I can often do a log.debug(dict(some_obj)) and get a nice output of that object and its properties." I wanted that as well. It seems more pythonic to do

[sourcecode lang="python"] dict(sa_instance_obj) # vs sa_instance_obj.todict() [/sourcecode]

After hitting up my Python reference book I found that the key to being able to cast something to a dict is to have it implement the iterable protocol. To do this you need to implement a __iter__ method that returns something that implements a next() method.

What does this mean? It means my todict() method needs to return something I can iterate over. Then I can just return it from my object. So I turned todict into a generator that yields the column name/value pairs needed to iterate through.

[sourcecode lang="python"] def todict(self): def convert_datetime(value): return value.strftime("%Y-%m-%d %H:%M:%S")

d = {} for c in self.__table__.columns: if isinstance(c.type, sa.DateTime): value = convert_datetime(getattr(self, c.name)) else: value = getattr(self, c.name)

yield(c.name, value)

def iterfunc(self): """Returns an iterable that supports .next() so we can do dict(sa_instance)

""" return self.todict()

Base = declarative_base(bind=engine) metadata = Base.metadata Base.todict = todict Base.__iter__ = iterfunc [/sourcecode]

Now in my controllers I can do cool stuff like

[sourcecode lang="python"] @jsonify def controller(self, id): obj = Session.query(something).first()

return dict(obj) [/sourcecode]

Auto Logging to SqlAlchemy and Turbogears 2

I've been playing with TurboGears 2 (TG2) for some personal tools that help me get work done. One of the things I've run into is an important feature my work code has that's missing from my TG2 application. In my PHP5 app at work, I use the Doctrine ORM and I have post insert, update, and delete hooks that go in and log changes to the system. It works great and I can build up a history of an object over time to see who changed which fields and such.

With my TG2 app doing inserts and updates, I initially just faked the log events by manually saving Log() objects from within my controllers as I did the work that needed to be done.

This sucks though since the point is that I don't have to think about things. Anytime code changes something it's logged. So I had to start searching the SqlAlchemy (SA) docs to figure out how to duplicate this in TG2. I wanted something that's pretty invisible. In my PHP5 code I have a custom method I can put onto my Models in case I want to override the default logged messages and such.

I found part of what I'm looking for in the SA MapperExtension. This blog post got me looking in the right direction. The MapperExtension provides a set of methods to hook a function into. The hooks I'm interested in are the 'after_delete', 'after_insert', and 'after_update' methods. These are passed the instance of the object and a connection object, so I can generate a SQL query to manually save the log entry for the object.

So I have something that looks a little bit like this:

[sourcecode lang="python"] from sqlalchemy.orm.interfaces import MapperExtension

class LogChanges(MapperExtension):

def after_insert(self, mapper, connection, instance): query = "INSERT INTO log ( username, \ type, \ client_ip, \ application ) VALUES( '%s', %d, '%s', '%s')" % ( u"rick", 4, u'127.0.0.1', u'my_app')

connection.execute(query) [/sourcecode]

Then I pass that into my declarative model as:

[sourcecode lang="python"] __mapper_args__ = {'extension': LogChanges()} [/sourcecode]

This is very cool and all, but it's not all the way where I want to head. First, the manual SQL logging query kind of sucks. I have an AppLog() model that I just want to pass some values to in order to create a log entry. I'm thinking what I really should do is find a different way to do the logging itself. I'm debating actually doing a separate logging application that I would call with the object's details.
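As an aside on the raw SQL point, here's a rough sketch of one middle ground: build the insert from the AppLog model's table (assuming AppLog is the declarative model for the log table) and still hand it to the connection the hook gives you:

[sourcecode lang="python"]
def after_insert(self, mapper, connection, instance):
    # same log row as before, but via an insert() construct instead of a
    # hand-built SQL string (assumes AppLog maps to the 'log' table)
    connection.execute(
        AppLog.__table__.insert().values(
            username=u"rick",
            type=4,
            client_ip=u'127.0.0.1',
            application=u'my_app',
        )
    )
[/sourcecode]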

The problem with the separate logging application idea is that one of the things I do in my current app is store the old values of the object. This way I can loop through them and see which values actually changed and generate that in the log message. This is pretty darn useful.

The other downside is that I don't have a good way to have a custom logging message generator if I just call a Logging app API.

So I think I might try out the double db connection methods that SA and TG2 support. This way I could actually use a second db instance with a Logging() object to write out the changes without messing up the current session/unit of work.
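A minimal sketch of what that second connection might look like (the url and session names here are placeholders, not the real config):

[sourcecode lang="python"]
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# a completely separate engine/session so log writes don't get tangled up
# in the main session's unit of work
log_engine = create_engine('sqlite:///applog.db')
LogSession = sessionmaker(bind=log_engine)

def write_log(**values):
    session = LogSession()
    session.add(AppLog(**values))
    session.commit()
[/sourcecode]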

The missing part here is that I'm still not really sure how to get the 'old' object values in order to generate a list of fields that have been changed. Guess I have some more hacking to do.
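One thing that might be worth exploring there (untested on my end) is SqlAlchemy's attribute history, which keeps the old value around until the flush:

[sourcecode lang="python"]
from sqlalchemy.orm.attributes import get_history

# inside a before_update/after_update hook; 'status' is an illustrative field
hist = get_history(instance, 'status')
if hist.deleted:
    old_value = hist.deleted[0]   # what the field was before the change
    new_value = hist.added[0]     # what it's being changed to
[/sourcecode]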