Grok needs a great relational database integration story. Grok already has a great database story: by default, we use an object database: the ZODB. The ZODB is great as you can just store normal Python objects. You can persist complicated nested structures easily. But this entry isn't about the ZODB. Relational databases are also great. You can query tables every which way very easily. You can manage your data in a RDMS, a familiar system for many, and integrate with existing RDMS that already exist in many places. Thanks to object relational mappers (ORMs), working with relational databases has become pretty easy as well. Python has a number of object relational wrappers available for it. Recently I started a project that uses Grok with a relational database. It was a good opportunity to investigate this topic more deeply, keeping in mind what might be useful for Grok. I evaluated two object relational mappers: Storm and SQLAlchemy. I don't claim to have done anything like an in-depth technical evaluation. I didn't. I don't believe there are many technical reasons to choose one or the other. In the end I went with SQLAlchemy for this project. I think it would make a good default integration story for Grok. I looked at Storm. Storm is nice, and is used by two large parties in the Zope community: Canonical (where it originated) and Lovely Systems (who use it a lot). The Storm developers maintain Zope 3 integration together with Storm. In the Grok community, we also have Christian Klinger actively working on Grok integration. One thing bothered me though: Storm lacks the ability to generate database schemas from Python code. I asked why that may be so. It turns out this lack is a deliberate choice, and the Storm developers have good reasons: usually, with large scalable relational databases, you don't want auto-generation of schemas. Control and flexibility is important in these cases, and being limited by a Python-based schema language would hinder a project and, possibly even worse, piss off the database admins. When I approached the Storm mailing list about this topic I got friendly and helpful responses, but it was quite clear that schema-generation, while not completely out of the question (perhaps in an extension), isn't part of the Storm culture. It's like threading and Twisted - you can do it if you want, but if you talk to the developers they will keep reminding you that in most cases you shouldn't do it. It also reminded me of the Zope 3 attitude - control and flexibility are of paramount importance. Grok emphasizes slightly different things however. We happily use the control and flexibility that Zope 3 offers, and don't want to lose it if we can help it, but we let ease of use weigh more heavily when we make choices. That said, the "control & flexibility" attitude is actually great for Grok - it's very nice to be able to build on such a flexible and well-engineered base! SQLAlchemy seems to share the Zope 3 attitude as well: it offers a lot of flexibility and control. People are building more high level declarative systems on top of it, such as Elixir. SQLAlchemy also offers some extra features compared to Storm. The one that mattered to me was schema generation from Python code. I myself think schema generation can be quite useful. Not for large-scale relational databases, but projects often start out small, and more often stay small. If you're just trying to develop a small web application, it's very nice to be able to get started quickly and just write Python code. This approach suited my project, and I think it will suit Grok as well. For something like Grok, making it easy to get started is very important, and to me that means it needs to include schema-generation. I realize that schema-generation is the wrong choice for many projects, but I don't think it's the wrong choice for all projects, and I didn't feel like rolling my own with Storm while it was already there for SQLAlchemy. When evaluating Storm, we saw it had several strong connections to our Zope community. Such connections are valuable, as it means that if I, as a community member, have a question or a problem, there's a good chance I'll get an answer. It also means that useful integration code is likely to be written. What's interesting is that while Storm has buy-in from larger companies in our community, SQLAlchemy seems to have more activity in open-source land. The evidence I have for this is that we have almost too many integration layers that connect Zope's transaction machinery to SQLAlchemy. I've used z3c.sqlalchemy and z3c.zalchemy (yes, they are different!) in small projects in the past. This time I felt it was time to try out collective.lead, born in the Plone world. Besides these, there are also people advocating WSGI-based integrations! I actually had to use an unreleased branch of collective.lead (I hope there will be a release soon, and for your reference, it's elro-tpc), and it didn't work right with sqlite. Once I switched to PostgreSQL, it did work quite well. Laurence Rowe, developer of collective.lead, was very helpful in answering my questions. Don't ask me for technical reasons on why it might be better than the others, though! Some time in the future, we'll need to figure out which Zope 3/SQLAlchemy integration layer is right for Grok. So what does Grok code that uses SQLAlchemy look like? I'll give just a sketch here - this is not a complete application. Actually besides the database setup code, my project's code ended up very similar to code that uses the ZODB - the patterns are the same. Let's start with setting up the database. collective.lead needs us to register a global utility that includes the database URL, table definitions and the object/relational mappers: I find the use of underscores in collective.lead here a bit ugly. I actually hope we can make much of this utility go away again by defining some clever grokkers and using some declarative approach combining class and table definitions (Elixir is one option to explore). What the right way is needs careful thought (perhaps at the Grokkerdam sprint?). Above we define a table user, then map it to the class User. We haven't defined this class yet, so let's do that now: This is in fact an entirely normal Grok model. You can do everything with it that you can do with a normal model, connect views to it, adapters, traversers, etc. Just about the only thing that you probably shouldn't be doing is stuffing it into the ZODB - that would be rather confusing! I think eventually Grok wants to introduce a new base-class for ORMed classes, but for now, the grok.Model approach actually appears to work just fine. The fields in a record, as defined by the user table, become attributes on User instances. Creating an object is a trifle more involved than just instantiating it and sticking it into a container, like you'd do with the ZODB, but still very easy. You can create new users like this: The getUtility call looks up the utility we defined above. It's a bit verbose to keep looking up the session this way, but this can be easily abstracted away into a small utility function. More high-level abstractions also are possible: there is some work going on involving ORM-based container classes, which act like Zope 3 containers (folders). Zope 3 containers in turn act much like Python dictionaries. Such a container should make this pattern even more similar to the ZODB pattern. All the stuff you can do with Grok add and edit forms also works fine, including applyData. To make automatic form generation work, you do need to define a Zope interface for User to define the schema. Unfortunately defining the Zope 3 schema for User violates DRY (Don't Repeat Yourself) - you need to define the columns (which become attributes) in the user table definition and then also define them again, in a different way, in the IUser schema. This is another important topic to think about - perhaps we can devise a way to generate Zope 3 schemas from SQLAlchemy table definitions or vice versa. There are projects in the Plone world (Alchemist, collective.mercury) that offer such a facility, so we need to investigate those. Grok makes it very easy to make custom traversers to set up your URL space. Let's combine it with a query example. If you want to make an object that acts as the base of all users in URL space (base/username), you could do this: This contains a query that looks for users by name. If it can be found, a User instance is returned. If not, SQLAlchemy's first() will return None, which is exactly what traverse wants too if the object cannot be found. Nice! This code is similar to the code that would go into a generic ORM container (base)class that I talked about before. In fact, in my project, the code almost automatically started evolving other bits and pieces needed to create such a container, such as way to remove items, and a way to get all the items in the container (defined by a query). With use of zope.location.location.locate() you can make the results of such a query fit right into Grok, so you can also use view.url() on User instances to get their absolute URL, just like you can do with ZODB-backed objects. In my project, I combined the ZODB with SQLAlchemy. The transaction machinery is integrated, so if you have an uncaught exception in your code both the ZODB transaction and the relational transaction are aborted (and rolled back) at the same time. A successful request commits both. This way I could store things that most easily fit in the ZODB in the ZODB, and use relational tables where I wanted to. In many systems, the filesystem is used directly where a relational databases don't work well, but then you lose the benefit of transactions. A combination of ZODB and RDMBS gives us quite interesting possibilities to explore. In conclusion: the Grok community has various things to think about and improve concerning relational database integration - some of it I discussed here. Even without those features, using SQLAlchemy with Grok already felt very natural and easy. This is no doubt in large part thanks to SQLAlchemy's and Zope 3's flexibility. I hope we can get some people to work on this during the Grokkerdam sprint and afterward, and soon make ORM integration a first-class citizen for Grok. It will add another important feature to Grok, and take away another reason not to use Grok (unfamiliarity with the ZODB). Once we have the basic integration done, we can explore building higher-level features. If you don't mind some DRY violations, it's not hard to create powerful CRUD interfaces with Grok and SQLAlchemy right now, but of course we want to eliminate the DRY violations.
(15) Tue Apr 08 2008 11:27 SQLAlchemy with Grok:
from collective.lead import Database
from collective.lead.interfaces import IDatabase
import sqlalchemy as sa
from sqlalchemy import Table, Column
class MyDatabase(grok.GlobalUtility, Database):
grok.provides(IDatabase)
grok.name('mydb')
_url = sa.engine.url.URL(drivername='postgres', database='foo')
def _setup_tables(self, metadata, tables):
user_table = Table(
'user', metadata,
Column('user_id', Integer, primary_key=True),
Column('username', String(20), unique=True, index=True),
Column('password', String(100)))
metadata.create_all(self._engine)
tables['user'] = user_table
def _setup_mappers(self, tables, mappers):
mappers['user'] = mapper(
User, tables['user'],
)
import grok
class User(grok.Model):
pass
from zope import component
session = component.getUtility(IDatabase, name='mydb').session
user = User()
session.save(User)
class UserContainer(grok.Model):
def traverse(self, name):
session = component.getUtility(IDatabase, 'mydb').session
return session.query(User).filter(User.username == name).first()
- Comments:
Posted by Peter Bengtsson at Tue Apr 08 2008 12:48
Personally I like the Storm approach and this is enforced even more now. I like having a file full of CREATE TABLE statements because they transcend any programming language and gives you a great overview of the data as time goes by. The alternative, as you show, is to RY (Repeat Yourself) anyway with Column instances in the setup utility.Secondly, the Grok+Storm story seems to be tighter. You're using an unreleased branch and you had to roll your own code to get the transaction management.I hope the Storm guys can help us with that "extension" you mention to ease the creation of the tables so I at least won't have to manually write the CREATE TABLE code. One thing I'm genuinely impressed by but haven't used myself yet is the Elixir effort. That's to SQLAlchemy what Grok is to Zope3. Perhaps Grok+Elixir is a better match.Kudos on the traverse() pattern. It's awesome!
Posted by Martijn Faassen at Tue Apr 08 2008 12:56
Concerning the Storm approach or the SQLAlchemy approach to table creation, I realize I did it to myself when I get a debate about this, but I hope people will also comment on other aspects of my post, as you did.Concerning integration: I think SQLAlchemy integration is just fine - I mentioned 3 integration layers. They are in various states of "finishedness". I wouldn't weigh this point very strongly.I consider it unlikely the Storm guys themselves will be very motivated to create a database schema generation layer. They'll be willing to help people out if they want to do it, but if they were likely to work on this themselves I would have expected a different response than the one I did get.Elixir looks cool. One thing that bothers me is the meta class as base class, something I try to avoid with Grok if possible, as I believe their initialization behavior can be rather unexpected. The SQLAlchemy declarative layer has a required base class as well (also a meta class?), so perhaps this is inescapable. And of course ZODB's Persistent is a meta class as well.
Posted by Malthe Borch at Tue Apr 08 2008 13:09
I think there's a strong story for database schema generation; the ability the stay in Python is important.Another challenge is automatic containment relationships, like we typically have with ZODB-based applications. Here, having __name__ and __parent__ would simplify many tasks.Not sure how to accomplish that with SQLAlchemy.
Posted by Martijn Faassen at Tue Apr 08 2008 13:28
Malthe, setting up the automatic container relationships is actually quite easy: use zope.location.location.located(). You get your object from SQLAlchemy through some query, and then you say:located(obj, parent, name)For instance, located(obj, self, name) if you were to do it in a traverse() method (though that isn't typically needed as in fact Grok will do it for you automatically during traversal - only if you want to be able to call traverse() manually).This will create a LocationWrapper and then everything that needs this (such as URL generation) just works.While this is pretty easy already, it'd be even nicer to write a custom container base class to automate this behavior for you.
Posted by Christian Klinger at Tue Apr 08 2008 15:36
Hi Martijn, hi Grokkers;first thanks Martijn for this awesome post. I worked the last months on an Grok-RDB Project. For this i have developed a container implementation for Grok-Storm-Zope. The cool thing on an "RDB-Container" is: Don´t learn new things - just use the standard container API at least for CRUD operations. On the other side we need a smart way to integrate the power and flexiblity of (constructed) sql queries for our "Content Objects" in grok. The reasons why i choose Storm:
- out of the box zope integration
- declarativ and simple syntax
- *natural* python syntax
I´m looking forward to work and discuss these topics in Grokkerdam.ChristianPosted by Sean Upton at Tue Apr 08 2008 16:10
ORM is a leaky abstraction, you need things on top of it to hide the rough edges (e.g. lack for support of collection fields, the flatness of everything, table normalization patterns). For applications I'm working on at work (which are not all web apps, but are using zope.schema), we wanted RDBMS storage. What we've done is build a "table maker" component that introspects zope schema with getFieldsInOrder and creates tables for fields that map to ORM; this complements storm. We are now working on largely abstracting ORM underneath another component called a "content space" that deals with ensuring that all bound "content model" classes have tables and the adding/removal of assets; at the moment, we are working on collection field support using a partially serialized object graph as a complement to ORM (where we are unable to store information in a single table for an item); we also plan to support containment hierarchy. All of this is being done using Storm. This is unfinished early work; it will eventually be released as free/OSS when ready to digest, but it gives me good hope that Storm is appropriate for these types of applications (one just needs the appropriate infrastructure above the ORM layer).Sean
Posted by mike bayer at Tue Apr 08 2008 18:37
hey Sean -why store collections as serializations ? The ORM can handle storing the parent, the collection and its items within proper relations. Serialization has issues when you go to load data three months later after the object model has changed.
Posted by Kevin Smith at Tue Apr 08 2008 19:59
I'd think that http://code.google.com/appengine/ demands urgent attention in Grok/Zope3's relational database integration story.
Posted by mike bayer at Tue Apr 08 2008 20:22
except the google datastore isn't really relational (i.e., no joins, fks, anything like that).
Posted by Kevin Smith at Tue Apr 08 2008 20:40
Good point Mike, but from what I understand so far, there is no writing to the filesystem (and no threads!), hence no zodb, hence no Grok.[warning: alarmist commentary below]Sound the hunting horns!IMO,GoogleAppEngine is a hugely disruptive beast that will reshape the internet landscape in a big way. So much so, that it should be a Grok/Zope3 imperative to figure out a GoogleAppEngine story and have working code within 30 days. It's a big day for DJango, but someone will have to pry my cold fingers from my club before I give up Grok.
Posted by Martijn Faassen at Tue Apr 08 2008 21:12
Call me blind, but I don't believe the GoogleAppEngine is going to destroy everything in its wake before it if we don't respond in 30 (thirty) days.That said, I don't want to stop anyone from looking into Grok integrations, within 30 days or not. It'd require making pieces of Grok run on the google platform as far as I understand, which is basically an exercise in porting libraries.
Posted by Martin Aspeli at Wed Apr 09 2008 01:39
Hi Martijn,Thanks for a nice writeup! I'm obviously invested in collective.lead (I wrote it, and it's used in my book). I'm convinced that it's a good base for Zope/SQLAlchemy integration because it's so simple. It has two goals: make it painless to set up (and later find) your ORM connection/session; and provide proper transaction integration with the ZODB that you don't need to think about. I think it meets these well.I do feel pretty strongly that unless we have a good reason not to, we should go with SQLAlchemy, because that seems to be where other frameworks are headed. I've heard great things about Storm, but I know SQLAlchemy is great, too, and I am convinced it has more traction elsewhere. Avoiding "ghettoisation", as Chris McDonough says, is an important principle to adhere to.I take your point about underscores. It's basically done because these are "private" template methods that are called by the standard Database utility base class. More on that at: http://martinaspeli.net/articles/component-architecture-design-lessons.I'm very open to change here, though. I also think this provides an excellent opportunity to get some Grok patterns into the Plone world. If we could make this even easier and still keep it working in Zope 2/Plone, we should (possibly in a separate package - collective.lead should stay simple).Ideally, we should make it possible to do SQLAlchemy ORM with Zope 3 IPublishTraverse semantics, schema generation and formlib/z3c.form integration. I think it'd be possible to create a generic RDBMS API that works very much like the current Zope 3 folder/item semantics (i.e. like dictionaries), that can be published and traversed, and where we can get sensible edit forms for CRUD operations for free. It shouldn't even be all that hard. And it should work in Zope 2 with a minimum of fuzz. :)@Peter: "You're using an unreleased branch and you had to roll your own code to get the transaction management." The branch issue is just because there's ongoing work, but I think we could roll a release very quickly. I don't see any hand-rolled transaction integration code above, though. The transaction integration is basically half of what collective.lead does (the other half is threadlocal session binding)Cheers,
MartinPosted by Martijn Faassen at Wed Apr 09 2008 01:55
Ah, Martin, I didn't realize you'd written collective.lead; I was under the impression it was Laurence Rowe, but I guess he's a recent maintainer and has done some of the other SQLAlchemy stuff?You should also get involved in the zope-dev discussion I started on trying to have less SQLAlchemy integrations in Zope if we can. :)Wish I could sprint with you on some of these things (or other things) sometime, Martin!
Posted by Jonathan Ellis at Wed Apr 09 2008 06:57
Thanks for the example, Martijn!
Posted by Martin Aspeli at Wed Apr 09 2008 12:23
Hi Martijn,I wrote it because at the time, all the other solutions seemed to try to do too much (and thus, in my opinion, unnecessarily invalidating some of the standard SQLAlchemy patterns and documentation), and because I needed something simple, stable and releasable.Laurence has done a lot of work on it, and probably is the de-facto maintainer now, not at least because he understands the TPC stuff a lot better than I do.I already replied to the zope-dev thread. ;)Martin
