Everything Is an Edge Case: Lessons from Frameworks that I Applied to Django 16 July 2009

Posted by manniwood in Django, Programming, Python.

Commenters on my post From J2EE to Django: Observations from Porting a Web App have noted that I don’t use many of Django’s capabilities: I wrote my own SQL mapper/templater (Pybatis), thereby jettisoning Django’s ORM; I wrote my own session persistence layer that leverages Pybatis; I swapped out Django’s templates for Jinja 2; I don’t use the admin features; and I don’t use the automated form generation!

That doesn’t leave a whole lot, but let me give props to Django on the stuff I had no desire to write on my own:

  • I have no desire to deal with all the raw response headers and such.
  • I have no desire to deal with mapping URLs to functions in modules. I also love the way Django does it, both from the pretty-URL point of view, and the functions-not-classes point of view. (It seems so bloated to me when I look back on my J2EE code and see that I had to make an entire servlet object for each new kind of user interaction, whereas with Django, I only make one function. Less is more!)
  • I have no desire to work hard at integrating Python with Apache. mod_wsgi and Django’s support for it make me very happy.
  • I have no desire to write the configuration stuff that Django provides.
  • I have no desire to write the middleware stuff that Django provides.
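
To make the URLs-to-functions point concrete, here is a minimal sketch in plain Python. It is not Django's implementation or API; the view functions, the URL_PATTERNS list, and dispatch() are all hypothetical names invented for illustration. It only shows the shape of the idea: a list of regular expressions, each paired with one plain function, with named groups handed to the function as keyword arguments.

```python
import re

# Hypothetical views: each kind of user interaction is one plain function.
def article_list(request):
    return "all articles"

def article_detail(request, article_id):
    return "article %s" % article_id

# (pattern, view) pairs, in the spirit of a URL configuration module.
URL_PATTERNS = [
    (re.compile(r"^articles/$"), article_list),
    (re.compile(r"^articles/(?P<article_id>\d+)/$"), article_detail),
]

def dispatch(path, request=None):
    """Call the first view whose pattern matches the path.

    Named groups captured from the URL are passed as keyword arguments.
    """
    for pattern, view in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            return view(request, **match.groupdict())
    raise LookupError("no view for %r" % path)
```

Adding a new interaction means writing one function and one pattern line, which is the contrast with one servlet class per interaction.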

Even though I’ve jettisoned what a lot of people might see as Django’s major components, I think the features I’ve chosen to use make Django worthwhile. And I really appreciate the fact that Django does not force me to use its ORM or its templating language. Thank you, Django team.

But, why don’t I use the other features of Django? The short answer is the title of this blog entry: Everything is an edge case. There’s a great description of Perl that I wish applied to web development: “Perl makes the easy things easy, and the hard things possible.” But it doesn’t apply to web development; it only applies to Perl.

With web development, there’s not a lot of easy stuff: it’s all hard stuff. For instance, in the last database-backed web application I wrote, there were no forms that could have been auto-generated even if I’d wanted to: every form was unique and complicated in its own way. Each form served the application well, but there wasn’t enough commonality between the forms to make form code generation possible. (Jinja includes eased some of the pain, but that’s templating, not code generation.)

Same with ORM: I already think that ORM is the Vietnam of computer science, and I wanted to create my data model in the database, directly in SQL, so I could get exactly the schema I wanted. I wanted to write my database access code in SQL myself (especially the reporting code). The ORM would have been in my way. So I used a thin, template-based access layer instead: Pybatis.
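
For readers who have not seen this style of data access, the sketch below shows the general idea of keeping hand-written SQL under names and running it with parameters. It is emphatically not Pybatis's API: the STATEMENTS dictionary, the run() helper, and the users table are hypothetical, and SQLite (from Python's standard library) stands in for whatever RDBMS the application really uses.

```python
import sqlite3

# Hand-written SQL kept by name; application code refers to statements,
# never to generated queries. All names here are made up for illustration.
STATEMENTS = {
    "create_users": (
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " active INTEGER NOT NULL)"
    ),
    "insert_user": "INSERT INTO users (name, active) VALUES (:name, :active)",
    "active_users": (
        "SELECT id, name FROM users WHERE active = :active ORDER BY name"
    ),
}

def run(conn, statement_name, params=None):
    """Execute a named statement and return its rows as a list of dicts.

    Statements that produce no result set (DDL, INSERT) return an empty list.
    """
    cursor = conn.execute(STATEMENTS[statement_name], params or {})
    if cursor.description is None:
        return []
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

The layer stays thin on purpose: it maps names to SQL and rows to dictionaries, and leaves the schema and the queries entirely in the developer's hands.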

The whole automatically-generated admin site that Django could produce for me? It would have been great, but the demands of my project outstripped what could be automatically generated. Again, every form was tweaked and twisted and user-friendlified to the point where only hand-coding would do.

And this is not a criticism of Django, by the way. When I used J2EE, I had stopped using frameworks altogether: it was easier for me to use raw servlets and iBATIS, and cobble together my own “framework” of often-used idioms as I went along.

And you know what would happen? I’d start to try to codify stuff in higher levels of abstraction (“Now all my editors can be simply configured from a configuration file! The problem of editing things is solved!”) only to have a new requirement render my newest abstraction useless. As the title of this blog entry says, when building non-trivial database-backed web applications for demanding clients, everything is an edge case. On all the web sites I end up working on, there are no common, repeated, trivial cases of anything that can be encoded in a framework.

On the other hand, there are a number of basic housekeeping features that a lot of frameworks provide nowadays. These features go unsung, and yet they are often the most useful.

Finally, I think one thing that works against (or with?) Django and many other Python frameworks is that Python is such a productive language to code in (compared to Java, anyway) that it’s often quite easy to take a framework that takes care of stuff you didn’t want to write yourself, and then code the rest of what you needed by hand in a couple of days. If there’s one thing that continues to impress me about Python, it’s that if I can’t find a library to do something I need, I generally surprise myself by how quickly I can cobble something together to do what I need.

The RDBMS as Final Arbiter of Your Data Model 8 July 2009

Posted by manniwood in Django, SQL.

Mwanji Ezana asked a question about my blog post, Objects are Hammers: Why do I want my RDBMS to be final arbiter of my data model? Shouldn’t my application’s object model be the final definition of my data model, and shouldn’t my RDBMS be subordinate to my application’s object model?

In some ways, my blog entry Only the R matters in RDBMS answers that question, but that entry did not address this topic specifically.

So, let me state my bias clearly: I think making your application’s object model the final arbiter of your data model is wrong. I realise this may be an unpopular bias. After all, even Django, which I have fallen in love with, encourages you to model your data with Python objects, using Django’s ORM facilities to write all your DDL code for you.

In fact, my copy of The Definitive Guide to Django says in chapter 5, page 66, that

Writing Python is fun, and keeping everything in Python limits the number of times your brain has to do a “context switch.” It helps productivity if you keep yourself in a single programming environment/mentality for as long as possible. Having to write SQL, then Python, and then SQL is disruptive.

I think that, for most developers, we could swap out Python for Java, or Python for Ruby, and we would find much agreement that writing the application language is fun, and that (perhaps I’m inferring this) writing SQL is not so much fun.

I disagree strongly on at least two counts.

The first count is that context switching from language to language on a single project, even during a single coding session, is acceptable if the languages make the expression of certain ideas easier. For instance, my Django book does not go on to claim that all the HTML templating code should be written in Python. There is a context switch every time I switch from Python to HTML to Django’s templating language (or, in my case, Jinja). And it’s totally worth it, because HTML is best expressed in, well, HTML, not Python. And dynamic HTML is better handled using Jinja (or Django’s templating language) and not Python.

But the same goes for SQL, and I cannot stress this enough. If you have data that needs to be modeled, manipulated, and stored reliably, you’ll find that the context switch from Python to SQL and back is as appropriate as the switch you’re already making from Python to HTML/templates and back.

Which brings me to my second count: coding in SQL is not only fun, but preferable to coding in Python (or any other general purpose programming language) when SQL is used for what it’s best at: data representation and manipulation. It’s actually kind of funny, because there’s a lot of interest in domain-specific languages lately, and yet developers are falling over themselves to avoid learning SQL, the ultimate domain-specific language for data modeling and manipulation!

I think that most developers are secretly afraid of learning SQL. I don’t know why this is, but I think that anybody who creates database-backed web sites for a living, as I do, has to know SQL. Not knowing SQL will severely limit a developer’s growth, not to mention his career. Also, on the face of it, it makes no sense for computer programmers to refuse to learn a computer programming language, which is exactly what developers who avoid SQL are doing. If a pilot didn’t want to learn how to fly more than one type of plane, we would have a right to ask if the pilot enjoyed being a pilot, wouldn’t we?

Up until now, I have assumed that your RDBMS really is the best place to model and store your data. But I should address an obvious question: what if the object model in the business layer of your application really is the best representation of your data? Shouldn’t your RDBMS be subordinate to that data model?

I would argue no.

One reason is that, with object-relational mapping being the Vietnam of computer science, you should not be persisting your objects to an RDBMS; you should be using an object store or object database instead.

But I’ll confess that I’d have a difficult time believing that an application’s object model was the best representation of its problem-domain’s data, because the object oriented paradigm is a paradigm and not a data model. To quote C.J. Date:

So what about other data models?—the “object oriented model”, for example, or the “hierarchic model”, or the CODASYL “network model”, or the “semistructured model”? In my view, these other models are just not in the same ballpark [as the relational model]. Indeed, I seriously question whether they deserve to be called models at all.

(– C. J. Date, SQL and Relational Theory, Appendix A, “The Relational Model”)

Again, this goes back to domain-specific languages, or domain-specific anything. The sole purpose of the relational data model, and its implementation in RDBMSs, is to correctly describe and store data, and allow useful queries on those data. Not using SQL for your data needs is like not using a hammer to drive home a nail. And having your ORM write your SQL for you makes no sense either. SQL is succinct and powerful enough that you should want to write your SQL by hand to ensure you are getting the results you want. (The same way you write your HTML and CSS by hand, to get the results you want.)

Or, to put it simply, whenever I think data, I think SQL. (Actually, having read a lot of Joe Celko and C.J. Date, I also think the relational data model.) Philip Greenspun’s books reinforce my bias. In his writings, when Greenspun starts talking about the data model behind a web site, the SQL examples start flowing. See his chapter on user registration and management to see what I mean. Any discussion of data modelling naturally has code samples in SQL; not Java, not Ruby, not Python. SQL.

Thinking of Philip Greenspun, I am reminded why I am also biased towards beginning any one of my projects by writing SQL and not Python: because I write websites that are actually databases. This will explain a lot of my biases.

In fact, my goal is to get as thin a wrapper as possible over my database. It turns out that the code wrapping the database of any project I’ve ever worked on is quite thick, and that’s because a lot of user-friendliness, security, polish, and error correction must liaise between a web browser and a database. But with the projects I’ve worked on, I consider getting the data model right to be probably the most important step. The stuff that wraps the database in many ways naturally comes together after the data model has been figured out.

Or, as Fred Brooks would say:

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

(– Frederick P. Brooks, Jr., The Mythical Man Month, chapter 9, section “Representation Is the Essence of Programming”.)

There is another pragmatic reason I have found to make the RDBMS the final arbiter of your data model, and to have the rest of your application be subservient to the RDBMS: databases take on lives beyond the applications they were originally written for. In most corporations, once a database has been created, the data in that database become useful for a lot of different purposes. So the web application you wrote on top of that database becomes only one of many portals into the database. In addition to servicing its web front end, the database also services raw queries from developers, command-line queries from reporting engines running on cron jobs, and maybe even queries from applications developed later on. It’s most fitting, then, that the database is the final arbiter of its own data model, because that is the expectation of every other client of that database.
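
One way to picture the database as final arbiter is a schema that carries its own rules, so every client (web app, cron job, ad hoc query) gets the same answer about what valid data is. The sketch below is an illustration under stated assumptions: the customer and invoice tables are hypothetical, and SQLite stands in for a production RDBMS only because it ships with Python.

```python
import sqlite3

# The rules live in the DDL, not in any one application's object model.
SCHEMA = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (id),
    amount      INTEGER NOT NULL CHECK (amount > 0)
);
"""

def open_database():
    """Create an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def try_insert(conn, sql, params):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Whichever client calls try_insert(), or bypasses it and issues raw SQL, a negative amount or an invoice for a nonexistent customer is refused by the database itself.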

Finally, there is the often-stated claim that an advantage of making your object model the final word on your data model is that it allows you to more easily switch SQL implementations should you wish to do so.

This makes no sense. There is a common misperception that because all databases run SQL, all SQLs are essentially the same, except for some pesky differences in syntax which your ORM should abstract away so that you don’t get tied to a particular vendor. This makes me wish that each RDBMS had a different name for its SQL implementation. This way, people would realise that the differences between, let’s say, MySQL and PostgreSQL are as large as the differences between Ruby and Python (actually, probably greater). Anybody who knows anything about databases would choose MySQL or PostgreSQL for completely different purposes. The same way you give a lot of thought to which language you will use before embarking on a project, you should put a lot of effort into choosing an RDBMS that helps you correctly model your data and keep it consistent and easy to query.

But once you have chosen your RDBMS, you should be no more afraid of using its most powerful features than you would be of leveraging the unique capabilities of the general-purpose programming language you just chose.

In fact, can you imagine choosing Python for your project, but not using some of its more dynamic features, just in case you had to switch to Java? Of course not. Or can you imagine using a language abstraction layer that disallowed the use of language features unique to Ruby or Python, in case you wanted to switch from one to the other? I can’t imagine it either. So why do we do this with RDBMSs?

When I choose an RDBMS, I actually take advantage of its unique features, the same way I choose a general purpose programming language for its feature set (and libraries). That’s the point. And I use all of those features to make my (carefully-chosen) RDBMS the final arbiter of my application’s data model.

From J2EE to Django: Observations from Porting a Web App 8 July 2009

Posted by manniwood in Django, J2EE, Java.

I ported a database-backed web application from Java Servlets to Python/Django, and I’m so very happy that I did. I was nearing the end of work on a medium-sized web app when one of my programmers left for greener pastures. I decided that I could make up for the missing programmer by porting our J2EE app to Python, and then finish adding the final features to the Python port. The assumption was that Python would make me so much more productive that I’d finish faster even including the porting time.

It turns out I was right.

If you’ve taken to reading my blog at all, you’ll know that I love RDBMSs, and that my application designs use the RDBMS as the final arbiter of what the application’s data model is. So really, I wasn’t translating a huge amount of code for my port, because I could leave the database alone.

Of course, because I think the database is king, I opted out of using Django’s ORM on top of my carefully designed database. Instead, I ported iBATIS (which I was using in my J2EE version) to Pybatis, and used that for all of my database access needs.

I originally wanted to make Django’s templating system work with Pybatis (and maybe I still will some day), but I could not find an easy way to make Django’s templating system run well outside of Django, and I wanted Pybatis to be able to run in Python apps that were not necessarily Django-based. So, I turned to the Jinja 2 templating engine, and liked it so much that I am now using it in both Pybatis and my Django app.

You may wonder what parts of Django this leaves behind, but, in my opinion, the parts of Django that I’m using are the parts that I had no desire to write. I love the way Django maps urls to functions, for instance.

I love the way Django renders templates to the browser. There’s no complicated forwarding from servlet to JSP as there is in Java-land; just a method call to the template engine, which you can hand whatever data structures you need for page rendering. It may sound like a small difference, but somehow, in practice, it’s a big difference. Like many things Java, Servlets and JSPs feel over-architected after using Django.

I also love using mod_wsgi and Apache with Django. It feels like a more natural fit than Apache with Tomcat. My production environment is Linux, where my Apache runs in its pre-forking mode, so it is process-based and not threaded. I prefer processes over threads as a general rule (I agree with Eric Raymond that threads are a performance hack; you can’t always avoid threads, but it’s nice when you can). So getting threaded Tomcat (or any other Servlet container) to talk to process-forked Apache always struck me as a bit of a mismatch.

With Django, there’s no mismatch: through mod_wsgi, each Apache process contains its own Python interpreter, and each Python interpreter contains a single copy of the web app. All kinds of good things come from this. For instance, my Python code does not have to be thread safe: each Apache process handles one request at a time, and each Apache process contains exactly one instance of the web app, so there’s no contention for resources. (Except on the database, but databases are designed to deal with that.)

With this setup, you don’t need connection pooling. Because Apache processes handle one request at a time, and because each process contains its own copy of the web app, each web app instance only needs to be configured with a single connection to the database. You just write your code to use The Connection, and the rest is taken care of for you by Apache’s pre-forking goodness. You just need to configure your database to handle as many open connections as your Apache setup will fork.
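
The "one connection per process" idea can be sketched as a module-level connection that is opened lazily and then reused for every request that process serves. This is an illustration only, not the application's actual code: get_connection() is a hypothetical helper, and SQLite stands in for the real database driver.

```python
import sqlite3

# "The Connection": one per Python interpreter, hence one per Apache process
# under the setup described above. No pool is needed because each process
# serves a single request at a time.
_connection = None

def get_connection(database=":memory:"):
    """Return this process's single connection, opening it on first use."""
    global _connection
    if _connection is None:
        _connection = sqlite3.connect(database)
    return _connection
```

Every view simply calls get_connection(); with N Apache processes there are at most N open connections, which is the number the database must be configured to accept.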

So not only does all this feel cleaner and simpler (though perhaps a tad more resource hungry—there are always tradeoffs), it’s also got horizontal scaling built in!

Well, mostly. There’s the problem of sessions. Apparently, Django can be configured to persist session data to shared memory, so that it doesn’t matter which Apache process responds to a user’s request, it has access to that user’s session. I decided to write my own session implementation that persists all session data in the database. (I don’t imagine this would work with large traffic volume, but for our product’s projected traffic volumes, this was fortunately an approach we could take.) That way, it wouldn’t matter if a user established a session on one Apache process and then returned to another Apache process on a different machine in our server farm: with sessions being stored in the database, they would be findable by any Apache process on any machine.
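
A database-backed session store of the kind described can be very small. The sketch below is not the author's implementation: the web_session table and the three functions are hypothetical, session data is assumed to be JSON-serializable, and SQLite (including its INSERT OR REPLACE syntax, which other databases spell differently) stands in for the shared production database.

```python
import json
import secrets
import sqlite3

def create_session_table(conn):
    """Create the table that holds one row per session."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS web_session ("
        " session_id TEXT PRIMARY KEY,"
        " data TEXT NOT NULL)"
    )

def save_session(conn, data, session_id=None):
    """Insert or overwrite a session row and return its id."""
    if session_id is None:
        session_id = secrets.token_hex(16)  # random, hard-to-guess id
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO web_session (session_id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
    return session_id

def load_session(conn, session_id):
    """Return the stored dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT data FROM web_session WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because the row lives in the database rather than in any one process's memory, any process on any machine that can reach the database can load the session by the id carried in the user's cookie.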

All in all, I’ve been very happy with the switch from J2EE to Django. Although there aren’t always Python equivalents to the Java libraries I got used to using, I also found that porting to Python only the parts of Java libraries that I used took very little time.

The realities of the market are such that my next project may be a J2EE project: it’s still the most popular stack, so the opportunity to use Python is not always there. But if I’m in a position to choose, I’ll choose Python. If I’m in a position to offer a recommendation, I’ll recommend Python if I think the library support is there, and the project can benefit from that choice.