
Comments on Joel Spolsky’s “The Duct Tape Programmer” 28 September 2009

Posted by manniwood in Programming.

I blogged not too long ago about heroic programming, and how when I was younger, I thought Real Programmers used Real Languages like C. Real Programmers littered their code with Difficult Stuff like memory allocation and crazy pointer arithmetic. Yet as I gained experience and looked at the architecture of open source projects I admired, I saw that Real Programmers actually cordoned off the difficult stuff and were confident enough in their own skills to not feel the need to show off. Real Programmers embraced simplicity and elegance (which is sometimes actually harder than just picking a couple of design patterns out of a book and hammering them into shape to fit your problem).

I thought that Joel Spolsky’s The Duct Tape Programmer was going in the same direction:

One principle duct tape programmers understand well is that any kind of coding technique that’s even slightly complicated is going to doom your project. Duct tape programmers tend to avoid C++, templates, multiple inheritance, multithreading, COM, CORBA, and a host of other technologies that are all totally reasonable, when you think long and hard about them, but are, honestly, just a little bit too hard for the human brain.

George Orwell said: “Never use a long word where a short one will do.” Orwell thought people who are trying to hide something, or who lack the conviction of their beliefs, hide behind large words and flowery, indirect language. I’ve always suspected that over-architected, baroque codebases hide something too.

So you would think that Spolsky would conclude his essay with “therefore, we should all try to be a bit more like duct tape programmers: while we do not want to write messy code or bad code, we do want to write simple code and not embrace complexity for its own sake. We should not be embarrassed when we do the simplest thing possible. We do not need to show off our programming chops at every turn. It’s OK to admit there was an easy solution and ship on time.”

But Spolsky does not say this. Instead, he concludes with:

One thing you have to be careful about, though, is that duct tape programmers are the software world equivalent of pretty boys… those breathtakingly good-looking young men who can roll out of bed, without shaving, without combing their hair, and without brushing their teeth, and get on the subway in yesterday’s dirty clothes and look beautiful, because that’s who they are. You, my friend, cannot go out in public without combing your hair. It will frighten the children. Because you’re just not that pretty. Duct tape programmers have to have a lot of talent to pull off this shtick. They have to be good enough programmers to ship code, and we’ll forgive them if they never write a unit test, or if they xor the “next” and “prev” pointers of their linked list into a single DWORD to save 32 bits, because they’re pretty enough, and smart enough, to pull it off.

Really? Because I thought the programmers who aren’t clever/silly enough to xor their “next” and “prev” pointers should especially try to emulate the best attributes of Duct Tape Programmers. Instead of hiding behind technologies that “are, honestly, just a little bit too hard for the human brain”, regular programmers should also try to just think hard about what it is they are trying to solve, and do it in the simplest, cleanest way possible.

I agree with Spolsky that whereas Duct Tape Programmers use every trick in the book to ship code fast, the average programmer should avoid those tricks. Makes sense to me.

But where Spolsky goes off the rails is implying that average programmers should also avoid the good habits of Duct Tape Programmers, like doing the simplest thing possible. Avoiding difficulty for its own sake (or out of a need to show off; or out of the need to follow a “best practice”) is not a clever trick reserved to the programming elite. It’s sound advice that should especially be followed by mere programming mortals, to keep them out of trouble, and to help them meet their deadlines.

Comments on Stephan Schmidt’s “Is Java Dead?” 21 September 2009

Posted by manniwood in Java.

Stephan Schmidt’s Is Java Dead? is a perceptive look at how Java is not dead, but how it is nonetheless not the only language one might want to start a new project in.

Five years ago, it would have been easy to pick a language to specialise in: Java, hands down.

But, with popular web sites being built in technologies like C#, Ruby, Python, and PHP, job postings are no longer Java only. Java still seems to dominate, but many other languages are nipping at its heels.

One of the reasons I got so deep into Python is that Google uses a lot of Python. With their work on unladen-swallow, and Python 3’s Unicode-by-default strings (with UTF-8 as the default source encoding), the future of Python seems to be one of better performance and better internationalisation (two things Java currently does quite well). Oh: and Guido van Rossum works at Google, so I’d say that’s quite an endorsement of Python over at the search giant.

There’s been a lot of research into virtual machines lately (LLVM, Parrot), but I think Stephan Schmidt is on to something when he says that whereas Java might become less popular, the JVM might remain popular. The next language that gets accepted into the enterprise might be JVM-based, making it harder for non-JVM languages to gain traction. In fact, languages like Clojure, that run on the JVM and even call back into Java, may be the ones with the brightest future. (Schmidt would choose Scala for his next project.)

So this is a strike against Python, but conversely a plus for Jython, which seems to be in active development.

It’s definitely a good time to be interested in picking up a new language, though. As Schmidt says:

But just because Java is not dead doesn’t mean it has a future. Developers need to open their eyes and learn new languages. I’m really disappointed in interviews when candidates show no interest in programming beside Java.

It would be so much easier if there was a clear successor to Java, the way Java seemed to just clobber Perl/CGI for web programming. (Though I’m certain that history looks different to the PHP crowd, so a shout-out to all of you.)

This time, there is no clear successor. But, learning a new language (any language) is probably a good idea right about now. So even if my investment in Python doesn’t pay the dividends I’d hoped (and I really do like Python), and Python doesn’t get traction outside of Google, I’m still glad I learned it. That’s an investment in itself.

Heroic Programming and Simple Programming 15 September 2009

Posted by manniwood in Uncategorized.

The first popular language for web sites was Perl. I remember having C envy as a Perl programmer. My C envy was rooted in what I call productivity guilt. At the time, I thought Real Programmers did their own garbage collection. Real Programmers slung around null-terminated strings. Real Programmers used pointers. Perl took care of all that for me, and I got working code out the door fast, but I felt guilty about it, like I wasn’t using a Real Programming Language.

My guilt at not using A Real Programming Language had a silver lining: I learned C and C++. I ended up liking C quite a lot, even though I rarely use it professionally.

Joel Spolsky decries the rise of Java Schools (he wishes programmers still knew C/C++), and he has a point: even though I’ve never programmed a lot of C/C++ during my day job, a knowledge of C certainly makes me more conscious of what is going on under the hood of Perl and Java. It makes me a better programmer of both of those languages.

My favourite benefit of learning C is that it allows me to understand the code of open source software (a lot of which is still written in C). At one point, I got really interested in Apache, and my C knowledge came in handy.

I made a discovery, while learning about Apache, that has stuck with me until this day. It has to do with what I may as well call heroic programming versus simple programming.

Here was my view of heroic programming, back when I was a Perl developer, envious of the Real Coders Who Used C: Heroic programmers used their superior intellects to craft cleverly written software with all the pointer arithmetic and memory allocation/deallocation sprinkled throughout their code in a bug-free manner.

Here’s something I discovered with Apache: memory management, one of the more difficult aspects of C programming, was abstracted away behind a brilliant architecture that made it easier to use.

The Apache designers took advantage of the fact that there are a lot of things that happen in a web server that are life-cycle based. The best example of this is servicing a request: it has a definite beginning and end. So the Apache designers thought: Why don’t we attach a pool of memory to each request? Whenever a piece of code servicing a request needs to allocate memory, it will allocate the memory out of the pool associated with that request. At the end of the request, the pool will be automatically deallocated.
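To make the pool idea concrete, here is a minimal Python sketch of a request-scoped pool. It is only an analogy: `RequestPool`, `register`, and `handle_request` are names I made up for illustration, not Apache’s actual C API, and the “resources” are just strings.

```python
class RequestPool:
    """Hypothetical request-scoped pool: everything registered here
    is released together when the request ends."""

    def __init__(self):
        self._cleanups = []

    def register(self, resource, cleanup):
        # Remember how to release the resource, then hand it back.
        self._cleanups.append((resource, cleanup))
        return resource

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Release in reverse order of registration, even if the
        # handler raised an exception.
        while self._cleanups:
            resource, cleanup = self._cleanups.pop()
            cleanup(resource)
        return False  # do not swallow exceptions


released = []


def handle_request(pool):
    # Handler code never frees anything itself;
    # it only allocates out of the pool.
    pool.register("buffer-1", released.append)
    pool.register("buffer-2", released.append)
    return "response"


with RequestPool() as pool:
    result = handle_request(pool)
```

The handler stays simple because the clean-up rule lives in one place: the end of the `with` block plays the role of the end of the request.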

This taught me a huge lesson: real programmers do not make code difficult for its own sake. There is no honor in tackling a complex problem in a complex way, when you could tackle a complex problem in a simple way.

Real programmers take the most difficult problems of a project and solve them at the outset, abstracting them behind an API. The rest of the codebase leverages the API, and is simpler and easier to maintain as a result.

When I learned servlets, I immediately appreciated a similar design win: servlets make dealing with threads almost entirely worry-free. For most purposes, you only have to follow one rule: keep no mutable state in your servlet’s fields (instance or static variables), and your servlets will be thread-safe. Why? The servlet API takes care of starting and stopping threads for you: already-started threads call into your servlets, and whatever each call keeps in local variables is private to its own thread.
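The same rule can be sketched in Python rather than Java. This is an illustration only: `CounterHandler` and `worker` are hypothetical names, and the threads are started by hand here, whereas a servlet container would start them for you.

```python
import threading


class CounterHandler:
    """Hypothetical handler with no mutable attributes, so one shared
    instance can be called from many threads at once."""

    def handle(self, numbers):
        total = 0  # local variable: each calling thread gets its own
        for n in numbers:
            total += n
        return total


handler = CounterHandler()  # a single instance shared by every thread
results = {}


def worker(name, numbers):
    # Each thread stores its answer under its own key.
    results[name] = handler.handle(numbers)


threads = [
    threading.Thread(target=worker, args=("t%d" % i, range(i + 1)))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If `total` were stored on the handler instead of in a local variable, concurrent calls could overwrite each other’s running sums; keeping it local is the whole trick.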

It is generally accepted that there are a few books on Java threading that every good developer has to have read. But I also appreciate how knowing a technology does not mean using it at every opportunity.

If I embark on a project that has a lot of Java threading, I’ll break out my copy of Java Concurrency in Practice, build a decent threading API (much like the way the servlet API does), and leverage that throughout the rest of my project. Assuming I can’t find an API that already solves my problem.

Similarly, I’ve also abandoned my “productivity guilt” using higher-level languages. If a project allows me to use a higher-level language that abstracts away garbage collection or pointers or threading, I’ll do it, as long as I can still meet the performance requirements.

I no longer think Real Programmers always do their own garbage collection, or always manage their own threads, or always do their own pointer math. They know how to; but they also know when to.

The best programmers ship working, easy-to-maintain code before their competitors do.

Comments on Why It’s Impossible to Become a Programming Expert 12 September 2009

Posted by manniwood in Uncategorized.

A couple of quotes from Justin James’ Why it’s impossible to become a programming expert rang true to me:

All too often, an expert programmer is the person who is adept at using a variety of reference tools and documentation to find out how to achieve their goals.

and

…if you were to grill [good programmers] on anything outside a narrow area … there is a really good chance that they will know where to get the answer from but not actually know the answer.

So true!

Steve Yegge makes a good case that programmers should know more math (and they should—I should, anyway ;-) and Paul Graham thinks programmers have a lot to learn from painters and other makers.

But Justin James is on to something when he says, in essence, good programmers are good researchers.

This is where I get to chuckle a bit, and blow my own horn, because I have a master of library science degree. I’m quite happy admitting that I have math envy of people with CS degrees (hey, we’ve all got our weak spots) but I’m a killer researcher. (I wonder if, one day, computer programming will be seen as the truly multi-disciplinary field that it really is? Topic for another post…)

Another thing that struck me was that Justin James said at the start of his article that he wanted to learn more Lisp but just didn’t have the time, but at the end of his article, bemoaned the fact that programming languages and APIs and frameworks have grown to the point where you don’t have the time to become an expert in anything any more.

(I remember a joke that says an expert is someone who knows more and more about less and less.)

Perhaps the modern-day key to being a good programmer is being an effective, targeted generalist. (And a killer researcher.)

For instance, if you don’t do your reading and follow the tech world in general, how are you supposed to evaluate which technologies you should spend your time learning, and which you can safely ignore?

At a higher level, the tech world always seems to follow patterns: data structures (trees, lists, maps, graphs), encapsulation (functions or objects), code generation (compilers, templating systems, IDEs, Lisp macros), etc. A lot of these problems get solved with different tools, but the basic problems and goals keep repeating. Even architectures come back in new guises: how different is browser/server to client/server, really?

If you are a good generic programmer, you probably know a lot of the higher-level terrain of computing, so you don’t lose your bearings getting closer to any particular part of the landscape.

One final observation: although computing sometimes seems cyclical (trends fall into disuse and become re-popularised, disguised as new innovations), other parts of it are following a reasonably discernible evolutionary path.

I’m always fond of quoting Paul Graham when he says programming languages are becoming more and more like Lisp, so if you want to arrive at the final destination of programming languages, learn Lisp today.

Or, as Philip Greenspun puts it in his tenth rule:

Any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp.

I don’t know if Justin James will ever get around to learning a lot of Lisp, but I bet he’s been doing a lot of reading about Lisp because he knows Greenspun’s tenth rule.

I’ve posted in previous blogs about wanting to know more Lisp myself. Justin James just gave me another reason: it will make me a more effective generalist.

jQuery Django REST 11 September 2009

Posted by manniwood in Uncategorized.

While working on a webapp, I had a situation where a handful of items of the same type were presented on the same page, and all were editable, for the user’s convenience.

In the old days, I would have wrapped the handful of items in a single form, encouraging the user to edit all of the items in one go, and submit the bunch all at once. I would either have reported back with a “your changes have been saved” page (really old school) or reloaded the page with a “your changes have been saved” notification and the forms filled out with the new edits.

But with AJAX now a robust and well-supported technology (especially under the spiffy new frameworks), I figured why not

  • put each item in its own form
  • allow each item to be submitted separately
  • use AJAX to notify the user of form submission success/failure using the current page, so that no full form-submission/round-trip/page-load would be necessary?

And if I was going to write back-end code to support this sort of item update, why not see how RESTful I could make it?

First off, let me say that the solution I came up with is not fully RESTful; maybe only REST-like or REST-ish.

I’ve been doing some poking around on REST and what it really is, and it seems that it is an architecture more than a specification, which is kind of nice, because you don’t have to adopt all of it, especially if it gets in the way of solving your problem.

(Aside: How I Explained REST to My Wife is the best explanation I’ve read of REST.)

Anyway, let me first dispense with two things I did in my code that were decidedly not RESTful.

First, because this work was in the context of a larger application, I required a user to be logged in to perform the updates on the items. My understanding of REST is that it should be stateless. My takeaway is that login credentials would have to be provided with each individual action to enable true statelessness. I ignored this.

Second, true RESTful resources are accessible individually through regular URLs without appended key/value pairs. Hence, the ideal format of URLs to my items would have been


https://myserver.com/items/1234

https://myserver.com/items/2345

whereas I was going to still access my items like so:


https://myserver.com/itemEditRest/?id=1234

https://myserver.com/itemEditRest/?id=2345

And, in fact, even that is not true, because what I was really going to do was pass the form data in the body of the request (POST-like data, if you will) and not even in the URL.

I chose to do this out of a desire for simplicity: all the other attributes of the items were already going to be passed in the body of the request, and were already going to be parsed out and incorporated into my SQL update statement. So getting all of the attributes from one source (the request body) instead of two sources (ID from the URL but other attributes from request body) seemed a better idea, for my purposes.

This still left me with some interesting RESTful stuff to do. First off, because my items were being updated (rather than created or destroyed), I would use the recommended HTTP PUT method, because apparently, PUT is the HTTP method that RESTful services use (by convention) to allow updates on items.

My first goal was to see if I could send a PUT request through jQuery (and, by extension, the XMLHttpRequest facility provided by all the major browsers).

It turns out I could! Whenever one of my forms’ save buttons (all sharing the same class="saveButton") was clicked, I would serialise that form’s data and send it to my server in the request body using PUT:

$('.saveButton').click(function() {
    // code omitted here, but basically just determining
    // the ID of the form that the
    // submit button was in, and assigning it to formID
    var formDataString = $(formID).serialize();
    $.ajax( { url:  '/itemEditRest',
        type: 'PUT',
        data: formDataString,  // data is request body
        processData: false,
        dataType: 'text',  // could also be "json"
                     // but I'll parse return data manually
        success: function(data, status) {
            // code omitted here;
            // do whatever I do when I'm successful
        },
        error: function(xhr, text_status, error_thrown) {
            var status = xhr.status;  // numeric HTTP status code
            if (status === 400) {
                // code omitted;
                // handle bad form input
            } else if (status === 410) {
                // code omitted;
                // handle item no longer there
            } else {
                // code omitted;
                // handle any other truly
                // unexpected problem
            }
        }
      });
});

Some notes on the above code:

How cool is $(formID).serialize()? My understanding of PUT is that anything can be in a PUT’s request body: a text file, a PDF, a PNG, anything. I happened to want to put my form data in there (rather like a POST request). jQuery’s .serialize() method will take a locator that resolves to a form tag, and serialise all of that form’s “foo=bar, one=two” form inputs to the expected “foo=bar&one=two” query string. Very nice!

One thing that’s interesting is how little information I decided to send back from my server. For instance, if the data were successfully saved, I figured I may as well just send back a response with HTTP status code 204 (which means ‘no content’). jQuery correctly takes any 2xx response and calls its success handler! Very nice.

I figured if my form data were bad (for instance, the user input text where a number should be) I’d use the HTTP return status of 400 (which means bad request). What’s really cool is that an HTTP 400 response can have a body, so I actually send back JSON in the response body. The JSON contains a map of form field names and their associated human-readable errors (such as { "transfer_amount": "Not a valid monetary value" }). But it turns out I ignore the JSON in practice, because I’m already doing JavaScript validation on the client side, so the bad fields are already being called out. (In essence, I’m just having the server protect itself from garbage data if a user hits the “Save” button, ignoring the bad input warnings.)
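As a rough idea of how such an error map could be produced, here is a small Python sketch. The `validate` body below is hypothetical (my real one is omitted from this post), and the only field it checks is the example `transfer_amount`.

```python
import json
from decimal import Decimal, InvalidOperation


def validate(form_values, error_messages):
    """Hypothetical validator: adds field name -> human-readable
    message to error_messages for each bad field."""
    amount = form_values.get("transfer_amount", "")
    try:
        ok = Decimal(amount).is_finite()  # rejects NaN and Infinity too
    except InvalidOperation:
        ok = False
    if not ok:
        error_messages["transfer_amount"] = "Not a valid monetary value"


errors = {}
validate({"transfer_amount": "twelve"}, errors)
body = json.dumps(errors)  # what the body of a 400 response could carry

no_errors = {}
validate({"transfer_amount": "12.50"}, no_errors)  # stays empty
```

An empty dict after validation means every field passed, which is the signal to carry on with the save.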

Likewise I use status code 410 for missing data.

Remaining 4xx status codes, and all 5xx status codes, can be handled by the “else” part of my error handler. Brilliant!

Of course, none of this is any good if my server side cannot produce this output. But with Django, I can.

First off, in my urls.py, I map my URL to my module and function that will handle my RESTful call. Note that all HTTP method calls to itemEditRest will go to item_edit.rest: GET, POST, PUT, DELETE; they will all go to item_edit.rest:

(r'^itemEditRest$', item_edit.rest),

So, here’s what item_edit.rest looks like:

# decorator that requires we be logged in;
# not very RESTful, I know...
@logon.require_login_rest
def rest(request):
    allowed = ['PUT']  # extend this
                       # as we add more
    if request.method == 'PUT':
        return restful_update(request)
    elif request.method == 'POST':
        #not implemented yet
        return HttpResponseNotAllowed(allowed)  # status 405
    elif request.method == 'GET':
        #not implemented yet
        return HttpResponseNotAllowed(allowed)  # status 405
    else:
        return HttpResponseNotAllowed(allowed)  # status 405

Unlike handling a GET/POST directly, as you normally would in Django, you instead look at the request method, and call the appropriate handler from there. (There’s even an HTTP response code for unsupported methods! Very cool…)
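As more methods get implemented, the if/elif chain could be replaced by a lookup table. Here is a framework-free sketch of that idea; `HANDLERS`, `dispatch`, and the stand-in `restful_update` are made-up names that return plain status codes instead of Django response objects.

```python
def restful_update(request):
    return 204  # stand-in for the real PUT handler


# Extend this table as more methods are implemented.
HANDLERS = {"PUT": restful_update}


def dispatch(method, request=None):
    """Return (status, allowed_methods) for the given HTTP method."""
    allowed = sorted(HANDLERS)
    handler = HANDLERS.get(method)
    if handler is None:
        return 405, allowed  # 405 means method not allowed
    return handler(request), allowed


put_result = dispatch("PUT")
get_result = dispatch("GET")
```

The list of allowed methods then comes straight from the table, so it cannot drift out of step with what is actually implemented.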

Here’s my handler for the PUT method:

(But first, at the top of my file, I have to handle some differences between Python 2.5 [which I use in production] and 2.6 [which I'm using in dev]):

local_parse_qsl = None
import urlparse
if hasattr(urlparse, 'parse_qsl'):
    local_parse_qsl = urlparse.parse_qsl  # Python v. >= 2.6
else:
    import cgi
    local_parse_qsl = cgi.parse_qsl  # Python v. < 2.6

try:
    import json  # Python version >= 2.6
except ImportError:
    import simplejson as json  # Python version < 2.6

OK, on to my handler function:

def restful_update(request):
    key_val_pairs = local_parse_qsl(request.raw_post_data)
    form_values = {}
    for kvp in key_val_pairs:
        form_values[kvp[0]] = kvp[1]

    error_messages = {}
    # validate function populates error_messages
    # dict with form field names as keys, and
    # human-readable error messages as values,
    # e.g. { 'phone': 'Not a valid phone number' }
    # If error_messages remains empty, it means
    # all form fields were good.
    validate(form_values, error_messages)

    if len(error_messages) != 0:
        # status 400 means bad request
        return HttpResponse(json.dumps(error_messages),
                            status=400,
                            mimetype='application/json')

    # code omitted; detect if item
    # or one of its parents is deleted
    if item['is_deleted']:
        # HttpResponseGone is like HttpResponse
        # but uses a 410 status code
        return HttpResponseGone(
            json.dumps({'details': 'Item parent deleted.'}),
            mimetype='application/json')

    app_config.SQL_MAP.execute_commit(
        file='items/update.pgsql',
        map=form_values)

    # status 204 means no content,
    # which is very useful
    # we just need to indicate successful save
    return HttpResponse(status=204)

There are a few interesting things to note. First, Django will not parse out the form values for you, because this is not a GET or a POST. This is what all the local_parse_qsl stuff is about above. (Note, too, that because I am not using duplicate form field names, I can confidently turn my form values into a regular dict, rather than resorting to a dict of lists, “just in case” one of my form values is a multiple value.)
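To show the difference between the flat dict and the dict of lists, here is a small sketch using Python 3’s `urllib.parse` (note that my code above is Python 2, where the same functions live in `urlparse`). The request body is an invented example with a deliberately duplicated `tag` field.

```python
from urllib.parse import parse_qs, parse_qsl

# Invented form-encoded request body; "tag" appears twice on purpose.
body = "name=Widget&transfer_amount=12.50&tag=a&tag=b"

pairs = parse_qsl(body)  # list of (key, value) tuples, duplicates kept
flat = dict(pairs)       # plain dict: the last duplicate wins
multi = parse_qs(body)   # dict of lists: every value kept
```

With unique field names, `flat` loses nothing; with duplicates, only the dict of lists keeps every value.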

Another interesting thing is that although I’m returning some nice JSONified info in the response bodies, my front-end currently does not bother using the data, using only the return codes to decide what to do next. On the other hand, a future RESTful client may find the info useful.

Finally, my return code is not a typical HTTP 200 result; it is an explicit body-less 204 result. It’s lean, it’s mean, and it’s all that’s required to indicate a successful save.

As I’ve been happy to admit, this isn’t completely RESTful, but this dabbling has taught me a lot, and I now know jQuery and Django give me the tools I need to create full-blown RESTful services in the future, should I need to do so.

Now… go and get some REST.

FizzBuzz, Nostalgia, and Baby Steps in a New Language 9 September 2009

Posted by manniwood in Programming.

Jeff Atwood wrote about FizzBuzz, a programming “challenge” that apparently a lot of CS grads have trouble solving. (I wonder how one graduates with a CS degree not knowing about if/else and looping? I’m self-taught, so I can’t relate any stories about classmates who just didn’t get it.)

Anyway, while reading about FizzBuzz, I quickly hacked up a solution in Python’s REPL (apparently, this desire to hack a solution is very common), but then I figured I should hack a solution up in Lisp, which I’ve been learning lately:

(loop for i from 1 to 100 do
  (cond ((and (= (mod i 3) 0) (= (mod i 5) 0))
         (print "FizzBuzz"))
        ((= (mod i 3) 0)
         (print "Fizz"))
        ((= (mod i 5) 0)
         (print "Buzz"))
        (t
         (print i))))

Of course, this got me to thinking about all the other programming languages I know, so I had to code something up in every C-syntax language I know:

In C, I can’t do anything outside of a function, so we have the standard main(). Before I learned C, I had C envy, and I figured “real” programming languages did not let you just hack code outside of a containing function (or method and class in OOP languages). But Lisp is considered a real language, and my solution, above, executes just fine.

#include <stdio.h>

int main() {
    int i;
    for (i = 1; i <= 100; i++) {
        if (i % 3 == 0 && i % 5 == 0) {
            printf("FizzBuzz\n");
        } else if (i % 3 == 0) {
            printf("Fizz\n");
        } else if (i % 5 == 0) {
            printf("Buzz\n");
        } else {
            printf("%i\n", i);
        }
    }
}

In Java, of course, everything has to be an object, so I have to declare a class and a main method just to begin coding. And let’s not forget System.out.println instead of printf(). All this extra typing always makes me feel so Java-ish. ;-) The code is visibly wider than the C code. :-0

public class FizzBuzz {
    public static void main(String[] args) {
        for (int i = 1; i <= 100; i++) {
            if (i % 3 == 0 && i % 5 == 0) {
                System.out.println("FizzBuzz");
            } else if (i % 3 == 0) {
                System.out.println("Fizz");
            } else if (i % 5 == 0) {
                System.out.println("Buzz");
            } else {
                System.out.println(i);
            }
        }
    }
}

Turning to JavaScript was fun: I discovered that Ubuntu 9.04 has a command-line JavaScript parser built in. (Or had I installed it before?) So you can just write .js files with shebang notation at the top, and you’re one “chmod +x” away from running your JavaScript on the command line. Go, Rhino! Unlike the other C-like languages, JavaScript puts the Script in JavaScript: I just start hacking code outside of a function definition, as I would with Perl or Python:

#!/usr/bin/js

for (var i = 1; i <= 100; i++) {
    if (i % 3 == 0 && i % 5 == 0) {
        print("FizzBuzz");
    } else if (i % 3 == 0) {
        print("Fizz");
    } else if (i % 5 == 0) {
        print("Buzz");
    } else {
        print(i);
    }
}

I’ll always have a soft spot for Perl: it’s the first language I was paid to code in; it launched my career.

#!/usr/bin/perl

for ($i = 1; $i <= 100; $i++) {
    if ($i % 3 == 0 && $i % 5 == 0) {
        print "FizzBuzz\n";
    } elsif ($i % 3 == 0) {
        print "Fizz\n";
    } elsif ($i % 5 == 0) {
        print "Buzz\n";
    } else {
        print $i, "\n";
    }
}

Bash is only slightly yukkier than Perl. Doing math inside $(()) is just a little odd, syntactically, but it gets the job done:

#!/bin/bash

for i in $(seq 100)
do
    if [ $((i % 3)) == 0 ] && [ $((i % 5)) == 0 ]; then
        echo 'FizzBuzz'
    elif [ $((i % 3)) == 0 ]; then
        echo 'Fizz'
    elif [ $((i % 5)) == 0 ]; then
        echo 'Buzz'
    else
        echo $i
    fi
done

SQL fanboy that I am, I couldn’t resist doing this in PostgreSQL. It came out rather clean and terse, which I guess shouldn’t surprise me too much. SQL is my favourite domain-specific language!

select case when i % 3 = 0 and i % 5 = 0 then 'FizzBuzz'
            when i % 3 = 0               then 'Fizz'
            when i % 5 = 0               then 'Buzz'
            else                         cast(i as text)
       end
  from (select generate_series(1, 100) as i) as lst;

And here’s the code I hacked together in Python’s REPL, starting this whole trip down languages-I’ve-known memory lane:

for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print 'FizzBuzz'
    elif i % 3 == 0:
        print 'Fizz'
    elif i % 5 == 0:
        print 'Buzz'
    else:
        print i

I must admit. Looking at the Python, I’m very glad I learned it. Is it just me, or does Python look like pseudo code? Like if you were thinking in your head “well, first I’d have to loop through the numbers 1 through 100, and then I’d have to check modulo 3 and 5, and then…” would the pseudo code in your head look a lot like Python? Python has often been called runnable pseudo code, and looking at all the languages I know, FizzBuzz certainly looks the cleanest in Python.
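The REPL version above uses Python 2’s print statement. A sketch of how it might look in Python 3, with the decision split out into a function (the name `fizzbuzz` is just my choice) so it can be checked without printing:

```python
def fizzbuzz(i):
    """Return the FizzBuzz word for i, or i itself as a string."""
    if i % 3 == 0 and i % 5 == 0:
        return "FizzBuzz"
    elif i % 3 == 0:
        return "Fizz"
    elif i % 5 == 0:
        return "Buzz"
    else:
        return str(i)


lines = [fizzbuzz(i) for i in range(1, 101)]
print("\n".join(lines))
```

It still reads like the pseudo code in my head; the only real change is that print is a function.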

Then there’s Lisp, which I’ve been teaching myself lately. By contrast, Lisp is just a little bit yucky. I understand that, as Paul Graham says, Lisp doesn’t actually have a syntax, and its lack of syntax allows Lisp to support all sorts of interesting features like its awesome macros.

But tell me I’m not the only one who thinks “((and (= (mod i 3) 0) (= (mod i 5) 0))” is not very much fun for human eyes to parse. If nothing else, prefix notation puts verbs and nouns a little bit too far from each other. (I wonder if I’d be more OK with this if my mother tongue was Latin? Word order doesn’t matter in Latin…)

Makes me wonder what FizzBuzz would look like in PHP and Ruby.

Or Maybe I’ll Learn Lisp Instead 9 September 2009

Posted by manniwood in Programming.

I promised myself that I was going to learn more about ORM, but in my off time, I’ve been learning Lisp instead.

I still want to learn more ORM stuff (it’s everywhere, and popularity is a good reason to learn any technology) but it struck me that I already have strong SQL skills and 10+ years hacking Java (not to mention reading the GoF book) so learning ORM really well would be about as difficult as learning a new library. Combine my knowledge of OOP with my knowledge of SQL and use that knowledge to learn a library like Hibernate, or Django’s ORM, or Rails’ ORM. Done.

What I really wanted to do was learn Arc or Lisp, and really get a handle on a language whose feature set Paul Graham says other languages are converging on anyway.

Besides: Andrew Hunt and David Thomas recommend in their book, The Pragmatic Programmer, that I should learn a new programming language every year. I moved my J2EE project to Python/Django back in November; it’s been almost a year already! Time to crack open my copy of Practical Common Lisp and get serious once and for all.

Lorenzo Alberton’s Graphs in the Database 9 September 2009

Posted by manniwood in SQL.

Here’s a recommendation to read Lorenzo Alberton’s Graphs in the database: SQL meets social networks.

Alberton shows how to represent both directed and undirected graphs in SQL, with a heavy focus on how that applies to LinkedIn- and Facebook-style social networking.
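To give a flavour of the idea (this is my own minimal sketch, not Alberton's schema or queries), a graph can be stored as a plain table of edges. The table name `follows`, the column names, and the people in it are all made up for illustration, and SQLite stands in for whatever database you actually use. A directed lookup reads the rows one way; an undirected lookup treats each row as a two-way link.

```python
import sqlite3

# Hypothetical edge table, for illustration only (not from Alberton's article).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follows ("
    "  src TEXT NOT NULL,"
    "  dst TEXT NOT NULL,"
    "  PRIMARY KEY (src, dst))"
)
conn.executemany(
    "INSERT INTO follows (src, dst) VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")],
)


def followed_by(conn, person):
    """Directed view: the nodes that `person` has an edge pointing to."""
    rows = conn.execute(
        "SELECT dst FROM follows WHERE src = ? ORDER BY dst", (person,)
    )
    return [row[0] for row in rows]


def neighbours(conn, person):
    """Undirected view: any node sharing an edge with `person`, either way."""
    rows = conn.execute(
        "SELECT dst FROM follows WHERE src = ? "
        "UNION "
        "SELECT src FROM follows WHERE dst = ? "
        "ORDER BY 1",
        (person, person),
    )
    return [row[0] for row in rows]


print(followed_by(conn, "alice"))
print(neighbours(conn, "alice"))
```

The same table serves both readings; only the query changes.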

The SQL fanboy in me loves articles like this.

Git Branches and Remote Repositories 5 September 2009

Posted by manniwood in Uncategorized.

I’ll be honest: it took me a while to wrap my head around Git, how it really works, and how to use it effectively.

I’m going to put in a high recommendation for Scott Chacon’s excellent Git Internals, published by PeepCode. It’s a US$9.00 PDF, and it’s better than any other book or online resource I’ve ever read about Git.

Chacon does a better job than anybody else of showing you how Git works, so that by the time he gets into everyday tasks, why you are doing what you are doing makes perfect sense, and you’re just learning the commands and syntax to leverage Git’s capabilities.

When it comes to sharing Git branches between repositories, there’s still not a perfectly good, clear resource out there, so I’m going to share with you what I scribbled on my copy of the tear-out “Git Command Quick Reference” from Travis Swicegood’s Pragmatic Version Control Using Git. Hopefully, what I had to scribble on my quick reference card will appear in a second edition.

First off, let me describe what I want Git to do for me.

I want to have a clone of my repository on another geographically remote server. (For purposes of this discussion, I will assume that it has already been created, and is called myremote.)

I want to create a new branch in my local repository on my development machine at my desktop, and at the end of the work day, I want to push this branch out to my remote repository—perhaps for sharing, perhaps just for easy backup purposes.

Git does not push new local branches out by default, and I (and maybe it’s just me) find the documentation very uninformative on how to manage remote branches.

Here’s how.

Let’s say I’m at my desktop computer and I’m using my local Git repository.

Let’s say I make a new branch based on master:


git branch my-new-branch master

Now let’s say it’s the end of the work day, and my work in the branch my-new-branch is not complete. I don’t want to merge my-new-branch back into master, but I do want to push my-new-branch out to my remote repository for backup purposes.

Here’s how:


git push myremote my-new-branch

From now on, while working locally in my-new-branch, doing a


git push myremote

should do the right thing and push out changes I make in my-new-branch.

Now let’s say it’s a day or two later, and I’ve merged my-new-branch back into master. I know I no longer need my-new-branch, so I delete it locally:


git branch -d my-new-branch

Done.

Of course, my-new-branch still exists in my remote repository; I’ve only deleted it in my local repository.

I delete my remote copy of my-new-branch like so:


git push myremote :my-new-branch

Yes, that’s right: putting a colon in front of a branch and pushing it will delete it on the remote repository. Not very obvious, is it?

I may go into detail about pushing local tags out to remote repositories in another post.

Happy Gitting in the meantime.

Comments on Mark Pilgrim’s [XML/XHTML] Thought Experiment 4 September 2009

Posted by manniwood in Uncategorized.

I think Mark Pilgrim is spot on about the realities of XML (especially XHTML) today: browsers have been accepting malformed XHTML since forever, so generating correctly-formed XHTML is really difficult, because every XHTML generator has a long history of never having had to.

Here’s the thing, though: I really wish Pilgrim had emphasised this point: the only reason we can’t/won’t/don’t generate valid XHTML today is the bad decisions that were made in the beginning. Conversely, the reason the C programming language is parsed in such a consistent way is the decisions that were made in the beginning of C. There’s no rule that says everything everywhere has to be poorly specified from the start, poorly implemented from the start, become popular, and therefore have to stay poorly specified and poorly implemented because “it’s always been that way, and it’s too difficult to change now”.

If anything, XHTML should be a lesson in the benefits and pitfalls of non-rigorous format specification and parsing.

I would be horrified if the takeaway from Pilgrim’s article was that we should always design clients for all new formats to be as accepting of garbage as possible.

Nonsense!

One lesson should be: take a little more care the next time you write a specification.

I can think of a great example: JSON.

Can you point to trillions of lines of malformed legacy JSON out in the wild? Nope. Bad JSON doesn’t parse. But bad XHTML does. Why the difference?

JSON is simpler than XHTML. It’s easy to implement, and easy to parse.
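Here is a small sketch of what “bad JSON doesn’t parse” looks like in practice, at least with the `json` module in Python’s standard library. The helper name `parses` and the sample strings are mine, made up for illustration.

```python
import json


def parses(text):
    """Return True if `text` is accepted by Python's json module."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical samples: one well-formed document and three near misses.
samples = [
    '{"name": "Fizz", "n": 3}',   # well-formed
    '{"name": "Fizz", "n": 3,}',  # trailing comma
    "{'name': 'Fizz'}",           # single quotes instead of double quotes
    '{"name": "Fizz"',            # missing closing brace
]

for sample in samples:
    print(sample, "->", "ok" if parses(sample) else "rejected")
```

There is no “quirks mode” here: the parser refuses each near miss outright, so whoever produced the bad document finds out immediately.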

Another lesson should be: design simpler markup languages.

Then again, sometimes, you need something with the complexity and expressiveness of XHTML.

Yet another lesson could be: if you’re going to design something with the complexity and expressiveness of XHTML, expect the benefits and pitfalls of XHTML.

In other words, perhaps there is an inverse relationship between the “richness” of a markup language’s feature set and the strictness we can expect from the parsers that will have to parse it.

I’ve been (re-)reading a lot of Paul Graham lately, and one thing he says that rings true to me is:

Everyone by now presumably knows about the danger of premature optimization. I think we should be just as worried about premature design—deciding too early what a program should do.

—Paul Graham, Hackers and Painters

If there’s one thing I think XML generally (not just XHTML in particular) suffers from, it is a culture of premature design, especially in the way that it is used.

I remember a horrible phase of the late 1990s and early 2000s where everything had to be stored in XML. Key/value pairs were stored in XML instead of .ini files; tabular data was stored as XML rather than as .csv or fixed-width files; hierarchical data was stored as XML rather than as JSON; sometimes entire databases were stored in XML instead of in an RDBMS… it was a horror show.

George Orwell’s second rule in his essay “Politics and the English Language” was “Never use a long word where a short one will do.” I think the same applies to markup schemes: never use a complex one where a simple one will do.

Or: Don’t use XML unless you absolutely have to.
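To make the key/value case concrete, here is a rough sketch that reads the same two settings from an .ini file and from an XML file, using only Python’s standard library. The settings, the XML layout (`config`, `section`, `entry`) and the function names are all invented for this example; real XML configuration formats vary widely.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical settings, written once as .ini ...
INI_TEXT = """\
[server]
host = example.org
port = 8080
"""

# ... and once in a made-up XML layout carrying the same information.
XML_TEXT = """\
<config>
  <section name="server">
    <entry key="host">example.org</entry>
    <entry key="port">8080</entry>
  </section>
</config>
"""


def settings_from_ini(text):
    """Parse .ini text into {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def settings_from_xml(text):
    """Parse the made-up XML layout into the same nested dict."""
    root = ET.fromstring(text)
    return {
        section.get("name"): {
            entry.get("key"): entry.text for entry in section.findall("entry")
        }
        for section in root.findall("section")
    }


print(settings_from_ini(INI_TEXT))
print(settings_from_xml(XML_TEXT))
```

Both functions return the same dictionary. The XML version needed a layout to be designed first, and the file is about twice as long for the same two values.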

When it comes to the current state of browsers, though, I think there’s a catch: the demands we put on our browsers require us to use XML, or something of equal complexity that would end up looking a lot like it. There is no .csv or JSON solution for the browser’s markup problem. We need something like XML.

With XHTML, that’s exactly what we have.

HTML5 and the abandonment of XHTML 2.0 actually improve the situation: there’s a tacit admission that the way we (mis)parse XHTML now is its own markup language that is neither valid SGML nor valid XML. HTML5 is not strictly SGML or XML, but it’s something of equal complexity that ended up looking a lot like it. ;-)

So I have at least two lessons from Pilgrim’s thought experiment:

1. We cannot turn back the clock and correctly implement XHTML as actual, correct XML. Much to its credit, HTML5 accepts this: it is neither SGML nor XML; it has become its own markup language that merely looks like its forebears. Much of the markup that was “wrong” under XHTML (even though it would parse anyway) is now “correct” under HTML5 (because, well, it parses anyway).

2. Friends don’t let friends use XML. As the evolution of (X)HTML(5) has shown, large, feature-rich markup languages are hard to get right, and although they carry many benefits, they also carry problems. So if you need to solve a problem with markup, really look to see if you can use JSON or .ini or .csv or even a fixed-width flat file before jumping on the XML bandwagon. XML is often overkill anyway, except when it’s not.