Static blogs with Python and GitHub

I have been keeping a blog, on and off, since well before the term was coined. My online presence started with HTML hand-written in Notepad on Windows 3.0; then HTML 2.0 came out and I started experimenting with HoTMetaL and HotDog (among the first HTML editors on the market). Then came GeoCities, and when I got my first domain I switched to dynamic sites: Movable Type at first, then WordPress and later Drupal.

But if there is one thing all this technological wandering has taught me, it is that I seldom, if ever, use dynamic features on my sites: the only features that matter to me are the automatic generation of feeds and tag pages. Besides, a dynamic site means more maintenance and higher hosting costs.

Go static

So, this time around, I started looking for a simple, lean tool that would generate static pages I could push to a repository host (like Bitbucket or GitHub), without having to worry about server maintenance and security patches. Ever.

Given GitHub's rise to prominence in this space, I naturally started my quest by looking into Jekyll and the very nice Octopress. These are undoubtedly excellent solutions, but, at least for me, they have two major drawbacks:

- They are based on Ruby, a language that completely fails to intrigue me (don’t take this as a judgement on the language itself, though: it’s just my personal taste!).
- To get the most out of the GitHub integration, the source of your site should be hosted in a GitHub repository (and free GitHub accounts only allow public repositories).

I would rather have chosen a Python-based solution, provided it was actively maintained and supported Markdown syntax.
Its documentation also had to be extensive and well organised.

Meet Pelican

Lo and behold, after considering a few alternatives, I settled on Pelican, which, I must confess, I had never heard of before.

The pros:

- The underlying technologies are Python, Markdown and Jinja2: best of breed, at least for me.
- Stable, and widely enough adopted to have its own little community of contributors (and themes, and plugins…).
- Very well documented.
- Active: the last commit had landed just days before I started working on this project.
- Flexible organisation of source files.
- Small but sensible set of features.

The cons (or at least the few glitches that I have met so far):

- Poor support for Python 3 (according to the documentation it should have worked out of the box, but in practice it did not work at all for me).
- Compulsory categories: these are built into Pelican's logic, alongside tags. In a perfect world, I would much have preferred a single abstract taxonomy able to represent both categories and tags.
- Non-intuitive naming conventions: for example, the configuration file used in development is called pelicanconf.py, while the one used to produce the “deployable” pages is called publishconf.py (I would have considered devel-conf.py and prod-conf.py more apt names). The Makefile targets are no better: devserver starts the server, but stopserver stops it.
- Helpers are half-baked and inconsistent. For one, there are a couple of Makefiles and shell scripts scattered around for no good reason: a Python script would have made for much simpler and cleaner code. Besides, scripts like pelican-quickstart offer support in areas where one would probably feel safe doing things manually (pushing the site live), but miss out on important ones like installing extra themes or plugins.
- There are very few good themes, and the few that look nice are not responsive.
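To illustrate the point about helpers, here is a rough sketch of a single Python script that could stand in for the Makefile and develop_server.sh. It assumes Python 3.7+ and a pelican executable on the PATH; the script name (manage_site.py), its task names and the commit message are made up for the example. The pelican flags used (-o for the output directory, -s for the settings file) are the same ones the generated Makefile passes, and the publish task relies on output/ being a clone of the GitHub Pages repository, as in the layout described below.

```python
"""manage_site.py: a hypothetical single-file stand-in for Pelican's Makefile
and develop_server.sh. Usage: python manage_site.py build|serve|publish
"""
import functools
import http.server
import subprocess
import sys

CONTENT_DIR = "content"
OUTPUT_DIR = "output"          # in this layout, a clone of the GitHub Pages repo
DEV_CONF = "pelicanconf.py"
PROD_CONF = "publishconf.py"


def pelican_command(settings):
    """Build the pelican invocation: sources, output dir (-o), settings (-s)."""
    return ["pelican", CONTENT_DIR, "-o", OUTPUT_DIR, "-s", settings]


def publish_commands(message):
    """Git commands to run inside OUTPUT_DIR, i.e. in the nested Pages repo."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def serve(port=8000):
    """Serve OUTPUT_DIR until Ctrl-C: one command to start, one key to stop."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=OUTPUT_DIR
    )
    with http.server.ThreadingHTTPServer(("", port), handler) as httpd:
        print(f"Serving {OUTPUT_DIR}/ at http://localhost:{port}/ (Ctrl-C to stop)")
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            pass


def main(argv):
    task = argv[1] if len(argv) > 1 else None
    if task in ("build", "serve"):
        subprocess.run(pelican_command(DEV_CONF), check=True)
        if task == "serve":
            serve()
    elif task == "publish":
        subprocess.run(pelican_command(PROD_CONF), check=True)
        for cmd in publish_commands("Publish new version of the site"):
            # cwd=OUTPUT_DIR: git finds output/.git first and acts on the Pages
            # repository, not on the main (source) repository.
            # Note: "git commit" fails (CalledProcessError) if nothing changed.
            subprocess.run(cmd, cwd=OUTPUT_DIR, check=True)
    else:
        print("usage: python manage_site.py build|serve|publish")


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the command construction in small pure functions makes the script easy to tweak (for instance, adding pelican's -r flag for auto-regeneration) without touching the task dispatch.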

Configuration

For those interested in trying out a similar setup to mine, here’s a breakdown of my directory layout:

.                            (1)
├── content
│   ├── extra                (2)
│   ├── images
│   └── pages
├── develop_server.sh
├── Makefile
├── output                   (3)
├── pelicanconf.py
├── pelicanconf.pyc
├── plugins                  (4a)
├── publishconf.py
├── publishconf.pyc
└── themes                   (4b)

(1): This is the main repository (in my case a private repository).
(2): Certain files, like robots.txt, need to be copied verbatim every time the site is generated. I stored such files here and used the FILES_TO_COPY setting to make Pelican aware of them.
(3): This is the directory where the static pages are generated. I made it an ignored directory in the main repository (1) and cloned my GitHub Pages repository inside it (i.e. the repository I push to every time I want to publish a new version of the site). See below for more details.
(4a), (4b): Both of these directories are Git “submodules”, linked to the themes and plugins repositories. In other words, they are “sub-repositories”. See below for more details.
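For reference, the settings that tie Pelican to this layout might look roughly like the following pelicanconf.py fragment. It is a sketch: AUTHOR, SITENAME, the theme directory name and the (empty) plugin list are placeholders. FILES_TO_COPY is the setting mentioned in (2); as far as I know, later Pelican releases deprecated it in favour of STATIC_PATHS plus EXTRA_PATH_METADATA (shown in the comments), renamed PLUGIN_PATH to PLUGIN_PATHS, and added OUTPUT_RETENTION, which older versions simply ignore.

```python
# pelicanconf.py: a sketch of the settings behind the layout above.
# AUTHOR, SITENAME, the theme name and the plugin list are placeholders.

AUTHOR = "Your Name"
SITENAME = "My static blog"
SITEURL = ""                     # empty in development; publishconf.py overrides it

PATH = "content"                 # articles, pages/, images/ and extra/ live here
OUTPUT_PATH = "output/"          # (3): the GitHub Pages clone

# (2): files copied verbatim on every build, as (source, destination) pairs;
# the source is relative to PATH, the destination relative to OUTPUT_PATH.
FILES_TO_COPY = (
    ("extra/robots.txt", "robots.txt"),
)
# In later Pelican releases, where FILES_TO_COPY is deprecated, the equivalent is:
# STATIC_PATHS = ["images", "extra/robots.txt"]
# EXTRA_PATH_METADATA = {"extra/robots.txt": {"path": "robots.txt"}}

# (4a), (4b): submodule checkouts of the theme and plugin repositories.
THEME = "themes/my-theme"        # hypothetical theme directory name
PLUGIN_PATH = "plugins"          # newer versions: PLUGIN_PATHS = ["plugins"]
PLUGINS = []                     # names of plugins to enable from that directory

# Newer Pelican versions only: keep output/.git when the output directory is
# cleaned (e.g. with DELETE_OUTPUT_DIRECTORY = True), so the Pages clone survives.
OUTPUT_RETENTION = [".git"]
```

With this in place, publishconf.py (as generated by pelican-quickstart) imports everything from pelicanconf and overrides only the production settings, such as SITEURL.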

Git magic

It took me a couple of attempts to get this right, but these are the two key concepts to keep in mind in the above configuration:

When you invoke a Git command, Git progressively looks “one directory up” until it finds a valid repository (a .git directory). So it is possible to nest Git repositories and treat them independently, but you will have to:
- Tell the container repo to ignore the directory where the contained one lives.
- Move to the contained repo every time a command needs to be issued for that specific repository. Essentially, the two repositories will not even be aware of each other's existence.
Submodules are a different story: repositories marked as submodules are known to their container. In fact, commits in the main repository record which commit is checked out in each submodule. This is very convenient: given a main repository called foobar, running git clone --recursive foobar will check out all submodules correctly, too. You can read more on submodules here.
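The lookup rule in the first point is what makes the nested output/ repository work. Here is a simplified Python model of it; it is my own approximation for illustration, not git's actual implementation, and find_repo_root is a made-up name.

```python
from pathlib import Path


def find_repo_root(start):
    """Return the nearest directory at or above `start` that contains `.git`.

    A simplified model of git's repository discovery: the search stops at the
    first match, so a repository nested inside another one (e.g. output/)
    shadows the outer one for any command run from within it. For submodules
    `.git` is usually a small file rather than a directory, so any entry counts.
    Real git also honours GIT_DIR, GIT_CEILING_DIRECTORIES and filesystem
    boundaries, which this sketch ignores.
    """
    path = Path(start).resolve()
    for candidate in (path, *path.parents):
        if (candidate / ".git").exists():
            return candidate
    return None  # not inside any repository
```

Called from content/pages, it returns the main repository; called from output/ or from a submodule such as themes/, it returns that inner repository, which is why you have to move into output/ before committing the generated pages.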

If you have suggestions for a novice Pelican user, you are welcome to leave them in the comments below! :)