Bookie Weekly Update: April 22nd 2012

Another week, another few lines of code, and yay for two weeks in a row!


Not a ton here, just some CSS updates and a fix to the backup script so it pulls the INI correctly.

Bookie Parser

I spent some time cleaning up the CSS. I did some research on the most readable fonts for screens and, surprisingly, it seems that sans serif wins on digital displays. So I updated the CSS, and combined with some work on the main Bookie CSS files, that makes the readable pages a bit nicer. I've still got more cleanup to do, but it reads better now.

I also fixed the generated HTML so it no longer has the empty body tag. That was caused by the readable parsing library handing me a full HTML document of content. See the notes on that library further down for the bigger changes.

Finally, I added a form on the main page so you can try it out on a URL just by entering it. So if you're just curious what it does, go try it out!

Bookie API

I just added a ping command. It should help new users confirm that their configuration is correct. It's also a nice start on API commands that aren't admin-specific. A little bit of cleanup aside from that, but nothing major.
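To give a feel for what a ping check like this does, here's a rough sketch in Python. It isn't the actual Bookie code: the function name `ping`, the `accounts` dict, and the response messages are all made up for illustration, and I'm assuming the call only has to verify a username and API key.

```python
def ping(username, api_key, accounts):
    """Return a small status payload saying whether the credentials work.

    `accounts` is a hypothetical mapping of username -> API key that stands
    in for whatever the real server looks up.
    """
    if not username:
        return {"success": False, "message": "Missing username."}

    expected = accounts.get(username)
    if expected is None:
        return {"success": False, "message": "Unknown user."}

    if not api_key or api_key != expected:
        return {"success": False, "message": "API key does not match."}

    # Everything lines up, so the client is configured correctly.
    return {"success": True, "message": "pong"}


# Example: a new user checks their settings before saving any bookmarks.
accounts = {"admin": "s3cret"}
print(ping("admin", "s3cret", accounts))
print(ping("admin", "oops", accounts))
```

The nice part of a call like this is that it has no side effects, so a new user can hit it as often as they like while sorting out their settings.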


Currently, Bookie uses a library called decruft to parse HTML pages for the actual important article content. The bookie_parser project uses a different fork of that called readability_lxml. The author is a bit open to merging changes in and actually says she's in 'maintenance mode'. Since this is an important feature and I want a really decent library for it, I started hacking on it, and that's where my week of hacking went.

First I updated it to allow me to get back only a partial HTML document instead of an entire <html> doc. I then fixed some bugs and started cleaning up the code (adding tests, making the command line client all nice and argparse'y, etc.). In the process I noticed that there's a big branch on GitHub that adds a ton of things like multi-page document support and such. I've started to try to pull that branch into my work and the original author's code. It's a LOT of git cherry-pick and really a pain since I want to clean up the code as I go. Unfortunately, this just means that Git gets confused on future merges since the code's changed between commits. Ugh!
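The partial-document change mentioned above is easier to picture with a small example. This is only a sketch using Python's standard `html.parser`, not how readability_lxml does it (that library is built on lxml), and `BodyFragmentExtractor` and `to_fragment` are hypothetical names. It assumes the input has a <body> tag; without one it returns an empty string.

```python
from html.parser import HTMLParser


class BodyFragmentExtractor(HTMLParser):
    """Collect the markup found inside <body>, dropping the document wrapper."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # Re-emit the start tag exactly as it appeared in the source.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append("&#%s;" % name)


def to_fragment(document):
    """Return only the content inside <body>, without the <html> wrapper."""
    parser = BodyFragmentExtractor()
    parser.feed(document)
    parser.close()
    return "".join(parser.parts).strip()


full_doc = "<html><body><div><p>Hi &amp; bye</p></div></body></html>"
print(to_fragment(full_doc))
```

Getting a fragment back like this is what lets the caller drop the content straight into its own page template without ending up with a nested or empty body tag.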

I'm about halfway done, though, and I hope this will leave us with one solid library to do this parsing. I'm hoping to take over stewardship of the library as I complete this work. It should make Bookie and bookie_parser all the more awesome.

The coming week

I'm giving a talk on the YUI JavaScript library at Penguicon. This means my hacking time will be a bit less since I've got a presentation to prepare for. Next week's status report might be a bit light and boring, but hey, maybe I'll scrounge up some more beta users of Bookie while at the conference.