Bookie Status Report: Jun 15th 2011

I just finished up reading Start Small, Stay Small and there were some good points in there. One is that writing about your progress on a project each week helps you keep moving forward. There's something about putting what you've accomplished and what you plan to do out in public that keeps the motivation motor running.

In an effort to keep Bookie from stagnating, I think that's a good habit to pick up. Count this as the first in a series of weekly progress reports. I also like that it helps show, beyond links to commit logs, that Bookie is moving forward and getting updates.

This past week has been a bit crazy. There hasn't been a ton of time to spare, but I've managed to move a few big things forward:

First, work on user accounts and logins is moving forward. Basically all of the urls in the application needed to be updated, and there are now two sets. If you leave the username out of the url, you get overall, site-wide info.

In this way, a url of /recent will pull the 50 most recent bookmarks from all users on the site, while /rick/recent will only pull the 50 most recent bookmarks of the user rick. The API urls needed to be updated as well. There's a ton of work in getting this going, but it's a major step toward letting me host a version of Bookie that other users can sign up for. Since that's really the big goal I've set for myself by the end of the year, I'm feeling good about this one.
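To give a rough idea of how the two url sets can share one view, here's a minimal Pyramid-style sketch. It's simplified for the post, not the actual Bookie code: fetch_recent and recent_view are stand-in names.

    from pyramid.config import Configurator


    def fetch_recent(limit=50, username=None):
        # Stand-in for the real query: newest bookmarks first,
        # optionally filtered down to a single user's.
        return []


    def recent_view(request):
        # /recent matches no username, so this stays None and the
        # query runs site-wide; /rick/recent filters down to rick.
        username = request.matchdict.get('username')
        return {'bookmarks': fetch_recent(limit=50, username=username)}


    config = Configurator()
    # The site-wide route goes first so /recent isn't swallowed by
    # the {username} pattern.
    config.add_route('recent', '/recent')
    config.add_route('user_recent', '/{username}/recent')
    config.add_view(recent_view, route_name='recent', renderer='json')
    config.add_view(recent_view, route_name='user_recent', renderer='json')
    app = config.make_wsgi_app()

The nice part of this shape is that one view handles both cases, so the with-username and without-username urls can't drift apart.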

The idea of multiple users has me realizing that my little readability.py script, which fetches the url content for bookmarks and stores the clean, readable parsed html for each page, needs some work. It'll never scale the way it is. So I've split the work into a couple of parts.

One part is a node.js script that takes a list of urls and asynchronously goes out and fetches the html content for each. It then shoves the bookmark id and the content into a beanstalkd queue for processing. The queue is polled by a python script that passes the content and the id to a new Bookie API call. Bookie then runs the parsing code against the content and stores the result in the database. The async code in node.js can fetch the html content in a hurry. In testing with my SSD and sqlite, I'm able to pull, process, and store more than one url per second, with one node.js producer and two running instances of the python consumer.
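The consumer side is essentially a loop: reserve a job, hand the payload to the API, and delete the job once Bookie has taken the content. Here's a rough sketch using the beanstalkc client (Python 2 era); the api url and payload keys are illustrative, not the real Bookie API.

    import json
    import urllib2

    import beanstalkc

    # Hypothetical endpoint; the real api url and payload keys differ.
    API_URL = 'http://localhost:6543/api/v1/bmarks/readable'

    beanstalk = beanstalkc.Connection(host='localhost', port=11300)

    while True:
        # reserve() blocks until the node.js producer pushes a job.
        job = beanstalk.reserve()
        payload = json.loads(job.body)  # e.g. {'bid': ..., 'content': ...}

        # Hand the raw html to Bookie, which parses and stores it.
        req = urllib2.Request(API_URL, json.dumps(payload),
                              {'Content-Type': 'application/json'})
        urllib2.urlopen(req)

        # Only drop the job once Bookie has accepted the content.
        job.delete()

Since the job is only deleted after the API call succeeds, a crashed consumer just means beanstalkd hands the job to another instance, which is what makes running two consumers at once safe.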

I'm definitely looking forward to ramping this up on a real server with Postgresql running. I'd love to be able to pull down and parse content fast enough to cope with new users signing up for the service.

So that's this week's report. Next up is more work on the multi-user setup. The tag urls still need work, and all of the unit tests I had need to be updated to cover the new urls. That also means some near-duplicate tests to check the urls both with and without usernames, something like the sketch below. Work is never done!
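For a taste of what those doubled-up tests look like, here's a sketch using WebTest against the wsgi app. The make_bookie_app factory is just a stand-in for however the app gets built in the test harness.

    import unittest

    from webtest import TestApp


    class TestRecentUrls(unittest.TestCase):

        def setUp(self):
            # make_bookie_app is a stand-in for the real app factory.
            self.app = TestApp(make_bookie_app())

        def test_recent_sitewide(self):
            # No username: the 50 most recent bookmarks, all users.
            res = self.app.get('/recent')
            self.assertEqual(res.status_int, 200)

        def test_recent_for_user(self):
            # Username in the url: only rick's recent bookmarks.
            res = self.app.get('/rick/recent')
            self.assertEqual(res.status_int, 200)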

If you care to help or follow along, make sure to follow the project on github: http://github.com/mitechie/Bookie