Migrating Drupal comments to Disqus

After learning about Disqus this week, I became quite enamored of it. Drupal's comment system is swell and all, but I know that I don't have time to set it up to be as nifty as it could be. So I think it's swell if someone wants to take it off my hands for free. The downside is that if I upgrade to D7, I wouldn't get those swell forthcoming RDFa comments. But there's a Disqus API, so I can solve that later if it's important to me.

The problem is that having both Disqus and comments enabled makes things confusing. Each post has a link at the bottom with '13 comments 13 comments and 0 Reactions' or something similar. Not swell. I don't want to abandon my existing comments--in particular, my Puppet vs Chef and SPARQL posts got good feedback from their respective communities. The Puppet post is #1 on Google for 'puppet vs chef' largely because the authors of both projects commented there. Hopefully, Disqus would let people track those authors to comments on my site in the future.

So the obvious solution is to just import my comments into Disqus! I'll just download some module or converter and...and...nuts. WordPress and Blogger have the goods, but not Drupal. So there went another afternoon: I wrote a basic importer in Ruby. Ruby is where the good Disqus library was (even if I did have to fix a bug).

Drupal comments have features that do not map cleanly to Disqus, so the importer has weaknesses. All the imported comments are anonymous, and I really miss markdown. But it worked for me. Hopefully it helps someone else, too.

Rapid and Incremental Infrastructure Development with Puppet

So pretty much everyone has at least heard of Puppet by now. And yes, it's awesome. But it can be daunting to get started--generally speaking, configuring something with Puppet takes longer than just doing it, and that means that setting something up requires, if not Big Design Up Front, some Big Work Up Front. You need a puppetmasterd, and some servers, and some config, and it gets complicated quickly.

What I want is a simple testbed, a dry run setup, where I can run my code repeatedly, just like during normal development. I want to develop incrementally and flexibly. Can it be easier? Of course it can, that's why we have clouds. I've been working on some infrastructure for a new project, and the workflow I'm using is easy and effective. I just boot a community EC2 AMI, get git, and pull down my puppet repo. That repo has a handy script that installs ruby, gem installs puppet (every distro is using a dated version--forget them), and then I'm ready to go.

The directory structure, like that of most Puppet installations, looks like this:

- etc
  - puppet
    - setup_puppet.sh
    - modules
      - (the goods)
    - manifests
      - nodes.pp
      - site.pp

From here, it's easy. puppet manifests/site.pp will run the config locally, without a server or any other trouble. In my development, I have a branch for the actual puppetmaster, which changes nodes.pp from a default node that includes everything into something more meaty. Everything else is the same. From here, I can hack away, testing as I go. Add a --noop to do dry runs. Add -d to enable debug mode and see exactly what commands are run.

The setup script is dirt simple:

#!/usr/bin/env bash

# libopenssl-ruby1.8 isn't necessarily required with this, 
# but you do need it for the puppetmaster server.
sudo apt-get install -y rubygems libopenssl-ruby1.8

sudo gem sources --remove http://gems.rubyforge.org/
sudo gem sources --add http://rubygems.org 
sudo gem install puppet --no-rdoc --no-ri

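# Debian puts gem executables in a directory that is not on the default PATH.
# Note: this export only lasts beyond the script if the script is sourced.
export PATH="$PATH:/var/lib/gems/1.8/bin"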

4 hours of very productive infrastructure work cost me about 35 cents. No puppetmasterd, no existing servers, and no temptation to store meaningful config on the local disk (since I shut these instances down after a few hours, as I'm easily distracted). No messing around. I really like this workflow.

As an aside, this workflow is fast enough that two Ubuntu gotchas are now actually a problem for me. Firstly, the official AMIs from Canonical now require the initial login to be via the 'ubuntu' user, which is a pain, because now root can't effectively git clone anything without more hoops. Secondly, Rubygems is broken on Debian. It flatly refuses to run gem update --system, and when you force it to with the rubygems-update gem, it manages to lose track of all installed gems, including Puppet. Since any Puppet run gathers all information before doing anything, it will read that Puppet is installed, and any code that runs the gem update won't understand that Puppet now needs to be reinstalled. I'm not sure this can be worked around in Puppet at all; it might have to be out of band.

This is a well-known issue that's more than 2 years old, and I can't find why it is the way it is. Google results are too full of people working around the problem to find an actual discussion of the original issue. Rubygems.org has a FAQ on the issue, but I did find the Debian issue in which the update command appears to have been disabled instead of simply fixing the problem. I'm not sure if this has to do with a wonky directory setup, or if Debian just assumes I'd put an eye out with that much Power. Either way, tons of people have problems with it, and it seems curious to me that Debian has decided they should suffer. And it is a decision: Rubygems has been updated a number of times since 2007, and the disabling code had to be explicitly upgraded at least once that I found.

Anyways, the next bit of agile infrastructure work I do will be on a RightScale CentOS 5.4 AMI. What's the point of being agile if you don't try new things?

Updated: CentOS 5.4 is running Ruby 1.8.5. I guess being agile is about trying hopelessly outdated things, too.

Hacking on RDF in Ruby

RDF.rb is easily the most fun RDF library I've used. It uses Ruby's dynamic system of mixins to create a library that's very easy to use.

If you're new at Ruby, you might know about mixins in other languages--Scala traits, for example, are almost exactly functionally equivalent. They're distinctly more powerful than Java interfaces or abstract classes. A mixin is basically an interface and an abstract class rolled into one. Rather than extending an abstract class, you include a mixin into your own class. A mixin will usually require that a given class implement a particular method. Ruby's own Enumerable module, for example, requires that including classes implement #each. For that tiny bit of trouble, you get a ton of methods (listed here), including iterators, mapping, partitions, conversion to arrays, and more. (If you're new to Ruby, it might also help you to know that #method_name means 'an instance method named method_name').
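To make the Enumerable contract concrete, here is a minimal sketch. Playlist is a made-up class for illustration only; the one method it writes by hand is #each, and everything called at the bottom comes from Ruby's built-in Enumerable.

```ruby
# Playlist is a hypothetical example class, not part of any library.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # Enumerable's only requirement: yield each element in turn.
  def each(&block)
    @songs.each(&block)
    self
  end
end

playlist = Playlist.new("Intro", "Anthem", "Coda")

# None of these methods are defined in Playlist; they all build on #each.
playlist.map { |s| s.upcase }         # => ["INTRO", "ANTHEM", "CODA"]
playlist.select { |s| s.length > 4 }  # => ["Intro", "Anthem"]
playlist.partition { |s| s < "B" }    # => [["Anthem"], ["Intro", "Coda"]]
playlist.to_a                         # => ["Intro", "Anthem", "Coda"]
```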

RDF.rb uses the principle extensively. RDF::Repository is, in fact, little more than an in-memory reference implementation for 4 mixins: RDF::Enumerable, RDF::Mutable, RDF::Queryable, and RDF::Durable. RDF::Sesame::Repository has the exact same interface as the in-memory representation, but is based entirely on a Sesame server. In order to work as a repository, RDF::Sesame::Repository only had to extend the reference implementation and implement #each, #insert_statement, and #delete_statement. Nice! Of course, implementing those took some doing, but it's still exceedingly easy.

RDF::Enumerable is the key here. In exchange for implementing an #each that yields RDF::Statement objects, one gains a ton of functionality: #each_subject, #each_predicate, #each_object, #each_context, #has_subject?, #has_triple?, and more. It's a key abstraction that provides huge amounts of functionality.
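The trick is easier to see in miniature. The sketch below is a toy imitation and not RDF.rb's actual code: Statement, ToyEnumerable, and ArrayStore are invented names, and the real methods surely handle more cases. It only shows how a mixin can derive statement-level helpers from a single #each.

```ruby
# Toy stand-ins, NOT the real RDF.rb classes.
Statement = Struct.new(:subject, :predicate, :object)

module ToyEnumerable
  include Enumerable # the including class must supply #each

  # Yield each distinct subject once.
  def each_subject
    return enum_for(:each_subject) unless block_given?
    map { |st| st.subject }.uniq.each { |s| yield s }
  end

  def has_subject?(value)
    any? { |st| st.subject == value }
  end

  # triple is a plain [subject, predicate, object] array.
  def has_triple?(triple)
    any? { |st| st.to_a == triple }
  end
end

# A minimal "store": the only thing it implements is #each.
class ArrayStore
  include ToyEnumerable

  def initialize(statements)
    @statements = statements
  end

  def each(&block)
    @statements.each(&block)
  end
end

store = ArrayStore.new([
  Statement.new(:alice, :knows, :bob),
  Statement.new(:alice, :name, "Alice"),
  Statement.new(:bob, :name, "Bob")
])

store.each_subject.to_a                    # => [:alice, :bob]
store.has_subject?(:bob)                   # => true
store.has_triple?([:bob, :knows, :alice])  # => false
```

Swap ArrayStore for something backed by a file or a remote server and the helper methods come along unchanged, which is the whole point.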

But the module system goes the other way--not only is it easy to implement new RDF models, existing ones are easily extended. I recently wrote RDF::Isomorphic, which extends RDF::Enumerable with #bijection_to and #isomorphic_with? methods. The module-based system provided by RDF.rb means that my isomorphic methods are now available on RDF::Sesame::Repositories, and indeed anything which includes RDF::Enumerable. This is everything from Repositories to Graphs to query results! In fact, query results themselves implement RDF::Enumerable, and thus implement RDF::Queryable and can be checked for isomorphism, or whatever else you want to add. This is functionality that Sesame does not have natively, and which I wrote for a completely different purpose (testing parsers). Every RDF::Enumerable gets it for free because I wanted to compare 2 textual formats. Neat!
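For a feel of what a bijection between two graphs means, here is a deliberately naive sketch. It is not the algorithm RDF::Isomorphic uses; it simply tries every pairing of blank nodes, which is exponential and only sensible for tiny graphs. Triples are plain arrays, and treating symbols that start with an underscore as blank nodes is a convention invented for this example.

```ruby
require 'set'

# Hypothetical convention: a blank node is a symbol starting with "_".
def blank?(term)
  term.is_a?(Symbol) && term.to_s.start_with?("_")
end

def blank_nodes(triples)
  triples.flatten.select { |t| blank?(t) }.uniq
end

# Returns a Hash mapping blank nodes of a to blank nodes of b such that
# renaming turns a into b, or nil when no such mapping exists.
def naive_bijection(a, b)
  a_nodes = blank_nodes(a)
  b_nodes = blank_nodes(b)
  return nil unless a.uniq.size == b.uniq.size && a_nodes.size == b_nodes.size

  target = b.to_set
  # Brute force: try every ordering of b's blank nodes.
  b_nodes.permutation.each do |perm|
    mapping = Hash[a_nodes.zip(perm)]
    renamed = a.map { |triple| triple.map { |t| mapping.fetch(t, t) } }
    return mapping if renamed.to_set == target
  end
  nil
end

g1 = [[:_x, :knows, :_y], [:_y, :name, "Bob"]]
g2 = [[:_p, :knows, :_q], [:_q, :name, "Bob"]]
g3 = [[:_p, :knows, :_q], [:_p, :name, "Bob"]]

naive_bijection(g1, g2)  # => {:_x=>:_p, :_y=>:_q}
naive_bijection(g1, g3)  # => nil
```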

For example, here's what it takes to extend any RDF collection, from RDF::Isomorphic:

require 'rdf'
module RDF
  ##
  # Isomorphism for RDF::Enumerables
  module Isomorphic

    def isomorphic_with?(other)
      # code that uses #each, or any other method from RDF::Enumerable goes here
      # ...
    end

    def bijection_to(other)
      # code that uses #each, or any other method from RDF::Enumerable goes here
      # ...
    end
  end

  #  re-open RDF::Enumerable and add the isomorphic methods
  module Enumerable 
    include RDF::Isomorphic
  end
end

Of course, this just can't be done without monkey patching. Mixins and monkey patching together make for a powerful toolkit. To my knowledge, this is the first RDF library that takes advantage of these features.
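Here is the re-opening pattern on its own, as a small sketch with invented names (Collection and Shelf are hypothetical and have nothing to do with RDF.rb). This version defines the new method directly inside the re-opened module, so objects that already exist pick it up. As far as I know, including a second module into an already-included module only reaches existing includers on Ruby 3.0 and later, so the direct definition is the more portable variant.

```ruby
# Hypothetical mixin, standing in for something like RDF::Enumerable.
module Collection
  # Count elements using nothing but the includer's #each.
  def size_via_each
    count = 0
    each { count += 1 }
    count
  end
end

class Shelf
  include Collection

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
  end
end

shelf = Shelf.new(:a, :b, :c) # created BEFORE the patch below

# Monkey patch: re-open the module later and add a method to it.
module Collection
  def same_size_as?(other)
    size_via_each == other.size_via_each
  end
end

# The pre-existing object sees the new method immediately.
shelf.same_size_as?(Shelf.new(1, 2, 3))  # => true
shelf.same_size_as?(Shelf.new(1))        # => false
```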

It's possible to provide powerful features to a wide range of implementations with this. RDF.rb does not yet have an inference layer, but any such layer would instantly work for any store which implements RDF::Enumerable. Want to prototype some custom business logic that operates over existing RDF data? Copy it into a local repository and hack away. No need for the production RDF store to be the same at all, but you can still apply the same code.

As a counter-example, compare this to the Java RDF ecosystem. There are some excellent implementations (RDF::Isomorphic is heavily indebted to Jena), but they're all incompatible. Jena's check for isomorphism is not really translatable to Sesame, or anything else. RDF.rb, in addition to providing a reference implementation, acts as an abstraction layer for underlying RDF implementations. The difference is night and day--with RDF.rb, you only need to implement a feature once, at the API layer, to have it apply to any implementation. This is not a knock at the very talented people behind those Java implementations; making this happen is a lot of work in a language without monkey patching, and RDF.rb is only as good as it is because of the significant influences those projects have been on Arto's design.

The end result of the mixin-based approach is a system that is incredibly easy to extend, and just downright fun. It would be a fairly simple task to extend a Ruby class completely unrelated to RDF with an #each method that yields statements, allowing it to include RDF::Enumerable. Voila, your existing classes now have an RDF representation. Along the same lines, if one is bothered by the statement-oriented nature of RDF.rb, building a system which took a resource-oriented view would not require one to 'break away' from the RDF.rb ecosystem. Just build your resource-oriented model objects and implement #each, and away you go--you can now run RDF queries and test isomorphism on your model. Build it to accept an RDF::Enumerable in the constructor and you can use any existing repository or query to initialize your model.
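A rough sketch of both ideas, under loud assumptions: plain three-element arrays stand in for RDF statements, Ruby's built-in Enumerable stands in for RDF::Enumerable, and Person and Resource are invented classes. With the real library you would yield RDF::Statement objects and include RDF::Enumerable instead.

```ruby
# An ordinary domain class that knows nothing about RDF,
# given an #each that yields [subject, predicate, object] arrays.
class Person
  include Enumerable # stand-in for RDF::Enumerable

  attr_reader :id, :name, :friends

  def initialize(id, name, friends = [])
    @id = id
    @name = name
    @friends = friends
  end

  # Yield one triple per fact we know about this person.
  def each
    return enum_for(:each) unless block_given?
    yield [id, :name, name]
    friends.each { |friend| yield [id, :knows, friend] }
  end
end

# A resource-oriented view, built from anything that yields triples.
class Resource
  attr_reader :subject, :properties

  def initialize(subject, triples)
    @subject = subject
    @properties = Hash.new { |hash, key| hash[key] = [] }
    triples.each do |subj, pred, obj|
      @properties[pred] << obj if subj == subject
    end
  end
end

alice = Person.new(:alice, "Alice", [:bob, :carol])

alice.to_a
# => [[:alice, :name, "Alice"], [:alice, :knows, :bob], [:alice, :knows, :carol]]
alice.select { |_, pred, _| pred == :knows }.map { |t| t.last }
# => [:bob, :carol]

view = Resource.new(:alice, alice)
view.properties[:knows]  # => [:bob, :carol]
view.properties[:name]   # => ["Alice"]
```

Because Resource only asks its argument for #each, it would accept a repository or a query result just as happily as a Person.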

RDF.rb is not yet ready for production use, but it's under heavy development and already quite useful. Give it a shot. You can post any issues in the GitHub issue queue.
