RDF.rb is easily the most fun RDF library I've used. It uses Ruby's dynamic system of mixins to create a library that's very easy to use.
If you're new to Ruby, you might know about mixins from other languages--Scala traits, for example, are almost exactly functionally equivalent. They're distinctly more powerful than Java interfaces or abstract classes. A mixin is basically an interface and an abstract class rolled into one. Rather than extending an abstract class, you include a mixin in your own class. A mixin will usually require that a given class implement a particular method. Ruby's own Enumerable module, for example, requires that including classes implement #each. For that tiny bit of trouble, you get a ton of methods (listed here), including iterators, mapping, partitions, conversion to arrays, and more. (If you're new to Ruby, it might also help you to know that #method_name means 'an instance method named method_name'.)
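To make the mixin idea concrete, here is a minimal sketch using Ruby's built-in Enumerable. The Playlist class and its contents are made up for illustration; only Enumerable and the methods it provides are real.

```ruby
# A hypothetical collection class: it defines only #each,
# and Enumerable supplies the rest.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # The one method Enumerable requires of us.
  def each
    return enum_for(:each) unless block_given?
    @songs.each { |song| yield song }
    self
  end
end

playlist = Playlist.new("Alpha", "Beta", "Gamma")

# None of these methods are defined in Playlist; they all come from Enumerable.
upcased = playlist.map(&:upcase)
short, long = playlist.partition { |song| song.length < 5 }
as_array = playlist.to_a
```

Everything after the class definition works purely because #each exists, which is the whole trade the mixin offers.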
RDF.rb uses the principle extensively. RDF::Repository is, in fact, little more than an in-memory reference implementation for 4 traits: RDF::Enumerable, RDF::Mutable, RDF::Queryable, and RDF::Durable. RDF::Sesame::Repository has the exact same interface as the in-memory representation, but is based entirely on a Sesame server. In order to work as a repository, RDF::Sesame::Repository only had to extend the reference implementation and implement #each, #insert_statement, and #delete_statement. Nice! Of course, implementing those took some doing, but it's still exceedingly easy.
RDF::Enumerable is the key here. For implementing an #each that yields RDF::Statement objects, one gains a ton of functionality: #each_subject, #each_predicate, #each_object, #each_context, #has_subject?, #has_triple?, and more. It's a key abstraction that provides huge amounts of functionality.
But the module system goes the other way--not only is it easy to implement new RDF models, existing ones are easily extended. I recently wrote RDF::Isomorphic, which extends RDF::Enumerable with #bijection_to and #isomorphic_with? methods. The module-based system provided by RDF.rb means that my isomorphic methods are now available on RDF::Sesame::Repositories, and indeed anything which includes RDF::Enumerable. This is everything from Repositories to Graphs to query results! In fact, query results themselves implement RDF::Enumerable, and thus implement RDF::Queryable and can be checked for isomorphism, or whatever else you want to add. This is functionality that Sesame does not have natively, and which I wrote for a completely different purpose (testing parsers). Every RDF::Enumerable gets it for free because I wanted to compare 2 textual formats. Neat!
For example, here's what it takes to extend any RDF collection, from RDF::Isomorphic:
    require 'rdf'

    module RDF
      ##
      # Isomorphism for RDF::Enumerables
      module Isomorphic
        def isomorphic_with?(other)
          # code that uses #each, or any other method from RDF::Enumerable goes here
          ...
        end

        def bijection_to(other)
          # code that uses #each, or any other method from RDF::Enumerable goes here
          ...
        end
      end

      # re-open RDF::Enumerable and add the isomorphic methods
      module Enumerable
        include RDF::Isomorphic
      end
    end
Of course, this just can't be done without monkey patching. Mixins and monkey patching together make for a powerful toolkit. To my knowledge, this is the first RDF library that takes advantage of these features.
It's possible to provide powerful features to a wide range of implementations with this. RDF.rb does not yet have an inference layer, but any such layer would instantly work for any store which implements RDF::Enumerable. Want to prototype some custom business logic that operates over existing RDF data? Copy it into a local repository and hack away. No need for the production RDF store to be the same at all, but you can still apply the same code.
As a counter-example, compare this to the Java RDF ecosystem. There are some excellent implementations (RDF::Isomorphic is heavily in debt to Jena), but they're all incompatible. Jena's check for isomorphism is not really translatable to Sesame, or anything else. RDF.rb, in addition to providing a reference implementation, acts as an abstraction layer for underlying RDF implementations. The difference is night and day--with RDF.rb, you only need to implement a feature once, at the API layer, to have it apply to any implementation. This is not a knock at the very talented people behind those Java implementations; making this happen is a lot of work in a language without monkey patching, and RDF.rb is only as good as it is because of the significant influences those projects have been on Arto's design.
The end result of the mixin-based approach is a system that is incredibly easy to extend, and just downright fun. It would be a fairly simple task to extend a Ruby class completely unrelated to RDF with an #each method that yields statements, allowing it to work in RDF::Enumerable. Voila, your existing classes now have an RDF representation. Along the same lines, if one is bothered by the statement-oriented nature of RDF.rb, building a system which took a resource-oriented view would not require one to 'break away' from the RDF.rb ecosystem. Just build your resource-oriented model objects and implement #each, and away you go--you can now run RDF queries and test isomorphism on your model. Build it to accept an RDF::Enumerable in the constructor and you can use any existing repository or query to initialize your model.
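Here is a rough sketch of that idea. To keep it self-contained it does not load RDF.rb at all: Statement and StatementEnumerable are hypothetical stand-ins for RDF::Statement and RDF::Enumerable, and Person is an invented model class. Only the shape of the pattern is the point: implement #each, get the rest from the mixin.

```ruby
require 'set'

# Stand-in for RDF::Statement (an assumption, not the RDF.rb class).
Statement = Struct.new(:subject, :predicate, :object)

# Stand-in for RDF::Enumerable: everything here is built on #each.
module StatementEnumerable
  include Enumerable

  # Yields each distinct subject once.
  def each_subject
    return enum_for(:each_subject) unless block_given?
    seen = Set.new
    each { |statement| yield statement.subject if seen.add?(statement.subject) }
  end

  def has_subject?(subject)
    any? { |statement| statement.subject == subject }
  end
end

# A resource-oriented model object that knows nothing about triple stores.
class Person
  include StatementEnumerable

  def initialize(uri, name, email)
    @uri, @name, @email = uri, name, email
  end

  # The only method the mixin needs from us.
  def each
    return enum_for(:each) unless block_given?
    yield Statement.new(@uri, "foaf:name", @name)
    yield Statement.new(@uri, "foaf:mbox", @email)
  end
end

person = Person.new("http://example.org/alice", "Alice", "mailto:alice@example.org")
subjects = person.each_subject.to_a
predicates = person.map(&:predicate)
```

With the real library, the class would include RDF::Enumerable and yield RDF::Statement objects instead of these stand-ins.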
RDF.rb is not yet ready for production use, but it's under heavy development and already quite useful. Give it a shot. You can post any issues in the GitHub issue queue.
The W3C SPARQL working group (previously the Data Access Working Group) has recently released their first versions of the updated SPARQL standards, or SPARQL 1.1. The group's roadmap has these finalized a year from now, but they have asked for comments and I suppose these are mine.
I believe that these documents are a step further down a wrong path for SPARQL and, to a lesser degree, for RDF in general.
The latest round of drafts includes a number of changes to SPARQL, including aggregate functions, subqueries, projection expressions, negations, updates and deletions, more specific HTTP protocol bindings, service discovery, entailment regimes, and a RESTful protocol for managing RDF graphs (the last one is not really just SPARQL, but it's in the updates).
So I'll start with my comments, which are mostly critical.
To start, an RDF-specific complaint, not really related to the rest of the post. Why would the one mandated format to be supported in the new RESTful RDF graph management interface be RDF/XML? What would it take for the semweb community to move on from this failed standard, which has had known issues for more than 5 years? (those two issues were raised in 2001 and are currently marked 'postponed') Why should such an increasingly irrelevant standard as RDF/XML be chosen instead of the widely-supported and easy to implement N3, N-Triples, or Turtle?
As for SPARQL, the 1.1 standards continue to give named graphs first class citizen status, both in the web APIs and in more SPARQL syntax than they had before. It's not so much triples as quads these days. Other meta-metadata, such as time of assertion or validity time, are not covered. While named graphs are admittedly a particularly often-found case, why do they need to invade the syntax of SPARQL? Not every use case needs named graphs, but every SPARQL implementor must support them. The 1.1 standard now includes precedence rules for named graph and base URIs when they conflict between the HTTP query options and the query itself, attempting to solve this self-created problem.
How about subqueries? What about variables during insertions? What about subqueries during insertions? Do we really need implementors to consider these kinds of things for every SPARQL endpoint on the web?
None of these things is really all that bad by itself, but one must consider the bigger picture. SPARQL 1.0 was released in January of 2008 (with some comment period before that) and there is still no implementation of a SPARQL engine in PHP or Ruby (exceptions apply, see [1]). One does not increase the participation of that ecosystem by adding a selection of entailment regimes to the standard.
While a SPARQL implementation exists for the excellent RDFLib in Python, it's only one of the current big 3 (with Ruby and PHP) in web development, and there's only one. The fact that no SPARQL engines exist for Ruby or PHP should be considered a failure of the standard. Why are we adding complexity when there is no SQLite for SPARQL? Why are there at least 3 monolithic Java implementations (Jena, Sesame, Boca), all financially sponsored to some degree or another, but so little 'in the wild'? How long can RDFLib herd 16 cats as committers on the project? While I don't have a lot of direct experience with RDFLib, I pity the project 'leads' (I cannot find evidence that the project is sponsored or that anyone is 'in charge') trying to look towards the future of implementing 6 working papers of new standards.
One of the biggest success stories for semweb in widespread use is the Drupal RDF module, which has found wide acceptance in the Drupal community and started an ecosystem of modules. Drupal 7 will output RDFa by default and Drupal 6 supports a ton of wonderful features, including reversing the RSS 1.0 to 2.0 downgrade back to RDF. But Drupal remains a producer of simple triples and a consumer of SPARQL queries generated by other endpoints. Data in those sites remains locked down. Why? Because implementing SPARQL in PHP is nontrivial, and in a chicken-egg problem, nobody's paying for it before someone has a need for SPARQL.
I could go on, but these are symptoms (well, not that RDF/XML thing, I don't think there's a good reason for that). I feel that the working group is attempting to solve the wrong problem. Namely, it is attempting to define a somewhat-human-readable query language, SPARQL, that works for almost all use cases. But why must the whole 'kitchen sink' be well-defined? Such a standards body should be attempting to define the easiest possible thing to implement and extend, not the last tool anyone would ever use.
The SPARQL 1.0 standard's grammar was well-defined as a context free grammar. It also had extension functions, which were uniquely defined by URIs. Why the distinction between CFG elements and extension functions? Why not make syntax elements like named graphs and aggregate functions as discoverable as extensions? Well, the reason is that it's hard to write a parser of a human-readable format and make those things optional and discoverable. (Here's a SPARQL parser implementation in Scala, a language with powerful pattern matching features for good parsing, and it's 500 lines of code. It compiles to S-expressions, the parsing of which is about 30 lines. Hmm.)
If the protocol had been defined as S-expressions, the distinction would not exist and the syntax could be as expandable as the current functions (the current syntax would just be more functions). The new 1.1 service discovery mechanism is excellent and extendible and would allow the standard to grow dynamically instead of becoming bogged down in features for particular use cases. New baseline implementations of SPARQL would be easy to implement and grow incrementally, and the current human-readable format can be implemented in terms of these expressions.
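To give a sense of how small the machine-facing side could be, here is a sketch of an S-expression reader and writer in Ruby. The query at the bottom is invented for illustration: the operator names (select, bgp, triple) are assumptions, not taken from any SPARQL specification or existing engine, and the reader deliberately ignores quoted strings containing whitespace.

```ruby
# Parentheses become their own tokens; everything else splits on whitespace.
def tokenize(source)
  source.gsub("(", " ( ").gsub(")", " ) ").split
end

# Consumes tokens from the front and returns nested arrays of symbols.
def read_expr(tokens)
  raise ArgumentError, "unexpected end of input" if tokens.empty?
  token = tokens.shift
  case token
  when "("
    list = []
    until tokens.first == ")"
      raise ArgumentError, "missing )" if tokens.empty?
      list << read_expr(tokens)
    end
    tokens.shift # drop the closing ")"
    list
  when ")"
    raise ArgumentError, "unexpected )"
  else
    token.to_sym
  end
end

def parse_sexp(source)
  tokens = tokenize(source)
  expr = read_expr(tokens)
  raise ArgumentError, "trailing input" unless tokens.empty?
  expr
end

# Serialization is the same walk in reverse.
def to_sexp(expr)
  expr.is_a?(Array) ? "(#{expr.map { |e| to_sexp(e) }.join(' ')})" : expr.to_s
end

source = "(select (?name) (bgp (triple ?person foaf:name ?name)))"
query = parse_sexp(source)
round_trip = to_sexp(query)
```

Because the parsed form is just nested arrays, an engine could look at the first element of each list and decide whether it supports that operator, which is the kind of discoverability the extension functions already have.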
The web of ontologies has grown with ad-hoc definitions created by people to fill their needs. Standards grow organically around the ones that are needed most; others languish. Why should SPARQL functions have this kind of flexibility, but not the syntax? The distinction makes implementation overly difficult and is slowing the expansion of the Semantic Web.
In fact, it turns out that Jena has been parsing to S-expressions for some time. If you're an implementor, why would you do it any other way, especially when the standard can change as much as it does in 1.1? Any implementation will have to come up with something equivalent to S-expressions if you are going to be able to upgrade your engine implementation to meet standards like this when they are finalized. If people are doing it anyway, why not just make it the standard?
The SPARQL Working Group should be working on a definition for a function list and discovery protocol for S-expressions, and not for what we currently call SPARQL. What we call SPARQL is something that should compile to a simpler standard if various vendors want to implement it. S-expressions allow maximally simple parsing, maximally simple serialization, and the ability to do feature discovery on core features of the language, not just portions which are blessed with the ability to be extended. S-expressions are easier for machines to generate for a wide variety of automated use cases, far wider, I would venture, than the set of use cases for the human-readable queries.
Please, please, please do not doom the world to write the SPARQL equivalent of SQLAlchemy and ActiveRecord for the next 20 years! We can define a standard that machines can use natively. Now's the time.
At any rate, that's my beef in a nutshell. The working group won't come up with a successful standard until it's easy enough to implement it that workable implementations appear in the languages that are defining the web today. And when people can use those languages to implement that standard without an army of VC-funded engineers.
The SPARQL 1.1 proposals make the standard better than before, but it's not the standard we need. The SPARQL algebra is what needed expansion and specification, not the syntax.
[1]: The PHP ARC project has an implementation, but it attempts to directly convert SPARQL to an SQL query on a particular table layout in MySQL, and is difficult to convert to general use. Despite SPARQL's complexity, ARC managed to implement this in just 6400 lines of code. The parser alone is 2000 lines and the engine another 4400. The serialization/parsing libraries, however, are fine, and were integrated successfully into the Drupal RDF module. The PHP RAP project has also done some good work and is perhaps more wrappable than ARC, but implements only a subset of SPARQL.
scor did the hard work on putting together a presentation on RDFa for the keynote at DCDC. Just like last year, I did the legwork on the screencast while smarter, more productive people did the hard parts. It would seem that I am the voice of RDF in Drupal.
But alas, due to timeframe it was not to be. Fortunately, Boris played a version during his RDFa presentation at Drupalcon DC 2009, and scor ran two BoFs which I couldn't really participate in, having to give Openband's presentation on Thursday and being cut down by food poisoning on Friday.
Now scor did a writeup on the semweb group, and Dries noted it on his blog. I won't belabor the point when others have already worded it better by reiterating it all here, but the more exposure RDFa gets, the better! More credits for the work are in both of those links. You can view the video in the above links flashy style, and I've also put up the higher-res original here.
We finished the video for Dries' keynote just under the wire, as pretty much all such events need to be. Arto, Miglius and I had stayed up until past sunup for the last few days to make it happen. First Dan left, and then Miglius left on Saturday morning so that he could get stuck in Frankfurt for 24 hours. Once he got in to Boston, he logged on quick like a bunny and went back at it. Arto and I worked another 30-odd hours during Saturday and Sunday. Sometime during Monday, which I largely slept through, some of the office folk sent out a message noting that our pile of pizza remains, chicken bones and coffee stains was not particularly helpful to the kitchen's ambiance. I don't think they know who did it, and I'm kind of afraid to fess up. Sorry, ladies.
Unlike most demo work, a ton of what went into this will be useful later. If our organizations were not keen on using RDF, we'd not have worked on this so hard. Arto's module stuff is anything but smoke and mirrors, and we figured out a lot of limitations to Exhibit and Potluck that will be important to understand later. These are now posted in our internal wiki and I will go and post them on the Simile project's site if I ever get a chance. It's worth a whole post in and of itself.
While Arto busied himself turning Drupal into the world's easiest to use RDF endpoint, Miglius and I combed datasets that would make for a decent demo and messed with Exhibit views. There's a lot of RDF data out there, but it doesn't all lend itself to being shown on a map, and people can only read so much on a video screen during a presentation. At the end of the day, I'm the only one with Leopard (and thus Screenflow), so I ended up doing the actual screencast.
Screencasting is an interesting thing. It's easier to script than a regular movie, but difficult to properly realize. There's a fine line between too little and too much data, without having awkward pauses and without skipping over too much. You have to take into account that different viewers have different levels of experience with the material, different reading speeds, whatever. I made a detailed narration that was a bit too fast paced for the keynote; that wasn't a problem, as Dries had already communicated that he'd prefer to do the narration himself.
On Monday, Arto and I woke up about a half hour before the talk and got on IM. As the talk began, we realized that we really needed to have this data up where people could get it. And we really wanted them to be able to get it--we'd worked ridiculous hours on this thing. So that's when we decided the site needed to be public.
We started to make that happen. There was a fair bit of configuration to be done to make it useful; Arto got the video onto s3 while I messed about with some permissions and redirects. I typoed just about everything I did related to that--I don't think I did a single thing once. Halfway through the whole thing I realized I had stage fright; I couldn't type because my hands were shaking. The video I had worked so hard on was about to be placed up to awe or bore a sizable number of people, on whom much depends. And there was still a possibility that Dries would use my narration, in my mind, as we'd given him the final cut of the video with extremely little time to rehearse anything he wanted to say. So there I was, still in bed, with the door shut and the window blocking out what passes for sunshine in Stuttgart, and I was nervous as hell about being up in front of a crowd.
Stage frightened of nobody at all. What a cool world we live in, that such a feeling can now be transferred over the wire.
Anyways, we did a good job (well, mostly Arto did a good job) of getting the video out there for anyone who wanted it, and at least a couple of people did. Here's another copy, if you're curious: