The Promising Future of the Unlicense

So the public launch of the unlicense movement on January 1st has gone better than expected. Arto's post hit the top of Reddit's 'most controversial' list for a while, and unlicense.org itself is seeing decent traffic. Since the site's target audience is developers, even a few thousand visitors would be a good number, and that number is being handily beaten.

There are three main concerns that keep appearing in discussions about releasing software into the public domain. I'd like to briefly respond to each of them, and then give an example of why the public domain might be what you're looking for for your software.

Concern 1: Only the GPL preserves Freedom!

This amazes me. There seems to be an incredible amount of worry that faceless, Cthulhu-like corporate entities are lurking in the shadows, just waiting to pounce on vulnerable code unprotected by the GPL's armor. These faceless horrors have crushed promising startup projects with vulnerable licenses like Apache (the world's most-installed web server), SQLite (the world's most widely deployed SQL database engine, and public domain, not actually licensed at all, by the way), BIND (the world's most-installed DNS server), and other sad stories. I find this concern so unrealistic that it boggles the mind.

But just for fun, let's start a real flame war. Numerous folks out there claim, with a certain sort of correctness, that the GPL keeps software free from the lockdown of derivative works. This is called Freedom (capital F). However, that software's Freedom is enforced by copyright, which restricts what people can do by proscribing certain kinds of copying and use. These proscriptions are enforced by a system with far-reaching effects, one that prevents me, for example, from buying a DVD player that disregards region codes.

The inevitable conclusion is that the GPL is about valuing the 'Freedom' of bits over the freedom of humans.

You, dear reader, are far more important than my code, regardless of your choice of license. I have no time for a moral system that makes such claims on your autonomy, and I will avoid that system as much as possible by using the unlicense.

Concern 2: Statutory Law vs Common Law

Statutory law, such as that of many European states, often fails to specify a process for explicitly dedicating a work to the public domain, leading many to wonder whether such a thing is even possible. Folks better educated than I am seem to have differing opinions. But if you're concerned about it, note that there is precedent: the original w3 server, which was owned by CERN (guess what the 'E' stands for?), was placed into the public domain in 1993.

Apache and Netscape both trace their heritage back to European public-domain software. You'd be foolish to accept this post as legal advice, of course, but in my mind, those are perfectly acceptable European counter-examples.

Concern 3: Moral rights cannot be relinquished

This is a true concern: many jurisdictions specifically prohibit relinquishing the 'moral rights' of authorship, namely the right to be named as the author of a work, and the right not to have works you did not write attributed to you. To my mind, this is not a problem of copyright; it's basically a statutory encoding of the fact of authorship. The issue is muddied by several states conflating copyright enforcement with moral rights enforcement. While again this is not legal advice, I'd say that simple attribution covers you.

Concerns 2 and 3 really bother me, because the legal arguments against them are complicated and require specialized knowledge. I'm not qualified to argue them in a bulletproof manner. But most software will never become big enough to have a license (or unlicense) issue, and if it's ever a problem, you can grant someone an explicit license (as SQLite does).

How public domain lets your software grow

"CERN's decision to make the Web foundations and protocols available on a royalty free basis, and without additional impediments, was crucial to the Web's existence. Without this commitment, the enormous individual and corporate investment in Web technology simply would never have happened, and we wouldn't have the Web today."

Tim Berners-Lee, Director, WWW Consortium

The public domain is the best way to let others take your ideas and run with them. CERN's public-domain dedication is probably the best example of that: if you want your software to change the world, you need to let others use it as freely as possible. Here's a small personal example.

A few days ago, I published promising future, a Ruby gem that adds Scheme-style promises and futures to Ruby. I did this because I happen to love promises and futures, and it drives me absolutely nuts whenever they are not available.
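
For readers who haven't met them: in the Scheme sense, a promise (built with `delay`, evaluated with `force`) defers a computation until its value is first requested and then caches the result, while a future starts the computation right away in the background and only blocks when its value is read. The sketch below illustrates those two ideas in plain Ruby; the class names `SimplePromise` and `SimpleFuture` and their `value` method are hypothetical and are not the promising future gem's actual API.

```ruby
# Hypothetical sketch of Scheme-style promises and futures in plain Ruby.
# SimplePromise and SimpleFuture are illustrative names, not the gem's API.

# A promise runs its block only when the value is first requested,
# then caches the result (like Scheme's delay/force).
class SimplePromise
  def initialize(&block)
    @block = block
    @mutex = Mutex.new
    @forced = false
  end

  def value
    @mutex.synchronize do
      unless @forced
        @value = @block.call
        @forced = true
        @block = nil # release the closure once it has run
      end
      @value
    end
  end
end

# A future starts its block immediately on a background thread.
class SimpleFuture
  def initialize(&block)
    @thread = Thread.new(&block)
  end

  # Thread#value joins the thread and returns its result,
  # re-raising any exception raised inside the block.
  def value
    @thread.value
  end
end

calls = 0
deferred = SimplePromise.new { calls += 1; 6 * 7 }
calls            #=> 0 (nothing has run yet)
deferred.value   #=> 42
deferred.value   #=> 42 (cached; the block is not run again)
calls            #=> 1

background = SimpleFuture.new { (1..1_000).sum }
background.value #=> 500500
```

The mutex keeps a promise from running its block twice if two threads force it at the same moment; the future simply leans on `Thread#value`.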

If my goal is (and it is) to always have promises and futures available, the ideal would be for them to be in the Ruby core library, or even to be a language feature. I could start a rallying cry on a mailing list somewhere, and I may yet, but my odds are slim. But what if a much better-known author, with a much more popular library, wants to use these lovely little things in their code? Well, they could add a gem dependency, but that's not a popular option for various reasons. If the licenses work out, they could incorporate the code, usually with a required note of attribution.

But my promises and futures code has the maximum possible flexibility. Anyone, with or without any license, can copy and paste my promises and futures into their code, without attribution, and be done with it. Problem solved. My code lives on, or at least inspires equivalent functionality implemented in a better way. And maybe, one day in a promising future, every Rubyist everywhere will have promises and futures available. After all, the fact that SQLite can be embedded in other software is a huge factor in its unparalleled adoption.

How could my software be more free? How could I be more free? How could you be more free? What could be better? Public domain promises a very promising future indeed.

Quantity.rb: first-class quantities for Ruby

I've just put out a first release of Quantity.rb, which scratches an itch I had, and then some.

Quantity.rb provides first-class Quantity objects, like '12 meters', '1 liter', or '1 dozen'. More significantly, it supports expressions like '12 meters * 1 kilogram / 2 seconds**2'. It grew out of an attempt to do some automated unit conversions for a project I am working on that involves some monitoring, and I wasn't happy with what was out there. In particular, I eventually wanted to be able to divide one time series of data points by another, regardless of units. It needed to be something more than 'meters to feet'. Maybe it didn't need to be this involved, but it's the right way to do it: anything can be built on top of this.
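
The key idea behind arithmetic like '12 meters * 1 kilogram / 2 seconds**2' is to track each unit's exponent: multiplying quantities adds exponents, dividing subtracts them, and raising to a power multiplies them. The following is a minimal sketch of that bookkeeping only; the `Qty` class and its methods are hypothetical, say nothing about how Quantity.rb is actually built, and skip conversion between units entirely.

```ruby
# Hypothetical sketch: a quantity is a number plus a hash of unit => exponent,
# e.g. 9.8 with { meter: 1, second: -2 } for 9.8 meter/second^2.
class Qty
  attr_reader :value, :units

  def initialize(value, units)
    @value = value
    # Units whose exponents cancelled to zero are dropped.
    @units = units.reject { |_unit, exp| exp.zero? }
  end

  # Multiplication adds exponents, unit by unit.
  def *(other)
    Qty.new(value * other.value, combine(other.units, 1))
  end

  # Division subtracts exponents; Rational keeps the value exact.
  def /(other)
    Qty.new(Rational(value) / other.value, combine(other.units, -1))
  end

  # An integer power multiplies every exponent.
  def **(n)
    Qty.new(value**n, units.transform_values { |exp| exp * n })
  end

  def to_s
    shown = value.is_a?(Rational) && value.denominator == 1 ? value.to_i : value
    parts = units.sort.map { |unit, exp| exp == 1 ? unit.to_s : "#{unit}^#{exp}" }
    "#{shown} #{parts.join('*')}"
  end

  private

  def combine(other_units, sign)
    result = units.dup
    other_units.each { |unit, exp| result[unit] = result.fetch(unit, 0) + sign * exp }
    result
  end
end

force = Qty.new(12, { meter: 1 }) * Qty.new(1, { kilogram: 1 }) / Qty.new(2, { second: 1 })**2
force.to_s #=> "3 kilogram*meter*second^-2"
```

Keeping the value as a Rational during division avoids the silent truncation that plain Integer division (`12 / 5 == 2`) would introduce.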

It's not the first attempt, and perhaps not even the first success. Quanty is the earliest one I can find, and it does most of what I want. Unfortunately, it uses yacc, which I have no intention of learning, and the English docs are sparse. There's something called the Quantity Management Framework, but I can't find much info about it.

Besides, I figured it would be fun. I would learn something, and sometimes it's good to have a project with a well-defined scope so that you can Finish It. Especially when you have a handful of muddy projects mixed with a handful of very long term ones. So it was the charge of the light brigade. And I did learn something. Earlier versions used some class inheritance features that made me learn far more about Ruby's metaobject system than I had ever hoped to. That was kind of like this for me.

Anyways, there's more to do, but I'm pleased with the results so far. Some of the things you can do, from the README:

require 'quantity/all'
1.meter                                                 #=> 1 meter
1.meter.to_feet                                         #=> 3.28083... foot
c = 299792458.meters / 1.second                         #=> 299792458 meter/second

newton = 1.meter * 1.kilogram / 1.second**2             #=> 1 meter*kilogram/second^2
newton.to_feet                                          #=> 3.28083989501312 foot*kilogram/second^2
newton.convert(:feet)                                   #=> 3.28083989501312 foot*kilogram/second^2
jerk_newton = newton / 1.second                         #=> 1 meter*kilogram/second^3
jerk_newton * 1.second == newton                        #=> true

mmcubed = 1.mm.cubed                                    #=> 1 millimeter^3
mmcubed * 1000 == 1.milliliter                          #=> true

[1.meter, 1.foot, 1.inch].sort                          #=> [1 inch, 1 foot, 1 meter]

m_to_f = Quantity::Unit.for(:meter).convert_proc(:feet)
m_to_f.call(1)                                          #=> 3.28083... (or a Rational)
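
The README's `convert_proc` returns a callable converter whose result may be a Rational. One way such a converter could be built, sketched here under assumptions rather than taken from Quantity.rb's source, is to store each unit's exact size in a base unit as a Rational and divide the two factors once, up front. The table `LENGTH_IN_METERS` and the method `length_converter` are hypothetical; the factors use the exact international definitions 1 foot = 0.3048 m and 1 inch = 0.0254 m.

```ruby
# Hypothetical sketch, not Quantity.rb's implementation: each length unit's
# exact size in meters, stored as a Rational.
LENGTH_IN_METERS = {
  meter: Rational(1),
  foot:  Rational(3048, 10_000), # international foot: exactly 0.3048 m
  inch:  Rational(254, 10_000)   # exactly 0.0254 m
}.freeze

# Compute the conversion factor once and return a lambda that applies it.
def length_converter(from, to)
  factor = LENGTH_IN_METERS.fetch(from) / LENGTH_IN_METERS.fetch(to)
  ->(amount) { amount * factor }
end

m_to_f = length_converter(:meter, :foot)
m_to_f.call(1)                         #=> (1250/381)
length_converter(:foot, :inch).call(1) #=> (12/1)
```

Calling `to_f` on the result gives the familiar 3.28083..., but keeping the Rational means that converting meters to feet and back again returns exactly 1.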

It's made my IRB shell quite the handy calculator. Try it out for that, if you're CLI-inclined.

This whole affair was also an excuse to release something meaningful via the unlicense (I also did a growl-amqp thingee, but it hardly counts). The unlicense is a framework for releasing code not under a license, but into the public domain. The public domain is something old-timers remember: it's what used to happen to older copyrighted works. Originally a mere few years, copyright now lasts for the author's lifetime plus 70 years, and thanks to numerous extensions, it's been several years since any published work entered the public domain in the US. Some countries have gone so far down the rabbit hole that one cannot dedicate things to the public domain at all.

This is all the more ridiculous when one considers that most people now believe copyright is bunk. Eventually, legal frameworks will respect how the world is, and not how it was. A lot of people won't release software into the public domain because of its spotty legal status. A few years ago, people were equally afraid of the GPL, until some court cases affirmed the common-sense interpretation of the license. Let's release some public domain software and force the question of what happens when you don't have a license at all. I was hoping to release this on the first of January for Public Domain Day, but it needed more work. I guess it's not much of a holiday, since nothing enters the public domain anymore anyway.

Anyways, 'gem install quantity' and have fun.

Is W3C going the wrong direction with SPARQL 1.1?

The W3C SPARQL working group (previously the Data Access Working Group) has recently released its first drafts of the updated SPARQL standards, SPARQL 1.1. The group's roadmap calls for these to be finalized a year from now, but it has asked for comments, and I suppose these are mine.

I believe that these documents are a step further down a wrong path for SPARQL and, to a lesser degree, for RDF in general.

This latest round introduces a number of changes to SPARQL, including aggregate functions, subqueries, projection expressions, negation, updates and deletions, more specific HTTP protocol bindings, service discovery, entailment regimes, and a RESTful protocol for managing RDF graphs (the last one is not really just SPARQL, but it's part of the update).

So I'll start with my comments, which are mostly critical.

To start, an RDF-specific complaint, not really related to the rest of the post. Why would the one format that must be supported by the new RESTful RDF graph management interface be RDF/XML? What would it take for the semweb community to move on from this failed standard, which has had known issues for more than 5 years? (Those two issues were raised in 2001 and are currently marked 'postponed'.) Why should such an increasingly irrelevant standard as RDF/XML be chosen instead of the widely supported and easy-to-implement N3, N-Triples, or Turtle?
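
To give a sense of why N-Triples counts as easy to implement: each statement is a single line of the form `subject predicate object .`, with IRIs in angle brackets and literals in double quotes. The sketch below serializes triples that way. It is a simplification, not a complete implementation: it assumes IRIs are wrapped in a hypothetical `Iri` struct, treats every plain Ruby String as a simple literal with no language tag or datatype, and escapes only backslashes, quotes, and line breaks rather than following every escaping rule in the specification. The example IRIs are made up.

```ruby
# Simplified N-Triples serializer sketch (not a complete implementation).
# Assumption: IRIs are wrapped in the hypothetical Iri struct, and any plain
# Ruby String is treated as a simple literal with no language tag or datatype.
Iri = Struct.new(:value)

# Render one RDF term in N-Triples syntax.
def nt_term(term)
  case term
  when Iri
    "<#{term.value}>"
  when String
    # Escape backslashes first so later escapes are not doubled.
    escaped = term.gsub("\\") { "\\\\" }
                  .gsub('"') { '\\"' }
                  .gsub("\n") { '\\n' }
                  .gsub("\r") { '\\r' }
    "\"#{escaped}\""
  else
    raise ArgumentError, "unsupported term: #{term.inspect}"
  end
end

# One statement per line: subject, predicate, object, then " ." and a newline.
def to_ntriples(triples)
  triples.map { |s, p, o| "#{nt_term(s)} #{nt_term(p)} #{nt_term(o)} .\n" }.join
end

book  = Iri.new("http://example.org/book1")
title = Iri.new("http://purl.org/dc/terms/title")
puts to_ntriples([[book, title, 'A "quoted" title']])
# prints: <http://example.org/book1> <http://purl.org/dc/terms/title> "A \"quoted\" title" .
```

Because every statement is self-contained on its own line, output like this can also be streamed, concatenated, or diffed with ordinary line-oriented tools.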

As for SPARQL, the 1.1 standards continue to give named graphs first-class citizen status, both in the web APIs and in more SPARQL syntax than before. It's not so much triples as quads these days. Other meta-metadata, such as time of assertion or validity time, are not covered. While named graphs are admittedly a particularly common case, why do they need to invade the syntax of SPARQL? Not every use case needs named graphs, but every SPARQL implementor must support them. The 1.1 standard now includes precedence rules for named graphs and base URIs when the HTTP query options conflict with the query itself, attempting to solve this self-created problem.
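
In data terms, 'quads, not triples' means every stored statement carries a fourth element naming its graph, and every pattern match has to decide what to do with that element. The small in-memory store below is a hypothetical sketch (not any particular engine's API) in which leaving the graph unspecified matches every graph; even at this size, the graph position is one more dimension an implementor has to handle.

```ruby
# Hypothetical in-memory quad store: each statement is subject, predicate,
# object, plus the graph it belongs to.
Quad = Struct.new(:s, :p, :o, :g)

class QuadStore
  def initialize
    @quads = []
  end

  # Statements added without a graph go into a default graph.
  def add(s, p, o, g = :default)
    @quads << Quad.new(s, p, o, g)
    self
  end

  # nil in any position is a wildcard, including the graph position.
  def match(s: nil, p: nil, o: nil, g: nil)
    @quads.select do |q|
      (s.nil? || q.s == s) && (p.nil? || q.p == p) &&
        (o.nil? || q.o == o) && (g.nil? || q.g == g)
    end
  end
end

store = QuadStore.new
store.add(":alice", ":knows", ":bob")
store.add(":alice", ":knows", ":carol", ":graph2")
store.match(p: ":knows").size              #=> 2
store.match(p: ":knows", g: :default).size #=> 1
```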

How about subqueries? What about variables during insertions? What about subqueries during insertions? Do we really need implementors to consider these kinds of things for every SPARQL endpoint on the web?

None of these things is really all that bad by itself, but one must consider the bigger picture. SPARQL 1.0 was released in January of 2008 (with some comment period before that), and there is still no implementation of a SPARQL engine in PHP or Ruby (exceptions apply, see [1]). One does not increase participation in that ecosystem by adding a selection of entailment regimes to the standard.

While a SPARQL implementation exists for the excellent RDFLib in Python, Python is only one of the current big three of web development (along with Ruby and PHP), and even there, RDFLib is the only one. The fact that no SPARQL engines exist for Ruby or PHP should be considered a failure of the standard. Why are we adding complexity when there is no SQLite for SPARQL? Why are there at least three monolithic Java implementations (Jena, Sesame, Boca), all financially sponsored to some degree or another, but so little 'in the wild'? How long can RDFLib herd 16 cats as committers on the project? While I don't have a lot of direct experience with RDFLib, I pity the project 'leads' (I cannot find evidence that the project is sponsored or that anyone is 'in charge') trying to look toward a future of implementing six working drafts of new standards.

One of the biggest success stories for the semweb in widespread use is the Drupal RDF module, which has found wide acceptance in the Drupal community and started an ecosystem of modules. Drupal 7 will output RDFa by default, and Drupal 6 supports a ton of wonderful features, including reversing the RSS 1.0 to 2.0 downgrade back to RDF. But Drupal remains a producer of simple triples and a consumer of SPARQL results from other endpoints. Data in those sites remains locked down. Why? Because implementing SPARQL in PHP is nontrivial, and in a chicken-and-egg problem, nobody's paying for it before someone has a need for SPARQL.

I could go on, but these are symptoms (well, not that RDF/XML thing; I don't think there's a good reason for that one). I feel that the working group is attempting to solve the wrong problem. Namely, it is attempting to define a somewhat human-readable query language, SPARQL, that works for almost all use cases. But why must the whole 'kitchen sink' be well-defined? A standards body should be attempting to define the easiest possible thing to implement and extend, not the last tool anyone would ever use.

The SPARQL 1.0 standard's grammar was well-defined as a context-free grammar. It also had extension functions, which were uniquely identified by URIs. Why the distinction between CFG elements and extension functions? Why not make syntax elements like named graphs and aggregate functions as discoverable as extensions? Well, the reason is that it's hard to write a parser for a human-readable format and make those things optional and discoverable. (Here's a SPARQL parser implementation in Scala, a language with powerful pattern-matching features for good parsing, and it's 500 lines of code. It compiles to S-expressions, the parsing of which takes about 30 lines. Hmm.)
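
The claim that reading S-expressions takes only a few dozen lines is easy to check. The reader below splits the input on parentheses and whitespace and builds nested Ruby arrays. It is a simplification under stated assumptions: atoms may not contain spaces or parentheses, and there is no support for quoted strings or comments. The sample input is written in the style of the SPARQL algebra S-expressions Jena uses, but treat its exact operator names as illustrative.

```ruby
# Minimal S-expression reader sketch.
# Assumes atoms contain no whitespace or parentheses; no strings or comments.
def tokenize(src)
  src.gsub("(", " ( ").gsub(")", " ) ").split
end

# Consume tokens and return one form: an atom (String) or a nested Array.
def read_form(tokens)
  token = tokens.shift
  raise ArgumentError, "unexpected end of input" if token.nil?

  case token
  when "("
    list = []
    list << read_form(tokens) while tokens.first != ")" && !tokens.empty?
    raise ArgumentError, "missing )" if tokens.shift != ")"
    list
  when ")" then raise ArgumentError, "unexpected )"
  else token
  end
end

def parse_sexp(src)
  tokens = tokenize(src)
  form = read_form(tokens)
  raise ArgumentError, "trailing tokens: #{tokens.inspect}" unless tokens.empty?
  form
end

query = parse_sexp("(project (?s) (bgp (triple ?s <http://example.org/p> ?o)))")
#=> ["project", ["?s"], ["bgp", ["triple", "?s", "<http://example.org/p>", "?o"]]]
```

Every construct, whether it would be core syntax or an extension in SPARQL today, comes out of this reader in the same shape: a list whose first element names the operation.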

If the protocol had been defined as S-expressions, the distinction would not exist, and the syntax could be as extensible as the current functions (the current syntax would just be more functions). The new 1.1 service discovery mechanism is excellent and extensible, and it would allow the standard to grow dynamically instead of becoming bogged down in features for particular use cases. New baseline implementations of SPARQL would be easy to write and could grow incrementally, and the current human-readable format could be implemented in terms of these expressions.
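
The idea that 'the current syntax would just be more functions' can be sketched as an evaluator in which every operator, core or extension, lives in one registry keyed by name, so that feature discovery is just a listing of that registry. The `Evaluator` class, the operator names, and the `http://example.org/fn#startsWith` extension URI below are hypothetical illustrations, not part of any SPARQL specification or its service description vocabulary.

```ruby
# Hypothetical sketch: every operator, core or extension, is an entry in one
# registry, so an engine can grow one operator at a time and report exactly
# which operators it supports.
class Evaluator
  def initialize
    @ops = {}
  end

  # Register an operator under a name (a plain keyword or a URI).
  def register(name, &impl)
    @ops[name] = impl
    self
  end

  # Feature discovery is simply a listing of the registry.
  def supported_operators
    @ops.keys.sort
  end

  # Evaluate a nested-array expression against variable bindings.
  # Strings starting with "?" are variables; other atoms evaluate to themselves.
  def evaluate(expr, bindings)
    case expr
    when Array
      name, *args = expr
      impl = @ops.fetch(name) { raise ArgumentError, "unsupported operator: #{name}" }
      impl.call(*args.map { |arg| evaluate(arg, bindings) })
    when /\A\?/
      bindings.fetch(expr)
    else
      expr
    end
  end
end

engine = Evaluator.new
  .register(">") { |a, b| a > b }
  .register("&&") { |a, b| a && b } # both operands are evaluated eagerly here

# An extension function is registered exactly like a "core" operator.
engine.register("http://example.org/fn#startsWith") { |s, prefix| s.start_with?(prefix) }

bindings = { "?age" => 30, "?name" => "Alice" }
expr = ["&&", [">", "?age", 18], ["http://example.org/fn#startsWith", "?name", "A"]]
engine.evaluate(expr, bindings) #=> true
engine.supported_operators      #=> ["&&", ">", "http://example.org/fn#startsWith"]
```

An engine that lacks, say, aggregates would simply not register them, and a client could check `supported_operators` before sending a query that depends on them.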

The web of ontologies has grown through ad-hoc definitions that people created to fill their needs. Standards grow organically around the ones that are needed most; others languish. Why should SPARQL functions have this kind of flexibility, but not the syntax? The distinction makes implementation overly difficult and is slowing the expansion of the Semantic Web.

In fact, it turns out that Jena has been parsing to S-expressions for some time. If you're an implementor, why would you do it any other way, especially when the standard can change as much as it does in 1.1? Any implementation will need something equivalent to S-expressions if it is going to be able to upgrade its engine to meet standards like this once they are finalized. If people are doing it anyway, why not just make it the standard?

The SPARQL Working Group should be working on a function list and discovery protocol for S-expressions, not on what we currently call SPARQL. What we call SPARQL should compile to a simpler standard if various vendors want to implement it. S-expressions allow maximally simple parsing, maximally simple serialization, and the ability to do feature discovery on core features of the language, not just on the portions blessed with the ability to be extended. S-expressions are also easier for machines to generate for a wide variety of automated use cases, a far wider set, I would venture, than the use cases for human-readable queries.
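
The serialization half of that claim is just as short. Given a query assembled as nested arrays, for instance by a program rather than a person, the function below writes it out as an S-expression. As with the other sketches here, it assumes atoms are strings without whitespace or parentheses, and `to_sexp` is a hypothetical name.

```ruby
# Hypothetical sketch: serialize nested arrays as an S-expression.
# Assumes atoms are strings (or objects whose to_s is a valid atom)
# with no whitespace or parentheses.
def to_sexp(form)
  if form.is_a?(Array)
    "(" + form.map { |item| to_sexp(item) }.join(" ") + ")"
  else
    form.to_s
  end
end

# A program builds the query as plain data; no SPARQL string templating needed.
vars    = ["?s", "?o"]
pattern = ["triple", "?s", "<http://example.org/p>", "?o"]
to_sexp(["project", vars, ["bgp", pattern]])
#=> "(project (?s ?o) (bgp (triple ?s <http://example.org/p> ?o)))"
```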

Please, please, please do not doom the world to write the SPARQL equivalent of SQLAlchemy and ActiveRecord for the next 20 years! We can define a standard that machines can use natively. Now's the time.

At any rate, that's my beef in a nutshell. The working group won't come up with a successful standard until it's easy enough to implement that workable implementations appear in the languages that are defining the web today, and until people can use those languages to implement it without an army of VC-funded engineers.

The SPARQL 1.1 proposals make the standard better than before, but it's not the standard we need. The SPARQL algebra is what needed expansion and specification, not the syntax.

[1]: The PHP ARC project has an implementation, but it attempts to directly convert SPARQL to an SQL query on a particular table layout in MySQL, and is difficult to adapt for general use. Despite SPARQL's complexity, ARC managed to implement this in just 6400 lines of code: the parser alone is 2000 lines and the engine another 4400. The serialization/parsing libraries, however, are fine, and were integrated successfully into the Drupal RDF module. The PHP RAP project has also done some good work and is perhaps more wrappable than ARC, but it implements only a subset of SPARQL.
