RDFa video from Drupalcon

scor did the hard work of putting together a presentation on RDFa for the keynote at DCDC. Just like last year, I did the legwork on the screencast while smarter, more productive people did the hard parts. It would seem that I am the voice of RDF in Drupal.

But alas, due to time constraints it was not to be. Fortunately, Boris played a version during his RDFa presentation at Drupalcon DC 2009, and scor ran two BoFs, which I couldn't really participate in, having to give Openband's presentation on Thursday and being laid low by food poisoning on Friday.

Now scor has done a writeup on the semweb group, and Dries noted it on his blog. Others have already worded it better, so I won't belabor the point by reiterating it all here, but the more exposure RDFa gets, the better! More credits for the work are in both of those links. You can view the video in Flash at either of those links, and I've also put up the higher-res original here.

Import/Export BoF at Drupalcon 2009 DC

My food poisoning at DCDC (don't eat rare meat in DC!) abated just long enough for me to attend an excellent BoF towards the end of Friday at Drupalcon DC 2009. The goal was workable configuration management in Drupal, and in no particular order, attending were:

In terms of configuration management, having these folks in a room is a bit like the G7, but with Voltron attending.

In addition, one or two more interested parties showed up, including myself (I've tried to limit rattling off names to people who contributed to the projects listed; if I missed anyone, sorry). Openband is trying to stand up our stack with more than 200 modules, and configuration management is probably our biggest challenge right now. We'll be applying some resources to the problem and want to make sure that whatever solution emerges gives us a workable upgrade path for the future.

Probably the biggest trick about all of this is separating what's needed from the use cases. It's difficult to separate what really needs to happen from the end result, and the community is so quick to jump on any potential solution to this widespread problem that it's hard to work on something that enables solutions without talking about solutions. The short version of what needs to happen is that 'each module needs to do its own part', but there's no straightforward answer as to what that part looks like.

Going over my notes, the discussion had two main themes: requirements and implementation.

Requirements

Context and Dependencies

The context module does something really sweet: it provides a 'context' for a page load or a feature (a 'space' from the spaces module is more or less a context definition for a feature). This is an important idea, in that it maps settings to functionality rather than to modules. It was generally agreed that this idea needs to be incorporated in any end solution, but there was less agreement that it was core's job to deal with it. Along related lines, there was also some question as to whether core should handle ideas like dependencies, and I believe the general consensus was no.

Roadmap

Almost everyone agreed that nobody wants to throw D6 to the wolves, and that the solution should come in the form of one or more hooks added to D6 in contrib, which can hopefully be added to D7 core. In either case, anything that builds on these hooks is a contrib-space thing: the hook needs to be implemented by core, but won't be called by it. So the degree to which D6/D7 get supported once the hook exists can be left up to whoever has the resources and inclination to do it.

Limitations

Related to the roadmap, we need to be cognizant of scope. By being too broad, it would be easy to end up building a replacement for a Data API: the truth is that config and content have no inherent difference (what's a group, after all?). The Data API (or something else) should eventually take care not only of this problem, but of the content problem as well. Thus, this is not something that will live forever, nor is it something that will be perfect. It will pick some low-hanging fruit in the problem space, and do it broadly, but it's not going to be all things to all people. This is frustrating, since where to draw the line between 'config' and 'data' varies from site to site, and it's going to be a problem for people for whom the line is drawn just a tad off.

Implementation

Hooks in contrib6/core7, the rest in contrib

It was generally agreed that there needs to be some kind of hook, or set of hooks, that *each module* can implement. The work needs to be associated with each module, so that it only needs to be done once. The exact nature of that hook is still up for some argument; there are two basic paths, outlined below. A hook_default_config hook is also under consideration.

A self-consuming API?

One idea is that modules simply export a configuration and import it: a module can eat whatever it outputs. But this turns out to be a Godzilla task for most modules to implement. Forms API has a couple of layers of validation in a couple of places, and some of them, such as checking what kind of data something is or whether it's in a list of allowed values, are never implemented by the module developer at all; they come for free from Forms API. Such a high-level import/export API would force modules to write specific validation for everything, and a lot of modules would need to refactor custom validation from form code into reusable code. But even that would not be enough: thanks to the magic of form_alter, modules do not even necessarily know what constitutes a valid configuration for themselves! To match existing functionality 100%, we'd have to have a hook_export_alter and hook_import_alter to allow modules to clobber each other as much as they already do. Thus, not only would each module need to implement the hook, it would also have to re-alter every module it already alters. Fun stuff!

Is Forms API the API?

The only API that *every* Drupal module supports is Forms API. Everyone's a bit nervous about this, since Forms API does not technically support macros, and even if it did, it's not the best way to represent configuration for a module; a pretty form does not automatically map to a data structure. But as things stand, this is the only API that is 100% compatible.

Current Implementations

Context / Spaces

Spaces is a module that provides context for a feature, with import/export functionality. My understanding is that it implements its config import/export mostly on its own, without using Forms API. Associating features with a context allows them to be quickly enabled or disabled, lets modules and themes change their behavior based on the context, and probably more. I need to play with this more.

Patterns

Gravitek Labs have written Patterns, a module that converts snippets of YAML and XML into Forms API calls. It supports most of core, plus Views and CCK, and it's fairly trivial to write patterns for modules that aren't supported yet by identifying the fields in their forms.

They have also started work on something called the Configuration Framework for D6, which appears to be a standardized way to write data for Forms API, plus some magic for processing it. It provides hooks for modules to implement which are a bit like import/export, but designed to provide input to their forms. It also takes the position that modules should be able to ship with their default config in a text file. It uses patterns as the representation of config, which means it can be XML or YAML (with more to come).

Deployment Framework

Greg Dunlap's Deployment Framework uses both methods (drupal_execute for some things, but not for others). Much of the Deployment Framework is solving a different problem, however: content. I think it was generally agreed that this system should not attempt to create a layer for passing content around. That's a significantly more complicated problem, as Greg discovered when he added a lot of things I suspect the Data API folks will end up having to do anyway, in particular indexing content by both auto-incremented IDs *and* unique identifiers.

All the rest

Other issues were mentioned, including, but not limited to, the D7 variables patch, and context and how important it is (solving, for example, the global uid problem).

At any rate, we agreed on the following deliverables for the next phase:

  • An import-export API for discussion
  • A best-practices document to describe how to write exportable modules

Vendor fun and games with git

I spent some time today setting up the SCM and issue tracking for the seasteading website, having recently fooled them into thinking I'm qualified to administrate their site. This was the first time I put an existing project into git.

I was pretty disappointed with how it went. When I finally learned SCM, it was on Subversion (I can't use CVS, and don't intend to learn). I've shoved a system down my developers' throats in which vendor code is kept off in its own vendor area, as is common practice in Subversion repos, and that works fine. I've also recently been using git for some Ruby stuff on the side, and I'm way impressed. The Github model, with public pushing and pulling, makes it so ridiculously easy to contribute back to a project that it's almost easier to give a patch back than not to. I'm going to start ordering Github Kool-Aid by the case.

But today, the first thing I wanted to do was update Drupal core, so it's time for vendor code. In Subversion, I'd keep both versions of Drupal (current and previous) in the vendor area and merge the changeset of the upgrade into my working copy:

$ cd vendor/drupal/core
$ svn copy 6.2 6.3
$ cd 6.3
$ # (unpack the Drupal 6.3 release over this copy)
$ svn ci -m "Update drupal core to 6.3"
Committed revision 5.
$ cd ../../../trunk/drupal/core
$ svn merge -c 5 .

This applies the same set of changes from the vendor upgrade to trunk. I can even generate changesets between whatever versions I want:

$ cd trunk/drupal/core
$ svn merge http://svn/vendor/drupal/core/6.0 http://svn/vendor/drupal/core/6.4 .

Reverse merges (backing a changeset out):

$ cd trunk/drupal/core
$ svn merge -c -5 .

You get the idea. I can merge any changeset, or the difference between any two files at any two revisions, and apply it to any set of files that can accept that changeset; the original ancestry is irrelevant.

Git's not letting me do this. The folks on IRC are helpful, and I can do what I need, but I feel it's a lot more awkward. I haven't found a way to create a changeset from the difference between two files at different places in the repo, or to apply a change to anything but the file in which that change was originally made.
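For what it's worth, here is a minimal sketch of one way git can get close to this, assuming a reasonably recent git; the directory names, file contents, and the `upgrade.patch` file are invented for the demo. The `<rev>:<path>` syntax names a directory as it existed at a given commit, so `git diff` can compare two different paths at two different revisions, and `git apply --directory` re-roots the resulting patch onto a third path.

```shell
#!/bin/sh
# Sketch: diff two directories at different paths and revisions in one repo,
# then apply the result under a third path. All paths here are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# A vendor copy of 6.2 plus a trunk copy that started from the same file.
mkdir -p vendor/core-6.2 vendor/core-6.3 trunk/core
printf 'version = 6.2\n' > vendor/core-6.2/VERSION.txt
printf 'version = 6.2\n' > trunk/core/VERSION.txt
git add -A
git commit -q -m "Vendor 6.2 and trunk"

# The 6.3 vendor copy lands in a later commit, at a different path.
printf 'version = 6.3\n' > vendor/core-6.3/VERSION.txt
git add -A
git commit -q -m "Vendor 6.3"

# Compare the 6.2 tree at the previous commit with the 6.3 tree at HEAD.
git diff HEAD~1:vendor/core-6.2 HEAD:vendor/core-6.3 > upgrade.patch

# Patch paths are relative to the compared trees; --directory prefixes
# them with trunk/core so the change lands on the trunk copy.
git apply --directory=trunk/core upgrade.patch
cat trunk/core/VERSION.txt   # prints: version = 6.3
```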

In the git model, you would start from scratch with core Drupal and commit it. Then you'd code away. When it comes time to update Drupal, you branch and apply the upgrade. If you've edited core, you branch from your very first commit (naked Drupal) and commit the upgrade there. Then you merge that branch back into master.
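A minimal, self-contained sketch of that flow, run in a throwaway repository. The file names, contents, and the `drupal-core` branch name are placeholders rather than real Drupal files, and the script looks up whether the default branch is called `master` or `main` instead of assuming it.

```shell
#!/bin/sh
# Sketch of the "branch from naked Drupal" upgrade flow; files are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# 1. Commit unmodified core first and remember that commit.
printf 'core 6.2\n' > core.txt
printf 'stock module\n' > module.txt
git add -A
git commit -q -m "Import stock Drupal 6.2"
base=$(git rev-parse HEAD)
main=$(git symbolic-ref --short HEAD)   # master or main, depending on config

# 2. Site work happens on the main branch, including a hack to core.
printf 'stock module\nlocal hack\n' > module.txt
git commit -q -a -m "Local change to core"

# 3. Do the upgrade on a branch rooted at the naked-Drupal commit.
git checkout -q -b drupal-core "$base"
printf 'core 6.3\n' > core.txt
git add -A
git commit -q -m "Upgrade stock Drupal to 6.3"

# 4. Merge the upgrade back; git combines it with the local hack.
git checkout -q "$main"
git merge -q --no-edit drupal-core
```

Because the upgrade branch starts from the untouched import, the merge only has to reconcile the upstream change with local edits; if both had touched the same lines, git would stop and ask for a manual resolution.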

It works pretty well, but what if I have a project that's already halfway done? I can do it backwards, by starting from Drupal and then exploding the project on top of it, but that feels awkward to me. What I ended up doing was creating a branch, installing the version of Drupal I wanted to upgrade from, committing, exploding the most recent version of Drupal, and committing again. Then I could switch back to master and apply the difference between the two commits:

$ git checkout master
$ git checkout -b drupal-core
$ # install the Drupal version you're upgrading from
$ git add -A
$ git commit
$ # explode the most recent Drupal release over it
$ git add -A
$ git commit
$ git checkout master
$ git diff 89dca98 0f1ac4s | git apply

This is uncomfortable to me; perhaps I'll get used to it. I don't think it's really any more steps than putting things in vendor, but when integrating several pieces of vendor code, I think this would get confusing. I don't like how vendor code has to exist in exactly the same directory on the trunk branch as on the branch where it's unedited. Especially considering that there is an excellent copy of Drupal on Github, it drives me nuts that I can't do this. Why can't I tell Github 'Give me the difference between Drupal 6.1 and Drupal 6.4'? Even if I download that repo, it's a different repository, and I cannot find a way to copy the information into mine (I would not want to, anyway: the Drupal repo is some 22 megs).
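One possible approach, sketched under stated assumptions: fetch the other repository's history into your own as a remote, then diff two of its tags and apply the result. Both repositories here are created locally, and the `upstream` remote and the `v6.1`/`v6.4` tags are hypothetical stand-ins for a Drupal mirror. Fetching does copy the upstream objects into your repository, so the size concern above still applies.

```shell
#!/bin/sh
# Sketch: diff two tags from another repo and apply them to your own.
# The upstream repo and its v6.1/v6.4 tags are fabricated for the demo.
set -e
work=$(mktemp -d)

# Stand-in for an upstream Drupal mirror with two tagged releases.
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Example"
git config user.email "example@example.com"
printf 'release 6.1\n' > CHANGELOG.txt
git add -A && git commit -q -m "6.1" && git tag v6.1
printf 'release 6.4\n' > CHANGELOG.txt
git commit -q -a -m "6.4" && git tag v6.4

# Your project: an unrelated repo that already contains the 6.1 files.
git init -q "$work/site"
cd "$work/site"
git config user.name "Example"
git config user.email "example@example.com"
printf 'release 6.1\n' > CHANGELOG.txt
git add -A && git commit -q -m "Site on 6.1"

# Bring the upstream objects and tags into this repo; nothing is merged.
git remote add upstream "$work/upstream"
git fetch -q --tags upstream

# Diff the two upstream tags and apply the changes to the site's files.
git diff v6.1 v6.4 | git apply
git commit -q -a -m "Apply upstream v6.1..v6.4"
```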

All of that being said, I still prefer git. Now that I've climbed this little hurdle, it's completely appropriate for the project and hopefully we can make it public (waiting to hear back from the original devs about licensing) and get some fixes and all. But I'm now skeptical of git for a project that is mainly one of integration, which is my day job.

To be fair, this doesn't have to be a problem. This is as much a problem with PHP and/or Drupal as it is with git. Rails neatly solves the problem by having vendor code in /vendor right there in the live copy of the software, rather than only in the fantasy world of SCM. Code you write is here. Code you download is over there; it's like apartheid software development. Need to edit it? Re-open the classes and monkey patch it. Compare this with PHP, in which you re-open classes after typing ?> and tossing in some css style definitions in the middle of a constructor.

Git provides very nice little submodules; they work quite well if vendor code is completely separate from user code, as in Rails. They're useless for the usual PHP model of throwing everything, config.inc.php included, into the main directory. Seriously, there are still projects with this model, Drupal included. Why? Why do Drupal sites live, from index.php, in ./sites/sitename? Why is there any sharing of directory space at all? I'm continually amazed that something as genuinely useful as Drupal comes from such miserable beginnings.
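For the Rails-style layout, a minimal sketch of a submodule setup, assuming the vendor code lives in its own repository and is mounted under `vendor/`; both repositories and the `vendor/lib` path are hypothetical and created locally. Newer git releases refuse local-path submodule clones unless `protocol.file.allow` is set, hence the `-c` flag, which is harmless on older releases.

```shell
#!/bin/sh
# Sketch: keep vendor code in its own repo, mounted as a submodule.
set -e
work=$(mktemp -d)

# A stand-in vendor library in its own repository.
git init -q "$work/vendorlib"
cd "$work/vendorlib"
git config user.name "Example"
git config user.email "example@example.com"
printf 'vendor code\n' > lib.txt
git add -A && git commit -q -m "Vendor library"

# The application keeps its own code at the top level.
git init -q "$work/app"
cd "$work/app"
git config user.name "Example"
git config user.email "example@example.com"
printf 'app code\n' > app.txt

# Clone the vendor repo into vendor/lib and record it in .gitmodules.
git -c protocol.file.allow=always submodule --quiet add "$work/vendorlib" vendor/lib
git add -A
git commit -q -m "Add vendorlib as a submodule"
```

The superproject records only a pointer to a specific vendor commit, which is why this works cleanly only when vendor files never share a directory with site files.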

I hope I'm missing something obvious, and I'll bet that 10 seconds after I post this, someone is going to tell me how to do what I want. But oh well--such embarrassment is occasionally the price of knowledge.
