Setting up your system for file conversions with File Framework

An important requirement of the platform for SPAWAR (for whom my employer, Openband, works) is full-featured file handling. Our solution to that is Miglius' File Framework, which is an exceedingly powerful solution for handling files on a Drupal 6 site. It requires Arto's RDF Framework, which means it's easy for other modules to interact with it, and has a ton of other features, but the one I'm writing about today is the file conversion bits. Out of the box, FF supports logic for tons of conversion paths, so anything your users upload is downloadable as a preview, as a full file, playable in the browser, viewable as HTML, whatever. Importantly, it uses OpenOffice to do all of those nifty office format conversions: viewing Word documents in the browser, inline, is exceedingly useful.

It'd be silly to write that much conversion code yourself, and naturally we didn't want to reinvent the wheel. But that means there's a ton of software to install to make all of this work, from PHP 5.2 to JOD Converter, and that's what this post is about.

Adventures in packet-munging with Telefonica

Good thing that so many ISPs are sullying their names lately, or I might have spent more time on this one.

One of our developers recently moved to Spain to work with some of our contractors for a few months, and he immediately had problems. He was having problems submitting certain forms on certain pages to a particular machine. There’s got to be a server issue when both phpmyadmin and the application using that database are failing, and for everyone at the office. They had a connection with Telefonica, the local Spanish behemoth.

We played with it for a while. We looked at error logs, we tried other forms on other websites (another developer with our contractor could not log in to Rubyforge), and we had some other musings, but none of it went anywhere. I even spent a while messing with bandwidth measurements, since a TCP accelerator problem with one of our satellites in Africa caused very similar issues a few months back. Eventually, I took a tcpdump at the same time he did and got some strange results.

Developer’s results:

13:24:33.798068 IP x.x.x.x.local.55864 > y.y.y.y.http: . ack 1 win 65535
13:24:33.798606 IP x.x.x.x.local.55864 > y.y.y.y.http: P 1:678(677) ack 1 win 65535
13:24:34.323942 IP y.y.y.y.http > x.x.x.x.local.55864: . ack 678 win 65535
13:24:34.324056 IP x.x.x.x.local.55864 > y.y.y.y.http: . 678:2138(1460) ack 1 win 65535
13:24:34.493515 IP y.y.y.y.http > x.x.x.x.local.55864: . ack 678 win 6579
13:24:34.957342 IP y.y.y.y.http > x.x.x.x.local.55864: . ack 678 win 6579
13:24:35.312801 IP x.x.x.x.local.55864 > y.y.y.y.http: . 678:2138(1460) ack 1 win 65535
13:24:35.944393 IP y.y.y.y.http > x.x.x.x.local.55864: . ack 678 win 6579
13:24:37.325466 IP x.x.x.x.local.55864 > y.y.y.y.http: . 678:2138(1460) ack 1 win 65535

My results:

13:24:34.039365 IP x.x.x.x.21483 > y.y.y.y.www: . ack 1 win 5840
13:24:34.044362 IP x.x.x.x.21483 > y.y.y.y.www: . 1:732(731) ack 1 win 65535
13:24:34.044378 IP y.y.y.y.www > x.x.x.x.21483: . ack 732 win 6579
13:24:34.494733 IP x.x.x.x.21483 > y.y.y.y.www: . 2126:2192(66) ack 1 win 65535
13:24:34.494745 IP y.y.y.y.www > x.x.x.x.21483: . ack 732 win 6579
13:24:35.494664 IP x.x.x.x.21483 > y.y.y.y.www: . 2126:2192(66) ack 1 win 65535
13:24:35.494670 IP y.y.y.y.www > x.x.x.x.21483: . ack 732 win 6579

Huh! Why are the packet sizes changing by the time they get to me from him? This is where I can thank our friendly ISPs for their incompetence in keeping their packet-adjusting programs transparent: I would never have guessed, even 6 months ago, that ISPs would so blatantly screw with their customers’ packets just to have a story to tell at the pub later. Thanks to recent highly publicized displays of unconscionable behavior by ISPs, I needn’t convince myself that the rest of the world is wrong, and I have a router problem or something. ISPs really are that bad.
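The comparison above can be done mechanically. What follows is a minimal Python sketch, not the tool we actually used: it pulls the start:end(length) data ranges out of two tcpdump text captures and reports the ranges that show up on only one side. The function names (data_segments, compare_captures) and the variable names are made up for illustration; the sample lines are copied from the two captures above.

```python
import re

# tcpdump prints payload-carrying segments as "start:end(length)".
# The leading whitespace keeps the regex from matching the hh:mm:ss timestamp.
SEGMENT_RE = re.compile(r"\s(\d+):(\d+)\((\d+)\)")


def data_segments(capture_text):
    """Return the unique (start, end, length) data ranges, in order first seen."""
    segments = []
    for line in capture_text.splitlines():
        match = SEGMENT_RE.search(line)
        if not match:
            continue  # pure ACKs carry no data range
        start, end, length = (int(group) for group in match.groups())
        if length == 0:
            continue  # SYN lines print a zero-length range
        segment = (start, end, length)
        if segment not in segments:  # retransmissions repeat the same range
            segments.append(segment)
    return segments


def compare_captures(sender_text, receiver_text):
    """Report data ranges that appear in only one of the two captures."""
    sent = data_segments(sender_text)
    received = data_segments(receiver_text)
    return {
        "only_at_sender": [s for s in sent if s not in received],
        "only_at_receiver": [r for r in received if r not in sent],
    }


# Sample lines taken from the developer's capture and from mine.
DEVELOPER_CAPTURE = """\
13:24:33.798606 IP x.x.x.x.local.55864 > y.y.y.y.http: P 1:678(677) ack 1 win 65535
13:24:34.324056 IP x.x.x.x.local.55864 > y.y.y.y.http: . 678:2138(1460) ack 1 win 65535
13:24:35.312801 IP x.x.x.x.local.55864 > y.y.y.y.http: . 678:2138(1460) ack 1 win 65535
"""

MY_CAPTURE = """\
13:24:34.044362 IP x.x.x.x.21483 > y.y.y.y.www: . 1:732(731) ack 1 win 65535
13:24:34.494733 IP x.x.x.x.21483 > y.y.y.y.www: . 2126:2192(66) ack 1 win 65535
13:24:35.494664 IP x.x.x.x.21483 > y.y.y.y.www: . 2126:2192(66) ack 1 win 65535
"""

if __name__ == "__main__":
    for side, segments in compare_captures(DEVELOPER_CAPTURE, MY_CAPTURE).items():
        print(side, segments)
```

With these samples, nothing the developer sent matches anything I received: he sent 677 and 1460 bytes, and I got 731 and 66. That is the whole mystery in two lists.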

Fortunately, our developers’ office was switching from Telefonica to Jazztel the next day, for a bandwidth upgrade. We could see if that fixed the problem.

So what happened the next day?

Developer’s results:

11:46:15.199847 IP x.x.x.x.local.63182 > y.y.y.y.http: S 4092983779:4092983779(0) win 65535 <mss 1460,nop,wscale 3,nop,nop,timestamp 47429206 0,sackOK,eol>
11:46:15.339426 IP y.y.y.y.http > x.x.x.x.local.63182: S 1014263802:1014263802(0) ack 4092983780 win 5792 <mss 1380,sackOK,timestamp 663357440 47429206,nop,wscale 7>
11:46:15.339497 IP x.x.x.x.local.63182 > y.y.y.y.http: . ack 1 win 65535 <nop,nop,timestamp 47429207 663357440>
11:46:15.339813 IP x.x.x.x.local.63182 > y.y.y.y.http: P 1:678(677) ack 1 win 65535 <nop,nop,timestamp 47429207 663357440>
11:46:15.484783 IP y.y.y.y.http > x.x.x.x.local.63182: . ack 678 win 56 <nop,nop,timestamp 663357478 47429207>
11:46:15.484855 IP x.x.x.x.local.63182 > y.y.y.y.http: . 678:2046(1368) ack 1 win 65535 <nop,nop,timestamp 47429209 663357478>
11:46:15.639983 IP y.y.y.y.http > x.x.x.x.local.63182: . ack 2046 win 79 <nop,nop,timestamp 663357517 47429209>
11:46:15.640083 IP x.x.x.x.local.63182 > y.y.y.y.http: . 2046:3414(1368) ack 1 win 65535 <nop,nop,timestamp 47429210 663357517>
11:46:15.640129 IP x.x.x.x.local.63182 > y.y.y.y.http: . 3414:4782(1368) ack 1 win 65535 <nop,nop,timestamp 47429210 663357517>
11:46:15.799374 IP y.y.y.y.http > x.x.x.x.local.63182: . ack 3414 win 102 <nop,nop,timestamp 663357555 47429210>
11:46:15.799473 IP x.x.x.x.local.63182 > y.y.y.y.http: . 4782:6150(1368) ack 1 win 65535 <nop,nop,timestamp 47429212 663357555>
11:46:15.799511 IP x.x.x.x.local.63182 > y.y.y.y.http: P 6150:6209(59) ack 1 win 65535 <nop,nop,timestamp 47429212 663357555>
11:46:15.819146 IP y.y.y.y.http > x.x.x.x.local.63182: . ack 4782 win 124 <nop,nop,timestamp 663357560

My results:

11:46:15.272449 IP z.z.z.z.63182 > y.y.y.y.www: S 1042651241:1042651241(0) win 65535 <mss 1380,nop,wscale 3,nop,nop,timestamp 47429206 0,sackOK,eol>
11:46:15.272466 IP y.y.y.y.www > z.z.z.z.63182: S 37547187:37547187(0) ack 1042651242 win 5792 <mss 1460,sackOK,timestamp 663357440 47429206,nop,wscale 7>
11:46:15.408746 IP z.z.z.z.63182 > y.y.y.y.www: . ack 1 win 65535 <nop,nop,timestamp 47429207 663357440>
11:46:15.421489 IP z.z.z.z.63182 > y.y.y.y.www: P 1:678(677) ack 1 win 65535 <nop,nop,timestamp 47429207 663357440>
11:46:15.421505 IP y.y.y.y.www > z.z.z.z.63182: . ack 678 win 56 <nop,nop,timestamp 663357478 47429207>
11:46:15.578903 IP z.z.z.z.63182 > y.y.y.y.www: . 678:2046(1368) ack 1 win 65535 <nop,nop,timestamp 47429209 663357478>
11:46:15.578914 IP y.y.y.y.www > z.z.z.z.63182: . ack 2046 win 79 <nop,nop,timestamp 663357517 47429209>
11:46:15.731438 IP z.z.z.z.63182 > y.y.y.y.www: . 2046:3414(1368) ack 1 win 65535 <nop,nop,timestamp 47429210 663357517>
11:46:15.731444 IP y.y.y.y.www > z.z.z.z.63182: . ack 3414 win 102 <nop,nop,timestamp 663357555 47429210>
11:46:15.751551 IP z.z.z.z.63182 > y.y.y.y.www: . 3414:4782(1368) ack 1 win 65535 <nop,nop,timestamp 47429210 663357517>
11:46:15.751557 IP y.y.y.y.www > z.z.z.z.63182: . ack 4782 win 124 <nop,nop,timestamp 663357560 47429210>
11:46:15.892222 IP z.z.z.z.63182 > y.y.y.y.www: . 4782:6150(1368) ack 1 win 65535 <nop,nop,timestamp 47429212 663357555>

What a shocker—fairly damning proof of another ISP running some sort of ridiculous filtering/inspection/shaping/proxying software. Except this stuff isn’t even good enough to be invisible: it couldn’t even put the packets back together properly. My day is basically spent trying to define <=> across the set of priorities “A”, “Top”, “1”, “A1”, “1A”, “First”, “Primary”, and “Pretty Important”, and this stuff makes me exceedingly frustrated. Where can I send an invoice for my time to companies that make me waste 2 hours to discover that we’re not even dealing with IP anymore, but some embraced and extended version of it?
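One detail in the second pair of captures is easy to miss by eye: the same SYN (same port, same TCP timestamp value) advertises a different MSS depending on where it was captured. The short Python sketch below pulls the mss option out of the two SYN lines, copied from the captures above, and checks that the 1368-byte segments line up with the smaller value. The function name syn_mss is made up for illustration, and reading this as something on the path rewriting the option is my interpretation, not something the captures prove on their own.

```python
import re

MSS_RE = re.compile(r"\bmss (\d+)\b")

# A TCP timestamp option is 10 bytes; tcpdump shows it padded with two
# 1-byte NOPs, so each data segment loses 12 bytes of payload to options.
TIMESTAMP_OPTION_BYTES = 12


def syn_mss(line):
    """Return the MSS advertised on a tcpdump SYN line, or None if absent."""
    match = MSS_RE.search(line)
    return int(match.group(1)) if match else None


# The same SYN as captured on the developer's machine and on the server.
DEVELOPER_SYN = (
    "11:46:15.199847 IP x.x.x.x.local.63182 > y.y.y.y.http: "
    "S 4092983779:4092983779(0) win 65535 "
    "<mss 1460,nop,wscale 3,nop,nop,timestamp 47429206 0,sackOK,eol>"
)
MY_SYN = (
    "11:46:15.272449 IP z.z.z.z.63182 > y.y.y.y.www: "
    "S 1042651241:1042651241(0) win 65535 "
    "<mss 1380,nop,wscale 3,nop,nop,timestamp 47429206 0,sackOK,eol>"
)

if __name__ == "__main__":
    sent = syn_mss(DEVELOPER_SYN)
    seen = syn_mss(MY_SYN)
    print("MSS as sent:", sent)
    print("MSS as seen by the server:", seen)
    print("Largest payload with timestamps:", seen - TIMESTAMP_OPTION_BYTES)
```

The last figure matches the 1368-byte segments in both of the second-day captures, which at least explains why those made it through intact while the 1460-byte segments from the day before never arrived.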

I hope that one day it will be profitable enough to run a company that actually provides internet service instead of an ad pipeline. I know I’d pay more. In the meantime, nothing to do but grind our teeth until ad replacement by ISPs makes the internet no longer profitable.

The Drupal-Based Toolchain

I made a choice to go into system administration about a year ago, after a hiatus from web development. I’m getting back into coding again in a hurry, and rebuilding my toolchain is a pretty painful process. The biggest part of that toolchain has been Drupal 6, and it’s both a wonderful revelation and a hugely frustrating experience.

This will make more sense with some background. I stopped doing web development in ‘03, after using the Perl on Rails framework, which is exactly what we did not call the combination of Perl 5, Class::DBI, Template Toolkit, and CGI::Application those days, long before Rails was a blip on the radar. I got out of it because I saw I wasn’t capable of doing bigger projects alone, not with those tools, and I supported myself on making minor updates to other applications and the occasional small-scale bookkeeping app. When I came back to trying to find some full-time work, I went for system administration, which involves more interaction with people, or at least it used to—coding is much more social than it used to be.

I lucked into a job running one of the most complicated Drupal installations out there, with some 600k lines of code across 130 modules. I work with some terribly smart people, and we’re doing some very cool stuff. It took me quite some time to start getting into the Drupal stuff itself, but once I started to figure it all out, it clicked pretty well. It’s a welcome kind of uncomfortable not to be the smartest person in the room.

My previous toolchain consisted of some of Perl’s copious libraries, the Windows ports of MySQL and Apache, Cygwin, and vi. In hindsight, this was childish, pathetic, comical, the kind of toolchain that wears clown shoes. It still let me do an awful lot, compared to the scripts I was writing in ‘99 that amounted to little more than CGI to SQL wrappers, but it was clear to me how limited it was.

What a difference a few years makes: now I’m on a MacBook Pro with a ton of handy MacPorts, a native set of system tools equivalent to or better than the Linux equivalents I’m used to, and tons of useful apps, all set at the low, low price of $20-$40 apiece, because the cult of Mac is a true religion, complete with tithes. I’m finally getting practice with source control in the kind of complicated environment where it matters, and the devs around me know a ton of nifty tricks and tools. All of that stuff is a vast improvement, but the single biggest change to the toolchain is Drupal.

Before I get into what I like and don’t like, there’s a question about how one should see Drupal: as part of a toolchain, or as an end-user piece of software. It’s designed, from the ground up, to let people start with nothing and end with a website, so perhaps considering it just a link in a chain is inappropriate, but I don’t think so. No website, no matter how little code is involved, should be considered to exist in a vacuum; that’s not how the web works today. Each site should be considered a node on a graph in addition to a site in and of itself, just like a library written for a project should be considered in the scope of reusability. In this respect, I’m weighing Drupal against web frameworks like Ruby on Rails or Django as opposed to other CMSs, such as that recurring villain of the Drupal graphic novel, WordPress.

At any rate, wow oh wow, what an improvement! I really used to write menu templates that had to check for what page was currently being loaded and make that option have ‘current page’ css? Really? Write SQL for anything that required a join and an order or limit at the same time? Really? Write out form HTML? Hand-write javascript for the most basic form validation? Write templates at all? Drupal lets me worry about coding and not display, and that makes me actually interested in writing software again. It’s an amazing tool, and as the browser becomes a platform, it’s great to have so many of the basic tools in the web designer’s toolkit be done better than most desktop client frameworks do desktop.

So the move to Drupal really is a groundbreaker for me, and now I’m busy with web development and writing silly blogs when I get home from busy days of the kinds of things sysadmins fill their days with. Unfortunately, I still see a lot of problems. The gains make it all very worthwhile, but some of the parts of Drupal are exceedingly frustrating when it’s seen as a strong link in a toolchain and not an end-all, be-all website.

The first place this toolchain could be improved is an automated module install system. There are about 40 competing solutions, but on the whole, it’s ridiculous that I can’t tell my website to install the Google Analytics module into itself. The Drupal modules site doesn’t really help a newbie find the kinds of modules that every site should have, things like Pathauto and Google Analytics. It’s a library distribution system that’s technologically behind PEAR, RubyGems, easy_install, and even CPAN, which is about a decade old. Hopefully one of those competing solutions emerges as a workable base soon.

I find that some parts of the system are configured in strange places. Blocks, for example, are edited in a special area for blocks. That’s appropriate. But where a block appears, including page-by-page exceptions, is edited along with the content of the block. Meanwhile, the node’s own page selects menus, URL aliases, and publishing options, which simply show up as inline options if the viewer is an administrator. While the ability to make a block display based on some PHP code is perfect sometimes, other times it’s more maintainable for a page to control what blocks it has, and not the other way around. Besides, it’s rarely appropriate to give a non-administrator PHP rights. Breaking up the configuration of a page like this makes it harder to reuse any special configuration or code that particular page might have. I think a few changes of this kind might go a long way towards maintainability.

Hand in hand with maintainability is reusability. I have a lot of experience with this as a Drupal sysadmin. All of those great CCK types and views and whatnot require significant overhead to export and import properly, and keeping them in version control is even more difficult. It’s a huge problem when you manage as many sites as we do, and lots of people are trying to solve it: there’s a lot of good work going on in the change management group, tools like CoCKtail solve the problem by letting CCK types be a simple bit of text, and the ephemeral autopilot threatened to put the entire database under version control to deal with the problem.

That problem looks like it’s being solved: good. It’s a shame that so much work is being spent on the simple problem that Drupal litters the definition of a site across code, a database, and a filesystem, and that it does it in a more or less unrepeatable way. Point to a particular file in a site’s files directory: is it in use, or is it some poor lost inode, left adrift from 2 upgrades ago? Best just to leave it be.
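The "is this file in use?" question can at least be narrowed down with a script. The following Python sketch is only an illustration under stated assumptions: it takes a list of file paths the database knows about (for example, exported from the filepath column of Drupal 6's files table, with paths relative to the site root) and lists everything on disk that is not in that list. The function name find_untracked_files and its parameters are made up for this example. A file that is not tracked is not necessarily unused, since a module can write files without recording them, so treat the output as a list of things to look at and not a list of things to delete.

```python
import os


def find_untracked_files(site_root, files_subdir, tracked_paths):
    """List files under site_root/files_subdir that tracked_paths does not mention.

    tracked_paths holds paths relative to site_root, as assumed to come
    from the database. Returned paths are also relative to site_root.
    """
    tracked = {os.path.normpath(path) for path in tracked_paths}
    untracked = []
    for directory, _subdirs, names in os.walk(os.path.join(site_root, files_subdir)):
        for name in names:
            full_path = os.path.join(directory, name)
            relative = os.path.normpath(os.path.relpath(full_path, site_root))
            if relative not in tracked:
                untracked.append(relative)
    return sorted(untracked)


if __name__ == "__main__":
    # Hypothetical site layout and export; adjust to your own paths.
    known = ["sites/default/files/logo.png"]
    for path in find_untracked_files(".", "sites/default/files", known):
        print(path)
```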

None of these problems are terribly hard to solve for D7—it’s just a point of view change. And really, at the end of the day, that’s a long blog post for a bunch of problems that aren’t going to keep me from picking something else, so let’s not assume anyone’s about to abandon all of those useful contributed modules in order to do something drastic. Let’s just keep an open mind.

Stage Fright under the Covers

We finished the video for Dries’ keynote just under the wire, as pretty much all such events need to be. Arto, Miglius and I had stayed up until past sunup for the last few days to make it happen. First Dan left, and then Miglius left on Saturday morning so that he could get stuck in Frankfurt for 24 hours. Once he got in to Boston, he logged on quick like a bunny and went back at it. Arto and I worked another 30-odd hours during Saturday and Sunday. Sometime during Monday, which I largely slept through, some of the office folk sent out a message noting that our pile of pizza remains, chicken bones and coffee stains was not particularly helpful to the kitchen’s ambiance. I don’t think they know who did it, and I’m kind of afraid to fess up. Sorry, ladies.

Unlike most demo work, a ton of what went into this will be useful later. If our organizations were not keen on using RDF, we’d not have worked on this so hard. Arto’s module stuff is anything but smoke and mirrors, and we figured out a lot of limitations to Exhibit and Potluck that will be important to understand later. These are now posted in our internal wiki and I will go and post them on the Simile project’s site if I ever get a chance. It’s worth a whole post in and of itself.

While Arto busied himself turning Drupal into the world’s easiest to use RDF endpoint, Miglius and I combed datasets that would make for a decent demo and messed with Exhibit views. There’s a lot of RDF data out there, but it doesn’t all lend itself to being shown on a map, and people can only read so much on a video screen during a presentation. At the end of the day, I’m the only one with Leopard (and thus Screenflow), so I ended up doing the actual screencast.

Screencasting is an interesting thing. It’s easier to script than a regular movie, but difficult to properly realize. There’s a fine line between too little and too much data, without having awkward pauses and without skipping over too much. You have to take into account that different viewers have different levels of experience with the material, different reading speeds, whatever. I made a detailed narration that was a bit too fast paced for the keynote; that wasn’t a problem, as Dries had already communicated that he’d prefer to do the narration himself.

On Monday, Arto and I woke up about a half hour before the talk and got on IM. As the talk began, we realized that we really needed to have this data up where people could get it. And we really wanted them to be able to get it—we’d worked ridiculous hours on this thing. So that’s when we decided the site needed to be public.

We started to make that happen. There was a fair bit of configuration to be done to make it useful; Arto got the video onto s3 while I messed about with some permissions and redirects. I typoed just about everything I did related to that—I don’t think I did a single thing once. Halfway through the whole thing I realized I had stage fright; I couldn’t type because my hands were shaking. The video I had worked so hard on was about to be placed up to awe or bore a sizable number of people, on whom much depends. And there was still a possibility that Dries would use my narration, in my mind, as we’d given him the final cut of the video with extremely little time to rehearse anything he wanted to say. So there I was, still in bed, with the door shut and the window blocking out what passes for sunshine in Stuttgart, and I was nervous as hell about being up in front of a crowd.

Stage frightened of nobody at all. What a cool world we live in, that such a feeling can now be transferred over the wire.

Anyways, we did a good job (well, mostly Arto did a good job) of getting the video out there for anyone who wanted it, and at least a couple of people did. Here’s another copy, if you’re curious:

Another web site

Here we go: another attempt at a website for me. I’ve had brief flirts with sundry websites in the past, but I think this one is The One. I’m committed.

There’s a couple of reasons that I think this one will be it. The biggest one is easily that I finally have a reason to have a website. I went ahead and accidentally became tolerable at what I do, and I thus have a pile of things that other people might actually be interested in. In addition to some of my pontificating, I’ll be posting those useful tidbits. When I have enough content, I’ll even add some fancy menus and whatnot, and maybe even a tab or two. For the record, web design is not my deal.

The second reason this site might be here to stay is the name. I’ve gone through a long string of internet aliases in my time, as have most folks my age. Most of us had a period of paranoia about our real names being leaked to the internet, but such days are over for me, and it’s time to skip all of that. The problem is that while I’m not ashamed of my name, Lavender, it doesn’t make a great URL (although my dad’s own LavenderInk site for his publishing stuff works out). There’s a story here about bad experiences with folks misconstruing computer-assigned usernames of ‘lavenderben’ as a username that was chosen as opposed to assigned, a misconception which can, and has, led to significant misunderstandings for native speakers of English. As a small comfort, such connotations are almost always lost on my sundry European friends.

Forgoing Lavender, and being lost in the noise on the ‘Ben’ front, I needed a username one day and came up with bhuga, which is properly understood as the first half of ‘booga wooga’, but with a more civilized spelling. Rarely capitalized, it captures the savage quickness with which I always make decisions that come back to haunt me, such as self-identification. Fortunately, the other such savages on the web are few and far between. For reference: me, me, me, not me, and not me. The final form of my name is not yet decided. Ben? Bhuga? Ben Bhuga? All of them work. It’s hard to say. I’ll figure something out.
