Wednesday, June 17, 2009

Webmachine Tutorial: File Uploads

I've vented about the client side of HTTP file uploads in my post below, but once you've got the client side down, you still have to deal with a messy multipart/form-data request body on the server. Fortunately, version 1.3 of Basho's homegrown and production-proven HTTP toolkit, Webmachine, has just been released, with support for parsing multipart bodies.

I've checked in a fully-functional file-upload demo to BitBucket here, and I'll go through the interesting bits of using webmachine_multipart in this post.

Before parsing a multipart body, you need to parse out the browser-generated part separator, or "boundary", from the Content-Type header.

ContentType = wrq:get_req_header("content-type", ReqData),
Boundary = string:substr(ContentType, string:str(ContentType, "boundary=")
                         + length("boundary=")),
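
The two lines above assume the header is present, contains "boundary=", and has nothing after the boundary value. Here is a minimal sketch of a more defensive version, using only the stdlib string module. The module and function names (boundary_sketch, boundary/1) are hypothetical and not part of Webmachine. The match on "boundary=" is still case-sensitive, and a quoted boundary that itself contains a semicolon is not handled.

```erlang
-module(boundary_sketch).
-export([boundary/1]).

%% Hypothetical helper, not part of Webmachine.
%% Takes the Content-Type header value as a string and returns
%% {ok, Boundary} or {error, no_boundary}.
boundary(ContentType) when is_list(ContentType) ->
    case string:str(ContentType, "boundary=") of
        0 ->
            %% string:str/2 returns 0 when the substring is absent.
            {error, no_boundary};
        Pos ->
            Rest = string:substr(ContentType, Pos + length("boundary=")),
            %% Drop any parameters that follow the boundary value.
            case string:tokens(Rest, ";") of
                [] ->
                    {error, no_boundary};
                [Raw | _] ->
                    %% Strip surrounding spaces, then optional double quotes.
                    Trimmed = string:strip(Raw, both, $\s),
                    {ok, string:strip(Trimmed, both, $\")}
            end
    end;
boundary(_Missing) ->
    %% Covers a missing header, whatever non-string value represents it.
    {error, no_boundary}.
```

With this, process_post/2 could match on {ok, Boundary} and reject anything else instead of passing a garbage boundary to the parser.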

Once you have the boundary string, Webmachine provides two options for parsing the actual request body. The first, webmachine_multipart:get_all_parts/2, parses an entire body and returns a structure with the body metadata and content. The second option, which I used here, uses the new-in-1.3 streaming body API to read data in chunks off the wire. This API returns a function that allows you to read successive elements of the multipart body lazily. Here's the relevant usage of this API in the file upload example (called from process_post/2):

%% (from process_post/2...)
{FileName, FileSize, Content} =
    get_streamed_body(webmachine_multipart:stream_parts(
                        wrq:stream_req_body(ReqData, 1024),
                        Boundary), [], []),
....

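%% End of stream: join the collected chunks into a single binary.
%% FileName here is the list accumulated by the clause below.
get_streamed_body(done_parts, FileName, Acc) ->
    Bin = iolist_to_binary(lists:reverse(Acc)),
    {FileName, size(Bin)/1024.0, Bin};
%% One "filedata" part: record its filename and content, then call
%% Next() to read the following part off the wire.
get_streamed_body({{"filedata", {Params, _Hdrs}, Content}, Next}, Props, Acc) ->
    FileName = binary_to_list(proplists:get_value(<<"filename">>, Params)),
    get_streamed_body(Next(), [FileName|Props], [Content|Acc]).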

get_streamed_body/3 takes the result of webmachine_multipart:stream_parts/2 (which itself wraps the result of wrq:stream_req_body/2) and returns a 3-tuple of {FileName::string(), FileSize::float(), Bin::binary()} describing the uploaded file and its contents.
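
To make the lazy shape concrete without a running Webmachine, here is a small self-contained sketch. It builds a fake stream with the same shape consumed above (either the atom done_parts, or a {Part, Next} pair where Next is a fun of no arguments) and folds over it one part at a time. The module lazy_parts_sketch and its functions are hypothetical names for illustration only. The part tuples mimic the pattern matched in get_streamed_body/3 and are not taken from Webmachine's documentation.

```erlang
-module(lazy_parts_sketch).
-export([from_list/1, fold/3, demo/0]).

%% Build a lazy stream from a plain list. Nothing past the head is
%% produced until Next() is called.
from_list([]) ->
    done_parts;
from_list([Part | Rest]) ->
    {Part, fun() -> from_list(Rest) end}.

%% Fold over the stream, forcing one element per step.
fold(_Fun, Acc, done_parts) ->
    Acc;
fold(Fun, Acc, {Part, Next}) ->
    fold(Fun, Fun(Part, Acc), Next()).

%% Collect the content of two fake "filedata" parts into one binary.
demo() ->
    Params = [{<<"filename">>, <<"a.txt">>}],
    Parts = [{"filedata", {Params, []}, <<"hello ">>},
             {"filedata", {Params, []}, <<"world">>}],
    Chunks = fold(fun({_Name, _Meta, Content}, Acc) -> [Content | Acc] end,
                  [], from_list(Parts)),
    iolist_to_binary(lists:reverse(Chunks)).
```

Calling lazy_parts_sketch:demo() returns <<"hello world">>. The same fold could write each chunk to an open file handle instead of accumulating it, which would keep large uploads out of memory.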

With that data, one can write the uploaded file to disk and return a new page indicating success, along with the filename and file size, which is what the rest of process_post in my example does.

For completeness, here's the entire process_post/2 function:

process_post(ReqData, C) ->
    ContentType = wrq:get_req_header("content-type", ReqData),
    Boundary = string:substr(ContentType, string:str(ContentType, "boundary=")
                             + length("boundary=")),
    {FileName, FileSize, Content} = get_streamed_body(
                                      webmachine_multipart:stream_parts(
                                        wrq:stream_req_body(ReqData, 1024),
                                        Boundary), [], []),
    %% No "/" component here: filename:join/1 treats an absolute component
    %% as a new root and would discard ?STORE_PATH.
    StorePath = filename:join([?STORE_PATH, FileName]),
    %% Crash early if the directory or file cannot be written, rather
    %% than reporting success for an upload that was never stored.
    ok = filelib:ensure_dir(StorePath),
    ok = file:write_file(StorePath, Content),
    NewRD = wrq:append_to_response_body(
              io_lib:format(
                "<html><head><title>Upload Complete</title></head>"
                "<body><h1>Upload Complete</h1>"
                "<p>Received file ~s (~.2fK)</p></body></html>",
                [filename:basename(StorePath), FileSize]), ReqData),
    {true, NewRD, C}.

The entire source for the file upload resource can be found here. It's 42 lines total, including boilerplate and supporting HTML.
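
One thing process_post/2 does not do is sanitize the client-supplied filename before joining it onto the store path. Since some browsers send the full client-side path, and a hostile client can send anything, a small guard is worth considering. The following is a minimal sketch under my own assumptions: upload_name_sketch and safe_name/1 are hypothetical names, and the policy (keep only the last path segment, reject "." and "..") is one reasonable choice rather than anything Webmachine prescribes.

```erlang
-module(upload_name_sketch).
-export([safe_name/1]).

%% Hypothetical helper: reduce a client-supplied filename to its last
%% path segment, splitting on both "/" and "\" so that Windows-style
%% paths are handled on any server OS.
safe_name(ClientName) when is_list(ClientName) ->
    case string:tokens(ClientName, "/\\") of
        [] ->
            {error, empty_name};
        Tokens ->
            case lists:last(Tokens) of
                "."  -> {error, bad_name};
                ".." -> {error, bad_name};
                Base -> {ok, Base}
            end
    end.
```

For example, safe_name("C:\\Users\\me\\report.pdf") returns {ok, "report.pdf"}, and safe_name("../../etc/passwd") returns {ok, "passwd"}, so the result can be joined onto the store path without escaping it.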

Happy Uploading!

Tuesday, June 16, 2009

HTTP File Upload Woes

I recently implemented support for email attachments in our sales prospecting application, Basho Open, and learned how painful file uploads on the Web can be. I've been on the server side since 2005, so this was my first step back into the HTML/CSS/Javascript waters in a long time.

Using only HTML and HTTP (i.e. no Flash), you're required to use a form and a file input element. This means native controls, styling nightmares, no progress reporting, and iframe hacks if you want to support uploads without page reloads. Since our user base supports Flash, I broadened my search for solutions. (For the gory details of the client-side issues of file uploads, see here).

We use jQuery, and a quick search for "jQuery file upload plugin" yielded Uploadify, which had a slick demo and seemed like it would meet my needs. Slightly annoying was the fact that Uploadify assumes you're using PHP on the server side, with the client-side JS code expecting certain responses that I had to reverse-engineer to make it work with Webmachine. No big deal, until I tested it on our staging cluster. Turns out Flash doesn't like wildcard SSL certs, which we use. Showstopper.

With about a day left to complete the feature, I went back to a pure-HTML+HTTP solution. I searched around a bit, and found a couple of valuable resources:

  • Jack Born's Multiple File Upload Magic With Unobtrusive Javascript was very informative, but the solution he outlines requires a page reload to actually perform the upload, which due to the nature of our app wasn't acceptable.

  • This page, which outlines the invisible-iframe technique I ultimately ended up adapting and using.

Even with the deadline looming, I managed to pull off an HTML/HTTP-only solution and gussy it up enough to not be completely embarrassing. After some cleanup, I hope to release my final solution as a jQuery plugin. Some browser inconsistencies did pop up, namely:

  • Firefox populates the value of the <input> element with the basename of the file, while IE provides the entire client-side path to the file, complicating display issues.

  • Firefox also provides the size of the input file on the client side, allowing the developer to perform max-size checks on the client before a user tries to upload a larger-than-allowed file.


And that's just dealing with the client side. Next post: Easy (server-side) file upload processing with Webmachine.

Wednesday, June 10, 2009

Justin Sheehy : REST as a business advantage

Great post by Justin Sheehy, a co-worker of mine at Basho, on how the virtues of REST aren't just wonky idealism - in fact, following RESTful principles can give your business a competitive advantage.

His post highlights the Jigsaw HTTP API, which was a joy to work with, compared to some of the clunkier SOAP APIs out there.

Meta - I'm starting this blog up again, and will be posting mostly about Erlang, Webmachine, the parts of Java that don't make me cry, distributed systems, startup life, and other fun stuff. It's good to be back!

Saturday, September 27, 2008

Can't stop watching

Thursday, September 04, 2008

Seeking Web Developers!

Job description follows. Send CVs/Resumes/Portfolios to me.


Position: Web Interface Developer
Location: Cambridge MA

Everybody has CSS and HTML on their resume. I want someone that can really make them sing. We have a team of top-notch engineers building a groundbreaking Web application and we want another browser UI star to join that team.

You should have the proven ability to develop interactive web applications that follow the most important rule of design: design for the users. We are seeking someone with the ability to improvise and innovate, and whose creations surprise and delight.

We would be extremely surprised if the person we want didn't have:

  • Experience creating user-centric and standards-friendly web applications
  • 3+ years of dedicated web application development
  • Proficiency with all the following: HTML, CSS, JavaScript, AJAX
  • Experience developing in an open source environment
  • Experience working well with a fast-moving team


People that happen to have experience with Erlang, Python, or Flash will certainly start on the right foot.

If you think you'd be a good fit, please send a resume and a portfolio or work sample, and we'll get back to you quickly. You must already be authorized to live and work in the US, and you must want to produce truly amazing Web applications.

Monday, November 19, 2007

Back to Boston

Moving back East. Details soon.

Monday, July 30, 2007

py2dot

I've been getting a bunch of emails asking about the status of some of my old projects. The bad news is a lot of them were lost in a catastrophic server failure a while back, but I've managed to dig up a few of them. A lot of people asked for the py2dot script, which takes Python source code as input and outputs a .dot file (openable with GraphViz) that shows a graphical representation of the input AST.

Anyways, here it is: py2dot.py