Wednesday, June 17, 2009

Webmachine Tutorial: File Uploads

I've vented about the client side of HTTP file uploads in my post below, but once you've got the client side down, you still have to deal with a messy multipart/form-data request body on the server. Fortunately, version 1.3 of Basho's homegrown and production-proven HTTP toolkit, Webmachine, has just been released, with support for parsing multipart bodies.

I've checked in a fully functional file-upload demo to BitBucket here, and I'll go through the interesting bits of using webmachine_multipart in this post.

Before parsing a multipart body, you need to parse out the browser-generated part separator, or "boundary", from the Content-Type header.

    ContentType = wrq:get_req_header("content-type", ReqData),
    Boundary = string:substr(ContentType, string:str(ContentType, "boundary=")
                             + length("boundary=")),
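
The snippet above takes everything after "boundary=" as the boundary, which is fine for what browsers typically send. As a rough sketch of a more defensive variant, the module below also copes with a missing header, a quoted boundary value, and parameters that follow the boundary. The module and function names (boundary_sketch, boundary/1, clean/1) are made up for illustration and are not part of Webmachine, and the match on "boundary=" is still case-sensitive.

```erlang
-module(boundary_sketch).
-export([boundary/1]).

%% Hypothetical helper: extract the multipart boundary from a Content-Type
%% header value. Returns {ok, Boundary} or the atom error.
boundary(undefined) ->
    %% Assumes a missing header is reported as the atom undefined.
    error;
boundary(ContentType) ->
    case string:str(ContentType, "boundary=") of
        0 ->
            error;
        Pos ->
            Rest = string:substr(ContentType, Pos + length("boundary=")),
            {ok, clean(Rest)}
    end.

%% Quoted value: keep everything up to the closing quote.
clean([$" | Quoted]) ->
    lists:takewhile(fun(C) -> C =/= $" end, Quoted);
%% Unquoted value: stop at the next parameter separator or space.
clean(Unquoted) ->
    lists:takewhile(fun(C) -> C =/= $; andalso C =/= $\s end, Unquoted).
```

In process_post/2 this would be called as boundary_sketch:boundary(wrq:get_req_header("content-type", ReqData)), with the error case turned into whatever failure response suits the resource.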

Once you have the boundary string, Webmachine provides two options for parsing the actual request body. The first, webmachine_multipart:get_all_parts/2, parses an entire body at once and returns a structure with the body metadata and content. The second option, which I used here, uses the new-in-1.3 streaming body API to read data in chunks off the wire. webmachine_multipart:stream_parts/2 returns the first part paired with a function that fetches the next one, so you can read successive parts of the multipart body lazily. Here's the relevant usage of this API in the file upload example (called from process_post/2):

    %% (from process_post/2...)
    {FileName, FileSize, Content} =
        get_streamed_body(webmachine_multipart:stream_parts(
                            wrq:stream_req_body(ReqData, 1024),
                            Boundary), [], []),
....

%% All parts consumed: join the collected content and report its size in KB.
get_streamed_body(done_parts, FileName, Acc) ->
    Bin = iolist_to_binary(lists:reverse(Acc)),
    {FileName, size(Bin)/1024.0, Bin};
%% A "filedata" part: remember its filename (as a flat string) and keep its content.
get_streamed_body({{"filedata", {Params, _Hdrs}, Content}, Next}, _Props, Acc) ->
    FileName = binary_to_list(proplists:get_value(<<"filename">>, Params)),
    get_streamed_body(Next(), FileName, [Content|Acc]).

The function takes the result of webmachine_multipart:stream_parts/2 (which in turn wraps the result of wrq:stream_req_body/2) and returns a 3-tuple of {FileName::string(), FileSize::float(), Bin::binary()} describing the uploaded file and its contents. FileSize is in kilobytes.
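
One thing to watch: get_streamed_body/3 only has clauses for done_parts and for a part named "filedata", so a form with any other field would hit a function_clause error. Here is a minimal sketch of a walker that skips unrelated parts. It assumes only the stream shape shown above ({Part, Next} per part, then done_parts); parts_sketch, from_list/1 and collect/1 are hypothetical names, and from_list/1 is a stand-in for the real parser so the example can run on its own.

```erlang
-module(parts_sketch).
-export([from_list/1, collect/1]).

%% Hypothetical stand-in for the real parser: turn a list of parts into
%% a lazy stream of {Part, Next} pairs ending in the atom done_parts.
from_list([]) ->
    done_parts;
from_list([Part | Rest]) ->
    {Part, fun() -> from_list(Rest) end}.

%% Walk the stream, keep the first "filedata" part, skip everything else.
%% Returns {FileName, Content}, or undefined if no such part was seen.
collect(Stream) ->
    collect(Stream, undefined).

collect(done_parts, Found) ->
    Found;
collect({{"filedata", {Params, _Hdrs}, Content}, Next}, undefined) ->
    %% Crashes if the part carries no filename parameter.
    FileName = binary_to_list(proplists:get_value(<<"filename">>, Params)),
    collect(Next(), {FileName, Content});
collect({_OtherPart, Next}, Found) ->
    collect(Next(), Found).
```

For example, feeding it a stream built from a "comment" part followed by a "filedata" part returns only the file's name and content.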

With that data, one can write the uploaded file to disk and return a new page indicating success, along with the filename and file size, which is what the rest of process_post in my example does.
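
Since the filename comes straight from the client, it may be worth reducing it to its last path component before building the path to write to. This is a small sketch of that idea, not something the demo does; store_sketch and store_path/2 are hypothetical names, and the check only covers the forward-slash style of path that filename:basename/1 splits on Unix.

```erlang
-module(store_sketch).
-export([store_path/2]).

%% Hypothetical helper: build the on-disk path for an uploaded file.
%% Keeps only the last path component of the client-supplied name and
%% rejects names that would not point at a file inside StoreDir.
store_path(StoreDir, ClientName) ->
    case filename:basename(ClientName) of
        Base when Base =:= ""; Base =:= "."; Base =:= ".." ->
            {error, bad_name};
        Base ->
            {ok, filename:join(StoreDir, Base)}
    end.
```

The resulting path can then be handed to filelib:ensure_dir/1 and file:write_file/2 as in the demo.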

For completeness, here's the entire process_post/2 function:

process_post(ReqData, C) ->
    ContentType = wrq:get_req_header("content-type", ReqData),
    Boundary = string:substr(ContentType, string:str(ContentType, "boundary=")
                             + length("boundary=")),
    {FileName, FileSize, Content} = get_streamed_body(
                                      webmachine_multipart:stream_parts(
                                        wrq:stream_req_body(ReqData, 1024),
                                        Boundary), [], []),
    %% No "/" component here: filename:join/1 discards everything before an
    %% absolute component, so a "/" would drop ?STORE_PATH from the result.
    StorePath = filename:join([?STORE_PATH, FileName]),
    %% Match on ok so a failed directory creation or write is not silently ignored.
    ok = filelib:ensure_dir(StorePath),
    ok = file:write_file(StorePath, Content),
    NewRD = wrq:append_to_response_body(
              io_lib:format(
                "<html><head><title>Upload Complete</title></head>"
                "<body><h1>Upload Complete</h1>"
                "<p>Received file ~s (~.2fK)</p></body></html>",
                [filename:basename(StorePath), FileSize]), ReqData),
    {true, NewRD, C}.

The entire source for the file upload resource can be found here. It's 42 lines total, including boilerplate and supporting HTML.

Happy Uploading!