Downloading podcasted videos with awk

One of the workshops we run as part of our technology program gives students the chance to create a stop-motion animation using SAM Animation and some modelling clay. Once they've finished taking the photographs, SAM exports a movie, which is uploaded to our servers using the Podcast Capture utility. The workflow then adds the title and date to a splash screen at the start of the movie before publishing it to the Podcast Library wiki. I was pretty happy with how this all worked together (particularly given it was my first attempt at anything like it), but one thing was missing: an easy way to download the video files so students could have a copy of their movie.

On the face of it, this should have been pretty easy, as Podcast Composer offers multiple options in the workflow for publishing the final file. Unfortunately, permissions issues between the two computers processing the files plagued my attempts to simply export the file into a shared folder. I'm sure someone who actually knew what they were doing with server administration would have been fine, but it was well beyond my knowledge.

My first attempt to grab the files used Automator and worked pretty well, apart from the fact that the videos kept the same 64-character filenames they had on the server ('export-plugin-quicktime-UUID.m4v'). Having presenters manually rename the files at the end of every workshop wasn't really feasible, so I started looking for a way to rename the videos as they were downloaded.

Given that the Podcast Library was able to play the videos, I figured that the HTML source must have links to the actual video files in it somewhere. Sure enough, this turned out to be the case: <span> tags like the one below display the 'Play' button for each video. Handily, each tag also contains the title of the video and the path to the video file itself.

<span class="cc-token playbtn" style="min-width: 50px; text-align: center;" title="Rolling Ball" assetURL="https://server.local/podcastlibrary/document/uuid/7DF98AA3-CCEA-4158-84FA-BC4DEC04DCFF/export-plugin-quicktime-E43D860D-31BB-454F-83AD-B2ECFFED9E92.m4v">

I'd used grep in the past, and it seemed like a good starting point given that the tags contained a few unique identifiers such as the 'cc-token playbtn' CSS class. I'd also heard about awk, a programming language designed for processing text files. It seemed like a pretty good option, so I had a look at some tutorials. This didn't go particularly well, because awk isn't really a single-purpose command-line tool like grep; it's a full programming language. After a number of false starts over a few days, I got the following to work:

curl "$@" | grep -o 'title="[^"]*" assetURL="[^"]*"' | awk 'BEGIN { FS = "\"" } { system("curl --insecure --create-dirs -o \"" ENVIRON["HOME"] "/Desktop/Podcasted/" $2 ".m4v\" \"" $4 "\"") }'
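Before pointing a pipeline like this at the live server, it can be reassuring to see exactly what it is going to run. The sketch below is a hypothetical dry-run variant (the function name `podcast_dry_run` is made up for illustration): it uses the same kind of grep pattern but swaps awk's `system()` for `print`, so the download commands are displayed rather than executed. Like the one-liner, it assumes each `title` attribute is immediately followed by its `assetURL`, as in the sample tag above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: reads Podcast Library HTML on stdin and prints
# the curl command that would download each video, without running it.
podcast_dry_run() {
  # [^"]* stops each match at the next quote, so one tag can't swallow the next
  grep -o 'title="[^"]*" assetURL="[^"]*"' |
    awk 'BEGIN { FS = "\"" }
         {
           # ENVIRON["HOME"] is used because "~" inside double quotes is
           # not expanded by the shell that would run the command
           print "curl --insecure --create-dirs -o \"" ENVIRON["HOME"] "/Desktop/Podcasted/" $2 ".m4v\" \"" $4 "\""
         }'
}

# Feed it the sample tag from the Podcast Library page
printf '%s\n' '<span class="cc-token playbtn" style="min-width: 50px; text-align: center;" title="Rolling Ball" assetURL="https://server.local/podcastlibrary/document/uuid/7DF98AA3-CCEA-4158-84FA-BC4DEC04DCFF/export-plugin-quicktime-E43D860D-31BB-454F-83AD-B2ECFFED9E92.m4v">' |
  podcast_dry_run
```

Once the printed commands look right, replacing the `printf` line with `curl "$@"` would list the commands for a real page.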

To make this easy for everyone to use, I made an application in Automator with the actions 'Get Current Webpage from Safari' and 'Run Shell Script'. The first curl command downloads the HTML of the page open in Safari, after which grep pulls out the relevant attributes from the HTML source and feeds them to awk. The awk command splits each line into fields at every double quotation mark, so the title of the video lands in the second field and its URL in the fourth (an empty fifth field follows the closing quotation mark). These two fields are then used to build a second curl command that downloads the video to a file named after its title.
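To see the field splitting in action, the small sketch below (with a hypothetical helper name, `show_fields`, and the URL shortened to a placeholder) runs one line of grep output through awk with the same `FS` setting and prints every field with its number.

```shell
#!/bin/sh
# Hypothetical helper: prints each field awk sees when splitting on double quotes.
show_fields() {
  awk 'BEGIN { FS = "\"" }
       {
         # Loop over all NF fields, including the empty one after the final quote
         for (i = 1; i <= NF; i++) printf "field %d = [%s]\n", i, $i
       }'
}

printf '%s\n' 'title="Rolling Ball" assetURL="https://server.local/video.m4v"' | show_fields
```

This prints five fields: `title=`, `Rolling Ball`, ` assetURL=` (with a leading space), the URL, and an empty fifth field, because the line ends with a quotation mark.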

There are a couple of things I'd like to fix, but I was so happy that it worked at all that I haven't got around to them. Firstly, I'm pretty sure the grep command is unnecessary and the whole thing could be done in awk. Secondly, the script is likely to fail if two videos are submitted with the same name: both are saved under the same filename, so the later download overwrites the earlier one. This does happen in our workshops (particularly when the group is given a specific topic to create a video about), but I haven't figured out a solution yet.
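One possible way to tackle both issues, sketched here under a few assumptions rather than as a tested replacement, is to let awk find the attributes itself with `match()` and `substr()` (both standard POSIX awk) and to append a number to any title it has already seen, so a second 'Rolling Ball' becomes 'Rolling Ball 2'. The function names `podcast_list` and `podcast_download` are hypothetical, and the sketch assumes, as before, that each `title` attribute is immediately followed by its `assetURL`. It also hands the filename and URL back to the shell instead of calling `system()`, so titles containing characters such as `$` or backticks aren't interpreted as shell syntax.

```shell
#!/bin/sh
# Hypothetical helper: reads Podcast Library HTML on stdin and prints one
# "filename<TAB>URL" line per video, numbering repeated titles.
podcast_list() {
  awk '
    {
      rest = $0
      # A line may hold more than one play button, so loop over every match
      while (match(rest, /title="[^"]*" assetURL="[^"]*"/)) {
        pair = substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
        split(pair, f, "\"")            # f[2] = title, f[4] = URL
        seen[f[2]]++                    # how many times this title has appeared
        if (seen[f[2]] > 1) name = f[2] " " seen[f[2]]
        else name = f[2]
        print name "\t" f[4]
      }
    }'
}

# Hypothetical download step: the shell does the quoting, not system()
podcast_download() {
  tab=$(printf '\t')
  podcast_list | while IFS="$tab" read -r name url; do
    curl --insecure --create-dirs -o "$HOME/Desktop/Podcasted/$name.m4v" "$url"
  done
}
```

With both functions pasted into the 'Run Shell Script' action, `curl "$@" | podcast_download` would take the place of the one-liner. The numbering only covers duplicates on a single page, so a file left over from an earlier download with the same name would still be overwritten.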

Although it started off pretty slowly and ended up a bit more hacky than I'd like, I'm happy with how this turned out. If nothing else, it gave me a chance to look at awk, which will definitely be useful in the future.