Diversity at NDC London

Jakob Bradford, Jon Skeet, Chris O’Dell and I have exchanged a number of emails and Google Hangouts over the past few months, trading suggestions that could improve diversity at NDC conferences. I was a speaker at NDC Sydney, and it was one of the most heavily gender-skewed events I had attended in years.

Ultimately, NDC London 2017 ended up with 23% female speakers. That’s a great result compared to the gender diversity numbers at NDC Sydney. I’m keen to see whether the attendee ratio changes as a result. NDC have also promised a blog post on the history of their diversity numbers and their plans to keep improving.

For other reasons, I’m no longer involved with NDC, but I wanted to say congratulations to the London committee, and especially to Chris O’Dell, my former 7digital colleague and friend, who helped to shape that agenda. She’ll be speaking on “How to get your submission accepted” at NDC London.

[Video: “The Agenda is out – NDC London 2017” from NDC Conferences on Vimeo]

Metro: the ABC’s new Media Transcoding Pipeline

In December last year, the ABC launched a new video encoding system called Metro (“Media Transcoder”), which converts various sources of media into a standardised format for iview.

It’s been a fantastic project for the ABC’s Digital Network division – we’ve built a cheap, scalable, cloud-based solution that we can customise to suit our specific needs.

Metro has been live for a month, successfully transcoding thousands of pieces of content. Here’s an overview of how it’s been designed and what it does.

Background

Our previous transcoding system had a fairly straightforward job: produce the same set of renditions for each piece of content it was given. Both input and output files were fairly standardised. The previous system was convenient, but there were some aspects we couldn’t customise, and we didn’t use its #1 proposition: on-demand transcoding. Most of the content the ABC publishes is available to us days in advance, so we just need to make sure that it’s transcoded before it’s scheduled for release online.

We calculated that we could replace the existing system for less than it was costing us, and take advantage of AWS services and their scalability. Other systems, such as the BBC’s Video Factory, have been built successfully on the same model. Writing our own system would also let us batch jobs up to process in bulk, or use differently sized instances, to help reduce costs in the long term.

Our first step was to replicate what the existing system did, but allow it to scale when needed, and shut down when there’s nothing to do.

Architecture

Metro is a workflow pipeline that takes advantage of queues, autoscaling compute groups, a managed database, and notifications. Logically, the pipeline follows this series of steps: File upload > Queue Job > Transcode > Transfer to CDN > Notify client.

 

[Figure: Metro transcoding architecture]

The pipeline is coordinated by the “Orchestrator”, an API written in Node.js that understands the sequence of steps, enqueues messages, talks to our database, and tracks where each job is in the system. It’s also responsible for scaling the number of transcoding boxes that process our content.
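To make the step sequence concrete, here is a minimal sketch of how a job’s position in the pipeline could be tracked. It is not Metro’s code: the real Orchestrator is a Node.js API, and the names `Stage`, `Job`, `nextStage` and `advance` are hypothetical.

```haskell
-- Hypothetical model of the pipeline stages, in the order listed above.
data Stage = FileUpload | QueueJob | Transcode | TransferToCDN | NotifyClient
  deriving (Show, Eq, Enum, Bounded)

-- The stage after the current one, or Nothing once the client has been notified.
nextStage :: Stage -> Maybe Stage
nextStage s
  | s == maxBound = Nothing
  | otherwise     = Just (succ s)

-- Every stage a job still has to pass through, starting from (and including) s.
stagesFrom :: Stage -> [Stage]
stagesFrom s = [s .. maxBound]

-- A tracked job: an identifier plus the stage it is currently in.
data Job = Job { jobId :: String, stage :: Stage } deriving Show

-- Move a job on by one step; a finished job stays where it is.
advance :: Job -> Job
advance job = maybe job (\s -> job { stage = s }) (nextStage (stage job))
```

Deriving `Enum` and `Bounded` keeps the ordering in one place, so adding a stage (say, caption extraction) only means adding a constructor.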

Each step in our pipeline is processed by a small, isolated program written in Golang (a “queue listener”), or a simple bash script that knows only about its piece of the pipeline.

We are able to deploy each piece independently, which allows us to make incremental changes to any of the queue listeners, or to the Orchestrator.

Interesting bits

Autoscaling the Transcoders

The transcoders are the most expensive part of our system. They’re the beefiest boxes in the architecture (higher CPU = faster transcode), and we run a variable number of them throughout the day, depending on how much content is queued.

Before a piece of content is uploaded, we check to see how many idle transcoders are available. If there are no spare transcoders, we decide how many new ones to start up based on the transcoding profile. Higher-bitrate outputs get one transcoder each; lower bitrates and smaller files might share one transcoder across four renditions. Once we’ve processed everything in the queue, we shut down all the transcoders so that we’re not spending money keeping idle boxes running.
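The scaling rule above can be sketched as a small pure function. This is an illustration under assumptions, not Metro’s implementation: the 2000 kbps cut-off between “higher” and “lower” bitrates is made up, and subtracting idle boxes from the total is a simplification of the “check for idle transcoders first” step.

```haskell
-- Hypothetical cut-off between "high" and "low" bitrate renditions (kbps).
highBitrateThreshold :: Int
highBitrateThreshold = 2000

-- How many low-bitrate renditions share one transcoder.
lowBitratePerBox :: Int
lowBitratePerBox = 4

-- Transcoders a profile needs: one per high-bitrate rendition, plus enough
-- boxes to cover the low-bitrate ones at four per box (rounded up).
transcodersFor :: [Int] -> Int
transcodersFor bitrates = length high + ceilDiv (length low) lowBitratePerBox
  where
    high = filter (>= highBitrateThreshold) bitrates
    low  = filter (< highBitrateThreshold) bitrates
    ceilDiv n d = (n + d - 1) `div` d

-- New boxes to start after using up any idle transcoders.
transcodersToStart :: Int -> [Int] -> Int
transcodersToStart idle bitrates = max 0 (transcodersFor bitrates - idle)
```

For a profile of 3500, 2500, 1500, 1000, 600, 300 and 150 kbps, the two high-bitrate renditions get a box each and the five low-bitrate ones need two shared boxes, so four in total.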

Here’s a snapshot of the runtime stats (in minutes) on boxes over a 4-hour window:

[Figure: EC2 transcoder runtimes (minutes) over a 4-hour window]

There’s definitely some optimisation we can do with our host runtime. In future, we’d like to optimise the running time of our transcoders so that they run for a full hour, to match Amazon’s billing cycle of one-hour blocks. We’d also like to take advantage of Amazon’s spot instances, using cheaper computing time overnight to process jobs in bulk.
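The arithmetic behind running boxes for a full hour is easy to check. Under hourly billing, any partial hour is charged as a full one, so three boxes that each run for 25 minutes cost three hours, while one box doing the same 75 minutes of work costs two. A quick sketch (the function names are hypothetical):

```haskell
-- Hourly billing: any partial hour is charged as a full hour.
billedHours :: Int -> Int
billedHours minutes = (minutes + 59) `div` 60

-- Minutes paid for but not used by a box that ran for the given time.
idlePaidMinutes :: Int -> Int
idlePaidMinutes minutes = billedHours minutes * 60 - minutes

-- Total billed hours for a set of boxes, given each box's runtime in minutes.
totalBilledHours :: [Int] -> Int
totalBilledHours = sum . map billedHours
```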

FFmpeg

FFmpeg is the transcoding software we use on our transcoders. It’s open source, well maintained, and has an impressive list of features. We’re using it to encode our content in various bitrates, resize content, and add watermarks. We create an AMI that includes a precompiled version of FFmpeg as well as our transcoder app, so that it’s ready to go when we spin up a new box.

There’s still a way to go before we’re using FFmpeg to its full extent. It can break a file into even chunks, which would let us farm the work out to multiple transcoders and would likely give us faster, more consistent results every time. We can also get progress alerts and partial file downloads (e.g. taking only the audio track, so we don’t download a lot of video we won’t use).
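Here is a sketch of the chunking idea: split a programme’s duration into near-equal segments and build an FFmpeg argument list for each one. The `-ss` (seek), `-t` (duration) and `-c copy` (stream copy) options are standard FFmpeg flags; the `chunkN.mp4` output naming and the approach as a whole are assumptions for illustration, not something Metro does yet.

```haskell
-- Split a duration in seconds into n near-equal (start, length) chunks.
-- Assumes n > 0; the first (total `mod` n) chunks get one extra second.
chunks :: Int -> Int -> [(Int, Int)]
chunks total n = zip starts lengths
  where
    (q, r)  = total `divMod` n
    lengths = [q + (if i < r then 1 else 0) | i <- [0 .. n - 1]]
    starts  = scanl (+) 0 lengths

-- FFmpeg arguments to cut one chunk: -ss seeks to the start, -t limits
-- the duration, and -c copy cuts without re-encoding.
-- The output file name is hypothetical.
chunkArgs :: FilePath -> (Int, Int) -> Int -> [String]
chunkArgs input (start, len) idx =
  [ "-ss", show start, "-t", show len, "-i", input
  , "-c", "copy", "chunk" ++ show idx ++ ".mp4" ]
```

With stream copy, cuts can only land on keyframes, so real chunk boundaries may drift slightly from the computed ones.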

SQS Queues

We use SQS queues to keep our pipeline resilient. We’ve got a separate queue for each step in our system, and each queue has a small app monitoring it.

When a new message arrives, the app takes the message off the queue and starts working. If an error occurs, the app cancels its processing work and returns the message to the queue, making it visible again so that another worker can pick it up.

If a message is retried a number of times without success, it ends up in a “Dead Letter Queue” for failed messages, and we get notified.
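The retry-then-dead-letter behaviour can be modelled in a few lines. SQS implements it with a redrive policy whose `maxReceiveCount` sets how many receives a message gets before it is moved to the dead letter queue. The sketch below only loosely mimics that idea, and `maxReceives = 3` is a hypothetical value.

```haskell
-- A queued message and how many times it has been received so far.
data Message = Message { body :: String, receiveCount :: Int }
  deriving (Show, Eq)

-- What happened after one processing attempt.
data Outcome = Done | BackOnQueue Message | DeadLetter Message
  deriving (Show, Eq)

-- Hypothetical retry limit, similar in spirit to SQS's maxReceiveCount.
maxReceives :: Int
maxReceives = 3

-- One attempt: success finishes the message; failure either returns it to
-- the queue or, once it has been received too often, dead-letters it.
attempt :: (String -> Bool) -> Message -> Outcome
attempt work msg
  | work (body msg)                  = Done
  | receiveCount msg' >= maxReceives = DeadLetter msg'
  | otherwise                        = BackOnQueue msg'
  where
    msg' = msg { receiveCount = receiveCount msg + 1 }

-- Keep retrying a message until it succeeds or lands in the dead letter queue.
runUntilSettled :: (String -> Bool) -> Message -> Outcome
runUntilSettled work msg = case attempt work msg of
  BackOnQueue next -> runUntilSettled work next
  settled          -> settled
```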

Things seem to be working well so far, but we’d like to change the queues so that consumers continually confirm they’re working on each message, rather than farming out the message and waiting until a timeout before another consumer can pick it up.
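The “keep confirming” approach is usually a heartbeat. While a worker is busy, it periodically extends the message’s visibility timeout (SQS exposes this as the `ChangeMessageVisibility` action), so the timeout itself can be short. A pure sketch of the timing, with a hypothetical 30-second timeout and made-up names:

```haskell
-- An in-flight message: other consumers can't see it before visibleAt (seconds).
data InFlight = InFlight { msgId :: String, visibleAt :: Int }
  deriving (Show, Eq)

-- Hypothetical visibility timeout in seconds.
visibilityTimeout :: Int
visibilityTimeout = 30

-- Receiving a message hides it for one timeout period.
receiveAt :: Int -> String -> InFlight
receiveAt now m = InFlight m (now + visibilityTimeout)

-- Heartbeat: the worker signals it is still busy, pushing the deadline out.
heartbeat :: Int -> InFlight -> InFlight
heartbeat now msg = msg { visibleAt = now + visibilityTimeout }

-- Can another consumer pick the message up at time `now`?
isVisible :: Int -> InFlight -> Bool
isVisible now msg = now >= visibleAt msg
```

A worker that crashes stops sending heartbeats, so its message reappears within one short timeout rather than after a long, worst-case processing estimate.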

In Production

Metro has been transcoding for a month, and is doing well. Our orchestrator dashboard shows all of the jobs and renditions in progress:

[Figure: Orchestrator dashboard showing jobs and renditions in progress]

And some of the work done by transcoders in a 4-hour window:

[Figure: transcoder work over a 4-hour window]

The Future

We have more features to add, such as extracting captions, using cheaper computing hardware in non-peak times, and building priority/non-priority pipelines so that content can be ready at appropriate times. Metro has been really interesting to build, much cheaper than our previous solution, and we can customise features to suit our needs. I’m really looking forward to where it goes next.

Code Sydney – a Javascript study group

I’ve done quite a few random side projects using Javascript, but I’ve never learned it “properly”, and I’ve always wanted to. In a nice coincidence, a fellow geek Lucy Bain started a Javascript study group a couple of months ago called Code Sydney, which uses the Odin Project’s course material – so of course I signed up.

Course Content

I’ve really been enjoying the course so far. It doesn’t assume any previous knowledge of Javascript, so it starts with the basics – variables, functions and jQuery. It then progresses through objects & prototypes, the DOM, events, callbacks, scope, closures, and popular libraries and frameworks like jQuery, Angular and Node.

Every week you have to do some homework reading about a specific topic, e.g. prototypes. There’s also an accompanying coding project to build, which uses the knowledge you’ve just read about. We start the coding project as a group during the study group meeting, and complete it at home later in the week. Nobody teaches the material at the study group, so it’s up to each participant to do their homework.

My contributions so far are on GitHub as source code and demos (disclaimer: there is almost zero CSS effort put into these). The most fun projects so far have been rebuilding games, including snake and tic-tac-toe.

Format

We meet in the Atlassian office once a week for around 2.5 hours. There are 2 or 3 tutors each week who’ve generously volunteered their time to help out, answer questions and review code.

The format of each night is roughly:

  • Check in (5 mins). Attendance is recorded as a motivational factor.
  • Demos (15-20 mins). A few people demo their solutions to the previous week’s project, and people can discuss different approaches.
  • Questions & Suggestions (5-10 mins). People have a chance to bring up any additional questions for the tutors, or the tutors can suggest “best practice” recommendations after the demos.
  • Start practical coding problem (up to 2 hours). We start the week’s coding problem in class, and finish the rest of it at home. If you aren’t sure how to approach something, you can ask a tutor.

Things I love about the study group model

  • There’s a set time and place to focus on learning something new, so there’s a natural deadline for you to achieve something by
  • I’ve learned much more than if I tried to do the course by myself
  • I’m seeing progress and building on my knowledge each week, which is rewarding and motivating
  • I’ve met new people
  • I get the chance to ask experienced people questions if I’m unsure about something
  • I’m building up a portfolio of fun projects (minesweeper this week!)
  • It’s much cheaper, and arguably better quality than an official course run by someone getting paid to teach. We discuss a lot of our solutions and get to see the merits of different approaches.
  • Nothing stops you paying it forward – feel free to organise your own study group, using the same material. All you need is a space to meet up.

I’m so excited about the format that I’m thinking about co-starting one for algorithms & data structures, as I’ve wanted a refresher and the chance to think and learn about them in a low-pressure environment. Part of the challenge is finding people who are knowledgeable and enthusiastic about the subject to be tutors, or deciding whether to run it without tutors. In any case, watch this space 🙂

Creating a Music Matrix with the Web Audio API

Last week I stumbled on this Tone Matrix, which uses the Web Audio API to generate and play sounds. I got really interested in the mechanics of sound generation and wondered how they did it, but unfortunately, there’s no source… so I decided to learn more about the Web Audio API, and recreate the matrix as an exercise. The source is available on GitHub.
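A tone matrix is essentially a step sequencer. Assuming the usual layout (each column is a time step, each row a pitch, and a lit cell means “play this note on this step”), the core lookup might look like the sketch below. The pentatonic row-to-pitch mapping and the function names are hypothetical choices; the frequency formula is standard equal temperament with A4 (MIDI note 69) at 440 Hz.

```haskell
-- Equal-temperament frequency (Hz) for a MIDI note number (A4 = 69 = 440 Hz).
midiToHz :: Int -> Double
midiToHz n = 440 * 2 ** (fromIntegral (n - 69) / 12)

-- Hypothetical pitch layout: row 0 is the lowest note, and rows climb a
-- major pentatonic scale starting at C4 (MIDI 60).
rowToMidi :: Int -> Int
rowToMidi row = 60 + 12 * octave + pentatonic !! degree
  where
    (octave, degree) = row `divMod` 5
    pentatonic = [0, 2, 4, 7, 9]

-- grid !! row !! col is True when that cell is switched on.
type Grid = [[Bool]]

-- Frequencies to trigger when the playhead reaches a given column.
notesAtStep :: Grid -> Int -> [Double]
notesAtStep grid col =
  [ midiToHz (rowToMidi row) | (row, cells) <- zip [0 ..] grid, cells !! col ]
```

A pentatonic scale is a common choice for this kind of toy because any combination of lit cells sounds reasonably consonant.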

I’ll be running a tutorial for Women Who Code Sydney in July on how this works, and plan to cover some different filters and effects you can run sound through to get more interesting results (the matrix in its current state is pretty basic). There is more mathematics than I had bargained for, but producing a basic sound doesn’t really require a lot of code.

You’ll need to have a fairly recent version of your browser to play with the demo.

[Screenshot: the music matrix demo]

Things I learned:

  • You can use existing sound sources (existing files, microphone) but also generate a sound wave with an Oscillator.
  • It’s a pipeline: audio nodes are chained together, and everything ends at a single destination, your speakers.
  • Each source node (an oscillator or an audio buffer source) can only be started once, so you need to create a new one for every note you play.
  • Sometimes there are loud ‘click’ noises as you abruptly change the notes through the speakers. You need to cater for this with some filters, or gain (aka volume control).
  • The API has been in development/experimental phase for quite a while and there’s not a lot of comprehensive documentation available. Most of the learning came from code samples.
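Producing a basic sound really doesn’t take much code. The sketch below computes the samples a sine oscillator would produce and applies a short linear fade in and out, the usual fix for the clicks mentioned above: the waveform starts and ends at zero instead of jumping. In the Web Audio API you would get the same effect by scheduling a `GainNode`’s gain with `setValueAtTime` and `linearRampToValueAtTime` rather than computing samples by hand. The 44.1 kHz sample rate and 10 ms ramp are assumptions.

```haskell
-- Sample rate in Hz (an assumption; 44.1 kHz is a common choice).
sampleRate :: Double
sampleRate = 44100

-- Linear attack/release envelope: rises 0 -> 1 over `ramp` seconds at the
-- start and falls 1 -> 0 over the last `ramp` seconds, clamped to [0, 1].
envelope :: Double -> Double -> Double -> Double
envelope ramp dur t = max 0 (minimum [1, t / ramp, (dur - t) / ramp])

-- Samples for a sine-wave note (what a "sine" oscillator produces),
-- shaped by the envelope so it doesn't click on and off.
sineNote :: Double -> Double -> Double -> [Double]
sineNote freq dur ramp =
  [ envelope ramp dur t * sin (2 * pi * freq * t)
  | i <- [0 .. n - 1]
  , let t = fromIntegral i / sampleRate ]
  where
    n = round (dur * sampleRate) :: Int
```

For example, `sineNote 440 0.5 0.01` is half a second of A4 with 10 ms fades at each end.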


Three days of Haskell

I spent three days up in Brisbane, from March 17 to 19, on a course called “Introduction to Functional Programming using Haskell”. It was intense!

The course was run by Tony Morris & Mark Hibberd from NICTA, and Katie Miller from Red Hat. It was originally billed as Lambda Ladies, but it turns out there weren’t quite enough ladies to fill the course, so anyone else interested was invited along too.

The course is made up of practical exercises. They excluded the standard Haskell library from the project, and we spent time reimplementing things from first principles, starting with functions on lists. It’s a very hands-on way of learning how Haskell works. The first day covers pattern matching, folding and function composition; the next two deal with abstractions over binding and functors, building towards monads. You also spend some time implementing a couple of concrete problems – a string parser, and a problem involving file IO – to see Haskell in practice.
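For a flavour of the “first principles” exercises, here is a sketch of a home-made list type with pattern matching, a right fold, and `map` built from that fold, which is all you need to make the type a `Functor`. The names (`List`, `:.`, `headOr`, `foldRight`, `mapList`, `sumList`) are illustrative; this isn’t the course’s code.

```haskell
-- A home-made list, so none of the Prelude's list functions apply to it.
data List a = Nil | a :. List a
  deriving (Show, Eq)
infixr 5 :.

-- Pattern matching: one equation per constructor.
headOr :: a -> List a -> a
headOr def Nil      = def
headOr _   (x :. _) = x

-- Right fold: replace each :. with f and the final Nil with z.
foldRight :: (a -> b -> b) -> b -> List a -> b
foldRight _ z Nil       = z
foldRight f z (x :. xs) = f x (foldRight f z xs)

-- map falls out of foldRight: rebuild the list, transforming each element.
mapList :: (a -> b) -> List a -> List b
mapList f = foldRight (\x acc -> f x :. acc) Nil

-- With map in hand, List is a Functor.
instance Functor List where
  fmap = mapList

sumList :: List Int -> Int
sumList = foldRight (+) 0
```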

If you’re familiar with functional programming, you’d understand that’s a LOT of material to cover in three days. I would say that the average learning curve went a bit like this:

[Figure: average learning curve over the three days]

However, having a solid understanding of programming concepts (e.g. lambdas) meant that the more complex concepts were a lot easier to pick up (to a degree). When I was learning functional programming at university, it took me days to reimplement map properly in Haskell! Earlier this week, it took five minutes.

Getting to your solution for each problem felt a lot like algebraic substitution and refactoring. First, you make it work, and then you refactor constantly to get the most elegant (read: shortest) solution by taking advantage of functional composition.
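Here’s that make-it-work-then-refactor loop on a small, hypothetical problem (summing the squares of the odd numbers in a list): explicit recursion first, then standard higher-order functions, then a point-free pipeline.

```haskell
-- Step 1: make it work, with explicit recursion and pattern matching.
sumOfOddSquares :: [Int] -> Int
sumOfOddSquares [] = 0
sumOfOddSquares (x : xs)
  | odd x     = x * x + sumOfOddSquares xs
  | otherwise = sumOfOddSquares xs

-- Step 2: refactor to higher-order functions.
sumOfOddSquares' :: [Int] -> Int
sumOfOddSquares' xs = sum (map (^ 2) (filter odd xs))

-- Step 3: point-free, composing the pipeline with (.).
sumOfOddSquares'' :: [Int] -> Int
sumOfOddSquares'' = sum . map (^ 2) . filter odd
```

All three agree; the last version reads right to left as “keep the odd numbers, square them, add them up”.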

I was surprised at how much it ended up looking like a normal chain of method calls once you introduce the dot operator, aka function composition, which is something C# looks to have borrowed heavily from when introducing LINQ.

To take the example from the link above,

ghci> map (\xs -> negate (sum (tail xs))) [[1..5],[3..6],[1..7]]  
[-14,-15,-27]

turns into…

ghci> map (negate . sum . tail) [[1..5],[3..6],[1..7]]  
[-14,-15,-27]

I was also surprised at just how much of a rush it was to a) have a solution that type-checked, and b) have it actually work. Haskell felt like an all-or-nothing proposition: either it compiled and worked, or it was hopelessly broken and gave you a type error that was difficult to decipher. By contrast, most other programming languages have a more granular feedback loop and are much easier to debug – you can put logging statements in, for example.
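That said, Haskell isn’t entirely without logging: GHC’s `base` library ships `Debug.Trace`, whose `trace` function prints a message to stderr when the expression is evaluated and then returns its second argument. It’s intended for debugging only, and because of laziness the messages appear in evaluation order rather than source order. A minimal sketch with a hypothetical `factorial`:

```haskell
import Debug.Trace (trace)

-- Each recursive call logs its argument to stderr when it is evaluated,
-- then returns the product unchanged.
factorial :: Int -> Int
factorial n
  | n <= 1    = 1
  | otherwise = trace ("factorial " ++ show n) (n * factorial (n - 1))
```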

The best takeaway of all was this pair of amazing lambda earrings!

[Photo: lambda earrings]

Learn You a Haskell is an excellent (and cute, and free) resource for learning Haskell.