“Developer-esque” relations

It’s not technically my official job, but I do some “developer-esque” relations for the ABC’s Digital Network division. I started doing it because I like it, and it’s been very educational.

To clarify, the phrase "developer relations" could be interpreted in a couple of ways:

  1. Your company has products or services it sells, and you’re trying to help people in the community to make better use of them, to encourage new users, and to get an idea about future features that people are seeking. The end goal is to get more people using your product, and be happier while they’re doing so.
  2. Your company is trying to be more open about the way it builds things, and to build some community. You’re sharing knowledge about your internal processes and decisions, mistakes you’ve made, and what’s coming up in the future. The end goal is to let people find out more about your environment, get people interested in the products you build, exchange ideas, and perhaps even entice some future employees to join you.

My "developer-esque" relations work at the ABC falls firmly in the second camp. Over the last year it has included GovHack 2015, our hackathons, and a new tech talk series we're starting up. You can sign up here! The first talk is on our internal transcoding system, Metro.

I’m also pretty excited about the latest endeavour, the launch of the ABC developer blog developers.digital.abc.net.au 🙂

The ABC has some really interesting products that millions of people use daily. However, when it comes to who builds those products, and how and why we build them, there has been zero external visibility. If you tried to find information about the ABC's development teams, there really wasn't anything to see. In an age where almost every company has an engineering blog, speaks at conferences and events, and maintains a community presence, we were falling desperately behind.

That lack of information makes it harder to attract good candidates. When you have choices, why would you choose the place you know the least about?

So the theory is that by sharing more, we'll get more people in the door. I'm pretty excited to be part of that. If you are too, then please join us.

Metro: the ABC’s new Media Transcoding Pipeline

In December last year, the ABC launched a new video encoding system called Metro (“Media Transcoder”), which converts various sources of media into a standardised format for iview.

It’s been a fantastic project for the ABC’s Digital Network division – we’ve built a cheap, scalable, cloud-based solution that we can customise to suit our specific needs.

Metro has been live for a month, successfully transcoding thousands of pieces of content. Here’s an overview of how it’s been designed and what it does.

Background

Our previous transcoding system had a fairly straightforward job: produce the same set of renditions for each piece of content it was given. Both input and output files were fairly standardised. It was convenient, but there were some aspects we couldn't customise, and we didn't use its #1 proposition: on-demand transcoding. Most of the content the ABC publishes is available to us days in advance, so we just need to make sure it's transcoded before it's scheduled for release online.

We calculated that we could build a replacement for less than the existing system cost, and take advantage of AWS services and their scalability. Other systems like the BBC's Video Factory have been successfully built using the same model. Writing our own system would also let us batch up jobs to process in bulk, or use different-sized instances to help reduce costs in the long term.

Our first step was to replicate what the existing system did, but allow it to scale when needed, and shut down when there’s nothing to do.

Architecture

Metro is a workflow pipeline that takes advantage of queues, autoscaling compute groups, a managed database, and notifications. Logically, the pipeline follows this series of steps: File upload > Queue Job > Transcode > Transfer to CDN > Notify client

 

[Diagram: Metro's transcoding architecture]

The pipeline is coordinated by the “Orchestrator”, an API written in node.js that understands the sequence of steps, enqueues messages, talks to our database, and tracks where each job is in the system. It’s also responsible for scaling the number of transcoding boxes that are processing our content.
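To make the Orchestrator's role concrete, here's a rough sketch of the kind of job tracking it does. The real Orchestrator is written in node.js; this Go version is purely illustrative, and the step names and fields are not our actual schema.

```go
// A minimal sketch of pipeline-step tracking, in the spirit of what the
// Orchestrator does. (The real Orchestrator is Node.js; the step names and
// Job fields here are illustrative only.)
package main

import "fmt"

// Step is one stage in the transcoding pipeline.
type Step string

const (
	StepUploaded  Step = "uploaded"
	StepQueued    Step = "queued"
	StepTranscode Step = "transcode"
	StepTransfer  Step = "transfer-to-cdn"
	StepNotify    Step = "notify-client"
	StepDone      Step = "done"
)

// pipeline is the fixed sequence of steps a job moves through.
var pipeline = []Step{StepUploaded, StepQueued, StepTranscode, StepTransfer, StepNotify, StepDone}

// Job records where a piece of content is in the pipeline.
type Job struct {
	ID      string
	Current Step
}

// Advance moves a job to the next step; it returns false once the job is done.
func (j *Job) Advance() bool {
	for i, s := range pipeline {
		if s == j.Current && i+1 < len(pipeline) {
			j.Current = pipeline[i+1]
			return true
		}
	}
	return false
}

func main() {
	job := &Job{ID: "episode-123", Current: StepUploaded}
	for job.Advance() {
		fmt.Println(job.ID, "->", job.Current)
	}
}
```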

Each step in our pipeline is processed by a small, isolated program written in Golang (a “queue listener”), or a simple bash script that knows only about its piece of the pipeline.

We are able to deploy each piece independently, which allows us to make incremental changes to any of the queue listeners, or to the Orchestrator.

Interesting bits

Autoscaling the Transcoders

The transcoders are the most expensive part of our system. They’re the beefiest boxes in the architecture (higher CPU = faster transcode), and we run a variable number of them throughout the day, depending on how much content is queued.

Before a piece of content is uploaded, we check to see how many idle transcoders are available. If there are no spare transcoders, we decide how many new ones to start up based on the transcoding profile. Higher bitrate outputs get one transcoder each; lower bitrates and smaller files might share one transcoder over four renditions. Once we process everything in the queue, we shut down all the transcoders so that we’re not spending money keeping idle boxes running.
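A simplified sketch of that scaling decision is below. The bitrate threshold and the four-renditions-per-box ratio are illustrative numbers, not Metro's actual profile.

```go
// A simplified sketch of the scaling decision: one box per high-bitrate
// rendition, one box shared across four low-bitrate renditions. The threshold
// and ratio are illustrative, not Metro's real configuration.
package main

import "fmt"

// Rendition is one output to produce for a piece of content.
type Rendition struct {
	Name    string
	Bitrate int // kbps
}

// transcodersNeeded returns how many extra boxes to start for the queued
// renditions, given the number of currently idle transcoders.
func transcodersNeeded(renditions []Rendition, idle int) int {
	const highBitrate = 2000 // kbps threshold (illustrative)
	high, low := 0, 0
	for _, r := range renditions {
		if r.Bitrate >= highBitrate {
			high++
		} else {
			low++
		}
	}
	needed := high + (low+3)/4 // round up the shared boxes
	if needed <= idle {
		return 0 // enough spare capacity already running
	}
	return needed - idle
}

func main() {
	queue := []Rendition{
		{"hd-3000k", 3000},
		{"sd-1200k", 1200},
		{"sd-700k", 700},
		{"mobile-300k", 300},
	}
	fmt.Println("boxes to start:", transcodersNeeded(queue, 1))
}
```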

Here’s a snapshot of the runtime stats (in minutes) on boxes over a 4 hour window:

[Chart: per-box runtime in minutes over a 4-hour window]

There's definitely some optimisation we can do here. In future, we'd like our transcoders to run for close to a full hour, to match Amazon's billing cycle of one-hour blocks. We'd also like to take advantage of Amazon's spot instances – using cheaper computing time overnight to process jobs in bulk.

FFmpeg

FFmpeg is the transcoding software we use on our transcoders. It’s open source, well maintained, and has an impressive list of features. We’re using it to encode our content in various bitrates, resize content, and add watermarks. We create an AMI that includes a precompiled version of FFmpeg as well as our transcoder app, so that it’s ready to go when we spin up a new box.
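As a hedged example, producing a single rendition might look roughly like the command below: resize, set a video bitrate, and overlay a watermark. The flags and filenames are illustrative, not Metro's actual transcode profile.

```go
// A sketch of shelling out to FFmpeg for one rendition: scale to 720p,
// overlay a watermark, and set video/audio bitrates. Filenames and numbers
// are illustrative.
package main

import (
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("ffmpeg",
		"-i", "source.mxf", // input file (illustrative)
		"-i", "watermark.png", // watermark image
		// Scale the video to 1280x720, then overlay the watermark top-left.
		"-filter_complex", "[0:v]scale=1280:720[v];[v][1:v]overlay=10:10",
		"-b:v", "2000k", // target video bitrate
		"-c:a", "aac", "-b:a", "128k", // audio codec and bitrate
		"output-720p-2000k.mp4",
	)
	if out, err := cmd.CombinedOutput(); err != nil {
		log.Fatalf("ffmpeg failed: %v\n%s", err, out)
	}
}
```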

There's still a way to go before we're using FFmpeg to its full extent. It's capable of breaking a file into even chunks, which would make it perfect for farming work out to multiple transcoders, likely giving us faster, more consistent results. We can also get progress alerts and do partial file downloads (e.g. taking only the audio track, and avoiding downloading a bunch of video data we won't use).
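For instance, FFmpeg's segment muxer can split a source into roughly even chunks without re-encoding. A sketch of how that might be invoked (chunk length and file names are illustrative):

```go
// A sketch of splitting a source into ~60-second chunks with FFmpeg's
// segment muxer, without re-encoding. Values are illustrative.
package main

import (
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("ffmpeg",
		"-i", "source.mxf",
		"-f", "segment", // use the segment muxer
		"-segment_time", "60", // aim for 60-second chunks
		"-reset_timestamps", "1",
		"-c", "copy", // no re-encoding
		"chunk-%03d.ts",
	)
	if out, err := cmd.CombinedOutput(); err != nil {
		log.Fatalf("ffmpeg segment failed: %v\n%s", err, out)
	}
}
```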

SQS Queues

We utilise SQS queues to keep our pipeline resilient. We've got different queues for the various steps in our system, and each queue has a small app monitoring it.

When a new message arrives, the app takes the message off the queue and starts working. If an error occurs, the app cancels its processing work and puts the message back on the queue, so that another worker can pick it up.

If a message is retried a number of times without success, it ends up in a “Dead Letter Queue” for failed messages, and we get notified.
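Here's a rough sketch of what one of those queue listeners could look like, using the AWS SDK for Go. The queue URL and the handler are placeholders rather than the real Metro code; the retry behaviour matches the description above, and the dead-letter step is SQS redrive-policy configuration (maxReceiveCount) rather than listener code.

```go
// A sketch of a single-purpose queue listener, assuming the AWS SDK for Go
// (aws-sdk-go). Queue URL and handleMessage are placeholders.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

func main() {
	sess := session.Must(session.NewSession())
	client := sqs.New(sess)
	queueURL := "https://sqs.ap-southeast-2.amazonaws.com/123456789012/transcode-jobs" // placeholder

	for {
		// Long-poll for one message at a time.
		out, err := client.ReceiveMessage(&sqs.ReceiveMessageInput{
			QueueUrl:            aws.String(queueURL),
			MaxNumberOfMessages: aws.Int64(1),
			WaitTimeSeconds:     aws.Int64(20),
		})
		if err != nil {
			log.Println("receive failed:", err)
			continue
		}

		for _, msg := range out.Messages {
			if err := handleMessage(aws.StringValue(msg.Body)); err != nil {
				// Make the message immediately visible again so another
				// worker can retry it; after enough retries the queue's
				// redrive policy moves it to the dead letter queue.
				client.ChangeMessageVisibility(&sqs.ChangeMessageVisibilityInput{
					QueueUrl:          aws.String(queueURL),
					ReceiptHandle:     msg.ReceiptHandle,
					VisibilityTimeout: aws.Int64(0),
				})
				continue
			}
			// Success: remove the message from the queue.
			client.DeleteMessage(&sqs.DeleteMessageInput{
				QueueUrl:      aws.String(queueURL),
				ReceiptHandle: msg.ReceiptHandle,
			})
		}
	}
}

// handleMessage does this step's share of the pipeline work (placeholder).
func handleMessage(body string) error {
	log.Println("processing:", body)
	return nil
}
```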

Things seem to be working well so far, but we’d like to change the queues so that consumers continually confirm they’re working on each message, rather than farming out the message and waiting until a timeout before another consumer can pick it up.
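One common SQS pattern that matches this idea is a visibility-timeout heartbeat: while a worker is busy, it keeps extending the message's visibility, so the message only reappears if the worker actually stops. A sketch of such a helper (not something Metro does today; the timings are illustrative):

```go
// A sketch of an SQS visibility heartbeat: run this in a goroutine while the
// real work happens, and close done when finished. Timings are illustrative.
package listener

import (
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/sqs"
)

// heartbeat extends the message's visibility timeout every interval until
// done is closed. If the worker dies, the extensions stop and the message
// becomes visible to other consumers again.
func heartbeat(client *sqs.SQS, queueURL string, receiptHandle *string, interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			client.ChangeMessageVisibility(&sqs.ChangeMessageVisibilityInput{
				QueueUrl:          aws.String(queueURL),
				ReceiptHandle:     receiptHandle,
				VisibilityTimeout: aws.Int64(60), // seconds; illustrative
			})
		}
	}
}
```

In the listener above, you'd start this in a goroutine just before calling the handler and close the channel once the handler returns.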

In Production

Metro has been transcoding for a month, and is doing well. Our orchestrator dashboard shows all of the jobs and renditions in progress:

[Screenshot: the Orchestrator dashboard showing jobs and renditions in progress]

And some of the work done by transcoders in a 4 hour window:

[Chart: work completed by transcoders over a 4-hour window]

The Future

We have more features to add, such as extracting captions, using cheaper computing hardware in non-peak times, and building priority/non-priority pipelines so that content can be ready at appropriate times. Metro has been really interesting to build, much cheaper than our previous solution, and we can customise features to suit our needs. I’m really looking forward to where it goes next.

Running Women Who Code

I've helped to run Women Who Code Sydney for about a year (along with Lucy Bain and Peggy Kuo), and it's been a blast. We organise practical, hands-on workshops for a variety of technologies, like Arduino, Golang, Sass, Scala and Swift.

Participants spend about 1.5-2 hours working through a tutorial or problem set on their laptops, and can ask volunteers for help if needed, so it's slightly different to a typical user group: the attendees are expected to code at every event. I know that I personally learn more when I'm forced to do something, rather than just listening to someone else's experience, and once you've done the workshop, everything's installed on your laptop, ready for more experimenting at home.

A typical event might be:

  • 6:00pm: Arrive and dinner, chat to others
  • 6:30pm: Announcements and introductions
  • 6:40pm: Speaker topic for the night (e.g. introduction to Reactive Extensions)
  • 7:00pm: Commence hacking
  • 8:45pm: Feedback forms
  • 9:00pm: Finish

Things I’ve learned while running a hands-on user group:

Have a target that people can aim for
Inviting people to "learn some JavaScript" with no specific learning material makes for a confusing meetup, because there's no target to aim for. People will ask a variety of questions from all different angles (e.g. "what does var mean?", "what do you think of Angular vs React?", "can you explain promises?"). If you provide a tutorial or a set of exercises, there's a defined path to follow, which narrows the lines of questioning and also gives people a goal.

Designing a tutorial from scratch takes a lot of work (and rework).
You’re not likely to get it right on the first go, so unless you’re aiming for something you can re-use, you are better off going with already published material.

Utilise existing interactive online tutorials
They're a big win. The tutorials have already been tested by hundreds of people before you, and someone has put a lot of effort into designing them. They explain concepts step by step, probably better than you will the first time.

Always try out the tutorial first
It’s important to gauge difficulty and identify what prerequisites are required. Also, sometimes the instructions change and it’s not the same tutorial any more!

Give people the answers upfront
If you are writing a custom set of exercises or tutorial, give everyone access to the answers. When people start with a working solution, it’s a lot easier to break various bits to see what they do, rather than having broken code and trying to diagnose what needs fixing.

Have some helpers available to answer questions
This is the thing that people don’t have access to at home. It really helps.

Aim to maximise everyone’s learning experience
You won't actually cover that much material in a two-hour window, so pick content that people can work through at their own pace – that way everyone learns something.

Select an audience for your meetup
Choose either beginners, or people who are already familiar with programming. It is very difficult to cater for both at the same meetup.

Clarify prerequisites
State whether people need to understand simple if/else statements, or something more involved like recursion. If people turn up to an advanced tutorial but only know basic programming, they might start feeling like they don’t know anything and get discouraged – that’s the last thing you want!

Limit the speaker's time to short chunks
A 10 or 15 minute window is a good amount of time to keep people’s attention (especially after they’ve done a full day of work). Talk for a bit, let people experiment and try what you talked about. Repeat. This is difficult to keep in balance with a self-paced set of exercises, because some people will be ready for the next section before others, but it keeps people focused.

 

Review of Coursera’s Algorithms Part I by Princeton

This is the first in a series of two posts about a study group I organised for learning Algorithms & Data Structures. This post focuses on the content of the course, which is Princeton’s Algorithms I on Coursera.

The course covers a variety of data structures and searching and sorting algorithms from a programmatic implementation angle (as opposed to mathematical proofs; more on that in my second post). Specifically, it covered union-find, binary search, stacks, queues, insertion sort, mergesort, quicksort, binary heaps, binary search trees and red-black trees, and a lot more.

The course has multiple types of content to help you learn:

  • The major component is the video lectures, which form about 2.5 hours of content each week and present some algorithm theory
  • Detailed lecture slides
  • Exercise questions, which test whether you understand that theory
  • Assignments, which have you put an implementation of an algorithm into practice
  • Final exam (I haven’t done this yet, but I still have a few weeks.)

Officially, the course is 6 weeks long, and requires 6-12 hours a week of effort. I think most people in our group underestimated how much time this really takes from your life.

Good things about this course

The combination of lectures, exercises and assignments was a really good way to cover the material from different angles. If you appreciate structured approaches to learning, this will tick all the boxes.

Videos: All of the material is professionally shot and edited. The entire series is presented by Robert Sedgewick, who is a very good lecturer. There is a good level of detail and explanation for each algorithm, especially the animated walk-throughs of each sorting algorithm. You have the option of watching the videos at a faster speed on the Coursera website, and I chose to watch them at 1.25x most of the time, as Sedgewick speaks quite methodically but slower than I am used to. (If you download the videos, you can't take advantage of this feature, and you also miss the interim quizzes in the videos.) It's very helpful watching videos instead of listening to a live lecture, as you can pause and rewind whenever you need to.

Slides: The slides are great. Very detailed, well laid out, and with diagrams to illustrate various concepts. The only thing missing from the slides was the animated walk-throughs of how each algorithm works, but you can always re-watch the videos.

Interim quiz: at the end of each video, and sometimes in the middle, you have to answer a question about the content you just watched.

Discussion boards: The Coursera site includes discussion boards where you can post questions. They’re monitored by people at Princeton who are helping to run the course. It’s a great resource when you get stuck.

Auto-marking for assignments: Assignments are all auto-marked for each submission, and the results from marking are really detailed. Each submission is run through lots of tests, and the marker also analyses your usage of memory and time relative to input size (i.e. whether you are using constant, logarithmic, linear, quadratic time, etc.). I found this quite valuable.
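That timing analysis is essentially an empirical growth-rate check, in the spirit of the doubling test the course teaches. This isn't the actual grader, just a minimal sketch of the idea:

```go
// A minimal doubling test: time a function on doubling input sizes and look
// at the ratio of successive runtimes (≈2 suggests linear, ≈4 quadratic).
// The workload here is a deliberately quadratic placeholder.
package main

import (
	"fmt"
	"time"
)

// quadraticWork stands in for the algorithm under test.
func quadraticWork(n int) int {
	count := 0
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			count += j % 3
		}
	}
	return count
}

func main() {
	var prev time.Duration
	sink := 0
	for n := 1000; n <= 16000; n *= 2 {
		start := time.Now()
		sink += quadraticWork(n)
		elapsed := time.Since(start)
		if prev > 0 {
			fmt.Printf("n=%6d  time=%-12v ratio=%.1f\n", n, elapsed, float64(elapsed)/float64(prev))
		}
		prev = elapsed
	}
	fmt.Println("checksum:", sink) // keep the work from being optimised away
}
```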

Real-world examples: Discussions of practical implementations of algorithms, such as a physics engine for particles, or easily finding whether lines/objects intersect, were really interesting.

Credibility: Sedgewick is a professor of computer science at Princeton. You're hearing from one of the people who has spent a lot of their life studying and working with algorithms – he's written several books on algorithms, one of which is used as a reference for the Coursera course (the content is available for free on a corresponding book site). He also developed a more efficient variant of red-black trees in 2007, which he discusses in the lectures.

Choose your level of participation: You can cut out various parts of the course if you prefer – for example, if you were to only watch the lectures and make your own notes, you could spend 3 hours a week doing this course and still get something out of it. The minimum I’d recommend is watching the lectures and doing the exercises, as the exercises force you to step through the algorithms and work out what they’re doing.

Language-independent: It's possible to complete the assignments in different languages, as our study group proved. We had people write solutions in Rust, Python, Golang, C# and Scala. However, the majority completed them in Java to take advantage of the auto-marker for the assignments.

Things I’d change about this course

Assignments not geared for unit testing: The APIs for the assignments were quite strict – it was almost impossible to test using dependency injection, or to refactor one giant public method into smaller public methods so you could test them independently. I did write some tests, but I also ended up submitting assignments to the auto-marker to get feedback on some aspects. I'd prefer the API to be less strict, so that you could package your own classes and break things into smaller chunks.

The assignments vary in complexity. Some require only 2 or 3 hours; others could take another 10 hours just by themselves.

Course schedule: The start date and schedule of the course are advertised as fixed, and quite intense. When they say you need 6-12 hours a week, they do mean it (and more). In reality, all assignments and lectures have "hard" and "soft" deadlines, and can be submitted up to 6 weeks after the lecture is released. If we had known, we would have built some catch-up weeks into our study group dates to allow people to keep pace. This isn't Coursera's fault, but knowing that the content would be around for ~3 months would have helped us plan a better schedule for our study group.

Some content not as relevant: This is a personal preference, but the course covers a lot of different searching and sorting algorithms in depth; in reality only a handful of them are in use by major languages. I’d prefer to concentrate on the ones in use, and not cover the ones that have been superseded.

Summary

The course was intense, but I learned a lot and it helped connect some dots on how to solve particular types of problems. For me, the best moment was an email from Aidan, one of our study group members, in the last week:

I actually used a weighted quick-union at work yesterday! I'm as shocked as everyone else.

Proof that it is actually relevant 🙂
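For the curious, weighted quick-union is small enough to show in full. The course presents it in Java; this is just a Go translation of the idea, not the course's reference implementation.

```go
// A minimal weighted quick-union (union-find), the data structure from the
// email above. A Go translation of the idea, not the course's code.
package main

import "fmt"

// UF tracks which component each element belongs to.
type UF struct {
	parent []int // parent[i] = parent of i
	size   []int // size[i] = number of elements in the tree rooted at i
}

func NewUF(n int) *UF {
	uf := &UF{parent: make([]int, n), size: make([]int, n)}
	for i := range uf.parent {
		uf.parent[i] = i
		uf.size[i] = 1
	}
	return uf
}

// Root follows parent links up to the root of p's tree.
func (uf *UF) Root(p int) int {
	for p != uf.parent[p] {
		p = uf.parent[p]
	}
	return p
}

// Connected reports whether p and q are in the same component.
func (uf *UF) Connected(p, q int) bool {
	return uf.Root(p) == uf.Root(q)
}

// Union merges the components containing p and q, always hanging the smaller
// tree under the larger one so trees stay shallow (logarithmic height).
func (uf *UF) Union(p, q int) {
	rp, rq := uf.Root(p), uf.Root(q)
	if rp == rq {
		return
	}
	if uf.size[rp] < uf.size[rq] {
		rp, rq = rq, rp
	}
	uf.parent[rq] = rp
	uf.size[rp] += uf.size[rq]
}

func main() {
	uf := NewUF(10)
	uf.Union(3, 4)
	uf.Union(4, 9)
	fmt.Println(uf.Connected(3, 9)) // true
	fmt.Println(uf.Connected(0, 9)) // false
}
```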

As for Algorithms Part II, I’m sadly stretched working on various things in life, including this blog, Women Who Code Sydney and organising a variety of things with my work at the ABC.  However, Caspar Kreiger is continuing the second half starting this week, so get in touch if you’re interested!  I plan to pick up Part II in October when it is run again.

Study Group: Algorithms & Data Structures

Since doing a JavaScript study group last year, I've been keen to organise a Data Structures & Algorithms study group (partly to brush up on interviewing).

I’m pleased to announce that the study group will start January 28th. If you’re interested and live in Sydney, read on.

What will I learn?

We will be doing the Algorithms I course by Princeton University.

It involves a series of lectures & quizzes you watch at home, followed by a group meeting every Wednesday. At the group meeting you can ask questions about anything you didn’t understand, and start to go through coding exercises.

The course material is presented in Java, however you can choose a language of your choice to complete the problems. If you would like Coursera to mark your assignments and final exam, you would need to complete the course in Java. (Note: Completion certificates aren’t issued for this course.)

If you are unsure of Java syntax, please read up on a quick syntax guide before starting the course.

Where and when do we meet?

Atlassian has kindly agreed to host our meetings. Their office is Level 6, 341 George Street Sydney. The building entrance is off Wynyard St.

We’ll meet on Wednesdays at 6:30pm (please be prompt).

The course is 6 weeks and runs from January 28th until March 4th. There will be an optional week after the course ends to practice answering technical interview questions.

How much will it cost?

The course is free if you attend 5 out of the 6 meetings. You can skip one meeting without a penalty.

Everyone will be asked to pay $60 at the first meeting ($10 for each of the 6 meetings). For every meeting you attend, you'll be credited $10 back.

For anyone who misses a meeting, their money goes into a pot. At the end of the course, the pot will be divided among the people who attended the most meetings. Nerdery pays 🙂

This is mainly an attempt to identify the people who really want to participate, and to motivate people to stick with the group.

Prerequisites

This is not a beginner’s course. You should:

  • Be able to code confidently in a language of your choice
  • Be comfortable with git
  • Understand the concepts of classes, objects, functions, arrays, lists, sets, loops, recursion, and the core types available in your chosen language
  • Understand what unit testing is
  • Be willing to discuss your approaches to problems, and demo code
  • Be willing to spend 4-12 hours a week watching lectures and completing code assignments

I'm not teaching this content; I want to learn it, and I'd like other motivated people around at the same time.

How can I sign up?

The study group will be limited to 15 people. The first 15 people who contact me (@daphnechong) and bring a refundable $60 to the first meeting will be eligible.

See you there 🙂

Hello, Arduino

Last September, Women Who Code Sydney ran a Learn Arduino event. I'm generally not very keen on hardware, so I hadn't bothered to investigate Arduino in depth, but this blinking green light from the workshop was one of the most exciting things I'd seen in ages. It was programming in physical form: I'd written the code, uploaded it to the board, plugged in the wires and resistors to control the current, then seen something in my environment that I could actually touch and change.

[Photo: the blinking LED from the workshop]

Arduino has been around for a while. It's a small, simple computer with very basic inputs and outputs, and that's what makes it really fun to play with. There are lots of different input sensors you can use, like temperature, movement, infrared and light sensors.

It's fairly inexpensive to get a basic Arduino kit, as cheap as $30 depending on where you get it from. Atlassian kindly sponsored our event and donated 20 starter kits, which included the basic Arduino board and a whole lot of extra sensors to play with:

  • 1 x 830pt Breadboard
  • 4 x LED
  • 2 x RGB LED
  • 1 x 9V Plug
  • 1 x 9V Lead
  • 1 x Breadboard Power Module
  • 4 x Tactile Switch
  • 1 x Small Slide Switch
  • 10 x Resistors
  • 1 x pack of jumper wires
  • 1 x Light Dependent Resistor
  • 1 x Small Plastic Servo
  • 1 x Buzzer
  • 1 x Linear Rotary Potentiometer
  • 1 x Ultrasonic Sensor
  • 1 x Hall Effect Sensor
  • 1 x 7 Segment Display
  • 1 x Temperature sensor
  • 1 x IR phototransistor
  • 1 x NPN transistor BC547

Our host, Natalia Galin, did a phenomenal job preparing for the event, even down to these cheat sheets with components separated out and nicely labelled, which made it easy to work on our tasks.

[Photo: the labelled component cheat sheets]

First up was a crash-course on electronics, and how the Arduino’s breadboard circuitry works.


Then came a series of programming tasks: connecting up the wiring so the lights worked, and using physical switches to turn them on and off. It was addictive!

We were limited by how many kits were available, but we had around 25 people attend the workshop, and the atmosphere was great. A huge thanks goes to Google for sponsoring the venue and catering for the night, and Atlassian for the Arduino kits.
