Fun With Public Transport Data

I am a transport nerd, and a map nerd, as evidenced by how many of my previous hackathon projects involve maps.

Thus, when I discovered that Sydney’s public transport system data is available to download, it seemed only logical that I should involve a map somewhere.  The result is a map to show you where you can live if you want to be within “x” minutes of the city by train. I defined the city to be any of the following stations: Central, Circular Quay, Martin Place, Museum, St James, Town Hall, Wynyard.

transport-maps

There are some unexpected results, because the trains don’t stop at all stations for every journey.

For example:

  • The central corridor served by the T2 Inner West line and the T1 Western line has the best density of minimum times across all stations.
  • Getting to the city from Sutherland or Campbelltown is faster than getting to the city from Hornsby or Pennant Hills.
  • Bondi Junction is a measly 7 minutes away!
  • The fastest train to Glenfield is 14 minutes faster than to its neighbouring station, Macquarie Fields.
  • Eastwood station is just 21 minutes to the city, faster than 3 stations on either side of it.
  • Burwood, Ashfield and Petersham – all on the same line – have almost the same minimum travel time at 10 or 11 minutes.
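
The idea behind results like these can be sketched in a few lines of Python. This is only an illustration, not the code behind the map: the function names, the input shape (each trip as an ordered list of stops) and the sample times are all assumptions made up for the example. For every station it keeps the smallest gap between a departure and the next city stop on the same trip, which is why an express service can make a far-away station "closer" than its neighbour.

```python
# Hypothetical sketch: names, input format and sample times are invented for illustration.
CITY_STATIONS = {
    "Central", "Circular Quay", "Martin Place", "Museum",
    "St James", "Town Hall", "Wynyard",
}


def min_minutes_to_city(trips, city_stations=CITY_STATIONS):
    """Return {station: fewest minutes to the first city stop on any trip}.

    `trips` is an iterable of trips; each trip is a list of
    (station_name, minutes_since_midnight) stops in travel order.
    """
    best = {}
    for stops in trips:
        # Walk the trip backwards so we always know the nearest city stop ahead.
        next_city_arrival = None
        for station, minute in reversed(stops):
            if station in city_stations:
                next_city_arrival = minute
                continue
            if next_city_arrival is None:
                continue  # this trip never reaches the city after this stop
            elapsed = next_city_arrival - minute
            if station not in best or elapsed < best[station]:
                best[station] = elapsed
    return best


def stations_within(best, limit_minutes):
    """Stations whose best time to the city is at most `limit_minutes`."""
    return sorted(s for s, minutes in best.items() if minutes <= limit_minutes)


# Made-up timetable: an express that skips Macquarie Fields, and an all-stops train.
SAMPLE_TRIPS = [
    [("Campbelltown", 470), ("Glenfield", 485), ("Central", 520)],
    [("Campbelltown", 475), ("Macquarie Fields", 488), ("Glenfield", 492),
     ("Central", 541), ("Town Hall", 544)],
]

best = min_minutes_to_city(SAMPLE_TRIPS)
print(best)
print(stations_within(best, 50))
```

With this invented timetable, the station skipped by the express ends up with a worse minimum time than a station further out, which is the same kind of quirk the real map shows.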

You can explore the map yourself at http://daphnechong.github.io/transport-maps/.  I’d like to do a lot more on it, such as adding the bus and ferry timetables and identifying the individual lines, but it’s a work in progress. If you have any ideas, I’d love to hear them!

A Startup Retrospective

A few weeks ago, I started to pull back on my work on Vine Trails, and committed to just one day a week.

My co-founder Matt and I weren’t able to keep the same schedules – I was full time, while Matt was only available for a day. At the time I thought it would be better to change my hours to suit his, and have everyone progressing on the same page at the same time. But later, I realised that by agreeing to cut down my hours, I had actually declared that my interest was fading. It just wasn’t for me.

Everyone talks about the fact that you need to be really passionate about your startup idea to succeed, because things will get hard at some point. Your passion is what will drive you through the dip to see the other side. As the weeks progressed, and we learned more things, we adjusted the idea and the focus of the product. But Vine Trails was turning into something I was getting less interested in building.  I was running into barriers, and didn’t possess the drive to break through them.

My original idea was purely travel related – a trip itinerary generator. I wanted to build something that could answer this question:

I have three weeks for a holiday, and I want to go to New Zealand.

What should I do while I’m there?

That’s an enormous problem, and difficult to know where to start. So I decided to cut it down to a really focused vertical that was easy to define: wine tourism. Vine Trails was born.

The thing is, I really like wine. I enjoy travelling to wine regions and tasting wine. I would love it if a product like Vine Trails existed already, and I would use it.  But there’s a difference between wanting to use a product, and having the drive to turn an idea into something real. I have friends who like to read about new wine releases, participate in forums, research wine regions, and subscribe to winery mailing lists. For them, that’s just fun and they love reading about it. For me, it would be necessary research rather than something I’d choose to do. When we started putting more focus on Vine Trails appealing to wineries, it just got less interesting to build.

I realised that the data element of the product is what I was passionate about – taking information about wines and making it available in a new format, or letting people search through it in unusual ways.  I find analysing and visualising data really interesting – and it doesn’t really matter whether that information is about wine, or public transport, or economic growth. Making data accessible is where my interest lies.  I am passionate about data at a completely different level than I am passionate about wine tourism.

Lesson learned.

(Along with other things I learned about startups and team composition.)

Vine Trails still exists, in the capable hands of my co-founder Matt and my husband Niall, two of the biggest wine nerds I know. They’re both working on it part time, which means it will take a little longer to mature, but it’s definitely in the pipeline. I will be pitching in occasionally, but I won’t be the principal driver any more.

In the meantime, if you know anywhere in Sydney looking for data nerds, please drop me a line.

Creating a Music Matrix with the Web Audio API

Last week I stumbled on this Tone Matrix, which uses the Web Audio API to generate and play sounds. I got really interested in the mechanics of sound generation and wondered how they did it, but unfortunately, there’s no source… so I decided to learn more about the Web Audio API, and recreate the matrix as an exercise. The source is available on github.

I’ll be running a tutorial for Women Who Code Sydney in July on how this works, and plan to cover some different filters and effects you can run sound through to get more interesting results (the matrix in its current state is pretty basic). There is more mathematics than I had bargained for, but producing a basic sound doesn’t really require a lot of code.

You’ll need to have a fairly recent version of your browser to play with the demo.

Screen Shot 2014-06-18 at 8.14.51 pm

Things I learned:

  • You can use existing sound sources (existing files, microphone) but also generate a sound wave with an Oscillator.
  • It’s a pipeline. Only one thing should output to your speakers.
  • Each audio buffer note can only be played once. You need to recreate each new note to play.
  • Sometimes there are loud ‘click’ noises as you abruptly change the notes through the speakers. You need to cater for this with some filters, or gain (aka volume control).
  • The API has been in development/experimental phase for quite a while and there’s not a lot of comprehensive documentation available. Most of the learning came from code samples.
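
To give a feel for the maths involved, here is a small Python sketch of what an oscillator and a gain ramp do under the hood. It is not Web Audio API code and not the code from the matrix: the function names, the 10 ms fade and the sample rate are assumptions chosen for the example. In the browser, an OscillatorNode and a GainNode would do this work for you.

```python
import math

SAMPLE_RATE = 44100  # samples per second; an assumed, commonly used rate


def note_frequency(semitones_from_a4):
    """Equal-temperament pitch: each semitone multiplies frequency by 2^(1/12)."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)


def sine_note(frequency, seconds, fade_seconds=0.01, sample_rate=SAMPLE_RATE):
    """Generate a sine wave with a short linear fade in and out.

    Starting or stopping a wave abruptly creates a jump in the signal,
    which is heard as a click; ramping the gain smooths that jump away.
    """
    total = int(seconds * sample_rate)
    fade = max(1, int(fade_seconds * sample_rate))
    samples = []
    for i in range(total):
        value = math.sin(2 * math.pi * frequency * i / sample_rate)
        # Gain rises from 0 to 1 at the start and falls back to 0 at the end.
        gain = min(1.0, i / fade, (total - 1 - i) / fade)
        samples.append(value * gain)
    return samples


a4 = note_frequency(0)     # the A above middle C
a5 = note_frequency(12)    # one octave up doubles the frequency
samples = sine_note(a4, 0.1)
print(a4, a5, len(samples))
```

The same ramp idea is what "cater for this with gain" means in practice: the note is never switched on or off at full volume.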

Heroku vs Amazon from Australia

I thought this was interesting to share.  I live in Sydney, Australia, and I was looking for a relatively easy hosting setup for an Australian audience. I had two options: go with a platform-as-a-service provider like Heroku, or spend more time setting up my own infrastructure with AWS, which has a datacentre in Sydney. Azure isn’t available here yet, but it’s coming real soon now(tm).

Both Heroku and AWS offer free tiers, so I didn’t need to shell out any money. I’ve used AWS before, so I figured I’d give Heroku a try.

Heroku

I won’t cover the pros/cons of Heroku as lots of people have already done it, but deploying my first app was really smooth and easy. At the time of writing you can only host in the US (EU is in beta), so I went with the default US option.

When you deploy, your app runs on a unit called a dyno, which is Heroku’s equivalent of a server, and your hosting dyno will sleep after an hour of inactivity if you’re on a free tier.   I noticed a lag on page load occasionally when the dyno was waking up, so I threw it into WebPageTest to measure*.

It takes 9 seconds for the user to see anything meaningful on the screen. Ouch.

heroku-summary

heroku-graph

Amazon Web Services

I decided to try Amazon using their platform-as-a-service, Elastic Beanstalk. You do share underlying architecture when you’re on the free tier, but there’s no concept of your machine ‘sleeping’ like the dyno does.

Once I got the deployment working, it was noticeably faster than Heroku for a cold start.  Time to start rendering is much faster at just under 2 seconds (and reducing that time is my problem now, not the hosting). Here are the comparison graphs.

amazon-summary

amazon-graph

There is a downside, though – hosting with Amazon takes a lot more persistence because their documentation kinda sucks. It’s huge and contradicts itself in different articles, so you aren’t quite sure of the right thing to do. I followed these instructions to deploy a node.js app and ran into three different issues, one of which was because I was trying to deploy to a region which isn’t the default US-East. If I wasn’t already familiar with AWS, I might have given up.

* Technically, I didn’t test from Australia – I used the Wellington, NZ agent in case the Sydney agents were hosted in an Amazon data centre.

The Hipster, Hacker and Hustler

The term “Hipster, Hacker and Hustler” was coined in 2012 and describes the “dream” startup team. It consists of a hipster (designer) to make your product look great, a hacker to build it, and a hustler to think about strategy and marketing. Many VCs and seed funds swear by this mix, and admit that it affects your chances of funding and acceptance into accelerator programs.  After starting Vine Trails, I think I’m a convert.

Starting with a Team of One

I’m a software developer by trade, and I assumed I’d have a “head start” on creating a startup, because I have the skills to build it myself.  Plus, I could spend more time on fun things I actually wanted to build.

Nope!

I totally busted that idea within about a week. I was researching market sizes, volumes of goods, tourism numbers, checking out other competitors in the wine space, seeing how the existing industry worked, wondering how to build two sides of a marketplace concurrently, and how to attract people to a new product. I had lots to do, and was unsure of the best use of my time at any given point. Most importantly, I didn’t write a single line of code, unless you count a landing page (I don’t.)

Wanting to Share

About two weeks in, I realised I wanted a business co-founder, aka “hustler”. Someone who had skills in analysing a market, connecting with other businesses, or getting word out to consumers.  I lacked a lot of this experience myself, and it would be great to learn from someone else who’s done it before. I could also see the advantage of splitting tasks according to areas of expertise, achieving more in a shorter time frame.

Complexities

The tricky thing is finding your co-founder. You can’t pick any random person – it should be someone you respect and trust, who has a complementary working style, and who shares the same passion for the product as you do. Someone I meet at a hackathon is unlikely to fit that description. Also, someone who has my exact skill set isn’t going to add much diversity to the team – although it is more fun working with people you already know.

Every person you introduce brings a communication overhead, and potential divergence of product vision. There’s a chance you’ll move slower, because there are more people you need to convince before you try something. (Of course, there will be a lot of new ideas brought to the table too, which is both positive and negative as you’ll need to weigh up new ideas for goodness and feasibility.)  A couple of really interesting articles by SNTMNT and Derek Dukes talk about this issue.

There is also an issue of balancing time and dedication to the startup – what happens if one person has more time to dedicate than another? Or feels like they have more interest in the project, or is willing to invest more effort? No situation is perfect, and people bring their own imperfections to the mix as well.

Benefits

The immediate benefit I can see of the Hipster, Hacker and Hustler is keeping your momentum when you run into problem “X” – where X can be anything from “a good strategy for cold calling someone” to “analysing user metrics” or “how to make this cross browser compatible”.  Right now when I run into something I don’t know, I hit up Google, open about 10 tabs, read a bit, think a bit, find out if there are any meetups about it, see click bait for an unrelated topic, read the Twitter page for the author… and say goodbye to another hour of productivity.

If you split your tasks up, you can reclaim mental space that used to be dedicated to marketing, or tourism research, or whatever.  Sharing gives you more time and focus on what you do best. You also get the benefit of new ideas, another person who’s vested in driving the idea forward, and two chances of having a great day for the product overall.

My Experience

Vine Trails currently has one person who’s spent a lot of time on research and data (me), and two others who are part time enthusiasts. We’re definitely further along than if it was just a single person working on the product. However, we’ve had some hurdles too – everyone has different amounts of time they can spend on the project, and it can be challenging to get everyone on the same page when we meet up. It’ll be interesting to see where we go from here.

On Starting Up

I’ve been working on my own project recently, a tourism related startup called Vine Trails. Its aim is to help people understand and navigate Australia’s wine regions based on wines they already like.

I’m really enjoying it so far, and feel like I’ve learned an enormous amount in the last six weeks. I love seeing what’s fun, what’s difficult, and what kind of tasks I enjoy doing.  It’s been an intense, energetic, self-driven and rewarding experience so far, with some occasional bouts of confusion, doubt and contradiction. Learning to manage the emotions around ups and downs is high on my priority list, but discovering that I am a good analytical business thinker is nice.

Hacking to Learn

I’ve learned that a lot of the early stages in a startup are basically hacking things together to learn something about your customer, or your market. It’s known as the “Wizard of Oz” technique. As a developer, I really, really, really dislike hacking things if I know I’m going to have to repeat it multiple times, so I found this stage of learning pretty challenging, even though I ultimately found what would/wouldn’t be viable in this process.

Shaving those Seventeen Yaks

I’ve also learned that startups are about balancing learning with action.  It feels like you need to explore seventeen different avenues at the same time, but how do you prioritise them all when you’re just one person? You want to know how big the market is, what’s the likelihood of conversion, what business model would succeed, where you can get the data from, who’s currently a competitor, how is your idea different, what problem are you trying to solve, how do you make it look good cheaply, and ultimately does the customer really want it?

Coding – Not Really That Critical

My biggest surprise is that most of my six weeks has been spent on research/thinking/analysis/administration, with only about 20% on code. Here is a sample of technologies or tasks I’ve worked on recently:

Code:

  • Twitter bootstrap
  • jQuery
  • AngularJS
  • Font Awesome
  • Node
  • Neo4j
  • Google Maps technologies

Non-code:

  • Hosting research – Heroku, Amazon, GrapheneDB, Bitbucket
  • Trying to work out a name (this is torture, since most of the internet is parked-up)
  • Reveal.js presentation framework
  • Domain name providers
  • Online wireframing – mockingbird
  • Design and colour schemes – kuler
  • Researching free HTML5 themes
  • Image providers / creative commons implications
  • Cost of freelancing for certain tasks – data entry, design
  • Researching wine regions
  • Writing up itineraries in wine regions (the “Wizard of Oz”)
  • Co-working spaces and trialling them
  • Building a lean canvas
  • Learning about startup accelerators
  • Looking at business cards
  • Researching statistics on wine tourism
  • Putting together a pitch
  • Going to networking/tech events – How to start up in Sydney, Women Pitch, SheHacks, SydJS, Women Who Code
  • Trying out Google AdWords
  • Researching potential revenue models
  • Registering a Google Apps for Business account
  • Working out product/market fit for product iterations
  • Creating a mailing list organisation on Mailchimp
  • Google Analytics
  • Built & customised landing page
  • Investigating grants
  • Data entry
  • Checking out potential new meetups or events that are worth going to

Exploring the startup scene

I’ve enjoyed going to some of the entrepreneurial meetups, co-working spaces and courses around Sydney (at Fishburners, General Assembly and Tank Stream Labs, to name a few). Taking the time out of your product development to explore the ecosystem is really important, to get exposed to new ideas and meet new people.  I’d even say that not doing this will lessen your chances of success dramatically.

Learning to Delegate, and Learning to Pay

Actually paying for something made me realise there was a lot of value in delegation. I’d rather spend the money on a service to solve my problem instead of trying to do it myself, or trying to shoehorn a free version into what I wanted to do.  There are loads of service providers that will help you bootstrap your idea, for example Mailchimp for a free mailing list. It’s just a matter of researching what’s there for free, or deciding you are happy to pay for something that’s well-known that will save you time.

Keeping a Diary

I’ve started keeping a diary, just a sentence or two covering what I did that day, and how I felt.  It really helps to show me what I achieved in the last day/week/month. It’s nice to read over when you’ve had a crappy day.

The Non-Traditional Path

I’ve reached a point where I feel comfortable calling it a “startup”, but a lot of other people were calling it that before I was able to. I felt that there was a certain level of maturity needed to warrant the label “startup”, compared to a hobby you work on in your spare time. There’s also a lot of expectation from other people when you start calling it a “startup” (dealing with comments like “fantastic, tell me when I can buy shares!”)

While I don’t happen to be traditionally employed at the moment, in my mind I actually have a job, and I keep hours that make it feel like I have a job (though I am frequently thinking/researching at night or on weekends too). I appreciate my weekends much more when working on a startup idea. It definitely doesn’t feel like I’ve been on a weekly treadmill that will repeat exactly the same for the next 6 months.

Trying the startup life has freed me up to really think about what I like and want to do. I look at the few months I’ve had off as a chance to learn things I wouldn’t have otherwise, and they’re great. No matter what happens with Vine Trails, I’ve learned a ton.

Winners of She Hacks 2014!

I was really excited to attend the inaugural SheHacks 2014 hackathon in Sydney, organised by the lovely women from Girl Geeks Sydney – Georgi Knox, Denise Fernandez, Kris Howard, Sera Prince McGill and Peggy Kuo. It was held at Google’s offices in Pyrmont and was a fantastic event! (SheHacks was running in parallel in Melbourne too, so you can check out a rundown of the Melbourne event by Tammy Butow.)

Everyone hard at work

It was the first hackathon for quite a lot of people, and it was great to see people getting involved in an event they might not otherwise attend.  Tickets for the event were sorted into several types:

  • Developers (the majority of the tickets)
  • UX/Designer
  • Non-technical

People were encouraged to form teams of about 5 people – 3 developers, 1 UX person and 1 non-technical person – with the goal that your devs can build, the UX person makes it look amazing, and your non-technical person can coordinate and concentrate on your presentation (following excellent advice laid out by Kris just a month ago on presenting your hackathon project).

Team Disasterama (minus me)

I was also amazed at the generous catering – pizza, caffeine, snacks, lots of cookies made by team mate Denise, and a decidedly un-male breakfast spread of yoghurt, muesli and fresh-cut fruit!

snacks aplenty

The result? 50 women in 11 teams competed for some great prizes donated by Google, Atlassian, Microsoft and Razorfish. There were some fantastic team hacks presented, and I personally enjoyed:

  • Mini Jobs – finding odd jobs for younger people to do to boost their confidence/skills and earn some pocket money
  • Share the Paw Paw – crowdsourcing locations around your neighbourhood where fruit and vegetables are freely available, or if you have a surplus to give away
  • Coffee Run – formalising coffee rounds in the office, including keeping tabs of who owes who

HOWEVER… our team of Denise Fernandez, Luciana Carrolo, Kim Chatterjee, Anna Zaitsev and myself won first prize with our “Mission Possible” app!! The site is designed to connect volunteers with coordinators to assist with disaster relief. The amazing prezi designed by Anna and Kim describes the idea in detail.

The source code is available on GitHub. The app was designed to be realtime so that volunteers can see up-to-the-minute information about where their help is needed, and in our demo we used two screens to great effect (realtime updates are always a crowd pleaser!). It was written using node, socket.io, handlebars, google maps, twitter bootstrap and a lovely set of custom icons designed by Kim.

A screenshot from our app shows a shaded area where the “disaster” has occurred (an oil spill), and a point which is the muster point for volunteers to go to to help (save the penguins!). Everything updated in real time from a master coordinator, who would add extra muster points and specify numbers of volunteers that should be at each point.

Mission Possible

Our prize was a Nexus 7 tablet and a 3D printed trophy, which was a pink computer.

Hello, computer!

I was pretty happy with the outcome of that! It’s the third hackathon in a year that I’ve participated in and won prizes for.  I really love the energy and creativity that comes out of such an intense situation, and it’s a lot of fun to see what everyone else does in such a short time as well.

Thanks very much to Girl Geeks Sydney for a great event!

Save the penguins!

Three days of Haskell

I spent three days up in Brisbane from March 17 to 19 on a course called “Introduction to Functional Programming using Haskell”.  It was intense!

The course was run by Tony Morris & Mark Hibberd from NICTA, and Katie Miller from Red Hat. It was originally billed as Lambda Ladies, but it turns out there weren’t quite enough ladies to fill the course, so anyone else interested was invited along too.

The course is a bunch of practical exercises. They excluded the standard Haskell library from the project, and we spent time reimplementing it from first principles, starting with functions involving Lists.  It’s a very hands-on way of learning how Haskell works. The first day covers pattern matching, folding and function composition; the next couple deal with abstractions over binding and functors, getting towards monads. You spend some time implementing a couple of concrete problems – a string parser, and a problem involving file IO – to see Haskell in practice.

If you’re familiar with functional programming, you’ll understand that’s a LOT of material to cover in three days. I would say that the average learning curve went a bit like this:

Screen Shot 2014-03-20 at 4.39.53 PM

However, having a solid understanding of programming concepts (e.g. lambdas) meant that the more complex concepts were a lot easier to pick up (to a degree).  When I was learning functional programming at university, it took me days to reimplement map properly in Haskell!  Earlier this week, it took five minutes.

Getting to your solution for each problem felt a lot like algebraic substitution and refactoring. First, you make it work, and then you refactor constantly to get the most elegant (read: shortest) solution by taking advantage of functional composition.

I was surprised at how much it ended up looking like a normal chained method call once you introduce the dot operator, aka function composition, which is something C# looks to have borrowed heavily from when introducing LINQ.

To take the example from the link above,

ghci> map (\xs -> negate (sum (tail xs))) [[1..5],[3..6],[1..7]]  
[-14,-15,-27]

turns into…

ghci> map (negate . sum . tail) [[1..5],[3..6],[1..7]]  
[-14,-15,-27]
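
For anyone who doesn’t read Haskell, roughly the same idea can be sketched in Python. This is only an analogy: `compose`, `tail` and `negate` are helpers defined here for the example (Python has no built-in composition operator), and the functions are applied right to left, just like Haskell’s dot.

```python
from functools import reduce


def compose(*functions):
    """compose(f, g, h)(x) == f(g(h(x))), mirroring Haskell's f . g . h."""
    return reduce(lambda f, g: lambda x: f(g(x)), functions)


def tail(xs):
    """Everything except the first element, like Haskell's tail."""
    return xs[1:]


def negate(n):
    return -n


lists = [list(range(1, 6)), list(range(3, 7)), list(range(1, 8))]

# Explicit lambda version, like the first ghci line.
explicit = list(map(lambda xs: negate(sum(tail(xs))), lists))

# Composed version, like the second ghci line.
composed = list(map(compose(negate, sum, tail), lists))

print(explicit, composed)
```

Both versions give the same answer as the ghci session; the composed one just never names its argument.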

I was also surprised just how much of a rush it was to have a solution that a) type checked properly, and b) actually worked.  Haskell felt like an all-or-nothing proposition, where it either compiled and worked, or was otherwise hopelessly broken and gave you a type error that was difficult to decipher.  By contrast, most other programming languages have a more granular feedback loop and are much easier to debug – you can put logging statements in, for example.

The best takeaway of all was these amazing lambda earrings!

Lambda Earrings

Learn You a Haskell is an excellent (and cute, and free) resource for learning Haskell.

Angry Birds in CSS

I recreated an Angry Bird in CSS as an experiment to learn more front end styling.  It has been tested on recent versions of Chrome and Firefox, but cross-browser compatibility wasn’t really the goal – I wanted to try drawing shapes and learn more about CSS transformations.

The code is on github, and you can preview the output here.

Learnings:

  • Any kind of non-standard shape is difficult! Particularly curves and the border-radius property, which has a slightly confusing syntax.
  • Triangles can’t have borders easily 😦
  • This.

angry-bird-css

Finding a Memory Leak

This post originally appeared on the 7digital developer blog on 15th February 2011. It has been moved here for preservation. 

A few weeks ago, we launched the shiny, redesigned new 7digital.com to a beta audience. Unfortunately, we had a memory leak.

The new site was hosted on the same set of hardware as a few other applications, and it was gradually bringing the other sites down. We put a limit on the amount of virtual memory to shield the other sites from the memory leak,  but performance kept deteriorating. Thankfully, the memory leak was eventually found – here’s a set of steps I followed to find it.

Step 1: Take a memory dump from the live site

Graham, a fellow dev, helpfully pointed out userdump and also gave me a crash course in windbg. Userdump is a command line tool which will take a snapshot of the memory space used by a process. It’s important to note that it freezes your process while it takes the dump, so if you’re doing this in live, your site might stop for a minute or more. You can use the inbuilt iisapp.vbs script on the command line to find out exactly which w3wp process belongs to which Application Pool, and therefore which process to dump. Once you have the process id, take the memory dump and examine it with windbg.  Two useful articles were Getting Started with windbg by JohanS, and Tess Ferrandez’s excellent lab/tutorial on how to navigate through a memory dump.

Step 2: Add some performance counters

Since the live dump didn’t highlight any obvious problems (it only had information for a minute or less of runtime before the app pool recycled), we added some performance counters to see if we could find any trends. You can access perfmon under Start > Administrative Tools > Performance.  MSDN has a good explanation of the different counters and what they mean. Since we were concentrating on memory, I added the following counters and waited for any trends to appear.

.NET CLR Exceptions\# of Exceps Thrown
.NET CLR Memory\# Bytes in all Heaps
.NET CLR Memory\Gen 2 Heap Size
.NET CLR Memory\Large Object Heap Size

Edit: It’s possible to show counters for a single process, but if you have multiple w3wp processes running on the same box (as we do), it’s difficult to get the counters for the right one.  I was looking at counters for the whole box, which didn’t give me a lot of detail.

Step 3: Do some local profiling 

A live memory dump is all well and good, but it just looks like a screen full of hex 🙂 Local profiling gives you some lovely graphs, stack traces, statistics on running time, etc which you can use to drill down into specific methods or lines of code. If you know what user action is causing the leak (e.g. clicking the “Purchase” button), you can profile that on your local machine and easily identify which method or line of code is causing the problem.

I downloaded ANTS Memory Profiler, dotTrace, and AQTime to try some local profiling. The learning curve on ANTS seemed to be the gentlest, although if you are familiar with any of the tools, it would help greatly. The ANTS inline help files were an excellent refresher course on how .NET garbage collection works.

Step 4: Local profiling with load testing

I spent about a day learning how ANTS works, and doing some common page loads on my local machine. I didn’t see anything unusual. But... my mistake was to profile without load. It’s very difficult to spot trends unless the changes being made by an action are exaggerated.

ApacheBench was recommended, which is a command line tool for benchmarking performance, but also handy for making lots of concurrent requests. So I lined up multiple requests (and executed them multiple times, all while running ANTS) for common pages in our site, like the search page, artist page and album page. Nothing really turned up until I tried to add products to a basket – and got my breakthrough. Here are the two graphs of memory usage from ANTS. The first shows code behaving itself and being cleaned up by the garbage collector when some normal actions were load tested. The second illustrates our memory leak – the line in green highlights the total memory (managed + unmanaged) being used by our process, the line in red is the amount of managed memory allocated by .NET. Unfortunately, the growth was in the gap between the two lines, which meant that our leak was in unmanaged memory, which ANTS couldn’t help me track down.

Good memory profile:

trace-good

Bad memory profile:

trace-bad
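
If you don’t have ApacheBench to hand, the "lots of concurrent requests" part can be approximated with a few lines of Python. This is a rough stand-in rather than what we ran: `run_load` and `fetch_status` are names invented for this sketch, and the URL in the usage comment is a placeholder. The point is simply to repeat one user action many times in parallel while the profiler is attached, so that any leak is exaggerated enough to see.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Request a page and return its HTTP status code."""
    try:
        with urlopen(url, timeout=timeout) as response:
            response.read()  # read the whole body, as a browser would
            return response.status
    except HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; report their code instead.
        return error.code


def run_load(action, total_requests=100, concurrency=10):
    """Call `action` total_requests times, at most `concurrency` at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: action(), range(total_requests)))


# Example usage (placeholder URL, not a real endpoint):
# statuses = run_load(lambda: fetch_status("http://localhost:8080/some-page"),
#                     total_requests=500, concurrency=20)
# print(statuses.count(200), "successful responses")
```

Passing the action in as a function keeps the load loop separate from the request itself, so the same loop can replay whichever page or action is under suspicion.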

Step 5: Finding unmanaged memory leaks

So, back to the dump taken from the live site with userdump.  James Kovacs has written a helpful article which, among other things, lists reasons why you might be leaking unmanaged memory.  I took another memory dump with more user activity to examine, and had a look at the assemblies in the app domain. Along with the usual suspects:

Assembly: 034a3fd8 [C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\b\970be4ca\1a5ec57f\assembly\dl3\139d25740cf5f9d_99b8cb01\Lucene.Net.dll]
ClassLoader: 034a4048
SecurityDescriptor: 034a3d18
Module Name
04ac1d74 C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\b\970be4ca\1a5ec57f\assembly\dl3\139d25740cf5f9d_99b8cb01\Lucene.Net.dll

....

There were an enormous number of dynamic assemblies being loaded into our app domain:

Assembly: 286ff688 (Dynamic) []
ClassLoader: 286ff6f8
SecurityDescriptor: 286ff600
Module Name
0062429c Dynamic Module
0062461c Dynamic Module

This was the reason that the memory kept increasing. Some piece of code was dynamically loading assemblies, and once loaded, they were never unloaded. However, it’s very difficult to get any more information about them in windbg for framework version 2.0.  Windbg for v2.0 has fewer commands than windbg for v1.1 (strange!), and the internet seems to be full of demos using windbg 1.1 showing more information than you get now.   They are a good starting point, but be aware you won’t be able to follow them 100%. Tess Ferrandez again has a great tutorial on chasing down unmanaged memory leaks if dynamic assemblies aren’t your problem.

Step 6: Local debugging

The Modules window in Visual Studio shows you which assemblies have been loaded, and it gives you more information than windbg (the name of the assembly, at least) so it was just a matter of repeating the step that caused the error with the debugger attached, and watching when the number of assemblies changed. The culprit was finally found – it was the Application_Error event handler.  We were mis-using a piece of 3rd party code which was creating dynamic assemblies every time an error occurred. And unfortunately for us, it was a catch-22 because our beta users were finding errors we’d missed in testing, making the leak worse.

Step 7: Verification Profiling

We fixed the offending code, and then re-profiled with ApacheBench to verify that the memory was no longer leaking. The whole process took almost three days to track down and fix, mostly because I hadn’t managed to isolate what action was causing the leak. Once I started load testing, the leak was much easier to identify. I was amazed at the number of tools and apps used when trying to find the leak, mostly to rule things out in a process of elimination. Quite satisfying once found, though 🙂