Linux Australia expense breakdown – a data visualisation in d3.js

After learning a lot of new techniques and approaches (and gotchas) in d3.js in my last data visualisation (Geelong Regional Libraries by branch), I wanted to turn my new-found skills to Linux Australia’s end of year report. This is usually presented in a fairly dry manner at the organisation’s AGM each year, and although we have a Timeline of Events, it was time to add some visual interest to the presentation of data.

Collecting and cleaning the data

The dataset that I chose to explore was the organisation’s non-event expenses – that is, the expenditure of the organisation not utilised on specific events – items like insurance, stationery, subscriptions to online services and so on. These were readily available in the accounting software – Xero, then a small amount of data cleansing yielded a simple CSV file. The original file had a ‘long tail’ distribution – there were many data point that had only a marginal value and didn’t help in explaining the data, so I combined these into an ‘other’ category.

Visualising the data

Using the previous annular (donut chart) visualisation as the base, I set some objectives for the visualisation;

  • The colours chosen had to match those of Linux Australia’s branding
  • The donut chart required lines and labels
  • The donut chart required markers inside each arc
  • The donut chart had to be downloadable in svg format so that it could be copied and pasted into Inkscape (which has svg as its standard save format)
Colour choice

There was much prototyping involved with colour selection. The first palette selected used shading of a base colour (#ff0000 – red), but the individual arcs were difficult to distinguish. A second attempt added many (16) colours into the palette, but they didn’t work as a colour set. I settled on a combination of three colours (red, yellow, dark grey) and shades of these, with the shading becoming less saturated the smaller the values of the arc.

For anyone interested, the color range was defined as a d3.scaleOrdinal object as below.

var color = d3.scaleOrdinal()
Lines and markers

I hadn’t used lines (polylines) and markers in d3.js before and this visualisation really needed them – because the data series labels were too wordy to easily fit on the donut chart itself. There were some examples that were particularly useful and relevant in figuring this out:

The key learning from this exercise about svg polylines is that the polyline is essentially a series of x,y Cartesian co-ordinates – the tricky part is actually using the right circular trigonometry to calculate the correct co-ordinates. This took me right back to sin and cos basics, and I found it helpful to sketch out a diagram of where I wanted the polyline points to be before actually trying to code them in d3.js.

A gotcha that tripped me up for about half an hour here was that I hadn’t correctly associated the markers with the polylines – because the markers only had a class attribute, but not an id attribute. Whenever I use markers on polylines from now on, I’ll be specifying both class and id attributes.

    .attr('class', 'marker')
    .attr('id', 'marker')

I initially experimented with a polyline that was drawn not just from the centroid of the arc for each data point out past the outerArc, but one that also went horizontally across to the left / right margin of the svg. While I was able to achieve this eventually, I couldn’t get the horizontal spacing looking good because there were so many data points on the donut chart – this would work well with a donut chart with far fewer data points.

Markers were also generally straightforward to get right, after reading up a bit on their attributes. Again, one of the gotchas I encountered here was ensuring that the markerWidth and markerHeight attributes were large enough to contain the entire marker – for a while, the markers were getting truncated, and I couldn’t figure out why.

    .attr('markerWidth', '12')
    .attr('markerHeight', '12')

Once the positioning for the polylines was solved, then positioning the labels was relatively straightforward, as many of the same trigonometric functions were used.

The challenge I encountered here was that d3.js by default has no text wrapping solution built in to the package, although alternative approaches such as the below had been documented elsewhere. From what I could figure out, d3.js does not support the tspan svg element. That is, I can’t just append tspan elements to text elements to achieve word-wrapping.

  • Example block from Mike Bostock – ‘Wrapping long labels‘: in which Mike Bostock (the creator and maintainer of d3.js) has written a custom function for wrapping text
  • d3-textwrap function from Vijith Assar: which provides a function that can be included into d3.js projects, like a plugin

In the end I ended up just abbreviating a couple of the data point labels rather than sink several hours into text wrapping approaches. It seems odd that svg provides such poor native support for text wrapping, but considering the myriad ways that text – particularly foreign language text – can be wrapped – it’s incredibly complex.

Downloadable svg

The next challenge with this visualisation was to allow the rendered svg to be downloaded – as the donut chart was intended to be part of a larger infographic. Again, I was surprised that a download function wasn’t part of the core d3.js library, but again a number of third party functions and approaches were available:

  • Example block from Miłosz Kłosowicz – ‘Download svg generated from d3‘: in this example, the svg node is converted to base-64 encoded ASCII then downloaded.
  • d3-save-svg plugin: this plugin provides a number of methods to download the svg, and convert it to a raster file format (such as PNG). This is a fork of the svg-crowbar tool, written for similar purposes by the New York Times data journalism team.

I chose to use the d3-save-svg plugin simply because of the abstraction it provided. However, I came up against a number of hurdles. When I first used the example code to try and create a download button, the download function was not being triggered. To work around this, I referenced the svg object by id:'#BaseSvg').node(), config);

The other hiccup with this approach was that CSS rules were not preserved in the svg download if the CSS selector had scope outside the svg object itself. For instance, I had applied basic font styling rules to the entire body selector, but in order for font styling to be preserved in the download, I had to re-specify the font styling at the svg selector level in the CSS file. This was a little frustrating, but the ease of using a function to do the download compensated for this.

Linux Australia expenses 2015-2016 infographic
Linux Australia expenses 2015-2016 infographic



Geelong Libraries by branch – a data visualisation

At a glance


Geelong Regional Libraries Corporation (GRLC) came on board GovHack this year, and as well as being a sponsor, opened a number of datasets for the hackathon. Being the lead organiser for GovHack, I didn’t actually get a chance to explore the open data during the competition. However, while I was going for a walk one day – as it always does – I had an idea around how the GRLC datasets could be visualised. I’d previously done some work visualising data using a d3.chord layout, and while this data wasn’t suitable for that type of layout, the concept of using annulars – donut charts – to represent and compare the datasets seemed appealing. There was only one problem – I’d never tackled anything like this before.

Challenge: accepted

Understanding what problem I was trying to solve

Of course the first question here was what problem I was trying to solve (thanks Radia Perlman for teaching me to always solve the right problem – I’ll never forget your LCA2013 keynote). Was this an exploratory data visualisation or an explanatory one? This led to formulating a problem statement:

How do the different Libraries in the Geelong region compare to each other in terms of holdings, membership, visits and other attributes?

This clearly established some parameters for the visualisation: it was going to be exploratory, and comparative. It would need to have a way to identify each Library – likely via colour code, and have appropriate use of shapes and axes to allow for comparison. While I was tempted to use a stacked bar chart, I really wanted to dig deeper into d3.js and extend my skills in this Javascript library – so resolved to visualise the data using circular rings.

Colour selection

The first challenge was to ensure that the colours of the visualisation were both appealling and appropriate. While this seems an unlikely starting place for a visualisation – with most practitioners opting to get the basic shape right first, for this project getting the colours right felt like the best starting point. For inspiration, I turned to the Geelong Regional Library Corporation’s Annual Report, and used the ColorZilla extension to eyedropper the key brand colours used in the report. However, this only provided about 7 colours, and I needed 17 in order to map each of the different libraries. In order to identify ‘in between’ colours, I used this nifty tool from Meyerweb, which is super-handy for calculating gradients. The colours were then used as an array for a d3.scaleOrdinal object, and mapped to each library.

var color = d3.scaleOrdinal()
        "Geelong West",
        "Waurn Ponds",
        "Ocean Grove",
        "Mobile Libraries",
        "Barwon Heads",
        "Western Heights College"

Annular representation of data using d3.pie

Annular representation of data - step 1
First step in annular representation

The first attempt at representing the data was … a first attempt. While I was able to create an annular representation (donut chart) from the data using d3.pie and d3.arc, the labels of the Libraries themselves weren’t positioned well. The best tutorial I’ve read on this topic by far is from data visualisation superstar, Nadieh Bremer, over on her blog, Visual Cinnamon. I decided to leave labels on the arcs as a challenge for later in the process, and instead focus on the next part of visualisation – multiple annulars in one visualisation.

Multiple annulars in one visualisation

Annular representation of data - step 2

The second challenge was to place multiple annulars – one for each dataset – within the same svg. Normally with d3.js, you create an svg object which is appended to the body element of the html document. So what happens when you place two d3.pie objects on the svg object? You guessed it! Fail! The two annulars were positioned one under the other, rather than over the top of each other. I was stuck on this problem for a while, until I realised that the solution was to place different annulars on different layers within the svg object. This also gave more control over the visualisation. However, SVG doesn’t have layers as part of its definition – objects in SVG are drawn one on top of the other, with the last drawn object ‘on top’ – sometimes called stacking . But by creating groups within the BaseSvg like the below, for shapes to be drawn within, I was able to approximate layering.

var BaseSvg ="body").append("svg")
    .attr("width", width)
    .attr("height", height)
    .attr("transform", "translate(" + (width / 2 - annularXOffset) + "," + (height / 2 - annularYOffset) + ")");

  Layers for each annular

var CollectionLayer = BaseSvg.append('g');
var LoansLayer      = BaseSvg.append('g');
var MembersLayer    = BaseSvg.append('g');
var EventsLayer     = BaseSvg.append('g');
var VisitsLayer     = BaseSvg.append('g');
var WirelessLayer   = BaseSvg.append('g');
var InternetLayer   = BaseSvg.append('g');
var InternetLayer   = BaseSvg.append('g');
var TitleLayer      = BaseSvg.append('g');
var LegendLayer     = BaseSvg.append('g');

At this point I found Scott Murray’s SVG Primer very good reading.

Annular representation of data - step 3
The annulars are now positioned concentrically

I was a step closer!

Adding in parameters for spacing and width of the annulars

Once I’d figured out how to get annulars rendering on top of each other, it was time to experiment with the size and shape of the rings. In order to do this, I tried to define a general approach to the shapes that were being built. That general approach looked a little like this (well, it was a lot more scribble).

General approach to calculating size and proportion of multiple annulars
General approach to calculating size and proportion of multiple annulars

By being able to define a general approach, I was able to declare variables for elements such as the annular width and annular spacing, which became incredibly useful later as more annulars were added – the positioning and shape of the arcs for each annular could be calculated mathematically using these variables (see the source code for how this was done).

var annularXOffset  = 100; // how much to shift the annulars horizontally from centre
var annularYOffset  = 0; // how much to shift the annulars vertically from centre
var annularSpacing  = 26; // space between different annulars
var annularWidth    = 22; // width of each annular
var annularMargin   = 70; // margin between annulars and canvas
var padAngle        = 0.027; // amount that each segment of an annular is padded
var cornerRadius    = 4; // amount that the sectors are rounded

This allowed me to ‘play around’ with the size and shape of the annulars until I got something that was ‘about right’.

Annular representation of data - step 4
Annular spacing overlapped


Annular representation of data - step 3
Annular widths and spacing looking better

At this stage I also experimented with the padAngle of the annular arcs (also defined as a variable for easy tweaking), and with the stroke weight and colour, which was defined in CSS. Again, I took inspiration from GRLC’s corporate branding.

Placing dataset labels on the arcs

Now that I had the basic shape of the visualisation, the next challenge was to add dataset labels. This was again a major blocking point, and it took me a lot of tinkering to finally realise that the dataset labels would need to be svg text, sitting on paths created from separate arcs than that rendered by the d3.pie function. Without separate paths, the text wrapped around each arc segment in the annular – shown below. So, for each dataset, I created a new arc and path for the dataset label to be rendered on, and then appended a text element to the path. I’d never used this technique in svg before and it was an interesting learning experience.

Annular representation of data - step 6
Text on arcs is a dark art

Having sketched out a general approach again helped here, as with the addition of a few extra variables I was able to easily create new arcs for the dataset text to sit on. A few more variables to control the positioning of the dataset labels, and voila!

Annular representation of data - step 7
Dataset labels looking good

Adding a legend

The next challenge was to add a legend to the diagram, mostly because I’d decided that the infographic would be too busy with Library labels on each data point. This again took a bit of working through, because while d3.js has a d3.legend function for constructing legends, it’s only intended for data plotted horizontally or vertically, not 7 data sets plotted on consecutive annulars. This tutorial from Zero Viscosity and this one from Competa helped me understand that a legend is really just a group of related rectangles.

var legend = LegendLayer.selectAll("g")
    .attr('x', legendPlacementX)
    .attr('y', legendPlacementY)
    .attr('class', 'legend')
    .attr('transform', function(d, i) {
        return 'translate(' + (legendPlacementX + legendWidth) + ',' + (legendPlacementY + (i * legendHeight)) + ')';

    .attr('width', legendWidth)
    .attr('height', legendHeight)
    .attr('class', 'legendRect')
    .style('fill', color)
    .style('stroke', legendStrokeColor);

    .attr('x', legendWidth + legendXSpacing)
    .attr('y', legendHeight - legendYSpacing)
    .attr('class', 'legendText')
    .text(function(d) { return d; });
Annular representation of data - step 8
The legend isn’t positioned correctly

Again, the positioning took a little work, but eventually I got the legend positioned well.

Annular representation of data - step 9
The legend is finally positioned well

Responsive design and data visualisation with d3.js

One of the other key challenges with this project was attempting to have a reasonably responsive design. This appears to be incredibly hard to do with d3.js. I experimented with a number of settings to aim for a more responsive layout. Originally, the narrative text was positioned in a sidebar to the right of the image, but at different screen resolutions the CSS float rendered awkwardly, so I decided to use a one column layout instead, and this worked much better at different resolutions.

Next, I experimented with using the Javascript values innerWidth and innerHeight to help set the width and height of the svg element, and also dynamically positioned the legend. This gave a much better, while not perfect, rendering at different resolutions. It’s still a little hinkey, particularly at smaller resolutions, but is still an incremental improvement.

Thinking this through more deeply, although SVG and d3.js in general are vector-based, and therefore lend themselves well to responsive design to begin with, there are a number of elements which don’t scale well at different resolutions – such as text sizes. Unless all these elements were to be made dynamic, and likely conditional on viewport and orientation, then it’s going to be challenging indeed to produce a visualisation that’s fully responsive.

Adding tooltips

While I was reasonably pleased with the progress on the project, I felt that the visualisation needed an interactive element. I considered using some sort of arc tween to show movement between data sets, but given that historical data (say for previous years) wasn’t available, this didn’t seem to be an appropriate choice.

After getting very frustrated with the lack of built in tooltips in d3.js itself, I happened upon the d3.tip library. This was a beautifully written addition to d3.js, and although its original intent was for horizontal and vertical chart elements, it worked passably on annular segments.

Annular representation of data - step 10
Adding tooltips

Drawbacks in using d3.tip for circular imagery

One downside I found in using this library was the way in which it considers the positioning of the tooltip – this has some unpredictable, and visually unpleasant results when data is being represented in circular format. In particular, the way that d3.tip calculates the ‘centre’ of the object that it is applied to does not translate well to arc and circular shapes. For instance, look at how the d3.tip is applied to arc segments that are large and have only small amounts of curvature – such as the Geelong arc segment for ‘Members’. I’ve had a bit of a think about how to solve this problem, and the solution involves a more optimal approach to calculating the the ‘centre’ point of an arc segment.

This is beyond what I’m capable of with d3.js, but wanted to call this out as a future enhancement and exploration.

Adding percentage values to the tooltip with d3.nest and d3.sum

The next key challenge was to include the percentage figure, as well as Library and data value in the d3.tip. This was significantly more challenging than I had anticipated, and meant learning up on d3.nest and d3.sum functions. These tutorials from Phoebe Bright, and LearnJS were helpful, and Zan Armstrong’s tutorial on d3.format helped me get the precision formatting correct. After much experimentation, it turned out that summing the values of each dataset (in order to calculate percentage) was but a mere three lines of Javascript:

var CollectionItemCount = d3.nest()
    .rollup(function (v) { return d3.sum(v, function(d) { return d.Items})})

Concluding remarks

Data visualisation is much more challenging than I thought it would be, and the learning curve for d3.js is steep – but it’s worth it. This exercise drew on a range of technical skills, including circular trigonometry, HTML and knowledge of the DOM, CSS and Javascript, and above all the ability to ‘break a problem down’ and look at it from multiple angles (no pun intended).














GovHack Geelong 2016

After the great success of GovHack Geelong 2015, exemplified by the recent production release of Geelong Free Wifi Data by Parham Hausler and Daniel McCarthy, I was thrilled to be given the opportunity to run it all over again this year.

Event planning and promotion

Planning for the event started around May, which was a little later than the 2015, however this year’s event had been scheduled later in July so that it didn’t fall over the University and School holidays. This year’s organisation was much easier given that we knew what to expect, and had learned a number of lessons from our inaugural year, including the importance of strong, easy wireless, clear instructions on what had to be done, and ensuring our GovHackers were well fed. We decided early on to split up into teams to spread out the organisational load of the event – with a national liaison / admin type role, a marketing and communications role, a sponsorship role and a logistics / catering / venue liaison role. This worked out quite well, and it’s a strategy that we’ll likely employ in future years.

Our promotional strategy this year was a little different from last year, where significant efforts were made to engage with the secondary school community around Geelong. These had only limited success, and so this year we turned our attention to attempting to engage at the business and industry level in Geelong, in particular with development and software companies. This was an excellent idea by Todd Hubers, our communications and marketing lead, and we were able to attract at least one corporate-based team. By using registrations from last year, we were also able to engage developers who’d participated in 2015. We also engaged media early on, with a write up in the Geelong Advertiser. This helped to spread the word early on. Of course, we also put significant effort into social media, with our Twitter and Facebook presence growing significantly in the lead up to this year’s event – a mechanism which can then be leveraged further into future year’s events. We supplemented all these efforts with a poster campaign that was delivered through key developer and student channels (GovHack Geelong 2016 Learn New Skills poster, 2MB PDF).

Open data providers

City of Greater Geelong Open Data Portal

In terms of data providers, Andrew Downie from City of Greater Geelong did a great job engaging with multiple institutions around Geelong, and we were delighted to have Geelong Regional Libraries Corporation come on board this year, releasing data sets related to their collections. The open data love continued, with the City releasing their open data portal in the week before GovHack!

There are however some Geelong institutions and open data sources I’d really like to include in future year events so that we continue to build and grow on the momentum that’s already being generated. In particular, it would be great to have organisations like Barwon Water, Department of Education (in particular I’d really like Year 12 completion data by location or school), and Barwon Health more actively engaged. Of course, both the engagement component and the ‘doing’ component – gathering, cleansing, governing and releasing open data – has a resource overhead. While movements like GovHack help to prove the value of open data, it can take some time for large organisations to come around to the viewpoint that their data is more valuable when it’s shared openly, and be willing to invest time and resources in doing so.

The event itself – hackers, hacking and hacks

The event itself was a blast! We had great representation from our sponsors – City of Greater Geelong, Deakin University, Geelong Regional Library Corporation and Aconex – and this allowed us to provide strong catering (thanks to the folks at Waterfront Kitchen). Well fed hackers are happy hackers. Some of the glitches we experienced last year with internet connectivity were ironed out before this year, and we only had one instance of someone being unable to connect to the wireless network. The network survived a massive onslaught, with video, massive datasets and large imagery all testing its mettle!

The biggest technical issue of the event turned out to be the GovHack Hackerspace itself. With the flood of submissions just before ‘keyboards down’ at 5pm on the Sunday, the site ground to a halt – causing consternation amongst the hackers that they’re hard earned projects may not be recognised. The Hackerspace was kept open for an additional few hours to allow the teams to upload their finished products.

The atmosphere of the day was fantastic. One of my favourite moments was Baby Olive helping out her Dad – Ian Priddle of Codeacious.

Inclusion was the theme of the day, with some great diversity of participants and skillsets in all the teams.

Above all, it was great to see creativity unleashed  in all its forms, including mood, temperature and open hardware sensing!

Popular tools and techniques

IBM Watson Twitter Analysis for @KathyReid
IBM Watson Twitter Analysis for @KathyReid

Again, visualisation – both in 2D, and increasingly in 3D with tools like Unity – was a key theme of the event, and one of my favourite tools was the super-easy-to-use IBM Watson Twitter visualisation. Javascript and visualization libraries such as d3.js again featured heavily this year. Javascript – and jQuery – is now considered an essential for all front end web developers – HTML and CSS (and variants such as SASS and, decreasingly, LESS) are no longer sufficient. Mapping tools – Google Maps, MapBox, CartoDB and so on, are also gaining more prominence as ‘must have’ skills, particularly for visualization of geospatial data.

Without a doubt, having good Git skills for any coder or developer is becoming a prerequisite for all hackathons now, as is a GitHub account. There was mention of using Bitbucket occasionally, but Git is still by far the most popular choice for distributed version control for developers.

Slack really took off this year, with the GovHackHQ Slack having nearly 900 simultaneous users during the event. It’s fair to say that Slack has hit widespread adoption – and that’s largely due to both the large range of integrations it has available, as well as the excellent user experience it provides – irrespective of platform or operating system. It’s one of the few tools I can use that has both a native client on the operating systems I use – mainly Android and Linux – as well as an excellent and feature-equivalent web interface.

Hardware wise, I observed lots of wireless mice, and lots of ‘second screens’ – one screen just doesn’t seem to be enough anymore – perhaps a symbol of our ever-more-multitasked world?

UX skills and techniques also seem to be gaining traction – I saw user stories, wireframing, storyboarding and even some basic persona mapping going on.

Projects delivered by Geelong teams at GovHack Geelong 2016

Team name
Link to project
Link to project video
Pet-tential Claws and Effect Homepage Video
NewsPulse Settlers of Cremorne Homepage Video
5-D City Explorer IDeEA Lab Homepage Video
Video (alt)
SmartPath SmartPath Homepage Video
GreenWalking GFox