Linux Australia expense breakdown – a data visualisation in d3.js

After learning a lot of new techniques and approaches (and gotchas) in d3.js in my last data visualisation (Geelong Regional Libraries by branch), I wanted to turn my new-found skills to Linux Australia’s end of year report. This is usually presented in a fairly dry manner at the organisation’s AGM each year, and although we have a Timeline of Events, it was time to add some visual interest to the presentation of data.

Collecting and cleaning the data

The dataset that I chose to explore was the organisation’s non-event expenses – that is, expenditure not tied to specific events: items like insurance, stationery, subscriptions to online services and so on. These were readily available in the accounting software (Xero), and a small amount of data cleansing yielded a simple CSV file. The original file had a ‘long tail’ distribution – there were many data points with only marginal values that didn’t help in explaining the data, so I combined these into an ‘other’ category.

Visualising the data

Using the previous annular (donut chart) visualisation as the base, I set some objectives for the visualisation:

  • The colours chosen had to match those of Linux Australia’s branding
  • The donut chart required lines and labels
  • The donut chart required markers inside each arc
  • The donut chart had to be downloadable in svg format so that it could be copied and pasted into Inkscape (which has svg as its standard save format)

Colour choice

There was much prototyping involved in colour selection. The first palette used shades of a single base colour (#ff0000 – red), but the individual arcs were difficult to distinguish. A second attempt added many (16) colours to the palette, but they didn’t work as a colour set. I settled on a combination of three colours (red, yellow, dark grey) and shades of these, with the shades becoming less saturated the smaller the value of the arc.

For anyone interested, the colour range was defined as a d3.scaleOrdinal object, as below:

var color = d3.scaleOrdinal()
    .range([
      '#ffc100',
      '#ff0000',
      '#393939',
      '#ffcd33',
      '#ff3333',
      '#616161',
      '#ffda66',
      '#ff6666',
      '#888888',
      '#ffe699',
      '#ff9999',
      '#b0b0b0',
      '#fff3cc',
      '#ffcccc',
      '#fff'
    ]);

Lines and markers

I hadn’t used lines (polylines) and markers in d3.js before, and this visualisation really needed them – the data series labels were too wordy to fit easily on the donut chart itself. Several examples found online proved particularly useful and relevant in figuring this out.

The key learning from this exercise about svg polylines is that the polyline is essentially a series of x,y Cartesian co-ordinates – the tricky part is actually using the right circular trigonometry to calculate the correct co-ordinates. This took me right back to sin and cos basics, and I found it helpful to sketch out a diagram of where I wanted the polyline points to be before actually trying to code them in d3.js.
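
To make that concrete, here’s a rough sketch of the approach – not the exact code from the visualisation, and names like polylineLayer, pie, arc, outerArc, outerRadius and labelOffset are illustrative – showing how each arc’s mid-angle is converted into the Cartesian points of a three-segment polyline:

function midAngle(d) {
  return d.startAngle + (d.endAngle - d.startAngle) / 2;
}

polylineLayer.selectAll('polyline')
    .data(pie(data))
    .enter()
    .append('polyline')
    .attr('points', function(d) {
      var start = arc.centroid(d);       // centroid of the donut segment
      var mid   = outerArc.centroid(d);  // a point just outside the donut
      // d3.pie measures angles clockwise from 12 o'clock, so subtract 90
      // degrees before using cos/sin to get standard x,y co-ordinates
      var angle = midAngle(d) - Math.PI / 2;
      var end   = [
        (outerRadius + labelOffset) * Math.cos(angle),
        (outerRadius + labelOffset) * Math.sin(angle)
      ];
      return [start, mid, end];          // coerced to a valid points string
    });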

A gotcha that tripped me up for about half an hour here was that I hadn’t correctly associated the markers with the polylines – because the markers only had a class attribute, but not an id attribute. Whenever I use markers on polylines from now on, I’ll be specifying both class and id attributes.

    .attr('class', 'marker')  // class used for CSS styling
    .attr('id', 'marker')     // the id is what each polyline's marker-end references

I initially experimented with a polyline that was drawn not just from the centroid of the arc for each data point out past the outerArc, but one that also went horizontally across to the left / right margin of the svg. While I was able to achieve this eventually, I couldn’t get the horizontal spacing looking good because there were so many data points on the donut chart – this would work well with a donut chart with far fewer data points.

Markers were also generally straightforward to get right, after reading up a bit on their attributes. Again, one of the gotchas I encountered here was ensuring that the markerWidth and markerHeight attributes were large enough to contain the entire marker – for a while, the markers were getting truncated, and I couldn’t figure out why.

    .attr('markerWidth', '12')
    .attr('markerHeight', '12')
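
Putting those pieces together, the marker definition ends up living in a defs element and being referenced from each polyline – roughly like this sketch (BaseSvg stands in for the chart’s root svg selection, and the circle dimensions are illustrative):

BaseSvg.append('defs')
  .append('marker')
    .attr('class', 'marker')      // class for CSS styling
    .attr('id', 'marker')         // id that the polylines reference
    .attr('markerWidth', '12')    // viewport large enough to hold the shape
    .attr('markerHeight', '12')
    .attr('refX', 6)              // anchor point within the marker viewport
    .attr('refY', 6)
  .append('circle')               // the marker shape itself
    .attr('cx', 6)
    .attr('cy', 6)
    .attr('r', 4);

// ...and on each polyline:
//     .attr('marker-end', 'url(#marker)')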

Labels

Once the positioning of the polylines was solved, positioning the labels was relatively straightforward, as many of the same trigonometric calculations could be reused.

The challenge I encountered here was that d3.js has no text-wrapping solution built into the package, although alternative approaches such as those below have been documented elsewhere. From what I could figure out, svg offers no automatic word-wrapping either – tspan elements have to be created and positioned by hand, so I couldn’t just append text elements and have long labels wrap themselves (a minimal sketch of the tspan approach follows the list).

  • Example block from Mike Bostock – ‘Wrapping long labels‘: in which Mike Bostock (the creator and maintainer of d3.js) has written a custom function for wrapping text
  • d3-textwrap function from Vijith Assar: which provides a function that can be included into d3.js projects, like a plugin
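
For what it’s worth, the essence of both approaches is the same: split the label text yourself and append one tspan per line. A minimal, hand-rolled sketch – the label text, the TextLayer selection and the dy offsets here are all illustrative:

var label = TextLayer.append('text')
    .attr('text-anchor', 'middle');

['Subscriptions to', 'online services'].forEach(function(line, i) {
  label.append('tspan')
      .attr('x', 0)                           // reset x so each line stacks
      .attr('dy', i === 0 ? '0em' : '1.2em')  // push subsequent lines down
      .text(line);
});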

In the end I simply abbreviated a couple of the data point labels rather than sink several hours into text-wrapping approaches. It seems odd that svg provides such poor native support for text wrapping, but considering the myriad ways that text – particularly foreign-language text – can be wrapped, it’s an incredibly complex problem.

Downloadable svg

The next challenge with this visualisation was to allow the rendered svg to be downloaded, as the donut chart was intended to be part of a larger infographic. Again, I was surprised that a download function wasn’t part of the core d3.js library, but a number of third-party functions and approaches were available:

  • Example block from Miłosz Kłosowicz – ‘Download svg generated from d3‘: in this example, the svg node is converted to base-64 encoded ASCII then downloaded.
  • d3-save-svg plugin: this plugin provides a number of methods to download the svg, and convert it to a raster file format (such as PNG). This is a fork of the svg-crowbar tool, written for similar purposes by the New York Times data journalism team.

I chose to use the d3-save-svg plugin simply because of the abstraction it provided. However, I came up against a number of hurdles. When I first used the example code to try and create a download button, the download function was not being triggered. To work around this, I referenced the svg object by id:

d3_save_svg.save(d3.select('#BaseSvg').node(), config);
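
For reference, the button wiring as a whole ended up looking roughly like the below – a sketch only, with the button id and the config values being illustrative rather than the exact production code:

var config = {
  filename: 'la-expenses'             // hypothetical output filename
};

d3.select('#downloadButton')          // hypothetical download button element
    .on('click', function() {
      d3_save_svg.save(d3.select('#BaseSvg').node(), config);
    });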

The other hiccup with this approach was that CSS rules were not preserved in the svg download if the CSS selector had scope outside the svg object itself. For instance, I had applied basic font styling rules to the entire body selector, but in order for font styling to be preserved in the download, I had to re-specify the font styling at the svg selector level in the CSS file. This was a little frustrating, but the ease of using a function to do the download compensated for this.

[Image: Linux Australia expenses 2015-2016 infographic]
My talk picks for #lca2017 – linux.conf.au

linux.conf.au 2017 heads to Hobart, where it was last held in 2009. I absolutely love Tasmania – especially its food and scenery – and am looking forward to heading over.

So, here are my talk picks – keeping in mind that I’m more devops than kernel hacker, so YMMV.

Executive Summary

  • Monday 16th – Networking breakfast, possibly some WootConf sessions and / or Open Knowledge Miniconf sessions.
  • Tuesday 17th – Law and policy Miniconf, Community Leadership Summit
  • Wednesday 18th – Future Privacy by Michael Cordover, In Case of Emergency – Break Glass by David Bell, Handle Conflict Like a Boss by Deb Nicholson, Internet of Terrible Things by Matthew Garrett.
  • Thursday 19th – Network Protocol Analysis for IoT Devices by Jon Oxer, Compliance with the GPL by Karen Sandler and Bradley M. Kuhn, Open source and innovation by Allison Randall, and Surviving the next 30 years of open source by Karen Sandler.
  • Friday 20th – Publicly releasing government models by Audrey Lobo-Pulo

Monday 16th January

I’m keeping Monday open as much as possible, in case there are last minute things we need to do for the Linux Australia AGM, but will definitely start the day with the Opening Reception and Networking Breakfast. A networking breakfast is an unusual choice of format for the Professional Delegates Networking Session (PDNS), but I can see some benefits to it such as being able to initiate key relationships and talking points early in the conference. The test of course will be attendance, and availability of tasty coffee 😀

If I get a chance I’ll see some of the WootConf sessions and/or Open Knowledge Miniconf sessions (the Open Knowledge Miniconf schedule hadn’t been posted at the time of writing).

Tuesday 17th January

The highlight for me in Tuesday’s schedule is the excellent Pia Waugh talking ‘Choose your own Adventure‘. This talk is based on Waugh’s upcoming book, and covers a lot of ground – philosophical foundations, macroeconomic implications and strategic global trends – ground that needs to be covered.

As of the time of writing, the schedule for the Law and Policy Miniconf hadn’t been released, but this area is of interest to me – as is the Community Leadership Summit. I’m interested to see how the Community Leadership Summit is structured this year; in 2015 it had a very unconference feel. This was appropriate for the session at the time, but IMHO what the Community Leadership Summit needs to move towards are concrete deliverables – such as say a whitepaper advising Linux Australia Council on where efforts should be targeted in the year ahead. In this way, the Summit would be able to have a tangible, clear impact.

Wednesday 18th January

I’ll probably head to Dan Callahan’s keynote on ‘Designing for failure’. It’s great to see Jonathan Corbet’s Kernel Report get top billing, but my choice here is between the ever-excellent Michael Cordover’s ‘Future Privacy‘ and Cedric Bail’s coverage of ‘Enlightenment Foundation Libraries for Wearables‘. Next up, I’ll be catching David Bell (Director, LCA2016) talking ‘In case of emergency – break glass – BCP, DRP and Digital Legacy‘. There’s nothing compelling for me in the after-lunch session, except perhaps Josh Simmons’ ‘Building communities beyond the black stump‘, but this one’s probably too entry-level for me, so it might be a case of long lunch / hallway track.

After afternoon tea, I’ll likely head to Deb Nicholson’s ‘Handle conflict like a boss‘, and then Matthew Garrett‘s ‘Internet of terrible things‘ – because Matthew Garrett 😀

Then, it will be time for the Penguin Dinner!

Thursday 19th January

First up, I’m really looking forward to Nadia Eghbal’s ‘People before code‘ keynote about the sustainability of open source projects.

Jon Oxer’s ‘Network Protocol Analysis for IoT Devices‘ is really appealing, particularly given the rise and rise of IoT equipment, and the lack of standards in this space.

It might seem like a dry topic for some, but Bradley M. Kuhn and Karen Sandler from the Software Freedom Conservancy will be able to breathe life into ‘Compliance with the GPL‘ if anyone can; they also bring with them considerable credibility on the topic.

After lunch, I’ll be catching Allison Randall talking on ‘Open source and innovation‘ and then Karen Sandler on ‘Surviving the next 30 years of open source‘. These talks are related, and speak to the narrative of how open source is evolving into different facets of our lives – how does open source live on when we do not?

Friday 20th January

After the keynote, I’ll be catching Audrey Lobo-Pulo on ‘Publicly releasing government models‘ – this ties in with a lot of the work I’ve been doing in open data, and government open data in particular. After lunch, I’m looking forward to James Scheibner’s ‘Guide to FOSS licenses‘, and to finish off the conference on a high note, the ever-erudite and visionary George Fong on ‘Defending the security and integrity of the ‘Net’. Internet Australia, of which Fong is the chair, has many values in common with Linux Australia, and I foresee the two organisations working more closely together in the future.

What are your picks for #lca2017?

Geelong Libraries by branch – a data visualisation

Introduction

Geelong Regional Library Corporation (GRLC) came on board GovHack this year, and as well as being a sponsor, opened up a number of datasets for the hackathon. Being the lead organiser for GovHack, I didn’t actually get a chance to explore the open data during the competition. However, while I was going for a walk one day – as always seems to happen – an idea came to me about how the GRLC datasets could be visualised. I’d previously done some work visualising data using a d3.chord layout, and while this data wasn’t suitable for that type of layout, the concept of using annulars – donut charts – to represent and compare the datasets seemed appealing. There was only one problem – I’d never tackled anything like this before.

Challenge: accepted

Understanding what problem I was trying to solve

Of course the first question here was what problem I was trying to solve (thanks Radia Perlman for teaching me to always solve the right problem – I’ll never forget your LCA2013 keynote). Was this an exploratory data visualisation or an explanatory one? This led to formulating a problem statement:

How do the different Libraries in the Geelong region compare to each other in terms of holdings, membership, visits and other attributes?

This clearly established some parameters for the visualisation: it was going to be exploratory, and comparative. It would need a way to identify each Library – likely via a colour code – and make appropriate use of shapes and axes to allow for comparison. While I was tempted to use a stacked bar chart, I really wanted to dig deeper into d3.js and extend my skills in this Javascript library – so I resolved to visualise the data using circular rings.

Colour selection

The first challenge was to ensure that the colours of the visualisation were both appealing and appropriate. While this seems an unlikely starting place for a visualisation – most practitioners opt to get the basic shape right first – for this project getting the colours right felt like the best starting point. For inspiration, I turned to the Geelong Regional Library Corporation’s Annual Report, and used the ColorZilla extension to sample the key brand colours used in the report. However, this only provided about 7 colours, and I needed 17 in order to map each of the different libraries. To identify the ‘in between’ colours, I used this nifty tool from Meyerweb, which is super-handy for calculating gradients. The colours were then used as an array for a d3.scaleOrdinal object, and mapped to each library.

var color = d3.scaleOrdinal()
    .range([
        "#59d134",
        "#4CCA62",
        "#40C28F",
        "#33bbbd",
        "#36AFC7",
        "#3b98da",
        "#427DC9",
        "#5148a6",
        "#8647A8",
        "#BC47A9",
        "#f146ab",
        "#F03E85",
        "#F0355E",
        "#f0431e",
        "#F8880F",
        "#FFCC00",
        "#ACCF1A"
      ])
    .domain([
        "Geelong",
        "Belmont",
        "Corio",
        "Geelong West",
        "Waurn Ponds",
        "Ocean Grove",
        "Newcomb",
        "Torquay",
        "Drysdale",
        "Lara",
        "Bannockburn",
        "Queenscliff",
        "Chilwell",
        "Highton",
        "Mobile Libraries",
        "Barwon Heads",
        "Western Heights College"
    ]);

Annular representation of data using d3.pie

[Image: Annular representation of data, step 1 – first step in annular representation]

The first attempt at representing the data was … a first attempt. While I was able to create an annular representation (donut chart) from the data using d3.pie and d3.arc, the labels of the Libraries themselves weren’t positioned well. The best tutorial I’ve read on this topic by far is from data visualisation superstar Nadieh Bremer, over on her blog, Visual Cinnamon. I decided to leave labels on the arcs as a challenge for later in the process, and instead focus on the next part of the visualisation – multiple annulars in one visualisation.
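
For context, the basic donut itself boils down to a d3.pie layout feeding a d3.arc generator – roughly along these lines, where radius, annularWidth, svg and the Library field are illustrative rather than the exact code:

var pie = d3.pie()
    .value(function(d) { return d.Items; })  // the column being visualised
    .sort(null);                             // keep the data in file order

var arc = d3.arc()
    .innerRadius(radius - annularWidth)      // inner edge of the ring
    .outerRadius(radius);                    // outer edge of the ring

svg.selectAll('path')
    .data(pie(CollectionData))
    .enter()
    .append('path')
    .attr('d', arc)
    .attr('fill', function(d) { return color(d.data.Library); });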

Multiple annulars in one visualisation

[Image: Annular representation of data, step 2 – uh-oh!]

The second challenge was to place multiple annulars – one for each dataset – within the same svg. Normally with d3.js, you create an svg object which is appended to the body element of the html document. So what happens when you place two d3.pie objects on the svg object? You guessed it! Fail! The two annulars were positioned one under the other, rather than over the top of each other. I was stuck on this problem for a while, until I realised that the solution was to place different annulars on different layers within the svg object. This also gave more control over the visualisation. However, SVG doesn’t have layers as part of its definition – objects in SVG are drawn one on top of the other, with the last drawn object ‘on top’ – sometimes called stacking. But by creating groups within the BaseSvg, like those below, for shapes to be drawn within, I was able to approximate layering.

var BaseSvg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height)
    .append("g")
    .attr("transform", "translate(" + (width / 2 - annularXOffset) + "," + (height / 2 - annularYOffset) + ")");

/*
  Layers for each annular
*/

var CollectionLayer = BaseSvg.append('g');
var LoansLayer      = BaseSvg.append('g');
var MembersLayer    = BaseSvg.append('g');
var EventsLayer     = BaseSvg.append('g');
var VisitsLayer     = BaseSvg.append('g');
var WirelessLayer   = BaseSvg.append('g');
var InternetLayer   = BaseSvg.append('g');
var TitleLayer      = BaseSvg.append('g');
var LegendLayer     = BaseSvg.append('g');

At this point I found Scott Murray’s SVG Primer very good reading.

[Image: Annular representation of data, step 3 – the annulars are now positioned concentrically]

I was a step closer!

Adding in parameters for spacing and width of the annulars

Once I’d figured out how to get annulars rendering on top of each other, it was time to experiment with the size and shape of the rings. In order to do this, I tried to define a general approach to the shapes that were being built. That general approach looked a little like this (well, it was a lot more scribble).

[Image: General approach to calculating size and proportion of multiple annulars]

By being able to define a general approach, I was able to declare variables for elements such as the annular width and annular spacing, which became incredibly useful later as more annulars were added – the positioning and shape of the arcs for each annular could be calculated mathematically using these variables (see the source code for how this was done).

var annularXOffset  = 100; // how much to shift the annulars horizontally from centre
var annularYOffset  = 0; // how much to shift the annulars vertically from centre
var annularSpacing  = 26; // space between different annulars
var annularWidth    = 22; // width of each annular
var annularMargin   = 70; // margin between annulars and canvas
var padAngle        = 0.027; // amount that each segment of an annular is padded
var cornerRadius    = 4; // amount that the sectors are rounded
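
To make the relationship concrete, here’s a sketch – not the exact source – of how the radii of the nth annular, counting inward from the outermost ring (n = 0), can be derived from those variables:

var radius = Math.min(width, height) / 2 - annularMargin;

function annularInnerRadius(n) {
  return radius - ((n + 1) * annularWidth) - (n * annularSpacing);
}

function annularOuterRadius(n) {
  return annularInnerRadius(n) + annularWidth;
}

// e.g. the arc generator for the outermost annular (n = 0):
var collectionArc = d3.arc()
    .innerRadius(annularInnerRadius(0))
    .outerRadius(annularOuterRadius(0))
    .padAngle(padAngle)
    .cornerRadius(cornerRadius);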

Parameterising things this way allowed me to ‘play around’ with the size and shape of the annulars until I got something that was ‘about right’.

[Image: Annular representation of data, step 4 – annular spacing overlapped]

[Image: Annular widths and spacing looking better]

At this stage I also experimented with the padAngle of the annular arcs (also defined as a variable for easy tweaking), and with the stroke weight and colour, which were defined in CSS. Again, I took inspiration from GRLC’s corporate branding.

Placing dataset labels on the arcs

Now that I had the basic shape of the visualisation, the next challenge was to add dataset labels. This was again a major blocking point, and it took me a lot of tinkering to finally realise that the dataset labels would need to be svg text, sitting on paths created from separate arcs rather than the arcs rendered by the d3.pie function. Without separate paths, the text wrapped around each arc segment in the annular – shown below. So, for each dataset, I created a new arc and path for the dataset label to be rendered on, and then appended a text element to the path. I’d never used this technique in svg before and it was an interesting learning experience.
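
In code, that technique looks roughly like this sketch – the labelRadius value, the path id and the angles are illustrative, not the values from the finished visualisation:

// an invisible arc whose only job is to carry the dataset label
var labelArc = d3.arc()
    .innerRadius(labelRadius)
    .outerRadius(labelRadius)
    .startAngle(-Math.PI / 2)    // where along the circle the label begins
    .endAngle(Math.PI / 2);

CollectionLayer.append('path')
    .attr('id', 'collectionLabelArc')
    .attr('d', labelArc())
    .style('fill', 'none');

// the label itself rides on that path via a textPath
CollectionLayer.append('text')
  .append('textPath')
    .attr('xlink:href', '#collectionLabelArc')
    .text('Collection');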

[Image: Annular representation of data, step 6 – text on arcs is a dark art]

Having sketched out a general approach again helped here, as with the addition of a few extra variables I was able to easily create new arcs for the dataset text to sit on. A few more variables to control the positioning of the dataset labels, and voila!

[Image: Annular representation of data, step 7 – dataset labels looking good]

Adding a legend

The next challenge was to add a legend to the diagram, mostly because I’d decided that the infographic would be too busy with Library labels on each data point. This again took a bit of working through, because while there is a d3.legend plugin for constructing legends, it’s really intended for data plotted horizontally or vertically, not 7 datasets plotted on consecutive annulars. This tutorial from Zero Viscosity and this one from Competa helped me understand that a legend is really just a group of related rectangles.

var legend = LegendLayer.selectAll("g")
    .data(color.domain())
    .enter()
    .append('g')
    .attr('x', legendPlacementX)
    .attr('y', legendPlacementY)
    .attr('class', 'legend')
    .attr('transform', function(d, i) {
        return 'translate(' + (legendPlacementX + legendWidth) + ',' + (legendPlacementY + (i * legendHeight)) + ')';
    });

legend.append('rect')
    .attr('width', legendWidth)
    .attr('height', legendHeight)
    .attr('class', 'legendRect')
    .style('fill', color)
    .style('stroke', legendStrokeColor);

legend.append('text')
    .attr('x', legendWidth + legendXSpacing)
    .attr('y', legendHeight - legendYSpacing)
    .attr('class', 'legendText')
    .text(function(d) { return d; });

[Image: Annular representation of data, step 8 – the legend isn’t positioned correctly]

Again, the positioning took a little work, but eventually I got the legend positioned well.

[Image: Annular representation of data, step 9 – the legend is finally positioned well]

Responsive design and data visualisation with d3.js

One of the other key challenges with this project was attempting to have a reasonably responsive design. This appears to be incredibly hard to do with d3.js. I experimented with a number of settings to aim for a more responsive layout. Originally, the narrative text was positioned in a sidebar to the right of the image, but at different screen resolutions the CSS float rendered awkwardly, so I decided to use a one column layout instead, and this worked much better at different resolutions.

Next, I experimented with using the Javascript window properties innerWidth and innerHeight to help set the width and height of the svg element, and also to position the legend dynamically. This gave a much better, though not perfect, rendering at different resolutions. It’s still a little hinky, particularly at smaller resolutions, but it’s an incremental improvement.
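
The sizing itself is simple enough – a sketch of the idea, with the margin and legend offsets being illustrative values rather than the ones used in the finished piece:

// derive the svg dimensions from the browser viewport...
var width  = window.innerWidth - 40;
var height = window.innerHeight - 40;

// ...and position the legend relative to those dimensions
var legendPlacementX = width - 220;
var legendPlacementY = annularMargin;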

Thinking this through more deeply: although SVG and d3.js are vector-based, and therefore lend themselves well to responsive design to begin with, there are a number of elements which don’t scale well at different resolutions – such as text sizes. Unless all of these elements are made dynamic – and likely conditional on viewport and orientation – it’s going to be challenging indeed to produce a visualisation that’s fully responsive.

Adding tooltips

While I was reasonably pleased with the progress on the project, I felt that the visualisation needed an interactive element. I considered using some sort of arc tween to show movement between data sets, but given that historical data (say for previous years) wasn’t available, this didn’t seem to be an appropriate choice.

After getting very frustrated with the lack of built-in tooltips in d3.js itself, I happened upon the d3.tip library. This was a beautifully written addition to d3.js, and although its original intent was for horizontal and vertical chart elements, it worked passably on annular segments.
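
Wiring d3.tip up is pleasantly terse – the below is a sketch rather than the exact code, and the html() content assumes property names like Library and Items on each datum:

var tip = d3.tip()
    .attr('class', 'd3-tip')
    .html(function(d) {
      return d.data.Library + ': ' + d.data.Items;
    });

BaseSvg.call(tip);                    // register the tooltip with the chart

CollectionLayer.selectAll('path')
    .on('mouseover', tip.show)        // show the tip over each arc segment
    .on('mouseout', tip.hide);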

[Image: Annular representation of data, step 10 – adding tooltips]

Drawbacks in using d3.tip for circular imagery

One downside I found in using this library was the way in which it calculates the positioning of the tooltip – this has some unpredictable, and visually unpleasant, results when data is represented in circular form. In particular, the way that d3.tip calculates the ‘centre’ of the object it is applied to does not translate well to arcs and circular shapes. For instance, look at how the d3.tip is applied to arc segments that are large but have only a small amount of curvature – such as the Geelong arc segment for ‘Members’. I’ve had a bit of a think about how to solve this problem, and the solution involves a better approach to calculating the ‘centre’ point of an arc segment.

This is beyond what I’m capable of with d3.js, but wanted to call this out as a future enhancement and exploration.

Adding percentage values to the tooltip with d3.nest and d3.sum

The next key challenge was to include the percentage figure, as well as the Library and data value, in the d3.tip. This was significantly more challenging than I had anticipated, and meant reading up on the d3.nest and d3.sum functions. These tutorials from Phoebe Bright and LearnJS were helpful, and Zan Armstrong’s tutorial on d3.format helped me get the precision formatting correct. After much experimentation, it turned out that summing the values of each dataset (in order to calculate the percentage) was a mere three lines of Javascript:

// Sum the Items column across the whole Collection dataset – this becomes the
// denominator when calculating each library's percentage share
var CollectionItemCount = d3.nest()
    .rollup(function(v) { return d3.sum(v, function(d) { return d.Items; }); })
    .entries(CollectionData);
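
With that total in hand, the percentage shown in the tooltip is just a division away, with d3.format taking care of the precision – a sketch, again assuming the Library and Items property names:

var formatPercent = d3.format('.1%');

tip.html(function(d) {
  return d.data.Library + ': ' + d.data.Items + ' (' +
      formatPercent(d.data.Items / CollectionItemCount) + ')';
});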

Concluding remarks

Data visualisation is much more challenging than I thought it would be, and the learning curve for d3.js is steep – but it’s worth it. This exercise drew on a range of technical skills, including circular trigonometry, HTML and knowledge of the DOM, CSS and Javascript, and above all the ability to ‘break a problem down’ and look at it from multiple angles (no pun intended).
