Joining the Dots – The Art and Science of Data Visualisation came about as the brainchild of Fiona Tweedie – a business analyst and data scientist who has worked in open knowledge, open data and digital humanities for several years, after completing her PhD in humanities at the University of Sydney. At PyCon AU, Fiona observed that the talks on data visualisation drew strong representation from STEM – science, technology, engineering and mathematics – but far less from the humanities. Held at the Walter and Eliza Hall Institute, part of the broader University of Melbourne research precinct, #jtdwehi sought to address that by providing the opportunity to cross-pollinate multiple disciplines – and by all accounts it was a roaring success.
There were several excellent and engaging presentations over the course of the day, and my personal highlights are covered below.
Deb Verhoeven is incredibly respected in digital humanities for her creative take on visualisation and sonification – and not least for her untiring efforts to improve gender representation and diversity in the digital humanities. For more on this, check out her famous ‘Where are the women?’ speech at DH2015.
Her incisive presentation covered broad ground. In particular, her exposé of “gender offenders” in Australian cinema – men who choose to work exclusively with other men, denying women opportunities in the industry – was one of the most impactful data visualisations I’ve ever seen.
This is what the patriarchy looks like!
– Professor Deb Verhoeven, speaking about data visualising of gender representation in Australian cinema
Using a technique called social network analysis, Verhoeven’s team were able to show the gender of project members and how they clustered. Words don’t do it justice.
Another thought-provoking element of Verhoeven’s keynote was the work her research team had done on sonification, as part of The Ultimate Gig Guide project. Walking us through the project, Verhoeven explained how the team had gathered data on the spread of bands across Melbourne via gig records. To add an extra degree of difficulty, many of these records were not digitised, and the data had to be gathered manually (another argument for digitisation projects – it makes accessing and using data so much easier). The team then sonified the data, resulting in a sequence of notes representing the frequency of gigs, with their location rendered as distance from the Melbourne CBD. To add additional interest, a backing track was added, and the data was transposed into the C major scale. A meta gig – a gig about a gig!
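The project’s actual mapping isn’t described in detail, but the core idea – turning a count into a pitch on the C major scale – can be sketched like this (the note names, the linear scaling and the clamping rule are all my assumptions, not the team’s implementation):

```javascript
// Hypothetical sonification sketch: map a count (e.g. gigs in a suburb)
// onto one of the seven degrees of the C major scale.
const C_MAJOR = ['C', 'D', 'E', 'F', 'G', 'A', 'B'];

function noteForCount(count, maxCount) {
  // Scale the count linearly onto the seven scale degrees, clamped to 'B'.
  const degree = Math.min(
    C_MAJOR.length - 1,
    Math.floor((count / maxCount) * C_MAJOR.length)
  );
  return C_MAJOR[degree];
}

// A run of gig counts becomes a sequence of notes.
const gigCounts = [3, 5, 9, 14, 7];
const maxGigs = Math.max(...gigCounts);
const melody = gigCounts.map(c => noteForCount(c, maxGigs));
// melody → ['D', 'E', 'G', 'B', 'F']
```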
“Visualising the Australian Transport Network” by Xavier Ho, CSIRO
Xavier, an interactive data visualisation specialist with CSIRO, presented on TraNSIT – the Transport Network Strategic Investment Tool. This tool is designed to help identify and implement efficiencies in agribusiness supply chains by mapping the logistics and transport networks of different modes of transport – road, rail, air and sea. This work was amazing – not just because the data needed to be sourced from so many different repositories – another argument for open data – but because of the direct impact data visualisation could have on planning and strategy.
Xavier was a seasoned presenter, with an engaging style – an excellent speaker.
“Ungodly cocktail – visualising three editions of Raynal’s “Histoire”” by Geoff Hinchcliffe, Australian National University
I cannot honestly say that French literature is something which excites me, but Geoff Hinchcliffe’s excellent presentation brought this project – which sought to visualise the differences between editions of Raynal’s Histoire – to life. Using the ‘ungodly cocktail’ of several data visualisation tools, combined with an iterative design and development process (instead of the usual tiered and discrete ‘front end’ and ‘back end’ approach), the changes between versions were mapped and visualised, providing a narrative to explore the influence of his writing collaborator, Diderot.
What struck me about Hinchcliffe’s approach was the remarkable work that had gone into making something so esoteric and complex so accessible and simple – the true power of data visualisation.
You can follow Hinchcliffe as @gravitron on Twitter.
Throughout the day, I came to a number of conclusions:
Interactivity is not a necessary part of every visualisation – some, such as Hinchcliffe’s, had little interactivity yet were no less effective.
Design and development are tightly coupled – the presenters demonstrated both back-end and front-end skills, ‘round tripping’ between the two. Data visualisation combines design, coding and statistical skills in equal measure, and the most sought-after practitioners will be able to work ‘full stack’.
After learning a lot of new techniques and approaches (and gotchas) in d3.js in my last data visualisation (Geelong Regional Libraries by branch), I wanted to turn my new-found skills to Linux Australia’s end of year report. This is usually presented in a fairly dry manner at the organisation’s AGM each year, and although we have a Timeline of Events, it was time to add some visual interest to the presentation of data.
Collecting and cleaning the data
The dataset that I chose to explore was the organisation’s non-event expenses – that is, the expenditure of the organisation not utilised on specific events – items like insurance, stationery, subscriptions to online services and so on. These were readily available in the accounting software – Xero – and a small amount of data cleansing yielded a simple CSV file. The original file had a ‘long tail’ distribution – there were many data points that had only a marginal value and didn’t help in explaining the data, so I combined these into an ‘other’ category.
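The long-tail cleanup step can be sketched as follows – the threshold and the field names are my choices for illustration, not necessarily what was used on the actual CSV:

```javascript
// Collapse any category below a threshold share of the total spend
// into a single 'Other' row.
function collapseLongTail(rows, threshold) {
  const total = rows.reduce((sum, r) => sum + r.value, 0);
  const kept = rows.filter(r => r.value / total >= threshold);
  const otherValue = rows
    .filter(r => r.value / total < threshold)
    .reduce((sum, r) => sum + r.value, 0);
  if (otherValue > 0) kept.push({ category: 'Other', value: otherValue });
  return kept;
}

// Hypothetical expense rows: the two marginal items collapse into 'Other'.
const cleaned = collapseLongTail([
  { category: 'Insurance',  value: 900 },
  { category: 'Stationery', value: 60 },
  { category: 'Stamps',     value: 40 }
], 0.1);
```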
Visualising the data
Using the previous annular (donut chart) visualisation as the base, I set some objectives for the visualisation:
The colours chosen had to match those of Linux Australia’s branding
The donut chart required lines and labels
The donut chart required markers inside each arc
The donut chart had to be downloadable in svg format so that it could be copied and pasted into Inkscape (which has svg as its standard save format)
There was much prototyping involved with colour selection. The first palette selected used shades of a base colour (#ff0000 – red), but the individual arcs were difficult to distinguish. A second attempt added many (16) colours into the palette, but they didn’t work as a colour set. I settled on a combination of three colours (red, yellow, dark grey) and shades of these, with less saturated shades used for arcs with smaller values.
For anyone interested, the colour range was defined as a d3.scaleOrdinal object as below.
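The original snippet isn’t reproduced here, so this is a hedged reconstruction – the exact hex values are my guesses at the red, yellow and dark grey shades described above, not the colours actually used:

```javascript
// Hypothetical reconstruction of the ordinal colour scale – three base
// colours (red, yellow, dark grey), each with progressively less
// saturated shades for the smaller arcs.
var colourRange = d3.scaleOrdinal()
    .range([
      '#e31837', '#ef6a7e', '#f7b4bf',   // reds, decreasing saturation
      '#f9b200', '#fbcc59', '#fde5ac',   // yellows
      '#414141', '#7a7a7a', '#b3b3b3'    // dark greys
    ]);
```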
I hadn’t used lines (polylines) and markers in d3.js before and this visualisation really needed them – because the data series labels were too wordy to easily fit on the donut chart itself. There were some examples that were particularly useful and relevant in figuring this out:
The key learning from this exercise about svg polylines is that the polyline is essentially a series of x,y Cartesian co-ordinates – the tricky part is actually using the right circular trigonometry to calculate the correct co-ordinates. This took me right back to sin and cos basics, and I found it helpful to sketch out a diagram of where I wanted the polyline points to be before actually trying to code them in d3.js.
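The trigonometry can be sketched in a few lines. This assumes d3’s arc convention – angles in radians measured from 12 o’clock, clockwise – and the function names are mine, not from the original code:

```javascript
// A point at angle a and radius r, in d3's convention, lands at
// (r·sin a, −r·cos a) in SVG coordinates (y grows downwards).
function midAngle(d) {
  // Mid-angle of an arc datum produced by d3.pie().
  return d.startAngle + (d.endAngle - d.startAngle) / 2;
}

function pointOnCircle(angle, radius) {
  return [radius * Math.sin(angle), -radius * Math.cos(angle)];
}

// A polyline could then run from the arc outwards to the label, e.g.:
// var points = [
//   pointOnCircle(midAngle(d), innerLabelRadius),
//   pointOnCircle(midAngle(d), outerLabelRadius)
// ];
```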
A gotcha that tripped me up for about half an hour here was that I hadn’t correctly associated the markers with the polylines – because the markers only had a class attribute, but not an id attribute. Whenever I use markers on polylines from now on, I’ll be specifying both class and id attributes.
I initially experimented with a polyline that was drawn not just from the centroid of the arc for each data point out past the outerArc, but one that also went horizontally across to the left / right margin of the svg. While I was able to achieve this eventually, I couldn’t get the horizontal spacing looking good because there were so many data points on the donut chart – this would work well with a donut chart with far fewer data points.
Markers were also generally straightforward to get right, after reading up a bit on their attributes. Again, one of the gotchas I encountered here was ensuring that the markerWidth and markerHeight attributes were large enough to contain the entire marker – for a while, the markers were getting truncated, and I couldn’t figure out why.
Once the positioning for the polylines was solved, then positioning the labels was relatively straightforward, as many of the same trigonometric functions were used.
The challenge I encountered here was that d3.js has no text-wrapping solution built in to the package, although alternative approaches such as those below have been documented elsewhere. The underlying issue is that SVG text elements do not wrap automatically – you can’t just append a text element and get word-wrapping for free; each line has to be broken and positioned manually, typically with tspan elements.
Example block from Mike Bostock – ‘Wrapping long labels‘: in which Mike Bostock (the creator and maintainer of d3.js) has written a custom function for wrapping text
In the end I abbreviated a couple of the data point labels rather than sink several hours into text-wrapping approaches. It seems odd that svg provides such poor native support for text wrapping, but considering the myriad ways that text – particularly foreign-language text – can be wrapped, it’s an incredibly complex problem.
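For reference, the core of any wrapping approach is a line-breaker like the one below – a minimal greedy sketch of my own, not Bostock’s wrap() function. Each returned line would then be rendered as its own manually-positioned tspan:

```javascript
// Split a label into lines of at most maxChars characters,
// breaking only at spaces (greedy, left to right).
function wrapLabel(text, maxChars) {
  const lines = [];
  let current = '';
  for (const word of text.split(/\s+/)) {
    if (current && (current.length + 1 + word.length) > maxChars) {
      lines.push(current);
      current = word;
    } else {
      current = current ? current + ' ' + word : word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

// wrapLabel('Subscriptions to online services', 15)
// → ['Subscriptions', 'to online', 'services']
```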
The next challenge with this visualisation was to allow the rendered svg to be downloaded – as the donut chart was intended to be part of a larger infographic. Again, I was surprised that a download function wasn’t part of the core d3.js library, but again a number of third party functions and approaches were available:
Example block from Miłosz Kłosowicz – ‘Download svg generated from d3‘: in this example, the svg node is converted to base-64 encoded ASCII then downloaded.
d3-save-svg plugin: this plugin provides a number of methods to download the svg, and convert it to a raster file format (such as PNG). This is a fork of the svg-crowbar tool, written for similar purposes by the New York Times data journalism team.
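The first approach above – converting the svg to a base-64 data URI – can be sketched as follows. The filename is hypothetical, and the anchor-click portion only runs in a browser:

```javascript
// Serialise svg markup to a base-64 data: URI. btoa is a browser
// global (also available in modern Node).
function svgToDataUri(svgMarkup) {
  return 'data:image/svg+xml;base64,' + btoa(svgMarkup);
}

function downloadSvg(svgMarkup, filename) {
  if (typeof document === 'undefined') return; // browser only
  const a = document.createElement('a');
  a.href = svgToDataUri(svgMarkup);
  a.download = filename;
  a.click();
}

// downloadSvg('<svg xmlns="http://www.w3.org/2000/svg"></svg>', 'donut.svg');
```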
I chose to use the d3-save-svg plugin simply because of the abstraction it provided. However, I came up against a number of hurdles. When I first used the example code to try and create a download button, the download function was not being triggered. To work around this, I referenced the svg object by id:
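The original snippet is not reproduced here, so this is a hedged reconstruction of the workaround – the element ids and config values are my assumptions, not the actual code:

```javascript
// Hypothetical: unwrap the svg selected by id into a DOM node and hand
// it to the plugin's save function, rather than relying on its default
// selection behaviour.
d3.select('#saveButton').on('click', function() {
  var config = { filename: 'non-event-expenses' }; // assumed filename
  d3_save_svg.save(d3.select('#BaseSvg').node(), config);
});
```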
The other hiccup with this approach was that CSS rules were not preserved in the svg download if the CSS selector had scope outside the svg object itself. For instance, I had applied basic font styling rules to the entire body selector, but in order for font styling to be preserved in the download, I had to re-specify the font styling at the svg selector level in the CSS file. This was a little frustrating, but the ease of using a function to do the download compensated for this.
Geelong Regional Libraries Corporation (GRLC) came on board GovHack this year, and as well as being a sponsor, opened a number of datasets for the hackathon. Being the lead organiser for GovHack, I didn’t actually get a chance to explore the open data during the competition. However – as it always does – inspiration struck while I was out walking one day, and I had an idea for how the GRLC datasets could be visualised. I’d previously done some work visualising data using a d3.chord layout, and while this data wasn’t suitable for that type of layout, the concept of using annulars – donut charts – to represent and compare the datasets seemed appealing. There was only one problem – I’d never tackled anything like this before.
Understanding what problem I was trying to solve
Of course the first question here was what problem I was trying to solve (thanks Radia Perlman for teaching me to always solve the right problem – I’ll never forget your LCA2013 keynote). Was this an exploratory data visualisation or an explanatory one? This led to formulating a problem statement:
How do the different Libraries in the Geelong region compare to each other in terms of holdings, membership, visits and other attributes?
The first challenge was to ensure that the colours of the visualisation were both appealing and appropriate. While this seems an unlikely starting place for a visualisation – most practitioners opt to get the basic shape right first – for this project getting the colours right felt like the best starting point. For inspiration, I turned to the Geelong Regional Library Corporation’s Annual Report, and used the ColorZilla extension to eyedropper the key brand colours used in the report. However, this only provided about 7 colours, and I needed 17 in order to map each of the different libraries. In order to identify ‘in between’ colours, I used this nifty tool from Meyerweb, which is super-handy for calculating gradients. The colours were then used as an array for a d3.scaleOrdinal object, and mapped to each library.
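Under the hood, an ‘in between’ colour is just a channel-by-channel linear interpolation between two hex colours, which is essentially what the gradient tool computes. A small sketch (the hex values here are arbitrary examples, not GRLC’s actual brand colours):

```javascript
// Mix two hex colours: t = 0 returns hexA, t = 1 returns hexB,
// t = 0.5 the midpoint.
function mixHex(hexA, hexB, t) {
  const a = parseInt(hexA.slice(1), 16);
  const b = parseInt(hexB.slice(1), 16);
  const channel = shift => {
    const ca = (a >> shift) & 0xff;
    const cb = (b >> shift) & 0xff;
    return Math.round(ca + (cb - ca) * t);
  };
  const mixed = (channel(16) << 16) | (channel(8) << 8) | channel(0);
  return '#' + mixed.toString(16).padStart(6, '0');
}

// Halfway between black and white:
// mixHex('#000000', '#ffffff', 0.5) → '#808080'
```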
The second challenge was to place multiple annulars – one for each dataset – within the same svg. Normally with d3.js, you create an svg object which is appended to the body element of the html document. So what happens when you place two d3.pie objects on the svg object? You guessed it! Fail! The two annulars were positioned one under the other, rather than over the top of each other. I was stuck on this problem for a while, until I realised that the solution was to place different annulars on different layers within the svg object. This also gave more control over the visualisation. However, SVG doesn’t have layers as part of its definition – objects in SVG are drawn one on top of the other, with the last drawn object ‘on top’ – sometimes called stacking. But by creating groups within the BaseSvg, like the below, for shapes to be drawn within, I was able to approximate layering.
var BaseSvg = d3.select("body").append("svg")
    .attr("transform", "translate(" + (width / 2 - annularXOffset) + "," + (height / 2 - annularYOffset) + ")");

// Layers for each annular
var CollectionLayer = BaseSvg.append('g');
var LoansLayer = BaseSvg.append('g');
var MembersLayer = BaseSvg.append('g');
var EventsLayer = BaseSvg.append('g');
var VisitsLayer = BaseSvg.append('g');
var WirelessLayer = BaseSvg.append('g');
var InternetLayer = BaseSvg.append('g');
var TitleLayer = BaseSvg.append('g');
var LegendLayer = BaseSvg.append('g');
Adding in parameters for spacing and width of the annulars
Once I’d figured out how to get annulars rendering on top of each other, it was time to experiment with the size and shape of the rings. In order to do this, I tried to define a general approach to the shapes that were being built. That general approach looked a little like this (well, it was a lot more scribble).
By being able to define a general approach, I was able to declare variables for elements such as the annular width and annular spacing, which became incredibly useful later as more annulars were added – the positioning and shape of the arcs for each annular could be calculated mathematically using these variables (see the source code for how this was done).
var annularXOffset = 100; // how much to shift the annulars horizontally from centre
var annularYOffset = 0; // how much to shift the annulars vertically from centre
var annularSpacing = 26; // space between different annulars
var annularWidth = 22; // width of each annular
var annularMargin = 70; // margin between annulars and canvas
var padAngle = 0.027; // amount that each segment of an annular is padded
var cornerRadius = 4; // amount that the sectors are rounded
This allowed me to ‘play around’ with the size and shape of the annulars until I got something that was ‘about right’.
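The radius calculation itself isn’t shown here (it lives in the project’s source code), but under the hypothetical assumption that annulars are numbered from the outermost ring inwards, deriving each ring’s radii from these variables could look like this:

```javascript
// Hypothetical sketch: derive inner/outer radii for annular i from the
// spacing variables. Canvas dimensions are assumed values.
var width = 960, height = 700;   // assumed canvas size
var annularMargin = 70;          // margin between annulars and canvas
var annularSpacing = 26;         // space between different annulars
var annularWidth = 22;           // width of each annular

function outerRadius(i) {
  // Largest circle that fits the canvas, less the margin, stepping
  // inwards one ring-plus-gap per annular.
  var maxRadius = Math.min(width, height) / 2 - annularMargin;
  return maxRadius - i * (annularWidth + annularSpacing);
}

function innerRadius(i) {
  return outerRadius(i) - annularWidth;
}
```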
At this stage I also experimented with the padAngle of the annular arcs (also defined as a variable for easy tweaking), and with the stroke weight and colour, which was defined in CSS. Again, I took inspiration from GRLC’s corporate branding.
Placing dataset labels on the arcs
Now that I had the basic shape of the visualisation, the next challenge was to add dataset labels. This was again a major blocking point, and it took me a lot of tinkering to finally realise that the dataset labels would need to be svg text, sitting on paths created from separate arcs from those rendered by the d3.pie function. Without separate paths, the text wrapped around each arc segment in the annular. So, for each dataset, I created a new arc and path for the dataset label to be rendered on, and then appended a text element to the path. I’d never used this technique in svg before and it was an interesting learning experience.
Having sketched out a general approach again helped here, as with the addition of a few extra variables I was able to easily create new arcs for the dataset text to sit on. A few more variables to control the positioning of the dataset labels, and voila!
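The markup this technique produces can be sketched as a string – an invisible arc path with an id, and a text element whose textPath references it. The ids, radius and label below are illustrative only (and note that older browsers want xlink:href rather than href):

```javascript
// Build markup for a label riding on its own invisible circular arc.
function labelOnArc(id, radius, label) {
  // A semicircular arc from (-r, 0) to (r, 0), sweeping over the top.
  var d = 'M ' + (-radius) + ',0 A ' + radius + ',' + radius +
          ' 0 0 1 ' + radius + ',0';
  return '<path id="' + id + '" d="' + d + '" fill="none"/>' +
         '<text><textPath href="#' + id + '">' + label + '</textPath></text>';
}

// labelOnArc('labelArc0', 120, 'Collection') yields markup that could be
// appended inside the svg.
```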
Adding a legend
The next challenge was to add a legend to the diagram, mostly because I’d decided that the infographic would be too busy with Library labels on each data point. This again took a bit of working through, because while d3.js has a d3.legend function for constructing legends, it’s only intended for data plotted horizontally or vertically, not 7 data sets plotted on consecutive annulars. This tutorial from Zero Viscosity and this one from Competa helped me understand that a legend is really just a group of related rectangles.
Again, the positioning took a little work, but eventually I got the legend positioned well.
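The ‘group of related rectangles’ idea boils down to computing an x,y position for each swatch and label from its index. A sketch, with offsets and row height as assumed values rather than those used in the final visualisation:

```javascript
// One legend row per library: a fixed x, stepping down a row per index.
var legendX = 20, legendY = 20, rowHeight = 24;

function legendPosition(i) {
  return { x: legendX, y: legendY + i * rowHeight };
}

// In d3 this could drive something like:
// LegendLayer.selectAll('rect').data(libraries).enter().append('rect')
//   .attr('x', function(d, i) { return legendPosition(i).x; })
//   .attr('y', function(d, i) { return legendPosition(i).y; });
```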
Responsive design and data visualisation with d3.js
One of the other key challenges with this project was attempting to have a reasonably responsive design. This appears to be incredibly hard to do with d3.js. I experimented with a number of settings to aim for a more responsive layout. Originally, the narrative text was positioned in a sidebar to the right of the image, but at different screen resolutions the CSS float rendered awkwardly, so I decided to use a one column layout instead, and this worked much better at different resolutions.
Thinking this through more deeply, although SVG and d3.js in general are vector-based, and therefore lend themselves well to responsive design to begin with, there are a number of elements which don’t scale well at different resolutions – such as text sizes. Unless all these elements were to be made dynamic, and likely conditional on viewport and orientation, then it’s going to be challenging indeed to produce a visualisation that’s fully responsive.
While I was reasonably pleased with the progress on the project, I felt that the visualisation needed an interactive element. I considered using some sort of arc tween to show movement between data sets, but given that historical data (say for previous years) wasn’t available, this didn’t seem to be an appropriate choice.
After getting very frustrated with the lack of built-in tooltips in d3.js itself, I happened upon the d3.tip library. This was a beautifully written addition to d3.js, and although its original intent was for horizontal and vertical chart elements, it worked passably on annular segments.
Drawbacks in using d3.tip for circular imagery
One downside I found in using this library was the way in which it considers the positioning of the tooltip – this has some unpredictable, and visually unpleasant, results when data is being represented in circular format. In particular, the way that d3.tip calculates the ‘centre’ of the object that it is applied to does not translate well to arc and circular shapes. For instance, look at how the d3.tip is applied to arc segments that are large and have only small amounts of curvature – such as the Geelong arc segment for ‘Members’. I’ve had a bit of a think about how to solve this problem, and the solution involves a more optimal approach to calculating the ‘centre’ point of an arc segment.
This is beyond what I’m capable of with d3.js, but wanted to call this out as a future enhancement and exploration.
Adding percentage values to the tooltip with d3.nest and d3.sum
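In plain JavaScript, the grouping-and-summing that d3.nest and d3.sum perform for this step amounts to totalling the values and expressing each as a share of the whole, ready for interpolation into the tooltip text. A vanilla sketch (field names are illustrative):

```javascript
// Total each row's value and attach its percentage of the whole.
function withPercentages(rows) {
  const total = rows.reduce((sum, r) => sum + r.value, 0);
  return rows.map(r => ({
    library: r.library,
    value: r.value,
    percent: (r.value / total) * 100
  }));
}

// withPercentages([{ library: 'Geelong', value: 300 },
//                  { library: 'Belmont', value: 100 }])
// → Geelong 75%, Belmont 25%
```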