Linux Australia expense breakdown – a data visualisation in d3.js

After learning a lot of new techniques and approaches (and gotchas) in d3.js in my last data visualisation (Geelong Regional Libraries by branch), I wanted to turn my new-found skills to Linux Australia’s end of year report. This is usually presented in a fairly dry manner at the organisation’s AGM each year, and although we have a Timeline of Events, it was time to add some visual interest to the presentation of data.

Collecting and cleaning the data

The dataset that I chose to explore was the organisation’s non-event expenses – that is, the expenditure of the organisation not utilised on specific events – items like insurance, stationery, subscriptions to online services and so on. These were readily available in the accounting software, Xero, and a small amount of data cleansing yielded a simple CSV file. The original file had a ‘long tail’ distribution – there were many data points that had only a marginal value and didn’t help in explaining the data, so I combined these into an ‘other’ category.

Visualising the data

Using the previous annular (donut chart) visualisation as the base, I set some objectives for the visualisation:

  • The colours chosen had to match those of Linux Australia’s branding
  • The donut chart required lines and labels
  • The donut chart required markers inside each arc
  • The donut chart had to be downloadable in svg format so that it could be copied and pasted into Inkscape (which has svg as its standard save format)

Colour choice

There was much prototyping involved with colour selection. The first palette selected used shading of a base colour (#ff0000 – red), but the individual arcs were difficult to distinguish. A second attempt added many (16) colours into the palette, but they didn’t work as a colour set. I settled on a combination of three colours (red, yellow, dark grey) and shades of these, with the shading becoming less saturated the smaller the values of the arc.

For anyone interested, the color range was defined as a d3.scaleOrdinal object as below.

var color = d3.scaleOrdinal()
    .range([
      '#ffc100',
      '#ff0000',
      '#393939',
      '#ffcd33',
      '#ff3333',
      '#616161',
      '#ffda66',
      '#ff6666',
      '#888888',
      '#ffe699',
      '#ff9999',
      '#b0b0b0',
      '#fff3cc',
      '#ffcccc',
      '#fff'
    ])
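
The lighter shades in the range above appear consistent with mixing each base colour toward white in 20% steps. As a minimal sketch of that idea (the `tint` helper is hypothetical and is not part of d3.js, and this may not be how the palette was actually produced), the shades could be generated like this:

```javascript
// Hypothetical helper: mix a '#rrggbb' colour toward white.
// t = 0 returns the colour unchanged, t = 1 returns white.
function tint(hex, t) {
  // Parse the red, green and blue channels from the hex string.
  var channels = [1, 3, 5].map(function (i) {
    return parseInt(hex.slice(i, i + 2), 16);
  });
  return '#' + channels
    .map(function (c) { return Math.round(c + (255 - c) * t); })
    .map(function (c) { return c.toString(16).padStart(2, '0'); })
    .join('');
}

// Shades of the three base colours in 20% steps.
var shades = ['#ffc100', '#ff0000', '#393939'].map(function (base) {
  return [0.2, 0.4, 0.6, 0.8].map(function (t) { return tint(base, t); });
});
```

Note that the palette above ends on plain white ('#fff') rather than a fourth grey tint.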

Lines and markers

I hadn’t used lines (polylines) and markers in d3.js before, and this visualisation really needed them because the data series labels were too wordy to fit easily on the donut chart itself. Several published examples were particularly useful and relevant in figuring this out.

The key learning from this exercise about svg polylines is that the polyline is essentially a series of x,y Cartesian co-ordinates – the tricky part is actually using the right circular trigonometry to calculate the correct co-ordinates. This took me right back to sin and cos basics, and I found it helpful to sketch out a diagram of where I wanted the polyline points to be before actually trying to code them in d3.js.
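
To make the trigonometry concrete, here is a minimal sketch of how the two ends of a label line might be calculated. It assumes d3’s arc convention (angles in radians, measured clockwise from 12 o’clock) and a chart group translated to the centre of the svg; the function names and the radii are illustrative, not taken from the original code.

```javascript
// Convert an angle and radius to x,y co-ordinates relative to the chart centre.
// svg's y axis points down, hence the negative cos term.
function pointOnCircle(angle, radius) {
  return [radius * Math.sin(angle), -radius * Math.cos(angle)];
}

// Two polyline points for one arc: the middle of the arc's band,
// and a point further out where the label will sit.
function labelLinePoints(startAngle, endAngle, innerRadius, outerRadius, labelRadius) {
  var mid = (startAngle + endAngle) / 2;
  return [
    pointOnCircle(mid, (innerRadius + outerRadius) / 2),
    pointOnCircle(mid, labelRadius)
  ];
}

// Format the points for a polyline's 'points' attribute.
function toPointsAttr(points) {
  return points.map(function (p) { return p.join(','); }).join(' ');
}
```

The string from `toPointsAttr` can then be passed to `.attr('points', ...)` on each polyline.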

A gotcha that tripped me up for about half an hour here was that I hadn’t correctly associated the markers with the polylines – because the markers only had a class attribute, but not an id attribute. Whenever I use markers on polylines from now on, I’ll be specifying both class and id attributes.

    .attr('class', 'marker')
    .attr('id', 'marker')

I initially experimented with a polyline that was drawn not just from the centroid of the arc for each data point out past the outerArc, but one that also went horizontally across to the left / right margin of the svg. While I was able to achieve this eventually, I couldn’t get the horizontal spacing looking good because there were so many data points on the donut chart – this would work well with a donut chart with far fewer data points.

Markers were also generally straightforward to get right, after reading up a bit on their attributes. Again, one of the gotchas I encountered here was ensuring that the markerWidth and markerHeight attributes were large enough to contain the entire marker – for a while, the markers were getting truncated, and I couldn’t figure out why.

    .attr('markerWidth', '12')
    .attr('markerHeight', '12')
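
Putting both gotchas together, a minimal sketch of a marker definition and a polyline that references it might look like the following. The function names, the circle shape, the stroke colour and the refX / refY values are assumptions for illustration; `svg` is assumed to be a d3 selection of the svg element.

```javascript
// A polyline refers to its marker by id, as url(#id); a class alone is not enough.
function markerRef(id) {
  return 'url(#' + id + ')';
}

// Define a circular marker and draw one polyline that starts with it.
function addLabelLine(svg, points) {
  svg.append('defs').append('marker')
      .attr('id', 'marker')
      .attr('class', 'marker')
      // The marker viewport must be large enough to hold the whole shape,
      // otherwise the marker is clipped.
      .attr('markerWidth', 12)
      .attr('markerHeight', 12)
      .attr('refX', 6)
      .attr('refY', 6)
    .append('circle')
      .attr('cx', 6)
      .attr('cy', 6)
      .attr('r', 3);

  return svg.append('polyline')
      .attr('points', points.map(function (p) { return p.join(','); }).join(' '))
      .attr('fill', 'none')
      .attr('stroke', '#393939')
      .attr('marker-start', markerRef('marker'));
}
```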

Labels

Once the positioning for the polylines was solved, then positioning the labels was relatively straightforward, as many of the same trigonometric functions were used.

The challenge I encountered here was that d3.js by default has no text wrapping solution built in to the package, although alternative approaches had been documented elsewhere. From what I could figure out, svg text does not wrap on its own: d3.js can append tspan elements to text elements, but each line has to be split and positioned by hand. That is, I can’t just append tspan elements to text elements and get word-wrapping for free.

In the end I just abbreviated a couple of the data point labels rather than sink several hours into text wrapping approaches. It seems odd that svg provides such poor native support for text wrapping, but considering the myriad ways that text – particularly foreign language text – can be wrapped, it’s incredibly complex.
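
For reference, the line-splitting step of a manual wrapping approach can be sketched as below. This is not what I used in the visualisation; `wrapWords` is a hypothetical helper that breaks on a character count, which is only a rough stand-in for measuring the rendered text width.

```javascript
// Split a label into lines of at most maxChars characters, breaking on spaces.
// A single word longer than maxChars is kept whole on its own line.
function wrapWords(label, maxChars) {
  var lines = [];
  var current = '';
  label.split(/\s+/).filter(Boolean).forEach(function (word) {
    var candidate = current ? current + ' ' + word : word;
    if (candidate.length > maxChars && current) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  });
  if (current) lines.push(current);
  return lines;
}

// Each line would then become one tspan inside the text element, e.g.:
// text.selectAll('tspan').data(wrapWords(label, 16)).enter().append('tspan')
//     .attr('x', 0)
//     .attr('dy', function (d, i) { return i === 0 ? 0 : '1.1em'; })
//     .text(function (d) { return d; });
```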

Downloadable svg

The next challenge with this visualisation was to allow the rendered svg to be downloaded – as the donut chart was intended to be part of a larger infographic. I was surprised that a download function wasn’t part of the core d3.js library, but a number of third party functions and approaches were available:

  • Example block from Miłosz Kłosowicz – ‘Download svg generated from d3’: in this example, the svg node is converted to base-64 encoded ASCII then downloaded.
  • d3-save-svg plugin: this plugin provides a number of methods to download the svg, and convert it to a raster file format (such as PNG). This is a fork of the svg-crowbar tool, written for similar purposes by the New York Times data journalism team.

I chose to use the d3-save-svg plugin simply because of the abstraction it provided. However, I came up against a number of hurdles. When I first used the example code to try and create a download button, the download function was not being triggered. To work around this, I referenced the svg object by id:

d3_save_svg.save(d3.select('#BaseSvg').node(), config);

The other hiccup with this approach was that CSS rules were not preserved in the svg download if the CSS selector had scope outside the svg object itself. For instance, I had applied basic font styling rules to the entire body selector, but in order for font styling to be preserved in the download, I had to re-specify the font styling at the svg selector level in the CSS file. This was a little frustrating, but the ease of using a function to do the download compensated for this.
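
As an aside, a different way around the styling problem is to embed the CSS in the svg markup itself before downloading it. The sketch below is not how d3-save-svg works and is not what I did here; `embedCss` and `toDataUri` are hypothetical helpers, and the data URI uses percent-encoding rather than base-64.

```javascript
// Insert a <style> block directly after the opening <svg ...> tag,
// so that the rules travel with the downloaded file.
function embedCss(svgMarkup, cssText) {
  return svgMarkup.replace(/<svg\b[^>]*>/, function (openTag) {
    return openTag + '<style>' + cssText + '</style>';
  });
}

// Build a data URI that can be used as the href of a download link.
function toDataUri(svgMarkup) {
  return 'data:image/svg+xml;charset=utf-8,' + encodeURIComponent(svgMarkup);
}

// In the browser, the markup could come from the rendered node, e.g.:
// var markup = new XMLSerializer().serializeToString(d3.select('#BaseSvg').node());
// link.href = toDataUri(embedCss(markup, 'text { font-family: sans-serif; }'));
// link.download = 'chart.svg';
```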

Linux Australia expenses 2015-2016 infographic

What I learned at #DrupalSouth 2015

I decided to volunteer for #DrupalSouth, run by the awesome Donna Benjamin, because of the community, and also because a lot of the schedule topics interested me, particularly around continuous integration and design processes. The venue, Melbourne Exhibition and Convention Centre, was great – easy to access, and lots of accommodation within easy walking distance.

Day 1 went brilliantly. Donna had prepared everything beforehand, including all the attendee lanyards etc – which were outsourced to an external provider for packaging and alphabetising – which made registration an absolute breeze. Registration opened at 0800hrs, but many delegates didn’t register until 0845hrs – meaning a last minute rush.

The key thing I took away from registration was how heavily Drupal is used in government and in education – with several agencies and tertiary and research institutions represented. T-shirts were issued, and the sizing concerns often besetting technical conferences were avoided by having a wide range of sizes. We decided to issue the t-shirts that people had ordered first up, and then do swaps on Day 2 when we had a better idea of who had registered and who hadn’t – and this worked well.

Better Remote work by Jarkko Oksanan

The first session I room monitored in was by Jarkko Oksanan, a Finn who does a lot of Drupal work remotely. He went through a great presentation on putting together a remote working team, and remote working practices that are highly effective. I was blown away by the statistic quoted, that globally there are over 219 million people who work remotely – so imagine the productivity increases if we can improve remote working even marginally!

There was a rundown of the best remote work tools to use, including:

  • Videoconferencing: talky.io and Google Hangouts both got a mention
  • IM and team communication: Slack got a huge mention, and IRC is still huge. Still! Hipchat is rocking for people who work with other Atlassian tools. To take it for a spin, I created a Slack account and integrated it with my GitHub repo and, quite frankly, I likey.
  • Git all the things: GitHub, private repos, git syncing for backups, and integration with GitHub and Slack for team comms. If you’re not into Git, get on to it.

One aspect of this presentation that surprised me was the focus on team building and social opportunities to facilitate remote working – because it’s hard to have conflict with someone you’ve shared a few drinks with.

Drupal 8 Migration Choices – Jerry Maguire (Jam)

Jam gave us a rundown of the architectural decisions around moving to Drupal 8 – essentially, D8 is experimental, useful for small scale deployments or for prototyping, but is not ready for mission critical, complex or massively public facing scenarios. Awesome presenter, would see again A+++

Peter Henderson – 2.9 million words in two months

Peter, from the Australian Pesticides and Veterinary Medicines Authority, gave an engaging presentation on content workflow and website redesign in a heavily regulated government environment. As a centralised web team, they had to convert over 2.9 million words of content into a new CMS (Drupal) within two months. Many shortcuts were taken, and the end result was that the end users of the site didn’t really enjoy the experience – so they refactored by using analytics to guide UX improvements.

Because of the high degree of centralisation, they also implemented Dashboards in Drupal, so that pieces of content could be tracked across the complex legal, SME and technical review workflow – something that was all too familiar from my own work experience. The Dashboards worked well, and helped to secure senior management buy-in for making content owners accountable for reviewing their content.

At the end of the session, I asked Peter whether or not a decentralised content authoring approach had been considered – and his response was intuitive, and one seen all too often in large organisations:

“they’re not capable of this yet – the maturity isn’t there”

Amelia Schmidt – Red flags in the design process

In what I judged to be one of the most insightful talks of #DrupalSouth, @meelijane took us through a number of ‘red flags’ in the design process. Aside from her compelling and engaging slide deck, some of the points she made were controversial and challenging – such as questioning whether, in the digital age, it was still appropriate to get client sign off on designs, as the design itself may not perfectly resemble the finished product. For example, Photoshop layered files cannot always easily translate to HTML and CSS.

She also made a number of compelling points about tools for design, and introduced us to a number of great products.

Using Sushi Cats (bonus points, cats as food), she demonstrated design ratio problems for common elements such as lead text and featured images, and took us through some techniques to have better overall design patterns, such as different style crops to match defined styles.

Well worth a look through the slide deck.

Michael Godeck – Go for Continuous Delivery

Michael’s presentation centred around the practice of Continuous Delivery, which incorporates the practice of Continuous Integration, and introduced an open source tool called ‘Go’, which is in a similar marketspace to tools such as Jenkins and Ansible. I hadn’t used it before, and Michael provided a great overview.

Michael took us through various development metrics, such as cycle time, lead time and development time – and showed how a continuous delivery framework enabled you to spot where bottlenecks were in your process. He strongly underscored that you need to ensure that you’re building the right thing – in the same way that Agile is a project management methodology, continuous delivery tools allow agile thinking to be applied in the software development process.

This talk spurred a number of great questions, which touched on topics such as how to convince clients to pay for quality – as continuous delivery models allow for greater quality.

Lasting thoughts

DrupalSouth was a fantastic event. Well organised, with a great venue, a space conducive to relationship building and knowledge sharing, very strong wireless internet, and well-prepared speakers who were clearly experts in their field. The surprising takeaway for me, however, was just how strongly UX, UCD and CX practices are infiltrating traditionally technically-heavy communities – and in so doing, delivering better products and experiences.

DrupalSouth Melbourne 2015
DrupalSouth 2015 Group photo, credit: Peter Lieverdink