Valentine’s Day: A data visualization to learn Chords in d3.js

Estimated reading time: 7 minutes

One of my learning goals this year was to really understand d3.js, and become more proficient in creating interactive data visualizations. In turn, this led me to learn and analyse Chord diagrams. Chord diagrams visualize relationships either unilaterally or bilaterally. For example, they have been used to show inbound and outbound capital flows in financial visualization.

Learning how Chord diagrams are constructed in d3.js

Firstly, I wanted a solid primer on how Chord diagrams are constructed in d3.js. Steven Hall’s excellent blog post for Delimited provided the best overview, and clearly articulated elements of a Chord diagram such as the Matrix, the map of flows, arcs and paths. Using this approach, I decided to construct my own scenario and see if I could visualize it. The scenario had to have:

  • Bi-directional data flows which may be asymmetric (x has a relationship with y, but y may have a different relationship with x)
  • A small enough dataset that I could manually construct it (without having to do a lot of CSV or JSON processing; this exercise was about learning Chords, not about grokking data loading in d3.js)
  • A dataset that could be easily understood by a layperson

I settled on the concept of Valentine’s Day crushes, because they satisfy the above criteria. Next, I constructed a number of statements that were to be visualized. They assumed that one person expressed a crush on one other person, and that this may or may not be mutual. After doing an initial list, I had to call myself out – I’d assumed hetero-normative relationships (male attracted to female and vice-versa), but of course that’s simply not diverse or inclusive thinking.

  • Bob (male, hetero) likes Emily (female, hetero)
  • Giovanni (male, hetero) likes Emily (female, hetero)
  • Steve (male, hetero) likes Rachel (female, hetero)
  • Kevin (male, hetero) likes Poh (female, hetero)
  • Art (male, hetero) likes Viva (female, hetero)
  • Pyotr (male, hetero) likes Viva (female, hetero)
  • Rohan (male, hetero) likes Rachel (female, hetero)
  • Sasha (male, same-sex attracted) likes Pyotr
  • Emily likes Bob
  • Rachel likes Rohan
  • Poh (female, hetero) likes Sasha
  • Viva likes Pyotr
  • Lee (female, same-sex attracted) likes Poh

The next step was to convert these statements into a matrix.

The matrix

Matrices are usually built from spreadsheet or other tabular datasets. Therefore, it was helpful for me to represent the above relationships in a table.

Valentine’s Day preferences
Name     Bob Giovanni Steve Kevin Art Pyotr Rohan Sasha Emily Rachel Poh Viva Lee
Bob      -   No       No    No    No  No    No    No    Yes   No     No  No   No
Giovanni No  -        No    No    No  No    No    No    Yes   No     No  No   No
Steve    No  No       -     No    No  No    No    No    No    Yes    No  No   No
Kevin    No  No       No    -     No  No    No    No    No    No     Yes No   No
Art      No  No       No    No    -   No    No    No    No    No     No  Yes  No
Pyotr    No  No       No    No    No  -     No    No    No    No     No  Yes  No
Rohan    No  No       No    No    No  No    -     No    No    Yes    No  No   No
Sasha    No  No       No    No    No  Yes   No    -     No    No     No  No   No
Emily    Yes No       No    No    No  No    No    No    -     No     No  No   No
Rachel   No  No       No    No    No  No    Yes   No    No    -      No  No   No
Poh      No  No       No    No    No  No    No    Yes   No    No     -   No   No
Viva     No  No       No    No    No  Yes   No    No    No    No     No  -    No
Lee      No  No       No    No    No  No    No    No    No    No     Yes No   -

The matrix is an inherent part of Chord diagrams. Chord diagrams are based on a square matrix: there are as many rows in the matrix as there are columns, and row i and column i refer to the same person. One of the first mistakes I made in this exercise was not to have the columns and rows in the same order: I ordered the names of people in the columns differently to the rows. When this was implemented as a Chord layout in d3.js, it was an incorrect representation.

Trap: Ensure that the rows and columns of your data matrix are in the same order. The Chord layout has no way to check this; it simply assumes that row i and column i describe the same person, so a mismatch produces a diagram that looks plausible but shows the wrong relationships.

From this table, I was then able to declare a matrix variable:

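// Rows and columns both follow the table order:
// Bob, Giovanni, Steve, Kevin, Art, Pyotr, Rohan, Sasha, Emily, Rachel, Poh, Viva, Lee
// matrix[row][column] = 1 means the row person likes the column person.
var matrix = [
  [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], // Bob likes Emily
  [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], // Giovanni likes Emily
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], // Steve likes Rachel
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], // Kevin likes Poh
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], // Art likes Viva
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], // Pyotr likes Viva
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], // Rohan likes Rachel
  [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], // Sasha likes Pyotr
  [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], // Emily likes Bob
  [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], // Rachel likes Rohan
  [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], // Poh likes Sasha
  [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], // Viva likes Pyotr
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]  // Lee likes Poh
];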
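One way to sidestep the row and column ordering trap is to generate the matrix from a single list of names, so the same order is used for both. The following is a minimal sketch in plain JavaScript, not the code used for the visualization; `buildMatrix`, `people` and `likes` are hypothetical names, and the pairs are read off the table above.

```javascript
// One shared list defines BOTH the row order and the column order.
const people = ["Bob", "Giovanni", "Steve", "Kevin", "Art", "Pyotr", "Rohan",
                "Sasha", "Emily", "Rachel", "Poh", "Viva", "Lee"];

// Each pair reads "first person likes second person".
const likes = [
  ["Bob", "Emily"], ["Giovanni", "Emily"], ["Steve", "Rachel"],
  ["Kevin", "Poh"], ["Art", "Viva"], ["Pyotr", "Viva"], ["Rohan", "Rachel"],
  ["Sasha", "Pyotr"], ["Emily", "Bob"], ["Rachel", "Rohan"],
  ["Poh", "Sasha"], ["Viva", "Pyotr"], ["Lee", "Poh"]
];

// Hypothetical helper: returns a square matrix where
// result[i][j] === 1 means names[i] likes names[j].
function buildMatrix(names, pairs) {
  const indexOf = new Map(names.map((name, i) => [name, i]));
  const result = names.map(() => names.map(() => 0));
  for (const [from, to] of pairs) {
    if (!indexOf.has(from) || !indexOf.has(to)) {
      throw new Error("Unknown name in pair: " + from + " -> " + to);
    }
    result[indexOf.get(from)][indexOf.get(to)] = 1;
  }
  return result;
}

const builtMatrix = buildMatrix(people, likes);
```

Because the rows and columns are both derived from `people`, reordering that one array reorders the whole matrix consistently, and a misspelt name fails loudly instead of silently producing a wrong diagram.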

Chords and ribbons

The next step in this process is to calculate the flows in the matrix in both directions. In this example, it means calculating the relationship between each of the people, and that relationship may not be equal or symmetric. For instance, Rachel likes Rohan, and Rohan likes Rachel; this is a symmetric relationship. However, Kevin likes Poh, but Poh likes Sasha, so that relationship is asymmetric. This is reasonably simple to do (and Steven Hall provides an excellent pseudocode example in his blog post). d3.js, being the knock-your-socks-off piece of awesome that it is, provides an inbuilt function for this:

d3.chord()(matrix)

This function calculates the values needed to draw the chords in a Chord diagram, including the starting and ending angle of each chord, as well as the values (inbound and outbound) of the chords. See the d3.js manual entry for more information.
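To make the angle calculation less of a black box, here is a hand-rolled sketch of how the outer group angles can be derived from a matrix. It is a simplification for illustration and not d3's implementation: it ignores sorting, only computes the groups (not the individual chords), and `groupAngles` and `demoMatrix` are hypothetical names. In d3.js version 4 the real layout is created with d3.chord() and then called with the matrix.

```javascript
// Simplified sketch of the group (outer arc) angles in a chord layout.
// padAngle is the gap, in radians, left between neighbouring groups.
function groupAngles(matrix, padAngle = 0) {
  const n = matrix.length;
  // Each group's size is the sum of its row (its outbound flow).
  const sums = matrix.map(row => row.reduce((a, b) => a + b, 0));
  const total = sums.reduce((a, b) => a + b, 0);
  if (total === 0) {
    throw new Error("Matrix has no flows to lay out");
  }
  // Radians available per unit of value once the gaps are set aside.
  const k = Math.max(0, 2 * Math.PI - padAngle * n) / total;
  let angle = 0;
  return sums.map((value, index) => {
    const startAngle = angle;
    angle += value * k;
    const endAngle = angle;
    angle += padAngle;
    return { index, value, startAngle, endAngle };
  });
}

// Tiny three-person example: A likes B, B likes A, C likes A.
const demoMatrix = [
  [0, 1, 0],
  [1, 0, 0],
  [1, 0, 0]
];
const demoGroups = groupAngles(demoMatrix);
```

In the Valentine's Day matrix every person likes exactly one other person, so every row sums to 1 and, under this calculation, every person gets an outer arc of the same size.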

Arcs

A Chord diagram often has labelled arcs around the outside of the diagram. These are produced in d3.js by passing the groups (computed by the Chord layout from the matrix) to the d3.arc() generator. See the d3.js manual entry for more information.

Problems encountered

One of the key issues I encountered in getting this far with the data visualization was the syntax changes between version 3 of d3.js and version 4. There are a number of changes to method names, and the way that they are called under version 4. Many of the examples I used as a jumping off point were done prior to mid-2016 (when version 4 was released), using the older syntax. My example used version 4, and this resulted in a number of syntax errors if I was ‘copying and pasting’ code.

The way I handled this was to have a good read through the ChangeLog for version 4, noting in particular the changes to Chord and Ribbon methods.

Another issue that occupied some brain cycles was the different way that the matrices were calculated. Many earlier examples and tutorials included a ‘mapper’ function where series data was mapped to a matrix. In my example, the matrix() function did mapping as well.

As someone who’s familiar with OO-style programming, I’m still getting used to the way that d3.js modifies the document object model, first by selecting, entering and then modifying one or more DOM elements. This is something I’m just going to have to “get used to” as I use d3.js more.

Trap: Ensure the method calls you copy from examples match the version of d3.js you’re using.
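As a rough aid for spotting version 3 code when copying from older examples, the renames most relevant to a Chord diagram can be kept in a small lookup table. This is a sketch only: the list is deliberately short rather than exhaustive, and `v3ToV4` and `findV3Names` are hypothetical names, not part of d3.js.

```javascript
// A few version 3 names and their version 4 replacements, limited to
// those a Chord diagram is likely to touch. Not an exhaustive list.
const v3ToV4 = {
  "d3.layout.chord": "d3.chord",
  "d3.svg.chord": "d3.ribbon",
  "d3.svg.arc": "d3.arc",
  "d3.scale.ordinal": "d3.scaleOrdinal"
};

// Hypothetical helper: lists the version 3 names found in a code snippet.
function findV3Names(source) {
  return Object.keys(v3ToV4).filter(name => source.includes(name));
}

const pasted = "var chord = d3.layout.chord(); var path = d3.svg.chord();";
const warnings = findV3Names(pasted).map(name => name + " -> " + v3ToV4[name]);
// warnings: ["d3.layout.chord -> d3.chord", "d3.svg.chord -> d3.ribbon"]
```

A simple substring check like this will not catch changed call signatures, so the ChangeLog is still the authoritative reference.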

Visual design aspects

Data is only one part of the data visualization lifecycle. In order to be useful, it has to be visualized in a meaningful way.

Colours

The first major choice was how to represent the different people in the visualisation. It made sense to use different colours for men and women, and following (traditional, socialised, boring, gender-normative – I get it) what people are expecting, I chose blue for men and pink for women (from the Pantone Colours of Spring 2016 palette. Because Pantone). This provided a pleasant looking graphic, but data visualization is about telling a story.

I decided to add in two more colours to represent the same-sex attracted people in the data series (Sasha and Lee). This added visual interest, and made it easier to interpret some of the interesting details about the whole that were not apparent from the initial statements.

Chord diagram with solid colour in ribbons

Using gradients in the ribbons

As you can see from the above, the solid colours (well, solid colours with a transparency applied) don’t really narrate the story of this visualisation very well. The pink colour dominates, and nuances (such as the unrequited love triangle between Poh, Sasha and Lee) in the data are less obvious.

Visually, I wanted the ribbons in the diagram to have gradients. At first glance this looked incredibly complex, and I was about to give up, when I found an excellent article by Nadieh Bremer, one of the gurus of d3.js, on this exact topic. Nadieh’s article provides the mathematical basis for visually appealing gradients in ribbons, including how the direction of the gradient is calculated, based on the position and direction of the ribbon. It’s very well articulated, and you don’t even need basic trigonometry skills to get it – it’s visually explained.

In a nutshell, Nadieh’s code calculates the gradient start and stop points for each ribbon, and the angular direction in which the gradient should be applied.
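The idea can be sketched in plain JavaScript as follows. This is my own illustration of the approach as described above, not Nadieh's code: it assumes chord objects shaped like those a chord layout produces (a `source` and a `target`, each with `startAngle` and `endAngle`), and `gradientEndpoints` is a hypothetical name.

```javascript
// Sketch: where a ribbon's linear gradient could start and stop.
// The gradient runs from the middle of the source arc to the middle
// of the target arc, both measured on the inner radius.
function gradientEndpoints(chord, innerRadius) {
  // Layout angles run clockwise from 12 o'clock, so subtract a quarter
  // turn before using cos/sin, which measure from the positive x-axis.
  const point = (arc) => {
    const mid = (arc.startAngle + arc.endAngle) / 2 - Math.PI / 2;
    return { x: innerRadius * Math.cos(mid), y: innerRadius * Math.sin(mid) };
  };
  const start = point(chord.source);
  const stop = point(chord.target);
  return { x1: start.x, y1: start.y, x2: stop.x, y2: stop.y };
}

// Source arc in the top-right quadrant, target arc in the bottom-left.
const demoChord = {
  source: { startAngle: 0, endAngle: Math.PI / 2 },
  target: { startAngle: Math.PI, endAngle: 1.5 * Math.PI }
};
const demoGradient = gradientEndpoints(demoChord, 100);
```

The four values would then be set as the x1, y1, x2 and y2 attributes of an SVG linearGradient element that uses gradientUnits="userSpaceOnUse", with one gradient defined per ribbon.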

Using Nadieh’s code, I then applied gradients to the ribbons, for a much more informative and meaningful visualization.

Chord diagram with gradients

Arc labels

The next tricky piece visually was adding the name of each person to the arc. For this, I relied on code from this Chord example from AndrewRP. This included applying CSS styles to the SVG text. Because you’re styling text within an svg element, the selector is prefixed with the svg element:

svg .titles {
  font-size: 180%;
  font-family: "Abel", sans-serif;
  font-weight: bold
}

This wasn’t something I’d done before, so it was a great learning experience.

I’m still not entirely happy with the labels in the arcs – I would much prefer them to be larger, bolder and centred within the arc segments themselves. A good extension activity for another day.
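One possible starting point for centring the labels is to place each one at the mid-angle of its arc, at a radius halfway between the inner and outer edge of the arc band. The sketch below is an assumption about how that might be done, not the code used here; `labelTransform` is a hypothetical name and the group objects are assumed to carry `startAngle` and `endAngle` in radians.

```javascript
// Sketch: an SVG transform that moves a label to the middle of a group arc.
// Angles are in radians, clockwise from 12 o'clock.
function labelTransform(group, radius) {
  const mid = (group.startAngle + group.endAngle) / 2;
  // Convert to degrees and shift so that 0 points to 3 o'clock,
  // which is the direction translate() moves along after rotating.
  const degrees = mid * 180 / Math.PI - 90;
  // Labels on the left half would read upside down, so flip them.
  const flip = mid > Math.PI ? " rotate(180)" : "";
  return "rotate(" + degrees + ") translate(" + radius + ")" + flip;
}

// One arc covering the right half of the circle, one covering the left.
const rightLabel = labelTransform({ startAngle: 0, endAngle: Math.PI }, 200);
const leftLabel = labelTransform({ startAngle: Math.PI, endAngle: 2 * Math.PI }, 200);
```

The returned string would be set as the transform attribute of each text element, paired with text-anchor: middle in the stylesheet so the text is centred on that point.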

Telling the story

Of course a key point with a data visualization is for the data to tell a story.

Visualizing the Valentine’s day sentiments that we started off with allows us to derive a lot more meaning from the data overall:

  • We can see a tragic unrequited love triangle. Lee is attracted to Poh, who is attracted to Sasha, who is attracted to Pyotr, but Pyotr and Viva are mutually attracted.
  • No-one is attracted to Lee, Giovanni, Steve, Kevin or Art.
  • We can see that Rohan and Rachel, Bob and Emily, and Viva and Pyotr have mutual attraction.
  • Emily, Rachel and Viva are each liked by two men and thus are the most popular women.
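The same observations can be cross-checked directly from the data. The following is a small sketch in plain JavaScript; `crushes`, `mutual`, `likedBy` and `unliked` are hypothetical names, and the pairs are read off the preferences table (including Steve's row).

```javascript
// Each pair reads "first person likes second person".
const crushes = [
  ["Bob", "Emily"], ["Giovanni", "Emily"], ["Steve", "Rachel"],
  ["Kevin", "Poh"], ["Art", "Viva"], ["Pyotr", "Viva"], ["Rohan", "Rachel"],
  ["Sasha", "Pyotr"], ["Emily", "Bob"], ["Rachel", "Rohan"],
  ["Poh", "Sasha"], ["Viva", "Pyotr"], ["Lee", "Poh"]
];

// Set of "A>B" keys for quick lookup of a directed crush.
const crushKeys = new Set(crushes.map(([from, to]) => from + ">" + to));

// Mutual pairs: A likes B and B likes A. The from < to test keeps
// each pair once, with the two names in alphabetical order.
const mutual = crushes
  .filter(([from, to]) => from < to && crushKeys.has(to + ">" + from))
  .map(([from, to]) => from + " & " + to);
// mutual: ["Bob & Emily", "Pyotr & Viva", "Rachel & Rohan"]

// Count how many people like each person (0 if nobody does).
const likedBy = {};
for (const [from, to] of crushes) {
  likedBy[from] = likedBy[from] || 0;
  likedBy[to] = (likedBy[to] || 0) + 1;
}

// People nobody has a crush on, in alphabetical order.
const unliked = Object.keys(likedBy).filter(name => likedBy[name] === 0).sort();
// unliked: ["Art", "Giovanni", "Kevin", "Lee", "Steve"]
```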

Next steps

There were some additional elements I would have liked to add to this visualization, but I ran out of time to implement them. I’m noting them here as extension activities in case I come back to this in the future.

  • As mentioned, I’d like to clean up the arc titles and visually enhance them
  • Being able to click on a ribbon and learn more information about a particular relationship would be useful and visually pleasing. Nadieh Bremer again has a worked example of how to achieve this; however, the code is quite complex and requires a large code base for the ToolTip functionality.
  • It would also be ideal to isolate a particular ribbon, especially given that so many ribbons overlap in the centre, making it harder to follow visually who is attracted to whom. This would use some form of opacity change to ‘fade’ the ribbons not selected.
  • A number of the variables are statically declared in the code, as arrays. While this is totally fine as a learning example, for reusability I’d much prefer to put them into CSV or JSON files, then use JavaScript to read them in.
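The ‘fade’ idea could be sketched as a small function that decides each ribbon's opacity. This is an illustration under assumptions rather than a finished feature: it selects by person rather than by individual ribbon, `ribbonOpacity` is a hypothetical name, and the chord objects are assumed to carry `source.index` and `target.index` as a chord layout's output does.

```javascript
// Sketch: opacity for one ribbon given the currently selected person.
// selectedIndex is null when nothing is selected.
function ribbonOpacity(chord, selectedIndex, faded = 0.1) {
  if (selectedIndex === null) {
    return 1;
  }
  const involved = chord.source.index === selectedIndex ||
                   chord.target.index === selectedIndex;
  return involved ? 1 : faded;
}

const demoChords = [
  { source: { index: 0 }, target: { index: 8 } },  // e.g. Bob and Emily
  { source: { index: 5 }, target: { index: 11 } }  // e.g. Pyotr and Viva
];

// Selecting person 0 keeps the first ribbon solid and fades the second.
const opacities = demoChords.map(chord => ribbonOpacity(chord, 0));
// opacities: [1, 0.1]
```

In the visualization itself, the returned value would be applied to the ribbon paths through the selection's style method inside mouseover and mouseout handlers.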

Get the code

See the final visualization at:

http://blog.kathyreid.id.au/valentines/

And get the source code on GitHub at:

https://github.com/KathyReid/valentines-dataviz

 

 

BuzzConf Nights – Controlling the Future

Estimated reading time: Less than a minute

Was delighted to be given the opportunity to present tonight at BuzzConf Nights – a user group style offering from the people behind the BuzzConf Technology Festival which is held in Ballan in November (see my previous post on BuzzConf over here). I chose to speak on emerging technologies and machine ethics considerations in user interfaces – an incredibly interesting area.

You can find the slides over at – https://kathyreid.github.io/buzzconf-night-2016-presentation/#/Introduction

GovHack Geelong 2016

Estimated reading time: 5 minutes

After the great success of GovHack Geelong 2015, exemplified by the recent production release of Geelong Free Wifi Data by Parham Hausler and Daniel McCarthy, I was thrilled to be given the opportunity to run it all over again this year.

Event planning and promotion

Planning for the event started around May, which was a little later than in 2015; however, this year’s event had been scheduled later in July so that it didn’t fall over the university and school holidays. This year’s organisation was much easier given that we knew what to expect, and had learned a number of lessons from our inaugural year, including the importance of strong, easy wireless, clear instructions on what had to be done, and ensuring our GovHackers were well fed. We decided early on to split up into teams to spread out the organisational load of the event, with a national liaison / admin type role, a marketing and communications role, a sponsorship role and a logistics / catering / venue liaison role. This worked out quite well, and it’s a strategy that we’ll likely employ in future years.

Our promotional strategy this year was a little different from last year, where significant efforts were made to engage with the secondary school community around Geelong. These had only limited success, and so this year we turned our attention to attempting to engage at the business and industry level in Geelong, in particular with development and software companies. This was an excellent idea by Todd Hubers, our communications and marketing lead, and we were able to attract at least one corporate-based team. By using registrations from last year, we were also able to engage developers who’d participated in 2015. We also engaged media early on, with a write-up in the Geelong Advertiser, which helped to spread the word. Of course, we also put significant effort into social media, with our Twitter and Facebook presence growing significantly in the lead up to this year’s event, a mechanism which can then be leveraged further for future years’ events. We supplemented all these efforts with a poster campaign that was delivered through key developer and student channels (GovHack Geelong 2016 Learn New Skills poster, 2MB PDF).

Open data providers

City of Greater Geelong Open Data Portal

In terms of data providers, Andrew Downie from City of Greater Geelong did a great job engaging with multiple institutions around Geelong, and we were delighted to have Geelong Regional Library Corporation come on board this year, releasing data sets related to their collections. The open data love continued, with the City releasing their open data portal in the week before GovHack!

There are, however, some Geelong institutions and open data sources I’d really like to include in future years’ events so that we continue to build on the momentum that’s already being generated. In particular, it would be great to have organisations like Barwon Water, Department of Education (in particular I’d really like Year 12 completion data by location or school), and Barwon Health more actively engaged. Of course, both the engagement component and the ‘doing’ component (gathering, cleansing, governing and releasing open data) have a resource overhead. While movements like GovHack help to prove the value of open data, it can take some time for large organisations to come around to the viewpoint that their data is more valuable when it’s shared openly, and be willing to invest time and resources in doing so.

The event itself – hackers, hacking and hacks

The event itself was a blast! We had great representation from our sponsors – City of Greater Geelong, Deakin University, Geelong Regional Library Corporation and Aconex – and this allowed us to provide strong catering (thanks to the folks at Waterfront Kitchen). Well fed hackers are happy hackers. Some of the glitches we experienced last year with internet connectivity were ironed out before this year, and we only had one instance of someone being unable to connect to the wireless network. The network survived a massive onslaught, with video, massive datasets and large imagery all testing its mettle!

The biggest technical issue of the event turned out to be the GovHack Hackerspace itself. With the flood of submissions just before ‘keyboards down’ at 5pm on the Sunday, the site ground to a halt, causing consternation amongst the hackers that their hard-earned projects might not be recognised. The Hackerspace was kept open for an additional few hours to allow the teams to upload their finished products.

The atmosphere of the day was fantastic. One of my favourite moments was Baby Olive helping out her Dad – Ian Priddle of Codeacious.

Inclusion was the theme of the day, with some great diversity of participants and skillsets in all the teams.

Above all, it was great to see creativity unleashed in all its forms, including mood, temperature and open hardware sensing!

Popular tools and techniques

IBM Watson Twitter Analysis for @KathyReid

Again, visualisation (both in 2D, and increasingly in 3D with tools like Unity) was a key theme of the event, and one of my favourite tools was the super-easy-to-use IBM Watson Twitter visualisation. JavaScript and visualization libraries such as d3.js again featured heavily this year. JavaScript (and jQuery) is now considered essential for all front end web developers; HTML and CSS (and variants such as SASS and, decreasingly, LESS) are no longer sufficient. Mapping tools such as Google Maps, MapBox and CartoDB are also gaining more prominence as ‘must have’ skills, particularly for visualization of geospatial data.

Without a doubt, good Git skills are becoming a prerequisite for any coder or developer at hackathons now, as is a GitHub account. There was occasional mention of Bitbucket, but GitHub is still by far the most popular hosting choice for developers using Git for distributed version control.

Slack really took off this year, with the GovHackHQ Slack having nearly 900 simultaneous users during the event. It’s fair to say that Slack has hit widespread adoption – and that’s largely due to both the large range of integrations it has available, as well as the excellent user experience it provides – irrespective of platform or operating system. It’s one of the few tools I can use that has both a native client on the operating systems I use – mainly Android and Linux – as well as an excellent and feature-equivalent web interface.

Hardware wise, I observed lots of wireless mice, and lots of ‘second screens’ – one screen just doesn’t seem to be enough anymore – perhaps a symbol of our ever-more-multitasked world?

UX skills and techniques also seem to be gaining traction – I saw user stories, wireframing, storyboarding and even some basic persona mapping going on.

Projects delivered by Geelong teams at GovHack Geelong 2016

Project | Team name | Link to project | Link to project video
Pet-tential | Claws and Effect | Homepage | Video
NewsPulse | Settlers of Cremorne | Homepage | Video
5-D City Explorer | IDeEA Lab | Homepage | Video, Video (alt)
SmartPath | SmartPath | Homepage | Video
GreenWalking | GFox