A review of Taguette – an open source alternative for qualitative data coding

Motivation and context

As you might know, I’m currently undertaking a PhD program at Australian National University’s School of Cybernetics, looking at voice dataset documentation practices, and what we might be able to improve about them to reduce statistical and experienced bias in voice technologies like speech recognition and wake words. As part of this journey, I’ve learned an array of new research methods – surveys, interviews, ethics approaches, literature review and so on. I’m now embarking on some early qualitative data analysis.

The default tool in the qualitative data analysis space is NVIVO, made by Melbourne-based company, QSR. However, NVIVO has both a steep learning curve and a hefty price tag. I’m lucky enough that this pricing is abstracted away from me – ANU provides NVIVO for free to HDR students and staff – but reports suggest that the enterprise licensing starts at around $USD 85 per user. NVIVO operates predominantly as a desktop-based piece of software and is only available for Mac or Windows. My preferred operating system is Linux, because that is what my academic writing toolchain of LaTeX, Atom and Pandoc is based on, and I wanted to see if there was a tool with equivalent functionality that aligned with this toolchain.

About Taguette

Taguette is a BSD-3 licensed qualitative coding tool, positioned as an alternative to NVIVO. It’s written by a small team of library specialists and software developers, based in New York. The developers are very clear about their motivation in creating Taguette:

Qualitative methods generate rich, detailed research materials that leave individuals’ perspectives intact as well as provide multiple contexts for understanding the phenomenon under study. Qualitative methods are used in a wide range of fields, such as anthropology, education, nursing, psychology, sociology, and marketing. Qualitative data has a similarly wide range: observations, interviews, documents, audiovisual materials, and more. However – the software options for qualitative researchers are either far too expensive, don’t allow for the seminal method of highlighting and tagging materials, or actually perform quantitative analysis, just on text. It’s not right or fair that qualitative researchers without massive research funds cannot afford the basic software to do their research. So, to bolster a fair and equitable entry into qualitative methods, we’ve made Taguette!

Taguette.org website, “About” page

This motivation spoke to me, and aligned with my own interest in free and open source software.

Running Taguette and identifying its limitations

For reproducibility, I ran Taguette version 1.1.1 on Ubuntu 20.04 LTS with Python 3.8.10.

Taguette can be run in the cloud, and the website provides a demo server so that you can explore the cloud offering. However, I was more interested in the locally-hosted option, which runs on a combination of Python, Calibre, and I believe SQLite as the database backend, with SQLAlchemy for mappings. The install instructions recommend running Taguette in a virtual environment, and this worked well for me – presumably running the binary from the command line spawns a Flask- or Gunicorn-type web application, which you can then access in your browser. This locally hosted feature was super helpful for me, as my ethics protocol has restrictions on what cloud services I could use.
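The local setup can be scripted. What follows is a minimal sketch, assuming Taguette is installed from PyPI under the package name `taguette` and started with the `taguette` command. The virtual environment location `~/taguette-venv` is a hypothetical choice of mine, and the `bin` directory layout assumes Linux or macOS. Nothing is installed or started until you call `main()` yourself.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Hypothetical location for the virtual environment; pick any path you like.
VENV_DIR = Path.home() / "taguette-venv"


def build_commands(venv_dir: Path = VENV_DIR) -> List[List[str]]:
    """Return the commands that create a venv, install Taguette and start it."""
    bin_dir = venv_dir / "bin"  # Linux/macOS layout; Windows uses "Scripts"
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [str(bin_dir / "pip"), "install", "taguette"],
        [str(bin_dir / "taguette")],
    ]


def main() -> None:
    """Run each command in turn, stopping at the first failure."""
    for command in build_commands():
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Keeping the command list separate from the code that runs it means you can inspect exactly what will be executed before anything touches your machine.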

To try Taguette, I first created a project, then uploaded a Word document in docx format, and began highlighting. This was smooth and seamless. However, I soon ran into my first limitation. My coding approach is to use nested codes. Taguette has no functionality for nested codes, and no concomitant functionality for “rolling up” nested codes. This was a major blocker for me.
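To make “rolling up” concrete: in a nested scheme, a highlight coded with a child code also counts towards each of its parent codes. The sketch below is illustrative only and is not a Taguette feature. It assumes a hypothetical naming convention in which flat tags encode nesting with a `/` separator (for example `bias/accent`), and the highlight counts are made up for the example.

```python
from collections import Counter
from typing import Dict

SEPARATOR = "/"  # hypothetical convention for encoding nesting in a flat tag name


def roll_up(counts: Dict[str, int]) -> Dict[str, int]:
    """Add each tag's highlight count to the tag itself and all its ancestors."""
    totals: Counter = Counter()
    for tag, count in counts.items():
        parts = tag.split(SEPARATOR)
        # "bias/accent/regional" contributes to "bias", "bias/accent"
        # and "bias/accent/regional".
        for depth in range(1, len(parts) + 1):
            totals[SEPARATOR.join(parts[:depth])] += count
    return dict(totals)


# Made-up highlight counts per flat tag.
highlight_counts = {
    "bias/accent": 4,
    "bias/accent/regional": 2,
    "bias/gender": 3,
    "documentation": 5,
}

rolled = roll_up(highlight_counts)
for tag in sorted(rolled):
    print(tag, rolled[tag])
```

Here the parent code `bias` ends up with the combined count of everything beneath it, which is the aggregation I rely on when working with nested codes.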

However, I was impressed that I could add tags in multiple languages, including non-Latin orthographies, such as Japanese and Arabic. Presumably, although I didn’t check this, Taguette uses Unicode under the hood – so it’s foreseeable that you could use emojis as tags as well, which might be useful for researchers of social media.
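As a rough illustration of why this is plausible, the sketch below stores Japanese, Arabic and emoji tag names in an in-memory SQLite database using Python’s standard `sqlite3` module and reads them back unchanged. It does not exercise Taguette itself, and the `tags` table is a hypothetical schema of my own, not Taguette’s.

```python
import sqlite3

# Example tag names in Japanese, Arabic and emoji; all are ordinary Unicode strings.
tags = ["音声", "صوت", "🎤"]

connection = sqlite3.connect(":memory:")  # throwaway in-memory database
connection.execute("CREATE TABLE tags (name TEXT PRIMARY KEY)")  # hypothetical schema
connection.executemany("INSERT INTO tags (name) VALUES (?)", [(t,) for t in tags])

# Read the names back in insertion order and compare with what went in.
stored = [row[0] for row in connection.execute("SELECT name FROM tags ORDER BY rowid")]
connection.close()

print(stored == tags)
```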

Taguette has no statistical analysis tools built in, such as word frequency distributions, clustering or other corpus-type methods. While these weren’t as important for me at this stage of my research, they are functions that I envisage using in the future.
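In the meantime, a basic word frequency distribution can be computed outside Taguette. This is a minimal sketch using only the Python standard library. The sample text is made up, the stop-word list is a tiny illustrative one, and in practice the text would come from whatever document or export you are analysing.

```python
import re
from collections import Counter
from typing import List, Tuple

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "and", "of", "to", "is", "in", "it"}


def word_frequencies(text: str, top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the top_n most common words, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(top_n)


# Made-up sample text.
sample = (
    "The dataset documents the accent of each speaker. "
    "Accent and gender are recorded in the dataset, and the accent labels vary."
)
print(word_frequencies(sample, top_n=2))
```

Note that the simple regular expression only matches unaccented Latin letters, so it would need adjusting for the non-Latin tags and texts mentioned earlier.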

Taguette’s CodeBook export and import functions work really well, and I was impressed with the range of formats that could be imported or exported.

What I would like Taguette to do in the future

I really need nested tags that have aggregation functionality for Taguette to be a viable software tool for my qualitative data analysis – this is a high priority feature, followed by statistical analysis tools.

Some thoughts on the broader academic software ecosystem

Even though I won’t be adopting Taguette, I admire and respect the vision it has – to free qualitative researchers from being anchored to expensive, limiting tools. While I’m fortunate enough to be afforded an NVIVO license, many smaller, less wealthy or less research-intensive universities will struggle to provide a license seat for all qualitative researchers.

This is another manifestation of universities becoming increasingly beholden to large software manufacturers, rather than having in-house capabilities to produce and manage software that directly adds value to a university’s core capability of generating new knowledge. We’ve seen it in academic journals – with companies like EBSCO, Sage and Elsevier intermediating the publication of journals, hoarding copyrights to articles and collecting a tidy profit in the process – and we’re increasingly seeing it in academic software. Learning Management Systems such as Desire2Learn and Blackboard are now prohibitively expensive, while open source alternatives such as Moodle still require skilled (and therefore expensive) staff to be maintained and integrated – a challenge when universities are shedding staff in the post-COVID era.

Moreover, tools like NVIVO are imbricated in other structures which reinforce their dominance. University HDR training courses and resource guides are devoted to software tools which are in common use. Additionally, supervisors and senior academics are likely to use the dominant software, and so are in an influential position to recommend its use to their students. This support infrastructure reinforces their dominance by ascribing them a special, or reified, status within the institution. At a broader level, even though open source has become a dominant business model, the advocacy behind free and open source software (FOSS) appears to be waning; open source is now the mainstream, and it no longer requires a rebel army of misfits, nerds and outliers (myself included) to be its flag-bearers. This raises the question – who advocates for FOSS within the academy? And more importantly – what influence do they have compared with a slick marketing and sales effort from a global multi-national? I’m reminded here of Eben Moglen’s wise words at linux.conf.au 2015 in Auckland in the context of opposing patent trolls through collective efforts – “freedom itself depends upon how we make use of the technologies we are creating”. That is, universities themselves have created the dependence on academic technologies which now restrict them.

There is hope, however. Platforms like arXiv – the free distribution service and open access archive for nearly two million pre-prints in mathematics, computer science and other (primarily quant) fields – are starting to challenge the status quo. For example, the Australian Research Council recently overturned their prohibition on the citation of pre-prints in competitive grant applications.

Imagine if universities combined their resources – like they have done with arXiv – to provide an open source qualitative coding tool, locally hosted, and accessible to everyone. In the words of Freire,

“Reading is not walking on the words; it’s grasping the soul of them.”

Paulo Freire, Pedagogy of the Oppressed

Qualitative analysis tools allow us to grasp the soul of the artefacts we create through research; and that ability should be afforded to everyone – not just those that can afford it.

State of my toolchain 2021

I’ve been doing a summary of the state of my toolchain now for around five years (2019, 2018, 2016). Tools, platforms and techniques evolve over time; the type of work that I do has shifted; and the environment in which that work is done has changed due to the global pandemic. Documenting my toolchain has been a useful exercise on a number of fronts; it’s made explicit what I actually use day-to-day, and, equally – what I don’t. In an era of subscription-based software, this has allowed me to make informed decisions about what to drop – such as Pomodone. It’s also helped me to identify niggles or gaps with my existing toolchain, and to deliberately search for better alternatives.

At a glance

Hardware, wearables and accessories

Software

Techniques

  • Pomodoro (no change since last report)
  • Passion Planner for planning (no change since last report)

What’s changed since the last report?

Writing workflow

Since the last report in 2019, I’ve graduated from a Masters in Applied Cybernetics at the School of Cybernetics at Australian National University. I was accepted into the first cohort of their PhD program. This shift has meant an increased focus on in-depth, academic-style writing. To help with this, I’ve moved to a Pandoc, Atom, Zotero and LaTeX-based workflow, which has been documented separately. This workflow is working solidly for me after about a year. Although it took about a weekend’s worth of setup time, it’s definitely saving me a lot of time.

Atom in particular is my predominant IDE, and also my key writing tool. I use it with a swathe of plugins for LaTeX, document structure, and Zotero-based academic citations. It took me a while to settle on a UI and syntax theme for Atom, but in the end I went with Atom Solarized. My strong preference is to write in Markdown, and then export to a target format such as PDF or LaTeX. Pandoc handles this beautifully, but I do have to keep a file of command line snippets handy for advanced functionality.
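Those command line snippets can also live in a small script. The sketch below only builds a Pandoc command line and does not run it. It assumes Pandoc 2.11 or later (for the `--citeproc` flag) and, for PDF output, an installed LaTeX engine. The file names are hypothetical.

```python
from typing import List, Optional


def pandoc_command(
    source: str,
    target: str,
    bibliography: Optional[str] = None,
    pdf_engine: Optional[str] = None,
) -> List[str]:
    """Build a Pandoc command line that converts source into target."""
    command = ["pandoc", source, "-o", target]
    if bibliography:
        # --citeproc resolves citations against the given bibliography file.
        command += ["--citeproc", f"--bibliography={bibliography}"]
    if pdf_engine:
        command.append(f"--pdf-engine={pdf_engine}")
    return command


# Hypothetical file names, for illustration only.
command = pandoc_command("chapter.md", "chapter.pdf", "library.bib", "xelatex")
print(" ".join(command))
# To run the conversion for real, pass the list to subprocess.run(command, check=True).
```

Pandoc infers the output format from the target file extension, so the same function covers both the PDF and the LaTeX case.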

Primary machine

I had an ASUS Zenbook UX533FD – small, portable and great battery life, even with an MX150 GPU running. Unfortunately, the keyboard started to malfunction just over a year after purchase (I know, right). I gave up trying to get it repaired because I had to chase my local repair shop for updates on getting a replacement. I lodged a repair request in October, and it’s now May, so I’m not holding out hope… That necessitated me getting a new machine – and it was a case of getting whatever was available with the Coronavirus pandemic.

I settled on an ASUS ROG Zephyrus G15 GA502IV. I was a little cautious, having never had an AMD Ryzen-based machine before, but I haven’t looked back. It has 16 Ryzen 4900 cores, and an NVIDIA GeForce RTX 2060 with 6GB of RAM. It’s a powerful workhorse and is reasonably portable, if a little noisy. It gets about 3 hours’ battery life in class. Getting NVIDIA dependencies installed under Ubuntu 20.04 LTS was a little tricky – especially cuDNN, but that seems to be normal for anything NVIDIA under Linux. Because the hardware was so new, it lacked support in the 20.04 kernel, so I had to pull in experimental Wi-Fi drivers (it uses Realtek).

To be honest I was somewhat smug that my hardware was ahead of the kernel. One little niggle I still have is that the machine occasionally green screens. This has been reported with other ROG models and I suspect it’s an HDMI-under-Linux driver issue, but haven’t gone digging too far into driver diagnostics. Yet.

One idiosyncrasy of the Zephyrus G15 is that it doesn’t have a built-in web camera; for me that was a feature. I get to choose when I do and don’t connect the web camera. And yes – I’m firmly in the web-cameras-shouldn’t-have-to-be-on-by-default camp.

Machine learning work, NVIDIA dependencies and utilities

Over the past 18 months, I’ve been doing a lot more work with machine learning, specifically in building the DeepSpeech PlayBook. Creating the PlayBook has meant training a lot of speech recognition models in order to document hyperparameters and tacit knowledge around DeepSpeech.

In particular, the DeepSpeech PlayBook uses a Docker image to abstract away Python, TensorFlow and other dependencies. However, this still requires all NVIDIA dependencies such as drivers and cuDNN to be installed beforehand. NVIDIA has made this somewhat easier with the Linux CUDA installation guide, which advises on which version to install with other dependencies, but it’s still tough to get all the dependencies installed correctly. In particular, the nvtop utility, which is super handy for monitoring GPU operations (such as identifying blocking I/O or other bottlenecks), had to be compiled from source. As an aside, the developer experience for getting NVIDIA dependencies installed under Linux is a major hurdle for developers. It’s something I want NVIDIA to put some effort into going forward.
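A quick first check before starting a training run is whether the relevant command line tools are on the `PATH` at all. The sketch below is a rough sanity check under my own assumptions about which tools matter (`docker`, `nvidia-smi` from the driver, `nvcc` from the CUDA toolkit, and `nvtop`). It cannot confirm that versions are compatible, and cuDNN is a library without a command of its own, so it is not covered.

```python
import shutil
from typing import Dict, Iterable

# Command-line tools assumed relevant here; adjust the list to your own setup.
TOOLS = ("docker", "nvidia-smi", "nvcc", "nvtop")


def find_tools(tools: Iterable[str] = TOOLS) -> Dict[str, bool]:
    """Report which tools are available on the PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


for tool, found in find_tools().items():
    print(f"{tool:<12}{'found' if found else 'missing'}")
```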

Colour customisation of the terminal with Gogh

I use Ubuntu Linux for 99% of my work now – and rarely boot into Windows. A lot of that work is based in the Linux terminal, from spinning up Docker containers for machine learning training to running Python scripts or even Pandoc builds. At any given time I might have 5-6 open terminals, and so I needed a way to easily distinguish between them. Enter Gogh – an easy to install set of terminal profiles.

One bugbear that I still have with the Ubuntu 20.04 terminal is that the fonts that can be used with terminal profiles are restricted to only mono-spaced fonts. I haven’t been able to find where to alter this setting – or how the terminal is identifying which fonts are mono-spaced for inclusion. If you know how to alter this, let me know!

Linux variants of Microsoft software intended for Windows

ANU has adopted Microsoft primarily for communications. This means not only Outlook for mail – for which there are no good Linux alternatives (and so I use the web version), but also the use of Teams and OneNote. I managed to find an excellent alternative in OneNote for Linux by @patrikx3, which is much more usable than the web version of OneNote. Teams on Linux is usable for messaging, but for videoconferencing I’ve found that I can’t use USB or Bluetooth headphones or microphones – which essentially renders it useless. Zoom is much better on Linux.

Better microphone for videoconferencing and conference presentations

As we’ve travelled through the pandemic, we’re all using a lot more videoconferencing instead of face to face meetings, and the majority of conferences have gone online. I’ve recently presented at both PyCon AU 2020 and linux.conf.au 2021 around voice and speech recognition. Both conferences used the VenueLess platform. I decided to upgrade my microphone for better audio quality. After all, research has shown that speakers with better audio are perceived as more trustworthy. I’ve been very happy with the Stadium USB microphone.

Taskwarrior over Pomodone for tasks

I tried Pomodone for about 6 months – and it was great for integrating tasks from multiple sources such as Trello, GitHub and GitLab. However, I found it very expensive (around $AUD 80 per year) and the Linux version suddenly stopped working. The scripting options also only support Windows and Apple, not Linux. So I didn’t renew my subscription.

Instead, I’ve moved to Taskwarrior via Paul Fenwick’s recommendation. This has some downsides – it’s a command line utility rather than a graphical interface, and it only works on a single machine. But it’s free, and it does what I need – prioritises the tasks that I need to complete.
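One upside of a command line tool is that its data is easy to reuse in scripts. The sketch below assumes that `task export` prints tasks as a JSON array and that each task carries `description`, `status` and `urgency` fields. The sample data is made up rather than real Taskwarrior output, so treat the field names as assumptions to check against your own export.

```python
import json
from typing import Dict, List

# Made-up sample in the shape assumed for `task export` output.
SAMPLE_EXPORT = """
[
  {"description": "Draft ethics variation", "status": "pending", "urgency": 8.2},
  {"description": "Renew library books", "status": "pending", "urgency": 1.5},
  {"description": "Submit conference talk", "status": "completed", "urgency": 0}
]
"""


def top_pending(export_json: str, limit: int = 3) -> List[Dict]:
    """Return pending tasks sorted from most to least urgent."""
    tasks = json.loads(export_json)
    pending = [t for t in tasks if t.get("status") == "pending"]
    # Tasks without an urgency value are treated as least urgent.
    pending.sort(key=lambda t: t.get("urgency", 0), reverse=True)
    return pending[:limit]


for task in top_pending(SAMPLE_EXPORT):
    print(f"{task['urgency']:>5}  {task['description']}")
```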

What hasn’t changed

Wearables and hearables

My Mobvoi TicWatch Pro is still going strong, and Google appears to be giving Wear OS some love. It’s the longest I’ve had a smart watch, and given how rugged and hardy the TicWatch has been, it will definitely be my first choice when this one reaches end of life. My Plantronics BB Pro 2 are still going strong, and I got another pair on sale as my first pair are now four years old and the battery is starting to degrade.

Quantified self

I’ve started using Sleep as Android for sleep tracking, which uses data from the TicWatch. This has been super handy for assessing the quality of sleep, and making changes such as adjusting going-to-bed times. Sleep as Android exports data to Google Drive. BeeMinder ingests that data into a goal, and keeps me accountable for getting enough sleep.

RescueTime, BeeMinder and Passion Planner are still going strong, and I don’t think I’ll be moving away from them anytime soon.

Assistant services

I still refuse to use Amazon Alexa or Google Home – and they wouldn’t work with the 5GHz-band WiFi where I am living on campus. Mycroft.AI is still my go-to for a voice assistant, but I rarely use it now because the Spotify app support for Mycroft doesn’t work anymore after Spotify blocked Mycroft from using the Spotify API.

One desktop utility that fits into the “assistant” space that I’ve found super helpful has been GNOME extensions. I use extensions for weather, peripheral selection and random desktop background selection. Being able to see easily during Australian summer how hot it is outside has been super handy.

Current gaps in my toolchain

I don’t really have any major gaps in my toolchain at the moment, but there are some things that could be better.

  • Visual Git Editor – I’ve been using command line Git for years now, but having a visual indicator of branches and merges is useful. I tried GitKraken, but I don’t use Git enough to justify the monthly-in-$USD price tag. The Git plugin for Atom is good enough for now.
  • Managing everything for me – I looked at Huginn a while back and it sounds really promising as a “second brain” – for monitoring news sites, Twitter etc – but I haven’t had time to have a good play with it yet.

State of my toolchain 2019

What’s changed in the last year?

As you might be aware, I’ve been doing a writeup of my toolchain every year or so for the last couple of years (2016, 2018). There are a couple of reasons for this:

  • The type of work that I do has changed in that time, necessitating exploring different tools, and different equipment
  • And the technology that I work with continues to evolve – new models, new ways of working, and new mindsets – and our toolchains need to evolve too

This year, I’m studying a Master of Applied Cybernetics at the 3A Institute in Canberra – back to being a student; which I haven’t done for five years. Interestingly, my tools of choice 5 years ago have remained steady – Zotero for referencing, LibreOffice for writing essay type work, and Atom as my IDE of choice.

The key changes are:

  • A change in the main laptops I use
  • I’ve adopted Trello / Pomodone / RescueTime as a combination for personal productivity, with Passion Planner as a written diary / visual planner
  • My Fitbit Ionic died an inelegant death and has been replaced by the Mobvoi TicWatch Pro

Main laptop

My Asus N76 finally gave up the ghost and had unrecoverable hardware failure, including failure of the Blu-ray/DVD-ROM drive that was built in – it’s not worth repairing and I think I’ll send it to disposal / recycling after taking 7 years’ worth of stickers off the front.

You were a Good Computer, N76. You were a Very Good Computer.

In my previous Toolchain tear-down, you would have read about my interest in System76’s Oryx Pro 3. One of my friends was selling hers (huge thanks, Pia!), and I immediately fell in love with this hard working, nerd-first beast of a laptop. I chose to flash it with Ubuntu 18.04 LTS rather than System76’s Pop!_OS, basically because I’m so familiar with Ubuntu and I didn’t want any additional learning curve. This machine continues to be my desk-based workhorse of choice. It’s a beautiful, solid, high-performance machine, but it’s not a good mobile choice.

Enter the ASUS Vivobook (my model is the X510UQ). I bought one of these devices for Mum, as she needed a new machine, and was so impressed with it – it has 16GB of RAM and a reasonable NVIDIA GPU (!) – that I went back to the shop and got one for myself. The mobility is so-so – with a battery life of about 4 hours if the screen is reasonably dim, but then I tend to run a lot of CPU- and battery-hungry apps. It’s lightweight, has HDMI out and 3 USB ports and the small bezel means plenty of screen space. I’ve set it up to dual boot Windows and Ubuntu, and if I’m honest it could use a much bigger SSD. That will be a holiday job.

Mobile phone

My Pixel died a couple of months ago after the battery life suddenly dropped to less than 30 minutes after the update to Android 9 – a problem that seems to be quite widespread. I’ve been on a Pixel 3 since; primarily because it’s what JB Hi-Fi in Geelong had in stock. The camera is amazing, and I’ve finally ditched my 3.5mm audio jack headphones for Bluetooth headphones.

Wearables

My Fitbit Ionic was a beautiful device until a release of Android in around November last year; after which I could no longer pair the Ionic with the Pixel phone. Getting support for this was incredibly problematic: the process was difficult and time-consuming, and Fitbit’s after-sales support was very poor. As a result, I ditched Fitbit and made the switch to WearOS, and have been on the Mobvoi TicWatch Pro ever since. The device is too chunky for most women, but well, I’m not most women, and it fits on my giant fat wrist just fine. The battery life isn’t great, but I’ve found that the heart rate monitor is the largest drain on battery.

One gotcha with the Mobvoi TicWatch Pro is the charger. I bought two chargers with the device, and managed to “fry” – short circuit – them both by running higher than 1 Amp current through them (with a high current charger). This is well documented on Reddit. This was pretty poor IMHO for a high-end smartwatch.

WearOS has been an unexpectedly smooth experience; it doesn’t have the ecosystem or the integration that Fitbit has, but that’s also a positive. I can choose the apps and watch faces that best suit me, from multiple different vendors. I’ve settled on the Venom watch face in neutral colours.

A smartwatch remains a key part of my toolchain – more so than ever.

Quantified Self

I continue to use and be very happy with RescueTime and BeeMinder. I’ve been through a myriad of to-do tools in the past few years and seem to have settled on a combination of both Trello and Pomodone this year. Pomodone is beautiful; it’s an Electron-based app that’s available for Linux (Woot!). I’m seriously considering upgrading to the paid version in a couple of months if it continues to prove its value.

For visual planning and diarising, I went to Passion Planner, driven by being a full time student again. I’ve been very happy with the model it uses – iterative goal setting and pattern-forming, and have already bought my 2020 diary. As a visual person, it gives me plenty of space to visualise, to draw and to map out plans, goals and actions. I used the medium size this year, and found it marginally too small; so have upgraded to the large size for 2020.

Headphones

No change, the Plantronics Backbeat Pro Bluetooth headphones are still fantastically awesome.

Streaming Media

No change, still Spotify premium.

Input devices

No change.

Voice Assistant

No change, still the awesome Mycroft.AI.

Internet of Things and Home Automation

I’m on residential college this year at Burgmann College at ANU. Their WiFi network is a 5GHz spectrum, PEAP/MSCHAPv2 authenticated beastie, and nothing much in the IoT space speaks to it, because IoT standards and security, what are they even? 🙁

It feels really weird to have to physically turn my light off now – my default behaviours have been changed by home automation.

Gaps in my toolchain and how they’ve been plugged

In the last edition of State of the Toolchain, these were my key bugbears:

  • Visual Git Editor – I’ve given up on this and learned to love the command line. In hindsight it’s been a great learning experience, and my git fluency has improved out of sight (hah!).
  • Better internet – ANU is on gig internet. *laughs in TCP/IP* I’m going to be in dire straits though if/when I have to go back to a copper-based NBN FttN service *cries in copper*.

Have I missed anything? What do you use?