State of my toolchain 2022

Welcome to my now-nearly-yearly State of my Toolchain report (you can see previous editions for 2021, 2019, 2018 and 2016). I began these posts as a way to document the tools, applications and hardware that were useful to me in my work, but also to help observe how they shifted over time – as technology evolved, as my tasks changed, and as the underpinning assumptions of usage shifted. In this year's post, I'll cover my toolchain at a glance, report on what's changed and what gaps I still have in my workflow, and – importantly – reflect on the shifts that have occurred over five years.

At a glance

Hardware, wearables and accessories

Software

  • Atom with a range of plugins for writing code, thesis notes (no change since last report)
  • Pandoc for document generation from MarkDown (no change since last report)
  • Zotero for referencing (using Better BibTeX extension) (no change since last report)
  • OneNote for Linux by @patrikx3 (no change since last report)
  • Nightly edition of Firefox (no change since last report)
  • Zoom (no change since last report)
  • Microsoft Teams for Linux (no change since last report)
  • Gogh for Linux terminal preferences (no change since last report)
  • Super productivity (instead of Task Warrior) (changed since last report)
  • Cuckoo Timer for Pomodoro sessions (changed since last report)
  • RescueTime for time tracking (no change since last report)
  • Beeminder for commitment based goals (no change since last report)
  • Mycroft as my Linux-based voice assistant (no change since last report)
  • Okular as my preferred PDF reader (instead of Evince on Linux and Adobe Acrobat on Windows) (changed since last report)
  • NocoDB for visual database work (new this report)
  • ObservableHQ for data visualisation (new this report)

Techniques

  • Pomodoro (no change since last report)
  • Passion Planner for planning (no change since last report)
  • Time blocking (used on and off, but a lot more recently)

What’s changed since the last report?

There’s very little that’s changed since my last State of My Toolchain report in 2021: I’m still doing a PhD at the Australian National University’s School of Cybernetics, and the majority of my work is researching, writing, interviewing, and working with data.

Tools for PhD work

My key tool is MaxQDA for qualitative data analysis – Windows only, unfortunately, and prone to being buggy with OneDrive – while my writing workflow still runs through Atom. One particularly useful tool I've adopted in the last year is NocoDB – an open-source alternative to visual database interfaces like Notion and AirTable – and I've found it very useful, even if the front end is a little clunky.

Working across Windows and Linux, I've settled on Okular as my preferred PDF reader and annotator. I read on average 300-400 pages of PDF content a week, and Adobe Acrobat was buggy as hell; Okular has fine-grained annotation tools, and the interface is the same across Windows and Linux.

Another tool I've started to use a lot this year is ObservableHQ – it's like a Jupyter notebook, but for d3.js data visualisations. Unfortunately, they've recently brought in a change to their pricing structure, and private notebooks are going to cost me $USD 15 a month – I don't think the price point is worth it.

Hardware and wearables

The first key change this year is a phone upgrade – my Pixel 3's screen died, and the cost to replace the screen was exorbitant – a classic example of planned obsolescence. I've been happy with Google's phones – as long as I disable all the spyware voice-enabled features – so I settled on the Pixel 4a 5G. It's been a great choice: clear, crisp photos, a snappy processor, and excellent battery life.

After nearly four years, my Mobvoi Ticwatch Pro started suffering from the "ghost touch" problem, where the touch interface picks up non-existent taps. A factory reset didn't solve the problem, so I got the next model up – the Ticwatch Pro 2020 – at 50% off. This wearable has been one of my favourite pieces of hardware – fast, responsive, durable – and I can't imagine not having a smartwatch now. I've settled on the Flower watch face after using Pujie Black for a long time – both are heavily customisable. The attention Google is giving to Wear OS is showing – integration between phone apps and Wear OS apps is much smoother than it was even 1-2 years ago.

After having two Plantronics Backbeat Pro headphones – one from around 2017 and the other circa 2021, both still going, though the first has very poor battery life and battered earpads – I invested in my first pair of genuinely good headphones: the Sennheiser Momentum Pro 3. The sound quality is incredible. I got them for $AUD 300, which I thought was a lot to pay for headphones, but they've been worth every penny – particularly when listening to speech recognition data.

With so much PhD research and typing, I found my Logitech MK240 just wasn't what I needed – it's a great little unit if you don't have anything else, but it was time for a mechanical keyboard, because I love expensive hobbies. After some research, and a misstep with the far-too-small HuoJi Z-88 (the key presses for Linux command-line tasks were horrendous), I settled on the Keychron K8 and haven't looked back. Solid, sturdy, blue Gateron switches – it's a dream to type on, and works well across Windows and Linux. However, on Linux it defaults to a Mac keyboard layout, so I had to do some tweaking with a key remapper – I used keyd. My only disappointment with Keychron is the hackiness needed to get it working properly on Linux.

Productivity

My Passion Planner is still going strong, but I haven't been as diligent about using it as a second brain as I have been in the past, and the price changes this year meant that shipping one to Australia cost me nearly $AUD 120 in total – unaffordable in the longer term – so I'm actively looking at alternatives such as Bullet Journalling. The Passion Planner is great – it's just expensive.

I've also dropped Task Warrior in favour of Super Productivity this year. Task Warrior isn't cross-platform – I can't use it on Android or on Windows, and thanks to MaxQDA I'm spending a lot more of my time in Windows. The Gothenburg Bit Factory are actively developing Task Warrior – full transparency, I'm a GitHub sponsor of theirs – but the cloud-based and cross-platform features seem to be taking a while to come to fruition.

I’m also using time-blocking a lot more, and am regularly using Cuckoo as a pomodoro timer with a PhD cohort colleague, T. We have an idea for a web app that optimises the timing of Pomodoros based on a feedback loop – but more on that next year.

Current gaps in my toolchain

Visual Git editor

In my last State of My Toolchain report, I lamented not having a good visual Git editor. That's been solved on Windows with GitHub's desktop application, but as of writing the Linux variant appears to be permanently mothballed. I'm sure this has nothing to do with Microsoft buying GitHub. So I am still on the lookout for a good Linux desktop Git GUI. On the other hand, doing everything by CLI is always good practice.

Second Brain

In my last report I also mentioned having taken Huginn for a spin, and being let down by its immaturity. It doesn't seem to have come very far since. So I'm still on the lookout for "Second Brain" software – a gap I've decided to reframe in those terms after reading Tiago Forte's book on the topic. This is more than the knowledge-management space that tools like Roam and Obsidian occupy; it's much more an organise-your-life tool. The Microsoft suite – Office, Teams, and their stablemates – is trying to fill this niche, but I want something that's not dependent on an enterprise login.

The Fediverse

Triggered by Elon Musk's purchase, and subsequent transformation of Twitter into a flaming dumpster fire, I've become re-acquainted with the Fediverse – you can find me on Mastodon here, on Pixelfed.au here, and on Bookwyrm here. However, the tooling infrastructure around the Fediverse isn't as mature – understandably – as that of commercial platforms. I'm using Tusky as my Android app, and the advanced web interface. There is also a lack of hosting options for the Fediverse – I can't find a pre-configured Digital Ocean Droplet for Mastodon, for example – and I think the next year will see some development in this space. If you're not across Mastodon, I wrote a piece that uses cybernetic principles to compare and contrast it with Twitter.

5 years of toolchain trends

After five years of the State of My Toolchain report, I want to share some reflections on the longer-term trends that have been influential in my choice of tools.

Cross-platform availability and dropping support for Linux

I work across three main operating systems – Linux, Windows (because I have to for certain applications) and Android – and the tools I use need to work seamlessly across all three. There's been a distinct trend over the last five years for applications to start providing Linux support but then move to a "community" model or drop support altogether. Two cases in point are Pomodone – which I dropped because of its lack of Linux support – and RescueTime – which still works on Linux for me, albeit with some quirks (such as not restarting properly when the machine wakes from suspend). This is counter-intuitive given the increasing usage of Linux on the desktop. The aspiration of many Linux aficionados that the current year will be "The Year of Linux on the Desktop" is not close to fruition, but the statistics show a continued, steady – if small – rise in the number of Linux desktop users. This is understandable, though: startups and small SaaS providers cannot justify supporting such a small user base. That said, they shouldn't claim to support the operating system and then drop support – as both Pomodone and RescueTime have done.

Takeaway: products I use need to work cross-platform, anywhere, anytime – and especially on Linux.

Please don’t make me change my infrastructure to work with your product

A key reason for choosing the Ticwatch Pro 2020 over other Mobvoi offerings was that the watch's charger was the same across hardware models. I'd bought a couple of extra chargers to have handy, and didn't want to have to buy more "spares". This mirrors a broader issue with hardware – it has a secondary ecosystem. I don't just need a mobile phone, I need a charger, a case, and glass screen protectors – a bunch of accessories. These are all different – they exhibit variety – a deliberate reduction in re-usability and a buffer against commodification. But in choosing hardware, one of my selection criteria is now re-usability or upgradeability: how easily can I re-use this hardware's supporting infrastructure? The recent decision by Europe to standardise on USB-C is the right one.

Takeaway: don’t make me buy a second infrastructure to use your product.

I’m happy to pay for your product, but it has to represent value for money, or it’s gone

Several of my tools are open source – Super Productivity, NocoDB, Atom, Pandoc – and where I can, I sponsor them on GitHub or provide a monetary contribution. On the whole, these pieces of software are often worth a lot more to me than the paid proprietary software I use – for example, MaxQDA is over $AUD 300 a year, predominantly because it only has one main competitor, NVivo. I have no issue paying for software, but it has to represent value for money. If I can get the same value – or nearly equivalent – from an open source product, then I'm choosing open source. Taguette wasn't quite there as a replacement for MaxQDA, but Super Productivity has equivalent functionality to Pomodone. Open source products keep proprietary products competitive – and this is a great reason to invest in open source where you are able.

That’s it! Are there any products or platforms you’ve found particularly helpful? Let me know in the comments.

Building a database to handle PhD interview tracking using MySQL and NocoDB

So, as folx probably know, I'm currently doing a PhD at the Australian National University's School of Cybernetics, investigating voice data practices and what we might be able to do to change them, so that we end up with less biased voice data and less biased voice technology products. If you'd like to see some of the things I've been working on, you can check out my portfolio. Two of my research methods are interview-based; the first tranche being shorter exploratory interviews and the second being in-depth interviews with machine learning practitioners.

Because there are many stages to interviews – identifying participants, approaching them for interviews, obtaining consent, scheduling, transcription and coding – I needed a way to manage the pipeline. My PhD cohort colleagues use a combination of AirTable and Notion, but I wanted an open-source alternative (surprise!).

Identifying alternatives and choosing one to use

I did a scan of what alternatives were available simply by searching for “open source alternative to AirTable”. Some of the options I considered but discarded were:

  • BaseRow: While this is open source, and built in widely adopted frameworks such as Django and Vue.js, and available in Docker and Heroku deploys, the commercial framing behind the product is very much open core. That is, there are a lot of features that are only available in the paid / premium version. I’ve worked with open core offerings before, and I’ve found that the most useful features are usually those that are behind the paywall.
  • AppFlowy: While this looked really impressive, and the community behind it looked strong, the use of Flutter and Rust put me off – I’m not as familiar with either of them compared to Vue.js or Django. I also found the documentation really confusing – for example, to install the Linux version it said to “use the official package”, but it didn’t give the name of the official package. Not helpful. On this basis I ruled out AppFlowy.
  • DBeaver: This tool is aimed more at people who have to work with multiple databases; it provides a GUI over the top of the database, but is not designed to be a competitor to Notion or AirTable. I wanted something more graphically focused, with multiple layout styles (grid, card etc.).

This left me with NocoDB. I kicked the tyres a bit by looking at the GitHub code, and read through the documentation to get a feel for whether it was well constructed; it was. Importantly, I was able to install it on my localhost; my ethics protocol for my research method prevented it from being hosted on a cloud platform.

Installation

Installation was a breeze. I set up a database in MySQL (also running locally), then git clone'd the repo, and used npm to install and start the software:

git clone https://github.com/nocodb/nocodb-seed
cd nocodb-seed
npm install
npm start

nocodb runs on Node.js's built-in HTTP server, and starts the application by default on port 8080, so to start using it, you simply go to http://localhost:8080/. One slightly frustrating thing is that it requires an email address and password to log in. nocodb is a commercial company – they've recently raised funding and are hiring – and I suspect this is part of their telemetry, even for self-hosted installations. I run Pi-hole as my DNS server, however, and I don't see any telemetry from nocodb in my block list.

Next, you need to provide nocodb with the connection details for the MySQL database you created earlier. This creates some additional tables, and nocodb generates some base views – but at this point you are free to start creating your own.
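For reference, creating that local database and a dedicated MySQL user for nocodb to connect with earlier looked something like the sketch below – the database name, user and password are placeholders, so substitute your own:

-- A local database and a dedicated user for nocodb to connect with
-- (names and password are placeholders)
CREATE DATABASE phd_interviews CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'noco'@'localhost' IDENTIFIED BY 'change-me';
GRANT ALL PRIVILEGES ON phd_interviews.* TO 'noco'@'localhost';
FLUSH PRIVILEGES;

Restricting the user to localhost keeps everything on the local machine, which suited my ethics protocol.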

Deciding what fields I needed to capture to be able to visualise my interview pipeline

Identifying what fields I needed to track was a case of trial and error. As I added new fields, or modified the datatypes of existing ones, nocodb was able to be easily re-synced with the underlying database schema. This makes nocodb ideal for prototyping database structures.
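To give a concrete (and purely illustrative) example of the kind of change nocodb happily picked up: adding a field, or changing the datatype of an existing one, was just an ALTER TABLE against the underlying MySQL database, followed by a re-sync in the nocodb interface. The field names below come from the tables described further down:

-- Illustrative schema tweaks; nocodb re-syncs with these after the fact
ALTER TABLE INTERVIEWEES ADD COLUMN NOTES TEXT;
ALTER TABLE INTERVIEWEES MODIFY COLUMN HOW_IDENTIFIED VARCHAR(255);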

nocodb showing tables out of sync
nocodb now in sync with the underlying tables

In the end, I settled on the following tables and fields:

Interviewees table

  • INTERVIEWEE_ID – a unique, auto-incrementing ID for each participant
  • REAL_NAME – the real name of my participant (and one of the reasons this is running locally and not in the cloud)
  • CODE_NAME – a code name I ascribed to each participant, as part of my Ethics Protocol
  • ROLE_ID – foreign key identifier for the ROLES table.
  • EMAIL_ADDRESS – what it says on the tin.
  • LINKEDIN_URL – I used LinkedIn to contact several participants, and this was a way of keeping track of that information.
  • HOMEPAGE_URL – the participant’s home page, if they had one. This was useful for identifying the participant’s background – part of the purposive sampling technique.
  • COUNTRY_ID – foreign key identifier for the COUNTRIES table – again used for purposive sampling.
  • HOW_IDENTIFIED – to identify whether people had been snowball sampled
  • HAS_BEEN_CONTACTED – Boolean to flag whether the participant had been contacted
  • HAS_AGREED_TO_INTERVIEW – Boolean to flag whether the participant had agreed to be interviewed
  • NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS – Boolean to flag whether the participant hadn’t responded to a request to interview
  • HAS_DECLINED – Boolean to flag an explicit decline
  • INTERVIEW_SCHEDULED – Boolean to indicate a date had been scheduled with the participant
  • IS_EXPLORATORY – Boolean to indicate the interview was exploratory rather than in-depth. Having an explicit Boolean for the interview type allows me to add others if needed (while I felt that a full blown table for interview type was overkill).
  • IS_INDEPTH – Boolean for the other type of interview I was conducting.
  • INTERVIEWEE_DESCRIPTION – descriptive information about the participant’s background. Used to help me formulate questions relevant to the participant.
  • CONSENT_RECEIVED – Boolean to flag whether the participant had provided informed consent.
  • CONSENT_URL – A space to record the file location of the consent form.
  • CONSENT_ALLOWS_PARTICIPATION – A flag relevant to a specific type of participation in my ethics protocol and my consent form
  • CONSENT_ALLOWS_IDENTIFICATION_VIA_PARTICIPANT_CODE – A flag relevant to how participants were able to elect to be identified, as part of my ethics protocol.
  • INTERVIEW_CONDUCTED – Boolean to flag that the interview had been conducted.
  • TRANSCRIPT_DONE – Boolean to flag that the transcript had been created (I used an external company for this).
  • TRANSCRIPT_URL – A space to record the file location of the transcript.
  • TRANSCRIPT_APPROVED – Boolean to indicate the participant had reviewed and approved the transcript.
  • TRANSCRIPT_APPROVED_URL – A space to record the file location of the approved transcript
  • CODING_FIRST_DONE – Boolean to indicate first pass coding done
  • CODING_FIRST_LINK – A space to record the file location of the first coding
  • CODING_SECOND_DONE – Boolean to indicate second pass coding done
  • CODING_SECOND_URL – A space to record the file location of the second coding
  • NOTES – I used this field to make notes about the participant or to flag things to follow up.
  • LAST_CONTACT – I used this date field so I could easily order interviewees to follow them up.
  • LAST_MODIFIED – This field auto-updated on update.

Countries table

  • COUNTRY_ID – Unique identifier, used as primary key and foreign key reference in the INTERVIEWEES table.
  • COUNTRY_NAME – human readable name of the country, useful for demonstrating purposive sampling.
  • LAST_MODIFIED – This field auto-updated on update.

Roles table

  • ROLE_ID – Unique identifier, used as primary key and foreign key reference in the INTERVIEWEES table.
  • ROLE_TITLE – human readable title of the role, used for purposive sampling.
  • ROLE_DESCRIPTION – descriptive information about the activities performed by the role.
  • LAST_MODIFIED – This field auto-updated on update.
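To give a flavour of the underlying structure, here's a minimal MySQL sketch of the two supporting tables and an abridged version of the INTERVIEWEES table. The column types are my own illustrative choices rather than the exact export – the full structure lives in the Gist mentioned below:

-- Supporting tables; types and sizes are illustrative choices
CREATE TABLE COUNTRIES (
  COUNTRY_ID INT AUTO_INCREMENT PRIMARY KEY,
  COUNTRY_NAME VARCHAR(255) NOT NULL,
  LAST_MODIFIED TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

CREATE TABLE ROLES (
  ROLE_ID INT AUTO_INCREMENT PRIMARY KEY,
  ROLE_TITLE VARCHAR(255) NOT NULL,
  ROLE_DESCRIPTION TEXT,
  LAST_MODIFIED TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

-- Abridged: the full table has all of the fields listed above
CREATE TABLE INTERVIEWEES (
  INTERVIEWEE_ID INT AUTO_INCREMENT PRIMARY KEY,
  REAL_NAME VARCHAR(255),
  CODE_NAME VARCHAR(255),
  ROLE_ID INT,
  COUNTRY_ID INT,
  HAS_BEEN_CONTACTED BOOLEAN DEFAULT FALSE,
  HAS_AGREED_TO_INTERVIEW BOOLEAN DEFAULT FALSE,
  INTERVIEW_SCHEDULED BOOLEAN DEFAULT FALSE,
  INTERVIEW_CONDUCTED BOOLEAN DEFAULT FALSE,
  CONSENT_RECEIVED BOOLEAN DEFAULT FALSE,
  TRANSCRIPT_DONE BOOLEAN DEFAULT FALSE,
  LAST_CONTACT DATE,
  LAST_MODIFIED TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  FOREIGN KEY (ROLE_ID) REFERENCES ROLES(ROLE_ID),
  FOREIGN KEY (COUNTRY_ID) REFERENCES COUNTRIES(COUNTRY_ID)
);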

If I were to update the database structure in the future, I would be inclined to have a “URLs” table, where the file links for things like consent forms and transcripts are stored. Having them all in one table would make it easier to do things like URL validation. This was overkill for what I needed here.

Thinking also about the interview pipeline, the status of an interviewee in the pipeline is a combination of various Boolean flags. I would have found it useful to have a summary STATUS_ID with a human-readable descriptor of the status.
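In the meantime, a CASE expression over those flags gets most of the way to a single status descriptor. The ordering of the conditions below is just my rough sense of pipeline precedence, so treat it as a sketch:

-- Derive a single pipeline status from the Boolean flags (sketch only)
SELECT CODE_NAME,
       CASE
         WHEN TRANSCRIPT_APPROVED = TRUE     THEN 'Transcript approved'
         WHEN TRANSCRIPT_DONE = TRUE         THEN 'Transcript done'
         WHEN INTERVIEW_CONDUCTED = TRUE     THEN 'Interview conducted'
         WHEN INTERVIEW_SCHEDULED = TRUE     THEN 'Interview scheduled'
         WHEN HAS_DECLINED = TRUE            THEN 'Declined'
         WHEN HAS_AGREED_TO_INTERVIEW = TRUE THEN 'Agreed to interview'
         WHEN HAS_BEEN_CONTACTED = TRUE      THEN 'Contacted'
         ELSE 'Not yet contacted'
       END AS PIPELINE_STATUS
FROM INTERVIEWEES;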

Get the SQL to replicate the database table structure

I’ve exported the table structure to SQL in case you want to use it for your own interview tracking purposes. It’s a Gist because I can’t be bothered altering my wp_options.php to allow for .sql uploads, and that’s probably a terrible idea, anyway 😉

Creating views based on field values to track the interview pipeline

Now that I had a useful table structure, I settled on some Views that helped me create and manage the interview pipeline. Views in nocodb are lenses on the underlying database that restrict or constrain the data shown, so that it's more relevant to the task at hand. This is done through showing or hiding fields, and then filtering the selected fields.

  • Data entry view – this was a form view where I could add new Interviewees.
  • Views for parts of the pipeline – I set up several grid views that used filters to restrict Interviewees to the part of the interview pipeline they were in (a rough SQL equivalent of one of these filters is sketched after this list). These included those I had and hadn't contacted, those who had a scheduled interview, those who hadn't responded, as well as several views for where the interviewee was in the coding and consent pipeline.
  • At a glance view – this was a gallery view, where I could get an overview of all the potential and confirmed participants.
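As an aside, each of those filtered grid views amounts to a fairly simple query on the underlying database. A rough SQL equivalent of an "agreed but not yet scheduled" view might look like this (the field choices are mine):

-- Roughly what one pipeline view filters down to: agreed, but not yet scheduled
SELECT CODE_NAME, LAST_CONTACT
FROM INTERVIEWEES
WHERE HAS_BEEN_CONTACTED = TRUE
  AND HAS_AGREED_TO_INTERVIEW = TRUE
  AND INTERVIEW_SCHEDULED = FALSE
ORDER BY LAST_CONTACT;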

A limitation I encountered working with these views is that there’s no way to provide summary information – like you might with a SUM or COUNT query in SQL. Ideally I would like to be able to build a dashboard that provides statistics on how many participants are at each part of the pipeline, but I wasn’t able to do this.
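One workaround, because the data sits in a plain MySQL database, is to run the summary queries directly against it. Something like the sketch below gives the at-a-glance pipeline numbers, even if it isn't a dashboard:

-- Pipeline summary counted straight from the underlying table
-- (SUM over the Boolean columns counts the rows where the flag is TRUE)
SELECT COUNT(*)                     AS total_candidates,
       SUM(HAS_BEEN_CONTACTED)      AS contacted,
       SUM(HAS_AGREED_TO_INTERVIEW) AS agreed,
       SUM(INTERVIEW_SCHEDULED)     AS scheduled,
       SUM(INTERVIEW_CONDUCTED)     AS conducted,
       SUM(TRANSCRIPT_DONE)         AS transcripts_done,
       SUM(CODING_FIRST_DONE)       AS first_pass_coded
FROM INTERVIEWEES;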

Updating nocodb

nocodb is under active development, and has regular updates. Updating the software proved to be incredibly easy through npm, with two commands:

Uninstall NocoDB package

npm uninstall nocodb

Install NocoDB package

npm install --save nocodb

Parting thoughts

Overall, I have been really impressed by nocodb – it's a strong fit for my requirements in this use case: easy to prototype with, it runs locally, and it's easy to update. The user interface is still not perfect, and is downright clunky in places, but as an open source alternative to AirTable and Notion, it hits the spot.

Solving MaxQDA error 1001: Error while converting the project!

As folx might know, I'm currently undertaking a PhD at ANU's School of Cybernetics, where I'm researching the voice and speech data and datasets that are used to train machine learning models for things like speech recognition and wake word detection. And if you've been following my posts on the State of my Toolchain, and my previous post exploring Taguette, you'll know that I've settled on MaxQDA as my qualitative data analysis software. In general, I've found MaxQDA to be great software – the user interface is intuitive and the analytical features it has make qualitative data analysis faster. It's expensive – and is a yearly subscription – but at the moment it's earning its price tag.

One definite bugbear I have, though, is how MaxQDA interacts with SharePoint. As part of my ethics protocol, I am storing my PhD data on university systems – not on external cloud tools like Dropbox or Nextcloud. Instead, I save the MaxQDA files to my local (Windows – MaxQDA doesn't have a Linux client, unfortunately) machine. This is then synced with OneDrive to the University's SharePoint server.

This works well. Except when it doesn’t.

A couple of months ago I hit what seemed like a one-off error, where MaxQDA apparently couldn't convert the project file. The error appeared when I opened the MaxQDA project (a .mx22 file):

MaxQDA error code 1001: "Error while converting the project!"

Like so many error messages, it violated design principles for good error messages; it wasn’t a precise description of what had gone wrong, it wasn’t human readable, and it didn’t give me any helpful advice on how to solve the problem. So, I had to figure it out myself.

I tried the obvious things first:

  • I closed MaxQDA and re-launched the software; the error persisted.
  • I restarted my computer and then re-launched the MaxQDA software; the error persisted.
  • I stopped and started the OneDrive service; the error persisted.

At this point, it was clear I’d have to dig deeper into OneDrive. In File Explorer, I could see that the file was still synchronising with OneDrive:

Windows File Explorer showing the MaxQDA file still synchronising with OneDrive

By rights, stopping and starting OneDrive should have re-synchronised the file; but it hadn’t.

OneDrive was also showing a synchronisation error:

OneDrive showing a sync issue – it thinks the MaxQDA file is still in use

Clearly, OneDrive thought that the MaxQDA file was still open; and was not syncing the file to the cloud for this reason. However, closing MaxQDA, OneDrive and a whole reboot had not fixed this error.

My conclusion from this investigation is that MaxQDA somehow leaves an open file handle – for example, if the application closes unexpectedly. The open file handle is not cleared via MaxQDA, via OneDrive, or via the underlying Windows operating system. So how else might you clear an open file handle?

Windows isn’t my preferred operating system; and I don’t know enough about the OS internals to go digging into file handles and how to clear them. So I went rm -rf; or about as close to it as you can get on Windows …

The solution

The only thing that did fix the issue was uninstalling OneDrive, re-installing OneDrive, and then re-authenticating OneDrive and allowing it to sync to the cloud. My working hypothesis is that the uninstallation of OneDrive forces Windows to clear any open OneDrive file handles; then the re-installation returns the MaxQDA file to a known good state.

All in all, this took about an hour of investigation to identify the issue and find a workaround. And to be clear – the solution is just a workaround. It doesn't address the underlying problem, which is that MaxQDA files synchronised from a local machine to the cloud via OneDrive or SharePoint are prone to a synchronisation failure that manifests as an open file handle, which in turn leads to an obscure error message.

This has now happened to me twice – but at least now I know how to fix it next time …

Update: Resetting the OneDrive cache appears to resolve this issue

So, after encountering this issue with MaxQDA for around the fifth time, and even after uninstalling and reinstalling OneDrive, I did a bit more digging, and found some blog posts that suggested resetting the OneDrive cache.

This is covered in this how-to guide, but the commands are essentially:

  1. Open the Windows command tool as Administrator (you need to be an administrator to clear the OneDrive cache)
  2. Run the command %localappdata%\Microsoft\OneDrive\onedrive.exe /reset
  3. Then restart OneDrive by running the application

This worked for me – so it’s another possible workaround for this very frustrating issue!