The Mycroft Mark II and the wind-down of Mycroft AI: it’s all about ecosystems, infrastructures and the friction of privacy

As avid readers may know, I used to work in DevRel for an open source voice assistant company called Mycroft.AI. In the past two weeks, Mycroft AI’s CEO, Michael Lewis, has announced that the company is winding down operations, and will be unable to fulfill any further Kickstarter pledges. In this post, I first give the Mark II a technical once-over, and conclude that it’s a solid piece of kit. So, why couldn’t a privacy-focused voice assistant go to market and reach adoption in an age of growing awareness of surveillance capitalism [1], datafication [2] and commodified, open hardware? In the second half of this post, I take a critical lens to voice assistants more broadly, and reflect on the ecosystems, infrastructures and other components that are needed to provide value to the end user – and why voice assistants are currently hitting the “trough of disillusionment”.

For full transparency, it’s important that I state that as part of my contract arrangement with Mycroft – I contracted for around 18 months between September 2017 and January 2019 – I was offered, and took up, stock options – although they’re now worthless.

The Mycroft Mark II Hardware

Unboxing

Mycroft Mark II box

The box for the Mark II was sturdy and well-designed, carrying forward many of the design features of its predecessor, the Mark I: folded cardboard that provided protection and buffering for the device inside. It survived a long international journey without a scratch.

Inside was a useful “Get started” card that provided Quick Start links, and a hardware warranty leaflet.

Mycroft smiley face upon opening the box

The smiley face upon opening the box was a particularly thoughtful touch. We are used to anthropomorphising voice assistants – imbuing them with human characteristics. I’m torn on this: on the one hand, they perform part of the function of interacting with another human; on the other, they lack so much of the nuance and meaning that emerges from engaging with humans.

Variety of power connectors

One other very pleasing aspect of the unboxing was that I didn’t have to buy an Australian general power outlet adaptor. Australia uses Type I sockets – and foreign-manufactured devices often require an adaptor (thank goodness for USB).

The Australian Type I plug isn’t shown here because it’s connected to the Mycroft Mark II.

Setting up the device

I found setting up the Mark II to be incredibly smooth. It ships with “Dinkum” software, which, according to the Mycroft website, is a stable operating system for the Mark II. However, Dinkum diverges significantly from the previous Mycroft Core software – meaning that Skills developed for previous iterations of Mycroft software – for Picroft, the Mark I, or Mycroft on Linux – won’t work on the Mark II. I found Dinkum to be very stable – it “just worked” out of the box.

Connecting to WiFi

The Mark II uses WiFi connection technology from Pantacor. Once the Mark II had finished booting, I was advised to use my phone or laptop to connect to an SSID the Mark II was advertising, and from there, enter my WiFi credentials. I wasn’t surprised that only my 2.4GHz SSID was detected – I have a dual band router that also advertises a 5GHz SSID – and I wasn’t surprised that enterprise WiFi – WPA2-Enterprise – wasn’t supported. This appears to be a common issue with IoT devices; they will readily connect to consumer-grade networks, but cannot handle the additional complexity – such as MSCHAPv2 authentication – required by WPA2-Enterprise networks. I suspect this presents a barrier to enterprise IoT adoption – and certainly voice assistant adoption.

Pairing the device with the backend platform

When I worked at Mycroft.AI, pairing the device with the backend platform – https://home.mycroft.ai – was one of the most frequently problematic parts of the setup experience. This time, it was seamless. The Mark II displayed a pairing code on screen, along with a URL to visit, and it paired almost immediately. I was then able to set the device’s physical location and timezone.

Testing out Skills

One of the frequent criticisms of the Mark II running Dinkum software is its lack of Skills; this criticism is well founded. I can ask basic questions – which hit the Wikipedia backend – or ask for time and weather information – but anything more is unavailable. I was particularly disappointed that there wasn’t any music integration with the Mark II – which boasts dual 1.5″ 5W drivers – but I can’t hold this against Mycroft.

In late 2020, Spotify revoked the API access that Mycroft had been using to give premium Spotify subscribers – Spotify’s API only works for subscribed users – access to Spotify via Mycroft. Local music is an option, but because I use music streaming – well, apart from my now-nearly-30-year-old CD collection – this wasn’t of much use.

Hardware features

The Mark II sports some impressive hardware. There is an inbuilt camera (not that any Skills make use of it), with a physical privacy shutter.

The microphone is a 2-mic array with noise cancellation, and there is a physical mute switch – very important for those who like privacy. I don’t know the brand of the mic; Seeed Studio hardware is generally considered best of breed for embedded Linux devices, but I don’t think this is a Seeed microphone.

The mute button integrated very well with the design of the touch screen – with the NeoPixel indicator taking on a red hue, and the border of the touch screen also rendering in red when hardware mute is on.

The screen is a 4.3″ IPS touchscreen, and it boasts bright, bold colours. I’m guessing it has a resolution of 800px by 400px, but don’t hold me to that. The board itself is based on the Raspberry Pi 4, and the Pi’s GPIO pins 1, 12, 13, GND and 3.3V power are exposed so that you can integrate other hardware with the device; that is, it’s extensible if you’re into open hardware. There’s also a Gigabit Ethernet (RJ45) socket – so you’re not reliant on WiFi – which was quite useful. The case itself is very sturdy, and needs a Torx screwdriver to open.
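As a quick illustration of that extensibility, here’s a minimal sketch using the gpiozero library, assuming you’ve wired an LED (with a resistor) to one of the exposed GPIO pins. The pin number is purely my own example – check the Mark II hardware documentation for the actual pin mapping before wiring anything up.

```python
# Minimal sketch: drive extra hardware from the exposed GPIO pins.
# Assumes gpiozero is installed and an LED (plus resistor) is wired to
# BCM pin 12 - a hypothetical choice; check the Mark II hardware docs
# for the actual pin mapping before wiring anything.
from time import sleep
from gpiozero import LED

status_led = LED(12)

# Blink the LED a few times to confirm the pin is usable.
for _ in range(5):
    status_led.on()
    sleep(0.5)
    status_led.off()
    sleep(0.5)
```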

SSH’ing into the device

SSH on the default port 22 is blocked, and you need to use a custom port to SSH into the device – this is well documented. For security, the Mark II uses SSH key authentication: you generate an SSH key pair and enter the public key on the home.mycroft.ai platform. The key is then sent to the device, and you SSH in on the custom port using that key. In my opinion this is far more secure than depending on passwords; however, it introduces a closer dependency on the home.mycroft.ai platform – and as we’ll see later, this closely ties the hardware to a supporting platform.
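Once the key is registered, scripting against the device is straightforward. Here’s a minimal sketch using paramiko – the hostname, port, username and key path are placeholders of mine, to be substituted from the Mycroft documentation and your own setup.

```python
# Sketch: run a command on the Mark II over SSH using key authentication.
# The hostname, port, username and key path are placeholders - substitute
# the values from the Mycroft documentation and your own setup.
import os
import paramiko

HOST = "mark-ii.local"      # or the device's IP address
PORT = 8222                 # placeholder - use the custom port from the docs
USER = "mycroft"            # placeholder username
KEY_PATH = os.path.expanduser("~/.ssh/id_ed25519")  # key registered via home.mycroft.ai

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, port=PORT, username=USER, key_filename=KEY_PATH)

# Check what's consuming CPU on the device.
stdin, stdout, stderr = client.exec_command("top -b -n 1 | head -20")
print(stdout.read().decode())
client.close()
```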

What’s under the hood?

Once SSH’d into the device, I was able to take a closer look at the mycroft-dinkum software in operation.

Voice activity detection / keyword spotter

One of the software layers in a voice assistant stack is a wake word listener, or keyword spotter. This software constantly listens to spoken phrases – utterances – and classifies whether each utterance is a wake word. If it detects a wake word, the voice assistant then treats the next utterance spoken as a command.
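The core loop is conceptually simple. Here’s a toy sketch of the pattern, just to make the listen-classify-dispatch cycle concrete – the classifier and microphone capture below are stubs of my own, not Mycroft’s implementation.

```python
# Toy sketch of the listen-classify-dispatch cycle described above.
# The microphone capture and classifier are stubs, not Mycroft's actual
# implementation - a real system runs a trained model on live audio.
import numpy as np

SAMPLE_RATE = 16000
WINDOW_SECONDS = 1.5

def record_window() -> np.ndarray:
    """Stub microphone capture: a real system reads from the mic."""
    return np.zeros(int(SAMPLE_RATE * WINDOW_SECONDS), dtype=np.float32)

def is_wake_word(window: np.ndarray) -> bool:
    """Stub classifier: a real system runs a wake word model here."""
    return False

def listen_loop() -> None:
    while True:
        if is_wake_word(record_window()):
            # Wake word detected: treat the *next* utterance as a command
            # and hand it to the speech recognition layer.
            command_audio = record_window()
            print(f"Command utterance captured: {len(command_audio)} samples")
```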

Previous versions of Mycroft used the Precise wake word listener, and I found it regularly had a high number of false positives – triggering when I didn’t say the “Hey Mycroft” wake word – and false negatives – not responding when I did utter the wake word. The Mark II was incredibly accurate in detecting my wake word utterances – especially considering I am female, and have an Australian accent – groups for which Precise performed poorly.

The mycroft-dinkum software uses several services. The voice service within mycroft-dinkum provides speech recognition, voice activity detection and microphone input. Unsurprisingly, because this service is constantly listening, it consumes a lot of the device’s CPU, which we can see by running htop while SSH’d in:

htop output from the Mark II while SSH’d in

I was curious about what models were being used for wake word detection, and by looking into the GitHub code, I was able to determine that the Mark II was using the Silero VAD engine. Silero VAD is a pre-trained model. However, there is no information available about the data it was trained on – the GitHub page doesn’t say – or the type of algorithm – such as a recurrent neural network – used to train it.
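If you want to poke at Silero VAD outside of the Mark II, it’s easy to load – a sketch below, assuming the torch.hub entry point documented in the snakers4/silero-vad repository; the WAV path is a placeholder.

```python
# Sketch: run Silero VAD over a local WAV file, assuming the torch.hub
# entry point documented in the snakers4/silero-vad repository.
# "speech.wav" is a placeholder path.
import torch

model, utils = torch.hub.load(
    repo_or_dir="snakers4/silero-vad",
    model="silero_vad",
)
(get_speech_timestamps, _, read_audio, *_) = utils

audio = read_audio("speech.wav", sampling_rate=16000)
timestamps = get_speech_timestamps(audio, model, sampling_rate=16000)
print(timestamps)  # e.g. a list of {'start': ..., 'end': ...} sample offsets
```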

I have a hunch – and it’s just a hunch – that Silero uses some of the Mycroft Precise wake word data that was captured via opt-in mechanisms. I was the main female contributor to that dataset – and if Silero was trained on Precise data, it would explain why it’s so accurate for my voice. But, because the provenance of the training data isn’t clear, I won’t be able to tell for sure.

Speech recognition

Finding out what is used for speech recognition was slightly more challenging. In previous versions of Mycroft software, the speech recognition layer is configurable through the mycroft.conf file, allowing the use of on-device speech recognition or a preferred cloud provider. That configuration is stored in a different location under mycroft-dinkum, but I was able to find it. The STT module was set to mycroft – I’m not sure exactly what this means, but I think it means the Mark II is using the Google cloud service, anonymised through the home.mycroft.ai platform. Again, there’s a cloud – and network – dependency here.
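If you want to check this on your own device, a rough sketch is below. The candidate config paths are my assumptions from digging around – adjust them to wherever your Dinkum install keeps its configuration.

```python
# Rough sketch: find the configured STT module by reading Mycroft-style
# JSON config files on the device. The candidate paths are assumptions -
# adjust them to wherever your Dinkum install keeps its configuration.
import json
from pathlib import Path

CANDIDATE_PATHS = [
    Path("/etc/mycroft/mycroft.conf"),
    Path.home() / ".config" / "mycroft" / "mycroft.conf",
]

for path in CANDIDATE_PATHS:
    if not path.exists():
        continue
    try:
        config = json.loads(path.read_text())
    except json.JSONDecodeError:
        # Some Mycroft configs allow comments, which plain JSON parsing rejects.
        print(f"{path}: not plain JSON, inspect it by hand")
        continue
    stt = config.get("stt", {})
    print(f"{path}: STT module = {stt.get('module', 'not set')}")
```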

Intent parsing and Skills

Once the speech recognition layer has produced a transcription of an utterance, the voice assistant needs to tie that utterance to a command and pass it to the Skill that handles that command. In the Mark II, from what I could tell in the source code, that layer is provided by Padatious, a neural-network-based intent parser.
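To get a feel for how Padatious works, here’s a small sketch loosely based on the usage shown in the padatious README – the intents and entity names are my own toy examples, not the Mark II’s actual Skill definitions.

```python
# Small sketch of Padatious intent parsing, loosely based on the usage
# shown in the padatious README. The intents below are toy examples,
# not the Mark II's actual Skills.
from padatious import IntentContainer

container = IntentContainer("intent_cache")
container.add_intent("weather", [
    "what's the weather in {location}",
    "how hot is it in {location} today",
])
container.add_intent("time", [
    "what time is it in {location}",
])
container.train()

match = container.calc_intent("what's the weather in Geelong")
# e.g. intent name, confidence, and extracted entities
print(match.name, match.conf, match.matches)
```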

The range of Skills on the Mark II is very limited – you can ask about the weather, find out the time in different timezones (one of my most-used Skills), play from a pre-defined list of internet radio stations, and query Wikipedia. Limiting the range of Skills makes intent parsing easier, because there are fewer Skills to choose between. However, passing the right query to a Skill can be problematic if your speech recognition isn’t accurate. Intent parsing for named entities – people, places, products, especially Indigenous-language-derived ones – worked reasonably well. Weather queries for “Yarrawonga”, “Geelong” and “Canberra” were all recognised correctly; “Wagga Wagga” wasn’t. Wikipedia queries – “tell me about pierogi” (Polish derivation) and “tell me about Maryam Mirzakhani” (Persian derivation) – were both parsed accurately.

Text to speech

Text to speech, or speech synthesis, is the complement to speech recognition: it takes written phrases and outputs speech as audio. For this layer, the Mark II uses the Mimic 3 engine, a TTS engine based on the VITS algorithm. The GitHub repo for Mimic 3 doesn’t contain the original data the Mimic 3 voices were trained on, but in this blog post, developer Mike Hansen provides a technical walkthrough of how the models were trained, including the challenges of training new languages, and the phonetic and normalisation challenges that entails.

By default, the Mark II uses the UK English “Pope” Mimic 3 voice, which is a male-gendered voice. I was pleasantly surprised by this, given the industry default of gendering voice assistants as female, which is covered at length in the book The Smart Wife by Yolande Strengers and Jenny Kennedy [3]. There were no Australian accents available.

I have a lot of my own (female, Australian) voice data recorded, but I didn’t want to contribute it to the Mimic 3 project. There are growing concerns around how voice and video data of people is being used to create realistic avatars – digital characters that sound, and look, like real humans – and until we have better regulation, I don’t particularly want a “Kathy” voice out in the wild – because I have no control over how it would be used. For example, I know that sex doll companies have approached voice startups wanting access to voices to put into their products. As we say in ‘Strayan: yeah, nah.

Mimic 3 works offline, and I was surprised at how fast it generated speech – there was a slight delay, but it was hardly noticeable. Some of the pronunciations in the Pope voice were definitely a little off – a key issue in TTS is getting the generated voice to have the correct emphasis and prosody – but I was still pretty impressed. When I’m done with my PhD work, I’d love to train a Mimic 3 voice on my own voice data to understand the process a bit better.
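In the meantime, because Mimic 3 runs locally, it’s easy to play with off-device. The sketch below shells out to the mimic3 command line tool – the voice key is my assumption for the UK “Pope” voice, so check the output of `mimic3 --voices` for the exact identifier on your install.

```python
# Sketch: generate speech with the local mimic3 CLI and save it to a WAV.
# The voice key below is an assumption for the UK "Pope" voice - run
# `mimic3 --voices` to list the identifiers available on your install.
import subprocess

TEXT = "Hello from the Mark Two"
VOICE = "en_UK/apope_low"  # assumed voice key - verify with `mimic3 --voices`

with open("hello.wav", "wb") as wav_file:
    subprocess.run(
        ["mimic3", "--voice", VOICE, TEXT],
        stdout=wav_file,
        check=True,
    )
```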

Screen

The Mark II uses the Plymouth splash screen manager for the touch screen visuals, and looking at the htop output, this one component uses over 16% of the available CPU. I found the visuals very well-designed and complementary to the audio information presented by the TTS layer.

The screen adds additional complexity and considerations for Skill designers, however: not only do they have to design a pleasant voice interaction experience for a Skill, but for multi-modal devices like the Mark II, they must also design a complementary visual experience.

Overall impressions

The Mark II packs impressive hardware and mature software into a solid form factor. The visual design is well integrated with the conversational design, and it’s intended to be extensible – a boon for those who like to experiment and tinker. The software layers it uses are generally mature, and predominantly run on-device, with the exception of the speech recognition layer (I think). The Skills all have a dependency on external services – Wikipedia, weather and so on. The integration with the home.mycroft.ai backend serves to protect my privacy – as long as I trust that platform – and there was no evidence in any of the digging I did that the device is “leaking” data to third-party services.

These positive factors are tarnished by the high price point (listed at $USD 499, although the Kickstarter price was a lot lower at $USD 199) and the lack of Skills that integrate with other services – like Spotify. This is a device that is capable of so much more – it’s a diamond in the rough. But what would give it more polish?

And that takes me to part 2 of this post – ecosystems, infrastructures and the friction of privacy.

A critical lens on an inflection point in voice assistant technology: we are in the trough of disillusionment

It’s a difficult time for the voice assistant industry.

Amazon has recently announced huge layoffs in its Alexa voice assistant division (see write-ups by Ars Technica, Business Insider, and Vox), as it struggles to find a path to monetisation for the loss-leading hardware devices it’s been shipping. Google has followed this trend. This also comes off the back of an architectural change at Google, where it dropped support for third-party applications on its Assistant voice apps and devices – which has the effect of more tightly integrating its hardware with the Android ecosystem. Apple – the other big player in the voice assistant space, with Siri – has begun quietly laying off third-party contractors, according to reports. Baidu – the largest voice player outside Western Europe and America, with its Xiaodu assistant, which had a $5b valuation just two short years ago – has also shed 10-15% of its staff. The outlier here is the open source Home Assistant, and its commercial sister operation, Nabu Casa, who have announced the impending launch of a voice-assistant-with-screen competitor to the Google Nest Hub and the Amazon Echo Show, the Home Assistant Yellow.

Previous predictions that there would be 8 billion voice assistant devices in use by this year now appear idealistic indeed.

Voice assistants and theories of innovation

It’s fair to say that voice assistants have reached the “trough of disillusionment”. This term belongs to Gartner and their Hype Cycle – a frame they use to plot the adoption and integration of emerging technology. The “trough of disillusionment” refers to a period in a technology’s history where:

Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters.

Gartner Hype Cycle

The Gartner Hype Cycle is a theory of innovation; different technologies move through the Hype Cycle at varying rates, influenced by a range of factors. Other theories of innovation use divergent analytical lenses, or ascribe primacy to differing drivers and constraints on technology. The diffusion of innovation theory gave us the terms “early adopters” and “laggards”. Induced innovation [4], for example, places emphasis on economic factors such as market demand. Evolutionary theory focuses on the search for better tools and technologies, with the market selecting – and ensuring the survival of – the best. Path dependence models valorise the role of seemingly insignificant historical changes that compound to shape a technology’s dominance or decline. The multi-level perspective blends the micro and macro levels together, showing how they interact to influence technological development. Disruptive innovation theory [5] takes a contingent approach: different innovations require different strategies to challenge and unseat established incumbents. Apple unseated Nokia with touch screens. Linux dominated the data centre due to higher reliability and performance. Netflix swallowed Blockbuster by leveraging increasing internet speeds for content delivery. Disruption harnesses interacting social, economic and political developments.

I digress. What all of these views of innovation have in common, regardless of their focus, is the inter-dependency of factors that influence a technology’s trajectory – its adoption, success, and return on investment.

So what are the inter-dependencies in voice assistant technology that lead us to our current inflection point?

Platform and service inter-dependencies

Voice assistants are interfaces. They enable interaction with other systems. We speak to our phones to call someone. We speak to our televisions to select a movie. We speak to our car console to get directions. A voice assistant like Mycroft, or Alexa, or Google Home or Siri is a multi-purpose interface. It is reliant on connecting to other systems – systems of content, systems of knowledge, and systems of data. Wikipedia for knowledge articles, a weather API for temperature information, or a music content provider for music. These are all platform inter-dependencies.

Here, large providers have an advantage because they can vertically integrate their voice assistant. Google has YouTube Music, Amazon has Amazon Music, Apple has Apple Music. Mycroft has no music service, and this has no doubt played into Spotify’s decision to block Mycroft from interfacing with it. Content providers know that their content is valuable; this is why Paramount and HBO are launching their own platforms rather than selling content to Netflix.

Voice assistants need other platforms to deliver value to the end user. Apple knew this when it acquired Dark Sky and locked Android users out of the platform – although services like OpenWeatherMap are filling this gap. We’ve seen similar content wars play out in the maps space; Google Maps is currently dominant, but Microsoft and Bing are leveraging OpenStreetMap to counter this with Map-Builder – raising the ire of open source purists in the process.

Voice assistants need content, and services, to deliver end user value.

Discovery – or identifying what Skills a voice assistant can respond to

Voice assistants are interfaces. We use myriad interfaces in our everyday lives; a steering wheel, a microwave timer, an espresso coffee maker, an oven dial, a phone app, natural language, a petrol bowser, an EV charger. I doubt you RTFM’d for them all. And here’s why: interfaces are designed to be intuitive. We intuitively know how a steering wheel works (even if, for example, you’ve never driven a manual). If you’ve used one microwave, you can probably figure out another one, even if the display is in a foreign language. Compare this with the cockpit of the Concorde – a cacophony of knobs, buttons and dials.

If you’ve used Alexa or Siri, then you could probably set a timer, or ask about something from Wikipedia. But what else can the assistant do? This is the discovery problem. We don’t know what interfaces can do, because we often don’t use all the functions an interface provides. When was the last time you dimmed the dashboard lights in your car? Or when did you last set a defrost timer on your microwave?

The same goes for voice assistants; we don’t know what they can do, and this means we don’t maximise their utility. But what if a voice assistant gave you hints about what it could do? As Katherine Latham reports in this article for the BBC, it turns out that people find that incredibly annoying; we don’t want voice assistants to interrupt us, or suggest things to us. We find that intrusive.

How, then, do we become more acquainted with a voice assistant’s capabilities?

For more information on skill discovery in voice assistants, you might find this paper interesting.

White, R. W. (2018). Skill discovery in virtual assistants. Communications of the ACM, 61(11), 106-113.

Privacy and the surveillance economy – or – who benefits from the data a voice assistant collects?

There is a clear trade-off in voice assistants between functionality and privacy. Amazon Alexa can help you order groceries – from Amazon. Google Home can read you your calendar and email – from Gmail. Frictionless, seamless access to information, and agents which complete tasks for you, requires sharing data with the platforms that provide those functions. This is a two-fold dependency: first, the platforms that provide this functionality must be available – my vertical dependency argument from above. Second, you must provide personal information – preferences, a login, something trackable – to these platforms in order to receive their benefits.

I’m grossly oversimplifying his arguments here, but the well-known science and technology studies scholar Luciano Floridi has argued that “privacy is friction” [6]. The way that information is organised contributes to, or impinges upon, our privacy. Voice assistants that track our requests, record our utterances, and then use this information to suggestively sell more products to us reduce friction by reducing privacy. Mireille Hildebrandt, in her book Smart Technologies and the End(s) of Law, goes one step further: voice assistants undermine our personal agency through their anticipatory nature [7]. By predicting, or assuming, our desires and needs, our ability to be reflective in our choices, to be mindful of our activities, is eroded. Academic Shoshana Zuboff takes a broader view of these developments, theorising that we live in an age of surveillance capitalism, where the data produced through technologies which surveil us – our web browsing history, CCTV camera feeds, and yes, utterances issued to a voice assistant – become a form of capital which is traded in order to more narrowly market to us. Jathan Sadowski has argued, similarly, for the concept of datafication: when our interactions with the world become a form of capital – for trading, for investing, and for extraction.

Many of us have become accustomed to speaking with voice interfaces, and to glossing over how those utterances are stored, linked, or mined downstream in an ecosystem. Professor Joe Turow argues this case eloquently in his book The Voice Catchers [8]: by selling voice assistants at less than cost, their presence – and our interactions with them – have been normalised and backgrounded. We don’t think anything of sharing our data with the corporate platforms on which they rest. Giving personal data to a voice assistant is something we take for granted. By design. We trade the friction of privacy for the utility of frictionless access to services.

And this points to a key challenge for open source and private voice assistants like Mycroft and Home Assistant: in order to deliver services and content through those voice assistants, we have to give up some privacy. Mycroft handles this by abstraction; for example, the speech recognition that is done through Google’s cloud service is channelled through home.mycroft.ai – and done under a single identifier, so that individual Mycroft users’ privacy is protected.

How do voice assistants overcome the tradeoff between utility and privacy?

Hardware is hard

One of my favourite books is The Hardware Hacker [9] by Bunnie Huang, developer of the almost-unheard-of open hardware device, the Chumby. It is a chronicle of the myriad challenges involved in designing, manufacturing, certifying and bringing to market a new consumer device. For every mention of “Chumby” in the book, you could substitute “Mycroft Mark II”. Design tradeoffs, the capital required to fund manufacturing, quality control issues, a fickle consumer market – all are present in the tale of the Mark II.

Hardware is hard.

Hardware has to be designed, tested, integrated with software through drivers and embedded libraries, and certified compliant with regulations. And above all, it has to yield a profit for the manufacturer to keep making it. If we think about the escalating costs of the Mark II – the Chumby – and then look at how cheaply competitor devices – Alexa, Google Home – are sold, it becomes clear that hardware is a loss leader for ecosystem integration. I have no way to prove this, but I strongly suspect that the true cost of an Alexa or Google Home is four or five times what a consumer pays for it.

Voice assistants are interfaces.

And by putting a voice assistant in a consumer’s home, the manufacturer gains an interface into a broader ecosystem – more closely imbricating and embroiling the customer in that ecosystem. And if you can’t transform that loss leader into a recurring revenue stream – through additional purchases, paid voice search, or voice advertising revenue – then the hardware becomes a sunk cost. And you start laying off staff.

An open source voice assistant strategy is different – its selling points stand in opposition to a commercial assistant’s: privacy, interoperability, extensibility, and a lack of lock-in to an ecosystem. The Mark II is all of those things – private, interoperable and extensible. But it still hasn’t achieved product-market fit, particularly at its high, true-cost price point.

How do voice assistants reconcile the cost of hardware with the need to achieve product-market fit?

Overcoming the trough of disillusionment

So how might voice assistants overcome the trough of disillusionment?

Higher utility through additional content, data sources and APIs

Voice assistant utility is a function of how many Skills the device has, how useful those Skills are to the end user, and how frequently they are used. Think about your phone. What apps do you use the most? Text messaging? Facebook? TikTok? What app could you not delete from your phone? Skills require two things: a data or content source, and a developer community to build them. Open source enthusiasts may build a Skill out of curiosity, or simply to prove that they can, but commercial entities will only invest in Skill building if it generates revenue for a product or service. And then what does the voice assistant manufacturer do to share in that revenue? So we need to find ways to incentivise Skill development (and maintenance), as well as revenue sharing models that help support the infrastructure, the Skill development – and the service or data that the Skill interfaces with. Spotify understands this, for example – and will reserve access to its highly-valued content for business arrangements that help it generate additional revenue.

I also see governments having a role to play here – imagine, for example, accessing government services through your voice assistant, with no sitting in queues on a phone. The French Minitel service was originally a way to help citizens access services like telephone directories and postal information. But governments want to both streamline the development they do in-house and control access to API information; will there be a level of comfort in opening up access – and if so, who bears the cost of development?

Distinguishing voice assistants in the home from the voice assistant in your pocket (your mobile phone)

Most of us already have a voice assistant in our pocket – if we have an Android phone or an iPhone. So what niche does a standalone voice assistant serve? One differentiator I see here is privacy; a voice assistant on a mobile phone ties you to the ecosystem of that device – to Apple, or Google, or another manufacturer. Another differentiator is the context in which the voice assistant operates; few of us would want to use a voice assistant in a public or semi-public context, such as on a train or a bus or in a crowd. But a home office is semi-private, and many of us are now working from home. Is there an opportunity for a home office assistant – one that isn’t tied to a mobile phone ecosystem? Following this trajectory through, though: if we’re working from home, then we’re working. How will employers seek to leverage voice assistants, and does this sit in opposition to privacy? We are already seeing a backlash against the rise of workplace surveillance (itself a form of surveillance capitalism), so I think there will be barriers to employment or work-based technology being deployed on voice assistants.

Towards privacy, user agency and user choice

I’m someone who places a premium on privacy: I pay for encrypted communications technology and for encrypted email that isn’t harvested for advertising. I pay for services like relay that hide my email address from those who would seek to harvest it for surveillance capitalism.

But not everyone does; because privacy is friction, and we have normalised the absence of friction at the price of privacy. As we start to see models like ChatGPT and Whisper that hoover up all the public data on the internet – YouTube videos, public photographs on platforms like Flickr – I think we will start to see more public awareness of how our data is being used – and not always in our own best interests. In voice assistants, this means more safeguarding of user data, and more protection against harnessing that data for profit.

Voice assistants also have a role to play in user agency and user choice. This means giving people choice about where intents – the commands used to activate Skills – lead. For example, if a commercial voice assistant “sells” the intent “buy washing powder” to the highest “washing powder” bidder, then this restricts user agency and user choice. So we need to think about ways that put control back in the user’s hands – or, in this case, voice. But this of course constrains the revenue generation capabilities of voice assistants.

For voice assistants to escape the trough of disillusionment, they will need to prioritise privacy, agency, utility and choice – and still find a path to revenue.

Footnotes

  1. Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. Profile Books.
  2. Sadowski, J. (2019). When data is capital: Datafication, accumulation, and extraction. Big Data & Society, 6(1), 2053951718820549.
  3. Strengers, Y., & Kennedy, J. (2021). The Smart Wife: Why Siri, Alexa, and Other Smart Home Devices Need a Feminist Reboot. MIT Press.
  4. Ruttan, V. W. (1997). Induced innovation, evolutionary theory and path dependence: sources of technical change. The Economic Journal, 107(444), 1520-1529.
  5. Si, S., & Chen, H. (2020). A literature review of disruptive innovation: What it is, how it works and where it goes. Journal of Engineering and Technology Management, 56, 101568.
  6. Floridi, L. (2005). The ontological interpretation of informational privacy. Ethics and Information Technology, 7, 185-200.
  7. Hildebrandt, M. (2015). Smart Technologies and the End(s) of Law: Novel Entanglements of Law and Technology. Edward Elgar Publishing.
  8. Turow, J. (2021). The Voice Catchers: How Marketers Listen In to Exploit Your Feelings, Your Privacy, and Your Wallet. Yale University Press.
  9. Huang, A. B. (2019). The Hardware Hacker: Adventures in Making and Breaking Hardware. No Starch Press.

State of my toolchain 2022

Welcome to my now-nearly-yearly State of my Toolchain report (you can see previous editions for 2021, 2019, 2018 and 2016). I began these posts as a way to document the tools, applications and hardware that were useful to me in the work that I did, but also to help observe how they shifted over time – as technology evolved, my tasks changed, and the underpinning assumptions of usage shifted. In this year’s post, I’ll cover my toolchain at a glance, report on what’s changed and what gaps I still have in my workflow, and – importantly – reflect on the shifts that have occurred over five years.

At a glance

Hardware, wearables and accessories

Software

  • Atom with a range of plugins for writing code, thesis notes (no change since last report)
  • Pandoc for document generation from MarkDown (no change since last report)
  • Zotero for referencing (using Better BibTeX extension) (no change since last report)
  • OneNote for Linux by @patrikx3 (no change since last report)
  • Nightly edition of Firefox (no change since last report)
  • Zoom (no change since last report)
  • Microsoft Teams for Linux (no change since last report)
  • Gogh for Linux terminal preferences (no change since last report)
  • Super Productivity (instead of Taskwarrior) (changed since last report)
  • Cuckoo Timer for Pomodoro sessions (changed since last report)
  • RescueTime for time tracking (no change since last report)
  • BeeMindr for commitment based goals (no change since last report)
  • Mycroft as my Linux-based voice assistant (no change since last report)
  • Okular as my preferred PDF reader (instead of Evince on Linux and Adobe Acrobat on Windows) (changed since last report)
  • NocoDB for visual database work (new this report)
  • ObservableHQ for data visualisation (new this report)

Techniques

  • Pomodoro (no change since last report)
  • Passion Planner for planning (no change since last report)
  • Time blocking (used on and off, but a lot more recently)

What’s changed since the last report?

There’s very little that’s changed since my last State of My Toolchain report in 2021: I’m still doing a PhD at the Australian National University’s School of Cybernetics, and the majority of my work is researching, writing, interviewing, and working with data.

Tools for PhD work

My key tool is MaxQDA for qualitative data analysis – Windows only, unfortunately, and prone to being buggy with OneDrive – while my writing workflow runs through Atom. One particularly useful tool I’ve adopted in the last year is NocoDB – an open source alternative to visual database interfaces like Notion and AirTable – and I’ve found it very useful, even if the front end is a little clunky. Working across Windows and Linux, I’ve settled on Okular as my preferred PDF reader – I read on average 300-400 pages of PDF content a week, and Adobe Acrobat was buggy as hell. Okular has fine-grained annotation tools, and the interface is the same across Windows and Linux. Another tool I’ve started to use a lot this year is ObservableHQ – it’s like a Jupyter notebook, but for d3.js data visualisations. Unfortunately, they’ve recently brought in a change to their pricing structure, and it’s going to cost me $USD 15 a month for private notebooks – and I don’t think the price point is worth it.

Hardware and wearables

The first key change this year is a phone upgrade – my Pixel 3 screen died, and the cost to replace the screen was exorbitant – a classic example of planned obsolescence. I’ve been happy with Google’s phones – as long as I disable all the spyware voice-enabled features – and settled on the Pixel 4a 5G. It’s been a great choice – clear, crisp photos, a snappy processor, and excellent battery life.

After nearly four years, my Mobvoi Ticwatch Pro started suffering from the “ghost touch” problem, where the touch interface picks up non-existent taps. A factory reset didn’t solve the problem, so I got the next model up – the Ticwatch Pro 2020 – at 50% off. This wearable has been one of my favourite pieces of hardware – fast, responsive, durable – and I can’t imagine not having a smartwatch now. I’ve settled on the Flower watch face after using Pujie Black for a long time – both are heavily customisable. The love Google is giving to Wear OS shows – I have much smoother integration between phone apps and Wear OS apps than even 1-2 years ago.

After having two pairs of Plantronics BackBeat Pro headphones – one from around 2017 and the other circa 2021, both still going, but the first with very poor battery life and battered earpads – I invested in my first pair of serious headphones: the Sennheiser Momentum Pro 3. The sound quality is incredible – I got them for $AUD 300, which I thought was a lot to pay for headphones, but they’ve been worth every penny – particularly when listening to speech recognition data.

With so much PhD research and typing, I found my Logitech MK240 just wasn’t what I needed – it’s a great little unit if you don’t have anything else, but it was time for a mechanical keyboard, because I love expensive hobbies. After some research, and a mis-step with the far-too-small HuoJi Z-88 (the keypresses for Linux command line tasks were horrendous), I settled on the Keychron K8 and haven’t looked back. Solid, sturdy, blue Gateron switches – it’s a dream to type on, and works well across Windows and Linux. However, on Linux it defaults to a Mac keyboard layout, and I had to do some tweaking with a keymapper – I used keyd. My only disappointment with Keychron is the hackiness needed to get it working properly on Linux.

Productivity

My Passion Planner is still going strong, but I haven’t been as diligent about using it as a second brain as I have been in the past, and the price changes this year meant that shipping one to Australia cost me nearly $AUD 120 in total – unaffordable in the longer term – so I’m actively looking at alternatives such as Bullet Journalling. The Passion Planner is great – it’s just expensive.

I’ve also dropped Taskwarrior in favour of Super Productivity this year. Taskwarrior isn’t cross-platform – I can’t use it on Android or on Windows, and thanks to MaxQDA, I’m spending a lot more of my time in Windows. The Gothenburg Bit Factory are actively developing Taskwarrior – full transparency, I’m a GitHub sponsor of theirs – but the cloud-based and cross-platform features seem to be taking a while to come to fruition.

I’m also using time-blocking a lot more, and am regularly using Cuckoo as a pomodoro timer with a PhD cohort colleague, T. We have an idea for a web app that optimises the timing of Pomodoros based on a feedback loop – but more on that next year.

Current gaps in my toolchain

Visual Git editor

In my last State of My Toolchain report, I lamented the lack of a good visual Git editor. That’s been solved on Windows with GitHub’s desktop application, but as of writing, the Linux variant appears to be permanently mothballed. I’m sure this has nothing to do with Microsoft buying GitHub. So I am still on the lookout for a good Linux desktop Git GUI. On the other hand, doing everything on the CLI is always good practice.

Second Brain

In my last report I also mentioned having taken Huginn for a spin, and being let down by its immaturity. It doesn’t seem to have come very far since. So I’m still on the lookout for “second brain” software – this is more than the knowledge management space that tools like Roam and Obsidian occupy; it’s much more an organise-your-life tool. The Microsoft suite – Office, Teams, and their stablemates – is trying to fill this niche, but I want something that’s not dependent on an enterprise login. I’ve decided to reframe this gap as a “second brain” gap after reading Tiago Forte’s book on the topic.

The Fediverse

Triggered by Elon Musk’s purchase, and subsequent transformation, of Twitter into a flaming dumpster fire, I’ve become re-acquainted with the Fediverse – you can find me on Mastodon here, on Pixelfed.au here, and on Bookwyrm here. However, the tooling infrastructure around the Fediverse isn’t as mature – understandably – as that of commercial platforms. I’m using Tusky as my Android app, along with the advanced web interface. But there is a lack of hosting options for the Fediverse – I can’t find a pre-configured Digital Ocean Droplet for Mastodon, for example – and I think the next year will see some development in this space. If you’re not across Mastodon, I wrote a piece that uses cybernetic principles to compare and contrast it with Twitter.

5 years of toolchain trends

After five years of the State of My Toolchain report, I want to share some reflections on the longer-term trends that have been influential in my choice of tools.

Cross-platform availability and dropping support for Linux

I work across three main operating systems – Linux, Windows (because I have to for certain applications) and Android. The tools I use need to work seamlessly across all three. There’s been a distinct trend over the last five years for applications to start providing Linux support, but then move to a “community” model or drop support altogether. Two cases in point are Pomodone – which I dropped because of its lack of Linux support – and RescueTime – which still works on Linux for me, albeit with some quirks (such as not restarting properly when the machine wakes from suspend). This is counter-intuitive given the increasing usage of Linux on the desktop. The aspiration of many Linux aficionados that the current year will be “The Year of Linux on the Desktop” is not close to fruition – but the statistics show a continued, steady – if small – rise in the number of Linux desktop users. This is understandable, though – startups and small SaaS providers cannot justify supporting such a small user base. That said, they shouldn’t claim to support the operating system and then drop support – as both Pomodone and RescueTime have done.

Takeaway: products I use need to work cross-platform, anywhere, anytime – and especially on Linux.

Please don’t make me change my infrastructure to work with your product

A key reason for choosing the Ticwatch Pro 2020 over other Mobvoi offerings was that the watch’s charger was the same across hardware models. I’d bought a couple of extra chargers to have handy, and didn’t want to have to buy more “spares”. This mirrors a broader issue with hardware – it has a secondary ecosystem. I don’t just need a mobile phone; I need a charger, a case, and glass screen protectors – a bunch of accessories. These are all different – they exhibit variety – a deliberate reduction in re-usability and a buffer against commodification. So in choosing hardware, one of my selection criteria is now re-usability or upgradeability – how easily I can re-use the hardware’s supporting infrastructure. The recent decision by Europe to standardise on USB-C is the right one.

Takeaway: don’t make me buy a second infrastructure to use your product.

I’m happy to pay for your product, but it has to represent value for money, or it’s gone

Several of my tools are open source – Super Productivity, NocoDB, Atom, Pandoc – and where I can, I sponsor them on GitHub or provide a monetary contribution. On the whole, these pieces of software are often worth a lot more to me than the paid proprietary software I use – for example, MaxQDA is over $AUD 300 a year, predominantly because it only has one main competitor, NVivo. I have no issue paying for software, but it has to represent value for money. If I can get the same value – or nearly equivalent – from an open source product, then I’m choosing open source. Taguette isn’t there yet as a replacement for MaxQDA, but Super Productivity has equivalent functionality to Pomodone. Open source products keep proprietary products competitive – and this is a great reason to invest in open source where you are able.

That’s it! Are there any products or platforms you’ve found particularly helpful? Let me know in the comments.

State of my toolchain 2021

I’ve been doing a summary of the state of my toolchain now for around five years (2019, 2018, 2016). Tools, platforms and techniques evolve over time; the type of work that I do has shifted; and the environment in which that work is done has changed due to the global pandemic. Documenting my toolchain has been a useful exercise on a number of fronts; it’s made explicit what I actually use day-to-day, and, equally – what I don’t. In an era of subscription-based software, this has allowed me to make informed decisions about what to drop – such as Pomodone. It’s also helped me to identify niggles or gaps with my existing toolchain, and to deliberately search for better alternatives.

At a glance

Hardware, wearables and accessories

Software

Techniques

  • Pomodoro (no change since last report)
  • Passion Planner for planning (no change since last report)

What’s changed since the last report?

Writing workflow

Since the last report in 2019, I’ve graduated from the Masters in Applied Cybernetics at the School of Cybernetics at the Australian National University, and been accepted into the first cohort of their PhD program. This shift has meant an increased focus on in-depth, academic-style writing. To help with this, I’ve moved to a Pandoc, Atom, Zotero and LaTeX-based workflow, which I’ve documented separately. This workflow has been working solidly for me for about a year. Although it took about a weekend’s worth of setup time, it’s definitely saving me a lot of time.

Atom in particular is my predominant IDE, and also my key writing tool. I use it with a swathe of plugins for LaTeX, document structure, and Zotero-based academic citations. It took me a while to settle on a UI and syntax theme for Atom, but in the end I went with Atom Solarized. My strong preference is to write in MarkDown, and then export to a target format such as PDF or LaTeX. Pandoc handles this beautifully, but I do have to keep a file of command line snippets handy for advanced functionality.
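For what it’s worth, the core of those snippets boils down to something like the sketch below – the file names are hypothetical, and it assumes pandoc 2.11 or later (for --citeproc) plus a BibTeX file exported from Zotero via Better BibTeX.

```python
# Sketch of a MarkDown -> PDF build step, wrapping pandoc via subprocess.
# File names are hypothetical; assumes pandoc 2.11+ (for --citeproc) and a
# BibTeX file exported from Zotero via the Better BibTeX extension.
import subprocess

subprocess.run(
    [
        "pandoc", "chapter.md",
        "--citeproc",                     # resolve citations
        "--bibliography=library.bib",     # Zotero Better BibTeX export
        "--pdf-engine=xelatex",
        "-o", "chapter.pdf",
    ],
    check=True,
)
```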

Primary machine

I had an ASUS Zenbook UX533FD – small, portable, with great battery life, even with the MX150 GPU running. Unfortunately, the keyboard started to malfunction just over a year after purchase (I know, right). I gave up trying to get it repaired because I had to keep chasing my local repair shop for updates on getting a replacement. I lodged a repair request in October, and it’s now May, so I’m not holding out hope… That necessitated getting a new machine – and it was a case of getting whatever was available during the coronavirus pandemic.

I settled on an ASUS ROG Zephyrus G15 GA502IV. I was a little cautious, having never had an AMD Ryzen-based machine before, but I haven’t looked back. It has a Ryzen 9 4900HS (8 cores, 16 threads) and an NVIDIA GeForce RTX 2060 with 6GB of RAM. It’s a powerful workhorse and is reasonably portable, if a little noisy. It gets about 3 hours’ battery life in class. Getting NVIDIA dependencies installed under Ubuntu 20.04 LTS was a little tricky – especially cuDNN – but that seems to be normal for anything NVIDIA under Linux. Because the hardware was so new, it lacked support in the 20.04 kernel, so I had to pull in experimental Wi-Fi drivers (it uses a Realtek chipset).

To be honest, I was somewhat smug that my hardware was ahead of the kernel. One little niggle I still have is that the machine occasionally green-screens. This has been reported with other ROG models, and I suspect it’s an HDMI-under-Linux driver issue, but I haven’t gone digging too far into driver diagnostics. Yet.

One idiosyncrasy of the Zephyrus G15 is that it doesn’t have a built-in web camera; for me, that was a feature. I get to choose when I do and don’t connect a web camera. And yes – I’m firmly in the web-cameras-shouldn’t-be-on-by-default camp.

Machine learning work, NVIDIA dependencies and utilities

Over the past 18 months, I’ve been doing a lot more work with machine learning, specifically in building the DeepSpeech PlayBook. Creating the PlayBook has meant training a lot of speech recognition models in order to document hyperparameters and tacit knowledge around DeepSpeech.

In particular, the DeepSpeech PlayBook uses a Docker image to abstract away Python, TensorFlow and other dependencies. However, this still requires all the NVIDIA dependencies, such as drivers and cuDNN, to be installed beforehand. NVIDIA has made this somewhat easier with the Linux CUDA installation guide, which advises on which versions to install alongside other dependencies, but it’s still tough to get everything installed correctly. In particular, the nvtop utility, which is super handy for monitoring GPU operations (such as identifying blocking I/O or other bottlenecks), had to be compiled from source. As an aside, the developer experience of getting NVIDIA dependencies installed under Linux is a major hurdle for developers. It’s something I want NVIDIA to put some effort into going forward.
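Before kicking off a long training run, I find it worth sanity-checking that the driver stack is actually visible – a minimal sketch below, which just shells out to nvidia-smi (the utility that ships with the NVIDIA driver).

```python
# Minimal sanity check that the NVIDIA driver stack is working before a
# training run - shells out to nvidia-smi, which ships with the driver.
import subprocess

try:
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    print(result.stdout)
except (FileNotFoundError, subprocess.CalledProcessError) as err:
    print(f"NVIDIA driver stack not working yet: {err}")
```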

Colour customisation of the terminal with Gogh

I use Ubuntu Linux for 99% of my work now – and rarely boot into Windows. A lot of that work is based in the Linux terminal: spinning up Docker containers for machine learning training, running Python scripts, or even pandoc builds. At any given time I might have 5-6 open terminals, so I needed a way to easily distinguish between them. Enter Gogh – an easy-to-install set of terminal profiles.

One bugbear that I still have with the Ubuntu 20.04 terminal is that the fonts that can be used with terminal profiles are restricted to only mono-spaced fonts. I haven’t been able to find where to alter this setting – or how the terminal is identifying which fonts are mono-spaced for inclusion. If you know how to alter this, let me know!

Linux variants of Microsoft software intended for Windows

ANU has adopted Microsoft primarily for communications. This means not only Outlook for mail – for which there are no good Linux alternatives (so I use the web version) – but also Teams and OneNote. I managed to find an excellent alternative in OneNote for Linux by @patrikx3, which is much more usable than the web version of OneNote. Teams on Linux is usable for messaging, but for videoconferencing I’ve found that I can’t use USB or Bluetooth headphones or microphones – which essentially renders it useless. Zoom is much better on Linux.

Better microphone for videoconferencing and conference presentations

As we’ve travelled through the pandemic, we’re all using a lot more videoconferencing instead of face-to-face meetings, and the majority of conferences have gone online. I’ve recently presented at both PyCon AU 2020 and linux.conf.au 2021 on voice and speech recognition; both conferences used the VenueLess platform. I decided to upgrade my microphone for better audio quality – after all, research has shown that speakers with better audio are perceived as more trustworthy. I’ve been very happy with the Stadium USB microphone.

Taskwarrior over Pomodone for tasks

I tried Pomodone for about 6 months – it was great for integrating tasks from multiple sources such as Trello, GitHub and GitLab. However, I found it very expensive (around $AUD 80 per year), and the Linux version suddenly stopped working. The scripting options also only support Windows and macOS, not Linux. So I didn’t renew my subscription.

Instead, I’ve moved to Taskwarrior via Paul Fenwick‘s recommendation. This has some downsides – it’s a command line utility rather than a graphical interface, and it only works on a single machine. But it’s free, and it does what I need – prioritises the tasks that I need to complete.

What hasn’t changed

Wearables and hearables

My Mobvoi TicWatch Pro is still going strong, and Google appears to be giving Wear OS some love. It’s the longest I’ve had a smart watch, and given how rugged and hardy the TicWatch has been, it will definitely be my first choice when this one reaches end of life. My Plantronics BB Pro 2 are still going strong, and I got another pair on sale as my first pair are now four years old and the battery is starting to degrade.

Quantified self

I’ve started using Sleep as Android for sleep tracking, which uses data from the TicWatch. This has been super handy for assessing the quality of sleep, and making changes such as adjusting going-to-bed times. Sleep as Android exports data to Google Drive. BeeMinder ingests that data into a goal, and keeps me accountable for getting enough sleep.

RescueTime, BeeMinder and Passion Planner are still going strong, and I don’t think I’ll be moving away from them anytime soon.

Assistant services

I still refuse to use Amazon Alexa or Google Home – and they wouldn’t work with the 5GHz-band WiFi where I am living on campus anyway. Mycroft.AI is still my go-to voice assistant, but I rarely use it now because the Spotify app support for Mycroft doesn’t work anymore, after Spotify blocked Mycroft from using the Spotify API.

One desktop utility in the “assistant” space that I’ve found super helpful is GNOME extensions. I use extensions for weather, peripheral selection and random desktop background selection. Being able to easily see how hot it is outside during the Australian summer has been super handy.

Current gaps in my toolchain

I don’t really have any major gaps in my toolchain at the moment, but there are some things that could be better.

  • Visual Git Editor – I’ve been using command line Git for years now, but having a visual indicator of branches and merges is useful. I tried GitKraken, but I don’t use Git enough to justify the monthly-in-$USD price tag. The Git plugin for Atom is good enough for now.
  • Managing everything for me – I looked at Huginn a while back, and it sounds really promising as a “second brain” – for monitoring news sites, Twitter etc. – but I haven’t had time to have a good play with it yet.