Building a database to handle PhD interview tracking using MySQL and NocoDB

So, as folx probably know, I’m currently doing a PhD at the Australian National University’s School of Cybernetics, investigating voice data practices and what we might be able to do to change them so that we have less biased voice data, and less biased voice technology products. If you’d like to see some of the things I’ve been working on, you can check out my portfolio. Two of my research methods are related to interviews; the first tranche being shorter exploratory interviews and the second being in-depth interviews with machine learning practitioners.

Because there are many stages to interviews – identifying participants, approaching them for interviews, obtaining consent, scheduling, transcription and coding – I needed a way to manage the pipeline. My PhD cohort colleagues use a combination of AirTable and Notion, but I wanted an open source alternative (surprise!).

Identifying alternatives and choosing one to use

I did a scan of what alternatives were available simply by searching for “open source alternative to AirTable”. Some of the options I considered but discarded were:

  • Baserow: While this is open source, built in widely adopted frameworks such as Django and Vue.js, and available as Docker and Heroku deploys, the commercial framing behind the product is very much open core. That is, a lot of features are only available in the paid / premium version. I’ve worked with open core offerings before, and I’ve found that the most useful features are usually those behind the paywall.
  • AppFlowy: While this looked really impressive, and the community behind it looked strong, the use of Flutter and Rust put me off – I’m not as familiar with either of them compared to Vue.js or Django. I also found the documentation really confusing – for example, to install the Linux version it said to “use the official package”, but it didn’t give the name of the official package. Not helpful. On this basis I ruled out AppFlowy.
  • DBeaver: This tool is aimed more at people who have to work with multiple databases; it provides a GUI over the top of the database, but is not designed to be a competitor to Notion or AirTable. I wanted something more graphically focused, with multiple layout styles (grid, card and so on).

This left me with NocoDB. I kicked the tyres a bit by looking at the code on GitHub, and read through the documentation to get a feel for whether it was well constructed; it was. Importantly, I was able to install it on my localhost; the ethics protocol for my research method prevented it from being hosted on a cloud platform.

Installation

Installation was a breeze. I set up a database in MySQL (also running locally), then git clone'd the repo, and used npm to install the software:

git clone https://github.com/nocodb/nocodb-seed
cd nocodb-seed
npm install
npm start

nocodb uses Node.js’s built-in HTTP server, and starts the application by default on port 8080, so to start using it you simply go to: http://localhost:8080/. One slightly frustrating thing is that it does require an email address and password to log in. nocodb is a commercial company – they’ve recently raised funding and are hiring – and I suspect this is part of their telemetry, even for self-hosted installations. I run Pi-hole as my DNS server, and I don’t see any telemetry from nocodb in my block list, however.

Next, you need to provide nocodb with the details of the MySQL database you created earlier. nocodb creates some additional tables of its own, along with some base views, but at this point you are free to start creating your own.
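For reference, the MySQL setup that nocodb connects to is just an empty database and a dedicated user. Here’s a minimal sketch – the database name, user name and password are illustrative, not anything nocodb mandates:

-- Create an empty database for the interview tracker (name is illustrative)
CREATE DATABASE interview_tracking CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- A dedicated user, restricted to connections from the local machine
CREATE USER 'nocodb_user'@'localhost' IDENTIFIED BY 'choose-a-strong-password';
GRANT ALL PRIVILEGES ON interview_tracking.* TO 'nocodb_user'@'localhost';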

Deciding what fields I needed to capture to be able to visualise my interview pipeline

Identifying what fields I needed to track was a case of trial and error. As I added new fields, or modified the datatypes of existing ones, nocodb was able to be easily re-synced with the underlying database schema. This makes nocodb ideal for prototyping database structures.

nocodb showing tables out of sync
nocodb now in sync with the underlying tables

In the end, I settled on the following tables and fields:

Interviewees table

  • INTERVIEWEE_ID – a unique, auto-incrementing ID for each participant
  • REAL_NAME – the real name of my participant (and one of the reasons this is running locally and not in the cloud)
  • CODE_NAME – a code name I ascribed to each participant, as part of my Ethics Protocol
  • ROLE_ID – foreign key identifier for the ROLES table.
  • EMAIL_ADDRESS – what it says on the tin.
  • LINKEDIN_URL – I used LinkedIn to contact several participants, and this was a way of keeping track of that information.
  • HOMEPAGE_URL – the participant’s home page, if they had one. This was useful for identifying the participant’s background – part of the purposive sampling technique.
  • COUNTRY_ID – foreign key identifier for the COUNTRIES table – again used for purposive sampling.
  • HOW_IDENTIFIED – to identify whether people had been snowball sampled
  • HAS_BEEN_CONTACTED – Boolean to flag whether the participant had been contacted
  • HAS_AGREED_TO_INTERVIEW – Boolean to flag whether the participant had agreed to be interviewed
  • NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS – Boolean to flag whether the participant hadn’t responded to a request to interview
  • HAS_DECLINED – Boolean to flag an explicit decline
  • INTERVIEW_SCHEDULED – Boolean to indicate a date had been scheduled with the participant
  • IS_EXPLORATORY – Boolean to indicate the interview was exploratory rather than in-depth. Having an explicit Boolean for each interview type allows me to add others if needed, while a full-blown table for interview types felt like overkill.
  • IS_INDEPTH – Boolean for the other type of interview I was conducting.
  • INTERVIEWEE_DESCRIPTION – descriptive information about the participant’s background. Used to help me formulate questions relevant to the participant.
  • CONSENT_RECEIVED – Boolean to flag whether the participant had provided informed consent.
  • CONSENT_URL – A space to record the file location of the consent form.
  • CONSENT_ALLOWS_PARTICIPATION – A flag relevant to a specific type of participation in my ethics protocol and my consent form
  • CONSENT_ALLOWS_IDENTIFICATION_VIA_PARTICIPANT_CODE – A flag relevant to how participants were able to elect to be identified, as part of my ethics protocol.
  • INTERVIEW_CONDUCTED – Boolean to flag that the interview had been conducted.
  • TRANSCRIPT_DONE – Boolean to flag that the transcript had been created (I used an external company for this).
  • TRANSCRIPT_URL – A space to record the file location of the transcript.
  • TRANSCRIPT_APPROVED – Boolean to indicate the participant had reviewed and approved the transcript.
  • TRANSCRIPT_APPROVED_URL – A space to record the file location of the approved transcript
  • CODING_FIRST_DONE – Boolean to indicate first pass coding done
  • CODING_FIRST_LINK – A space to record the file location of the first coding
  • CODING_SECOND_DONE – Boolean to indicate second pass coding done
  • CODING_SECOND_URL – A space to record the file location of the second coding
  • NOTES – I used this field to make notes about the participant or to flag things to follow up.
  • LAST_CONTACT – I used this date field so I could easily order interviewees to follow them up.
  • LAST_MODIFIED – This field auto-updated on update.

Countries table

  • COUNTRY_ID – Unique identifier, used as primary key and foreign key reference in the INTERVIEWEES table.
  • COUNTRY_NAME – human readable name of the country, useful for demonstrating purposive sampling.
  • LAST_MODIFIED – This field auto-updated on update.

Roles table

  • ROLE_ID – Unique identifier, used as primary key and foreign key reference in the INTERVIEWEES table.
  • ROLE_TITLE – human readable title of the role, used for purposive sampling.
  • ROLE_DESCRIPTION – descriptive information about the activities performed by the role.
  • LAST_MODIFIED – This field auto-updated on update.

If I were to update the database structure in the future, I would be inclined to have a “URLs” table, where the file links for things like consent forms and transcripts are stored. Having them all in one table would make it easier to do things like URL validation. This was overkill for what I needed here.
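For completeness, such a table might look something like the sketch below – purely hypothetical, since I didn’t build it, and the table and column names are mine for illustration:

-- Hypothetical URLS table (not part of the current structure)
CREATE TABLE URLS (
  URL_ID INT AUTO_INCREMENT PRIMARY KEY,
  INTERVIEWEE_ID INT NOT NULL,
  URL_TYPE VARCHAR(50),       -- e.g. 'consent', 'transcript', 'coding'
  URL VARCHAR(2048) NOT NULL,
  LAST_MODIFIED TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  FOREIGN KEY (INTERVIEWEE_ID) REFERENCES INTERVIEWEES(INTERVIEWEE_ID)
);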

Thinking also about the interview pipeline, an interviewee’s status is currently a combination of the various Boolean flags. I would have found it useful to have a summary STATUS_ID with a human-readable descriptor of where each person is up to – something like the query sketched below.
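In the meantime, a derived status can be computed directly in SQL with a CASE expression over those flags. A rough sketch – the ordering of the WHEN clauses is just my guess at a sensible precedence:

-- Derive a human-readable pipeline status from the Boolean flags
SELECT
  CODE_NAME,
  CASE
    WHEN TRANSCRIPT_APPROVED THEN 'Transcript approved'
    WHEN INTERVIEW_CONDUCTED THEN 'Interview conducted'
    WHEN INTERVIEW_SCHEDULED THEN 'Interview scheduled'
    WHEN HAS_DECLINED THEN 'Declined'
    WHEN NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS THEN 'No response'
    WHEN HAS_AGREED_TO_INTERVIEW THEN 'Agreed, not yet scheduled'
    WHEN HAS_BEEN_CONTACTED THEN 'Contacted'
    ELSE 'Not yet contacted'
  END AS PIPELINE_STATUS
FROM INTERVIEWEES;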

Get the SQL to replicate the database table structure

I’ve exported the table structure to SQL in case you want to use it for your own interview tracking purposes. It’s a Gist because I can’t be bothered altering my wp_options.php to allow for .sql uploads, and that’s probably a terrible idea, anyway 😉
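If you just want the gist of the Gist, here is an abridged sketch of the structure. The field names are as listed above, but the exact column types are my assumptions – the Gist export is authoritative:

-- Abridged sketch; see the Gist for the full export
CREATE TABLE COUNTRIES (
  COUNTRY_ID INT AUTO_INCREMENT PRIMARY KEY,
  COUNTRY_NAME VARCHAR(255) NOT NULL,
  LAST_MODIFIED TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

CREATE TABLE ROLES (
  ROLE_ID INT AUTO_INCREMENT PRIMARY KEY,
  ROLE_TITLE VARCHAR(255) NOT NULL,
  ROLE_DESCRIPTION TEXT,
  LAST_MODIFIED TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

CREATE TABLE INTERVIEWEES (
  INTERVIEWEE_ID INT AUTO_INCREMENT PRIMARY KEY,
  REAL_NAME VARCHAR(255),
  CODE_NAME VARCHAR(255),
  ROLE_ID INT,
  COUNTRY_ID INT,
  EMAIL_ADDRESS VARCHAR(255),
  HAS_BEEN_CONTACTED BOOLEAN DEFAULT FALSE,
  HAS_AGREED_TO_INTERVIEW BOOLEAN DEFAULT FALSE,
  INTERVIEW_SCHEDULED BOOLEAN DEFAULT FALSE,
  INTERVIEW_CONDUCTED BOOLEAN DEFAULT FALSE,
  -- ...the remaining Boolean flags, consent and URL fields as listed above...
  NOTES TEXT,
  LAST_CONTACT DATE,
  LAST_MODIFIED TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  FOREIGN KEY (ROLE_ID) REFERENCES ROLES(ROLE_ID),
  FOREIGN KEY (COUNTRY_ID) REFERENCES COUNTRIES(COUNTRY_ID)
);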

Creating views based on field values to track the interview pipeline

Now that I had a useful table structure, I settled on some Views that helped me create and manage the interview pipeline. Views in nocodb are lenses on the underlying database that restrict or constrain the data shown, so that it’s more relevant to the task at hand. This is done by showing or hiding fields, and then filtering the selected fields (the SQL sketch after the list below shows roughly what one of these filters amounts to).

  • Data entry view – this was a form view where I could add new Interviewees.
  • Views for parts of the pipeline – I set up several grid views that restricted Interviewees using filters to the part of the interview pipeline they were in. These included those I had and hadn’t contacted, those who had a scheduled interview, those who hadn’t responded, as well as several views for where the interviewee was in the coding and consent pipeline.
  • At a glance view – this was a gallery view, where I could get an overview of all the potential and confirmed participants.
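For what it’s worth, the filtering behind one of those grid views is roughly equivalent to a plain SQL view over the same table. Here is a sketch of what an “awaiting response” view might look like – the view name and choice of filters are mine, not something nocodb generates:

-- Roughly what an "awaiting response" grid view filters down to
CREATE VIEW AWAITING_RESPONSE AS
SELECT CODE_NAME, EMAIL_ADDRESS, LAST_CONTACT, NOTES
FROM INTERVIEWEES
WHERE HAS_BEEN_CONTACTED = TRUE
  AND HAS_AGREED_TO_INTERVIEW = FALSE
  AND HAS_DECLINED = FALSE
  AND NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS = FALSE;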

A limitation I encountered working with these views is that there’s no way to provide summary information – like you might with a SUM or COUNT query in SQL. Ideally I would like to be able to build a dashboard that provides statistics on how many participants are at each part of the pipeline, but I wasn’t able to do this.
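Until something like that exists, the kind of roll-up I’d want in a dashboard can at least be run directly against the underlying MySQL database. A quick sketch:

-- Quick pipeline roll-up; in MySQL, SUM over a Boolean expression counts the rows where it is true
SELECT
  SUM(HAS_BEEN_CONTACTED = FALSE) AS not_yet_contacted,
  SUM(HAS_BEEN_CONTACTED)         AS contacted,
  SUM(HAS_AGREED_TO_INTERVIEW)    AS agreed,
  SUM(INTERVIEW_SCHEDULED)        AS scheduled,
  SUM(INTERVIEW_CONDUCTED)        AS conducted,
  SUM(TRANSCRIPT_APPROVED)        AS transcripts_approved
FROM INTERVIEWEES;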

Updating nocodb

nocodb is under active development, and has regular updates. Updating the software proved to be incredibly easy through npm, with two commands:

Uninstall NocoDB package

npm uninstall nocodb

Install NocoDB package

npm install --save nocodb

Parting thoughts

Overall, I have been really impressed by nocodb – it’s a strong fit for my requirements in this use case – easily prototypable, runs locally, and is easily updateable. The user interface is still not perfect, and is downright clunky in places, but as an open source alternative to AirTable and Notion, it hits the spot.

linux.conf.au 2020 – Who’s watching?

Yes, it’s July, and linux.conf.au 2020 #lca2020 happened over 6 months ago – but like everyone in The Time of ‘Rona, the last few months have been a little strange, so better late than never!

Location

This year, linux.conf.au headed to the Gold Coast – a location that it’s never been to before. That in itself is a major accomplishment by Joel Addison, Ben Stevens and the linux.conf.au 2020 #lca2020 team. linux.conf.au, and other open source events in the Linux Australia stable such as PyConAU and DrupalSouth, are entirely volunteer-run. Those volunteer communities tend to spring up geographically – and due to population and surrounding ecosystems this tends to happen more frequently in large capital cities. For example, if we look at the Startup Genome ecosystem reports, only three cities in Australia are significant enough to make the global map – Melbourne, Sydney and Brisbane – frequent locations for linux.conf.au.

As a location, the Gold Coast was outstanding (n=9). There were plentiful accommodation options to suit a range of budgets, nearby food options, plenty of outstanding venues for conference events like the Penguin Dinner and PDNS, and a beach a couple of blocks away. The Gold Coast Convention and Exhibition Centre was an excellent venue for the conference. It had a mix of room sizes, comfortable seating, breakout and quiet space, organiser space and enough “space” in general for the 600-ish delegates. I can’t remember whether Wi-Fi this year was venue-provided, or whether AARNet deployed it as they often do – but I do remember the Wi-Fi being strong, fast and reliable. A convention centre is always a tough choice for LCA – the additional venue costs compared to, say, a university, are a major proportion of the conference budget, so to justify that the venue really has to deliver – and as a delegate, this one did.

Community

The Welcome to Country was given by the Yugambeh people of the Gold Coast. A Welcome to Country is now part of the LCA tradition; however I also know from experience how difficult it can be to arrange – when we tried to arrange a Welcome to Country for LCA2016 we were unable to get in contact with the Traditional Owners of our region.

Welcome to Country from the Yugambeh people of Gold Coast

Another beautiful community memory I have of #lca2020 was this gorgeous rainbow “Yarn Chicken” pin gifted to me by Keith Packard – also an avid knitter (for anyone of a similar purl-suasion, there is an LCA knitters group on Ravelry). Keith’s sister owns Island Wools in PDX, and if you would like a pin you can get one from there (‘Rona permitting).

Badge from linux.conf.au 2020

Notable presentations

Donna Benjamin – Who’s watching?

You can see Donna’s keynote here on YouTube

Using the experience of her grandparents escaping Nazi Germany as a departure point, and her Dad’s ASIO file – created for such hideous crimes as advocating for Indigenous people to be able to vote (applause) – Donna posed some very uncomfortable questions to the audience.

We are surveillance arms dealers for the persuasion industry. Are we accountable for the tools we make?

In a nuanced, multi-dimensional talk about the benefits and drawbacks of surveillance technology, Donna took intent as her index point, outlining how intent is the key differentiator for whether technologies contribute to collective good or collective evil. What is the intent of our actions? How does that intent change over time?

Drawing threads from the past, she painted some clear trajectories for the future, and outlined the key actions we as a community can take to shape the world in ways we want to see – collective privacy, and collective efforts to hold others accountable for the ways in which they use technology.

Donna left me with a clear and resounding message.

The fight isn’t over, and our work is not yet done.

A/Prof Vanessa Teague – Who cares about Democracy? The sorry state of Australia’s election software and what we can do about it

You can see this presentation on YouTube

Vanessa’s keynote was a state-of-the-landscape talk which outlined the cryptographic deficiencies of several of the e-election software systems being deployed across the country. Using mathematical proofs of cryptography – something the deeply technical LCA audience was at ease with – she showed how flaws in implementation imperil not just the integrity of elections, but also undermine our democratic processes.

Vanessa is someone I admire greatly.

Her personal integrity – she recently resigned from the University of Melbourne, shortly after LCA, after the Department of Health pressured the University over her (and her colleagues’) research findings that supposedly anonymous, de-identified health records were re-identifiable – is something I deeply respect. Her work assiduously interrogating the COVIDSafe app and, again, identifying flaws in its implementation, makes her a vanguard of privacy, of digital rights, and of building systems that can be validated.

Associate Professor Vanessa Teague's keynote at linux.conf.au 2020 on e-election systems

Open Education Miniconf Keynote – The Who of CSIRAC – Roland Gesthuizen, Gillian Kidman, Hazel Tan, Caroline Pham

You can see this presentation on YouTube

This talk was one of my favourites from the conference, and provided a history of CSIRAC – Australia’s first programmable computer. The talk drew through-lines from HAL in 2001: A Space Odyssey and Memory Alpha, to the NASA space program and rope memory, to the women who first occupied the skilled role of “computers”, doing astronomical trajectory calculations. It went on to outline key milestones in the history of CSIRAC – being switched on for the first time in 1949, using as much electricity as a small town, and adopting technology from the telegraph and the jacquard loom. It highlighted the small obstacles the team had to overcome to reach larger goals – which on reflection appears to be a recurring theme in technological development – such as the need for horizontal axis storage, and the problem of digital decay and bit error.

The talk went on to explore the role of Trevor Pearcey in CSIRAC’s development and his contribution to Australian computation, and his role of opening up CSIRAC “to the people”, and furthering their understanding of its capabilities – virtually unknown at the time. Issues such as trust and faith in technology were examined; one failed program on CSIRAC meant that people would be wary of using it a second time – something that echoes through our use of technology today. The talk highlighted the culture that surrounded CSIRAC – one of tinkering, one of playing, and exploring, one of creativity.

What would have happened if the research assistants were required to submit a project plan? You need freedom and the space to explore; and from this can emerge unexpected and unanticipated benefits.

My takeaway from this talk was that understanding our technical history, and the challenges that have been faced and overcome, helps us to understand how the technologies of today emerge, evolve, and go to scale – and provides lessons on how we can shape those trajectories. We have a role to play in ensuring students are creators of technology, not just consumers of it.

Christopher Biggs – The Awful Design of Everyday Things

You can see this presentation on YouTube

Drawing links between design, documentation and technology, Christopher took us on an entertaining, insightful and challenging tour de force of design fail. He challenged us to improve our human-centred design skills, because:

Documentation is required when the design has failed

Drawing from Asimov’s laws of robotics, he put forward rules for human-centred design of technology:

  • Machines must be beautiful (or invisible)
  • Machines must co-operate for the benefit of humanity
  • Machines must communicate, and obey instructions
  • Machines must be as simple and reliable as possible

Extrapolating these to the Internet of Things, he provided principles for design:

  • discoverability – how does the user discover how to use the interface?
  • test on beginners – how does someone without context use the interface or product? Watch people: what do they expect?
  • feedback – how does the interface provide useful feedback?
  • affordances – what are the affordances of the interface? How does the user know this?
  • completion – how does the user know they’ve completed their task?

A house is a machine to live in – and we need to be friends with the machines.

Joshua Simmons – Open Source Citizenship

You can see this presentation on YouTube

Josh’s presentation focused on the ways in which companies and large organisations can be good open source “citizens”. As citizens, we have a duty to the society and communities of which we are a part – and from which we benefit, and companies that profit from open source software have similar obligations.

He outlined practical ways in which businesses can support open source, and in doing so, support the technical foundations on which their profits are generated, including:

  • understanding the technical dependencies of their products, and supporting the components in their stacks. It’s this contribution that helps the communities maintaining those projects to continue to do so. Open source is part of a business supply chain – and if you don’t want part of that supply chain to vanish, it needs to be supported.
  • sending people to conferences, and paying for travel, as conferences themselves – such as linux.conf.au – often provide a revenue stream to open source organisations.
  • encouraging universities to give students credit for contributing to FLOSS projects – as this is analogous to “paid” work in industry.

A huge congratulations too to Josh for recently taking over the reins at the Open Source Initiative.

Josh Simmons talking on open source citizenship at linux.conf.au 2020

Jussi Pakkanen – Fonts and Math

You can see this presentation on YouTube

From the Creative Arts Miniconf, I really appreciated Jussi’s presentation as an amateur font designer, working through some of the approaches in font design. One of the approaches is to design each glyph in the alphabet individually, which is time consuming.

This led Donald Knuth in 1977 to mimic the way that a person draws with a nib – the shape of the pen – by defining the strokes of the pen mathematically. This information can then be used to generate the glyphs of the alphabet, using a set of linear equations.

Knuth’s work here underpins projects such as METAFONT and TeX.

My key takeaway from this talk was that it sits at the intersection of the mathematical and the artistic – maths has an inherent beauty to it, in the curves of its equations. It is by combining the artistic and the mathematical that we can design beautiful, re-usable, extendable, scalable fonts.

Thank you to all the Volunteers, Core Team and Sponsors

I know how hard it is to deliver an outstanding LCA – I’ve done it twice. It’s a huge amount of work, for a long time – planning for an LCA can take 12-18 months – and in that time other priorities can slip – like family and relationships. A huge, huge thank you to the whole #lca2020 team for your outstanding efforts, dedication and contribution not only to Australia’s open source community, but to open source efforts worldwide.

Call for Volunteers for #lca2021 – linux.conf.au goes online

Following the announcement earlier this year that linux.conf.au 2021, originally set to be in Canberra, would be postponed to 2022 due to the coronavirus pandemic, and that linux.conf.au 2021 would be an online event, the Call for Volunteers has now opened. Being a Volunteer at linux.conf.au is a significant commitment, but is also a great way to meet new people, and get experience in many areas that might complement your career path, such as project management, team leading and people management, media and marketing, audio visual, logistics and event co-ordination.

linux.conf.au 2019 Christchurch – The Linux of Things

linux.conf.au 2019 this year went over the Tasman to New Zealand for the fourth time, to the Cantabrian university city of Christchurch. This was the first year that Christchurch had played host and I sincerely hope it’s not the last.

First, to the outstanding presentations.

NOTE: You can see all the presentations from linux.conf.au 2019 at this YouTube channel

Open Artificial Pancreas System (OpenAPS) by Dana Lewis

See the video of Dana’s presentation here

Dana Lewis lives with Type 1 diabetes, and her refusal to accept current standards of care with diabetes management led her to collaborate widely, developing OpenAPS. OpenAPS is a system that leverages existing medical devices, and adds a layer of monitoring using open hardware and open software solutions.

This presentation was outstanding on a number of levels.

As a self-experimenter, Dana joins the ranks of scientists the world over who have put their own health on the line in the pursuit of progress. Her ability to collaborate with others from disparate backgrounds and varied skillsets to make something greater than the sum of its parts is a textbook case of the open source ethos; moreover, the results that OpenAPS achieved were remarkable: significant stabilization in blood sugars and better predictive analytics – providing better quality of life to those living with Type 1 diabetes.

Dana also touched on the Open Humans project, which aims to have people share their medical health data publicly so that collective analysis can occur – freeing this data from the vice-like grip of medical device manufacturers. Again, we’re seeing that data itself has incredible value – sometimes more so than the devices which monitor and capture it.

Open Source Magnetic Resonance Imaging: From the community to the community by Ruben Pellicer Guridi

You can view the video of Ruben’s presentation here

Ruben Pellicer Guridi’s talk centred on how the Open Source MRI community was founded to solve the problem of needing more MRI machines, particularly in low socio-economic areas and in developing countries. The project has attracted a community of health and allied health professionals, and has made available both open hardware and open software, with the first image from their Desktop MR software acquired in December.

Although the project is in its infancy, the implications are immediately evident; providing better public healthcare, particularly for the most vulnerable in the world.

Apathy and Arsenic: A Victorian era lesson on fighting the surveillance state by Lilly Ryan

You can view the video of Lilly’s presentation here

Lilly Ryan’s entertaining and thought-provoking talk drew parallels between our current obsession with privacy-leaking apps and data platforms and the awareness campaign around the detrimental effects of arsenic in the 1800s. Her presentation was a clarion call to resist ‘peak indifference’ and increase privacy awareness and digital literacy.

Deep Learning, not Deep Creepy by Jack Moffitt

You can view the video of Jack’s presentation here

Jack Moffitt is a Principal Research Engineer with Mozilla, and in this presentation he opened by providing an overview of deep learning. He then dug a little deeper into the dangers of deep learning, specifically the biases that are inherent in current deep learning approaches, and some of the solutions that have been trialled to address them – such as making gender and noun pairs like “doctor” and “man” equidistant, so that “doctor” is equally predictive for “man” and “woman”.

He then covered the key ML projects from Mozilla such as Deep Speech, Common Voice and Deep Proof.

This was a great corollary to the two talks I gave.

Computer Science Unplugged by Professor Tim Bell

You can view Tim’s presentation here

Part of the Open Education Miniconf, Tim‘s presentation covered how to teach computer science in a way that was fun, entertaining and accessible. The key problem that Computer Science Unplugged solves is that teachers are often afraid of CS concepts – and CS Unplugged makes teaching these concepts fun for both learners and teachers.

Go All In! By Bdale Garbee

You can view Bdale’s talk here

Bdale’s talk was a reinforcement of the power of open source collaboration, and the ideals that underpin it, with a call to “bet on” the power of the open source community.

Open source superhumans by Jon Oxer

You can view Jon’s talk here

Jon Oxer’s talk covered the power of open source hardware for assistive technologies, which are often inordinately expensive.

Other conversations

I had a great chat with Kate Stewart from the Linux Foundation about the work she’s doing in the programmatic auditing of source code licensing – her talk on the grep-ability of licenses is worth watching – and we covered metrics for communities with CHAOSS, and the tokenisation of Git commits to understand who has committed which code, specifically for unwinding dependencies and copyright.

Christchurch as a location

Christchurch was a wonderful location for linux.conf.au – the climate was perfect – we had a storm or two but it wasn’t 45 C burnination like Perth. The airport was also much bigger than I had expected and the whole area is set up for hospitality and tourism. It won’t be the last time I head to CHC!