Building a database to handle PhD interview tracking using MySQL and NocoDB

So, as folx probably know, I’m currently doing a PhD at the Australian National University’s School of Cybernetics, investigating voice data practices and what we might be able to do to change them to have less biased voice data, and less biased voice technology products. If you’d like to see some of the things I’ve been working on, you can check out my portfolio. Two of my research methods are related to interviews; the first tranche being shorter exploratory interviews and the second being in-depth interviews with machine learning practitioners.

Because there are many stages to interviews – identifying participants, approaching them for interviews, obtaining consent, scheduling, transcription and coding – I needed a way to manage the pipeline. My PhD cohort colleagues use a combination of AirTable and Notion, but I wanted an open source alternative (surprise!).

Identifying alternatives and choosing one to use

I did a scan of what alternatives were available simply by searching for “open source alternative to AirTable”. Some of the options I considered but discarded were:

  • BaseRow: While this is open source, and built in widely adopted frameworks such as Django and Vue.js, and available in Docker and Heroku deploys, the commercial framing behind the product is very much open core. That is, there are a lot of features that are only available in the paid / premium version. I’ve worked with open core offerings before, and I’ve found that the most useful features are usually those that are behind the paywall.
  • AppFlowy: While this looked really impressive, and the community behind it looked strong, the use of Flutter and Rust put me off – I’m not as familiar with either of them compared to Vue.js or Django. I also found the documentation really confusing – for example, to install the Linux version it said to “use the official package”, but it didn’t give the name of the official package. Not helpful. On this basis I ruled out AppFlowy.
  • DBeaver: This tool is more aimed at people who have to work with multiple databases; it provides a GUI over the top of the database, but is not designed to be a competitor to Notion or AirTable. I wanted something more graphically-focused, and with multiple layout styles (grid, card etc).

This left me with NoCoDB. I kicked the tyres a bit by looking at the GitHub code, and read through the documentation to get a feel for whether it was well constructed; it was. Importantly, I was able to install it on my localhost; my ethics protocol for my research method prevented it being hosted on a cloud platform.

Installation

Installation was a breeze. I set up a database in MySQL (also running locally), then git clone‘d the repo, and used npm to install the software:

git clone https://github.com/nocodb/nocodb-seed
cd nocodb-seed
npm install
npm start

nocodb runs on Node.js and, by default, starts the application on port 8080, so to start using it, you simply go to: http://localhost:8080/. One slightly frustrating thing is that it does require an email address and password to log in. nocodb is a commercial company – they’ve recently done a raise and are hiring – and I suspect this is part of their telemetry, even for self-hosted solutions. I run Pihole as my DNS server, and I don’t see any telemetry from nocodb in my block list, however.

Next, you need to provide nocodb with the MySQL database details that you created earlier. This creates some additional tables. nocodb then creates some base views, but at this point you are free to start creating your own.

Deciding what fields I needed to capture to be able to visualise my interview pipeline

Identifying what fields I needed to track was a case of trial and error. As I added new fields, or modified the datatypes of existing ones, nocodb was able to be easily re-synced with the underlying database schema. This makes nocodb ideal for prototyping database structures.

[Screenshot: nocodb showing tables out of sync]
[Screenshot: nocodb now in sync with the underlying tables]

In the end, I settled on the following tables and fields:

Interviewees table

  • INTERVIEWEE_ID – a unique, auto-incrementing ID for each participant
  • REAL_NAME – the real name of my participant (and one of the reasons this is running locally and not in the cloud)
  • CODE_NAME – a code name I ascribed to each participant, as part of my Ethics Protocol
  • ROLE_ID – foreign key identifier for the ROLES table.
  • EMAIL_ADDRESS – what it says on the tin.
  • LINKEDIN_URL – I used LinkedIn to contact several participants, and this was a way of keeping track of that information.
  • HOMEPAGE_URL – the participant’s home page, if they had one. This was useful for identifying the participant’s background – part of the purposive sampling technique.
  • COUNTRY_ID – foreign key identifier for the COUNTRIES table – again used for purposive sampling.
  • HOW_IDENTIFIED – to identify whether people had been snowball sampled
  • HAS_BEEN_CONTACTED – Boolean to flag whether the participant had been contacted
  • HAS_AGREED_TO_INTERVIEW – Boolean to flag whether the participant had agreed to be interviewed
  • NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS – Boolean to flag whether the participant hadn’t responded to a request to interview
  • HAS_DECLINED – Boolean to flag an explicit decline
  • INTERVIEW_SCHEDULED – Boolean to indicate a date had been scheduled with the participant
  • IS_EXPLORATORY – Boolean to indicate the interview was exploratory rather than in-depth. Having an explicit Boolean for each interview type allows me to add other types if needed; I felt that a full-blown table for interview type was overkill.
  • IS_INDEPTH – Boolean for the other type of interview I was conducting.
  • INTERVIEWEE_DESCRIPTION – descriptive information about the participant’s background. Used to help me formulate questions relevant to the participant.
  • CONSENT_RECEIVED – Boolean to flag whether the participant had provided informed consent.
  • CONSENT_URL – A space to record the file location of the consent form.
  • CONSENT_ALLOWS_PARTICIPATION – A flag relevant to a specific type of participation in my ethics protocol, and my consent form
  • CONSENT_ALLOWS_IDENTIFICATION_VIA_PARTICIPANT_CODE – A flag relevant to how participants were able to elect to be identified, as part of my ethics protocol.
  • INTERVIEW_CONDUCTED – Boolean to flag that the interview had been conducted.
  • TRANSCRIPT_DONE – Boolean to flag that the transcript had been created (I used an external company for this).
  • TRANSCRIPT_URL – A space to record the file location of the transcript.
  • TRANSCRIPT_APPROVED – Boolean to indicate the participant had reviewed and approved the transcript.
  • TRANSCRIPT_APPROVED_URL – A space to record the file location of the approved transcript
  • CODING_FIRST_DONE – Boolean to indicate first pass coding done
  • CODING_FIRST_LINK – A space to record the file location of the first coding
  • CODING_SECOND_DONE – Boolean to indicate second pass coding done
  • CODING_SECOND_URL – A space to record the file location of the second coding
  • NOTES – I used this field to make notes about the participant or to flag things to follow up.
  • LAST_CONTACT – I used this date field so I could easily order interviewees to follow them up.
  • LAST_MODIFIED – This field auto-updated on update.

Countries table

  • COUNTRY_ID – Unique identifier, used as primary key and foreign key reference in the INTERVIEWEES table.
  • COUNTRY_NAME – human readable name of the country, useful for demonstrating purposive sampling.
  • LAST_MODIFIED – This field auto-updated on update.

Roles table

  • ROLE_ID – Unique identifier, used as primary key and foreign key reference in the INTERVIEWEES table.
  • ROLE_TITLE – human readable title of the role, used for purposive sampling.
  • ROLE_DESCRIPTION – descriptive information about the activities performed by the role.
  • LAST_MODIFIED – This field auto-updated on update.
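
To make the relationships between the three tables concrete, here is a minimal sketch of the structure in Python. It is not the exported SQL: it assumes SQLite (via Python's built-in sqlite3 module) as a stand-in for MySQL so it runs anywhere, it only includes a handful of the INTERVIEWEES columns, and the sample row is made up. The column types are also assumptions, and the auto-updating behaviour of LAST_MODIFIED is not reproduced.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
# Only a subset of the INTERVIEWEES columns is shown, and LAST_MODIFIED
# only gets a default here; it does not auto-update as described above.
SCHEMA = """
CREATE TABLE COUNTRIES (
    COUNTRY_ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    COUNTRY_NAME TEXT NOT NULL,
    LAST_MODIFIED TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE ROLES (
    ROLE_ID          INTEGER PRIMARY KEY AUTOINCREMENT,
    ROLE_TITLE       TEXT NOT NULL,
    ROLE_DESCRIPTION TEXT,
    LAST_MODIFIED    TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE INTERVIEWEES (
    INTERVIEWEE_ID     INTEGER PRIMARY KEY AUTOINCREMENT,
    CODE_NAME          TEXT NOT NULL,
    ROLE_ID            INTEGER REFERENCES ROLES (ROLE_ID),
    COUNTRY_ID         INTEGER REFERENCES COUNTRIES (COUNTRY_ID),
    HAS_BEEN_CONTACTED INTEGER NOT NULL DEFAULT 0,
    CONSENT_RECEIVED   INTEGER NOT NULL DEFAULT 0,
    LAST_MODIFIED      TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript(SCHEMA)

# Made-up sample data, purely for illustration.
conn.execute("INSERT INTO COUNTRIES (COUNTRY_NAME) VALUES ('Australia')")
conn.execute("INSERT INTO ROLES (ROLE_TITLE) VALUES ('Machine learning engineer')")
conn.execute(
    "INSERT INTO INTERVIEWEES (CODE_NAME, ROLE_ID, COUNTRY_ID) VALUES (?, ?, ?)",
    ("P01", 1, 1),
)

# Resolve the two foreign keys back to human readable values.
row = conn.execute(
    """
    SELECT i.CODE_NAME, r.ROLE_TITLE, c.COUNTRY_NAME
    FROM INTERVIEWEES i
    JOIN ROLES r ON r.ROLE_ID = i.ROLE_ID
    JOIN COUNTRIES c ON c.COUNTRY_ID = i.COUNTRY_ID
    """
).fetchone()
print(row)
```

The join at the end is the kind of lookup that makes ROLE_ID and COUNTRY_ID useful for checking purposive sampling coverage.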

If I were to update the database structure in the future, I would be inclined to have a “URLs” table, where the file links for things like consent forms and transcripts are stored. Having them all in one table would make it easier to do things like URL validation. This was overkill for what I needed here.
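
As a rough illustration of the kind of URL validation a single “URLs” table would make easier, here is a small Python sketch. The function name `is_valid_location` and the sample values are hypothetical, and the rule itself (accept http, https and file locations only) is an assumption rather than anything enforced in the database described here.

```python
from urllib.parse import urlparse


def is_valid_location(value: str) -> bool:
    """Return True if value looks like a usable http(s) or file location.

    Hypothetical helper: the accepted schemes are an assumption.
    """
    parts = urlparse(value)
    if parts.scheme in ("http", "https"):
        # Web links need a host name.
        return bool(parts.netloc)
    if parts.scheme == "file":
        # Local file links need a path.
        return bool(parts.path)
    return False


# Made-up examples of values that might sit in CONSENT_URL or TRANSCRIPT_URL.
candidates = [
    "file:///home/me/interviews/P01-consent.pdf",
    "https://example.org/transcripts/P01.docx",
    "P01 transcript on desktop",
]
results = {value: is_valid_location(value) for value in candidates}
for value, ok in results.items():
    print(("ok      " if ok else "invalid ") + value)
```

With all the links in one table, a check like this could be run over a single column instead of six separate ones.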

Thinking also about the interview pipeline, the status of the interviewee in the pipeline is a combination of various Boolean flags. I would have found it useful to have a summary STATUS_ID with a useful descriptor of the status.
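
One way such a summary status could be derived from the existing Boolean flags is sketched below in Python. The ordering of the stages, the status labels, and the function name `pipeline_status` are all assumptions for illustration; the column names are the ones from the INTERVIEWEES table.

```python
# Stages are checked from "furthest along" to "least far along", so the first
# flag that is set wins. This ordering and the labels are assumptions.
STAGES = [
    ("HAS_DECLINED", "declined"),
    ("CODING_SECOND_DONE", "second-pass coding done"),
    ("CODING_FIRST_DONE", "first-pass coding done"),
    ("TRANSCRIPT_APPROVED", "transcript approved"),
    ("TRANSCRIPT_DONE", "transcript done"),
    ("INTERVIEW_CONDUCTED", "interview conducted"),
    ("INTERVIEW_SCHEDULED", "interview scheduled"),
    ("HAS_AGREED_TO_INTERVIEW", "agreed to interview"),
    ("NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS", "no response"),
    ("HAS_BEEN_CONTACTED", "contacted"),
]


def pipeline_status(flags: dict) -> str:
    """Collapse a row's Boolean flags into a single status label."""
    for column, label in STAGES:
        if flags.get(column):
            return label
    return "not yet contacted"


# Made-up example row: contacted, agreed, and scheduled.
example = {
    "HAS_BEEN_CONTACTED": 1,
    "HAS_AGREED_TO_INTERVIEW": 1,
    "INTERVIEW_SCHEDULED": 1,
}
print(pipeline_status(example))
```

A STATUS_ID column could then store the result, at the cost of having to keep it in step with the flags.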

Get the SQL to replicate the database table structure

I’ve exported the table structure to SQL in case you want to use it for your own interview tracking purposes. It’s a Gist because I can’t be bothered altering my wp_options.php to allow for .sql uploads, and that’s probably a terrible idea, anyway 😉

Creating views based on field values to track the interview pipeline

Now that I had a useful table structure, I settled on some Views that helped me create and manage the interview pipeline. Views in nocodb are lenses on the underlying database that restrict or constrain the data that is shown so that it’s more relevant to the task at hand. This is done through showing or hiding fields, and then filtering the selected fields.

  • Data entry view – this was a form view where I could add new Interviewees.
  • Views for parts of the pipeline – I set up several grid views that restricted Interviewees using filters to the part of the interview pipeline they were in. These included those I had and hadn’t contacted, those who had a scheduled interview, those who hadn’t responded, as well as several views for where the interviewee was in the coding and consent pipeline.
  • At a glance view – this was a gallery view, where I could get an overview of all the potential and confirmed participants.
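
Conceptually, a filtered grid view behaves a lot like a SQL query that selects some columns and filters some rows. The Python sketch below shows that analogy only; it is not how nocodb stores or implements its views. It assumes SQLite as a stand-in for MySQL, the view name AWAITING_REPLY is hypothetical, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stand-in for MySQL
conn.execute(
    """
    CREATE TABLE INTERVIEWEES (
        INTERVIEWEE_ID INTEGER PRIMARY KEY,
        CODE_NAME TEXT NOT NULL,
        REAL_NAME TEXT,
        HAS_BEEN_CONTACTED INTEGER NOT NULL DEFAULT 0,
        NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS INTEGER NOT NULL DEFAULT 0,
        LAST_CONTACT TEXT
    )
    """
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO INTERVIEWEES (CODE_NAME, REAL_NAME, HAS_BEEN_CONTACTED, "
    "NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS, LAST_CONTACT) VALUES (?, ?, ?, ?, ?)",
    [
        ("P01", "Name One", 1, 0, "2022-03-01"),
        ("P02", "Name Two", 1, 1, "2022-02-10"),
        ("P03", "Name Three", 0, 0, None),
    ],
)

# "Showing or hiding fields" maps to the column list (REAL_NAME is hidden);
# "filtering" maps to the WHERE clause.
conn.execute(
    """
    CREATE VIEW AWAITING_REPLY AS
    SELECT CODE_NAME, LAST_CONTACT
    FROM INTERVIEWEES
    WHERE HAS_BEEN_CONTACTED = 1
      AND NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS = 0
    """
)
rows = conn.execute("SELECT * FROM AWAITING_REPLY ORDER BY LAST_CONTACT").fetchall()
print(rows)
```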

A limitation I encountered working with these views is that there’s no way to provide summary information – like you might with a SUM or COUNT query in SQL. Ideally I would like to be able to build a dashboard that provides statistics on how many participants are at each part of the pipeline, but I wasn’t able to do this.
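
One possible workaround is to run the summary query directly against the database, outside nocodb. The Python sketch below shows the idea with COUNT and SUM over three of the Boolean flags. It assumes SQLite as a stand-in for MySQL and uses made-up rows; against the real database the same SELECT would need to be run through a MySQL client instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stand-in for MySQL
conn.execute(
    """
    CREATE TABLE INTERVIEWEES (
        INTERVIEWEE_ID INTEGER PRIMARY KEY,
        HAS_BEEN_CONTACTED INTEGER NOT NULL DEFAULT 0,
        INTERVIEW_SCHEDULED INTEGER NOT NULL DEFAULT 0,
        INTERVIEW_CONDUCTED INTEGER NOT NULL DEFAULT 0
    )
    """
)
# Made-up sample rows: four participants at different pipeline stages.
conn.executemany(
    "INSERT INTO INTERVIEWEES (HAS_BEEN_CONTACTED, INTERVIEW_SCHEDULED, "
    "INTERVIEW_CONDUCTED) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0)],
)

# Because each flag is stored as 0 or 1, SUM gives a count of rows with the
# flag set, and COUNT(*) gives the total number of participants.
total, contacted, scheduled, conducted = conn.execute(
    """
    SELECT COUNT(*),
           SUM(HAS_BEEN_CONTACTED),
           SUM(INTERVIEW_SCHEDULED),
           SUM(INTERVIEW_CONDUCTED)
    FROM INTERVIEWEES
    """
).fetchone()

summary = {
    "total": total,
    "contacted": contacted,
    "scheduled": scheduled,
    "conducted": conducted,
}
print(summary)
```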

Updating nocodb

nocodb is under active development, and has regular updates. Updating the software proved to be incredibly easy through npm, using two commands:

Uninstall NocoDB package

npm uninstall nocodb

Install NocoDB package

npm install --save nocodb

Parting thoughts

Overall, I have been really impressed by nocodb – it’s a strong fit for my requirements in this use case – easily prototypable, runs locally, and is easily updateable. The user interface is still not perfect, and is downright clunky in places, but as an open source alternative to AirTable and Notion, it hits the spot.
