Building a database to handle PhD interview tracking using MySQL and NocoDB

So, as folx probably know, I’m currently doing a PhD at the Australian National University’s School of Cybernetics, investigating voice data practices and what we might be able to do to change them to have less biased voice data, and less biased voice technology products. If you’d like to see some of the things I’ve been working on, you can check out my portfolio. Two of my research methods are related to interviews; the first tranche being shorter exploratory interviews and the second being in-depth interviews with machine learning practitioners.

Because there are many stages to interviews – identifying participants, approaching them for interviews, obtaining consent, scheduling, transcription and coding – I needed a way to manage the pipeline. My PhD cohort colleagues use a combination of AirTable and Notion, but I wanted an open source alternative (surprise!).

Identifying alternatives and choosing one to use

I did a scan of what alternatives were available simply by searching for “open source alternative to AirTable”. Some of the options I considered but discarded were:

• BaseRow: While this is open source, and built in widely adopted frameworks such as Django and Vue.js, and available in Docker and Heroku deploys, the commercial framing behind the product is very much open core. That is, there are a lot of features that are only available in the paid / premium version. I’ve worked with open core offerings before, and I’ve found that the most useful features are usually those that are behind the paywall.
• AppFlowy: While this looked really impressive, and the community behind it looked strong, the use of Flutter and Rust put me off – I’m not as familiar with either of them compared to Vue.js or Django. I also found the documentation really confusing – for example, to install the Linux version it said to “use the official package”, but it didn’t give the name of the official package. Not helpful. On this basis I ruled out AppFlowy.
• DBeaver: This tool is more aimed at people who have to work with multiple databases; it provides a GUI over the top of the database, but is not designed to be a competitor to Notion or AirTable. I wanted something more graphically focused, with multiple layout styles (grid, card etc).

This left me with NoCoDB. I kicked the tyres a bit by looking at the GitHub code, and read through the documentation to get a feel for whether it was well constructed; it was. Importantly, I was able to install it on my localhost; my ethics protocol for my research method prevented it from being hosted on a cloud platform.

Installation

Installation was a breeze. I set up a database in MySQL (also running locally), then git clone'd the repo, and used npm to install the software:

git clone https://github.com/nocodb/nocodb-seed
cd nocodb-seed
npm install
npm start

nocodb uses Node.js's built-in HTTP server, and starts the application by default on port 8080, so to start using it, you simply go to http://localhost:8080/. One slightly frustrating thing is that it does require an email address and password to log in. nocodb is a commercial company – they've recently raised funding and are hiring – and I suspect this is part of their telemetry, even for self-hosted installs. I run Pi-hole as my DNS server, however, and I don't see any telemetry from nocodb in my block list.

Next, you need to provide nocodb with the MySQL database details that you created earlier. This creates some additional tables. nocodb then creates some base views, but at this point you are free to start creating your own.
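If you haven't already created a database for nocodb to use, the setup in the MySQL shell is minimal. This is a sketch only – the database name, user and password here are placeholders rather than the values I actually used:

$ mysql -u root -p
mysql> CREATE DATABASE interview_tracking;
mysql> CREATE USER 'nocodb_user'@'localhost' IDENTIFIED BY 'choose-a-strong-password';
mysql> GRANT ALL PRIVILEGES ON interview_tracking.* TO 'nocodb_user'@'localhost';
mysql> FLUSH PRIVILEGES;

These are the host, database, username and password details that nocodb asks for when you point it at an external database.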

Deciding what fields I needed to capture to be able to visualise my interview pipeline

Identifying what fields I needed to track was a case of trial and error. As I added new fields, or modified the datatypes of existing ones, nocodb was able to be easily re-synced with the underlying database schema. This makes nocodb ideal for prototyping database structures.
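As a sketch of what that looked like in practice (the exact statements I ran may have differed), adding one of the Boolean flags described below was a single ALTER TABLE against the MySQL database, followed by re-syncing the table metadata in nocodb:

ALTER TABLE INTERVIEWEES
  ADD COLUMN HAS_DECLINED BOOLEAN DEFAULT FALSE;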

In the end, I settled on the following tables and fields:

Interviewees table

• INTERVIEWEE_ID – a unique, auto-incrementing ID for each participant
• REAL_NAME – the real name of my participant (and one of the reasons this is running locally and not in the cloud)
• CODE_NAME – a code name I ascribed to each participant, as part of my Ethics Protocol
• ROLE_ID – foreign key identifier for the ROLES table.
• EMAIL_ADDRESS – what it says on the tin.
• LINKEDIN_URL – I used LinkedIn to contact several participants, and this was a way of keeping track of that information.
• HOMEPAGE_URL – the participant’s home page, if they had one. This was useful for identifying the participant’s background – part of the purposive sampling technique.
• COUNTRY_ID – foreign key identifier for the COUNTRIES table – again used for purposive sampling.
• HOW_IDENTIFIED – to identify whether people had been snowball sampled
• HAS_BEEN_CONTACTED – Boolean to flag whether the participant had been contacted
• HAS_AGREED_TO_INTERVIEW – Boolean to flag whether the participant had agreed to be interviewed
• NO_RESPONSE_AFTER_SEVERAL_ATTEMPTS – Boolean to flag whether the participant hadn’t responded to a request to interview
• HAS_DECLINED – Boolean to flag an explicit decline
• INTERVIEW_SCHEDULED – Boolean to indicate a date had been scheduled with the participant
• IS_EXPLORATORY – Boolean to indicate the interview was exploratory rather than in-depth. Having an explicit Boolean for the interview type allows me to add other types if needed; a full-blown table for interview types felt like overkill.
• IS_INDEPTH – Boolean for the other type of interview I was conducting.
• INTERVIEWEE_DESCRIPTION – descriptive information about the participant’s background. Used to help me formulate questions relevant to the participant.
• CONSENT_RECEIVED – Boolean to flag whether the participant had provided informed consent.
• CONSENT_URL – A space to record the file location of the consent form.
• CONSENT_ALLOWS_PARTICIPATION – A flag relevant to a specific type of participation in my ethics protocol and my consent form.
• CONSENT_ALLOWS_IDENTIFICATION_VIA_PARTICIPANT_CODE – A flag relevant to how participants were able to elect to be identified, as part of my ethics protocol.
• INTERVIEW_CONDUCTED – Boolean to flag that the interview had been conducted.
• TRANSCRIPT_DONE – Boolean to flag that the transcript had been created (I used an external company for this).
• TRANSCRIPT_URL – A space to record the file location of the transcript.
• TRANSCRIPT_APPROVED – Boolean to indicate the participant had reviewed and approved the transcript.
• TRANSCRIPT_APPROVED_URL – A space to record the file location of the approved transcript
• CODING_FIRST_DONE – Boolean to indicate first pass coding done
• CODING_FIRST_LINK – A space to record the file location of the first coding
• CODING_SECOND_DONE – Boolean to indicate second pass coding done
• CODING_SECOND_URL – A space to record the file location of the second coding
• NOTES – I used this field to make notes about the participant or to flag things to follow up.
• LAST_CONTACT – I used this date field so I could easily order interviewees to follow them up.
• LAST_MODIFIED – This field auto-updated on update.

Countries table

• COUNTRY_ID – Unique identifier, used as primary key and foreign key reference in the INTERVIEWEES table.
• COUNTRY_NAME – human readable name of the country, useful for demonstrating purposive sampling.
• LAST_MODIFIED – This field auto-updated on update.

Roles table

• ROLE_ID – Unique identifier, used as primary key and foreign key reference in the INTERVIEWEES table.
• ROLE_TITLE – human readable title of the role, used for purposive sampling.
• ROLE_DESCRIPTION – descriptive information about the activities performed by the role.
• LAST_MODIFIED – This field auto-updated on update.

If I were to update the database structure in the future, I would be inclined to have a “URLs” table, where the file links for things like consent forms and transcripts are stored. Having them all in one table would make it easier to do things like URL validation. This was overkill for what I needed here.

Thinking also about the interview pipeline, the status of an interviewee in the pipeline is a combination of various Boolean flags. I would have found it useful to have a summary STATUS field with a human-readable descriptor of where each interviewee is in the pipeline.
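As a sketch of the kind of summary status I have in mind – not something I implemented – a view over a subset of the Boolean flags could derive a descriptor directly in MySQL:

CREATE VIEW INTERVIEWEE_STATUS AS
SELECT INTERVIEWEE_ID,
       CASE
         WHEN TRANSCRIPT_APPROVED     THEN 'Transcript approved'
         WHEN INTERVIEW_CONDUCTED     THEN 'Interview conducted'
         WHEN INTERVIEW_SCHEDULED     THEN 'Interview scheduled'
         WHEN HAS_DECLINED            THEN 'Declined'
         WHEN HAS_AGREED_TO_INTERVIEW THEN 'Agreed to interview'
         WHEN HAS_BEEN_CONTACTED      THEN 'Contacted'
         ELSE 'Not yet contacted'
       END AS STATUS_DESCRIPTOR
FROM INTERVIEWEES;

The order of the WHEN clauses encodes which stage "wins" when several flags are set.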

Get the SQL to replicate the database table structure

I’ve exported the table structure to SQL in case you want to use it for your own interview tracking purposes. It’s a Gist because I can’t be bothered altering my WordPress configuration to allow for .sql uploads, and that’s probably a terrible idea, anyway 😉

Creating views based on field values to track the interview pipeline

Now that I had a useful table structure, I settled on some Views that helped me create and manage the interview pipeline. Views in nocodb are lenses on the underlying database: they restrict or constrain the data that is shown so that it’s more relevant to the task at hand. This is done by showing or hiding fields, and then filtering on the selected fields.

• Data entry view – this was a form view where I could add new Interviewees.
• Views for parts of the pipeline – I set up several grid views that restricted Interviewees using filters to the part of the interview pipeline they were in. These included those I had and hadn’t contacted, those who had a scheduled interview, those who hadn’t responded, as well as several views for where the interviewee was in the coding and consent pipeline.
• At a glance view – this was a gallery view, where I could get an overview of all the potential and confirmed participants.

A limitation I encountered working with these views is that there’s no way to provide summary information – like you might with a SUM or COUNT query in SQL. Ideally I would like to be able to build a dashboard that provides statistics on how many participants are at each part of the pipeline, but I wasn’t able to do this.
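As a workaround, because the data lives in ordinary MySQL tables, the summary numbers can still be pulled with a query against the underlying database – a rough sketch using some of the Boolean flags above:

SELECT SUM(HAS_BEEN_CONTACTED)      AS contacted,
       SUM(HAS_AGREED_TO_INTERVIEW) AS agreed,
       SUM(INTERVIEW_SCHEDULED)     AS scheduled,
       SUM(INTERVIEW_CONDUCTED)     AS conducted,
       SUM(TRANSCRIPT_APPROVED)     AS transcripts_approved
FROM INTERVIEWEES;

It’s not a dashboard, but it answers the “how many participants are at each stage?” question from the MySQL command line.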

Updating nocodb

nocodb is under active development, and has regular updates. Updating the software proved to be incredibly easy through npm, with two commands:

Uninstall the NocoDB package:

npm uninstall nocodb


Install the NocoDB package:

npm install --save nocodb


Parting thoughts

Overall, I have been really impressed by nocodb – it’s a strong fit for my requirements in this use case: easy to prototype with, runs locally, and simple to update. The user interface is still not perfect, and is downright clunky in places, but as an open source alternative to AirTable and Notion, it hits the spot.

A review of Taguette – an open source alternative for qualitative data coding

Motivation and context

As you might know, I’m currently undertaking a PhD program at Australian National University’s School of Cybernetics, looking at voice dataset documentation practices, and what we might be able to improve about them to reduce statistical and experienced bias in voice technologies like speech recognition and wake words. As part of this journey, I’ve learned an array of new research methods – surveys, interviews, ethics approaches, literature review and so on. I’m now embarking on some early qualitative data analysis.

The default tool in the qualitative data analysis space is NVIVO, made by Melbourne-based company QSR. However, NVIVO has both a steep learning curve and a hefty price tag. I’m lucky enough that this pricing is abstracted away from me – ANU provides NVIVO for free to HDR students and staff – but reports suggest that enterprise licensing starts at around USD 85 per user. NVIVO operates predominantly as a desktop-based piece of software, and is only available for Mac or Windows. My preferred operating system is Linux – that is what my academic writing toolchain, based on LaTeX, Atom and Pandoc, runs on – and I wanted to see if there was a tool with equivalent functionality that aligned with this toolchain.

About Taguette

Taguette is a BSD-3 licensed qualitative coding tool, positioned as an alternative to NVIVO. It’s written by a small team of library specialists and software developers based in New York. The developers are very clear about their motivation in creating Taguette:

“Qualitative methods generate rich, detailed research materials that leave individuals’ perspectives intact as well as provide multiple contexts for understanding the phenomenon under study. Qualitative methods are used in a wide range of fields, such as anthropology, education, nursing, psychology, sociology, and marketing. Qualitative data has a similarly wide range: observations, interviews, documents, audiovisual materials, and more. However – the software options for qualitative researchers are either far too expensive, don’t allow for the seminal method of highlighting and tagging materials, or actually perform quantitative analysis, just on text. It’s not right or fair that qualitative researchers without massive research funds cannot afford the basic software to do their research. So, to bolster a fair and equitable entry into qualitative methods, we’ve made Taguette!”

– Taguette.org website, “About” page

This motivation spoke to me, and aligned with my own interest in free and open source software.

Running Taguette and identifying its limitations

For reproducibility: I ran Taguette version 1.1.1 on Ubuntu 20.04 LTS with Python 3.8.10.

Taguette can be run in the cloud, and the website provides a demo server so that you can explore the cloud offering. However, I was more interested in the locally-hosted option, which runs on a combination of Python, Calibre and (I believe) SQLite as the database backend, with SQLAlchemy for mappings. The install instructions recommend running Taguette in a virtual environment, and this worked well for me – running the binary from the command line spawns a Flask- or Gunicorn-style web application, which you can then access in your browser. This locally hosted option was super helpful for me, as my ethics protocol has restrictions on what cloud services I can use.
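For reference, getting Taguette running locally in a virtual environment looked roughly like this – a sketch that assumes the PyPI package and the launcher command are both named taguette:

$ python3 -m venv taguette-env
$ source taguette-env/bin/activate
(taguette-env) $ pip install taguette
(taguette-env) $ taguette

The launcher then serves the web interface on localhost, which you open in your browser.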
To try Taguette, I first created a project, then uploaded a Word document in docx format, and began highlighting. This was smooth and seamless. However, I soon ran into my first limitation. My coding approach is to use nested codes. Taguette has no functionality for nested codes, and no concomitant functionality for “rolling up” nested codes. This was a major blocker for me. However, I was impressed that I could add tags in multiple languages, including non-Latin orthographies such as Japanese and Arabic. Presumably, although I didn’t check this, Taguette uses Unicode under the hood – so it’s foreseeable that you could use emojis as tags as well, which might be useful for researchers of social media.

Taguette has no statistical analysis tools built in, such as word frequency distributions, clustering or other corpus-style methods. While these weren’t as important for me at this stage of my research, they are functions I envisage using in the future.

Taguette’s CodeBook export and import functions work really well, and I was impressed with the range of formats that could be imported or exported.

What I would like Taguette to do in the future

I really need nested tags with aggregation (“roll-up”) functionality for Taguette to be a viable tool for my qualitative data analysis – this is the highest-priority feature for me, followed by statistical analysis tools.

Some thoughts on the broader academic software ecosystem

Even though I won’t be adopting Taguette, I admire and respect its vision – to free qualitative researchers from being anchored to expensive, limiting tools. While I’m fortunate enough to be afforded an NVIVO license, many smaller, less wealthy or less research-intensive universities will struggle to provide a license seat for every qualitative researcher.

This is another manifestation of universities becoming increasingly beholden to large software manufacturers, rather than having in-house capabilities to produce and manage software that directly adds value to a university’s core capability of generating new knowledge. We’ve seen it in academic journals – with companies like EBSCO, Sage and Elsevier intermediating the publication of journals, hoarding copyrights to articles and collecting a tidy profit in the process – and we’re increasingly seeing it in academic software. Learning Management Systems such as Desire2Learn and Blackboard are now prohibitively expensive, while open source alternatives such as Moodle still require skilled (and therefore expensive) staff to maintain and integrate – a challenge when universities are shedding staff in the post-COVID era.

Moreover, tools like NVIVO are imbricated in other structures which reinforce their dominance. University HDR training courses and resource guides are devoted to the software tools in common use. Additionally, supervisors and senior academics are likely to use the dominant software, and so are in an influential position to recommend its use to their students. This support infrastructure reinforces dominant tools by ascribing them a special, or reified, status within the institution.

At a broader level, even though open source has become a dominant business model, the advocacy behind free and open source software (FOSS) appears to be waning; open source is now mainstream, and it no longer requires a rebel army of misfits, nerds and outliers (myself included) to be its flag-bearers. This begs the question – who advocates for FOSS within the academy? And more importantly – what influence do they have compared with a slick marketing and sales effort from a global multinational? I’m reminded here of Eben Moglen’s wise words at linux.conf.au 2015 in Auckland, in the context of opposing patent trolls through collective effort – “freedom itself depends upon how we make use of the technologies we are creating”. That is, universities themselves have created the dependence on academic technologies which now restricts them.

There is hope, however.
Platforms like ArXiv – the free distribution service and open access archive for nearly two million pre-prints in mathematics, computer science and other (primarily quantitative) fields – are starting to challenge the status quo. For example, the Australian Research Council recently overturned its prohibition on the citation of pre-prints in competitive grant applications. Imagine if universities combined their resources – like they have done with ArXiv – to provide an open source qualitative coding tool, locally hosted, and accessible to everyone.

In the words of Freire, “Reading is not walking on the words; it’s grasping the soul of them.” (Paulo Freire, Pedagogy of the Oppressed)

Qualitative analysis tools allow us to grasp the soul of the artefacts we create through research, and that ability should be afforded to everyone – not just those who can afford it.

Setting up an academic writing workflow using Pandoc, Markdown, Zotero and LaTeX on Ubuntu Linux using Atom

This year, I started a PhD at the 3A Institute, within the College of Engineering and Computer Science at the Australian National University. I came into the PhD not from a researcher or academic background, but from a career in industry as a technology practitioner. As such, my experience with formatting academic papers, for example for publication in journals, is limited. Our PhD program is hands-on; as well as identifying a specific research topic and articulating a research design, we undertake writing activities – both as academic writing practice, and to help solidify the theoretical concepts we’ve been discussing in Seminar. I needed an academic writing workflow.

For one of these writing exercises, I decided to build out a toolchain using LaTeX – the preferred typesetting tool for academic papers. This blog post documents how I approached this, and serves as a reference both for myself and others who might want to adopt a similar toolchain. In summary, the process was:

• Define overall goals
• Install dependencies such as pandoc, LaTeX, and Zotero with the BetterBibtex extension for citations
• Experiment with pandoc on the command line for generating PDF from Markdown via LaTeX

Goals of an academic writing workflow

In approaching this exercise, I had several key goals:

• create an academic writing workflow using my preferred Atom IDE on Linux;
• that could easily be set up and replicated on other machines if needed;
• which would allow me to use my preferred citation manager, Zotero;
• and which would allow me to use my preferred writing format, Markdown;
• and which would support the LaTeX templates available for journals (and from the ANU for formatting theses).

Why not use Overleaf?

For those who’ve worked with LaTeX before, one of the key questions you might have is “why not just use a platform like Overleaf for your academic writing workflow, and not worry about setting up a local environment?”. Overleaf is a cloud LaTeX service that provides a collaborative editing environment and a range of LaTeX templates. However, it’s not free – plans range from USD 12-35 per month. Overleaf is based on free and open source software, and then adds a proprietary collaboration layer over the top – it abstracts away complexity – and this is what you pay a monthly fee for.

In principle, I have no issue with cloud platforms adding value to FOSS, and then charging for that value, but I doubt any of the profits from Overleaf are going to folks like John MacFarlane – the author of pandoc – or Donald E. Knuth and Leslie Lamport, who contributed the early work on TeX and LaTeX respectively. I felt like I owed it to these folx to “dig a little under the hood” and learn some of that complexity instead of outsourcing it away.

So, to the process …

Installing pandoc

Pandoc is a free and open source tool for converting documents between different formats. It’s widely used in academia, and used extensively in publishing and writing workflows. Pandoc is written by John MacFarlane, who is a philosophy professor at UC Berkeley.

One of the other things that John is less known for is Lucida Navajo, a font for the Navajo language, a Native American language of the Southern Athabascan family. Although it’s based on the Latin script, it contains a number of diacritical marks not found in other Latin-script languages.

Pandoc is available for all platforms, but because my goal here was to develop a workflow on Ubuntu Linux, I’ve only shown installation instructions for that platform.

To install pandoc, use the following command:

$ sudo apt install pandoc

I also had to make sure that the pandoc-citeproc tool was installed; it’s not installed by default as part of pandoc itself. Again, this was a simple command:

$ sudo apt install pandoc-citeproc

Installing Zotero and the BetterBibtex extension

My next challenge was to figure out how to do citations with pandoc. In the pandoc documentation, there is a whole section on citations, and this provided some pointers. This blog post from Chris Krycho was also useful in figuring out what to use.

This involved installing the BetterBibtex extension for Zotero (which I already had installed). You can find installation instructions for the BetterBibtex extension here. It has to be downloaded as a file and then added through Zotero (not through Firefox).

BibTex is a citation standard, supported by most academic journals and referencing tools. Google Scholar exports to BibTex.

Once installed, BetterBibtex updates each Zotero reference with a citation key that can then be used as an “at-reference” in pandoc – ie @strengersSmartWifeWhy2020. Updating citation keys can take a few minutes – I have several thousand references stored in Zotero and it took about 6 minutes to ensure that each reference had a BibTex citation key.
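In a Markdown draft, using that key looks something like this – the sentence itself is just an illustration:

Smart speakers are often marketed as domestic helpers [@strengersSmartWifeWhy2020].

When the document is run through pandoc with citation processing enabled, the key is replaced with a formatted citation and the work is added to the bibliography.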

In order to use BetterBibtex, I had to make sure that the export format for Zotero was set to Better BibTex, and that Zotero updated the export on change.

Installing LaTeX

Next, I needed to install LaTeX for Linux. This was installed via the texlive package:

$ sudo apt install texlive

Based on this Stack Overflow error message I got early on in testing, I also installed the texlive-latex-extra package:

$ sudo apt-get install texlive-latex-extra

Zotero citations package for Atom

Next, I needed to configure the Atom IDE to work with Zotero and BetterBibtex. This involved installing several Atom packages, including the Zotero citations package.

Once these were installed, I was ready to start experimenting with a workflow.

Experimenting with a workflow

To start with, I used a basic workflow to go from a Markdown-formatted text file to PDF. pandoc converts this to LaTeX as an interim step, and then to PDF, using the inbuilt templates from pandoc. This first step was a very basic attempt to go from Markdown to PDF, and was designed to “shake out” any issues with the software installation.

The pandoc command line options I used to start with were:

$ pandoc -f markdown \
writing.md \
-o writing.pdf

In this example, I’m telling pandoc to expect markdown-styled input, to use the file writing.md as the input file and to write to the output file writing.pdf. pandoc infers that the output file is PDF-formatted from the .pdf extension.

Next, I wanted to include citations. First, I exported my Zotero citations to a .bib-formatted file, using the BetterBibtex extension. I stored this in a directory called bibliography in the same directory as my writing.md file. The command line options I used here were:

$ pandoc -f markdown \
--filter=pandoc-citeproc \
--bibliography=bibliography/blog-post-citations.bib \
writing.md \
-o writing.pdf

Note here the two additional options – the --filter option used to invoke pandoc-citeproc, and the --bibliography option to include a BibTex-formatted file. This worked well, and generated a plainly formatted PDF (based on the pandoc default template).

Using a yaml file for metadata

Becoming more advanced with pandoc, I decided to experiment with including a yaml file to help generate the document. The yaml file can specify metadata such as author, date and so on, which can then be substituted into the PDF file – if the intermediary LaTeX template accommodates these values. The basic LaTeX template included with pandoc includes values for author, title, date and abstract. Here’s the yaml file I used for this example:

---
author: Kathy Reid
title: blog post on pandoc
date: December 2020
abstract: |
  This is the abstract.
...

Note that the yaml file must start with three dashes (---) and end with three periods (...).

The pandoc command line options I used to include metadata were:

$ pandoc -f markdown+yaml_metadata_block \
--filter=pandoc-citeproc \
--bibliography=bibliography/blog-post-citations.bib \
writing.md metadata.yml \
-o writing.pdf

Note here that in the -f switch an additional option for yaml_metadata_block is given, and that the yaml file is listed after the first input file, writing.md. By adding the metadata.yml file in the command line, pandoc considers them both to be input files.

By using the yaml file, this automatically appended author, title, date and abstract information to the resulting PDF.

I also found that I could control the margins and paper size of the resulting PDF file by controlling these in the yaml file.

---
author: Kathy Reid
title: blog post on pandoc
date: December 2020
abstract: |
  This is the abstract.
fontsize: 12pt
papersize: a4
margin-top: 25mm
margin-right: 25mm
margin-bottom: 25mm
margin-left: 25mm
...

It took a little while to get the hang of working with yaml, and I wanted a way to be able to inspect the output of the pandoc process. To do this, I added a switch to the command line option, and also piped the output of the command to a logging file.

$ pandoc -f markdown+yaml_metadata_block \
--verbose \
--filter=pandoc-citeproc \
--bibliography=bibliography/blog-post-citations.bib \
writing.md metadata.yml \
-o writing.pdf > pandoc-log.txt

The --verbose switch tells pandoc to use verbose logging, and the logging is piped to pandoc-log.txt. If the output wasn’t piped to a file, it would appear on the screen as stdout, and because it’s verbose, it’s hard to read – it’s much easier to pipe it to a file and inspect it there.

Working with other LaTeX templates

Now that I had a Markdown to PDF via LaTeX workflow working reasonably well, it was time to experiment with other templates. Many publications provide a LaTeX template for submission, such as these from the ACM, and ideally I wanted to be able to go from Markdown to a journal template using pandoc. I’d come across other blog posts where similar goals had been attempted, but this proved significantly harder to implement than I’d anticipated.

My first attempt entailed trying to replicate the work Daniel Graziotin had done here – but I ran into several issues. After copying over the table-filter.py file from the blog post and an ACM .cls file, copying the ACM pdf file to default.pdf in my pandoc-data directory, and running the command below, I got the following error.

$ pandoc -f markdown+yaml_metadata_block \
--verbose \
--data-dir=pandoc-data \
--variable documentclass=acmart \
--variable classname=acmlarge \
--filter=pandoc-citeproc \
--filter=table-filter.py \
--bibliography=bibliography/blog-post-citations.bib \
writing.md metadata.yml \
-o writing.pdf

Error running filter table-filter.py:
Could not find executable python

My first thought was that python somehow was aliased to an older version of python – ie python 2. To verify this I ran:

$ which python

This didn’t return anything, which explained the error. From this, I assumed that pandoc was expecting the python alias to resolve to the current Python. I didn’t know how to change this – for example, by changing the pandoc preferences to point to the right Python binary. Instead, I created a symlink so that pandoc could find python3.

$ pwd
/usr/bin
$ ls | grep python
python3
python3.8
python3.8-config
python3-config
python3-futurize
python3-pasteurize
x86_64-linux-gnu-python3.8-config
x86_64-linux-gnu-python3-config
$ sudo ln -s python3 python

Installing pandocfilters package for Python

I attempted to run the pandoc command again but ran into another error.

Traceback (most recent call last):
  File "table-filter.py", line 6, in <module>
    import pandocfilters as pf
ModuleNotFoundError: No module named 'pandocfilters'
Error running filter table-filter.py:
Filter returned error status 1

My guess here was that the python module pandocfilters had not been installed via pip. I installed this through pip.

$ pip3 install pandocfilters
Collecting pandocfilters
  Downloading pandocfilters-1.4.3.tar.gz (16 kB)
Building wheels for collected packages: pandocfilters
  Building wheel for pandocfilters (setup.py) ... done
  Created wheel for pandocfilters: filename=pandocfilters-1.4.3-py3-none-any.whl size=7991 sha256=3c4445092ee0c8b00e2eab814ad69ca91d691d2567c12adbc4bcc4fb82928701
  Stored in directory: /home/kathyreid/.cache/pip/wheels/fc/39/52/8d6f3cec1cca4ceb44d658427c35711b19d89dbc4914af657f
Successfully built pandocfilters
Installing collected packages: pandocfilters
Successfully installed pandocfilters-1.4.3

This again swapped one error for another.

Error producing PDF.
! LaTeX Error: Missing \begin{document}.

See the LaTeX manual or LaTeX Companion for explanation.
Type H for immediate help.
...
l.55 u

Luckily, I had set --verbose, and piped the output to a log file. I went digging through the log file to see if I could find anything useful.

Class acmart Warning: You do not have the libertine package installed. Please upgrade your TeX on input line 669.
Class acmart Warning: You do not have the zi4 package installed. Please upgrade your TeX on input line 672.
Class acmart Warning: You do not have the newtxmath package installed. Please upgrade your TeX on input line 675.

After reading through this Stack Overflow article, these all looked like font issues. I used the solution given in the Stack Overflow article, which was to install another Ubuntu package:

$ sudo apt-get install texlive-fonts-extra

Again, this swapped one error message for another.

! LaTeX Error: Command `\Bbbk' already defined.

See the LaTeX manual or LaTeX Companion for explanation.
Type H for immediate help.
...
l.261 ...ol{\Bbbk} {\mathord}{AMSb}{"7C}

! ==> Fatal error occurred, no output PDF file produced!
Transcript written on ./tex2pdf.-bf3aef739e05d883/input.log.
Error producing PDF.

Another Google search, another solution from Stack Overflow, which suggested removing the LaTeX line \usepackage{amssymb} from the template. The challenge was, I wasn’t sure where the LaTeX template for pandoc was stored. Reading through this Stack Overflow post, it looked like the default LaTeX template is stored at:

~/.pandoc/templates

But on my system, this directory didn’t exist. I created it and stored pandoc’s default LaTeX template in there. Then I was able to remove the line:

\usepackage{amssymb,amsmath}
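For reference, pandoc can print its built-in LaTeX template with the -D (--print-default-template) option, which is roughly how I populated that directory before editing it:

$ mkdir -p ~/.pandoc/templates
$ pandoc -D latex > ~/.pandoc/templates/default.latex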

Removing that line then resulted in yet another error which required Googling.

! LaTeX Error: Missing \begin{document}.

See the LaTeX manual or LaTeX Companion for explanation.
Type H for immediate help.
...
l.54 u
! ==> Fatal error occurred, no output PDF file produced!
Looking through the log file, this was generated by the parskip.sty package. More Googling revealed another incompatibility with the ACM class. I removed the parskip package from the LaTeX default template, but was continually met with similar errors.

On the plus side, though, I learned a lot about pandoc, how to reference with it, and how to produce simple LaTeX documents with this workflow.