Design of cls to capture publication metadata

Post by mccurley

I am working on a new cls for a specific journal in mathematics and computer science. It is intended to be a diamond open access journal, so it is imperative that we minimize the human time spent on editing tasks. One of the major requirements for a new journal is conformance with the various metadata export formats, including (but not limited to) those of Google Scholar, DataCite, DOAJ, DBLP, Crossref, and Scopus. This metadata includes:
1. Title and abstract.
2. A list of authors, with surname, email, ORCID ID, URLs, and affiliations.
3. A list of affiliations, with ROR ID, city, country, etc.
4. The mapping from authors to affiliations.
5. A list of bibliographic citations (including DOIs when possible).

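Since ORCID IDs are among the fields indexers now require, a production script can reject a mistyped one before any human looks at the submission. Below is a minimal Python sketch of the ORCID check-character test (ISO 7064 MOD 11-2). The function names are hypothetical, and it assumes the iD has already been extracted from whatever metadata file the class writes.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Return the ORCID check character for the first 15 digits (ISO 7064 MOD 11-2)."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Accept 'NNNN-NNNN-NNNN-NNNC', optionally prefixed by https://orcid.org/."""
    # Authors often paste the full URL form, so strip it before checking the shape.
    orcid = orcid.removeprefix("https://orcid.org/")
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```

A check like this catches transposed or mistyped digits, but not a valid iD that belongs to the wrong person; that still needs a lookup against ORCID itself.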
Historically, LaTeX has mostly been designed to display this information rather than to capture it in any machine-readable format. I intend to separate the capture of metadata from its presentation in the output PDF or HTML/MathML.

We can of course require the metadata to be in the LaTeX files, but parsing LaTeX with a high-level language is problematic, since only TeX itself can reliably expand arbitrary user-defined macros. It is also possible to embed some metadata in a PDF file, but basic fields such as pdftitle are inadequate, and so is the XMP format for XML metadata (in part because every consumer of metadata tends to specify its own schema). For example, many indexing agencies now require ORCID IDs for authors, but these are not part of any standard XMP schema (the same is true of ROR IDs for institutions).

In setting up our journal production workflow, we plan to have authors submit their LaTeX sources and compile them in the cloud on our server (running LaTeX inside Docker is a whole can of worms in itself). ACM is doing something similar (as is arXiv), but their workflow is mostly proprietary, and their LaTeX style files don't seem to capture the metadata adequately at submission time.

We have therefore taken a slightly different approach. We have implemented a cls for our journal that requires the metadata to be properly structured in the LaTeX file, and when the author runs pdflatex on it, it produces a YAML file containing all of the metadata we need. As a result, we require one \author command per author and one \affiliation command per affiliation. The mapping between authors and affiliations is many-to-many, and it is provided as an argument to the \author macro (funding agencies are handled the same way). One thing still missing from our solution is capturing metadata about citations: ACM has authors upload their bbl files, but it is not clear whether these even include ORCID IDs from the BibTeX files.
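Because the author-to-affiliation mapping is many-to-many and given as keys in the \author argument, a mistyped key would otherwise only surface when an indexer rejects the record. Here is a minimal Python sketch of a server-side consistency check. It assumes the YAML file has already been loaded into dictionaries (for instance with a YAML library). The field names `key`, `name`, and `affiliations`, the function `check_affiliations`, and the rule that every declared affiliation must be used are all hypothetical, not the schema of the class described above.

```python
from collections import defaultdict

# Hypothetical shape of the metadata after loading the YAML file written by the class.
# Field names are illustrative assumptions, not an actual journal schema.
metadata = {
    "affiliations": [
        {"key": "uniA", "name": "University A", "country": "US"},
        {"key": "labB", "name": "Lab B", "country": "DE"},
    ],
    "authors": [
        {"name": "Ada Author", "affiliations": ["uniA", "labB"]},
        {"name": "Bo Coauthor", "affiliations": ["labB"]},
    ],
}


def check_affiliations(meta: dict) -> dict[str, list[str]]:
    """Check author-to-affiliation references and return the inverse mapping.

    Raises ValueError if an author cites an undeclared affiliation key, or if a
    declared affiliation is never referenced by any author (an assumed rule).
    """
    declared = {aff["key"] for aff in meta["affiliations"]}
    members: defaultdict[str, list[str]] = defaultdict(list)
    for author in meta["authors"]:
        for key in author["affiliations"]:
            if key not in declared:
                raise ValueError(f"{author['name']}: undeclared affiliation {key!r}")
            members[key].append(author["name"])
    unused = declared - set(members)
    if unused:
        raise ValueError(f"affiliations never referenced: {sorted(unused)}")
    return dict(members)


print(check_affiliations(metadata))
# {'uniA': ['Ada Author'], 'labB': ['Ada Author', 'Bo Coauthor']}
```

Running the check on the server keeps the class itself simple: it only has to write out what the author declared, and the validation rules can change without asking authors to update their style files.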

I have surveyed quite a few LaTeX journal and conference formats, but have found none that adequately captures the metadata that publications will require in the future. The one that probably comes closest is the relatively new acmart.cls from ACM; another that comes close is Elsevier's elsarticle.cls. Moreover, the most common open access publishing platform appears to be OJS (Open Journal Systems), which has essentially no support for LaTeX at all.

That was a long introduction, but it raises several questions:
1. Have I overlooked any other packages that adequately capture modern metadata about a journal article? Packages like authblk are out of date at this point.
2. The situation for bst files is even worse: is it possible to produce a machine-parsable structured format from a bst file? Has anyone done that?
3. Do others agree that a cls other journals can use for this purpose is overdue?
4. Are there any publication editing workflow products like OJS that work well with LaTeX? I suspect this is a major barrier to wider adoption of open access publishing, because the editing process is currently quite expensive (e.g., $30/page).
