Designing for discovery

Why most OJS journals are invisible to Google Scholar — and what fixes it.

OJS · JATS · ACADEMIC PUBLISHING · OPEN JOURNAL SYSTEMS · EDITORIAL DESIGN

Most academic presses obsess over citation counts. The bottleneck for most journals isn’t the quality of the scholarship. It’s whether the article can be found at all.

Editors learn this the hard way. A press commissions a redesign, the new site looks beautiful, and six months later the article pages aren’t appearing in Google Scholar. The redesign broke the only thing that mattered.

This is almost always treated as an SEO problem. It almost never is.

Scholar doesn’t read your homepage

Google Scholar is not Google. It has no use for marketing pages, and when it is left to guess what an article is from layout, it guesses badly. It looks for two things on an article page: a clean machine-readable identification of the article, and a clean machine-readable list of its references.

When both are present, the article enters the index. When either is broken, the article is invisible — regardless of how the page looks to a human reader.

The two surfaces that decide indexing:

The JATS-derived metadata in the article HTML — title, authors, abstract, DOI, publication date, citation list, all emitted as <meta name="citation_*"> tags in the page head.
The schema.org ScholarlyArticle block — the same information again, in JSON-LD that any structured-data consumer (Scholar, Crossref, Bing, ChatGPT browsing) can parse.

These are not branding decisions. They are the indexing surface.
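
As a rough illustration of the first surface, here is a minimal Python sketch that renders citation_* tags from an article record. The record layout, the function name, and the placeholder DOI are assumptions made for the example, citation_reference is assumed as the tag name for reference entries, and none of this is OJS's own code.

```python
from html import escape


def citation_meta_tags(article: dict) -> str:
    """Render citation_* meta tags for one article's <head>."""
    pairs = [("citation_title", article["title"])]
    # One citation_author tag per author, in byline order.
    pairs += [("citation_author", name) for name in article["authors"]]
    pairs += [
        ("citation_publication_date", article["date"]),
        ("citation_doi", article["doi"]),
    ]
    # One citation_reference tag per cited work.
    pairs += [("citation_reference", ref) for ref in article.get("references", [])]
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


# Hypothetical article record; the DOI is a placeholder, not a real one.
example = {
    "title": "Maps & Margins",
    "authors": ["Ada Okafor", "Jun Lee"],
    "date": "2024/03/01",
    "doi": "10.1234/example.2024.001",
    "references": ["Lee, J. (2019). Field Notes."],
}
head_html = citation_meta_tags(example)
print(head_html)
```

Escaping the content attribute matters here: an unescaped ampersand or quote in a title is enough to make a tag malformed.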

JATS is the source of truth

In a JATS-based OJS workflow, every article has a JATS XML file behind it. The XML carries the article’s structure (title, authors, affiliations, abstract, body, references) in a format that grew out of the tag suite NCBI designed in the early 2000s specifically for scholarly indexing and typesetting.

When OJS renders an article page, it transforms that JATS XML into HTML. Most custom themes break this transform in one of three ways: they inject layout markup that displaces the citation_* meta tags Scholar expects, they reorganise the body in ways that drop the reference list, or they silently swallow italics and smart quotes, which changes the meaning of a citation.
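
To make the failure modes concrete, the following Python sketch pulls the indexing-relevant fields out of a tiny JATS fragment. The sample document and the extract function are hypothetical; only the element names (article-meta, article-title, contrib, ref-list, mixed-citation) come from the JATS tag set. It is not the transform OJS runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS document; the DOI is a placeholder.
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.2024.001</article-id>
      <title-group>
        <article-title>Maps and <italic>Margins</italic></article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Okafor</surname><given-names>Ada</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref id="r1"><mixed-citation>Lee, J. (2019). <italic>Field Notes</italic>.</mixed-citation></ref>
    </ref-list>
  </back>
</article>"""


def flat_text(element) -> str:
    """Join all nested text and collapse whitespace, so inline italics are kept."""
    return " ".join("".join(element.itertext()).split())


def extract(jats_xml: str) -> dict:
    root = ET.fromstring(jats_xml)
    meta = root.find("front/article-meta")
    authors = []
    for contrib in meta.iterfind("contrib-group/contrib[@contrib-type='author']"):
        given = contrib.findtext("name/given-names", default="")
        surname = contrib.findtext("name/surname", default="")
        authors.append(f"{given} {surname}".strip())
    return {
        "title": flat_text(meta.find("title-group/article-title")),
        "doi": meta.findtext("article-id[@pub-id-type='doi']"),
        "authors": authors,
        "references": [flat_text(ref) for ref in root.iterfind("back/ref-list/ref")],
    }


record = extract(JATS_SAMPLE)
print(record)
```

Reading the title with itertext rather than the element's .text attribute is the point of the example: .text alone would stop at the first inline tag and return only "Maps and ", which is exactly the kind of silent loss described above.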

The first thing we check on any OJS theme audit is the article page’s <head>. If citation_title, citation_author, citation_doi, and citation_publication_date aren’t present and well-formed, the journal is invisible. No amount of “SEO optimisation” downstream fixes this — the upstream JATS-to-HTML transform is wrong.
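
A check along these lines can be scripted. The sketch below, in Python with only the standard library, reports which of the four tags named above are missing or empty in a page's HTML. The class and function names are hypothetical, "well-formed" is reduced here to "present with non-empty content", and the scan covers the whole document rather than only the head.

```python
from html.parser import HTMLParser

REQUIRED = (
    "citation_title",
    "citation_author",
    "citation_doi",
    "citation_publication_date",
)


class CitationMetaCollector(HTMLParser):
    """Collect every <meta name="citation_*"> tag that has non-empty content."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = (attr.get("content") or "").strip()
        if name.startswith("citation_") and content:
            self.found.setdefault(name, []).append(content)


def missing_citation_tags(page_html: str) -> list[str]:
    """Return the required tag names that are absent or empty, in REQUIRED order."""
    collector = CitationMetaCollector()
    collector.feed(page_html)
    return [name for name in REQUIRED if name not in collector.found]


# Hypothetical page: the DOI tag is present but blank, the date tag is absent.
sample_page = (
    "<html><head>"
    '<meta name="citation_title" content="Maps and Margins">'
    '<meta name="citation_author" content="Ada Okafor">'
    '<meta name="citation_doi" content=" ">'
    "</head><body></body></html>"
)
print(missing_citation_tags(sample_page))
```

Running this against a sample of article URLs before and after a theme change is a cheap way to catch a broken head before Scholar does.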

Schema.org is the bridge

The second surface is ScholarlyArticle JSON-LD. It tells Scholar, Crossref, and the open web: this is an article, here’s its DOI, here are its authors, here are its citations.

A correct block should include:

@type: "ScholarlyArticle" with headline, datePublished, author, isPartOf linking the journal.
identifier for the DOI as a PropertyValue with propertyID: "doi".
citation array linking each referenced work by DOI where one exists.
abstract and keywords pulled from the JATS metadata.

Most OJS themes ship none of this. The default theme has partial coverage; most custom themes break what’s there. Adding it back is a structural decision, not a plugin install — the JATS-to-HTML transformer is the only place that knows the article’s full structure, and that’s where the JSON-LD should be emitted.
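
The fields listed above can be sketched as a small Python builder. The input record, the function names, and the placeholder DOIs are assumptions for illustration, and typing the journal as a schema.org Periodical is one reasonable choice rather than the only one.

```python
import json


def scholarly_article(article: dict) -> dict:
    """Build a schema.org ScholarlyArticle block from an article record."""
    citations = []
    for ref in article.get("references", []):
        entry = {"@type": "CreativeWork", "name": ref["text"]}
        if ref.get("doi"):  # link by DOI only where one exists
            entry["@id"] = f"https://doi.org/{ref['doi']}"
        citations.append(entry)
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": article["title"],
        "datePublished": article["date"],
        "author": [{"@type": "Person", "name": n} for n in article["authors"]],
        "isPartOf": {"@type": "Periodical", "name": article["journal"]},
        "identifier": {
            "@type": "PropertyValue",
            "propertyID": "doi",
            "value": article["doi"],
        },
        "abstract": article["abstract"],
        "keywords": article["keywords"],
        "citation": citations,
    }


def jsonld_script(data: dict) -> str:
    """Wrap the block in a script tag, escaping "</" so text cannot close it early."""
    body = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical record; journal name and DOIs are placeholders.
example = {
    "title": "Maps and Margins",
    "date": "2024-03-01",
    "authors": ["Ada Okafor"],
    "journal": "Example Journal of Cartography",
    "doi": "10.1234/example.2024.001",
    "abstract": "A short abstract.",
    "keywords": ["cartography", "borders"],
    "references": [
        {"text": "Lee, J. (2019). Field Notes.", "doi": "10.1234/cited.2019.007"},
        {"text": "Anon. (1887). Untitled pamphlet."},
    ],
}
block = scholarly_article(example)
print(jsonld_script(block))
```

Building the block from the same record that feeds the citation_* tags keeps the two surfaces from drifting apart.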

The sitemap nobody updates

The third failure mode is mundane: sitemap.xml. Depending on the OJS version and theme, the sitemap a journal actually serves may list issue pages without listing every article page directly. Custom themes routinely override the sitemap template without restoring the article URLs. The result is that Scholar crawls the homepage, finds links to a few recent issues, and stops.

A journal sitemap should list every article page — with lastmod dates from the article’s own publication metadata. Anything less and Scholar under-indexes, especially for older volumes that no longer link from the homepage navigation.
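
A minimal sketch of such a sitemap, in Python with the standard library, follows the sitemaps.org urlset format. The article list, the example.org URLs, and the function name are hypothetical; a real implementation would read URLs and dates from the journal's own records.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(articles: list[dict]) -> str:
    """Emit one <url> entry per article, with lastmod taken from its publication date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for article in articles:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = article["url"]
        # lastmod uses the W3C date format, e.g. YYYY-MM-DD.
        ET.SubElement(url, "lastmod").text = article["published"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical articles, including one from an older volume.
articles = [
    {"url": "https://journal.example.org/article/view/101", "published": "2019-03-01"},
    {"url": "https://journal.example.org/article/view/245", "published": "2024-03-01"},
]
sitemap_xml = build_sitemap(articles)
print(sitemap_xml)
```

Because every article gets its own entry, an older volume stays reachable even after it drops out of the homepage navigation.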

Structure first, surface after

Discovery in scholarly publishing isn’t a marketing problem. It’s a structural problem solved at the JATS-to-HTML layer. The citation tags, the schema.org block, the sitemap — these decide whether an article is read at all.

When a press hires us to redesign a journal, we audit the indexing surface before we look at typography or layout. The most beautiful article page in the world is invisible if Scholar can’t read it. The reverse — a plain page with correct structure — is read for decades.

Design for indexing. Document the metadata layer. Then redesign.
