Journal Formatting Tool

A desktop GUI tool that reformats .docx manuscripts into journal-specific submission formats. Select your manuscript, pick a target journal format, and save the formatted output — no manual reformatting needed.

Currently supports MDPI and Elsevier formats. New formats can be added by dropping a single Python file into the formats/ directory.

Why This Exists

Preparing a manuscript for journal submission means hours of tedious reformatting: adjusting fonts, margins, heading styles, table borders, reference formatting, and more. Each journal has its own template and requirements. If your paper gets rejected and you resubmit elsewhere, you start the formatting over again.

This tool automates that. It reads your manuscript once, classifies every element (title, abstract, headings, tables, figures, equations, references), and rebuilds the document in the target journal's format — complete with correct typography, spacing, three-line tables, and placeholder fields for authors and affiliations.

Key Features

Two journal formats — MDPI and Elsevier, with more coming
Word-recognizable captions — Table and Figure captions use SEQ fields with bookmarks. Word's cross-references, "Insert Table of Figures", and auto-numbering all work. Named styles (MDPI41tablecaption, etc.) are registered in the document.
Smart reference resolution — Three-tier pipeline: extract DOIs from reference text (regex), match against RIS records (local), look up unmatched via CrossRef API (free, no auth). Works regardless of input citation style.
MDPI reference formatting — N. + TAB numbering (not [N]), italic journal names, en-dash page ranges, structured runs matching published MDPI papers
RIS bibliography import — Upload a .ris file exported from Zotero, Mendeley, or any other reference manager. The tool cross-checks hand-typed references against the RIS metadata by DOI, author names, year, and title.
CrossRef API lookup — For references without a .ris match, optionally look them up online via the free CrossRef API. Works by DOI or bibliographic text search.
Zotero field code embedding — Optionally wrap references in Zotero ADDIN ZOTERO_ITEM CSL_CITATION field codes with w:dirty flag and full w:rPr on every run, so Zotero can recognize and manage them after formatting
Plugin architecture — Drop a new .py file in formats/ to add a journal format
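
The first tier of the reference pipeline listed above (regex DOI extraction) might look roughly like the following. This is a minimal sketch: the pattern and the `extract_doi` name are assumptions made for illustration, not the tool's actual code.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)

def extract_doi(reference_text):
    """Return the first DOI found in a reference string, or None."""
    match = DOI_PATTERN.search(reference_text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip('.,;)')

ref = ('[1] Shannon, C.E. A Mathematical Theory of Communication. '
       'https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.')
print(extract_doi(ref))
```

References for which this returns None would then fall through to the RIS and CrossRef tiers.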

Supported Formats

Format	Font	Page Layout	Reference Style
MDPI	Palatino Linotype 10pt	A4, asymmetric margins (2.5/1.6/1.27/1.27 cm)	Numbered `N.` + TAB: `Lastname, F.M.; ... Title. Journal Year, Vol, Pages.`
Elsevier	Times New Roman 12pt	A4, 2.5cm uniform margins, 1.5x line spacing	Preserved from source (hanging indent, 11pt)

Both formats produce three-line tables (top border, header-bottom border, table-bottom border) and preserve inline formatting (bold, italic, superscript, subscript) from your source document.
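
As a rough illustration of the three-line rule, the sketch below decides which cell edges carry a border for each row and builds the matching `w:tcBorders` XML with the standard library. The function names and the 1pt rule width are assumptions. A real builder would attach such elements to cells through python-docx and lxml instead.

```python
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
ET.register_namespace('w', W_NS)

def row_borders(row_index, row_count):
    """Return the edges that carry a rule for one row of a three-line table."""
    edges = []
    if row_index == 0:
        edges += ['top', 'bottom']      # table top rule and header rule
    if row_index == row_count - 1 and 'bottom' not in edges:
        edges.append('bottom')          # table bottom rule
    return edges

def tc_borders_element(edges, size_eighths_pt=8):
    """Build a w:tcBorders element with single-line rules on the given edges."""
    borders = ET.Element(f'{{{W_NS}}}tcBorders')
    for edge in edges:
        ET.SubElement(borders, f'{{{W_NS}}}{edge}', {
            f'{{{W_NS}}}val': 'single',
            f'{{{W_NS}}}sz': str(size_eighths_pt),   # w:sz is in eighths of a point
        })
    return borders

# Header row of a 3-row table: top and bottom rules.
print(ET.tostring(tc_borders_element(row_borders(0, 3)), encoding='unicode'))
```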

Quick Start

Requirements

Python 3.9+
A .docx manuscript with standard heading styles (Heading 1, Heading 2, Heading 3)

Installation

git clone https://github.com/g-pachakis/journal_formatting.git
cd journal_formatting
python -m venv venv

# Windows
venv\Scripts\activate

# macOS / Linux
source venv/bin/activate

pip install -r requirements.txt

Run

python manuscript_formatter.py

A window opens with these controls:

Manuscript (.docx) — Select your manuscript file
Bibliography (.ris) — Optionally load a .ris file for reference matching
Format — Choose MDPI or Elsevier
Options:
- "Embed Zotero field codes" — wraps references in Zotero-compatible field codes
- "Look up unmatched references via CrossRef" — uses the free CrossRef API for references not matched by RIS
Format Manuscript — Processes and opens a Save As dialog
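
For the CrossRef option, the two lookup modes (by DOI and by bibliographic text) correspond to two request shapes on the public CrossRef REST API. The sketch below only builds the URLs and performs no network call. The `crossref_url` helper is hypothetical and is not taken from crossref_client.py.

```python
from urllib.parse import quote, urlencode

CROSSREF_WORKS = 'https://api.crossref.org/works'

def crossref_url(doi=None, reference_text=None):
    """Build a CrossRef REST API URL for a DOI lookup or a text search."""
    if doi:
        # Direct lookup: /works/{doi}
        return f'{CROSSREF_WORKS}/{quote(doi, safe="/")}'
    if reference_text:
        # Bibliographic search, asking only for the best-scoring match.
        params = {'query.bibliographic': reference_text, 'rows': 1}
        return f'{CROSSREF_WORKS}?{urlencode(params)}'
    raise ValueError('Provide a DOI or reference text')

print(crossref_url(doi='10.1002/j.1538-7305.1948.tb01338.x'))
print(crossref_url(reference_text='Shannon 1948'))
```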

RIS Bibliography Workflow

If you manage references with Zotero, Mendeley, EndNote, or any citation manager:

Export your library as a .ris file from your reference manager
Load the .ris file in the tool alongside your manuscript
The tool matches each hand-typed reference (e.g., [1] Shannon, C.E. ...) against the RIS metadata using author names, year, title, and DOI
Matched references are reformatted in the target journal's style with correct author formatting, punctuation, and DOI links
Unmatched references are preserved as-is from your manuscript
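
The matching step above can be sketched as follows. This is a simplified illustration under stated assumptions: the `parse_ris` here is a stand-in written for this example (the real ris_parser.py may return a different structure), and `match_reference` checks only the DOI, then first-author surname plus year.

```python
import re

RIS_LINE = re.compile(r'^([A-Z][A-Z0-9])  - ?(.*)$')

def parse_ris(text):
    """Parse RIS text into a list of {tag: [values]} dicts, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        m = RIS_LINE.match(line.rstrip())
        if not m:
            continue
        tag, value = m.groups()
        if tag == 'ER':                       # end of record
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    return records

def match_reference(reference_text, records):
    """Return the first record matching by DOI, else by surname and year."""
    lowered = reference_text.lower()
    for rec in records:
        doi = rec.get('DO', [''])[0].lower()
        if doi and doi in lowered:
            return rec
        surname = rec.get('AU', [''])[0].split(',')[0].strip().lower()
        year = rec.get('PY', [''])[0][:4]
        if surname and year and surname in lowered and year in reference_text:
            return rec
    return None                               # caller keeps the reference as-is

sample = """TY  - JOUR
AU  - Shannon, C.E.
TI  - A Mathematical Theory of Communication
PY  - 1948
DO  - 10.1002/j.1538-7305.1948.tb01338.x
ER  -
"""
records = parse_ris(sample)
hit = match_reference('[1] Shannon, C.E. A Mathematical Theory ... 1948.', records)
print(hit['TI'][0])
```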

Zotero Field Codes

When the "Embed Zotero field codes" option is checked:

Each matched reference is wrapped in an ADDIN ZOTERO_ITEM CSL_CITATION Word field code
The field contains complete CSL JSON bibliographic metadata
When you open the output in Word with Zotero installed, Zotero can recognize and manage the citations
You can then use Zotero's "Refresh" to update the bibliography or switch citation styles

This is useful when you want to continue editing the formatted manuscript while keeping Zotero citation management active.
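
To make the field contents concrete, here is a hedged sketch of how the instruction text of such a field could be assembled. The payload layout (citationID, properties, citationItems, schema) reflects what Zotero's Word plugin is generally understood to write and should be treated as an assumption. The `zotero_field_instruction` helper is hypothetical.

```python
import json

FIELD_PREFIX = 'ADDIN ZOTERO_ITEM CSL_CITATION '
CSL_SCHEMA = ('https://github.com/citation-style-language/schema/'
              'raw/master/csl-citation.json')

def zotero_field_instruction(citation_id, item_data, formatted):
    """Build the instruction text for one Zotero citation field."""
    payload = {
        'citationID': citation_id,
        'properties': {
            'formattedCitation': formatted,
            'plainCitation': formatted,
            'noteIndex': 0,
        },
        # itemData carries the CSL JSON bibliographic metadata.
        'citationItems': [{'id': item_data['id'], 'itemData': item_data}],
        'schema': CSL_SCHEMA,
    }
    return FIELD_PREFIX + json.dumps(payload)

item = {
    'id': 'shannon1948',
    'type': 'article-journal',
    'title': 'A Mathematical Theory of Communication',
    'issued': {'date-parts': [[1948]]},
}
instr = zotero_field_instruction('cite1', item, '[1]')
print(instr[:60])
```

In the document itself, this string would sit inside a complex field (begin, instruction, separate, result, end runs). That part is omitted here.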

How It Works

manuscript.docx ──────┐
                      v
                 ┌──────────┐     Classifies every paragraph, table, and element
                 │ reader.py│     into semantic types (heading, abstract, etc.)
                 └────┬─────┘     Returns structured data, not raw XML
                      │
refs.ris ────┐        │
             v        v
        ┌───────────────────────┐
        │ ris_parser.py         │     Matches [N] citations to RIS records
        │ citation_formatter.py │     Reformats matched refs in journal style
        └────────┬──────────────┘
                 │
                 v
        ┌──────────────────┐
        │  formats/*.py    │     Builds .docx with journal-specific styles,
        │  caption_fields  │     SEQ field captions, optional Zotero codes
        └────────┬─────────┘
                 │
                 v
          formatted_output.docx

What the Reader Detects

Element	Detection Method
Title	First paragraph before any heading
Abstract	Paragraph(s) after "Abstract" heading
Keywords	Text starting with "Keywords:"
Headings (H1/H2/H3)	Word heading styles
Body paragraphs	Default classification
Table captions	Text matching "Table N."
Tables	Full cell data with merged cell support
Table footnotes	Text starting with `*`
Figure captions	Text matching "Figure N"
Equations	Text containing "(Eq. N)"
References	`[N]` pattern or Bibliography style
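
The text-pattern rows of this table can be approximated with a small rule list, as in the sketch below. Only 'table_caption' and 'paragraph' are type names that appear elsewhere in this README. The other type names, the rule order, and the `classify` function are assumptions. Title and abstract detection depend on position in the document and are left out.

```python
import re

# (item type, pattern) pairs, checked in order.
RULES = [
    ('keywords',       re.compile(r'^Keywords:', re.IGNORECASE)),
    ('table_caption',  re.compile(r'^Table\s+\d+\.')),
    ('figure_caption', re.compile(r'^Figure\s+\d+')),
    ('table_footnote', re.compile(r'^\*')),
    ('reference',      re.compile(r'^\[\d+\]')),
    ('equation',       re.compile(r'\(Eq\.\s*\d+\)')),
]

def classify(text, style_name=''):
    """Classify one paragraph by Word style first, then by text pattern."""
    heading = re.match(r'^Heading ([123])$', style_name)
    if heading:
        return 'heading' + heading.group(1)
    if style_name == 'Bibliography':
        return 'reference'
    for item_type, pattern in RULES:
        if pattern.search(text):
            return item_type
    return 'paragraph'                 # default classification

print(classify('Table 1. Operating conditions'))
print(classify('1. Introduction', style_name='Heading 1'))
```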

MDPI Reference Format

MDPI uses a numbered citation style. References are formatted in the MDPIBibliography style (hanging indent, 10pt) with N. + TAB numbering — matching published MDPI papers.

When metadata is available (via .ris or CrossRef), references are automatically formatted as:

Journal article:

1.	Shannon, C.E.; Weaver, W. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x

Book:

2.	Smith, J.D. Introduction to Chemical Engineering, 3rd ed.; Wiley: New York, 2020.

Book chapter:

3.	Jones, R.A. Absorption in Packed Columns. In Handbook of Chemical Engineering; Smith, J.D., Ed.; McGraw-Hill: New York, 2019; pp. 145–198.

Key formatting details:

Author names: Lastname, F.M. with semicolons between authors
Journal names are italic (run-level formatting, not just text)
Page ranges use en-dash (–) not hyphen (-)
DOIs as full URLs
N. + TAB numbering (not [N] brackets) matching the MDPI standard
Hanging indent via the MDPIBibliography named Word style
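
The journal-article pattern and the details above can be sketched as a function that returns (text, italic) runs, so the journal name can be made italic at run level. The signature is hypothetical and citation_formatter.py may be organized differently. Authors are assumed to arrive already in "Lastname, F.M." form.

```python
def format_mdpi_article(number, authors, title, journal, year, volume,
                        pages, doi=None):
    """Return one MDPI-style journal reference as a list of (text, italic) runs."""
    # Replace a hyphenated page range with an en-dash range.
    first, _, last = pages.partition('-')
    page_range = f'{first}\u2013{last}' if last else first

    author_text = '; '.join(authors)
    if not author_text.endswith('.'):
        author_text += '.'

    runs = [
        (f'{number}.\t', False),                      # N. + TAB numbering
        (f'{author_text} {title}. ', False),
        (journal, True),                              # italic journal name
        (f' {year}, {volume}, {page_range}.', False),
    ]
    if doi:
        runs.append((f' https://doi.org/{doi}', False))
    return runs

runs = format_mdpi_article(
    1, ['Shannon, C.E.', 'Weaver, W.'],
    'A Mathematical Theory of Communication',
    'Bell Syst. Tech. J.', 1948, 27, '379-423',
    doi='10.1002/j.1538-7305.1948.tb01338.x')
print(''.join(text for text, _ in runs))
```

A format builder would then emit one Word run per tuple and set `run.italic` from the second element.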

Manuscript Preparation Tips

For best results, your source .docx should use:

Heading 1 style for main sections (Introduction, Methods, Results, etc.)
Heading 2 / Heading 3 for subsections
Standard Word tables (not images of tables)
Keywords: on its own paragraph
References as individual paragraphs starting with [1], [2], etc.

The tool handles numbered headings (e.g., "1. Introduction") and unnumbered ones equally well.

Adding a New Journal Format

Create a file in formats/ — for example formats/springer.py:

"""Springer Nature format plugin."""

import re
from docx import Document as DocxDocument
from docx.shared import Pt, Cm
from docx.enum.text import WD_ALIGN_PARAGRAPH
from caption_fields import add_caption_with_seq

FORMAT_NAME = 'Springer'       # Shown in the GUI
FORMAT_SUFFIX = '_Springer'    # Default output filename suffix

def build(items, output_path, ris_data=None, zotero_enabled=False):
    """Build formatted document from reader items.

    Args:
        items: List of dicts from reader.read_manuscript()
        output_path: Where to save the .docx
        ris_data: Optional list of RIS records (from ris_parser.parse_ris)
        zotero_enabled: Whether to embed Zotero field codes

    Returns:
        output_path
    """
    doc = DocxDocument()

    for item in items:
        if item['type'] == 'table_caption':
            # Use SEQ field for Word-recognizable caption
            m = re.match(r'^Table\s+(\d+)\.?\s*(.*)', item['text'])
            if m:
                para = doc.add_paragraph()
                add_caption_with_seq(para, 'Table', m.group(1),
                                     description=m.group(2))
        elif item['type'] == 'paragraph':
            para = doc.add_paragraph()
            for run_data in item['runs']:
                run = para.add_run(run_data['text'])
                run.bold = run_data['bold']
                run.italic = run_data['italic']
        # ... handle other types

    doc.save(output_path)
    return output_path

Save the file, restart the tool, and your new format appears automatically.
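
Auto-discovery of this kind is commonly built on `pkgutil` and `importlib`. The sketch below shows one way it could work. It is not the contents of formats/__init__.py, and both function names are assumptions.

```python
import importlib
import pkgutil

def is_format_plugin(module):
    """A plugin must expose FORMAT_NAME, FORMAT_SUFFIX and a callable build()."""
    return (isinstance(getattr(module, 'FORMAT_NAME', None), str)
            and isinstance(getattr(module, 'FORMAT_SUFFIX', None), str)
            and callable(getattr(module, 'build', None)))

def discover_formats(package):
    """Import every module in a package and return {FORMAT_NAME: module}."""
    registry = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f'{package.__name__}.{info.name}')
        if is_format_plugin(module):
            registry[module.FORMAT_NAME] = module
    return registry
```

With the project layout shown in this README, usage would be along the lines of `import formats; registry = discover_formats(formats)`, after which the GUI could list `registry.keys()` in its format selector.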

Item Schema Reference

Paragraph items (type != 'table'):

{
    'type': str,        # Content classification
    'text': str,        # Full plain text
    'runs': [           # Formatted text segments
        {
            'text': str,
            'bold': bool,
            'italic': bool,
            'superscript': bool,
            'subscript': bool,
        }
    ]
}

Table items (type == 'table'):

{
    'type': 'table',
    'text': '',
    'runs': [],
    'rows': [           # List of rows, each a list of cells
        [
            {
                'text': str,
                'gridspan': int,            # Column span (1 = normal)
                'runs': [{ ... }],          # Same format as above
                'vmerge_continue': bool,    # True = vertically merged continuation
            }
        ]
    ]
}
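
To show how a plugin might consume the table schema, the sketch below expands a table item into a rectangular grid of cell texts, repeating text across horizontal spans and copying it down through vertical merges. `table_to_grid` is a hypothetical helper, and repeating merged text is just one possible policy.

```python
def table_to_grid(table_item):
    """Expand a table item into a rectangular grid of cell texts."""
    grid = []
    for row in table_item['rows']:
        cells = []
        for cell in row:
            col = len(cells)
            if cell.get('vmerge_continue') and grid and col < len(grid[-1]):
                # Continuation of a vertical merge: reuse the text from above.
                text = grid[-1][col]
            else:
                text = cell['text']
            # A cell spanning N columns occupies N grid positions.
            cells.extend([text] * cell.get('gridspan', 1))
        grid.append(cells)
    return grid

def cell(text, gridspan=1, vmerge_continue=False):
    """Small constructor for schema-shaped cells, used only in this example."""
    return {'text': text, 'gridspan': gridspan, 'runs': [],
            'vmerge_continue': vmerge_continue}

table = {'type': 'table', 'text': '', 'runs': [], 'rows': [
    [cell('Group'), cell('Values', gridspan=2)],
    [cell('', vmerge_continue=True), cell('a'), cell('b')],
]}
grid = table_to_grid(table)
print(grid)
```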

Project Structure

journal_formatting/
├── manuscript_formatter.py     # Tkinter GUI
├── reader.py                   # Manuscript reader & classifier
├── ris_parser.py               # RIS bibliography file parser
├── citation_formatter.py       # MDPI reference formatter (structured runs)
├── caption_fields.py           # Word SEQ captions, bookmarks, Zotero fields, named styles
├── crossref_client.py          # CrossRef API client (DOI lookup + text search)
├── reference_engine.py         # Reference resolution pipeline orchestrator
├── formats/
│   ├── __init__.py             # Plugin auto-discovery
│   ├── elsevier.py             # Elsevier format builder
│   └── mdpi.py                 # MDPI format builder (v2 — named styles, reference engine)
├── tests/                      # 66 tests
│   ├── conftest.py             # Test fixtures (sample manuscripts)
│   ├── test_reader.py          # Reader tests (14)
│   ├── test_registry.py        # Plugin registry tests (4)
│   ├── test_elsevier.py        # Elsevier builder tests (8)
│   ├── test_mdpi.py            # MDPI builder tests (10)
│   ├── test_integration.py     # End-to-end pipeline tests (4)
│   ├── test_ris_parser.py      # RIS parser & formatter tests (12)
│   ├── test_caption_fields.py  # Caption & Zotero field tests (4)
│   ├── test_crossref.py        # CrossRef client tests (5)
│   └── test_reference_engine.py # Reference pipeline tests (5)
├── requirements.txt
└── .gitignore

Running Tests

# Activate venv first, then:
python -m pytest tests/ -v

66 tests cover the reader, plugin registry, both format builders, RIS parsing, citation formatting (with italic runs), caption fields (with bookmarks), CrossRef client, reference resolution engine, Zotero integration, and full pipeline.

Dependencies

Package	Purpose
python-docx	Read and write `.docx` files
lxml	XML manipulation for table borders, SEQ fields, and Zotero field codes
pytest	Testing (dev only)

Tkinter is included with Python's standard library. No external citation libraries needed — RIS parsing and formatting are built in.

Limitations

Images are not transferred — the tool processes text, tables, and structure. Figures need to be re-inserted manually.
Complex equations (MathType, Equation Editor objects) are read as text only.
Custom styles in your source document may not be detected — use standard Word heading styles for best results.
Multi-column layouts are not applied (both MDPI and Elsevier submission formats use single-column).
RIS matching uses author names, year, and title fragments — unusual author name formats may not match perfectly.
Zotero field codes only become active when the document is opened in Word with Zotero's Word plugin installed.

License

MIT

Feedback & Contributions

This tool was built to solve a real pain point in academic publishing. If you use it, your feedback would be genuinely valuable:

As a researcher:

Which journal formats would you like to see added next? (Springer, IEEE, ACS, RSC, Wiley, ...)
Does the output match what you'd expect from your target journal's template?
Are there manuscript elements the tool misses or misclassifies?
How well does the RIS matching work with your reference library?
Would you prefer a command-line interface in addition to the GUI?

As a developer:

See a bug or edge case? Open an issue.
Want to contribute a new format plugin? PRs are welcome — the plugin API is documented above.
Have ideas for the architecture? The reader/plugin split is designed to be extended.
Want to add a new citation style? See citation_formatter.py for the pattern.

Get in touch: Open an issue on GitHub or submit a pull request. All contributions — bug reports, format requests, code, documentation — are appreciated.
