A desktop GUI tool that reformats .docx manuscripts into journal-specific submission formats. Select your manuscript, pick a target journal format, and save the formatted output — no manual reformatting needed.
Currently supports MDPI and Elsevier formats. New formats can be added by dropping a single Python file into the formats/ directory.
Preparing a manuscript for journal submission means hours of tedious reformatting: adjusting fonts, margins, heading styles, table borders, reference formatting, and more. Each journal has its own template and requirements. If your paper gets rejected and you resubmit elsewhere, you start the formatting over again.
This tool automates that. It reads your manuscript once, classifies every element (title, abstract, headings, tables, figures, equations, references), and rebuilds the document in the target journal's format — complete with correct typography, spacing, three-line tables, and placeholder fields for authors and affiliations.
- Two journal formats — MDPI and Elsevier, with more coming
- Word-recognizable captions — Table and Figure captions use SEQ fields with bookmarks. Word's cross-references, "Insert Table of Figures", and auto-numbering all work. Named styles (
MDPI41tablecaption, etc.) are registered in the document. - Smart reference resolution — Three-tier pipeline: extract DOIs from reference text (regex), match against RIS records (local), look up unmatched via CrossRef API (free, no auth). Works regardless of input citation style.
- MDPI reference formatting —
N.+ TAB numbering (not[N]), italic journal names, en-dash page ranges, structured runs matching published MDPI papers - RIS bibliography import — Upload a
.risfile from Zotero, Mendeley, or any reference manager. The tool cross-checks hardwritten citations against RIS metadata by DOI, author names, year, and title. - CrossRef API lookup — For references without a
.rismatch, optionally look them up online via the free CrossRef API. Works by DOI or bibliographic text search. - Zotero field code embedding — Optionally wrap references in Zotero
ADDIN ZOTERO_ITEM CSL_CITATIONfield codes withw:dirtyflag and fullw:rPron every run, so Zotero can recognize and manage them after formatting - Plugin architecture — Drop a new
.pyfile informats/to add a journal format
| Format | Font | Page Layout | Reference Style |
|---|---|---|---|
| MDPI | Palatino Linotype 10pt | A4, asymmetric margins (2.5/1.6/1.27/1.27 cm) | Numbered [N] — Lastname, F.M.; ... Title. Journal Year, Vol, Pages. |
| Elsevier | Times New Roman 12pt | A4, 2.5cm uniform margins, 1.5x line spacing | Preserved from source (hanging indent, 11pt) |
Both formats produce three-line tables (top border, header-bottom border, table-bottom border) and preserve inline formatting (bold, italic, superscript, subscript) from your source document.
- Python 3.9+
- A
.docxmanuscript with standard heading styles (Heading 1, Heading 2, Heading 3)
git clone https://github.com/g-pachakis/journal_formatting.git
cd journal_formatting
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
pip install -r requirements.txtpython manuscript_formatter.pyA window opens with these controls:
- Manuscript (.docx) — Select your manuscript file
- Bibliography (.ris) — Optionally load a
.risfile for reference matching - Format — Choose MDPI or Elsevier
- Options:
- "Embed Zotero field codes" — wraps references in Zotero-compatible field codes
- "Look up unmatched references via CrossRef" — uses the free CrossRef API for references not matched by RIS
- Format Manuscript — Processes and opens a Save As dialog
If you manage references with Zotero, Mendeley, EndNote, or any citation manager:
- Export your library as a
.risfile from your reference manager - Load the
.risfile in the tool alongside your manuscript - The tool matches each hardwritten reference (e.g.,
[1] Shannon, C.E. ...) against the RIS metadata using author names, year, title, and DOI - Matched references are reformatted in the target journal's style with correct author formatting, punctuation, and DOI links
- Unmatched references are preserved as-is from your manuscript
When the "Embed Zotero field codes" option is checked:
- Each matched reference is wrapped in a
ADDIN ZOTERO_ITEM CSL_CITATIONWord field code - The field contains complete CSL JSON bibliographic metadata
- When you open the output in Word with Zotero installed, Zotero can recognize and manage the citations
- You can then use Zotero's "Refresh" to update the bibliography or switch citation styles
This is useful when you want to continue editing the formatted manuscript while keeping Zotero citation management active.
manuscript.docx ─────┐
v
┌─────────┐ Classifies every paragraph, table, and element
│ reader.py│ into semantic types (heading, abstract, etc.)
└────┬─────┘ Returns structured data — not raw XML
│
refs.ris ────┐ │
v v
┌──────────────────┐
│ ris_parser.py │ Matches [N] citations to RIS records
│ citation_fmt.py │ Reformats matched refs in journal style
└────────┬─────────┘
│
v
┌──────────────────┐
│ formats/*.py │ Builds .docx with journal-specific styles,
│ caption_fields │ SEQ field captions, optional Zotero codes
└────────┬─────────┘
│
v
formatted_output.docx
| Element | Detection Method |
|---|---|
| Title | First paragraph before any heading |
| Abstract | Paragraph(s) after "Abstract" heading |
| Keywords | Text starting with "Keywords:" |
| Headings (H1/H2/H3) | Word heading styles |
| Body paragraphs | Default classification |
| Table captions | Text matching "Table N." |
| Tables | Full cell data with merged cell support |
| Table footnotes | Text starting with * |
| Figure captions | Text matching "Figure N" |
| Equations | Text containing "(Eq. N)" |
| References | [N] pattern or Bibliography style |
MDPI uses a numbered citation style. References are formatted in the MDPIBibliography style (hanging indent, 10pt) with N. + TAB numbering — matching published MDPI papers.
When metadata is available (via .ris or CrossRef), references are automatically formatted as:
Journal article:
1. Shannon, C.E.; Weaver, W. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Book:
2. Smith, J.D. Introduction to Chemical Engineering, 3rd; Wiley: New York, 2020.
Book chapter:
3. Jones, R.A. Absorption in Packed Columns. In Handbook of Chemical Engineering; Smith, J.D., Ed.; McGraw-Hill: New York, 2019; pp. 145–198.
Key formatting details:
- Author names:
Lastname, F.M.with semicolons between authors - Journal names are italic (run-level formatting, not just text)
- Page ranges use en-dash (–) not hyphen (-)
- DOIs as full URLs
N.+ TAB numbering (not[N]brackets) matching the MDPI standard- Hanging indent via the
MDPIBibliographynamed Word style
For best results, your source .docx should use:
- Heading 1 style for main sections (Introduction, Methods, Results, etc.)
- Heading 2 / Heading 3 for subsections
- Standard Word tables (not images of tables)
Keywords:on its own paragraph- References as individual paragraphs starting with
[1],[2], etc.
The tool handles numbered headings (e.g., "1. Introduction") and unnumbered ones equally well.
Create a file in formats/ — for example formats/springer.py:
"""Springer Nature format plugin."""
import re
from docx import Document as DocxDocument
from docx.shared import Pt, Cm
from docx.enum.text import WD_ALIGN_PARAGRAPH
from caption_fields import add_caption_with_seq
FORMAT_NAME = 'Springer' # Shown in the GUI
FORMAT_SUFFIX = '_Springer' # Default output filename suffix
def build(items, output_path, ris_data=None, zotero_enabled=False):
"""Build formatted document from reader items.
Args:
items: List of dicts from reader.read_manuscript()
output_path: Where to save the .docx
ris_data: Optional list of RIS records (from ris_parser.parse_ris)
zotero_enabled: Whether to embed Zotero field codes
Returns:
output_path
"""
doc = DocxDocument()
for item in items:
if item['type'] == 'table_caption':
# Use SEQ field for Word-recognizable caption
import re as _re
m = _re.match(r'^Table\s+(\d+)\.?\s*(.*)', item['text'])
if m:
para = doc.add_paragraph()
add_caption_with_seq(para, 'Table', m.group(1),
description=m.group(2))
elif item['type'] == 'paragraph':
para = doc.add_paragraph()
for run_data in item['runs']:
run = para.add_run(run_data['text'])
run.bold = run_data['bold']
run.italic = run_data['italic']
# ... handle other types
doc.save(output_path)
return output_pathSave the file, restart the tool, and your new format appears automatically.
Paragraph items (type != 'table'):
{
'type': str, # Content classification
'text': str, # Full plain text
'runs': [ # Formatted text segments
{
'text': str,
'bold': bool,
'italic': bool,
'superscript': bool,
'subscript': bool,
}
]
}Table items (type == 'table'):
{
'type': 'table',
'text': '',
'runs': [],
'rows': [ # List of rows, each a list of cells
[
{
'text': str,
'gridspan': int, # Column span (1 = normal)
'runs': [{ ... }], # Same format as above
'vmerge_continue': bool, # True = vertically merged continuation
}
]
]
}journal_formatting/
├── manuscript_formatter.py # Tkinter GUI
├── reader.py # Manuscript reader & classifier
├── ris_parser.py # RIS bibliography file parser
├── citation_formatter.py # MDPI reference formatter (structured runs)
├── caption_fields.py # Word SEQ captions, bookmarks, Zotero fields, named styles
├── crossref_client.py # CrossRef API client (DOI lookup + text search)
├── reference_engine.py # Reference resolution pipeline orchestrator
├── formats/
│ ├── __init__.py # Plugin auto-discovery
│ ├── elsevier.py # Elsevier format builder
│ └── mdpi.py # MDPI format builder (v2 — named styles, reference engine)
├── tests/ # 66 tests
│ ├── conftest.py # Test fixtures (sample manuscripts)
│ ├── test_reader.py # Reader tests (14)
│ ├── test_registry.py # Plugin registry tests (4)
│ ├── test_elsevier.py # Elsevier builder tests (8)
│ ├── test_mdpi.py # MDPI builder tests (10)
│ ├── test_integration.py # End-to-end pipeline tests (4)
│ ├── test_ris_parser.py # RIS parser & formatter tests (12)
│ ├── test_caption_fields.py # Caption & Zotero field tests (4)
│ ├── test_crossref.py # CrossRef client tests (5)
│ └── test_reference_engine.py # Reference pipeline tests (5)
├── requirements.txt
└── .gitignore
# Activate venv first, then:
python -m pytest tests/ -v66 tests cover the reader, plugin registry, both format builders, RIS parsing, citation formatting (with italic runs), caption fields (with bookmarks), CrossRef client, reference resolution engine, Zotero integration, and full pipeline.
| Package | Purpose |
|---|---|
| python-docx | Read and write .docx files |
| lxml | XML manipulation for table borders, SEQ fields, and Zotero field codes |
| pytest | Testing (dev only) |
Tkinter is included with Python's standard library. No external citation libraries needed — RIS parsing and formatting are built in.
- Images are not transferred — the tool processes text, tables, and structure. Figures need to be re-inserted manually.
- Complex equations (MathType, Equation Editor objects) are read as text only.
- Custom styles in your source document may not be detected — use standard Word heading styles for best results.
- Multi-column layouts are not applied (both MDPI and Elsevier submission formats use single-column).
- RIS matching uses author names, year, and title fragments — unusual author name formats may not match perfectly.
- Zotero field codes require Zotero's Word plugin to be functional after opening the document.
MIT
This tool was built to solve a real pain point in academic publishing. If you use it, your feedback would be genuinely valuable:
As a researcher:
- Which journal formats would you like to see added next? (Springer, IEEE, ACS, RSC, Wiley, ...)
- Does the output match what you'd expect from your target journal's template?
- Are there manuscript elements the tool misses or misclassifies?
- How well does the RIS matching work with your reference library?
- Would you prefer a command-line interface in addition to the GUI?
As a developer:
- See a bug or edge case? Open an issue.
- Want to contribute a new format plugin? PRs are welcome — the plugin API is documented above.
- Have ideas for the architecture? The reader/plugin split is designed to be extended.
- Want to add a new citation style? See
citation_formatter.pyfor the pattern.
Get in touch: Open an issue on GitHub or submit a pull request. All contributions — bug reports, format requests, code, documentation — are appreciated.