This is an auxiliary Python script for parsing and checking text files. Its main purpose is to verify Markdown links between text files. For example, Docusaurus, the static site generator that we use for documentation, detects a link to a nonexistent file, such as [a link to a file](non-existing-path-to-file). However, it does not detect a link to a nonexistent heading inside an existing file: [link to a paragraph](existing-file#non-existing-heading). So when we change a heading, we might break links to it in other files. This script is intended to catch such cases.
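To make the heading problem concrete, here is a minimal sketch of how a link to a missing heading could be detected. It is not the script's actual implementation: the names `slugify`, `heading_anchors`, and `broken_anchor_links` are hypothetical, and the slug rule (lowercase, drop punctuation, spaces to hyphens) only approximates how Docusaurus derives heading anchors.

```python
import re

# Matches inline Markdown links such as [text](target).
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
# Matches ATX headings such as "## Some heading".
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def slugify(heading: str) -> str:
    """Approximate a heading anchor: lowercase, drop punctuation, spaces to hyphens."""
    cleaned = re.sub(r"[^\w\s-]", "", heading.lower())
    return re.sub(r"\s+", "-", cleaned.strip())


def heading_anchors(text: str) -> set[str]:
    """Collect the (approximate) anchors of all headings in a Markdown text."""
    return {slugify(title) for title in HEADING_RE.findall(text)}


def broken_anchor_links(source: str, targets: dict[str, str]) -> list[str]:
    """Return link targets whose file is known but whose heading anchor is not.

    `targets` maps a file path (as written in the link) to that file's content.
    """
    broken = []
    for _text, target in LINK_RE.findall(source):
        path, _, anchor = target.partition("#")
        if anchor and path in targets and anchor not in heading_anchors(targets[path]):
            broken.append(target)
    return broken
```

With a target file that contains the heading "Install the tool", a link to `guide.md#install-the-tool` passes, while a link to `guide.md#installation` is reported as broken.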
The script is configurable. You can either define the configuration parameters in a config.ini file or pass them as arguments when running the script.
The configuration must have a section [MAIN] and the following parameters:
- root is the root directory where the files are scanned. Note that the script works with all files of the given extension in the root recursively. The root can be an absolute path or a relative path.
- file_extension is the extension of files that are being parsed. Accepts only one value with a leading dot, for example .md. You can use any text extension that can be opened with the default Python open function.
- action is the intended action of the script. Accepts one of the following values:
  - check_links parses all files with a given extension in a given root and checks all Markdown links. If there are broken links, it outputs them in a stream specified in the output parameter.
  - search asks for a non-empty string to search, parses all files with the given extension in the given root, and searches the string. Then outputs the result in a stream specified in the output parameter.
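The recursive scan and the search action described above could look roughly like the following. This is a sketch under assumptions, not the script's real code: the function names `collect_files` and `search` and the shape of the returned results are hypothetical, and files are assumed to be UTF-8.

```python
from pathlib import Path


def collect_files(root: str, file_extension: str) -> list[Path]:
    """Return all files under `root` (recursively) whose name ends with `file_extension`."""
    return sorted(p for p in Path(root).rglob(f"*{file_extension}") if p.is_file())


def search(root: str, file_extension: str, needle: str) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for every line that contains `needle`."""
    if not needle:
        raise ValueError("the search string must not be empty")
    hits = []
    for path in collect_files(root, file_extension):
        # Assumption: files are UTF-8 encoded text.
        with open(path, encoding="utf-8") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path, number, line.rstrip("\n")))
    return hits
```

Because the glob pattern must match the whole file name, an extension of .md does not pick up .mdx files.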
If config.ini has no [MAIN] section, the default configuration is used.
If one or more required parameters are missing from the [MAIN] section of config.ini, their default values are used.
If the path specified in the root parameter is not found, is not a directory, or is an empty string, the default value is used.
If action has a value that is not accepted, the default value is used.
All other sections and parameters are ignored.
Default configuration:
[MAIN]
root: .
file_extension: .mdx
action: check_links
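The fallback rules above can be sketched with the standard configparser module. This is an illustration only: `load_config`, `DEFAULTS`, and `ACTIONS` are hypothetical names, and the real script may handle these cases differently.

```python
import configparser
from pathlib import Path

# Default values taken from the default configuration shown above.
DEFAULTS = {"root": ".", "file_extension": ".mdx", "action": "check_links"}
ACTIONS = {"check_links", "search"}


def load_config(text: str) -> dict[str, str]:
    """Parse config.ini content and apply the documented fallbacks."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = dict(DEFAULTS)
    # Only the [MAIN] section is read; other sections are ignored.
    if parser.has_section("MAIN"):
        for key in DEFAULTS:
            settings[key] = parser.get("MAIN", key, fallback=DEFAULTS[key])
    # An empty, missing, or non-directory root falls back to the default.
    if not settings["root"] or not Path(settings["root"]).is_dir():
        settings["root"] = DEFAULTS["root"]
    # An action that is not accepted falls back to the default.
    if settings["action"] not in ACTIONS:
        settings["action"] = DEFAULTS["action"]
    return settings
```

For example, a [MAIN] section that sets only `file_extension: .md` would yield the default root and action together with the .md extension.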
Note that if you pass configuration arguments, the config file is ignored.
All arguments are optional. If you don't pass an argument, its default value is used. You can use the following configuration arguments on the command line:
- -r or --root to define the root directory
- -f or --file to define the file extension
- -a or --action to define the action. The argument accepts check_links or search values only
Run python3 main.py -h to see the help in the console.
Example of using arguments:
python3 main.py -r ../../docs -f .md -a check_links
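For reference, the three arguments could be declared with argparse along these lines. This is a sketch, not the script's actual parser: `build_parser` is a hypothetical name, and restricting --action with `choices` is an assumption (it makes argparse reject an unknown action with an error instead of silently falling back to a default).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the -r/--root, -f/--file, and -a/--action arguments."""
    parser = argparse.ArgumentParser(
        description="Check Markdown links or search text files."
    )
    parser.add_argument("-r", "--root", default=".",
                        help="root directory to scan recursively")
    parser.add_argument("-f", "--file", default=".mdx",
                        help="file extension with a leading dot")
    # Assumption: unknown actions are rejected rather than replaced by the default.
    parser.add_argument("-a", "--action", default="check_links",
                        choices=["check_links", "search"],
                        help="action to perform")
    return parser
```

argparse adds the -h and --help options automatically, so no extra code is needed for the help output.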
TBD.
You can run the script as a GitHub Action, for example, when someone creates a new PR.
- Create a yaml file in the .github/workflows directory.
- Copy the following snippet into the yaml file:
name: Check Markdown links
on:
  pull_request:
    branches: [ "main", "test", "develop" ]
  workflow_dispatch:
jobs:
  links-checker:
    name: links-checker
    runs-on: ubuntu-latest
    steps:
      - name: Normal checkout
        uses: actions/checkout@v3
      - name: Check-out parser repository
        uses: actions/checkout@v2
        with:
          repository: paveltovchigrechko/markdown-parser
          path: "markdown-parser"
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
      - name: Run parser
        run: python3 markdown-parser/main.py