4 changes: 2 additions & 2 deletions README.md
@@ -2,14 +2,14 @@

This repository comprises plugins which allow extending the functionality of [ONLYOFFICE DocSpace](https://www.onlyoffice.com/docspace.aspx).

* [ZIP Archives](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/archives)
* [Codemirror](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/codemirror)
* [ConvertToMarkdown](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/convert-to-markdown)
* [draw.io](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/draw.io)
* [Markdown](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/markdown)
* [ImageEditor](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/image-editor)
* [PDFConverter](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/pdf-converter)
* [SpeechToText](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/speech-to-text)
* [ZIP Archives](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/archives)
* [ConvertToMarkdown](https://github.com/ONLYOFFICE/docspace-plugins/tree/master/convert-to-markdown)

## User feedback and support

12 changes: 6 additions & 6 deletions convert-to-markdown/README.md
@@ -1,13 +1,13 @@
# Markdown Converter - ONLYOFFICE DocSpace Plugin

Convert DOCX and TXT files to Markdown format directly in DocSpace with a single click.
Convert DOCX, HTML and TXT files to Markdown format directly in DocSpace with a single click.

## Features

- **Easy Conversion**: Right-click any DOCX or TXT file and select "Convert to Markdown"
- **Easy Conversion**: Right-click any DOCX, HTML or TXT file and select "Convert to Markdown"
- **Client-Side Processing**: All conversions happen in your browser - no external services required
- **Supported Formats**:
- `.docx` - Microsoft Word documents
- `.docx` - Word documents
- `.txt` - Plain text files
- `.html` - HTML files
- **Fast & Reliable**: Uses industry-standard libraries (mammoth.js for DOCX parsing, Turndown for HTML-to-Markdown conversion)
@@ -16,7 +16,7 @@ Convert DOCX and TXT files to Markdown format directly in DocSpace with a single
## How to Use

1. Navigate to a folder in DocSpace
2. Right-click on a DOCX, TXT, or HTML file
2. Right-click on a DOCX, HTML or TXT file
3. Select **"Convert to Markdown"** from the context menu
4. The converted `.md` file will be created in the same folder

@@ -27,7 +27,7 @@ Convert DOCX and TXT files to Markdown format directly in DocSpace with a single

- **DOCX Files**: Parsed using [mammoth.js](https://github.com/mwilliamson/mammoth.js) to extract content as HTML, then converted to Markdown
- **TXT Files**: Converted directly to Markdown format
- **HTML Files**: Converted directly to Markdown format
- **HTML Files**: Converted directly to Markdown format using [Turndown](https://github.com/mixmark-io/turndown)

### Technologies Used

@@ -43,7 +43,7 @@ Convert DOCX and TXT files to Markdown format directly in DocSpace with a single
## Troubleshooting

### Conversion Failed
- Check that the file is a valid DOCX, TXT or HTML file
- Check that the file is a valid DOCX, HTML or TXT file
- Ensure you have permission to create files in the folder
- Try refreshing the page and attempting again

66 changes: 51 additions & 15 deletions convert-to-markdown/package-lock.json

Some generated files are not rendered by default.

10 changes: 6 additions & 4 deletions convert-to-markdown/package.json
@@ -3,7 +3,6 @@
"version": "1.0.0",
"description": "Convert DOCX, TXT and HTML files to Markdown format directly in DocSpace",
"license": "Apache-2.0",
"homepage": "https://github.com/ONLYOFFICE/docspace-plugins/tree/master/convert-to-markdown",
"author": "ONLYOFFICE",
"maintainers": [
"Ascensio System SIA <integration@onlyoffice.com> (https://www.onlyoffice.com)"
@@ -13,6 +12,7 @@
},
"pluginName": "MarkdownConverter",
"logo": "icon-md.svg",
"homepage": "",
"main": "index.ts",
"private": true,
"scopes": [
@@ -28,12 +28,14 @@
"prettier": "2.8.6",
"ts-loader": "^9.3.1",
"typescript": "^4.7.4",
"webpack": "^5.105.0",
"webpack": "^5.74.0",
"webpack-cli": "^4.10.0"
},
"dependencies": {
"@onlyoffice/docspace-plugin-sdk": "^2.0.0",
"@truto/turndown-plugin-gfm": "^1.0.2",
"mammoth": "^1.11.0",
"turndown": "^7.2.2"
}
"turndown": "^7.2.2",
"turndown-plugin-gfm": "^1.0.2"
},
}
87 changes: 81 additions & 6 deletions convert-to-markdown/src/ConvertFile.ts
@@ -20,10 +20,38 @@ import {
ToastType,
} from "@onlyoffice/docspace-plugin-sdk";
import TurndownService from "turndown";
import { tables } from "turndown-plugin-gfm";
import mammoth from "mammoth";

import plugin from ".";

// Patch mammoth's internal Element.prototype.text, which throws "Not implemented"
// when an element has multiple or non-text children.
// Both convertToHtml and extractRawText hit the same reader path, so patching
// at the source is the only reliable fix for the issue.
const patchMammothNodes = (): void => {
  try {
    // eslint-disable-next-line @typescript-eslint/no-var-requires
    const nodes = require("mammoth/lib/xml/nodes");
    const collectText = (children: any[]): string =>
      children
        .map((child: any) => {
          if (child.type === "text") return child.value;
          if (Array.isArray(child.children)) return collectText(child.children);
          return "";
        })
        .join("");

    nodes.Element.prototype.text = function (): string {
      if (this.children.length === 0) return "";
      return collectText(this.children);
    };
  } catch (_) {
    // If patching fails, mammoth will still work for documents that don't hit the bug.
  }
};
patchMammothNodes();

// Supported file extensions
const SUPPORTED_EXTENSIONS = {
docx: ".docx",
@@ -37,12 +65,52 @@ class ConvertFile {
  private apiURL = "";
  private createLock = false;

  private turndownService = new TurndownService({
    headingStyle: "atx",
    codeBlockStyle: "fenced",
    emDelimiter: "*",
    bulletListMarker: "-",
  });
  private turndownService = (() => {
    const service = new TurndownService({
      headingStyle: "atx",
      codeBlockStyle: "fenced",
      emDelimiter: "*",
      bulletListMarker: "-",
    });
    service.use(tables);

    // Handle tables that have no <thead> (all rows in <tbody>)
    // The GFM tables plugin only produces pipe-tables when it finds <th> / <thead>
    // so we need a separate rule that treats the first <tr> as the header
    service.addRule("tableWithoutHeader", {
      filter: (node: HTMLElement): boolean => {
        return node.nodeName === "TABLE" && !node.querySelector("thead");
      },
      replacement: (_content: string, node: Node): string => {
        const el = node as HTMLElement;
        const rows = Array.from(el.querySelectorAll("tr"));
        if (rows.length === 0) return _content;

        const getCells = (row: Element): string[] =>
          Array.from(row.querySelectorAll("td, th")).map((cell) =>
            (cell.textContent || "")
              .trim()
              .replace(/\s+/g, " ")
              .replace(/\|/g, "\\|")
          );

        const allRows = rows.map(getCells);
        const header = allRows[0];
        const separator = header.map(() => "---");
        const body = allRows.slice(1);

        const fmt = (cells: string[]): string => `| ${cells.join(" | ")} |`;

        return (
          "\n\n" +
          [fmt(header), fmt(separator), ...body.map(fmt)].join("\n") +
          "\n\n"
        );
      },
    });

    return service;
  })();

  private createAPIUrl = (): void => {
    const api = plugin.getAPI();
@@ -123,6 +191,13 @@ class ConvertFile {
if (!response.ok) {
throw new Error(`Failed to download file: ${response.statusText}`);
}

const contentType = response.headers.get("Content-Type") || "";
if (contentType.includes("application/pdf")) {
throw new Error(
"This file is watermark-protected. Disable the \"Add watermarks to documents\" room setting and try again."
);
}

const buffer = await response.arrayBuffer();
if (!buffer || buffer.byteLength === 0) {
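
Review note: the text-collection logic installed by `patchMammothNodes` above can be tried in isolation. The sketch below is a minimal illustration only. The `TextNode`, `ElementNode`, and `XmlNode` types are hypothetical stand-ins, since mammoth's internal node types are not part of its public API, and the sample tree is invented for the example.

```typescript
// Hypothetical node shapes for illustration; these are NOT mammoth's real types.
interface TextNode {
  type: "text";
  value: string;
}

interface ElementNode {
  type: "element";
  children: XmlNode[];
}

type XmlNode = TextNode | ElementNode;

// Concatenate the value of every descendant text node, depth-first,
// instead of throwing when an element has several or non-text children.
function collectText(children: XmlNode[]): string {
  return children
    .map((child) =>
      child.type === "text" ? child.value : collectText(child.children)
    )
    .join("");
}

// Invented sample: text, a nested element with an empty child, more text.
const sample: XmlNode[] = [
  { type: "text", value: "Hello, " },
  {
    type: "element",
    children: [
      { type: "text", value: "nested" },
      { type: "element", children: [] },
    ],
  },
  { type: "text", value: " world" },
];

console.log(collectText(sample)); // "Hello, nested world"
```

Unlike the patch in the diff, this sketch uses a discriminated union rather than `any`, so the compiler checks that only text nodes are read for `value`.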
8 changes: 8 additions & 0 deletions convert-to-markdown/src/turndown-plugin-gfm.d.ts
@@ -0,0 +1,8 @@
declare module "turndown-plugin-gfm" {
  import TurndownService from "turndown";

  export function gfm(service: TurndownService): void;
  export function tables(service: TurndownService): void;
  export function strikethrough(service: TurndownService): void;
  export function taskListItems(service: TurndownService): void;
}
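
Review note: the `tableWithoutHeader` rule in `ConvertFile.ts` falls back to treating the first row as the header when the `tables` plugin declared above finds no `<thead>`. A minimal sketch of that formatting step follows, assuming the cell text has already been extracted into string arrays. `cleanCell` and `toPipeTable` are hypothetical names used only for this illustration and do not appear in the PR.

```typescript
// Collapse whitespace and escape pipes so a cell cannot break its row.
function cleanCell(text: string): string {
  return text.trim().replace(/\s+/g, " ").replace(/\|/g, "\\|");
}

// Render rows as a GFM pipe table, treating the first row as the header.
// Assumption: every row has the same number of cells; no padding is done.
function toPipeTable(rows: string[][]): string {
  if (rows.length === 0) return "";
  const cleaned = rows.map((row) => row.map(cleanCell));
  const [header, ...body] = cleaned;
  const separator = header.map(() => "---");
  const fmt = (cells: string[]): string => `| ${cells.join(" | ")} |`;
  return [fmt(header), fmt(separator), ...body.map(fmt)].join("\n");
}

const table = toPipeTable([
  ["Name", "Notes"],
  ["a|b", "  two   words "],
]);

console.log(table);
// | Name | Notes |
// | --- | --- |
// | a\|b | two words |
```

Keeping the formatting in a pure function like this would make the rule testable without a DOM, though the PR itself builds the rows directly from `querySelectorAll` results.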