Product: GroupDocs.Parser — Document Parsing SDKs

Purpose

  • GroupDocs.Parser is a robust API for extracting text, images, metadata, and structured data from a wide range of documents (PDF, Word, Excel, PowerPoint, email, archives, and more).
  • Target audiences: software developers and technical buyers evaluating on‑premise document parsing and data‑extraction libraries and SDKs for .NET and Java (and related platforms/wrappers).

Brand and Product Names (do not translate)

  • GroupDocs, GroupDocs.Total
  • GroupDocs.Parser, GroupDocs.Viewer, GroupDocs.Conversion, GroupDocs.Comparison, GroupDocs.Signature, GroupDocs.Merger, GroupDocs.Assembly, GroupDocs.Metadata, GroupDocs.Redaction, GroupDocs.Search, GroupDocs.Watermark, GroupDocs.Classification
  • NuGet, Maven
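
A translated string can be checked against this list mechanically. The following is a minimal sketch, not part of any GroupDocs tooling: the function name `altered_brand_names` and the sample strings are hypothetical, and the name list is simply copied from the bullets above.

```python
import re
from collections import Counter

# Names copied from the "do not translate" list above.
BRAND_NAMES = [
    "GroupDocs", "GroupDocs.Total", "GroupDocs.Parser", "GroupDocs.Viewer",
    "GroupDocs.Conversion", "GroupDocs.Comparison", "GroupDocs.Signature",
    "GroupDocs.Merger", "GroupDocs.Assembly", "GroupDocs.Metadata",
    "GroupDocs.Redaction", "GroupDocs.Search", "GroupDocs.Watermark",
    "GroupDocs.Classification", "NuGet", "Maven",
]

# Longest names first, so "GroupDocs.Parser" is not matched as plain "GroupDocs".
_BRAND_RE = re.compile(
    "|".join(re.escape(name) for name in sorted(BRAND_NAMES, key=len, reverse=True))
)


def altered_brand_names(source, translated):
    """Return brand names that occur more often in source than in translated."""
    src = Counter(_BRAND_RE.findall(source))
    dst = Counter(_BRAND_RE.findall(translated))
    return sorted((src - dst).keys())


if __name__ == "__main__":
    # Hypothetical sample strings, for illustration only.
    print(altered_brand_names(
        "Install GroupDocs.Parser from NuGet.",
        "Instale GroupDocs Analizador desde NuGet.",
    ))
```

An empty list means every brand name in the source string survived translation with the same spelling and count.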

Style and Tone

  • Professional, concise, developer‑oriented. Prefer active voice and imperative headlines.
  • Keep marketing claims measured; avoid exaggerations. Emphasize capabilities and workflows.
  • Preserve technical terminology (e.g., OCR, PDF/A‑2b, metadata, API, SDK); use the common local equivalent where one exists and reads naturally.

Formatting and Placeholders (must preserve exactly)

  • Keep placeholders unchanged: {0}, {1}, {name}, {link}.
  • Keep Markdown links intact: [text](url). Translate the link text but never change the URL.
  • Preserve inline code and technical tokens: ClassName, code identifiers, file extensions, version strings.
  • Preserve HTML tags and character entities exactly as written.
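
The preservation rules above lend themselves to an automated check. The sketch below is one possible approach under stated assumptions: the regular expressions are simplified approximations of placeholders, Markdown link URLs, inline code, and HTML tags or entities, and the names `protected_tokens` and `check_translation` as well as the `example.com` URL are hypothetical.

```python
import re
from collections import Counter

# Simplified patterns for tokens that must survive translation unchanged.
PLACEHOLDER = re.compile(r"\{[A-Za-z0-9_]+\}")                   # {0}, {name}, {link}
LINK_URL = re.compile(r"\]\(([^)\s]+)\)")                        # URL part of [text](url)
INLINE_CODE = re.compile(r"`[^`]+`")                             # `ClassName`
HTML_TOKEN = re.compile(r"</?[A-Za-z][^>]*>|&[A-Za-z0-9#]+;")    # tags and entities


def protected_tokens(text):
    """Count every protected token found in text."""
    tokens = Counter()
    for pattern in (PLACEHOLDER, INLINE_CODE, HTML_TOKEN):
        tokens.update(pattern.findall(text))
    tokens.update(LINK_URL.findall(text))  # findall returns only the captured URL
    return tokens


def check_translation(source, translated):
    """Return (missing, unexpected) protected tokens, each as a sorted list."""
    src = protected_tokens(source)
    dst = protected_tokens(translated)
    missing = sorted((src - dst).elements())
    unexpected = sorted((dst - src).elements())
    return missing, unexpected


if __name__ == "__main__":
    # Hypothetical strings: the translation altered both the URL and a placeholder.
    source = "See [the docs](https://example.com/docs) for `Parser`, {name}."
    bad = "Siehe [die Dokumentation](https://example.com/de/docs) für `Parser`, {Name}."
    print(check_translation(source, bad))
```

Two empty lists indicate that the translation kept every protected token. Link text is deliberately not compared, since the guidance asks for it to be translated.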

Context & Positioning

  • GroupDocs.Parser is an on‑premise library; it does not require Adobe Acrobat, Microsoft Office, or other third‑party office software to parse documents or extract content.
  • Supports 50+ file formats across documents, spreadsheets, presentations, PDFs, emails, archives, ebooks and more (see product site for exact lists).
  • Common actions: extract full text or page‑level text, extract images, parse data using user‑defined templates, parse and read PDF forms, extract metadata and document information, detect and navigate document structure (headings, tables, etc.).
  • Platform focus pages exist for .NET and Java; terminology should fit the platform (e.g., NuGet for .NET, Maven for Java).

Localization Guidance

  • Keep brand and product names in English.
  • Translate UI/marketing text naturally for the locale; keep technical accuracy.
  • Use formal address unless the locale strongly prefers informal style.

Examples of phrasing

  • EN: "Extract text from PDF invoices with layout preserved"
    • Translate as a concise, developer‑friendly instruction; keep file type tokens (PDF) in uppercase.
  • EN: "Parse data from documents using custom templates"
    • Translate the phrase clearly; keep terms like "templates" and "data" in a natural technical style for the locale.
  • EN: "50+ file formats supported"
    • Translate the phrase; keep the number and plus sign as is.