
Support for minority languages #786


Open
4 tasks done
daylightduck opened this issue Mar 19, 2025 · 8 comments
Labels
enhancement, help wanted, Low priority, Waiting for feedback

Comments

@daylightduck

Before asking

  • I have searched the existing issues
  • I spent at least 5 minutes thinking and preparing
  • I have thoroughly and completely read the wiki.
  • I have carefully checked the issue, and it is unrelated to the network environment.

Environment

- OS:
- Python:3.11
- pdf2zh:1.9.6

How to install pdf2zh

pip

Describe the bug

The GoNotoKurrent.ttf font package does not render Hindi text correctly.

Image

To Reproduce

  1. execute '...'
  2. select '....'
  3. see errors

Expected behavior

The translation is expected to be rendered in Devanagari script, like the sample below. Please help: I tried different font packages, but BabelDOC does not recognize them. It reports an invalid or corrupted file because it verifies the file hash.

पुस्तक में 11 अध्याय हैं, पहले 10 अध्याय विदेश नीति में खतरों, अवसरों और अपनाई गई नीतियों के बारे में विस्तृत जानकारी प्रदान करते हैं। अंतिम अध्याय 'भारत' के महत्व को समझाता है। भारत, भारत का हिंदी नाम है, जो एक दक्षिण एशियाई राष्ट्र है जो अपनी सांस्कृतिक विविधता, ऐतिहासिक महत्व और आर्थिक शक्ति के लिए जाना जाता है। प्राचीन भारतीय शास्त्रों में निहित सांस्कृतिक और ऐतिहासिक महत्व के कारण हाल ही में भारतीय राजनेताओं द्वारा 'भारत' शब्द का अक्सर उपयोग किया गया है। भारत दुनिया का सबसे बड़ा लोकतंत्र और तेजी से बढ़ती अर्थव्यवस्था के रूप में महत्वपूर्ण वैश्विक महत्व रखता है जिसका अंतर्राष्ट्रीय व्यापार पर महत्वपूर्ण प्रभाव है। भारत की सॉफ्ट पावर, जिसमें

Relevant log output


Origin PDF file

No response

Anything else?

No response

@daylightduck added the bug label Mar 19, 2025
@awwaawwa
Collaborator

At present, minority languages are not supported. Related research is expected to take place in the second half of this year.

@awwaawwa added the enhancement and Low priority labels and removed the bug label Mar 19, 2025
@awwaawwa changed the title from "noto language issue" to "Support for minority languages" Mar 19, 2025
@awwaawwa
Collaborator

Developers can try to change the fonts by hacking on, and referring to, the following files. However, BabelDOC will not provide custom font functionality in the short term.

- https://github.com/funstory-ai/BabelDOC/blob/main/babeldoc/assets/embedding_assets_metadata.py
- https://github.com/funstory-ai/BabelDOC/blob/main/babeldoc/tools/generate_font_metadata.py
- https://github.com/funstory-ai/BabelDOC/blob/main/babeldoc/assets/assets.py
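Since the issue reports that a replacement font is rejected as "invalid or corrupted" by a hash check, anyone experimenting with the files above will probably need the digest of their own font file. The following is only a rough sketch of computing one: the choice of SHA-256, the `font_metadata_entry` helper, and its field names are assumptions for illustration and are not taken from BabelDOC. Check `generate_font_metadata.py` for the algorithm and metadata layout the project actually uses.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to keep memory use low."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def font_metadata_entry(path):
    """Build a hypothetical metadata record for a font file.

    The keys below are illustrative only; they are NOT BabelDOC's schema.
    """
    font_path = Path(path)
    return {
        "file_name": font_path.name,
        "sha256": file_digest(font_path),
        "size": font_path.stat().st_size,
    }
```

Comparing the digest produced this way with the value stored in the project's metadata is a quick way to confirm whether a hash mismatch, rather than a damaged download, is what triggers the "invalid or corrupted" message.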

@awwaawwa added the help wanted label Mar 19, 2025
@awwaawwa
Collaborator

Please upload the original input PDF and the translated output PDF.

@daylightduck
Author

> Please upload the original input PDF and the translated output PDF.

Why Bharat Matters.pdf (original)

Why Bharat Matters-mono (2).pdf (translated)

@awwaawwa
Collaborator

You can take a screenshot and mark the problematic areas. I don't read Hindi, so I can't tell exactly which parts are wrong.

@daylightduck
Author

Image: English paragraph on page 2

Image: Translated output in Hindi

Image: Expected output in Hindi

Font mistakes like this appear in several places in the translated file; this is just a sample. With the earlier version, pdf2zh 1.9.0, I thought it was an encoding issue. But when I tested other font packages, the same issue arose while the PDF was being constructed, even though those font packages work for txt files.

@awwaawwa
Collaborator

This is a typesetting issue and has nothing to do with the font. Currently, our typesetting system does not support the special typesetting features required by these languages.
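To illustrate the kind of typesetting feature involved, here is a small sketch using only Python's standard `unicodedata` module. It is not derived from BabelDOC's code. It lists the code points of the Devanagari word for "Hindi", which is stored as HA, VOWEL SIGN I, ANUSVARA, DA, VOWEL SIGN II. The vowel sign I follows HA in memory but is drawn to its left, and the anusvara sits above the cluster, so a layout that places one glyph per code point in stored order will look wrong whichever font is used. The helper names `describe` and `count_marks` are made up for this example.

```python
import unicodedata


def describe(text):
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


def count_marks(text):
    """Count combining marks (general categories Mn, Mc, Me) in the text."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


# The word "Hindi" in Devanagari, written as escapes so the stored order is explicit.
word = "\u0939\u093f\u0902\u0926\u0940"

for code_point, category, name in describe(word):
    print(code_point, category, name)

print("combining marks:", count_marks(word), "of", len(word), "code points")
```

Three of the five code points are combining marks that must be positioned relative to a base letter, and one of them is reordered visually. Handling this is normally the job of a text shaping engine (HarfBuzz is a widely used one), which matches the collaborator's point that swapping font files alone does not fix the output.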

@daylightduck
Author

Okay thank you for the help👍
