Problems when extracting 4-bit CMYK images  #27

@manuGil

This is a known issue in pdfminer.six: pdfminer/pdfminer.six#853

For example, when attempting to save the bytes of an image element such as:

<PDFStream(20): raw=5846880, {'BitsPerComponent': 8, 'ColorSpace': /'DeviceCMYK', 'Decode': [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
 'DecodeParms': [{'BitsPerComponent': 4, 'Colors': 4, 'Columns': 2953, 'Predictor': 15}], 'Filter': [/'FlateDecode'], 'Height': 1205, 
'Length': 5846880, 'SMask': <PDFObjRef:19>, 'Subtype': /'Image', 'Type': /'XObject', 'Width': 2953}> [/'DeviceCMYK']

We get the following error:

  File "/home/manuel/Documents/devel/desing-handbook/data-pipelines/src/aidapta/image_pipeline.py", line 139, in main
    image_file_name =iw.export_image(img) # returns image file name, 
  File "/home/manuel/Documents/devel/pdfminer.six/pdfminer/image.py", line 129, in export_image
    name = self._save_bytes(image)
  File "/home/manuel/Documents/devel/pdfminer.six/pdfminer/image.py", line 227, in _save_bytes
    image.stream.get_data()
  File "/home/manuel/Documents/devel/pdfminer.six/pdfminer/pdftypes.py", line 396, in get_data
    self.decode()
  File "/home/manuel/Documents/devel/pdfminer.six/pdfminer/pdftypes.py", line 384, in decode
    data = apply_png_predictor(
  File "/home/manuel/Documents/devel/pdfminer.six/pdfminer/utils.py", line 137, in apply_png_predictor
    raise ValueError(msg)
ValueError: Unsupported `bitspercomponent': 4

The solution proposed in https://github.com/pdfminer/pdfminer.six/pull/854/files doesn't solve this problem.
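
For reference, the traceback fails inside the PNG predictor step (`Predictor: 15` in `DecodeParms`) because `BitsPerComponent` is 4. Below is a minimal sketch of how PNG row filters can be reversed for sub-byte bit depths, following the PNG filtering rules (filters operate on bytes, and the "left" neighbour is one whole pixel back, rounded up to at least one byte). The function name `undo_png_predictor` and its signature are hypothetical; this is not pdfminer.six code and has not been tested against the PDF in this issue.

```python
def _paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor from the PNG specification (a=left, b=up, c=upper-left)."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def undo_png_predictor(data: bytes, colors: int, columns: int,
                       bits_per_component: int) -> bytes:
    """Reverse PNG row filters for 1, 2, 4, 8 or 16 bits per component.

    Hypothetical helper, not part of pdfminer.six. The result keeps the
    samples packed exactly as they were before filtering.
    """
    if bits_per_component not in (1, 2, 4, 8, 16):
        raise ValueError(f"Unsupported bits per component: {bits_per_component}")
    bits_per_pixel = colors * bits_per_component
    bpp = max(1, (bits_per_pixel + 7) // 8)         # bytes per pixel, rounded up
    row_bytes = (columns * bits_per_pixel + 7) // 8  # packed bytes per row
    stride = row_bytes + 1                           # plus one filter-type byte
    if len(data) % stride != 0:
        raise ValueError("Data length is not a whole number of rows")

    out = bytearray()
    prev = bytearray(row_bytes)  # the row above the first row is all zeros
    for start in range(0, len(data), stride):
        ftype = data[start]
        row = bytearray(data[start + 1:start + stride])
        if ftype not in (0, 1, 2, 3, 4):
            raise ValueError(f"Unsupported PNG filter type: {ftype}")
        if ftype != 0:  # filter type 0 (None) leaves the row unchanged
            for i in range(row_bytes):
                left = row[i - bpp] if i >= bpp else 0
                up = prev[i]
                up_left = prev[i - bpp] if i >= bpp else 0
                if ftype == 1:    # Sub
                    pred = left
                elif ftype == 2:  # Up
                    pred = up
                elif ftype == 3:  # Average
                    pred = (left + up) >> 1
                else:             # Paeth
                    pred = _paeth(left, up, up_left)
                row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)
```

With the parameters from the stream above (`Colors: 4`, `Columns: 2953`, `BitsPerComponent: 4`) each pixel is 16 bits, so `bpp` is 2 and each row is 5906 bytes plus the filter byte. The output is still packed 4-bit CMYK, so expanding it to 8 bits per component for saving would be a separate step.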

Labels: bug (Something isn't working), data pipeline (Task related with data extraction pipeline)