Extracted image from pdf is completely black #1407
-
I am working on image extraction from PDF. The library detects the image on the PDF page correctly, but when I save or display it, I get a completely black image.
-
The attachment doesn't help; please provide the document and the code you used for extraction.
-
Test.pdf
-
Your way of image extraction is unable to deal with images that have an image mask. Your PDF, however, has 2 images, each with an image mask:
>>> from pprint import pprint
>>>
>>> pprint(page.get_images(True))
[(19, 25, 419, 64, 8, 'DeviceRGB', '', 'Img1', 'FlateDecode', 0),
 (20, 26, 419, 64, 8, 'DeviceRGB', '', 'Img10', 'FlateDecode', 0)]
>>>
The second item of each tuple is the xref of the image's mask (SMask); it is 0 for images without one. To extract such images, special code must be used, e.g. for the first one (xref 19, mask xref 25):
pix19 = fitz.Pixmap(doc, 19)    # the base image
mask = fitz.Pixmap(doc, 25)     # its mask
pix = fitz.Pixmap(pix19, mask)  # combine both into one pixmap with alpha
pix.save("test.png")  # fully recovered image
-
Sorry, I forgot to mention that you need to upgrade to v1.19.x for this to work.
-
Thanks for the file.
-
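The following may give you a somewhat better result:
pix = fitz.Pixmap(doc, 1088)
mask = fitz.Pixmap(doc, 6181)
if not pix.alpha:  # set_alpha() requires an existing alpha channel
    pix = fitz.Pixmap(pix, 1)
pix.set_alpha(mask.samples)  # one mask byte per pixel becomes the alpha value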
-
@SummerXXXX: in the meantime, I also tested yet another approach:
pix1088 = fitz.Pixmap(doc, 1088)
mask = fitz.Pixmap(doc, 6181)
if pix1088.alpha:
    temp = fitz.Pixmap(pix1088, 0)  # make temp pixmap w/o the alpha
    pix1088 = None  # release storage
    pix1088 = temp
pix = fitz.Pixmap(pix1088, mask)  # now compose final pixmap
pix.save("image1088.png")
This method works with the example file, because all the … For the next version, I plan a modification which hopefully handles more of these cases.
-
I am also having the same issue with my code: the images extracted from the PDF have a black background, but I need them to look as they do in the PDF. Code used:
def extract_and_save(input_pdf_path, output_pdf_path):
I would appreciate a quick answer.