Extracted image from pdf is completely black #1407

Your extraction approach cannot handle images that have an image mask.
Your PDF, however, contains 2 images, each with an image mask (the second item in each tuple below is the xref of the mask):

>>> from pprint import pprint
>>> pprint(page.get_images(True))
[(19, 25, 419, 64, 8, 'DeviceRGB', '', 'Img1', 'FlateDecode', 0),
 (20, 26, 419, 64, 8, 'DeviceRGB', '', 'Img10', 'FlateDecode', 0)]

To extract such images, the image and its mask must be combined explicitly, e.g. for the first one (xref 19, mask xref 25):

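import fitz  # PyMuPDF

# doc is your already opened document
pix19 = fitz.Pixmap(doc, 19)  # base image (xref 19)
mask = fitz.Pixmap(doc, 25)  # its image mask (xref 25)
pix = fitz.Pixmap(pix19, mask)  # copy of the image with the mask applied as alpha
pix.save("test.png")  # fully recovered image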
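The same steps can be applied to every image on a page. What follows is only a rough sketch, not a definitive recipe: the names `masked_image_xrefs`, `save_page_images` and `prefix` are made up for illustration, and it assumes the base images have no alpha channel of their own and use a colorspace that can be written as PNG (as the DeviceRGB images here do).

```python
def masked_image_xrefs(image_list):
    """Return (xref, mask_xref) pairs for entries that carry an image mask.

    Each entry is a tuple as returned by page.get_images(): item 0 is the
    image xref, item 1 is the xref of its mask (0 if there is none).
    """
    return [(item[0], item[1]) for item in image_list if item[1] > 0]


def save_page_images(doc, page, prefix="img"):
    """Hypothetical helper: save each image on the page, applying its mask."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    for item in page.get_images(True):
        xref, smask = item[0], item[1]
        pix = fitz.Pixmap(doc, xref)  # base image
        if smask > 0:
            mask = fitz.Pixmap(doc, smask)  # image mask
            pix = fitz.Pixmap(pix, mask)  # same combination as shown above
        pix.save(f"{prefix}-{xref}.png")


# The two entries from the get_images() output above:
images = [
    (19, 25, 419, 64, 8, "DeviceRGB", "", "Img1", "FlateDecode", 0),
    (20, 26, 419, 64, 8, "DeviceRGB", "", "Img10", "FlateDecode", 0),
]
print(masked_image_xrefs(images))  # [(19, 25), (20, 26)]
```

With your document open, `save_page_images(doc, page)` would write one PNG per image; images without a mask are saved as they are.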

Answer selected by YashMistry349

This discussion was converted from issue #1406 on November 16, 2021 10:53.