Extract all base64-encoded images from an SVG

Mon 13 February 2017 | tags: Python, SVG | Translations: fr

Recently, I had to integrate an SVG into an Aurelia template. Sadly, it contained a lot of images, all embedded as base64. The big blobs made the file almost unreadable: they kept me from seeing the code and adding the proper Aurelia attributes (e.g. if.bind). Fortunately, Python 3 is there to help!

The script below takes the path to an SVG file as its only parameter and extracts every base64-encoded image into a separate file in the current folder, named img-<index>.<format>, where the index is the position of the image in the SVG. A copy of the SVG that references these extracted files is saved in the current folder as svg-without-images.svg.
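Before the full script, the core step can be sketched in isolation: splitting one href value into its metadata and its payload, then decoding the payload. The payload here is made-up sample bytes rather than a real image, and the variable names are only illustrative.

```python
from base64 import b64decode, b64encode

# Hypothetical payload standing in for real image bytes.
payload = b'not a real PNG, just sample bytes'
href = 'data:image/png;base64,' + b64encode(payload).decode('ascii')

# Split the metadata from the encoded payload.
meta, img_b64 = href.split(';base64,', 1)
# meta is 'data:image/png'; the part after '/' becomes the file extension.
img_format = meta.split('/', 1)[1]

# Decoding gives back the original bytes, ready to be written to a file.
decoded = b64decode(img_b64)
print(img_format, decoded == payload)  # png True
```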

import sys

from base64 import b64decode
from lxml import etree


# XML namespaces needed to query the SVG.
NS = {
    'svg': 'http://www.w3.org/2000/svg',
    'xlink': 'http://www.w3.org/1999/xlink',
}
# Fully qualified name of the xlink:href attribute, as lxml expects it.
XLINK_HREF = '{http://www.w3.org/1999/xlink}href'


def print_help():
    print('./extract-images-svg.py SVG_FILE')
    print('This will extract the images included in b64 in the SVG.')


def extract_images_svg(file_name):
    # Parse the file; lxml opens it and detects the encoding itself.
    svg = etree.parse(file_name)

    # Find all images with XPath.
    images = svg.xpath('.//svg:image', namespaces=NS)
    for index, img in enumerate(images):
        # Get the image reference; it is None when the attribute is missing.
        content = img.get(XLINK_HREF)
        # Skip images that are not embedded as base64 data URIs
        # (including <image> elements without an xlink:href attribute).
        if not (content and content.startswith('data:image/')
                and ';base64,' in content):
            continue
        # Take the content of the image and its metadata
        # (only the format of the image is relevant to us).
        meta, img_b64 = content.split(';base64,', 1)
        _, img_format = meta.split('/', 1)
        # Replace the base64 data by a link to an external image in the proper format.
        img_file_name = 'img-{index}.{format}'.format(index=index, format=img_format)
        img.set(XLINK_HREF, img_file_name)
        # Save the extracted image.
        with open(img_file_name, 'wb') as img_file:
            img_file.write(b64decode(img_b64))

    # Save the "corrected" SVG file. The serializer escapes '>' as '&gt;',
    # so we put the literal character back to keep the attributes readable.
    with open('svg-without-images.svg', 'w', encoding='utf-8') as svg_file:
        svg_content = etree.tostring(svg)\
                .decode('utf-8')\
                .replace('&gt;', '>')
        svg_file.write(svg_content)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print_help()
        # Exit with a non-zero status: the script was called incorrectly.
        sys.exit(1)

    extract_images_svg(sys.argv[1])
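
One caveat: the script uses whatever follows the '/' in the metadata as the file extension, so an embedded image/svg+xml would be saved as img-0.svg+xml. If that matters for your files, a small helper along these lines could clean up the extension. This is only a sketch: extension_for is a hypothetical name, not part of the script above, and it assumes the metadata always starts with 'data:'.

```python
def extension_for(meta):
    """Return a file extension for metadata such as 'data:image/svg+xml'."""
    # Keep only the MIME type, dropping parameters such as ';charset=utf-8'.
    mime_type = meta[len('data:'):].split(';', 1)[0]
    subtype = mime_type.split('/', 1)[1]
    # Drop structured suffixes: 'svg+xml' becomes 'svg'.
    return subtype.split('+', 1)[0].lower()


print(extension_for('data:image/png'))                 # png
print(extension_for('data:image/svg+xml'))             # svg
print(extension_for('data:image/jpeg;charset=utf-8'))  # jpeg
```

In the script, it would replace the meta.split('/') line that computes img_format.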
