A Short Python PPTX Tutorial Using Simpler Syntax

In a world now dominated by generative AI and complex machine learning approaches, it is easy to forget that most people will not use these approaches as often as we might think. Far more people still have to create reports from the data they collect, sometimes regularly and in a highly standardised format, while also avoiding dashboards (which can be difficult for a third party to interpret or time-consuming to fully automate). This is a perfect situation for simply putting a PowerPoint presentation together, and for leveraging scripting in a language we are already familiar with (Python) to remove plenty of tedious, time-consuming and mind-numbing manual work along the way.

Fortunately, there is a library called python-pptx that can do the heavy lifting for us. It is well equipped to handle the most common tasks you would normally do in a PowerPoint presentation, and it also has its own internal plotting engine. However, the documentation is difficult to follow, the syntax appears much more complex than it should be, and finding a way to achieve simple tasks (such as changing the color of the text in a textbox) sometimes becomes more an exercise in tinkering and experimenting with the code than in following the documentation or its use cases. There are already some tutorials on the internet on how to use python-pptx, but, as with the official documentation, I found both their approach and their scripts hard to follow, and I learned much more by tinkering around, although this took a while. This is why I put together this tutorial: to show some basic functionality of python-pptx while keeping the syntax and approach as intuitive as I can make them, and hopefully to save you some time.

Importantly, this short tutorial will show only some of the library's capabilities, albeit arguably the ones most commonly needed when creating a report: inserting text, tables and figures. You are encouraged to try more things on your own. Credit to Our World in Data for the publicly available data I will briefly use in this tutorial. Note: this script was run on Python 3.10 with python-pptx 0.6.21, pandas 1.4.4, and matplotlib 3.5.3.

But first we need to discuss a couple of topics that I have found confuse most people working with PowerPoint: "true" placeholders and the Slide Master.

"True" Placeholders and the Slide Master

For most people, a placeholder is simply something (an image, a textbox) holding the place of future content to be defined by the user. In PowerPoint and python-pptx, however, the term has a narrower meaning. "True" placeholders are the placeholders defined in the Slide Master, and they are the ones you are usually greeted with when you create a new slide from a layout:

fig1.png

The Slide Master is found in the View tab (View > Slide Master) and, as the name suggests, it holds the collection of layouts that you can use for your presentation. Importantly, these layouts are not the slides of a given presentation (you can have many layouts in the Master even when you only have one slide in the presentation, for example).

fig2.png

This distinction will completely change the approach you take from a coding perspective, since, for example:

  • "True" placeholders will preserve the format of the shape (size, font, style, colour, etc.), while "false" placeholders will not. If you assign a simple string to a "true" placeholder using python-pptx, it will be output with the format defined in the placeholder, whereas if you assign it to a "false" placeholder, it will be output with a default format that may not match what you expected. When assigning a string to a "false" placeholder, you will therefore also have to code the formatting you want.

  • "True" placeholders will constrain the image you assign to them to the shape of the placeholder, in both size and position on the slide. So if you assign a 10 x 5 graph to a 3 x 3 placeholder, the graph will end up 3 x 3, most of the time cropped to the placeholder's proportions rather than simply shrunk to fit (see below).

Most importantly, "true" placeholders have attributes that "false" placeholders lack (such as placeholder_format), and only "true" placeholders have is_placeholder = True.

There are many pros and cons to working with "true" placeholders, and which approach suits you depends on how much freedom you want to control things from the code, how much code you are willing to write, and more. I personally prefer to specify things from the code, even if it means more coding. I will nonetheless show you some basic functionality with "true" placeholders and then repeat the exercise with "false" placeholders. Importantly, even though you can create slides with python-pptx, it is always easier to create the template you want to work on in PowerPoint, import it into the environment, and work on it before saving the output. Some basic functionality (like selecting slides) is the same for both approaches.

Inspecting the slides and their elements

For this tutorial I will use a template consisting of four slides: a title (first) slide with empty "true" placeholders, a second slide with a mixture of "true" and "false" placeholders (the one you saw earlier), a third slide similar to the second but with a table ("false") placeholder, and a fourth slide that contains a picture ("true") placeholder alongside other "true" placeholders.

In [ ]:
import io

import matplotlib.pyplot as plt
import pandas as pd
from pptx import Presentation
from pptx.dml.color import RGBColor
from pptx.enum.text import PP_ALIGN
from pptx.util import Inches, Pt
In [ ]:
# Select and import the template
template_ppt='./template.pptx'
prs = Presentation(template_ppt)
In [ ]:
# Inspect the slides' elements, in this case slide 2
slide_2 = prs.slides[1]

for i, shape in enumerate(slide_2.shapes):
    print(f'Shape: {i},\n'
          f'\tName: {shape.name},\n'
          f'\tHas Chart: {shape.has_chart},\n'
          f'\tHas Table: {shape.has_table},\n'
          f'\tHas Text Frame: {shape.has_text_frame},\n'
          f'\tIs Placeholder: {shape.is_placeholder}\n')
Shape: 0,           
	Name: Title 1,           
	Has Chart: False,           
	Has Table: False,           
	Has Text Frame: True,           
	Is Placeholder: True 

Shape: 1,           
	Name: Content Placeholder 2,           
	Has Chart: False,           
	Has Table: False,           
	Has Text Frame: True,           
	Is Placeholder: True 

Shape: 2,           
	Name: TextBox 5,           
	Has Chart: False,           
	Has Table: False,           
	Has Text Frame: True,           
	Is Placeholder: False 

Shape: 3,           
	Name: TextBox 6,           
	Has Chart: False,           
	Has Table: False,           
	Has Text Frame: True,           
	Is Placeholder: False 

As you can see, slide 2 has the four objects we saw in the figure earlier, but only two of them (the "true" placeholders) have is_placeholder set to True. If you inspect slide.placeholders you will also see that it returns only 2 placeholders, the "true" ones (see below), even though the slide effectively contains 4 shapes acting as placeholders.

Part 1: Working with "true" placeholders

In my opinion, "true" placeholders are a bit non-intuitive to work with. In addition to the issues mentioned before (like limiting the size of the graph you assign to them), "true" placeholders need to be referred to by their idx attribute. If we inspect the same slide again, but this time print the idx attribute of each placeholder:

In [ ]:
slide_2 = prs.slides[1]
for shape in slide_2.placeholders:
    print('%d %s' % (shape.placeholder_format.idx, shape.name))
0 Title 1
1 Content Placeholder 2

This means that Title 1 has an idx of 0 and Content Placeholder 2 has an idx of 1, and you have to use the idx to assign content to the placeholder. For example:

slide.placeholders[idx].text = 'Your text here!'

Although it is tempting to read these numbers as Python indices, the idx is not the position of the placeholder: slide.placeholders[n] looks up the placeholder whose idx is n. In a presentation with more slides, a slide may have only two or three placeholders and yet their idx values can be completely different numbers, for example:

slide_8.placeholders[17].text = 'Your text here!'

This applies to all placeholders and not just the ones that can contain text.

Inserting Text: "True" and "False" Placeholders in Action

As mentioned before, "true" placeholders keep the format specified in the Master, whereas a "false" placeholder loses the formatting you applied to its text by hand as soon as you assign new text to it, and falls back to the default formatting inherited from the Master. In the following slide, for example, the Master style uses Cambria, while the "false" placeholder was set up with Helvetica size 28. Upon assigning some text to both placeholders, only the "true" placeholder behaves as expected (i.e. keeps Cambria size 44), while the "false" placeholder ends up with Cambria size 18, even though that textbox was set up to contain Helvetica size 28 text!

In [ ]:
string = 'Hello World!'
slide_2.placeholders[0].text = string
slide_2.shapes[2].text = string

fig3.png

In order to get the "false" placeholder to look like we want it to, we need to specify the format as part of the code. You can also pass other formatting such as bold, italics, and colour:

In [ ]:
slide_2.placeholders[0].text = string
slide_2.shapes[2].text = string

# Format the first paragraph of the "false" placeholder
font = slide_2.shapes[2].text_frame.paragraphs[0].font
font.name = 'Helvetica'
font.size = Pt(28)
font.bold = True
font.italic = True

# Colour in hex code, split into 0xRR, 0xGG, 0xBB
font.color.rgb = RGBColor(0xFF, 0xD7, 0x00)

fig4.png

"True" placeholders are handy for text since you only have to specify the formatting once in the Master and all placeholders will follow it, but if you want many slides with many different placeholders in different fonts and styles, you are going to have to spend plenty of time tweaking the Master.

Inserting an Image into a "True" Placeholder

"True" placeholders, however, are terrible for inserting graphs, as shown below. Since the placeholder's size limits the size of the picture it can contain, it will almost invariably crop the graph you pass to it. Additionally, only placeholders that have the insert_picture method can be assigned a picture (and, for some reason, Content Placeholders do not have it, even though in PowerPoint they can hold pictures). This means you will have to painstakingly tweak the Master to get a good result when inserting an image or graph into a placeholder.

Importantly, python-pptx has its own internal plotting engine, but its syntax is complex and most people would much rather pass their own pyplot or seaborn plots to their presentations. This can be done by saving the figure into an in-memory byte stream (io.BytesIO) instead of a file, and then passing that stream to the correct placeholder or shape (you can also insert the image as a new shape, as we'll see later).

In [ ]:
# Get Slide
slide_4 = prs.slides[3]

# Get data
data = pd.read_csv('./co2-emissions-vs-gdp.csv')
au = data[data['Entity'] == 'Australia'][['Year', 'GDP per capita']].sort_values(by='Year')

# Define plot
fig, ax = plt.subplots(figsize=(12,5))
au.plot(x='Year', y='GDP per capita', ax=ax);

# Insert plot
image_stream = io.BytesIO()
plt.savefig(image_stream, dpi=300, bbox_inches='tight')
slide_4.placeholders[1].insert_picture(image_stream) #also works to insert images saved to a local directory

fig5.png

Inserting a Table into a "True" Placeholder

To insert a table into a "true" placeholder you first have to create the table and then pass it the information you want. Technically speaking, a table is simply a grid of cells that can be accessed one by one to insert the desired content, so the most efficient way to pass our data is to create a table whose dimensions match the data, and then loop over the cells, passing one value of the dataframe at a time, starting with the headers. The table will be placed where your placeholder is located, with the specifications given by the Master. As with picture placeholders, only table placeholders (those that have the insert_table method) can be assigned a table, and again I do not know why Content Placeholders lack this method when you can still insert a table into them via PowerPoint.

In [ ]:
# Get slide
slide_3 = prs.slides[2]

# Define data for table
df = data.iloc[1:4, :4]
n_rows = len(df)
n_cols = len(df.columns)
colnames = df.columns.tolist()

# Create table: one header row plus one row per dataframe row
table_1 = slide_3.placeholders[13].insert_table(rows=n_rows + 1, cols=n_cols)

# Pass on the df to the table
for i in range(n_cols):  # insert the table headers (df column names)
    table_1.table.cell(0, i).text = colnames[i]

for i in range(n_cols):  # insert the table values
    for j in range(n_rows):
        table_1.table.cell(j + 1, i).text = str(df.iloc[j, i])

fig6.png

Part 2: Working with "False" Placeholders: Slide Shapes

If you are like me (and can't be bothered with the Master), you might already have a PowerPoint template you want to use, maybe one prepared by a colleague who also had no idea the Master existed. In that case you either do not want to (or cannot) work with "true" placeholders. "False" placeholders, as I have been calling them, are simply Shapes, and technically speaking everything on a slide is a Shape: textboxes, images, icons, tables, etc. The catch is that "true" placeholders are also Shapes, which is why both "true" and "false" placeholders were listed when we inspected all the Shapes of a slide above. Hence the "true"/"false" nomenclature makes more sense to me: it distinguishes placeholders (which have is_placeholder = True) from Shapes acting as placeholders, even though technically speaking nearly everything on a slide is a Shape, placeholder or not. I hope this is not too confusing.

Importantly, only certain Shapes will have certain attributes (e.g. an image shape will not have text attributes, understandably). You can either create them from scratch (see the python-pptx documentation), or work with pre-existing Shapes acting as placeholders on a template (much faster and more intuitive in my opinion).

Passing Text to Textbox Shapes the Smart Way

As you may remember, passing text to a textbox involves passing not only the string but also all the formatting we want for the text, which can be tedious and time-consuming. If you have a long presentation and plan to pass a lot of text, your best approach is to define a function that does the heavy lifting by default, while allowing you to override the details (a different font, style, or size, for example) when needed.

In [ ]:
def text_to_ppt(shape, string, size=18, font='Helvetica Neue Light', bold=False, italic=False):
    '''
    Simple function to assign a string to a pre-defined text box in the presentation. Requires `Pt` from pptx.util.
    The function modifies the shape in place, so its output does not need to be assigned to a variable.
    Args:   `shape`. Slide Shape as defined by python-pptx. The relevant text shape (text box placeholder) to be filled in with the string.
                Must have `.text` and `.text_frame` attributes.
            `string`. String. The string to be passed.
            `size`. Integer or Float. The font size in points (e.g. 12 or 40). Default is 18.
            `font`. String. The font to be used. Must be a valid font recognised by PowerPoint. Default is Helvetica Neue Light.
            `bold`. Boolean. Whether the text is bold. Default is False.
            `italic`. Boolean. Whether the text is italic. Default is False.
    '''
    shape.text = string
    shape.text_frame.paragraphs[0].font.size = Pt(size)
    shape.text_frame.paragraphs[0].font.name = font
    shape.text_frame.paragraphs[0].font.bold = bold
    shape.text_frame.paragraphs[0].font.italic = italic

So then every time you need to pass a string to a textbox you simply define the textbox where you want the string, and then call the function.

In [ ]:
textbox_2 = slide_2.shapes[2]
text_to_ppt(textbox_2, string)

Passing a Dataframe to a Table Shape

The same principle applies to passing a dataframe to a table shape (a table is simply a collection of text cells), although in the case of tables it often makes more sense to center the data in each cell, which can be done with the paragraph's .alignment attribute:

In [ ]:
def data_to_table(table, df):
    '''This function outputs the dataframe of interest to the corresponding table on the PPT slide. Note that BOTH the data
    and the table need to be instantiated outside the function beforehand and then passed to the function, for example:

    table_6 = slide_6.shapes[4].table
    data_table_6 = your_df
    data_to_table(table_6, data_table_6) <- this will output the data to the table; there's no need to assign the output
    to a variable

    The table must have at least len(df) + 1 rows (one header row) and len(df.columns) columns.
    '''
    nrows = len(df)
    ncols = len(df.columns)
    colnames = df.columns.tolist()

    for i in range(ncols):  # header row: column names
        table.cell(0, i).text = str(colnames[i])
        table.cell(0, i).text_frame.paragraphs[0].alignment = PP_ALIGN.CENTER
        table.cell(0, i).text_frame.paragraphs[0].font.name = 'Helvetica Neue Medium'
        table.cell(0, i).text_frame.paragraphs[0].font.size = Pt(16)

    for i in range(ncols):  # data rows
        for j in range(nrows):
            table.cell(j + 1, i).text = str(df.iloc[j, i])
            table.cell(j + 1, i).text_frame.paragraphs[0].alignment = PP_ALIGN.CENTER
            table.cell(j + 1, i).text_frame.paragraphs[0].font.name = 'Helvetica Neue Light'
            table.cell(j + 1, i).text_frame.paragraphs[0].font.size = Pt(16)

So then passing the data to the table shape becomes as simple as:

In [ ]:
table_3 = slide_3.shapes[1].table
data_to_table(table_3, df)  # the small df from Part 1; the full `data` has far more rows than the table

Inserting a Plot into a Slide

The easiest way to insert your own custom plot into a slide is to pass the byte stream we saw before to a newly created picture shape, all in one go. Since the new shape is not a "true" placeholder, it has no predefined constraints on size or position, so the plot will not be cropped upon insertion.

In [ ]:
fig, ax = plt.subplots(figsize=(12,5))
au.plot(x='Year', y='GDP per capita', ax=ax);
image_stream = io.BytesIO()
plt.savefig(image_stream, dpi=300, bbox_inches='tight')

x = Inches(3) #horizontal position in slide, from the left, in inches
y = Inches(2) #vertical position in slide, from the top, in inches

pic = slide_4.shapes.add_picture(image_stream, x, y)

fig7.png

Don't forget to save the presentation to see the changes reflected!

In [ ]:
prs.save('./output.pptx')

Conclusions

This is a short python-pptx tutorial that aims to show some basic functionality of the library using a more streamlined syntax than the one in the official documentation, as well as exploring topics I could not find there (like passing custom plots). As such, it is not meant to be exhaustive or to cover every aspect of python-pptx; rather, it focuses on arguably the most common tasks involved in creating a presentation. This approach can then be integrated into a pipeline that ingests data to be presented in regular reports, to automate the repetitive tasks.

I hope you enjoyed it and that it's helpful to you. Please reach out if there is anything wrong with the material!