Connecting Python to Google Sheets and Saving Time
As I’ve mentioned in a previous post, I am currently interested in using Python to Automate the Boring Stuff. Recently, I found an opportunity to apply it in my work at Bleeker and it was a huge time-saver and will add quite a bit of value over time. I figured it would be worthwhile to write out how I did it here for my own reference, but perhaps you will find it useful too!
The Use Case: Most of Bleeker’s Thinking Partners record general themes of their conversations in their timesheets that have limited access. What we want to do is to aggregate the many spreadsheets into one but only collect the names of the companies the clients work for and the general themes – so that we can get a high-level view of what is on people’s minds.
What I had to figure out:
1. How to connect Google Sheets and Python.
2. How to move the data to and from Google Sheets and compile/format it and play around with it to get what I need.
3. How to make sure it was all working correctly.
Step 1: Connecting Google Sheets and Python
First, you need to create a project in order to enable the Google Sheets API and the Google Drive API.
· Visit your Google Development Console and create a project. This will look different based on whether or not you are in a personal account or an organizational account.
· Name your project and create it.
· Enable the Google Sheets API
· Enable the Google Drive API
· Get the credentials you need so that your Python script can access the spreadsheet (this is especially important if the spreadsheet cannot be made public). You can then take the generated email address and add that as a collaborator for the spreadsheets you are working with so now your python script will be automatically authenticated.
· Go into those credentials and create a JSON key. Save that file into the directory where you are running your Python code and make a note of the name. Moving forward, we will refer to it as CREDENTIALS.json
Step 2: Pulling the data from Google Sheets into Python
· Make sure you have gspread and pandas packages installed. (Here are instructions on how to install packages.)
· Authenticate your Python script like so:
gc = gspread.service_account('CREDENTIALS.json')
Note: The rest of this blog post is my code with comments explaining what is happening. You can find general uses of gspread here including how to create or edit spreadsheets or work with specific tabs.
First, we set up the spreadsheet where we are going to compile all the data. I created the spreadsheet in google sheets and got the key from the string of text in the URL like so:https://docs.google.com/spreadsheets/d/1HkreKeAWq5aCi_rONCdlqh9qSQ0fSkdsfsf2CfYW-M/ (This is not the actual URL)
Here is the code:
#Access the spreadsheet where the report will go - AKA "Report Spreadsheet"
REPORT = '1HkreKeAWq5aCi_rONCdlqh9qSQ0fSkdsfsf2CfYW-M' #Pulled from the google sheet URL
report_sheet = gc.open_by_key(REPORT)
report_worksheet = report_sheet.get_worksheet(0) #This is to access the first tab.
Then, I want to make sure that this code only runs on a certain subset of spreadsheets I have access to - namely the ones with that have TPTS in the title.
sheet_id_list = [record['id'] for record in gc.list_spreadsheet_files() if record['name'].startswith('TPTS')]
Fun fact - record is just placeholder text. You could use any word in there to define this variable.
The next step is to create a function that will pull the information I need from the TPTS spreadsheets and append them to the Report Spreadsheet. I have decided to add the descriptions within the line to explain what it does.
def extract_worksheet(sheet_id,worksheet_id): #A function based on spreadsheet and tab
spreadsheet = gc.open_by_key(sheet_id) #Identify the spreadsheet
worksheet = spreadsheet.get_worksheet_by_id(worksheet_id) #Identify the tab
rows = worksheet.get_all_values() #Pull everything from the spreadsheet
df = pd.DataFrame(rows) #Turn the data from the spreadsheet into a pandas dataframe
col_Names = df.iloc[8,1:] #Pull the column headers of the data
df = df.iloc[9:,1:] #Reformat the dataframe (since the spreadsheets have a lot of formatting)
df.columns = col_Names #Re-add the column headers
df = df.drop(["Client's First Name", "Client's Last Name",'Duration'], errors='ignore', axis=1) #Remove the columns we do not need.
#Set the condition for the rows we want to pull from the dataframe (i.e. only rows that are about client facing sessions that actually happened)
etp_report = df[(df['Type of work'] == 'Client facing: Thinking Partner session (TPs)') & (df['TP session held?'] == 'Yes')]
report_worksheet.append_rows(etp_report.values.tolist()) #Append the identified rows to the Report Spreadsheet
Finally, we need a way to cycle through all the spreadsheets and tabs since there are multiple Thinking Partners and a tab for each month. First, we define a function to get the list of the tabs for each spreadsheet:
def get_worksheet_id_list(sheet_id):
sheet = gc.open_by_key(sheet_id) #Identify the spreadsheet.
return [record.id for record in sheet.worksheets() if record.title != 'Lists'] #Return the list of tab ids excluding the Lists tab that exists in each spreadsheet that is not useful for our purposes.
Then we create a function that puts everything together by applying the extract_worksheet
function through all the spreadsheets and tabs we defined.
for sheet_id in sheet_id_list: #Remember the list we made earlier?
worksheet_id_list = get_worksheet_id_list(sheet_id) #Using the function above.
for worksheet_id in worksheet_id_list:
extract_worksheet(sheet_id, worksheet_id)
#Putting it all together!
Step 3: Quality Control
I wanted some peace of mind that the function is working. So I decided I wanted it to print a confirmation in my powershell every time the function ran in the form of:
Spreadsheet Title, Tab Name, # of Rows Appended to the Report Sheet.
Here are the two lines of code added to the extract_worksheet
function that do that:
num_rows_appended = len(etp_report) #Getting the number of rows pulled from each spreadsheet.
print(f"{spreadsheet.title}, {worksheet.title}: {num_rows_appended} rows appended") #Printing the information I wanted.
So, there you have it! Not the most straightforward blog post but maybe it will be helpful one day. Are you working on a Python/Google Sheets project? I would love to hear about it.