Historical NBA Player Stat Analysis

Overview

In this notebook, I demonstrate how to scrape historical NBA player statistics from the web and then analyze trends in the league over time.

Part 1 - Scraping

For this project, I scraped data from https://stats.nba.com using a Python API client called nba_api. The dataset is also available on Kaggle. The scraper takes about 12 hours to run because of API rate limits.

Step 1 - Import Libraries

This crawler uses the following libraries:

  • time: to pause the script between API calls to avoid hitting the rate limit
  • random: to pick a random pause length, so the pauses differ each time and are less likely to trigger bot detection
  • nba_api: to interact with the API
  • requests: to detect when a request has timed out, which we treat as a rate limit
In [1]:
import time # python time library
import random # python randomization library
from nba_api.stats.static import players # static list of players bundled with nba_api (no network request)
from nba_api.stats.endpoints import playercareerstats # library to interact with the career stats endpoint of the NBA API
from requests.exceptions import ReadTimeout  # library to determine if our API request has timed out

Step 2 - Define Scraper Function

The web scraper function does the following:

  1. take a dictionary called player as input
  2. extract the player's name and ID from the dictionary
  3. try to make an API request for the player's stats using the ID
  4. if we hit the rate limit, pause for 30-90 seconds and try again
  5. once the request succeeds, save the career stats into a dataframe
  6. add a column for the player's name
  7. sleep for 0.4-0.7 seconds to avoid read timeouts
  8. return the stats
In [2]:
def crawl(player): # 1. take a player dictionary as input
    player_id, name = player['id'], player['full_name'] # 2. extract id and name (player_id avoids shadowing the built-in id)
    while True:
        try: # 3. request the player's career stats by ID
            stats = playercareerstats.PlayerCareerStats(player_id=player_id).get_data_frames()[0] # 5. save the first result set into a dataframe
            break # request succeeded, stop retrying
        except ReadTimeout: # 4. treat a timeout as a rate limit
            time.sleep(random.uniform(30, 90)) # 4. pause for 30-90 seconds, then try again
    stats['Name'] = name # 6. add a column for the player's name
    time.sleep(random.uniform(0.4, 0.7)) # 7. sleep for 0.4-0.7 seconds before the next request
    return stats # 8. return the dataframe
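The retry in `crawl` has no upper bound: if the API keeps timing out, the scraper waits and retries indefinitely. A minimal sketch of a capped alternative, assuming it is acceptable to give up on a player after a few failures, is a generic retry helper with exponential backoff and random jitter. `retry_with_backoff` and its parameters are hypothetical names for illustration, not part of nba_api or requests.

```python
import random
import time


def retry_with_backoff(fn, retry_on, max_attempts=5, base_delay=30.0,
                       max_delay=300.0, sleep=time.sleep):
    """Call fn() and return its result, retrying when it raises retry_on.

    Hypothetical helper: before retry n (counting from 0) it waits
    min(base_delay * 2**n, max_delay) seconds, stretched by up to 50%
    random jitter. After max_attempts failures the last exception is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle the failure
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay * random.uniform(1.0, 1.5))  # jitter so retries are not evenly spaced


# Possible use inside crawl (assumes the Step 1 imports and a player_id variable):
# stats = retry_with_backoff(
#     lambda: playercareerstats.PlayerCareerStats(player_id=player_id).get_data_frames()[0],
#     retry_on=ReadTimeout,
# )
```

Passing `sleep` in as a parameter means the delays can be checked without actually waiting minutes between attempts.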

Step 3 - Calling the Function

This section does the following:

  1. set a header flag, h, to False, because nothing has been written to the CSV yet
  2. get the list of all players from nba_api's bundled static player data, as a list of dictionaries holding each player's name and ID
  3. open the output CSV
  4. iterate through the list of players
  5. scrape each player's stats and append them to the CSV, writing the column header only for the first player
  6. set the flag to True, so the header is not repeated before every player
In [3]:
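import os # used to create the output folder if it does not exist

h = False # 1. no header has been written yet
nba_players = players.get_players() # 2. get list of player dictionaries (bundled with nba_api)
os.makedirs("data", exist_ok=True) # make sure the output folder exists before opening the file
with open("data/ALL_NBA_PLAYERS.csv", mode='w', newline='', encoding='utf-8') as file: # 3. open output csv file
    for player in nba_players: # 4. iterate through the player dictionaries
        crawl(player).to_csv(file, header=not h, index=False) # 5. scrape stats for current player and append to csv
        h = True # 6. header has now been written; skip it for every later player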
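With the header flag, `to_csv` writes column names only on the first call, so every later player's rows are appended as plain data under a single header. A small self-contained sketch of the same pattern, writing two hypothetical per-player dataframes (made-up values) to an in-memory buffer instead of the real CSV, then reading the result back:

```python
import io

import pandas as pd

# Two hypothetical per-player results standing in for crawl(...) output.
frames = [
    pd.DataFrame({"SEASON_ID": ["2000-01"], "PTS": [100], "Name": ["Player A"]}),
    pd.DataFrame({"SEASON_ID": ["2000-01", "2001-02"], "PTS": [50, 75],
                  "Name": ["Player B", "Player B"]}),
]

buffer = io.StringIO()
header_written = False
for frame in frames:
    frame.to_csv(buffer, header=not header_written, index=False)  # header only on the first write
    header_written = True

csv_text = buffer.getvalue()
combined = pd.read_csv(io.StringIO(csv_text))  # one header, three data rows
print(combined)
```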

Part 2 - Analyzing

This part calculates and plots the following per-season averages:

  1. Average Player Age by Season
  2. Average Offensive, Defensive, and Total Rebounds by Season
  3. Average Personal Fouls by Season
  4. Average Free Throw Percent by Season
  5. Average Steals vs Blocks by Season

Import Libraries

We use the following libraries:

  • pandas: to load the CSV into a dataframe and compute per-season averages
  • matplotlib: to plot the data
In [4]:
import pandas as pd # for loading data
import matplotlib.pyplot as plt # for plotting graphs

Read data

This cell loads the scraped CSV file into a pandas dataframe.

In [5]:
df = pd.read_csv("data/ALL_NBA_PLAYERS.csv") # store data in pandas dataframe
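One caveat before averaging: for a player traded mid-season, I believe nba_api's career stats table contains one row per team plus a combined row whose `TEAM_ABBREVIATION` is `TOT`. If that holds for the scraped file, the per-season means below count such players more than once. A minimal sketch of keeping one row per player per season, assuming those column names (`PLAYER_ID`, `SEASON_ID`, `TEAM_ABBREVIATION`) and the `TOT` marker; `one_row_per_player_season` is a hypothetical helper:

```python
import pandas as pd


def one_row_per_player_season(df):
    """Drop per-team rows for player-seasons that also have a combined 'TOT' row.

    Assumes traded players appear once per team plus once with
    TEAM_ABBREVIATION == 'TOT' for the same PLAYER_ID and SEASON_ID.
    """
    is_tot = df["TEAM_ABBREVIATION"].eq("TOT")
    # For every row: True if its player-season has a 'TOT' row anywhere in df.
    has_tot = (
        is_tot.groupby([df["PLAYER_ID"], df["SEASON_ID"]])
        .transform("any")
        .astype(bool)
    )
    # Keep rows from player-seasons without a 'TOT' row, plus the 'TOT' rows themselves.
    return df[~has_tot | is_tot]


# Possible use right after loading:
# df = one_row_per_player_season(df)
```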

Calculate and Plot Average Player Age by Season

This code calculates the average player age by season, then sets up and plots the graph.

In [6]:
avg_age = df.groupby('SEASON_ID')["PLAYER_AGE"].mean() # store the average player age in pandas series by season

plt.figure(figsize=(10,6)) # set graph size
plt.plot(avg_age.index, avg_age, label='Average Age', marker='o', color='r') # plot lines
plt.xlabel('Season') # set x axis label
plt.ylabel('Average Age') # set y axis label
plt.title('Average Player Age by Season') # set graph title
plt.legend() # create key
plt.xticks(avg_age.index[::2], rotation=90) # set X axis ticks to every other season, rotate 90 degrees
plt.grid(True) # enable grid
plt.show() # show graph
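Each row counts equally in the age average, so a player who appeared in a handful of games moves the mean as much as a full-season starter. If the goal is the age of the players actually on the floor, one option (a swapped-in technique, not what the cell above does) is to weight each row by games played. A minimal sketch, assuming a `GP` (games played) column as in nba_api's career stats table; `weighted_season_mean` is a hypothetical helper and the sample values are made up:

```python
import pandas as pd


def weighted_season_mean(df, value_col, weight_col="GP"):
    """Per-season mean of value_col, weighting each row by weight_col."""
    valid = df[[value_col, weight_col]].notna().all(axis=1)  # ignore rows missing either number
    sub = df.loc[valid, ["SEASON_ID", value_col, weight_col]]
    totals = (
        sub.assign(weighted=sub[value_col] * sub[weight_col])
        .groupby("SEASON_ID")[["weighted", weight_col]]
        .sum()
    )
    return totals["weighted"] / totals[weight_col]


# Hypothetical sample: a 10-game 20-year-old and a 30-game 30-year-old.
sample = pd.DataFrame({
    "SEASON_ID": ["2000-01", "2000-01", "2001-02"],
    "PLAYER_AGE": [20.0, 30.0, 25.0],
    "GP": [10, 30, 5],
})
print(weighted_season_mean(sample, "PLAYER_AGE"))  # 2000-01: (20*10 + 30*30) / 40 = 27.5
```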

Calculate and Plot Average Offensive, Defensive, and Total Rebounds

This code calculates the average offensive, defensive, and total rebounds, sets up the graph, then plots all three for comparison.

In [7]:
avg_reb = df.groupby("SEASON_ID")[["OREB", "DREB", "REB"]].mean() # store average offensive, defensive, and total rebounds by season (one dataframe column each)

plt.figure(figsize=(10,6)) # set graph size
plt.plot(avg_reb.index, avg_reb["OREB"], label="Average Offensive Rebounds", color='red', marker="^") # plot lines
plt.plot(avg_reb.index, avg_reb["DREB"], label="Average Defensive Rebounds", color='blue', marker="v") # plot lines
plt.plot(avg_reb.index, avg_reb["REB"], label="Average Combined Rebounds", color='green', marker="o") # plot lines
plt.xlabel("Season") # set x axis label
plt.ylabel("Average Rebounds") # set y axis label
plt.title("Average Offensive, Defensive, and Combined Rebounds by Season") # set graph title
plt.legend() # create key
plt.xticks(avg_reb.index[::2], rotation=90) # set X axis ticks to every other season, rotate 90 degrees
plt.grid(True) # enable grid
plt.show() # show graph
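Offensive and defensive rebounds were not recorded separately before the 1973-74 season. If those seasons come back with missing `OREB`/`DREB` values while `REB` is populated, `mean()` skips the missing values, so the three lines can be averaged over different sets of players. A minimal sketch of a per-season check: how many rows have the split at all, and how many rows have a split that does not add up to `REB`. `rebound_split_report` is a hypothetical helper and the sample rows are made up:

```python
import pandas as pd


def rebound_split_report(df):
    """Per season: total rows, rows with an OREB/DREB split, and rows where OREB + DREB != REB."""
    has_split = df[["OREB", "DREB", "REB"]].notna().all(axis=1)
    mismatch = has_split & (df["OREB"] + df["DREB"] != df["REB"])
    season = df["SEASON_ID"]
    return pd.DataFrame({
        "rows": df.groupby("SEASON_ID").size(),
        "rows_with_split": has_split.groupby(season).sum(),
        "mismatches": mismatch.groupby(season).sum(),
    })


# Possible use on the loaded data:
# print(rebound_split_report(df))
```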

Calculate and Plot Average Personal Fouls by Season

This code calculates the average personal fouls by season then graphs the data.

In [8]:
avg_pf = df.groupby('SEASON_ID')["PF"].mean() # store average personal fouls grouped by season

plt.figure(figsize=(10,6)) # set graph size
plt.plot(avg_pf.index, avg_pf.values, label="Average Personal Fouls", color='red', marker="o") # plot lines
plt.xlabel("Season") # set x axis label
plt.ylabel("Average Personal Fouls") # set y axis label
plt.title("Average Personal Fouls by Season") # set graph title
plt.legend() # create key
plt.xticks(avg_pf.index[::2], rotation=90) # set X axis ticks to every other season, rotate 90 degrees
plt.grid(True) # enable grid
plt.show() # show graph
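If the scraped columns hold season totals (I assume so here, since I believe nba_api's career stats endpoint defaults to totals), the average above mixes players who appeared in 5 games with players who appeared in 80, so it also tracks changes in schedule length and playing time. A minimal sketch of a per-game version: divide each row's total by its games played (`GP`), then average by season. `per_game_season_mean` is a hypothetical helper and the sample rows are made up:

```python
import pandas as pd


def per_game_season_mean(df, stat_col, games_col="GP"):
    """Average of stat_col / games_col across rows, grouped by season."""
    played = df[df[games_col] > 0]  # skip rows with no games to avoid dividing by zero
    per_game = played[stat_col] / played[games_col]
    return per_game.groupby(played["SEASON_ID"]).mean()


# Possible use on the loaded data:
# avg_pf_per_game = per_game_season_mean(df, "PF")
```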

Calculate and Plot Average Free Throw Percent by Season

This code calculates the average free throw percent by season, multiplies the decimal by 100 to get a percentage, then graphs the result.

In [9]:
avg_ftpct = df.groupby("SEASON_ID")["FT_PCT"].mean() # store average free throw percent grouped by season
avg_ftpct *= 100 # multiply decimal by 100 to get %

plt.figure(figsize=(10,6)) # set graph size
plt.plot(avg_ftpct.index, avg_ftpct.values, label="Average Free Throw Percent", color="red", marker="o") # plot lines
plt.xlabel("Season") # set x axis label
plt.ylabel("Average Free Throw Percent") # set y axis label
plt.title("Average Free Throw Percent by Season") # set graph title
plt.legend() # create key
plt.xticks(avg_ftpct.index[::2], rotation=90) # set X axis ticks to every other season, rotate 90 degrees
plt.grid(True) # enable grid
plt.show() # show graph
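Averaging each row's `FT_PCT` gives a player who went 1-for-2 the same weight as one who went 400-for-500. An alternative (a different statistic from the one plotted above, not a correction of it) is the league-wide rate per season: total free throws made divided by total free throws attempted. A minimal sketch, assuming the `FTM` and `FTA` columns from the scraped table; `league_ft_pct` is a hypothetical helper and the sample rows are made up:

```python
import pandas as pd


def league_ft_pct(df):
    """Season-level free throw percentage: sum of FTM / sum of FTA, times 100."""
    totals = df.groupby("SEASON_ID")[["FTM", "FTA"]].sum()
    totals = totals[totals["FTA"] > 0]  # seasons with no recorded attempts have no rate
    return totals["FTM"] / totals["FTA"] * 100


# Hypothetical sample: 1-for-2 (50%) and 400-for-500 (80%) in the same season.
sample = pd.DataFrame({
    "SEASON_ID": ["2000-01", "2000-01", "1950-51"],
    "FTM": [1, 400, 0],
    "FTA": [2, 500, 0],
})
print(league_ft_pct(sample))  # 2000-01: 401 / 502 * 100, about 79.9, versus a plain mean of 65
```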

Compare Average Steals vs Blocks by Season

This code calculates the average steals and blocks by season, then graphs them for comparison.

In [10]:
avg_blocks_steals = df.groupby("SEASON_ID")[["STL", "BLK"]].mean() # get averages for steals and blocks by season

plt.figure(figsize=(10,6)) # set graph size
plt.plot(avg_blocks_steals.index, avg_blocks_steals.STL, label="Average Steals", color="red", marker="^") # plot steals
plt.plot(avg_blocks_steals.index, avg_blocks_steals.BLK, label="Average Blocks", color="blue", marker="v") # plot blocks
plt.xlabel("Season") # set x axis label
plt.ylabel("Averages") # set y axis label
plt.title("Average Blocks vs Average Steals by Season") # set graph title
plt.xticks(avg_blocks_steals.index[::2], rotation=90) # set X axis ticks to every other season, rotate 90 degrees
plt.grid(True) # enable grid
plt.legend() # create key
plt.show() # show graph
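Steals and blocks were not recorded before the 1973-74 season. If those seasons come back as missing values, their averages are NaN and the early part of the plot is empty. A minimal sketch of restricting the comparison to seasons where at least one of the columns has a recorded value; `tracked_seasons_only` is a hypothetical helper and the sample rows are made up:

```python
import pandas as pd


def tracked_seasons_only(df, cols=("STL", "BLK")):
    """Per-season means of cols, dropping seasons where every value is missing."""
    means = df.groupby("SEASON_ID")[list(cols)].mean()
    return means.dropna(how="all")  # a season survives if any column has a value


# Possible use in place of the plain groupby above:
# avg_blocks_steals = tracked_seasons_only(df)
```

This only helps if untracked seasons are stored as missing values; if they were stored as zeros instead, they would need to be filtered by season.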

License

This notebook and its code are licensed under the Apache 2.0 open source license.