Pandas Cheat Sheet For Data Science In Python

Bacteria, datalators,

For working with data in python, Pandas is an essential tool you must use. This is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

But even when you’ve learned pandas in python, it’s easy to forget the specific syntax for doing something. That’s why today I am giving you a cheat sheet to help you easily reference the most common pandas tasks.

It’s also a good idea to check to the official pandas documentation from time to time, even if you can find what you need in the cheat sheet. Reading documentation is a skill every data professional needs, and the documentation goes into a lot more detail than we can fit in a single sheet anyway!

Importing Data:

Use these commands to import data from a variety of different sources and formats.

pd.read_csv(filename)Read From CSV file
pd.read_table(filename)From delimited text file (like TSV)
pd.read_excel(filename)From Excel file
pd.read_sql(query, connection_object)Read from SQL table/database
pd.read_json(json_string)Read from a JSON fromatted string, URL or file
pd.read_html(url)Parse an html URL, string or file and extract tables to a list of dataframes
pd.read_clipboard()Takes the contents of your clipboard adn passes it to read table()
pd.DataFrame(dict/list)From a dictionary, keys for columns name, values for data as list

Exporting Data:

Use these commands to export a DataFrame to CSV, .xlsx, SQL, or JSON.

df.to_csv(filename)Write teh df dataframe to a CSV file
df.to_excel(filename)write to an Excel file
df.to_sql(filename)Write to a SQL table
df.to_json(filename)Write to a file in JSON format

Viewing/Inspecting Data:

Use these commands to take a look at specific sections of your pandas DataFrame or Series.

df.head(n)First n rows of the dataframe (default is 5)
df.tail(n)Last n rows of the dataframe
df.shapeKnow the dimension of the dataframe
df.info()index, Datatype and Memory information
df.describe()summary statistics for numerical columns / pass include = ‘all’ to see whole summary
s.value_counts(dropna=False)View unique values and counts
df.apply(pd.Series.value_counts)Unique values and counts for all columns

Selection:

Use these commands to select a specific subset of your data.

df[col]Returns Column with label col as Series
df[[col1, col2]]Return Columns as a new DataFrame
s.iloc[0]Selection by position
s.loc[‘index_one’]Selection by index
df.iloc[0,:]First row
df.iloc[0,0]First element of first column

Data Cleaning:

Use these commands to perform a variety of data cleaning tasks.

df.columns = [‘a’,’b’,’c’]Rename columns
pd.isnull()Checks for null values, Return Boolean Array
pd.notnullCheck for non null values
df.dropna()Drop all rows that contains null values
df.dropna(axis=1)Drop all columns that contain null values
df.dropna(axis=1,thresh=n)Drop all rows that have less than n non values
df.fillna(x)Replace all null values with x
s.fillna(s.mean())Replace all null values with the mean (can be replaced with almost any function from statistics module)
s.astype(fload)Convert the datatype of the series to float
s.replace(1,’one’)Replace something with something else
s.replace([1,3],[‘one’,’three’])Replace multiple items
df.rename(colums=lambda x: x+1)Mass renaming of columns
df.rename(columns = {‘old_name’:’new_name})Selective renaming
df.set_index(‘column_one’)Change the index
df.rename(index = lambda x: x+1)Mass renaming of index

Filter, Sort, and Groupby:

Use these commands to filter, sort, and group your data.

df[df[clo]>0.5]Rows where column col is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)]Rows where 0.7 > col > 0.5
df.sort_values(col1)Sort values by col1 in ascending order
df.sort_values(col2,ascending=False)Sort values by col2 in descending order
df.sort_values([col1,col2],ascending=[True,False])Sort values by col1 in ascending order then col2 in descending order
df.groupby(col)Returns a groupby object for values from one column
df.groupby([col1,col2])Returns groupby object for values from multiple columns
df.groupby(col1)[col2]Returns the mean of the values in col2, grouped by the values
in col1 (mean can be replaced with almost any function from the Statistical M)
df.pivot_table(index=col1,values=[col2,col3],aggfunc=mean)
Create a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg(np.mean)Find the average across all columns for every unique col1 group
df.apply(np.mean)Apply the function np.mean() across each column
nf.apply(np.max,axis=1)Apply the function np.max() across each row

Join/Combine:

Use these commands to combine multiple dataframes into a single one.

df1.append(df2)Add the rows in df1 to the end of df2 (columns should be identical)
pd.concat([df1, df2],axis=1)Add the columns in df1 to the end of df2 (rows should be identical)
df1.join(df2,on=col1,how=’inner’)Selection by position
s.loc[‘index_one’]SQL-style join the columns in df1 with the columns on df2 where the rows for col have identical values. ‘how’ can be one

Statistics:

These commands perform various statistical tests. (They can be applied to a series as well)

df.describe()Summary statistics for numerical columns
df.mean()Returns the mean of all columns
df.corr()Returns the correlation between columns in a DataFrame
df.count()Returns the number of non-null values in each DataFrame column
df.max()Returns the highest value in each column
df.min()Returns the lowest value in each column
df.median()Returns the median of each column
df.std()Returns the standard deviation of each column
df.transpose()Transposing rows and columns

I hope this cheat sheet will be useful to you no matter you are new to python who is learning python for data science or a data professional. Happy Programming.

You can also download the printable PDF file from here.

Down Arrow on Twitter Twemoji 13.0.1

.

49 thoughts on “Pandas Cheat Sheet For Data Science In Python”

  1. I enjoy what you guys are up too. This sort of clever work and exposure!
    Keep up the amazing works guys I’ve incorporated you
    guys to my personal blogroll.

  2. Somebody essentially assist to make seriously posts I would state.

    That is the first time I frequented your website page and to this point?
    I surprised with the research you made to create this actual publish extraordinary.

    Wonderful task!

  3. Great post. I was checking continuously this blog and I’m impressed!
    Very useful information particularly the last part 🙂 I care for such information a
    lot. I was seeking this certain info for a very
    long time. Thank you and good luck.

  4. Pretty part of content. I just stumbled upon your website and in accession capital to say that I
    acquire actually loved account your blog posts.
    Any way I will be subscribing on your augment or even I fulfillment you get admission to persistently quickly.

  5. Hiya very nice website!! Man .. Excellent .. Amazing .. I will bookmark your web site and take the feeds additionally?

    I am satisfied to find numerous helpful info here in the put
    up, we want develop more techniques in this regard, thank you
    for sharing. . . . . .

  6. Someone necessarily assist to make significantly articles I might state.
    That is the first time I frequented your web page and up to now?
    I amazed with the analysis you made to make this particular publish amazing.

    Magnificent activity!

  7. The morning of March 12, the Class B girls state basketball tournaments got underway in Spearfish with No.
    1 Corsica-Stickney vs. No. eight Irene-Wakonda, followed by No.
    four Faith vs. No. five Howard.

  8. Generally I don’t learn post on blogs, but I would like to say that this write-up very pressured me
    to check out and do it! Your writing style has been amazed me.
    Thanks, quite nice post.

  9. Its such as you learn my thoughts! You seem to know a lot approximately this, such as
    you wrote the ebook in it or something. I believe that you can do with a few p.c.
    to drive the message home a bit, however instead of that,
    that is excellent blog. A great read. I will certainly be back.

  10. Commissioner Roger Goodell agreed with Manfred in a July 2017 seminar that betting on in-game events,
    as opposed to the outcome of games, was a more palatable type of sports betting.

  11. I blog frequently and I truly thank you for your content.
    Your article has truly peaked my interest. I’m going to bookmark
    your blog and keep checking for new information about once
    per week. I opted in for your Feed as well.

  12. Hey There. I found your blog using msn. This is a very well written article.
    I will make sure to bookmark it and come back to read more of your useful information. Thanks for the post.
    I will certainly return.

Leave a Comment

Your email address will not be published.