Post

Analyse your data with python using pandas

learn how to use pandas package to clean up or analyze data using python

Analyse your data with python using pandas

Overview

  • In this tutorial, I’ll show you how to use pandas Library to manipulate your tabular data
  • Pandas Library is open source and free to use and my favorite library to manipulate tabular data due it it’s simple API
  • In this tutorial I am going to focus on
    • Basic Data manipulation
    • Reading writing data from and to excel file

Setup

  • use Pip install pandas to install pandas package
  • use import pandas as pd to import pandas package
  • i am also going to use csv file and excel file which contain column data from below table
  • you can copy paste data from this table and create your own version
IDBREATHDEPTH
C1750400
C2900400
C31050400
C4830630
C51285400
C6600600
C71200400
C8800400
C9900600
C10750300

Create Dataframe

List

1
2
3
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=[
                  "i", "ii", "iii"], columns=["A", "B", "C"])
print(df)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
df = pd.DataFrame(
    [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
        [10, 11, 12],
        [13, 14, 15],
        [16, 17, 18],
        [19, 20, 21],
        [22, 23, 24],
        [25, 26, 27],
        [28, 29, 30]
    ],
    index=["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"],
    columns=["A", "B", "C"]
)

CSV File

1
df = pd.read_csv('Column Data.csv')

Excel file

1
df = pd.read_excel('Column Data.xlsx')

if you have multiple sheets then specify sheet name

1
df = pd.read_excel('Column Data.xlsx', sheet_name="Data")

Manipulation Dataframe

print first/last 5 rows

1
2
print(df.head())
print(df.tail())

if you want to print only first/last 2 rows

1
2
print(df.head(2))
print(df.tail(2))

print only rowIndex or column Index this is usefull to create dropdowns

1
2
3
print(df.index)
print(df.columns)
print(df["B"])

print specific row

1
print(df.loc[0:2])

print specific value based on row and column index

1
2
3
4
5
print(df.iat[0, 0])
print(df.iat[2, 1])
print(df.loc["i"]["A"])
# use Row and Column Name
print(df.loc["v", "B"])

you can also use this to update specific values

1
2
df.loc["i"]["A"] = 100
print(df.loc["i"]["A"])

Sort Data

sort values by specific column

1
2
3
print(df.sort_values(by=["A"]))
print(df.sort_values(by=["A"], ascending=False))

Filter using multiple columns

1
print(df.sort_values(by=["A", "B"]))

Filter Data

filter by specific value

1
2
3
print(df[df["B"] > 10])
print(df[df["DEPTH"] == 600])
print(df[df["DEPTH"] > 600])

filter but only show specific columns

1
print(df[df["B"] > 10, ["A", "C"]])

filter by multiple conditions

1
print(df[(df["BREATH"] > 600) & (df["DEPTH"] > 400)])

Write data to file

CSV file

1
df.to_csv("data.csv", index=True)

Excel file

1
df.to_excel("data.xlsx", index=True)

Conclusion

  • pandas is a very useful library to manipulate tabular data
  • It’s best usecase is when you combine with other libraries/framework like
    • xlWings to automate excel
    • JupyterNotebook to create interactive notebooks
    • Streamlit to create interactive web apps
    • matplotlib or plotly to create interactive graphs

If you have any questions or want to discuss something : Join our comment section

This post is licensed under CC BY-NC-ND 4.0 by the author.