Analyse your data with python using pandas
learn how to use pandas package to clean up or analyze data using python
Analyse your data with python using pandas
Overview
- In this tutorial, I’ll show you how to use pandas Library to manipulate your tabular data
- Pandas Library is open source and free to use and my favorite library to manipulate tabular data due it it’s simple API
- In this tutorial I am going to focus on
- Basic Data manipulation
- Reading writing data from and to excel file
Setup
- use
Pip install pandas
to install pandas package - use
import pandas as pd
to import pandas package - i am also going to use csv file and excel file which contain column data from below table
- you can copy paste data from this table and create your own version
ID | BREATH | DEPTH |
---|---|---|
C1 | 750 | 400 |
C2 | 900 | 400 |
C3 | 1050 | 400 |
C4 | 830 | 630 |
C5 | 1285 | 400 |
C6 | 600 | 600 |
C7 | 1200 | 400 |
C8 | 800 | 400 |
C9 | 900 | 600 |
C10 | 750 | 300 |
Create Dataframe
List
1
2
3
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=[
"i", "ii", "iii"], columns=["A", "B", "C"])
print(df)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
df = pd.DataFrame(
[
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18],
[19, 20, 21],
[22, 23, 24],
[25, 26, 27],
[28, 29, 30]
],
index=["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"],
columns=["A", "B", "C"]
)
CSV File
1
df = pd.read_csv('Column Data.csv')
Excel file
1
df = pd.read_excel('Column Data.xlsx')
if you have multiple sheets then specify sheet name
1
df = pd.read_excel('Column Data.xlsx', sheet_name="Data")
Manipulation Dataframe
Print Data
print first/last 5 rows
1
2
print(df.head())
print(df.tail())
if you want to print only first/last 2 rows
1
2
print(df.head(2))
print(df.tail(2))
print only rowIndex or column Index this is usefull to create dropdowns
1
2
3
print(df.index)
print(df.columns)
print(df["B"])
print specific row
1
print(df.loc[0:2])
print specific value based on row and column index
1
2
3
4
5
print(df.iat[0, 0])
print(df.iat[2, 1])
print(df.loc["i"]["A"])
# use Row and Column Name
print(df.loc["v", "B"])
you can also use this to update specific values
1
2
df.loc["i"]["A"] = 100
print(df.loc["i"]["A"])
Sort Data
sort values by specific column
1
2
3
print(df.sort_values(by=["A"]))
print(df.sort_values(by=["A"], ascending=False))
Filter using multiple columns
1
print(df.sort_values(by=["A", "B"]))
Filter Data
filter by specific value
1
2
3
print(df[df["B"] > 10])
print(df[df["DEPTH"] == 600])
print(df[df["DEPTH"] > 600])
filter but only show specific columns
1
print(df[df["B"] > 10, ["A", "C"]])
filter by multiple conditions
1
print(df[(df["BREATH"] > 600) & (df["DEPTH"] > 400)])
Write data to file
CSV file
1
df.to_csv("data.csv", index=True)
Excel file
1
df.to_excel("data.xlsx", index=True)
Conclusion
- pandas is a very useful library to manipulate tabular data
- It’s best usecase is when you combine with other libraries/framework like
- xlWings to automate excel
- JupyterNotebook to create interactive notebooks
- Streamlit to create interactive web apps
- matplotlib or plotly to create interactive graphs
If you have any questions or want to discuss something : Join our comment section
This post is licensed under CC BY-NC-ND 4.0 by the author.