Effective Methods for Creating Dataframe Subsets in Python
This post shows how to create subsets of a Pandas Dataframe in Python. It covers three methods for filtering and selecting data: loc, iloc, and the index operator.
A Python Dataframe? What is it, anyway?
The Pandas library provides two main data structures for storing values: Series and Dataframe. A Series is a one-dimensional labeled array, while a Dataframe stores data in tabular form, i.e., in labeled rows and columns. With a Dataframe, we can create and access subsets in several ways:
- Accessing data by rows as subsets
- Retrieving data by columns as subsets
- Accessing specific data from specific rows and columns as subsets
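To make the difference between the two structures concrete, here is a minimal sketch. The variable names `ages` and `frame` and the three sample values are chosen only for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([12, 14, 13], name="Age")

# A Dataframe is a two-dimensional table; each of its columns is a Series.
frame = pd.DataFrame({"Age": [12, 14, 13], "NAME": ["John", "Camili", "Rheana"]})

print(ages.ndim)                    # 1
print(frame.ndim)                   # 2
print(frame.shape)                  # (3, 2)
print(type(frame["Age"]).__name__)  # Series
```

Selecting a single column from a Dataframe gives back a Series, which is why the two structures are usually introduced together.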
Creating a Dataframe to Work With!
import pandas as pd
# Sample records: roll number, age, and name for seven students
data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ['John', 'Camili', 'Rheana', 'Joseph', 'Amanti', 'Alexa', 'Siri'],
}
block = pd.DataFrame(data)
print("Original Dataframe:\n")
print(block)
Original Dataframe:
   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
2        30   13  Rheana
3        40   12  Joseph
4        50   14  Amanti
5        60   13   Alexa
6        70   15    Siri
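Here, we created a Dataframe by passing a dictionary to the pandas.DataFrame() constructor. Each dictionary key becomes a column label, and the rows receive the default integer labels 0 to 6. We will use this Dataframe throughout the article.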
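Before creating subsets, it can help to confirm what labels the Dataframe actually has, since the later sections select by those labels. The sketch below rebuilds the same data so that it runs on its own.

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

print(block.shape)          # (7, 3): seven rows, three columns
print(list(block.columns))  # ['Roll-num', 'Age', 'NAME']
print(list(block.index))    # [0, 1, 2, 3, 4, 5, 6]
```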
Creating Subsets of a Python Dataframe Using loc
The loc indexer lets us create subsets of a Dataframe based on specific rows, columns, or a combination of both. Although it is often written as loc(), it is an indexer used with square brackets rather than a function call. It operates on labels, meaning we pass it the labels of the rows and columns we want in the subset.
Example 1: Extracting data of specific rows from a Dataframe
block.loc[[0,1,3]]
   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
3        40   12  Joseph
Example 2: Creating subsets of rows using slicing (with loc, both the start and end labels are included)
block.loc[0:3]
   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
2        30   13  Rheana
3        40   12  Joseph
Example 3: Selecting specific rows and columns by label
block.loc[0:2,['Age','NAME']]
   Age    NAME
0   12    John
1   14  Camili
2   13  Rheana
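Because the row labels in the examples above happen to be the integers 0 to 6, it is easy to mistake them for positions. The following sketch makes the label-based behavior more visible by using the NAME column as the row labels. The `by_name` variable and the choice of NAME as the index are assumptions made for illustration and are not part of the original examples.

```python
import pandas as pd

block = pd.DataFrame({
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
})

# Use the names as row labels instead of the default 0..6.
by_name = block.set_index("NAME")

# A label slice includes both endpoints: Camili, Rheana, Joseph.
subset = by_name.loc["Camili":"Joseph", ["Age"]]
print(subset)

# A single row label plus a single column label returns one value.
rheana_age = by_name.loc["Rheana", "Age"]
print(rheana_age)  # 13
```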
Using iloc to Create Subsets of a Dataframe
The iloc indexer lets us select subsets of rows and columns by integer position. Unlike loc, which operates on labels, iloc operates on zero-based positions, so the first row or column is always position 0 regardless of its label. We create a subset by passing the positions of the rows and columns we want.
Example:
block.iloc[[0,1,3,6],[0,2]]
   Roll-num    NAME
0        10    John
1        20  Camili
3        40  Joseph
6        70    Siri
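With the default integer labels, loc and iloc can appear to do the same thing. The sketch below shows two cases where they differ: a Dataframe whose rows are no longer in label order, and slicing. The name `reversed_block` is introduced here only for illustration.

```python
import pandas as pd

block = pd.DataFrame({
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
})

# Reverse the row order; the row labels now run 6, 5, ..., 0.
reversed_block = block.iloc[::-1]

first_by_position = reversed_block.iloc[0]  # first row as stored
row_labeled_zero = reversed_block.loc[0]    # row whose label is 0

print(first_by_position["NAME"])  # Siri
print(row_labeled_zero["NAME"])   # John

# Slices: iloc excludes the end position, loc includes the end label.
print(len(block.iloc[0:3]))  # 3
print(len(block.loc[0:3]))   # 4
```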
Using the Index Operator to Create Subsets of a Dataframe
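We can also use the index operator (square brackets) directly on the Dataframe to select columns by label. Passing a list of column names returns a new Dataframe containing only those columns.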
Example:
block[['Age','NAME']]
   Age    NAME
0   12    John
1   14  Camili
2   13  Rheana
3   12  Joseph
4   14  Amanti
5   13   Alexa
6   15    Siri
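Two further uses of the index operator are worth a short sketch: the difference between single and double brackets, and filtering rows with a boolean condition. The age threshold of 14 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

block = pd.DataFrame({
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
})

ages = block["Age"]          # single brackets: a Series
ages_frame = block[["Age"]]  # double brackets: a one-column Dataframe

print(type(ages).__name__)        # Series
print(type(ages_frame).__name__)  # DataFrame

# A boolean Series inside the brackets keeps only the rows where it is True.
teens = block[block["Age"] >= 14]
print(list(teens["NAME"]))  # ['Camili', 'Amanti', 'Siri']
```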
Conclusion
With this, we’ve reached the end of this topic. Please feel free to leave a comment below if you have any questions. Stay tuned for more Python articles, and until then, happy learning!