Effective Methods for Creating Dataframe Subsets in Python

This post shows how to create subsets of a Pandas Dataframe in Python. It covers three methods for filtering and selecting data: loc, iloc, and the index operator.


A Python Dataframe? What is it, anyway?

The Pandas library provides two main data structures for storing values: Series and Dataframe. A Dataframe stores data in tabular form, i.e., as labeled rows and columns. With a Dataframe, we can create and access subsets in several ways:

  • Accessing data by rows as subsets
  • Retrieving data by columns as subsets
  • Accessing specific data from specific rows and columns as subsets

Creating a Dataframe to Work With!

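import pandas as pd
# Sample data: roll numbers, ages, and names of seven students
data = {"Roll-num": [10, 20, 30, 40, 50, 60, 70], "Age": [12, 14, 13, 12, 14, 13, 15], "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"]}
block = pd.DataFrame(data)
print("Original Dataframe:")
print(block)  # display the Dataframe itself

Original Dataframe: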
   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
2        30   13  Rheana
3        40   12  Joseph
4        50   14  Amanti
5        60   13   Alexa
6        70   15    Siri

Here, we created a Dataframe by passing a dictionary to the pandas.DataFrame() constructor. We will use this Dataframe, named block, throughout this article.

Creating Subsets of a Python Dataframe Using the loc() Function

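The loc indexer lets us create subsets of a Dataframe by row, by column, or by a combination of both. Although it is often called the loc() function, it is used with square brackets, as in block.loc[...]. It operates on labels, so we pass it the labels of the rows and columns we want. In the Dataframe above, the row labels are the default integers 0 to 6, and the column labels are Roll-num, Age, and NAME.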

Example 1: Extracting data of specific rows from a Dataframe
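
The snippet for this example is not shown, so the following is a minimal sketch of one call that yields the rows below. It assumes the block Dataframe from the previous section (rebuilt here so the example runs on its own); the name rows_subset is chosen only for illustration.

```python
import pandas as pd

# Rebuild the sample Dataframe from the previous section
data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# A list of row labels selects exactly those rows (labels 0, 1, and 3)
rows_subset = block.loc[[0, 1, 3]]
print(rows_subset)
```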

   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
3        40   12  Joseph

Example 2: Creating subsets of rows using slicing
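
The snippet for this example is not shown either. A sketch that would give the four rows below, again assuming the block Dataframe from earlier, is a label slice with loc. Note that a loc slice includes its end label, unlike ordinary Python slicing; the name sliced_rows is illustrative.

```python
import pandas as pd

# Rebuild the sample Dataframe from the previous section
data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Label-based slice: both endpoints (0 and 3) are included
sliced_rows = block.loc[0:3]
print(sliced_rows)
```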

   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
2        30   13  Rheana
3        40   12  Joseph

Example 3: Creating subsets of specific columns with labels
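
One way to obtain the subset below, assuming the same block Dataframe, is to give loc both a row slice and a list of column labels. This is a sketch rather than the original snippet, and the name labeled_subset is illustrative.

```python
import pandas as pd

# Rebuild the sample Dataframe from the previous section
data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Rows with labels 0 through 2 (inclusive), restricted to two columns
labeled_subset = block.loc[0:2, ["Age", "NAME"]]
print(labeled_subset)
```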

   Age    NAME
0   12    John
1   14  Camili
2   13  Rheana

Using the Python iloc() Function to Create Subsets of a Dataframe

The iloc indexer lets us select subsets of rows and columns by integer position. Unlike loc, which operates on labels, iloc operates on zero-based positions. Like loc, it is used with square brackets rather than called as a function. We create a subset by specifying the positions of the rows and columns we want.
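
As a minimal sketch (the original snippet is not shown), the subset below can be produced by passing lists of row and column positions to iloc. It assumes the block Dataframe from earlier; the names positional_subset and first_three are illustrative.

```python
import pandas as pd

# Rebuild the sample Dataframe from the previous section
data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Rows at positions 0, 1, 3, 6 and columns at positions 0 and 2
positional_subset = block.iloc[[0, 1, 3, 6], [0, 2]]
print(positional_subset)

# Positional slices exclude the end position, unlike loc label slices
first_three = block.iloc[0:3]
```

Because this Dataframe uses the default integer index, row labels and row positions happen to coincide; with a custom index, loc and iloc would select differently.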


   Roll-num    NAME
0        10    John
1        20  Camili
3        40  Joseph
6        70    Siri

Using the Index Operator to Create Subsets of a Dataframe

We can also apply the index operator, square brackets, directly to a Dataframe to create subsets. Passing a list of column names returns those columns for all rows.
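
A short sketch of this, assuming the block Dataframe from earlier (the name column_subset is illustrative), selects the Age and NAME columns shown below:

```python
import pandas as pd

# Rebuild the sample Dataframe from the previous section
data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# A list of column names inside the brackets keeps all rows of those columns
column_subset = block[["Age", "NAME"]]
print(column_subset)
```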


   Age    NAME
0   12    John
1   14  Camili
2   13  Rheana
3   12  Joseph
4   14  Amanti
5   13   Alexa
6   15    Siri


With this, we’ve reached the end of this topic. Please feel free to leave a comment below if you have any questions. Stay tuned for more Python articles, and until then, happy learning!
