Effective Methods for Creating Dataframe Subsets in Python

This post shows how to create subsets of a pandas Dataframe in Python. It covers three methods for filtering and selecting data: loc, iloc, and the index operator (square brackets).

 

A Python Dataframe? What is it, anyway?

The pandas library provides two main data structures for storing values: Series and Dataframe. A Dataframe stores data in tabular form, that is, as labeled rows and columns. We can create and access subsets of a Dataframe in several ways:

  • Accessing data by rows as subsets
  • Retrieving data by columns as subsets
  • Accessing specific data from specific rows and columns as subsets

Creating a Dataframe to Work With!


import pandas as pd

# Sample data: roll numbers, ages, and names of seven students
data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)
print("Original Dataframe:\n")
print(block)


 Original Dataframe:
   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
2        30   13  Rheana
3        40   12  Joseph
4        50   14  Amanti
5        60   13   Alexa
6        70   15    Siri

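Here, we created a Dataframe by passing a dictionary to the pandas.DataFrame() constructor. Because no index was specified, the rows receive the default integer labels 0 through 6. We will use this Dataframe throughout the article.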

Creating Subsets of a Python Dataframe Using loc

The loc indexer lets us create subsets of a Dataframe based on specific rows, columns, or a combination of both. Although it is often called a function, loc is an attribute used with square brackets, as in block.loc[...], rather than called with parentheses. It is label-based, meaning we pass it the labels of the rows and columns we want. Note that a slice passed to loc includes both its start and its end label.

Example 1: Extracting data of specific rows from a Dataframe
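
The output shown for this example can be reproduced with a sketch along the following lines. It rebuilds the block Dataframe from the setup section so that it runs on its own, and it assumes the intent was to select the rows labeled 0, 1, and 3; the name subset is only illustrative.

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# A list of row labels selects exactly those rows, with all columns
subset = block.loc[[0, 1, 3]]
print(subset)
```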


   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
3        40   12  Joseph

Example 2: Creating subsets of rows using slicing
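
A minimal sketch that matches the output for this example, assuming the slice was taken from label 0 to label 3. With loc, the end label is part of the result, which is why four rows come back. The name subset is again illustrative.

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Label-based slice: rows labeled 0 through 3, end label included
subset = block.loc[0:3]
print(subset)
```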


   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
2        30   13  Rheana
3        40   12  Joseph

Example 3: Creating subsets of specific columns with labels
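
One way to obtain the output for this example is to pass both a row slice and a list of column labels to loc. The row range 0:2 is an assumption inferred from the three rows shown; the name subset is illustrative.

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# First argument selects rows by label, second selects columns by label
subset = block.loc[0:2, ["Age", "NAME"]]
print(subset)
```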


   Age    NAME
0   12    John
1   14  Camili
2   13  Rheana

Using iloc to Create Subsets of a Dataframe

The iloc indexer lets us select subsets of rows and columns by integer position. Unlike loc, which works with labels, iloc works with zero-based positions, and its slices exclude the end position, just like regular Python slicing. We create a subset by specifying the positions of the rows and columns we want.
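
With the default integer index, labels and positions coincide, so the difference is easy to miss. The following sketch uses a small hypothetical Dataframe with string row labels (frame is not part of the article's dataset) to show label-based and position-based access side by side.

```python
import pandas as pd

# Hypothetical frame with string row labels, used only for illustration
frame = pd.DataFrame({"Age": [12, 14, 13]}, index=["a", "b", "c"])

by_label = frame.loc["b", "Age"]   # row labeled "b", column labeled "Age"
by_position = frame.iloc[1, 0]     # second row, first column

label_slice = frame.loc["a":"b"]   # label slice: end label included
position_slice = frame.iloc[0:1]   # position slice: end position excluded

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Both lookups return the same cell, but the two slices differ in length because only the label-based slice includes its end point.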

Example:
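
The output for this example is consistent with selecting row positions 0, 1, 3, and 6 and column positions 0 and 2. A sketch under that assumption, rebuilding the block Dataframe so it runs on its own:

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Row positions first, then column positions (0 is Roll-num, 2 is NAME)
subset = block.iloc[[0, 1, 3, 6], [0, 2]]
print(subset)
```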


   Roll-num    NAME
0        10    John
1        20  Camili
3        40  Joseph
6        70    Siri

Using the Index Operator to Create Subsets of a Dataframe

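We can also apply the index operator, square brackets, directly to the Dataframe. Passing a list of column labels returns a new Dataframe that contains only those columns, with all rows.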

Example:
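
A minimal sketch that reproduces the output for this example, assuming the Age and NAME columns were selected by passing a list of labels. Note the double brackets: the outer pair is the index operator and the inner pair is the list.

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# A list of column labels returns a Dataframe with just those columns
subset = block[["Age", "NAME"]]
print(subset)
```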


   Age    NAME
0   12    John
1   14  Camili
2   13  Rheana
3   12  Joseph
4   14  Amanti
5   13   Alexa
6   15    Siri

Conclusion

With this, we’ve reached the end of this topic. Please feel free to leave a comment below if you have any questions. Stay tuned for more Python articles, and until then, happy learning!

Source: digitalocean.com
