TheDeveloperBlog.com


Pandas Groupby


Pandas DataFrame.groupby()

In Pandas, the groupby() function splits a data set into groups according to some criteria, so that each group can be processed separately. The object can be split along either of its axes.

Syntax:

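DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

Note: This signature matches older pandas releases. The squeeze parameter was removed in pandas 2.0.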

A groupby operation consists of the following steps:

  • Splitting the data into groups
  • Applying a function to each group
  • Combining the results

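Note: The result of a groupby operation is not a DataFrame but a GroupBy object (DataFrameGroupBy or SeriesGroupBy). Iterating over it yields pairs of a group key and the matching sub-DataFrame.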
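To make the note concrete, here is a minimal sketch of inspecting a groupby result. The sample data is hypothetical (the same four students, but spread over two courses so that each group has more than one row).

```python
import pandas as pd

# Hypothetical sample data: two courses, two students each.
df = pd.DataFrame({
    "Name": ["Parker", "Smith", "John", "William"],
    "Percentage": [82, 98, 91, 87],
    "Course": ["B.Sc", "B.Ed", "B.Sc", "B.Ed"],
})

grouped = df.groupby("Course")

# The result is a GroupBy object, not a DataFrame or a dict.
print(type(grouped).__name__)

# Iterating yields (group key, sub-DataFrame) pairs.
for course, frame in grouped:
    print(course, list(frame["Name"]))

# A single group can be pulled out by its key.
bsc = grouped.get_group("B.Sc")
print(bsc)
```

Nothing is computed until we iterate, select a group, or apply a function to the groups.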

Split data into groups

There are multiple ways to split an object into groups:

  • obj.groupby('key')
  • obj.groupby(['key1','key2'])
  • obj.groupby(key,axis=1)
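The first two forms can be sketched as follows. The Year column is a hypothetical addition used only to have a second key. The column-wise form (axis=1) is left out here because newer pandas releases deprecate it.

```python
import pandas as pd

# Hypothetical sample data with two possible grouping keys.
df = pd.DataFrame({
    "Course": ["B.Sc", "B.Sc", "B.Ed", "B.Ed"],
    "Year": [1, 2, 1, 1],
    "Percentage": [82, 91, 98, 87],
})

# Group by a single key.
by_course = df.groupby("Course")

# Group by several keys: one group per distinct (Course, Year) pair.
by_course_year = df.groupby(["Course", "Year"])

print(by_course.ngroups)
print(by_course_year.ngroups)

# size() counts the rows in each group.
sizes = by_course_year.size()
print(sizes)
```

With several keys, the result of an aggregation such as size() is indexed by a MultiIndex built from the key combinations.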

A function can then be applied to each group. The applied function usually falls into one of the following categories:

  • Aggregation: Computes a summary statistic for each group.
  • Transformation: Performs a group-specific computation and returns a result indexed like the original object.
  • Filtration: Discards entire groups that fail some condition.

Aggregations

An aggregation is a function that returns a single value for each group. Once the groupby object is created, several aggregation operations can be performed on the grouped data.

Example

# import the pandas library
import pandas as pd

data = {'Name': ['Parker', 'Smith', 'John', 'William'],
   'Percentage': [82, 98, 91, 87],
   'Course': ['B.Sc','B.Ed','M.Phill','BA']}
df = pd.DataFrame(data)

# group the rows by course, then average the percentage within each group
grouped = df.groupby('Course')
print(grouped['Percentage'].agg('mean'))

Output

Course
B.Ed       98.0
B.Sc       82.0
BA         87.0
M.Phill    91.0
Name: Percentage, dtype: float64
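Since every course above contains a single student, the mean is just that student's score. As a small sketch with hypothetical data in which each course has two students, we can also pass a list of function names to agg() to compute several statistics at once.

```python
import pandas as pd

# Hypothetical sample data: two students per course.
df = pd.DataFrame({
    "Name": ["Parker", "Smith", "John", "William"],
    "Percentage": [82, 98, 91, 87],
    "Course": ["B.Sc", "B.Ed", "B.Sc", "B.Ed"],
})

# One row per course, one column per aggregation function.
summary = df.groupby("Course")["Percentage"].agg(["mean", "min", "max"])
print(summary)
```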

Transformations

A transformation performs a group-specific computation and returns an object that has the same size and index as the object being grouped.

Example

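# import the pandas library
import pandas as pd

data = {'Name': ['Parker', 'Smith', 'John', 'William'],
   'Percentage': [82, 98, 91, 87],
   'Course': ['B.Sc','B.Ed','M.Phill','BA']}
df = pd.DataFrame(data)

grouped = df.groupby('Course')
# standardize each score within its group and scale the result by 10
score = lambda x: (x - x.mean()) / x.std() * 10
# select the numeric column; 'Name' holds strings and cannot be standardized
# every course has a single row here, so x.std() is NaN and so is the result
print(grouped[['Percentage']].transform(score))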

Output

    Percentage
0       NaN
1       NaN
2       NaN
3       NaN
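The output above is all NaN because each course has only one row, so the standard deviation within a group is undefined. The following sketch uses hypothetical data with two students per course and a simpler transformation (distance from the group mean) to show a meaningful result.

```python
import pandas as pd

# Hypothetical sample data: two students per course.
df = pd.DataFrame({
    "Name": ["Parker", "Smith", "John", "William"],
    "Percentage": [82, 98, 91, 87],
    "Course": ["B.Sc", "B.Ed", "B.Sc", "B.Ed"],
})

# Subtract the course mean from each student's score.
centered = df.groupby("Course")["Percentage"].transform(lambda x: x - x.mean())

# The result has one value per original row, in the original row order,
# so it can be assigned straight back as a new column.
df["Centered"] = centered
print(df)
```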

Filtration

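The filter() function keeps or drops entire groups based on a condition. The function passed to filter() receives each group and must return True or False. The rows of every group that returns True are returned in their original order.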

Example

# import the pandas library
import pandas as pd

data = {'Name': ['Parker', 'Smith', 'John', 'William'],
   'Percentage': [82, 98, 91, 87],
   'Course': ['B.Sc','B.Ed','M.Phill','BA']}
df = pd.DataFrame(data)

grouped = df.groupby('Course')
# keep groups with at least one row; every group qualifies, so all rows are returned
print(grouped.filter(lambda x: len(x) >= 1))

Output

      Name  Percentage   Course
0   Parker          82     B.Sc
1    Smith          98     B.Ed
2     John          91  M.Phill
3  William          87       BA
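The condition above is true for every group, so nothing is removed. As a sketch with hypothetical data (a fifth student, Lee, is added for illustration), the conditions below actually drop some groups.

```python
import pandas as pd

# Hypothetical sample data: two courses with two students, one with a single student.
df = pd.DataFrame({
    "Name": ["Parker", "Smith", "John", "William", "Lee"],
    "Percentage": [82, 98, 91, 87, 75],
    "Course": ["B.Sc", "B.Ed", "B.Sc", "B.Ed", "BA"],
})

# Keep only the courses that have at least two students.
multi = df.groupby("Course").filter(lambda g: len(g) >= 2)
print(multi)

# Keep only the courses whose mean percentage is above 90.
high = df.groupby("Course").filter(lambda g: g["Percentage"].mean() > 90)
print(high)
```

Note that filter() returns rows of the original DataFrame, not one row per group.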

Parameters of Groupby:

  • by: mapping, function, str, or iterable

It determines the groups for the groupby. If by is a function, it is called on each value of the object's index. If a dict or Series is passed, its values are used to determine the groups.

If an ndarray is passed, its values are used as-is to determine the groups.

A label or list of labels can also be passed to group by columns of the DataFrame.
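The different kinds of by argument can be sketched as follows. The data, the section mapping and the "long"/"short" labels are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data indexed by student name.
df = pd.DataFrame(
    {"Percentage": [82, 98, 91, 87]},
    index=["Parker", "Smith", "John", "William"],
)

# Function: called on each index label; its return value is the group key.
by_length = df.groupby(
    lambda name: "long" if len(name) > 5 else "short"
)["Percentage"].sum()

# Dict: index labels are looked up and the dict values become the group keys.
section = {"Parker": "A", "Smith": "A", "John": "B", "William": "B"}
by_section = df.groupby(section)["Percentage"].sum()

# ndarray: one key per row, used as-is.
by_array = df.groupby(np.array(["x", "y", "x", "y"]))["Percentage"].sum()

print(by_length)
print(by_section)
print(by_array)
```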

  • axis: {0 or 'index', 1 or 'columns'}, default value 0
    It selects whether to split along rows (0) or columns (1).
  • level: int, level name, or sequence of such, default value None.

If the axis is a MultiIndex (hierarchical), it groups by a particular level or levels.
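A minimal sketch of grouping by a level, assuming a hypothetical DataFrame whose index has two levels named Course and Name:

```python
import pandas as pd

# Hypothetical hierarchical index: (Course, Name).
index = pd.MultiIndex.from_tuples(
    [("B.Sc", "Parker"), ("B.Sc", "John"), ("B.Ed", "Smith"), ("B.Ed", "William")],
    names=["Course", "Name"],
)
scores = pd.DataFrame({"Percentage": [82, 91, 98, 87]}, index=index)

# A level can be given by name or by position; both calls group by Course.
by_name = scores.groupby(level="Course")["Percentage"].mean()
by_position = scores.groupby(level=0)["Percentage"].mean()

print(by_name)
```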

  • as_index: bool, default True
    For aggregated output, it returns the object with group labels as the index. With as_index=False, the group labels are returned as ordinary columns instead.
  • sort: bool, default True
    It sorts the group keys. Turning this off can improve performance.

Note: It does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
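The effect of as_index and sort can be sketched on a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical sample data; "B.Sc" appears before "B.Ed".
df = pd.DataFrame({
    "Course": ["B.Sc", "B.Ed", "B.Sc", "B.Ed"],
    "Percentage": [82, 98, 91, 87],
})

# Default: the group labels become the index, sorted by key.
indexed = df.groupby("Course").mean()

# as_index=False: the group labels stay in an ordinary column.
flat = df.groupby("Course", as_index=False).mean()

# sort=False: groups appear in order of first occurrence.
unsorted = df.groupby("Course", sort=False).mean()

# Rows inside a group keep their original order either way.
bsc_rows = df.groupby("Course").get_group("B.Sc")["Percentage"]

print(indexed)
print(flat)
print(unsorted)
```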

  • group_keys: bool, default value True

When apply() is called, it adds the group keys to the index to identify the pieces.

  • observed: bool, default value False
    It applies only if any of the groupers are Categoricals. If True, only the observed values of categorical groupers are shown. If False, all values of categorical groupers are shown.
  • **kwargs
    It is optional and only accepts the keyword argument 'mutated', which is passed to groupby.
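The observed parameter can be illustrated with a categorical grouper that has an unused category. This is a sketch with hypothetical data; observed is passed explicitly in both calls so the result does not depend on the default of the installed pandas version.

```python
import pandas as pd

# Hypothetical categorical column: "BA" is a valid category but never occurs.
courses = pd.Categorical(
    ["B.Sc", "B.Ed", "B.Sc"],
    categories=["B.Ed", "B.Sc", "BA"],
)
df = pd.DataFrame({"Course": courses, "Percentage": [82, 98, 91]})

# observed=False: every category gets a group, even the empty one.
all_categories = df.groupby("Course", observed=False)["Percentage"].count()

# observed=True: only categories that occur in the data get a group.
only_observed = df.groupby("Course", observed=True)["Percentage"].count()

print(all_categories)
print(only_observed)
```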

Returns

It returns a DataFrameGroupBy or SeriesGroupBy object, depending on the calling object. The returned object contains information about the groups.
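As a short sketch with hypothetical data, we can check which type is returned and look at the group information it holds:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Parker", "Smith", "John", "William"],
    "Percentage": [82, 98, 91, 87],
    "Course": ["B.Sc", "B.Ed", "B.Sc", "B.Ed"],
})

# Calling groupby on a DataFrame returns a DataFrameGroupBy.
frame_groups = df.groupby("Course")

# Calling it on a Series, or selecting one column, returns a SeriesGroupBy.
series_groups = df["Percentage"].groupby(df["Course"])
column_groups = df.groupby("Course")["Percentage"]

print(type(frame_groups).__name__)
print(type(series_groups).__name__)
print(type(column_groups).__name__)

# .groups maps each group key to the index labels of its rows.
print(frame_groups.groups)
```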

Example

import pandas as pd
info = pd.DataFrame({'Name': ['Parker', 'Smith','John', 'William'],'Percentage': [92., 98., 89., 86.]})
print(info)

Output

      Name  Percentage
0   Parker        92.0
1    Smith        98.0
2     John        89.0
3  William        86.0

Example

# import the pandas library
import pandas as pd

data = {'Name': ['Parker', 'Smith', 'John', 'William'],
   'Percentage': [82, 98, 91, 87]}
info = pd.DataFrame(data)

print(info)

Output

      Name  Percentage
0   Parker          82
1    Smith          98
2     John          91
3  William          87
