Pandas DataFrame.groupby()

In Pandas, the groupby() function splits a data set into groups according to some criteria so that each group can be processed separately. Objects can be split along either of their axes.

Syntax:

    DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

This signature reflects older pandas releases: squeeze was removed in pandas 2.0, and axis is deprecated in recent versions.

A groupby operation consists of the following steps for aggregating/grouping the data: splitting the data into groups, applying a function to each group, and combining the results.
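Note: The result of a groupby operation is not a DataFrame but a DataFrameGroupBy object, which can be iterated over as (group key, sub-DataFrame) pairs.

Split data into groups

An object can be split into groups in several ways, for example by a single column label, by a list of labels, or by a function, dict, Series, or ndarray.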
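The splitting step can be sketched as follows. This is a minimal illustration rather than a definitive recipe; the sample data, the extra Year column, and the variable names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    'Name': ['Parker', 'Smith', 'John', 'William'],
    'Course': ['B.Sc', 'B.Ed', 'B.Sc', 'B.Ed'],
    'Year': [1, 1, 2, 1],
    'Percentage': [82, 98, 91, 87],
})

# Split by a single column label.
by_course = df.groupby('Course')

# Split by a list of column labels; each group key is a (Course, Year) tuple.
by_course_year = df.groupby(['Course', 'Year'])

# Iterating over the grouped object yields (key, sub-DataFrame) pairs.
for key, group in by_course:
    print(key, list(group['Name']))

# Number of distinct groups produced by each split.
n_course_groups = by_course.ngroups
n_course_year_groups = by_course_year.ngroups
print(n_course_groups, n_course_year_groups)
```

Nothing is computed at the splitting stage; the grouped object only records which rows belong to which key until a function is applied.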
We can also apply a function to each subset. The following operations can be applied to the groups: aggregation, transformation, and filtration.
Aggregations

An aggregation is a function that returns a single aggregated value for each group. Once the groupby object is created, several aggregation operations can be performed on the grouped data.

Example

    # import the pandas library
    import pandas as pd

    data = {'Name': ['Parker', 'Smith', 'John', 'William'],
            'Percentage': [82, 98, 91, 87],
            'Course': ['B.Sc', 'B.Ed', 'M.Phil', 'BA']}
    df = pd.DataFrame(data)

    grouped = df.groupby('Course')
    # mean of Percentage within each course
    print(grouped['Percentage'].agg('mean'))

Output

    Course
    B.Ed      98.0
    B.Sc      82.0
    BA        87.0
    M.Phil    91.0
    Name: Percentage, dtype: float64

Transformations

A transformation is an operation on a group or column that performs some group-specific computation and returns an object that has the same size and index as the group it was computed from.

Example

    # import the pandas library
    import pandas as pd

    data = {'Name': ['Parker', 'Smith', 'John', 'William'],
            'Percentage': [82, 98, 91, 87],
            'Course': ['B.Sc', 'B.Ed', 'M.Phil', 'BA']}
    df = pd.DataFrame(data)

    grouped = df.groupby('Course')
    # standardize each value relative to its own group, scaled by 10
    score = lambda x: (x - x.mean()) / x.std() * 10
    # select the numeric column; the function cannot be applied to the Name strings
    print(grouped['Percentage'].transform(score))

Output

    0   NaN
    1   NaN
    2   NaN
    3   NaN
    Name: Percentage, dtype: float64

Every value is NaN here because each course contains a single row, and the sample standard deviation of a single value is undefined.

Filtration

The filter() function keeps or discards whole groups according to some criterion and returns the matching subset of the data.

Example

    # import the pandas library
    import pandas as pd

    data = {'Name': ['Parker', 'Smith', 'John', 'William'],
            'Percentage': [82, 98, 91, 87],
            'Course': ['B.Sc', 'B.Ed', 'M.Phil', 'BA']}
    df = pd.DataFrame(data)

    # keep the groups that contain at least one row (here, all of them)
    print(df.groupby('Course').filter(lambda x: len(x) >= 1))

Output

          Name  Percentage  Course
    0   Parker          82    B.Sc
    1    Smith          98    B.Ed
    2     John          91  M.Phil
    3  William          87      BA

Parameters of Groupby:
by: Its main task is to determine the groups for the groupby. If by is a function, it is called on each value of the object's index. If a dict or Series is passed, the dict or Series VALUES are used to determine the groups. If an ndarray is passed, its values are used as-is to determine the groups. We can also pass a label or a list of labels to group by columns of the DataFrame itself.
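The less common forms of this argument can be sketched as below. This is a small illustration under assumed data: the name-indexed frame, the course_of mapping, and the labels array are hypothetical and exist only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data indexed by student name.
df = pd.DataFrame(
    {'Percentage': [82, 98, 91, 87]},
    index=['Parker', 'Smith', 'John', 'William'],
)

# Function: called on each index value; its return value is the group key.
by_func = df.groupby(
    lambda name: 'long' if len(name) > 5 else 'short'
)['Percentage'].sum()

# Dict: index labels are looked up, and the dict VALUES become the group keys.
course_of = {'Parker': 'B.Sc', 'Smith': 'B.Ed', 'John': 'B.Sc', 'William': 'B.Ed'}
by_dict = df.groupby(course_of)['Percentage'].sum()

# ndarray: values are used as-is, matched to the rows by position.
labels = np.array(['x', 'y', 'x', 'y'])
by_array = df.groupby(labels)['Percentage'].sum()

print(by_func)
print(by_dict)
print(by_array)
```

A function or dict operates on the index rather than on a column, which is why the names are placed in the index in this sketch.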
level: It is used when the axis is a MultiIndex (hierarchical); the data is then grouped by a particular level or levels.
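Grouping by an index level might look like the following sketch. The two-level (Course, Year) index and its values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data with a two-level (Course, Year) index.
index = pd.MultiIndex.from_tuples(
    [('B.Sc', 1), ('B.Sc', 2), ('B.Ed', 1), ('B.Ed', 2)],
    names=['Course', 'Year'],
)
df = pd.DataFrame({'Percentage': [82, 91, 98, 87]}, index=index)

# Group by a level name ...
mean_by_course = df.groupby(level='Course')['Percentage'].mean()

# ... or by its integer position (level 1 is 'Year' in this index).
mean_by_year = df.groupby(level=1)['Percentage'].mean()

print(mean_by_course)
print(mean_by_year)
```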
Note: The sort parameter orders the group keys only; it does not influence the order of observations within each group. groupby preserves the order of rows within each group.
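The difference between key order and row order can be seen in a short sketch, assuming the note refers to the sort parameter from the signature. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; 'M.Phil' appears first and has two rows.
df = pd.DataFrame({
    'Course': ['M.Phil', 'B.Sc', 'M.Phil', 'B.Ed'],
    'Percentage': [91, 82, 75, 98],
})

# sort=True (the default) orders the group keys.
sorted_keys = list(df.groupby('Course', sort=True)['Percentage'].sum().index)

# sort=False keeps the keys in order of first appearance.
unsorted_keys = list(df.groupby('Course', sort=False)['Percentage'].sum().index)

# Rows inside a group keep their original order in either case.
m_phil_rows = list(
    df.groupby('Course', sort=True).get_group('M.Phil')['Percentage']
)

print(sorted_keys)
print(unsorted_keys)
print(m_phil_rows)
```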
group_keys: When apply is called, it adds the group keys to the index to identify the pieces.
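The effect on the index can be sketched as follows, assuming this description refers to the group_keys parameter. The sample data and the helper top_one are hypothetical, and a single column is selected so that the function receives a Series for each group.

```python
import pandas as pd

# Hypothetical sample data with two rows per course.
df = pd.DataFrame({
    'Course': ['B.Sc', 'B.Ed', 'B.Sc', 'B.Ed'],
    'Percentage': [82, 98, 91, 87],
})

def top_one(s):
    # Return the largest value of the group, keeping its original row label.
    return s.nlargest(1)

# group_keys=True: the group key is added as an outer index level.
with_keys = df.groupby('Course', group_keys=True)['Percentage'].apply(top_one)

# group_keys=False: only the original row labels remain in the index.
without_keys = df.groupby('Course', group_keys=False)['Percentage'].apply(top_one)

print(with_keys)
print(without_keys)
```

With the keys kept, each value can be traced back to its group from the index alone; without them, the result lines up directly with the original row labels.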
Returns

It returns a DataFrameGroupBy or a SeriesGroupBy, depending on whether the calling object is a DataFrame or a Series. The returned object contains information about the groups.

Example

    import pandas as pd

    info = pd.DataFrame({'Name': ['Parker', 'Smith', 'John', 'William'],
                         'Percentage': [92., 98., 89., 86.]})
    info

Example

    # import the pandas library
    import pandas as pd

    data = {'Name': ['Parker', 'Smith', 'John', 'William'],
            'Percentage': [82, 98, 91, 87]}
    info = pd.DataFrame(data)
    print(info)

Output

          Name  Percentage
    0   Parker          82
    1    Smith          98
    2     John          91
    3  William          87
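The return types described in this section can be checked with a short sketch. The Course column is an assumption added for illustration, since the frames above have no column to group on.

```python
import pandas as pd

# Hypothetical sample data; the Course column is added so there is a key to group by.
info = pd.DataFrame({
    'Name': ['Parker', 'Smith', 'John', 'William'],
    'Course': ['B.Sc', 'B.Ed', 'B.Sc', 'B.Ed'],
    'Percentage': [82, 98, 91, 87],
})

# Calling groupby on a DataFrame gives a DataFrameGroupBy.
grouped_frame = info.groupby('Course')

# Calling groupby on a Series gives a SeriesGroupBy.
grouped_series = info['Percentage'].groupby(info['Course'])

# Selecting one column from a DataFrameGroupBy also gives a SeriesGroupBy.
selected = grouped_frame['Percentage']

frame_type = type(grouped_frame).__name__
series_type = type(grouped_series).__name__
selected_type = type(selected).__name__
print(frame_type, series_type, selected_type)

# The grouped object becomes a regular Series once an aggregation is applied.
max_by_course = selected.max()
print(max_by_course)
```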