Python Pandas DataFrame

A pandas DataFrame is a widely used two-dimensional data structure with labeled axes (rows and columns). It is the standard way to store data that has two indexes, i.e., a row index and a column index. It has the following properties:

  • The columns can hold different types, such as int, bool, and so on.
  • It can be seen as a dictionary of Series in which both the rows and the columns are indexed. The row labels are exposed as "index" and the column labels as "columns".
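
The two properties above can be checked with a small sketch. The column names and values here are hypothetical sample data, not taken from the tutorial:

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type.
df = pd.DataFrame(
    {
        "id": [101, 102, 103],          # integers
        "passed": [True, False, True],  # booleans
        "name": ["Ann", "Bob", "Cy"],   # strings
    },
    index=["a", "b", "c"],
)

print(df.dtypes)        # one dtype per column
print(df.index)         # the row labels
print(df.columns)       # the column labels
print(type(df["id"]))   # each column is a pandas Series
```

Each column keeps its own type, and selecting a single column returns a Series that shares the row index of the DataFrame.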

Parameters and descriptions:

data: The input data. It can take different forms such as an ndarray, Series, dict, constants, or lists.

index: The row labels. If no index is passed, a default integer index from 0 to n-1 is used, equivalent to np.arange(n).

columns: The column labels. If no column labels are passed, a default integer index from 0 to n-1 is used, equivalent to np.arange(n).

dtype: The data type to force on the data. If it is omitted, the type of each column is inferred.

copy: Whether the input data is copied.
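
As a rough illustration of these parameters, the sketch below passes each one explicitly to the constructor. The array values and the labels r1, r2, r3, x, and y are made up for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array used as the input data.
values = np.array([[1, 2], [3, 4], [5, 6]])

df = pd.DataFrame(
    data=values,
    index=["r1", "r2", "r3"],   # row labels
    columns=["x", "y"],         # column labels
    dtype="float64",            # force a single dtype on the data
    copy=True,                  # copy the input instead of referencing it
)
print(df)

# Without index and columns, default integer labels are used.
default_df = pd.DataFrame(values)
print(default_df)
```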

Create a DataFrame

We can create a DataFrame from any of the following:

  • dict
  • Lists
  • NumPy ndarrays
  • Series

Create an empty DataFrame

The below code shows how to create an empty DataFrame in Pandas:

# importing the pandas library
import pandas as pd
df = pd.DataFrame()
print (df)

Output

Empty DataFrame
Columns: []
Index: []

Explanation: In the above code, we first import the pandas library under the alias pd and then define a variable named df that holds an empty DataFrame. Finally, we print it by passing df to print().

Create a DataFrame using List:

We can easily create a DataFrame in Pandas from a list.

# importing the pandas library
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']

# Calling DataFrame constructor on list
df = pd.DataFrame(x)
print(df)

Output

      0
0   Python
1   Pandas

Explanation: In the above code, we have defined a variable named "x" that holds a list of strings. Passing the list to the DataFrame constructor creates a single-column DataFrame, which is then printed.
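
A nested list works the same way, with each inner list becoming one row. The sketch below assumes two hypothetical column names, name and length, passed through the columns parameter:

```python
import pandas as pd

# Each inner list becomes one row: the word and its length.
rows = [[word, len(word)] for word in ["Python", "Pandas"]]

# Column names are hypothetical and supplied via the columns parameter.
df = pd.DataFrame(rows, columns=["name", "length"])
print(df)
```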

Create a DataFrame from Dict of ndarrays/ Lists

# importing the pandas library
import pandas as pd
info = {'ID' :[101, 102, 103],'Department' :['B.Sc','B.Tech','M.Tech',]}
df = pd.DataFrame(info)
print (df)

Output

       ID      Department
0      101        B.Sc
1      102        B.Tech
2      103        M.Tech

Explanation: In the above code, we have defined a dictionary named "info" whose keys are the column names and whose values are the lists of IDs and departments. We pass info to the DataFrame constructor, store the result in df, and print it.

Create a DataFrame from Dict of Series:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}

d1 = pd.DataFrame(info)
print (d1)

Output

        one         two
a       1.0          1
b       2.0          2
c       3.0          3
d       4.0          4
e       5.0          5
f       6.0          6
g       NaN          7
h       NaN          8

Explanation: In the above code, a dictionary named "info" holds two Series, each with its own index. We pass info to the DataFrame constructor, store the result in d1, and print it. The labels g and h are missing from the first Series, so those cells are filled with NaN, which also turns the "one" column into floats.

Column Selection

We can select any column from the DataFrame. Here is the code that demonstrates how to select a column from the DataFrame.

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}

d1 = pd.DataFrame(info)
print (d1 ['one'])

Output

a      1.0
b      2.0
c      3.0
d      4.0
e      5.0
f      6.0
g      NaN
h      NaN
Name: one, dtype: float64

Explanation: In the above code, a dictionary named "info" holds two Series, each with its own index. We build the DataFrame d1 from info and select the "one" column by passing d1['one'] to print(). The selected column is returned as a Series.
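
Several columns can be selected at once by passing a list of column names. This sketch reuses the info dictionary from above; the variable names single, subset, and as_frame are only for illustration:

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
        'two': pd.Series([1, 2, 3, 4, 5, 6, 7, 8],
                         index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}
d1 = pd.DataFrame(info)

single = d1['one']            # a single name returns a Series
subset = d1[['one', 'two']]   # a list of names returns a DataFrame
as_frame = d1[['one']]        # a one-element list still returns a DataFrame

print(type(single))
print(type(as_frame))
print(subset.head(3))
```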

Column Addition

We can also add a new column to an existing DataFrame. The code below demonstrates how to do this:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)

# Add a new column to an existing DataFrame object 

print ("Add new column by passing series")
df['three']=pd.Series([20,40,60],index=['a','b','c'])
print (df)

print ("Add new column using existing DataFrame columns")
df['four']=df['one']+df['three']

print (df)

Output

Add new column by passing series
      one     two      three
a     1.0      1        20.0
b     2.0      2        40.0
c     3.0      3        60.0
d     4.0      4        NaN
e     5.0      5        NaN
f     NaN      6        NaN

Add new column using existing DataFrame columns
       one      two       three      four
a      1.0       1         20.0      21.0
b      2.0       2         40.0      42.0
c      3.0       3         60.0      63.0
d      4.0       4         NaN      NaN
e      5.0       5         NaN      NaN
f      NaN       6         NaN      NaN

Explanation: In the above code, a dictionary named "info" holds two Series, each with its own index. We build the DataFrame df from this dictionary.

To add a new column to the existing DataFrame, we assign a new Series to df['three']. Its values are matched to the rows by index label, so rows without a matching label get NaN.

We can also add new columns computed from existing ones. The "four" column stores the sum of the two columns "one" and "three".
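
Column assignment with df['name'] = ... changes the DataFrame in place. As an alternative sketch, DataFrame.assign() returns a new DataFrame and leaves the original untouched. The data here is a shortened, hypothetical version of the example above:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'three': [20, 40, 60]},
                  index=['a', 'b', 'c'])

# assign() returns a new DataFrame; df itself keeps only its two columns.
result = df.assign(four=df['one'] + df['three'])

print(df.columns.tolist())
print(result)
```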

Column Deletion:

We can also delete any column from the existing DataFrame. This code helps to demonstrate how the column can be deleted from an existing DataFrame:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2], index= ['a', 'b']), 
   'two' : pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
   
df = pd.DataFrame(info)
print ("The DataFrame:")
print (df)

# using the del statement
print ("Delete the first column:")
del df['one']
print (df)
# using the pop method
print ("Delete the other column:")
df.pop('two')
print (df)

Output

The DataFrame:
      one    two
a     1.0     1
b     2.0     2
c     NaN     3

Delete the first column:
     two
a     1
b     2
c     3

Delete the other column:
Empty DataFrame
Columns: []
Index: [a, b, c]

Explanation:

In the above code, df is built from the info dictionary and printed in full. We can use either the del statement or the pop() method to delete a column from the DataFrame.

In the first case, we have used del to delete the "one" column from the DataFrame, whereas in the second case we have used pop() to remove the "two" column. pop() also returns the removed column as a Series.
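
Both del and pop() modify the DataFrame in place. A non-mutating alternative is DataFrame.drop() with the columns parameter, sketched below on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# drop() returns a new DataFrame and leaves df unchanged.
trimmed = df.drop(columns=['one'])
print(trimmed)

# pop() removes the column from df itself and returns it as a Series.
popped = df.pop('two')
print(popped)
print(df)
```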


Row Selection, Addition, and Deletion

Row Selection:

We can easily select, add, or delete any row at any time. First, let's look at row selection. A row can be selected in the following ways:

Selection by Label:

We can select any row by passing its row label to the loc indexer.

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']), 
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)
print (df.loc['b'])

Output

one    2.0
two    2.0
Name: b, dtype: float64

Explanation: In the above code, a dictionary named info holds two Series, each with its own index.

To select a row, we have passed the row label to the loc indexer. The row is returned as a Series whose index holds the column names.
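
The loc indexer also accepts a list of labels, or a row label together with a column label. A brief sketch on the same data (the variable names rows and cell are only for illustration):

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)

rows = df.loc[['a', 'c']]    # a list of labels returns a DataFrame
cell = df.loc['b', 'two']    # row label plus column label returns one value

print(rows)
print(cell)
```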

Selection by integer location:

Rows can also be selected by passing an integer position to the iloc indexer.

# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df.iloc[3])

Output

one    4.0
two    4.0
Name: d, dtype: float64

Explanation: In the above code, we have defined a dictionary named info that holds two Series, each with its own index.

To select a row, we have passed its integer position to the iloc indexer. Positions start at 0, so iloc[3] returns the fourth row, labeled d.
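
iloc follows the usual Python indexing rules, so slices and negative positions work as well. A short sketch on the same data:

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)

first_two = df.iloc[0:2]   # rows at positions 0 and 1
last_row = df.iloc[-1]     # the last row, labeled f
cell = df.iloc[1, 0]       # row position 1, column position 0

print(first_two)
print(last_row)
print(cell)
```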

Slice Rows

Multiple rows can be selected with the ':' slice operator.

# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']), 
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df[2:5])

Output

      one    two
c     3.0     3
d     4.0     4
e     5.0     5

Explanation: In the above code, the slice 2:5 selects the rows at positions 2, 3, and 4 (the end position is excluded), and the selected rows are printed on the console.
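
The same rows can be selected explicitly by position with iloc or by label with loc. One difference worth noting in this sketch: a positional slice excludes its end, while a label slice with loc includes it.

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)

by_position = df.iloc[2:5]     # positions 2, 3, 4 (end excluded)
by_label = df.loc['c':'e']     # labels c, d, e (end included)

print(by_position)
print(by_label)
```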

Addition of rows:

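We can add new rows to a DataFrame by concatenating it with another DataFrame using pd.concat(). The new rows are added at the end. (The older DataFrame.append() method was deprecated in pandas 1.4 and removed in pandas 2.0.)

# importing the pandas library
import pandas as pd
d = pd.DataFrame([[7, 8], [9, 10]], columns = ['x','y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns = ['x','y'])
# add the rows of d2 after the rows of d
d = pd.concat([d, d2])
print (d)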

Output

      x      y
0     7      8
1     9      10
0     11     12
1     13     14

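Explanation: In the above code, we have defined two separate DataFrames, each with two rows and the columns x and y. The rows of the second DataFrame are added after the rows of the first, and the result is displayed on the console. Each DataFrame keeps its original row labels, so the labels 0 and 1 appear twice.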
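
If the repeated row labels are not wanted, one option is to combine the rows with pd.concat() and ignore_index=True, which renumbers the rows from 0. A minimal sketch on the same data:

```python
import pandas as pd

d = pd.DataFrame([[7, 8], [9, 10]], columns=['x', 'y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns=['x', 'y'])

# ignore_index=True discards the old labels and assigns 0..n-1.
combined = pd.concat([d, d2], ignore_index=True)
print(combined)
```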

Deletion of rows:

We can delete or drop any row from a DataFrame using its index label. If the label is duplicated, all rows with that label are deleted.

# importing the pandas library
import pandas as pd

a_info = pd.DataFrame([[4, 5], [6, 7]], columns = ['x','y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns = ['x','y'])

# Combine the rows; DataFrame.append() was removed in pandas 2.0
a_info = pd.concat([a_info, b_info])

# Drop rows with label 0
a_info = a_info.drop(0)
print (a_info)

Output

      x      y
1     6      7
1     10     11

Explanation: In the above code, we have defined two separate DataFrames and combined them, so the row labels 0 and 1 each appear twice.

Here, we have passed the index label 0 to drop(), which removes every row carrying that label from the DataFrame.
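
The effect of duplicate labels can be seen by comparing two cases. In this sketch, the first drop removes two rows because the label 0 occurs twice, while the second removes only one row because the rows were renumbered first:

```python
import pandas as pd

a_info = pd.DataFrame([[4, 5], [6, 7]], columns=['x', 'y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns=['x', 'y'])

combined = pd.concat([a_info, b_info])      # labels 0, 1, 0, 1
dropped = combined.drop(0)                  # removes both rows labeled 0

renumbered = pd.concat([a_info, b_info], ignore_index=True)  # labels 0..3
only_first = renumbered.drop(0)             # removes a single row

print(dropped)
print(only_first)
```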

DataFrame Functions

Some commonly used DataFrame functions are listed below:

Function                        Description
DataFrame.append()              Add the rows of another DataFrame to the end of the given DataFrame. Removed in pandas 2.0; use pandas.concat() instead.
DataFrame.apply()               Apply a function along an axis of the DataFrame, i.e., to each column or each row.
DataFrame.assign()              Add new columns to a DataFrame.
DataFrame.astype()              Cast the pandas object to a specified dtype.
pandas.concat()                 Concatenate pandas objects along an axis. This is a top-level function, not a DataFrame method.
DataFrame.count()               Count the number of non-NA cells for each column or row.
DataFrame.describe()            Calculate summary statistics such as percentiles, mean, and std of the numerical values of the Series or DataFrame.
DataFrame.drop_duplicates()     Remove duplicate rows from the DataFrame.
DataFrame.groupby()             Split the data into groups.
DataFrame.head()                Return the first n rows of the object based on position.
DataFrame.hist()                Plot a histogram of each numerical column, dividing its values into "bins".
DataFrame.iterrows()            Iterate over the rows as (index, Series) pairs.
DataFrame.mean()                Return the mean of the values over the requested axis.
DataFrame.melt()                Unpivot the DataFrame from a wide format to a long format.
DataFrame.merge()               Merge two DataFrames into one with a database-style join.
DataFrame.pivot_table()         Aggregate data with calculations such as sum, count, average, max, and min.
DataFrame.query()               Filter the rows of the DataFrame with a boolean expression.
DataFrame.sample()              Return a random sample of rows or columns from the DataFrame.
DataFrame.shift()               Shift the values by a number of periods, e.g., to subtract the previous row's value from a column.
DataFrame.sort_values()         Sort the DataFrame by its values. The older DataFrame.sort() no longer exists.
DataFrame.sum()                 Return the sum of the values over the requested axis.
DataFrame.to_excel()            Export the DataFrame to an Excel file.
DataFrame.transpose()           Transpose the index and columns of the DataFrame.
DataFrame.where()               Keep the values where a condition is true and replace the others, with NaN by default.
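
A few of these functions are sketched below on a small, hypothetical table of departments and scores. The column names dept and score are made up for the example:

```python
import pandas as pd

# Hypothetical data: one score per student, grouped by department.
df = pd.DataFrame({
    'dept': ['B.Sc', 'B.Tech', 'B.Sc', 'M.Tech'],
    'score': [70, 85, 90, 60],
})

top = df.sort_values('score', ascending=False).head(2)  # two highest scores
high = df.query('score > 65')                           # filter rows
means = df.groupby('dept')['score'].mean()              # mean score per dept
counts = df.count()                                     # non-NA cells per column

print(top)
print(high)
print(means)
print(counts)
```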
