Python Pandas DataFrame

A Pandas DataFrame is a widely used two-dimensional data structure with labeled axes (rows and columns). It is a standard way to store data that has two different indexes, i.e., a row index and a column index. It is described by the following parameters:
Parameter & Description:

data: The input data. It can take different forms such as ndarray, Series, map, constants, lists, or arrays.
index: The row labels. If no index is passed, the default np.arange(n) is used.
columns: The column labels. If no column labels are passed, the default np.arange(n) is used.
dtype: The data type of each column.
copy: Whether the input data is copied.

Create a DataFrame

We can create a DataFrame in the following ways:
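Before going through the individual ways, here is a minimal sketch of how the parameters described above fit together in a single constructor call. The array contents and the labels r1, r2, a, b, c are made up for illustration and are not part of the original examples.

```python
import numpy as np
import pandas as pd

# Illustrative input: a 2x3 ndarray (values chosen only for this sketch)
values = np.array([[1, 2, 3], [4, 5, 6]])

df = pd.DataFrame(
    data=values,              # the input data
    index=['r1', 'r2'],       # row labels
    columns=['a', 'b', 'c'],  # column labels
    dtype='float64',          # data type applied to the columns
    copy=True,                # copy the input instead of sharing it
)
print(df)
print(df.dtypes)

# With no index or columns passed, default integer labels are used
default_df = pd.DataFrame(values)
print(list(default_df.index), list(default_df.columns))
```

Because copy=True is passed, later changes to values should not show up in df.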
Create an empty DataFrame

The code below shows how to create an empty DataFrame in Pandas:

    # importing the pandas library
    import pandas as pd
    df = pd.DataFrame()
    print(df)

Output

    Empty DataFrame
    Columns: []
    Index: []

Explanation: In the above code, we first import the pandas library with the alias pd and then define a variable named df that holds an empty DataFrame. Finally, we print it by passing df to print().

Create a DataFrame using List:

We can easily create a DataFrame in Pandas from a list.

    # importing the pandas library
    import pandas as pd
    # a list of strings
    x = ['Python', 'Pandas']
    # Calling DataFrame constructor on list
    df = pd.DataFrame(x)
    print(df)

Output

            0
    0  Python
    1  Pandas

Explanation: In the above code, we have defined a variable named "x" that consists of string values. The DataFrame constructor is called on the list, and the resulting DataFrame is printed.

Create a DataFrame from Dict of ndarrays/ Lists

    # importing the pandas library
    import pandas as pd
    info = {'ID': [101, 102, 103], 'Department': ['B.Sc', 'B.Tech', 'M.Tech']}
    df = pd.DataFrame(info)
    print(df)

Output

        ID Department
    0  101       B.Sc
    1  102     B.Tech
    2  103     M.Tech

Explanation: In the above code, we have defined a dictionary named "info" that consists of a list of IDs and a list of Departments. To print the values, we pass the info dictionary to the DataFrame constructor, store the result in a variable called df, and pass df as an argument to print().

Create a DataFrame from Dict of Series:

    # importing the pandas library
    import pandas as pd
    info = {'one': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
            'two': pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}
    d1 = pd.DataFrame(info)
    print(d1)

Output

       one  two
    a  1.0    1
    b  2.0    2
    c  3.0    3
    d  4.0    4
    e  5.0    5
    f  6.0    6
    g  NaN    7
    h  NaN    8

Explanation: In the above code, a dictionary named "info" consists of two Series, each with its own index.
To print the values, we pass the info dictionary to the DataFrame constructor, store the result in a variable called d1, and pass d1 as an argument to print(). The first Series has no labels g and h, so column one holds NaN in those rows.

Column Selection

We can select any column from the DataFrame. Here is the code that demonstrates how to select a column from the DataFrame.

    # importing the pandas library
    import pandas as pd
    info = {'one': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
            'two': pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}
    d1 = pd.DataFrame(info)
    print(d1['one'])

Output

    a    1.0
    b    2.0
    c    3.0
    d    4.0
    e    5.0
    f    6.0
    g    NaN
    h    NaN
    Name: one, dtype: float64

Explanation: In the above code, a dictionary named "info" consists of two Series, each with its own index. We then build the DataFrame d1 from the info dictionary, select the "one" column from it, and pass that column to print().

Column Addition

We can also add a new column to an existing DataFrame. The code below demonstrates how to do this:

    # importing the pandas library
    import pandas as pd
    info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
            'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
    df = pd.DataFrame(info)
    # Add a new column to an existing DataFrame object
    print("Add new column by passing series")
    df['three'] = pd.Series([20, 40, 60], index=['a', 'b', 'c'])
    print(df)
    print("Add new column using existing DataFrame columns")
    df['four'] = df['one'] + df['three']
    print(df)

Output

    Add new column by passing series
       one  two  three
    a  1.0    1   20.0
    b  2.0    2   40.0
    c  3.0    3   60.0
    d  4.0    4    NaN
    e  5.0    5    NaN
    f  NaN    6    NaN
    Add new column using existing DataFrame columns
       one  two  three  four
    a  1.0    1   20.0  21.0
    b  2.0    2   40.0  42.0
    c  3.0    3   60.0  63.0
    d  4.0    4    NaN   NaN
    e  5.0    5    NaN   NaN
    f  NaN    6    NaN   NaN

Explanation: In the above code, a dictionary named info consists of two Series, each with its own index. We then build the DataFrame df from the info dictionary.
To add a new column to an existing DataFrame object, we assign a new Series whose values are matched to the DataFrame by index label, and print the result using print(). We can also build new columns from existing ones: the "four" column stores the sum of the two columns one and three.

Column Deletion:

We can also delete any column from an existing DataFrame. This code demonstrates how a column can be deleted:

    # importing the pandas library
    import pandas as pd
    info = {'one': pd.Series([1, 2], index=['a', 'b']),
            'two': pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
    df = pd.DataFrame(info)
    print("The DataFrame:")
    print(df)
    # using the del statement
    print("Delete the first column:")
    del df['one']
    print(df)
    # using the pop method
    print("Delete another column:")
    df.pop('two')
    print(df)

Output

    The DataFrame:
       one  two
    a  1.0    1
    b  2.0    2
    c  NaN    3
    Delete the first column:
       two
    a    1
    b    2
    c    3
    Delete another column:
    Empty DataFrame
    Columns: []
    Index: [a, b, c]

Explanation: In the above code, the DataFrame df is built from the info dictionary and printed in full. We can use either the del statement or the pop() method to delete columns from the DataFrame. In the first case, we have used del to delete the "one" column, whereas in the second case, we have used pop() to remove the "two" column. The index labels a, b, and c remain even after all columns are gone.

Row Selection, Addition, and Deletion

Row Selection:

We can easily select, add, or delete any row at any time. First, let's look at row selection. A row can be selected in the following ways:

Selection by Label: We can select any row by passing the row label to loc.

    # importing the pandas library
    import pandas as pd
    info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
            'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
    df = pd.DataFrame(info)
    print(df.loc['b'])

Output

    one    2.0
    two    2.0
    Name: b, dtype: float64

Explanation: In the above code, a dictionary named info consists of two Series, each with its own index. To select a row, we have passed the row label to loc.

Selection by integer location: Rows can also be selected by passing the integer location to iloc.

    # importing the pandas library
    import pandas as pd
    info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
            'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
    df = pd.DataFrame(info)
    print(df.iloc[3])

Output

    one    4.0
    two    4.0
    Name: d, dtype: float64

Explanation: In the above code, we have defined a dictionary named info that consists of two Series, each with its own index. To select a row, we have passed the integer location to iloc. Locations start at 0, so location 3 is the row labeled d.

Slice Rows

Multiple rows can be selected using the ':' operator.

    # importing the pandas library
    import pandas as pd
    info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
            'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
    df = pd.DataFrame(info)
    print(df[2:5])

Output

       one  two
    c  3.0    3
    d  4.0    4
    e  5.0    5

Explanation: In the above code, the slice 2:5 selects the rows at positions 2, 3, and 4 (the end position is excluded), and the selected rows are printed on the console.

Addition of rows: We can add new rows to a DataFrame by appending another DataFrame to it. The new rows are added at the end.

    # importing the pandas library
    import pandas as pd
    d = pd.DataFrame([[7, 8], [9, 10]], columns=['x', 'y'])
    d2 = pd.DataFrame([[11, 12], [13, 14]], columns=['x', 'y'])
    # DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
    d = pd.concat([d, d2])
    print(d)

Output

        x   y
    0   7   8
    1   9  10
    0  11  12
    1  13  14

Explanation: In the above code, we have defined two separate DataFrames, each with two rows and the columns x and y. The rows of the second DataFrame are added after the rows of the first, and the result is displayed on the console. Each DataFrame keeps its own row labels, so the labels 0 and 1 appear twice.

Deletion of rows: We can delete or drop rows from a DataFrame using the index label. If the label is duplicated, all rows with that label are deleted.

    # importing the pandas library
    import pandas as pd
    a_info = pd.DataFrame([[4, 5], [6, 7]], columns=['x', 'y'])
    b_info = pd.DataFrame([[8, 9], [10, 11]], columns=['x', 'y'])
    # DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
    a_info = pd.concat([a_info, b_info])
    # Drop rows with label 0
    a_info = a_info.drop(0)
    print(a_info)

Output

        x   y
    1   6   7
    1  10  11

Explanation: In the above code, we have defined two separate DataFrames and combined them into one. We then pass the index label of the rows to be deleted to drop(). Because the label 0 occurs twice in the combined DataFrame, both of those rows are removed.

DataFrame Functions

A DataFrame provides many functions for working with its data.
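As a short recap of the row addition and deletion steps above, the sketch below shows how duplicate labels affect drop(), and how relabeling the rows changes the result. It is an illustration rather than part of the original example set: the names top, bottom, stacked, and renumbered are made up, and it uses pd.concat, which also works on pandas versions where DataFrame.append() is no longer available.

```python
import pandas as pd

top = pd.DataFrame([[4, 5], [6, 7]], columns=['x', 'y'])
bottom = pd.DataFrame([[8, 9], [10, 11]], columns=['x', 'y'])

# Original labels are kept, so the combined index is 0, 1, 0, 1
stacked = pd.concat([top, bottom])
dropped = stacked.drop(0)  # removes every row labeled 0
print(dropped)

# ignore_index=True relabels the combined rows 0..3,
# so drop(0) now removes a single row
renumbered = pd.concat([top, bottom], ignore_index=True)
dropped_one = renumbered.drop(0)
print(dropped_one)
```

Relabeling with ignore_index=True is one way to make sure a label refers to exactly one row before rows are dropped.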