dropna() and drop() in Python

Sana Bulbule
Sep 28, 2020

Python is popular with developers for many good reasons:

  • Clear and simple syntax
  • Easy to read, learn and understand
  • Type declarations are not required
  • Memory management is automatic
  • Code is often shorter than the equivalent code in other programming languages

Python has many useful packages for data analysis. One of them is Pandas, which makes importing and analyzing data much easier.

This post explains the following methods of the Pandas package:

  • dropna() method
  • drop() method

Pandas DataFrame.dropna()

If a CSV file has null (empty) values, they are displayed as NaN in the DataFrame. The Pandas dropna() method lets the user drop rows or columns that contain null values, in several different ways.
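
The following is a minimal sketch of how empty CSV fields become NaN. It assumes a small CSV held in a string, with io.StringIO standing in for a real file. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV text: row 2 has an empty 'score' field,
# row 3 has an empty 'name' field.
csv_text = """id,name,score
1,Asha,90
2,Ravi,
3,,75
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# isna() marks each missing cell as True; sum() counts them per column.
print(df.isna().sum())
```

Because NaN is a floating-point value, the 'score' column is read as float64 rather than int64.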

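Syntax:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters:

  • axis: 0 or 'index' drops rows that contain missing values; 1 or 'columns' drops columns. Default is 0.
  • how: 'any' drops a row (or column) if any of its values is NaN; 'all' drops it only if all of its values are NaN. Default is 'any'.
  • thresh: keeps only the rows (or columns) that have at least this many non-NaN values.
  • subset: labels along the other axis to consider. For example, when dropping rows, this is the list of columns to check for NaN.
  • inplace: if True, modifies the DataFrame in place and returns None. The default, False, returns a new DataFrame.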
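
The sketch below tries each dropna() parameter on a tiny made-up DataFrame so their effects can be compared side by side. The column names 'a', 'b' and 'c' are hypothetical. Each call returns a new DataFrame and leaves df unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in different places.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 8.0, np.nan],
    "c": [7.0, np.nan, 9.0, np.nan, 5.0],
})

# Default (axis=0, how='any'): drop every row that has at least one NaN.
any_rows = df.dropna()

# how='all': drop only the rows in which every value is NaN.
all_rows = df.dropna(how="all")

# thresh=2: keep only the rows that have at least two non-NaN values.
thresh_rows = df.dropna(thresh=2)

# subset=['c']: only NaN in column 'c' decides whether a row is dropped.
subset_rows = df.dropna(subset=["c"])

# axis=1 with thresh=3: keep only the columns with at least three non-NaN values.
thresh_cols = df.dropna(axis=1, thresh=3)

print(any_rows, all_rows, thresh_rows, subset_rows, thresh_cols, sep="\n\n")
```

With this data, the default call keeps only the single complete row, and how='all' removes only the row that is entirely NaN. The thresh and subset calls fall in between.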

Pandas DataFrame.drop()

Pandas provides the drop() method to remove rows or columns by index label or column name. It can also drop the rows that do not satisfy given conditions: you pass it the index labels of the rows that fail the condition. This helps data analysts delete unwanted data and filter a DataFrame.
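
drop() works on labels, not conditions. One common pattern is to select the index labels of the rows that fail a condition and then pass those labels to drop(). It is sketched here with a hypothetical 'score' column and an illustrative threshold of 50.

```python
import pandas as pd

# Hypothetical data; the "score >= 50" rule is only for illustration.
df = pd.DataFrame({"name": ["A", "B", "C", "D"], "score": [72, 35, 90, 48]})

# Collect the index labels of the rows that do NOT satisfy the condition...
failing_labels = df[df["score"] < 50].index

# ...and pass them to drop(); axis=0 (rows) is the default.
passed = df.drop(failing_labels)
print(passed)
```

Boolean indexing, df[df["score"] >= 50], produces the same rows directly. The drop() form is convenient when the labels to remove are already at hand.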

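Syntax:

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Parameters:

  • labels: a single label, or a list of index or column labels to drop.
  • axis: 0 or 'index' looks for the labels in the row index; 1 or 'columns' looks for them in the column names. Default is 0.
  • index: alternative to labels with axis=0. index=labels drops those rows.
  • columns: alternative to labels with axis=1. columns=labels drops those columns.
  • level: for a MultiIndex, the level from which the labels are removed.
  • inplace: if True, modifies the DataFrame in place and returns None. Default is False.
  • errors: 'raise' (the default) raises a KeyError if a label is not found; 'ignore' suppresses the error and drops only the labels that exist.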
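
The following short sketch shows the equivalent ways to tell drop() what to remove. It uses a hypothetical 3-by-3 DataFrame with string row labels.

```python
import pandas as pd

# Hypothetical DataFrame with row labels 'r1'..'r3'.
df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6], "z": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)

# Drop a column with labels + axis=1 ...
no_y = df.drop("y", axis=1)
# ... or, equivalently, with the columns= keyword.
no_y_too = df.drop(columns=["y"])

# Drop rows by index label with index= (same as labels + axis=0).
no_r2 = df.drop(index=["r2"])

# errors='ignore' skips labels that do not exist instead of raising.
ignored = df.drop(columns=["y", "not_a_column"], errors="ignore")

# With the default errors='raise', a missing label raises KeyError.
try:
    df.drop(columns=["not_a_column"])
except KeyError as exc:
    print("KeyError:", exc)
```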

Examples:

==> drop method

  • For the drop() method, the example reads a CSV file and stores it in a DataFrame. It then defines a list, ‘to_drop’, that contains the names of all the columns we want to drop. Finally, it calls the drop() function with axis=1 and inplace=True.
  • axis=1 tells Pandas to look for the labels to be dropped among the column names listed in ‘to_drop’. inplace=True tells Pandas to make the changes directly to the DataFrame instead of returning a modified copy.
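
The following is a minimal sketch of this example. The article's actual CSV file is not reproduced here, so an in-memory CSV with made-up values stands in for it. Only the column names in to_drop come from the article; 'id' and 'from' are kept as stand-ins for the remaining columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the article's CSV file (values are made up).
csv_text = """id,date,action,title,inviter,photo,width,height,from
1,2020-01-01,join,Chat,u1,a.jpg,640,480,Alice
2,2020-01-02,edit,Chat,u2,b.jpg,800,600,Bob
"""
df = pd.read_csv(io.StringIO(csv_text))

# Names of all the columns we want to drop.
to_drop = ["date", "action", "title", "inviter", "photo", "width", "height"]

# axis=1: look for these labels among the column names.
# inplace=True: change df directly (the call itself returns None).
df.drop(to_drop, inplace=True, axis=1)

print(df.columns.tolist())  # ['id', 'from']
```

Because inplace=True mutates df and returns None, you can also write df = df.drop(to_drop, axis=1). This gets the same result without in-place mutation.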

Output:

In the output, the columns ‘date’, ‘action’, ‘title’, ‘inviter’, ‘photo’, ‘width’ and ‘height’ are no longer displayed. Only the remaining columns of the DataFrame appear.

==> dropna method

  • For the dropna() method, the same example drops the rows that have NaN values in any of the columns [‘id’, ‘from’, ‘reply_to_message_id’], passed through the subset parameter.
  • Columns outside that subset, such as ‘actor’ and ‘actor_id’ in the DataFrame ‘df_sorted_data’, still have NaN values, because dropna() only checks the subset columns.
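
Here is a sketch of the subset call. It assumes a tiny made-up df_sorted_data that uses the column names mentioned in the article. The values and row count are hypothetical, not the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_sorted_data (values are made up).
df_sorted_data = pd.DataFrame({
    "id": [1, 2, np.nan, 4, 5],
    "from": ["Alice", np.nan, "Carol", "Dan", "Eve"],
    "reply_to_message_id": [10, 11, 12, np.nan, 14],
    "actor": [np.nan, "Bob", np.nan, np.nan, "Frank"],
    "actor_id": [np.nan, 7, np.nan, 9, 3],
})

subset_cols = ["id", "from", "reply_to_message_id"]

# Drop rows with a NaN in any of the subset columns; other columns are ignored.
cleaned = df_sorted_data.dropna(subset=subset_cols)

print(len(df_sorted_data), "->", len(cleaned))
print(cleaned[subset_cols].isna().sum())          # no NaN left in the subset
print(cleaned[["actor", "actor_id"]].isna().sum())  # NaN can remain here
```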

Output:

  • The row count changed from 14566 in the first output to 4882 in the second. This is because dropna() deleted every row that had a NaN value in at least one of the columns named in the subset parameter.
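
You can double-check a row count like this one, sketched here on hypothetical data. With the default how='any', the number of rows that dropna(subset=...) removes should equal the number of rows with at least one NaN among the subset columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'other' has NaN but is not part of the subset.
df = pd.DataFrame({
    "id": [1, np.nan, 3, 4, np.nan],
    "from": ["a", "b", None, "d", "e"],
    "other": [np.nan, np.nan, np.nan, 1.0, 2.0],
})
subset_cols = ["id", "from"]

# Rows with at least one missing value among the subset columns.
to_remove = df[subset_cols].isna().any(axis=1).sum()

before = len(df)
after = len(df.dropna(subset=subset_cols))
print(before, after, to_remove)
```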

Thanks for reading!

Originally published at https://www.numpyninja.com on September 28, 2020.

Written by Sana Bulbule

Data Scientist & Machine Learning Engineer
