
Pandas DataFrame drop_duplicates() Method



Example

Remove duplicate rows from the DataFrame:

import pandas as pd

data = {
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
}

df = pd.DataFrame(data)

# Rows 1 and 3 are identical in every column, so row 3 is removed
newdf = df.drop_duplicates()

print(newdf)

Definition and Usage

The drop_duplicates() method removes duplicate rows.

By default, a row counts as a duplicate only if all of its values match those of another row. Use the subset parameter to compare only specific columns when looking for duplicates.
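The following sketch shows the effect of subset. The age and qualified values are changed from the example at the top of the page (an illustrative assumption) so that the two "Mary" rows are no longer identical in every column:

```python
import pandas as pd

data = {
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 41],
  "qualified": [True, False, False, True]
}

df = pd.DataFrame(data)

# Comparing all columns: no two rows are fully identical, so nothing is removed
alldf = df.drop_duplicates()

# Comparing only "name": the second "Mary" row (index 3) is treated as a duplicate
# even though its "age" and "qualified" values differ
newdf = df.drop_duplicates(subset="name")

print(len(alldf))
print(newdf)
```

A list such as subset=["name", "age"] works the same way, with a row counted as a duplicate only when all of the listed columns match.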


Syntax

dataframe.drop_duplicates(subset, keep, inplace, ignore_index)

Parameters

The parameters are keyword arguments.

Parameter     Value                    Description
subset        column label(s)          Optional. A string, or a list of strings, naming the columns to use when looking for duplicates. If not specified, all columns are used.
keep          'first', 'last', False   Optional, default 'first'. Specifies which duplicate to keep: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False drops every row that has a duplicate.
inplace       True, False              Optional, default False. If True, the rows are removed from the current DataFrame. If False, a copy with the rows removed is returned.
ignore_index  True, False              Optional, default False. If True, the result is relabeled 0, 1, 2, ..., n - 1. If False, the original index labels are kept.
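A short sketch comparing the three keep options, plus ignore_index, on the name and age columns from the example at the top of the page. The variable names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40]
})

first = df.drop_duplicates(keep="first")  # keeps row 1, drops row 3
last = df.drop_duplicates(keep="last")    # drops row 1, keeps row 3
dropped_all = df.drop_duplicates(keep=False)  # drops both "Mary" rows

# ignore_index=True relabels the result 0, 1, 2, ... instead of keeping the original labels
relabeled = df.drop_duplicates(keep="last", ignore_index=True)

print(first.index.tolist())        # [0, 1, 2]
print(last.index.tolist())         # [0, 2, 3]
print(dropped_all.index.tolist())  # [0, 2]
print(relabeled.index.tolist())    # [0, 1, 2]
```

The surviving rows always appear in their original order; keep only decides which copy of each duplicate group survives.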

Return Value

A DataFrame with the duplicate rows removed, or None if the inplace parameter is set to True.
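The difference between the two return values can be sketched like this, reusing the name and age columns from the example at the top of the page:

```python
import pandas as pd

df = pd.DataFrame({
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40]
})

# Default (inplace=False): df is left unchanged and a new DataFrame is returned
newdf = df.drop_duplicates()
print(len(df), len(newdf))  # 4 3

# inplace=True: df itself is modified and the method returns None
result = df.drop_duplicates(inplace=True)
print(result)   # None
print(len(df))  # 3
```

Because of this, writing df = df.drop_duplicates(inplace=True) replaces df with None. Either assign the returned copy or use inplace=True, but not both.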

