I want to delete duplicate rows from a pandas DataFrame based on a single column: if several rows have the same value in that column, the final DataFrame should keep just one of them. How can I drop duplicate rows using one column?

1 Answer

Best answer

You can use the DataFrame.drop_duplicates() method with the subset parameter. subset takes the column label, or list of labels, that should be compared when deciding whether two rows are duplicates.

Here is an example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[1,2,3,4,5,1,3], 'b':[11,12,13,14,15,12,16]})
>>> df
   a   b
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15
5  1  12
6  3  16
>>> df.drop_duplicates(subset=["a"])
   a   b
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15

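The above code compares rows on column 'a' only. By default (keep='first') the first row for each value of 'a' is kept, so rows 5 and 6, which repeat the values 1 and 3, are dropped. drop_duplicates() returns a new DataFrame; df itself is not modified unless you assign the result back or pass inplace=True.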
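
If you need a different row from each group of duplicates, the keep parameter controls which one survives. The following is a minimal sketch that reuses the df from the example above; the variable names (first, last, unique_only, renumbered, both_cols) are only illustrative, and ignore_index assumes pandas 1.0 or newer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 1, 3],
                   'b': [11, 12, 13, 14, 15, 12, 16]})

# Default (keep='first'): keep the first row seen for each value of 'a'.
first = df.drop_duplicates(subset=['a'])

# keep='last': keep the last row seen for each value of 'a' instead.
last = df.drop_duplicates(subset=['a'], keep='last')

# keep=False: drop every row whose 'a' value occurs more than once.
unique_only = df.drop_duplicates(subset=['a'], keep=False)

# ignore_index=True renumbers the result 0..n-1 (assumes pandas >= 1.0).
renumbered = df.drop_duplicates(subset=['a'], keep='last', ignore_index=True)

# A list with several labels compares rows on the combination of columns.
# Here every (a, b) pair is distinct, so no rows are dropped.
both_cols = df.drop_duplicates(subset=['a', 'b'])

print(last)
```

With keep='last' the surviving rows are the ones at index 1, 3, 4, 5 and 6, so the value 1 in column 'a' is now paired with b = 12 rather than b = 11. Which variant is appropriate depends on which of the duplicated rows carries the data you want to retain.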

...