How to drop duplicate rows from a pandas dataframe based on one column

Question 1

I want to delete duplicate rows from a dataframe based on one column: if several rows have the same value in that column, the final dataframe should keep just one of them. How can I drop duplicate rows using one column?

Answer

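You can use the drop_duplicates() function with the subset parameter. subset takes the column label (or list of labels) that should be compared when deciding whether two rows are duplicates; all other columns are ignored for that comparison.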

Here is an example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 1, 3], 'b': [11, 12, 13, 14, 15, 12, 16]})
>>> df
   a   b
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15
5  1  12
6  3  16
>>> df.drop_duplicates(subset=["a"])
   a   b
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15

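The above code compares rows on column 'a' only. Rows 5 and 6 repeat the values 1 and 3 that already appear in rows 0 and 2, so they are dropped, and the first occurrence of each value is kept (the default, keep='first'). drop_duplicates() returns a new dataframe; df itself is unchanged unless you assign the result back.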
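
If you need a different row to survive, the keep parameter of drop_duplicates() controls that. The following is a minimal sketch using the same data as above; the variable names (first, last, unique_only, renumbered) are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 1, 3],
                   'b': [11, 12, 13, 14, 15, 12, 16]})

# Default (keep='first'): the first row seen for each value of 'a' survives.
first = df.drop_duplicates(subset=['a'])

# keep='last': the last row seen for each value of 'a' survives instead.
last = df.drop_duplicates(subset=['a'], keep='last')

# keep=False: every row whose 'a' value occurs more than once is dropped.
unique_only = df.drop_duplicates(subset=['a'], keep=False)

# The surviving rows keep their original index labels;
# reset_index(drop=True) renumbers them from 0 if that is preferred.
renumbered = last.reset_index(drop=True)

print(first.index.tolist())        # [0, 1, 2, 3, 4]
print(last.index.tolist())         # [1, 3, 4, 5, 6]
print(unique_only.index.tolist())  # [1, 3, 4]
```

Which option fits depends on which of the duplicated rows carries the values you want in the other columns (here, column 'b').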

pkumar81 · Answer 1 · 2023-05-09T20:37:12+0000

