[python] How to implement 'in' and 'not in' for Pandas dataframe



Answers

I've been usually doing generic filtering over rows like this:

criterion = lambda row: row['countries'] not in countries
not_in = df[df.apply(criterion, axis=1)]
Question

How can I achieve the equivalents of SQL's IN and NOT IN?

I have a list with the required values. Here's the scenario:

df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = ['UK','China']

# pseudo-code:
df[df['countries'] not in countries]

My current way of doing this is as follows:

df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = pd.DataFrame({'countries':['UK','China'], 'matched':True})

# IN
df.merge(countries,how='inner',on='countries')

# NOT IN
not_in = df.merge(countries,how='left',on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]

But this seems like a horrible kludge. Can anyone improve on it?




I wanted to filter out dfbc rows that had a BUSINESS_ID that was also in the BUSINESS_ID of dfProfilesBusIds

Finally got it working:

dfbc = dfbc[(dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID']) == False)]



Links