python how word - Fastest way to check if a value exist in a list





6 Answers

As stated by others, in can be very slow for large lists. Here are some comparisons of the performances for in, set and bisect. Note the time (in second) is in log scale.

Code for testing:

import random
import bisect
import matplotlib.pyplot as plt
import math
import time

def method_in(a,b,c):
    start_time = time.time()
    for i,x in enumerate(a):
        if x in b:
            c[i] = 1
    return(time.time()-start_time)   

def method_set_in(a,b,c):
    start_time = time.time()
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = 1
    return(time.time()-start_time)

def method_bisect(a,b,c):
    start_time = time.time()
    b.sort()
    for i,x in enumerate(a):
        index = bisect.bisect_left(b,x)
        if index < len(a):
            if x == b[index]:
                c[i] = 1
    return(time.time()-start_time)

def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    Nls = [x for x in range(1000,20000,1000)]
    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        time_method_in.append(math.log(method_in(a,b,c)))
        time_method_set_in.append(math.log(method_set_in(a,b,c)))
        time_method_bisect.append(math.log(method_bisect(a,b,c)))

    plt.plot(Nls,time_method_in,marker='o',color='r',linestyle='-',label='in')
    plt.plot(Nls,time_method_set_in,marker='o',color='b',linestyle='-',label='set')
    plt.plot(Nls,time_method_bisect,marker='o',color='g',linestyle='-',label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()
not exists of

I'm searching for the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is? I know that all values in the list are unique like my example.

The first method I try is (3.8sec in my real code):

a = [4,2,3,1,5,6]

if a.count(7) == 1:
    b=a.index(7)
    "Do something with variable b"

The second method I try is (2x faster:1.9sec on my real code):

a = [4,2,3,1,5,6]

try:
    b=a.index(7)
except ValueError:
    "Do nothing"
else:
    "Do something with variable b"

Proposed methods from user (2.74sec on my real code):

a = [4,2,3,1,5,6]
if 7 in a:
    a.index(7)

In my real code, first method takes 3.81sec and the second method takes 1.88sec. It's a good improvement but:

I'm a beginner with Python/scripting and I want to know if a fastest way exists to do the same things and save more process time?

More specific explication for my application:

In the API of blender a can access to a list of particles:

particles = [1,2,3,4...etc.]

From there, I can access to its location:

particles[x].location = [x,y,z]

And I test for each particle if a neighbour exists by searching in the location of each particle like:

if [x+1,y,z] in particles.location
    "find the identity of this neighbour particles in x:the index 
    of the particles array"
    particles.index([x+1,y,z])



You could put your items into a set. Set lookups are very efficient.

Try:

s = set(a)
if 7 in s:
  # do stuff

edit In a comment you say that you'd like to get the index of the element. Unfortunately, sets have no notion of element position. An alternative is to pre-sort your list and then use binary search every time you need to find an element.




It sounds like your application might gain advantage from the use of a Bloom Filter data structure.

In short, a bloom filter look-up can tell you very quickly if a value is DEFINITELY NOT present in a set. Otherwise, you can do a slower look-up to get the index of a value that POSSIBLY MIGHT BE in the list. So if your application tends to get the "not found" result much more often then the "found" result, you might see a speed up by adding a Bloom Filter.

For details, Wikipedia provides a good overview of how Bloom Filters work, and a web search for "python bloom filter library" will provide at least a couple useful implementations.




this is not the code, but the algorithm for very fast searching

if your list and the value you are looking for are all numbers, this is pretty straightforward, if strings: look at the bottom:

  • -let "n" be the length of your list
  • -optional step: if you need the index of the element: add a second column to the list with current index of elements (0 to n-1) - see later
  • order your list or a copy of it (.sort())
  • loop through:
    • compare your number to the n/2th element of the list
      • if larger, loop again between indexes n/2-n
      • if smaller, loop again between indexes 0-n/2
      • if the same: you found it
  • keep narrowing the list until you have found it or only have 2 numbers (below and above the one you are looking for)
  • this will find any element in at most 19 steps for a list of 1.000.000 (log(2)n to be precise)

if you also need the original position of your number, look for it in the second, index column

if your list is not made of numbers, the method still works and will be fastest, but you may need to define a function which can compare/order strings

of course this needs the investment of the sorted() method, but if you keep reusing the same list for checking, it may be worth it




present = False
searchItem = 'd'
myList = ['a', 'b', 'c', 'd', 'e']
if searchItem in myList:
   present = True
   print('present = ', present)
else:
   print('present = ', present)



Or use __contains__:

sequence.__contains__(value)

Demo:

>>> l=[1,2,3]
>>> l.__contains__(3)
True
>>> 



Related