[python] What does the "yield" keyword do?



Answers

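Shortcut to grokking yield

When you see a function with yield statements, apply this easy trick to understand what will happen:

  1. Insert a line result = [] at the start of the function.
  2. Replace each yield expr with result.append(expr).
  3. Insert a line return result at the bottom of the function.
  4. Yay - no more yield statements! Read the code and figure it out.
  5. Compare the function to its original definition.

This trick may give you an idea of the logic behind the function, but what actually happens with yield is significantly different from what happens in the list-based approach. In many cases the yield approach is much more memory efficient, and faster too. In other cases this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more...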
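As a rough illustration of the trick, here is a small generator next to its list-based rewrite. The names countdown and countdown_list are made up for this sketch and do not come from the question.

```python
def countdown(n):
    """Generator version: yields n, n-1, ..., 1 one at a time."""
    while n > 0:
        yield n
        n -= 1


def countdown_list(n):
    """The same logic after applying the trick."""
    result = []                 # step 1: start with an empty list
    while n > 0:
        result.append(n)        # step 2: this line was `yield n`
        n -= 1
    return result               # step 3: return everything at once


# Both produce the same values, but only the list version builds
# the whole sequence in memory before returning.
print(list(countdown(3)))
print(countdown_list(3))
```

Reading countdown_list tells you which values countdown produces. It does not tell you when they are produced.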

Don't confuse your iterables, iterators, and generators

First, the iterator protocol - when you write

for x in mylist:
    ...loop body...

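Python performs the following two steps:

  1. Gets an iterator for mylist:

    Call iter(mylist) -> this returns an object with a next() method (or __next__() in Python 3).

    [This is the step most people forget to tell you about]

  2. Uses the iterator to loop over the items:

    Keep calling the next() method on the iterator returned from step 1. The return value of next() is assigned to x and the loop body is executed. If next() raises a StopIteration exception, there are no more values in the iterator and the loop exits.

The truth is that Python performs these two steps any time it wants to loop over the contents of an object - so it could be a for loop, but it could also be code like otherlist.extend(mylist) (where otherlist is a Python list).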
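The two steps can be written out by hand. This is only a sketch of what a for loop does, using the built-in iter() and next(); the helper name manual_for is hypothetical.

```python
def manual_for(iterable, body):
    """Hand-written equivalent of `for x in iterable: body(x)`."""
    iterator = iter(iterable)       # step 1: get an iterator
    while True:
        try:
            x = next(iterator)      # step 2: ask for the next item
        except StopIteration:
            break                   # no more items: leave the loop
        body(x)


seen = []
manual_for(["a", "b", "c"], seen.append)
print(seen)
```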

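Here mylist is an iterable because it implements the iterator protocol. In a user-defined class, you can implement the __iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a next() method (__next__() in Python 3). It is possible to implement both __iter__() and next() on the same class and have __iter__() return self. This works for simple cases, but not when you want two iterators looping over the same object at the same time.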
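To make the last point concrete, here is a minimal sketch in Python 3 spelling. The class names OneShot and Reusable are invented for illustration. OneShot returns self from __iter__(), so nested loops share one position. Reusable hands out a fresh iterator each time.

```python
class OneShot:
    """Iterable and iterator in one class: __iter__ returns self."""

    def __init__(self, items):
        self._items = items
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):             # spelled next() in Python 2
        if self._pos >= len(self._items):
            raise StopIteration
        value = self._items[self._pos]
        self._pos += 1
        return value


class Reusable:
    """Iterable only: every __iter__ call creates a new iterator."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return OneShot(self._items)


shared = OneShot([1, 2])
pairs_shared = [(a, b) for a in shared for b in shared]   # loops interfere

fresh = Reusable([1, 2])
pairs_fresh = [(a, b) for a in fresh for b in fresh]      # loops are independent
```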

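So that's the iterator protocol, and many objects implement it:

  1. Built-in lists, dictionaries, tuples, sets, and files.
  2. User-defined classes that implement __iter__().
  3. Generators.

Note that a for loop doesn't know what kind of object it is dealing with - it just follows the iterator protocol and is happy to get item after item as it calls next(). Built-in lists return their items one by one, dictionaries return their keys one by one, files return their lines one by one, and so on. And generators return... well, that's where yield comes in: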

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print(item)

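If you had three return statements in f123() instead of the yield statements, only the first would be executed and the function would exit. But f123() is no ordinary function. When f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit - it goes into a suspended state. When the for loop tries to loop over the generator object, the function resumes from its suspended state at the line right after the yield it previously returned from, runs until the next yield statement, and returns that value as the next item. This continues until the function exits, at which point the generator raises StopIteration and the loop exits.

So the generator object is sort of like an adapter - at one end it exhibits the iterator protocol, exposing __iter__() and next() methods to keep the for loop happy. At the other end, however, it runs the function just enough to get the next value out of it, and then puts it back into suspended mode.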
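One way to watch the suspend and resume behaviour is to record when each part of the body runs. The following sketch adds a hypothetical log list to f123(); the log is not part of the original example.

```python
log = []


def f123():
    log.append("started")
    yield 1
    log.append("resumed after 1")
    yield 2
    log.append("resumed after 2")
    yield 3
    log.append("finished")


gen = f123()                     # creates the generator; the body has not run
state_after_call = list(log)

first = next(gen)                # runs up to the first yield, then suspends
state_after_first = list(log)

rest = list(gen)                 # drains the remaining values until StopIteration
```

Calling f123() leaves the log empty. The first next() runs the body only as far as the first yield.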

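Why use generators?

Usually you can write code that doesn't use generators but implements the same logic. One option is to use the temporary list 'trick' I mentioned before. That will not work in all cases, e.g. if you have infinite loops, and it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class SomethingIter that keeps its state in instance members and performs the next logical step in its next() (or __next__() in Python 3) method. Depending on the logic, the code inside the next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.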
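The infinite-loop case is easy to sketch. The generator below (the name naturals is hypothetical) never finishes, so the list-based rewrite would never return, yet a caller can still take as many values as it needs with itertools.islice.

```python
from itertools import islice


def naturals():
    """Infinite generator: 0, 1, 2, ... without end."""
    n = 0
    while True:
        yield n
        n += 1


# Only five values are ever computed; the generator is then simply dropped.
first_five = list(islice(naturals(), 5))
print(first_five)
```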

Question

What is the use of the yield keyword in Python? What does it do?

For example, I'm trying to understand this code [1]:

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild  

And this is the caller:

result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

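What happens when the method _get_child_candidates is called? Is a list returned? A single element? Is it called again? When will subsequent calls stop?

1. This piece of code was written by Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: Module mspace.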




Many people use return rather than yield, but in some cases yield can be more efficient and easier to work with.

Here is an example where yield is a good fit:

return (in function)

import random

def return_dates():
    dates = [] # with return you need to create a list then return it
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        dates.append(date)
    return dates

yield (in function)

def yield_dates():
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        yield date # yield makes a generator automatically which works in a similar way, this is much more efficient

Calling functions

dates_list = return_dates()
print(dates_list)
for i in dates_list:
    print(i)

dates_generator = yield_dates()
print(dates_generator)
for i in  dates_generator:
    print(i)

Both functions do the same thing, but the body of the yield version is 3 lines instead of 5 and has one less variable to worry about.

The only difference in the result is that return_dates() gives you a list, whereas yield_dates() gives you a generator: printing dates_list shows the five dates, while printing dates_generator shows something like <generator object yield_dates at 0x...>. Both for loops then print five dates.

A real-life use would be something like reading a file line by line, or any case where you only need the values one at a time.
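A minimal sketch of the file-reading case might look like this. The function name non_empty_lines is made up, and io.StringIO stands in for a real file so that the example is self-contained.

```python
import io


def non_empty_lines(stream):
    """Yield stripped, non-blank lines from a file-like object, one at a time."""
    for line in stream:
        line = line.strip()
        if line:
            yield line


# With a real file you would pass open(path) instead of this stand-in.
fake_file = io.StringIO("first\n\nsecond\nthird\n")
lines = list(non_empty_lines(fake_file))
print(lines)
```

Because lines are produced on demand, the whole file never has to be held in memory at once.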




From a programming viewpoint, the iterators are implemented as thunks

http://en.wikipedia.org/wiki/Thunk_(functional_programming)

To implement iterators/generators/thread pools for concurrent execution/etc as thunks (also called anonymous functions), one uses messages sent to a closure object, which has a dispatcher, and the dispatcher answers to "messages".

http://en.wikipedia.org/wiki/Message_passing

" next " is a message sent to a closure, created by " iter " call.

There are lots of ways to implement this computation. I used mutation but it is easy to do it without mutation, by returning the current value and the next yielder.

Here is a demonstration that uses the structure of R6RS. The semantics are the same as in Python: it is the same model of computation, and only a change of syntax is needed to rewrite it in Python.

Welcome to Racket v6.5.0.3.

-> (define gen
     (lambda (l)
       (define yield
         (lambda ()
           (if (null? l)
               'END
               (let ((v (car l)))
                 (set! l (cdr l))
                 v))))
       (lambda(m)
         (case m
           ('yield (yield))
           ('init  (lambda (data)
                     (set! l data)
                     'OK))))))
-> (define stream (gen '(1 2 3)))
-> (stream 'yield)
1
-> (stream 'yield)
2
-> (stream 'yield)
3
-> (stream 'yield)
'END
-> ((stream 'init) '(a b))
'OK
-> (stream 'yield)
'a
-> (stream 'yield)
'b
-> (stream 'yield)
'END
-> (stream 'yield)
'END
-> 
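For readers who do not know Scheme, here is one possible Python transcription of the session above. It is a sketch of the same closure-plus-dispatcher idea, not how Python actually implements generators. The state dictionary and the helper names do_yield, do_init and dispatch are choices made for this illustration.

```python
def gen(items):
    """Closure-based 'generator' that answers 'yield' and 'init' messages."""
    state = {"items": list(items)}   # mutable state captured by the closures

    def do_yield():
        if not state["items"]:
            return "END"
        return state["items"].pop(0)

    def do_init(data):
        state["items"] = list(data)
        return "OK"

    def dispatch(message):
        if message == "yield":
            return do_yield()        # like ('yield (yield)) in the Racket code
        if message == "init":
            return do_init           # like ('init (lambda (data) ...))
        raise ValueError("unknown message: %r" % (message,))

    return dispatch


stream = gen([1, 2, 3])
first_run = [stream("yield") for _ in range(4)]
reset = stream("init")(["a", "b"])
second_run = [stream("yield") for _ in range(3)]
```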



All great answers, but a bit difficult for newbies.

I assume you already know the return statement.
As an analogy, return and yield are twins.
return means 'return and stop', whereas yield means 'return, but continue'.

  1. Try to get a num_list with return.
def num_list(n):
    for i in range(n):
        return i

Run it:

In [5]: num_list(3)
Out[5]: 0

See, you get only a single number rather than a list of them. return never lets you get past the first one: it runs once and quits.

  2. Along comes yield

Replace return with yield:

In [10]: def num_list(n):
    ...:     for i in range(n):
    ...:         yield i
    ...:

In [11]: num_list(3)
Out[11]: <generator object num_list at 0x10327c990> 

In [12]: list(num_list(3))
Out[12]: [0, 1, 2]

Now you get all the numbers.
Compared to return, which runs once and stops, yield runs as many times as you planned.
You can interpret return as 'return one of them' and
yield as 'return all of them, one at a time'. The resulting object is an iterable.

  3. One step further, we can rewrite the yield version with return
In [15]: def num_list(n):
    ...:     result = []
    ...:     for i in range(n):
    ...:         result.append(i)
    ...:     return result

In [16]: num_list(3)
Out[16]: [0, 1, 2]

That is the core of yield.

The difference between the list that return outputs and the object that yield outputs is:
you can always get [0, 1, 2] from a list object, but you can retrieve the values from 'the object yield outputs' only once.
That is why it has a new name, generator object, as displayed in Out[11]: <generator object num_list at 0x10327c990>.

In conclusion, as a metaphor to grok it:

return and yield are twins,
list and generator are twins.




For those who prefer a minimal working example, meditate on this interactive Python session:

>>> def f():
...   yield 1
...   yield 2
...   yield 3
... 
>>> g = f()
>>> for i in g:
...   print(i)
... 
1
2
3
>>> for i in g:
...   print(i)
... 
>>> # Note that this time nothing was printed



Yet another TL;DR

iterator on list : next() returns the next element of the list

iterator generator : next() will compute the next element on the fly (execute code)

You can see the yield/generator as a way to manually run the control flow from outside (like continue loop 1 step), by calling next, however complex the flow.

NOTE: a generator function is NOT a normal function. Its generator remembers previous state such as local variables (the stack frame); see other answers or articles for a detailed explanation. A generator can be iterated over only once. You could do without yield, but it would not be as nice, so it can be considered 'very nice' language sugar.




(My below answer only speaks from the perspective of using Python generator, not the underlying implementation of generator mechanism , which involves some tricks of stack and heap manipulation.)

When yield is used instead of return in a Python function, that function is turned into something special called a generator function. Calling it returns an object of generator type. The yield keyword is a flag that tells the Python compiler to treat the function specially. A normal function terminates once it returns a value. A generator function, with the help of the compiler, can be thought of as resumable: its execution context is restored and execution continues from where the last run stopped. This goes on until it executes an explicit return or reaches the end of the function, either of which raises a StopIteration exception (which is also part of the iterator protocol). I found a lot of references about generators, but this one from the functional programming perspective is the most digestible.
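The effect of an explicit return inside a generator function can be observed directly. In this sketch (Python 3.3 or later assumed; the name first_n is hypothetical), the returned value travels on the StopIteration exception.

```python
def first_n(n):
    """Yield 0..n-1, then finish with an explicit return value."""
    for i in range(n):
        yield i
    return "done"          # becomes StopIteration.value


g = first_n(2)
values = [next(g), next(g)]

try:
    next(g)                # the body runs into `return`, so this raises
except StopIteration as stop:
    final = stop.value
```

A for loop swallows this exception silently, which is why the return value is normally invisible to the caller.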

(Now I want to talk about the rationale behind generator , and the iterator based on my own understanding. I hope this can help you grasp the essential motivation of iterator and generator. Such concept shows up in other languages as well such as C#.)

As I understand it, when we want to process a bunch of data, we usually first store the data somewhere and then process it item by item. But this intuitive approach is problematic: if the data volume is huge, it is expensive to store it all beforehand. So instead of storing the data itself, why not store some kind of metadata, i.e. the logic by which the data is computed?

There are 2 approaches to wrapping such metadata.

  1. The OO approach: we wrap the metadata as a class. This is the so-called iterator, which implements the iterator protocol (i.e. the __next__() and __iter__() methods). This is also the commonly seen iterator design pattern.
  2. The functional approach: we wrap the metadata as a function. This is the so-called generator function. Under the hood, the returned generator object still IS-A iterator, because it also implements the iterator protocol.

Either way, an iterator is created, i.e. some object that can give you the data you want. The OO approach may be a bit more complex. Which one to use is up to you.
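The two approaches can be put side by side. This is a sketch in Python 3 with invented names (SquaresIter, squares_gen). Both objects satisfy collections.abc.Iterator, which is one way to check the IS-A claim.

```python
from collections.abc import Iterator


class SquaresIter:
    """OO approach: the state lives in instance attributes."""

    def __init__(self, n):
        self._n = n
        self._i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._i >= self._n:
            raise StopIteration
        value = self._i * self._i
        self._i += 1
        return value


def squares_gen(n):
    """Functional approach: the state lives in the suspended function's locals."""
    for i in range(n):
        yield i * i


from_class = list(SquaresIter(4))
from_function = list(squares_gen(4))
both_are_iterators = (isinstance(SquaresIter(1), Iterator)
                      and isinstance(squares_gen(1), Iterator))
```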




Here is a mental image of what yield does.

I like to think of a thread as having a stack (even when it's not implemented that way).

When a normal function is called, it puts its local variables on the stack, does some computation, then clears the stack and returns. The values of its local variables are never seen again.

With a yield function, when its code begins to run (i.e. after the function is called, returning a generator object, whose next() method is then invoked), it similarly puts its local variables onto the stack and computes for a while. But then, when it hits the yield statement, before clearing its part of the stack and returning, it takes a snapshot of its local variables and stores them in the generator object. It also writes down the place where it's currently up to in its code (i.e. the particular yield statement).

So it's a kind of a frozen function that the generator is hanging onto.

When next() is called subsequently, it retrieves the function's belongings onto the stack and re-animates it. The function continues to compute from where it left off, oblivious to the fact that it had just spent an eternity in cold storage.

Compare the following examples:

def normalFunction():
    return
    if False:
        pass

def yielderFunction():
    return
    if False:
        yield 12

When we call the second function, it behaves very differently to the first. The yield statement might be unreachable, but if it's present anywhere, it changes the nature of what we're dealing with.

>>> yielderFunction()
<generator object yielderFunction at 0x07742D28>

Calling yielderFunction() doesn't run its code, but makes a generator out of the code. (Maybe it's a good idea to name such things with the yielder prefix for readability.)

>>> gen = yielderFunction()
>>> dir(gen)
['__class__',
 ...
 '__iter__',    #Returns gen itself, to make it work uniformly with containers
 ...            #when given to a for loop. (Containers return an iterator instead.)
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'next',        #The method that runs the function's body.
 'send',
 'throw']

The gi_code and gi_frame fields are where the frozen state is stored. Exploring them with dir(..) , we can confirm that our mental model above is credible.
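As a small experiment along these lines, the frozen local variables can be read from gi_frame.f_locals, and the standard inspect module reports the generator's state. The generator running_total is a made-up example.

```python
import inspect


def running_total(values):
    total = 0
    for v in values:
        total += v
        yield total


gen = running_total([10, 20, 30])
state_created = inspect.getgeneratorstate(gen)      # before any code has run

next(gen)
next(gen)
frozen_locals = dict(gen.gi_frame.f_locals)         # snapshot of the paused locals
state_suspended = inspect.getgeneratorstate(gen)

list(gen)                                           # run the body to completion
state_closed = inspect.getgeneratorstate(gen)
frame_after_finish = gen.gi_frame                   # the frame is released
```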




yield is just like return - it returns whatever you tell it to (as a generator). The difference is that the next time a value is requested from the generator, execution resumes right after the last yield statement instead of starting again from the top.

In the case of your code, the function _get_child_candidates is acting like an iterator, so that when you extend your list, it adds one element at a time to the list.

list.extend calls an iterator until it's exhausted. In the case of the code sample you posted, it would be much clearer to just return a tuple and extend the list with that.
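The interaction with list.extend can be tried in isolation. In this sketch, children is a hypothetical stand-in for _get_child_candidates that yields zero, one or two items depending on its input.

```python
def children(node):
    """Yield the left and right child of a (left, right) pair, if present."""
    left, right = node
    if left is not None:
        yield left
    if right is not None:
        yield right


candidates = ["root"]
candidates.extend(children(("L", "R")))      # two items are added
candidates.extend(children((None, "R2")))    # one item is added
candidates.extend(children((None, None)))    # nothing is added
print(candidates)
```

Each call creates a new generator, and extend() drains it completely before moving on.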




It's returning a generator. I'm not particularly familiar with Python, but I believe it's the same kind of thing as C#'s iterator blocks if you're familiar with those.

There's an IBM article which explains it reasonably well (for Python) as far as I can see.

The key idea is that the compiler/interpreter/whatever does some trickery so that as far as the caller is concerned, they can keep calling next() and it will keep returning values - as if the generator method was paused . Now obviously you can't really "pause" a method, so the compiler builds a state machine for you to remember where you currently are and what the local variables etc look like. This is much easier than writing an iterator yourself.




yield makes the function return an object

A return in a function returns a single value.

If you want a function to produce a large series of values, use yield.

More importantly, yield acts as a barrier

Like a barrier in CUDA, it does not hand control back until the code before it has completed.

That is:

Calling next() runs the code in your function from the beginning until it hits yield, and then returns the first value. Every later call runs the loop you have written in the function one more time, returning the next value, until there are no values left to return.




TL; DR

When you find yourself building a list from scratch...

def squares_list(n):
    the_list = []                         # Replace
    for x in range(n):
        y = x * x
        the_list.append(y)                # these
    return the_list                       # lines

... yield each piece instead

def squares_the_yield_way(n):
    for x in range(n):
        y = x * x
        yield y                           # with this

This was my first "aha" moment with yield.

yield is a sugary way to say

build a series of stuff

Same behavior:

>>> for square in squares_list(4):
...     print(square)
...
0
1
4
9
>>> for square in squares_the_yield_way(4):
...     print(square)
...
0
1
4
9

Different behavior:

Yield is single-pass : you can only iterate through once. When a function has a yield in it we call it a generator function . And an iterator is what it returns. That's revealing. We lose the convenience of a container, but gain the power of an arbitrarily long series.

Yield is lazy , it puts off computation. A function with a yield in it doesn't actually execute at all when you call it. The iterator object it returns uses magic to maintain the function's internal context. Each time you call next() on the iterator (this happens in a for-loop) execution inches forward to the next yield. ( return raises StopIteration and ends the series.)

Yield is versatile . It can do infinite loops:

>>> def squares_all_of_them():
...     x = 0
...     while True:
...         yield x * x
...         x += 1
...
>>> squares = squares_all_of_them()
>>> for _ in range(4):
...     print(next(squares))
...
0
1
4
9

If you need multiple passes and the series isn't too long, just call list() on it:

>>> list(squares_the_yield_way(4))
[0, 1, 4, 9]

Brilliant choice of the word yield because both meanings apply:

yield — produce or provide (as in agriculture)

...provide the next data in the series.

yield — give way or relinquish (as in political power)

...relinquish CPU execution until the iterator advances.




While a lot of answers show why you'd use a yield to create a generator, there are more uses for yield . It's quite easy to make a coroutine, which enables the passing of information between two blocks of code. I won't repeat any of the fine examples that have already been given about using yield to create a generator.

To help understand what a yield does in the following code, you can use your finger to trace the cycle through any code that has a yield . Every time your finger hits the yield , you have to wait for a next or a send to be entered. When a next is called, you trace through the code until you hit the yield … the code on the right of the yield is evaluated and returned to the caller… then you wait. When next is called again, you perform another loop through the code. However, you'll note that in a coroutine, yield can also be used with a send … which will send a value from the caller into the yielding function. If a send is given, then yield receives the value sent, and spits it out the left hand side… then the trace through the code progresses until you hit the yield again (returning the value at the end, as if next was called).

For example:

>>> def coroutine():
...     i = -1
...     while True:
...         i += 1
...         val = (yield i)
...         print("Received %s" % val)
...
>>> sequence = coroutine()
>>> next(sequence)
0
>>> next(sequence)
Received None
1
>>> sequence.send('hello')
Received hello
2
>>> sequence.close()
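A slightly more practical sketch of the same mechanism is a running average. The coroutine name running_average is invented here. The caller first primes the coroutine with next() and then feeds it values with send().

```python
def running_average():
    """Coroutine: receives numbers via send() and yields the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average    # hand back the average, then wait for a value
        total += value
        count += 1
        average = total / count


avg = running_average()
primed = next(avg)               # run to the first yield; nothing to report yet
results = [avg.send(10), avg.send(20), avg.send(30)]
avg.close()
```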



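The yield keyword comes down to two simple facts:

  1. If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy "pending list" object called a generator.
  2. A generator is iterable. What is an iterable? It is anything like a list, set, range, or dict view, with a built-in protocol for visiting each element in a certain order.

In a nutshell: a generator is a lazy, incrementally pending list, and yield statements let you use function notation to program the list values the generator should incrementally spit out.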

generator = myYieldingFunction(...)
x = list(generator)

   generator
       v
[x[0], ..., ???]

         generator
             v
[x[0], x[1], ..., ???]

               generator
                   v
[x[0], x[1], x[2], ..., ???]

                       StopIteration exception
[x[0], x[1], x[2]]     done

list==[x[0], x[1], x[2]]

Let's define a function makeRange that is just like Python's range. Calling makeRange(n) RETURNS A GENERATOR:

def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1

>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>

To force the generator to immediately return its pending values, you can pass it to list() (just as you could any iterable):

>>> list(makeRange(5))
[0, 1, 2, 3, 4]

Comparing the example to "just returning a list"

The above example can be thought of as merely creating a list that you append to and return:

# list-version                   #  # generator-version
def makeRange(n):                #  def makeRange(n):
    """return [0,1,2,...,n-1]""" #~     """return 0,1,2,...,n-1"""
    TO_RETURN = []               #>
    i = 0                        #      i = 0
    while i < n:                 #      while i < n:
        TO_RETURN += [i]         #~         yield i
        i += 1                   #          i += 1  ## indented
    return TO_RETURN             #>

>>> makeRange(5)
[0, 1, 2, 3, 4]

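There is one major difference, though; see the last section.

How you might use generators

An iterable is the last part of a list comprehension, and all generators are iterable, so they are often used like so: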

#                   _ITERABLE_
>>> [x+10 for x in makeRange(5)]
[10, 11, 12, 13, 14]

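To get a better feel for generators, you can play around with the itertools module (be sure to use chain.from_iterable rather than chain when warranted). For example, you might even use generators to implement infinitely long lazy lists like itertools.count(). You could implement your own def enumerate(iterable): return zip(count(), iterable), or alternatively do so with the yield keyword in a while loop.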
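Both variants of a home-made enumerate can be sketched as follows. The names enumerate_zip and enumerate_yield are chosen to avoid shadowing the built-in, and the yield version uses a for loop with a counter rather than a while loop.

```python
from itertools import count


def enumerate_zip(iterable):
    """enumerate built from zip() and an infinite lazy counter."""
    return zip(count(), iterable)


def enumerate_yield(iterable):
    """The same idea written with yield."""
    i = 0
    for item in iterable:
        yield i, item
        i += 1


via_zip = list(enumerate_zip("ab"))
via_yield = list(enumerate_yield("ab"))
```

zip() stops as soon as the shorter input is exhausted, so the infinite count() does no harm.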

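Please note: generators can actually be used for many more things, such as implementing coroutines, non-deterministic programming, or other elegant things. However, the "lazy list" viewpoint I present here is the most common use you will find.

Behind the scenes

This is how the "Python iteration protocol" works, that is, what goes on when you do list(makeRange(5)). This is what I described earlier as a "lazy, incremental list".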

>>> x=iter(range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

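The built-in function next() just calls the object's .next() method (.__next__() in Python 3), which is part of the "iteration protocol" and is found on all iterators. You can manually use the next() function (and other parts of the iteration protocol) to implement fancy things, usually at the expense of readability, so try to avoid doing that...

Minutiae

Normally, most people would not care about the following distinctions and may want to stop reading here.

In Python-speak, an iterable is any object that "understands the concept of a for loop", like a list [1,2,3], and an iterator is a specific instance of the requested for loop, like [1,2,3].__iter__(). A generator is exactly the same as any iterator, except for the way it was written (with function syntax).

When you request an iterator from a list, it creates a new iterator. However, when you request an iterator from an iterator (which you would rarely do), it just returns itself.

Thus, in the unlikely event that something like this trips you up...

> x = makeRange(5)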
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

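...then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call makeRange(...) again. If you need to use the result twice, convert it to a list and store it in a variable: x = list(makeRange(5)). Those who absolutely need to clone a generator (for example, people doing terrifyingly hackish metaprogramming) can use itertools.tee if absolutely necessary, since the copyable iterator Python PEP standards proposal has been deferred.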
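These points can be checked in a few lines. The sketch below uses a hypothetical make_range generator (the same idea as makeRange above) and itertools.tee. After calling tee, the original generator should not be used directly any more.

```python
from itertools import tee


def make_range(n):
    i = 0
    while i < n:
        yield i
        i += 1


numbers = [1, 2, 3]
it1 = iter(numbers)
it2 = iter(numbers)
list_gives_new_iterators = it1 is not it2      # each request creates a new one
iterator_returns_itself = iter(it1) is it1     # an iterator hands back itself

gen = make_range(3)
first_pass = list(gen)
second_pass = list(gen)                        # already exhausted

copy_a, copy_b = tee(make_range(3))            # two independent iterators
from_a = list(copy_a)
from_b = list(copy_b)
```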




I was going to post "read page 19 of Beazley's 'Python: Essential Reference' for a quick description of generators", but so many others have posted good descriptions already.

Also, note that yield can be used in coroutines as the dual of its use in generator functions. Although it isn't the same use as in your code snippet, (yield) can be used as an expression in a function. When a caller sends a value to the coroutine using the send() method, the coroutine executes until the next (yield) expression is encountered.

Generators and coroutines are a cool way to set up data-flow type applications. I thought it would be worthwhile knowing about the other use of the yield statement in functions.
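A data-flow style pipeline built from plain generators might look like the sketch below. The stage names numbers, only_even and squared are invented for illustration. Each stage pulls items from the previous one only when asked.

```python
def numbers(limit):
    """Source stage: produce 0..limit-1."""
    for n in range(limit):
        yield n


def only_even(source):
    """Filter stage: pass through even values only."""
    for n in source:
        if n % 2 == 0:
            yield n


def squared(source):
    """Transform stage: square each value."""
    for n in source:
        yield n * n


pipeline = squared(only_even(numbers(10)))   # nothing has been computed yet
result = list(pipeline)                      # values flow through one at a time
print(result)
```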





