“Least Astonishment” and the Mutable Default Argument
Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
    def foo(a=[]):
        a.append(5)
        return a

Python novices would expect this function to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):

    >>> foo()
    [5]
    >>> foo()
    [5, 5]
    >>> foo()
    [5, 5, 5]
    >>> foo()
    [5, 5, 5, 5]
    >>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:

    >>> def a():
    ...     print("a executed")
    ...     return []
    ...
    >>>
    >>> def b(x=a()):
    ...     x.append(5)
    ...     print(x)
    ...
    a executed
    >>> b()
    [5]
    >>> b()
    [5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or "together" with it?
Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
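The "evaluated once, when the def line is executed" rule can be checked directly. The following is a minimal sketch in Python 3; make_default, collect, and calls are hypothetical names introduced only for this illustration.

```python
calls = []  # records every evaluation of the default expression


def make_default():
    """Hypothetical helper: log that it ran, then build the default list."""
    calls.append("evaluated")
    return []


def collect(item, bucket=make_default()):  # make_default() runs here, once
    bucket.append(item)
    return bucket


first = collect(1)
second = collect(2)

print(len(calls))        # 1: the default expression ran only at definition
print(first is second)   # True: both calls received the same list object
print(second)            # [1, 2]
```

If defaults were bound at call time instead, calls would grow by one entry for every call that omits bucket.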
5 points in defense of Python
Simplicity: The behavior is simple in the following sense: Most people fall into this trap only once, not several times.
Consistency: Python always passes objects, not names. The default parameter is, obviously, part of the function heading (not the function body). It therefore ought to be evaluated at module load time (and only at module load time, unless nested), not at function call time.
Usefulness: As Fredrik Lundh points out in his explanation of "Default Parameter Values in Python", the current behavior can be quite useful for advanced programming. (Use sparingly.)
Sufficient documentation: In the most basic Python documentation, the tutorial, the issue is loudly announced as an "Important warning" in the first subsection of Section "More on Defining Functions". The warning even uses boldface, which is rarely applied outside of headings. RTFM: Read the fine manual.
Meta-learning: Falling into the trap is actually a very helpful moment (at least if you are a reflective learner), because you will subsequently better understand the point "Consistency" above and that will teach you a great deal about Python.
Assigning default values in a function call is a code smell.
    def a(b=[]):
        pass

This is a signature of a function that is up to no good. Not just because of the problems described by other answers. I won't go into that here.
This function aims to do two things. Create a new list, and execute a functionality, most likely on said list.
Functions that do two things are bad functions, as we learn from clean code practices.
Attacking this problem with polymorphism, we would extend the python list or wrap one in a class, then perform our function upon it.
But wait you say, I like my one-liners.
Well, guess what. Code is more than just a way to control the behavior of hardware. It's a way of:
communicating with other developers, working on the same code.
being able to change the behavior of the hardware when new requirements arise.
being able to understand the flow of the program after you pick up the code again after two years to make the change mentioned above.
Don't leave time-bombs for yourself to pick up later.
Separating this function into the two things it does, we need a class
    class ListNeedsFives(object):
        def __init__(self, b=None):
            if b is None:
                b = []
            self.b = b

        def foo(self):
            self.b.append(5)

Executed by:

    a = ListNeedsFives()
    a.foo()
    a.b

And why is this better than mashing all the above code into a single function?

    def dontdothis(b=None):
        if b is None:
            b = []
        b.append(5)
        return b
Why not do this?
Unless you fail in your project, your code will live on. Most likely your function will be doing more than this. The proper way of making maintainable code is to separate code into atomic parts with a properly limited scope.
The constructor of a class is a very commonly recognized component to anyone who has done Object Oriented Programming. Placing the logic that handles the list instantiation in the constructor makes the cognitive load of understanding what the code does smaller.
foo() does not return the list, why not?
In returning a standalone list, you could assume that it's safe to do whatever you feel like to it. But it may not be, since it is also shared by the object a. Forcing the user to refer to it as a.b reminds them where the list belongs. Any new code that wants to modify a.b will naturally be placed in the class, where it belongs.

The def dontdothis(b=None): function signature has none of these advantages.
Why don't you introspect?
I'm really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.

Given a simple little function func defined as:

    >>> def func(a = []):
    ...     a.append(5)

When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. When the def statement is then executed, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.
So, let's do some introspection, a before and after, to examine how the list gets expanded inside the function object. I'm using Python 3.x for this; for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).
Function Before Execution:

    >>> def func(a = []):
    ...     a.append(5)
    ...

After Python executes this definition it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute for the function object (relevant section: Callables):

    >>> func.__defaults__
    ([],)

O.k, so an empty list as the single entry in __defaults__, just as expected.
Function After Execution:
Let's now execute this function:

    >>> func()

Now, let's see those __defaults__ again:

    >>> func.__defaults__
    ([5],)

Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:

    >>> func(); func(); func()
    >>> func.__defaults__
    ([5, 5, 5, 5],)
So, there you have it, the reason why this 'flaw' happens, is because default arguments are part of the function object. There's nothing weird going on here, it's all just a bit surprising.
The common solution to combat this is to use None as the default and then initialize in the function body:

    def func(a = None):
        # or: a = [] if a is None else a
        if a is None:
            a = []

Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.
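A short sketch of that None-as-default pattern in Python 3 follows; append_five is a hypothetical name used only for this illustration. It shows that each call without an argument gets its own list and that the stored default never changes.

```python
def append_five(a=None):
    if a is None:
        a = []            # a fresh list on every call that omits `a`
    a.append(5)
    return a


r1 = append_five()
r2 = append_five()

print(r1, r2)                      # [5] [5]
print(r1 is r2)                    # False: two separate list objects
print(append_five.__defaults__)    # (None,): the stored default is untouched
```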
To further verify that the list in __defaults__ is the same as that used in the function func you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you'll see how these are indeed referring to the same list instance:

    >>> def func(a = []):
    ...     a.append(5)
    ...     return id(a)
    >>>
    >>> id(func.__defaults__[0]) == func()
    True
All with the power of introspection!
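The same check can also be made through the standard inspect module instead of reading __defaults__ directly. This is a minimal sketch assuming Python 3.3 or later, where inspect.signature is available; func here is a variant of the function above that returns the list.

```python
import inspect


def func(a=[]):
    a.append(5)
    return a


# The Parameter object exposes the very same default object.
param = inspect.signature(func).parameters["a"]
print(param.default is func.__defaults__[0])   # True

func()
print(param.default)                           # [5]
```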
* To verify that Python evaluates the default arguments when the function is defined, rather than when it is called, try executing the following:

    def bar(a=input('Did you just see me without calling the function?')):
        pass  # use raw_input in Py2

As you'll notice, input() is called before the process of building the function and binding it to the name bar is made.
1) The so-called problem of "Mutable Default Argument" is in general a special example demonstrating that:
"All functions with this problem suffer also from a similar side effect problem on the actual parameter."
That is against the rules of functional programming, usually undesirable, and both should be fixed together.

    def foo(a=[]):                 # the same problematic function
        a.append(5)
        return a

    >>> somevar = [1, 2]           # an example without a default parameter
    >>> foo(somevar)
    [1, 2, 5]
    >>> somevar
    [1, 2, 5]                      # usually expected [1, 2]

Solution: a copy

An absolutely safe solution is to copy or deepcopy the input object first and then to do whatever with the copy.

    def foo(a=[]):
        a = a[:]     # a copy
        a.append(5)
        return a     # or everything safe by one line: "return a + [5]"
Many builtin mutable types have a copy method like some_dict.copy() or some_set.copy(), or can be copied easily like somelist[:] or list(some_list). Every object can also be copied by copy.copy(any_object), or more thoroughly by copy.deepcopy() (the latter useful if the mutable object is composed from mutable objects). Some objects are fundamentally based on side effects, like the "file" object, and cannot be meaningfully reproduced by copy.
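The difference between the two copy functions only shows up with nested mutable objects. A minimal sketch using the standard copy module; nested, shallow, and deep are names chosen for this illustration.

```python
import copy

nested = [[1, 2], [3]]
shallow = copy.copy(nested)      # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

shallow[0].append(99)            # also visible through `nested`
deep[1].append(42)               # invisible to `nested`

print(nested)   # [[1, 2, 99], [3]]
print(deep)     # [[1, 2], [3, 42]]
```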
Example problem for a similar SO question
    class Test(object):            # the original problematic class
        def __init__(self, var1=[]):
            self._var1 = var1

    somevar = [1, 2]               # an example without a default parameter
    t1 = Test(somevar)
    t2 = Test(somevar)
    t1._var1.append([1])
    print(somevar)                 # [1, 2, [1]] but usually expected [1, 2]
    print(t2._var1)                # [1, 2, [1]] but usually expected [1, 2]

Input parameter objects shouldn't be modified in place (mutated), nor should they be bound into an object returned by the function (if we prefer programming without side effects, which is strongly recommended; see Wiki about "side effect", where the first two paragraphs are relevant in this context). Nor should they be saved in any public attribute of an instance returned by this function (assuming that private attributes of an instance should not be modified from outside of this class or its subclasses by convention, i.e. _var1 is a private attribute).

2) Only if the side effect on the actual parameter is required but unwanted on the default parameter is the useful solution to default var1 to None and then:

    if var1 is None:
        var1 = []

3) In some cases the mutable behavior of default parameters is useful.
A simple workaround using None

    >>> def bar(b, data=None):
    ...     data = data or []
    ...     data.append(b)
    ...     return data
    ...
    >>> bar(3)
    [3]
    >>> bar(3)
    [3]
    >>> bar(3)
    [3]
    >>> bar(3, [34])
    [34, 3]
    >>> bar(3, [34])
    [34, 3]
AFAICS no one has yet posted the relevant part of the documentation:
Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function [...]
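The explicit "is None" test that the documentation recommends behaves differently from the shorter "or" idiom when the caller passes an empty list. A minimal sketch of the difference; bar_or and bar_is_none are hypothetical names for the two variants.

```python
def bar_or(b, data=None):
    data = data or []        # any falsy argument (None, [], ...) is replaced
    data.append(b)
    return data


def bar_is_none(b, data=None):
    if data is None:         # only a missing argument is replaced
        data = []
    data.append(b)
    return data


mine = []
bar_or(3, mine)
print(mine)          # []: the caller's empty list was silently swapped out

bar_is_none(3, mine)
print(mine)          # [3]: the caller's list was used
```

Which behavior is wanted depends on whether the function is meant to fill a list supplied by the caller; the point is only that the two spellings are not interchangeable.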
Already a busy topic, but from what I read here, the following helped me realize how it works internally:

    def bar(a=[]):
        print(id(a))
        a = a + [1]
        print(id(a))
        return a

    >>> bar()
    4484370232
    4484524224
    [1]
    >>> bar()
    4484370232
    4484524152
    [1]
    >>> bar()
    4484370232  # Never change, this is 'class property' of the function
    4484523720  # Always a new object
    [1]
    >>> id(bar.func_defaults[0])
    4484370232
I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).
As others have extensively commented, the list parameter is bound to the function when it is defined as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list which may have been altered by any other calls to the function. Worse yet, two parameters are using this function's shared parameter at the same time oblivious to the changes made by the other.
Wrong Method (probably...):
    def foo(list_arg=[5]):
        return list_arg

    a = foo()
    a.append(6)
    >>> a
    [5, 6]

    b = foo()
    b.append(7)
    # The value of 6 appended to variable 'a' is now part of the list held by 'b'.
    >>> b
    [5, 6, 7]

    # Although 'a' is expecting to receive 6 (the last element it appended to the list),
    # it actually receives the last element appended to the shared list.
    # It thus receives the value 7 previously appended by 'b'.
    >>> a.pop()
    7

You can verify that they are one and the same object by using id:

    >>> id(a)
    5347866528
    >>> id(b)
    5347866528
Per Brett Slatkin's "Effective Python: 59 Specific Ways to Write Better Python", Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)

The convention for achieving the desired result in Python is to provide a default value of None and to document the actual behaviour in the docstring.
This implementation ensures that each call to the function either receives the default list or else the list passed to the function.
    def foo(list_arg=None):
        """
        :param list_arg: A list of input values.
                         If none provided, uses a list with a default value of 5.
        """
        if not list_arg:
            list_arg = [5]
        return list_arg

    a = foo()
    a.append(6)
    >>> a
    [5, 6]

    b = foo()
    b.append(7)
    >>> b
    [5, 7]

    c = foo([10])
    c.append(11)
    >>> c
    [10, 11]
There may be legitimate use cases for the 'Wrong Method' whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.
I sometimes exploit this behavior as an alternative to the following pattern:
    singleton = None

    def use_singleton():
        global singleton

        if singleton is None:
            singleton = _make_singleton()

        return singleton.use_me()

If singleton is only used by use_singleton, I like the following pattern as a replacement:

    # _make_singleton() is called only once when the def is executed
    def use_singleton(singleton=_make_singleton()):
        return singleton.use_me()
I've used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.
Since I don't think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
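As an illustration of the memoization use mentioned above, here is a minimal sketch in which a dict default serves as a cache that persists across calls. The function fib and the parameter name _cache are hypothetical choices for this example, not taken from the answer.

```python
def fib(n, _cache={0: 0, 1: 1}):
    # _cache is created once, when the def is executed, and is
    # deliberately shared by every call (including the recursive ones).
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]


print(fib(30))                     # 832040
print(len(fib.__defaults__[0]))    # 31: entries for 0 through 30 were cached
```

For real code, functools.lru_cache from the standard library achieves the same effect without relying on a shared default.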
I think the answer to this question lies in how Python passes data to parameters (pass by value or by reference), not in mutability or in how Python handles the "def" statement.

A brief introduction. First, there are two kinds of data types in Python: simple elementary data types, like numbers, and objects. Second, when passing data to parameters, Python passes elementary data types by value, i.e., makes a local copy of the value in a local variable, but passes objects by reference, i.e., as pointers to the object.

Admitting the above two points, let's explain what happened to the Python code. It's only because of passing by reference for objects, and has nothing to do with mutable/immutable, or arguably the fact that the "def" statement is executed only once when it is defined.

[] is an object, so Python passes the reference of [] to a, i.e., a is only a pointer to [], which lies in memory as an object. There is only one copy of [] with, however, many references to it. For the first foo(), the list [] is changed to [1] by the append method. But note that there is only one copy of the list object, and this object now becomes [1]. When running the second foo(), what the effbot webpage says (items is not evaluated any more) is wrong. a is evaluated to be the list object, although now the content of the object is [1]. This is the effect of passing by reference! The result of foo(3) can be easily derived in the same way.
To further validate my answer, let's take a look at two additional codes.
====== No. 2 ========
    def foo(x, items=None):
        if items is None:
            items = []
        items.append(x)
        return items

    foo(1)  # returns [1]
    foo(2)  # returns [2]
    foo(3)  # returns [3]

[] is an object, and so is None (the former is mutable while the latter is immutable, but the mutability has nothing to do with the question). None is somewhere in the space but we know it's there and there is only one copy of None there. So every time foo is invoked, items is evaluated (as opposed to some answer that it is only evaluated once) to be None, to be clear, the reference (or the address) of None. Then in foo, items is changed to [], i.e., points to another object which has a different address.

====== No. 3 =======

    def foo(x, items=[]):
        items.append(x)
        return items

    foo(1)      # returns [1]
    foo(2, [])  # returns [2]
    foo(3)      # returns [1, 3]

The invocation of foo(1) makes items point to a list object [] with an address, say, 11111111. The content of the list is changed to [1] in the foo function, but the address is not changed, still 11111111. Then foo(2, []) is coming. Although the [] in foo(2, []) has the same content as the default parameter [] when calling foo(1), their addresses are different! Since we provide the parameter explicitly, items has to take the address of this new [], say 2222222, and return it after making some change. Now foo(3) is executed. Since only x is provided, items has to take its default value again. What's the default value? It is set when defining the foo function: the list object located at 11111111. So items is evaluated to be the address 11111111, holding one element, 1. The list located at 2222222 also contains one element, 2, but it is not pointed to by items any more. Consequently, an append of 3 will make items [1, 3].
From the above explanations, we can see that the effbot webpage recommended in the accepted answer failed to give a relevant answer to this question. What is more, I think a point in the effbot webpage is wrong. I think the code regarding the UI.Button is correct:
    for i in range(10):
        def callback():
            print "clicked button", i
        UI.Button("button %s" % i, callback)

Each button can hold a distinct callback function which will display a different value of i. I can provide an example to show this:

    x = []
    for i in range(10):
        def callback():
            print(i)
        x.append(callback)

If we execute x[7]() we'll get 7 as expected, and x[9]() will give 9, another value of i.
It may be true that:
1. Someone is using every language/library feature, and
2. Switching the behavior here would be ill-advised, but
it is entirely consistent to hold to both of the points above and still make another point:
3. It is a confusing feature and it is unfortunate in Python.
The other answers, or at least some of them either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.
It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano's opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,
The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other languages. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working--not because of a design fl--I mean, hidden logic puzzle--that cuts against the intuitions of programmers who are drawn to Python because it Just Works.
It's a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?
    def print_tuple(some_tuple=(1,2,3)):
        print some_tuple

    print_tuple()        #1
    print_tuple((1,2,3)) #2

I'll give you a hint. Here's the disassembly (see http://docs.python.org/library/dis.html):

    #1
      0 LOAD_GLOBAL              0 (print_tuple)
      3 CALL_FUNCTION            0
      6 POP_TOP
      7 LOAD_CONST               0 (None)
     10 RETURN_VALUE

    #2
      0 LOAD_GLOBAL              0 (print_tuple)
      3 LOAD_CONST               4 ((1, 2, 3))
      6 CALL_FUNCTION            1
      9 POP_TOP
     10 LOAD_CONST               0 (None)
     13 RETURN_VALUE
I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs ?)
As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it's a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn't C. In C you have constants that are pretty much free. In Python you don't have this benefit.
Suppose you have the following code
    fruits = ("apples", "bananas", "loganberries")

    def eat(food=fruits):
        ...
When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, that it will be equal to the tuple
("apples", "bananas", "loganberries")
However, suppose later on in the code, I do something like

    def some_random_function():
        global fruits
        fruits = ("blueberries", "mangos")

then if default parameters were bound at function execution rather than function declaration, I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your foo function above was mutating the list.
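The two binding times can be compared side by side. In this minimal sketch, eat keeps Python's actual definition-time binding, while eat_late is a hypothetical variant that emulates call-time binding by looking the global up inside the body.

```python
fruits = ("apples", "bananas", "loganberries")


def eat(food=fruits):          # default object captured now, at definition
    return food


def eat_late(food=None):       # hypothetical call-time variant
    if food is None:
        food = fruits          # global looked up on every call
    return food


fruits = ("blueberries", "mangos")   # rebind the global afterwards

print(eat())        # ('apples', 'bananas', 'loganberries')
print(eat_late())   # ('blueberries', 'mangos')
```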
The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:
    StringBuffer s = new StringBuffer("Hello World!");
    Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
    counts.put(s, 5);
    s.append("!!!!");
    System.out.println( counts.get(s) );  // does this work?

Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).
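The Python side of that remark is easy to confirm: a list is rejected as a dictionary key, while an immutable tuple with the same contents is accepted. A minimal sketch (the exact wording of the error message may differ between Python versions):

```python
counts = {}
key = [1, 2]

try:
    counts[key] = 5              # lists are mutable and therefore unhashable
except TypeError as exc:
    error = str(exc)

counts[tuple(key)] = 5           # an immutable snapshot works as a key
print(counts)                    # {(1, 2): 5}
```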
Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they're writing.
I personally like Python's current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.
The shortest answer would probably be "definition is execution", therefore the whole argument makes no strict sense. As a more contrived example, you may cite this:
    def a(): return []

    def b(x=a()):
        print x

Hopefully it's enough to show that not executing the default argument expressions at the execution time of the def statement isn't easy or doesn't make sense, or both.
I agree it's a gotcha when you try to use default constructors, though.
This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)
I'm gonna give you what I see as a useful example.
    def example(errors=[]):
        # statements
        # Something went wrong
        mistake = True
        if mistake:
            tryToFixIt(errors)
            # Didn't work.. let's try again
            tryToFixItAnotherway(errors)
            # This time it worked
        return errors

    def tryToFixIt(err):
        err.append('Attempt to fix it')

    def tryToFixItAnotherway(err):
        err.append('Attempt to fix it by another way')

    def main():
        for item in range(2):
            errors = example()
        print('\n'.join(errors))

    main()

prints the following

    Attempt to fix it
    Attempt to fix it by another way
    Attempt to fix it
    Attempt to fix it by another way
This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.

    >>> def foo(a):
    ...     a.append(5)
    ...     print(a)
    ...
    >>> a = [5]
    >>> foo(a)
    [5, 5]
    >>> foo(a)
    [5, 5, 5]
    >>> foo(a)
    [5, 5, 5, 5]
    >>> foo(a)
    [5, 5, 5, 5, 5]
No default values in sight in this code, but you get exactly the same problem.
The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).

foo, with a default argument, shouldn't be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.
If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.
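A minimal sketch of that advice: the function below needs to sort its input, which is a destructive operation on a list, so it works on a private copy. The name largest_two is hypothetical.

```python
def largest_two(values):
    work = list(values)          # local copy; the caller's list stays intact
    work.sort(reverse=True)      # destructive, but only on the copy
    return work[:2]


data = [3, 9, 1, 7]
print(largest_two(data))   # [9, 7]
print(data)                # [3, 9, 1, 7]
```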
This behavior is not surprising if you take the following into consideration:
1. The behavior of read-only class attributes upon assignment attempts, and that
2. Functions are objects (explained well in the accepted answer).
The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.
(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:
...all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).
Look back to the original example and consider the above points:
    def foo(a=[]):
        a.append(5)
        return a

Here foo is an object and a is an attribute of foo (available at foo.__defaults__[0], or foo.func_defaults[0] in Python 2). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.

Calling foo without overriding a default uses that default's value from foo.__defaults__. In this case, foo.__defaults__[0] is used for a within the function object's code scope. Changes to a change foo.__defaults__[0], which is part of the foo object and persists between executions of the code in foo.
Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:
    def foo(a, L=None):
        if L is None:
            L = []
        L.append(a)
        return L

Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:
- When the foo function object is instantiated, foo.__defaults__[0] is set to None, an immutable object.
- When the function is executed with defaults (with no parameter specified for L in the function call), foo.__defaults__[0] (None) is available in the local scope as L.
- Upon L = [], the assignment cannot succeed at foo.__defaults__[0], because that attribute is read-only.
- Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.__defaults__[0] thus remains unchanged for future invocations of foo.
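The distinction this answer relies on, between changing the stored default object and rebinding the local name, can be observed through __defaults__. A minimal sketch in Python 3; mutate and rebind are hypothetical names for the two variants.

```python
def mutate(a, L=[]):
    L.append(a)          # changes the stored default object in place
    return L


def rebind(a, L=[]):
    L = L + [a]          # builds a new list and rebinds only the local name
    return L


mutate(1)
rebind(1)

print(mutate.__defaults__)   # ([1],): the default itself was modified
print(rebind.__defaults__)   # ([],): the default is unchanged
```

Assignment inside the body never reaches the function object; only in-place operations on the default object do.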
Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the function is defined.

    class BananaBunch:
        bananas = []

        def addBanana(self, banana):
            self.bananas.append(banana)
This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added to all instances of that class. The reason is exactly the same.
It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.
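The class case can be sketched side by side with the usual remedy, creating the list in __init__ so that each instance gets its own. SharedBunch and OwnBunch are hypothetical names for the two variants.

```python
class SharedBunch:
    bananas = []                     # one list, created when the class body runs

    def add_banana(self, banana):
        self.bananas.append(banana)  # mutates the shared class attribute


class OwnBunch:
    def __init__(self):
        self.bananas = []            # a new list for every instance

    def add_banana(self, banana):
        self.bananas.append(banana)


s1, s2 = SharedBunch(), SharedBunch()
s1.add_banana("a")
print(s2.bananas)    # ['a']: s2 sees the banana added through s1

o1, o2 = OwnBunch(), OwnBunch()
o1.add_banana("a")
print(o2.bananas)    # []
```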
Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok python much better.
That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.
When we do this:

    def foo(a=[]):
        ...

... we assign the argument a to an unnamed list, if the caller does not pass the value of a.

To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo?

    def foo(a=pavlo):
        ...

At any time, if the caller doesn't tell us what a is, we reuse pavlo.

If pavlo is mutable (modifiable), and foo ends up modifying it, that is an effect we notice the next time foo is called without specifying a.

So this is what you see (remember, pavlo is initialized to []):

    >>> foo()
    [5]

Now, pavlo is [5].

Calling foo() again modifies pavlo again:

    >>> foo()
    [5, 5]

Specifying a when calling foo() ensures pavlo is not touched.

    >>> ivan = [1, 2, 3, 4]
    >>> foo(a=ivan)
    [1, 2, 3, 4, 5]
    >>> ivan
    [1, 2, 3, 4, 5]

So, pavlo is still [5, 5].

    >>> foo()
    [5, 5, 5]
You can get round this by replacing the object (and therefore the tie with the scope):
    def foo(a=[]):
        a = list(a)
        a.append(5)
        return a
Ugly, but it works.