multiple - sort dictionary by value python 3




How do I sort a list of dictionaries by a value of the dictionary? (12)

Here is my answer to a related question on sorting by multiple columns. It also works for the degenerate case where the number of columns is only one.

I have a list of dictionaries and want each item to be sorted by a specific property values.

Take into consideration the array below,

[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]

When sorted by name, should become

[{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]

Here is the alternative general solution - it sorts elements of dict by keys and values. The advantage of it - no need to specify keys, and it would still work if some keys are missing in some of dictionaries.

def sort_key_func(item):
    """ helper function used to sort list of dicts

    :param item: dict
    :return: sorted list of tuples (k, v)
    """
    pairs = []
    for k, v in item.items():
        pairs.append((k, v))
    return sorted(pairs)
sorted(A, key=sort_key_func)

I tried something like this:

my_list.sort(key=lambda x: x['name'])

It worked for integers as well.


If you do not need the original list of dictionaries, you could modify it in-place with sort() method using a custom key function.

Key function:

def get_name(d):
    """ Return the value of a key in a dictionary. """

    return d["name"]

The list to be sorted:

data_one = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]

Sorting it in-place:

data_one.sort(key=get_name)

If you need the original list, call the sorted() function passing it the list and the key function, then assign the returned sorted list to a new variable:

data_two = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
new_data = sorted(data_two, key=get_name)

Printing data_one and new_data.

>>> print(data_one)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
>>> print(new_data)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]

It may look cleaner using a key instead a cmp:

newlist = sorted(list_to_be_sorted, key=lambda k: k['name']) 

or as J.F.Sebastian and others suggested,

from operator import itemgetter
newlist = sorted(list_to_be_sorted, key=itemgetter('name')) 

For completeness (as pointed out in comments by fitzgeraldsteele), add reverse=True to sort descending

newlist = sorted(l, key=itemgetter('name'), reverse=True)

Lets Say I h'v a Dictionary D with elements below. To sort just use key argument in sorted to pass custom function as below

D = {'eggs': 3, 'ham': 1, 'spam': 2}

def get_count(tuple):
    return tuple[1]

sorted(D.items(), key = get_count, reverse=True)
or
sorted(D.items(), key = lambda x: x[1], reverse=True)  avoiding get_count function call

https://wiki.python.org/moin/HowTo/Sorting/#Key_Functions


Using the pandas package is another method, though it's runtime at large scale is much slower than the more traditional methods proposed by others:

import pandas as pd

listOfDicts = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
df = pd.DataFrame(listOfDicts)
df = df.sort_values('name')
sorted_listOfDicts = df.T.to_dict().values()

Here are some benchmark values for a tiny list and a large (100k+) list of dicts:

setup_large = "listOfDicts = [];\
[listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10})) for _ in range(50000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"

setup_small = "listOfDicts = [];\
listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"

method1 = "newlist = sorted(listOfDicts, key=lambda k: k['name'])"
method2 = "newlist = sorted(listOfDicts, key=itemgetter('name')) "
method3 = "df = df.sort_values('name');\
sorted_listOfDicts = df.T.to_dict().values()"

import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))

t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_large)
print('Large Method Pandas: ' + str(t.timeit(1)))

#Small Method LC: 0.000163078308105
#Small Method LC2: 0.000134944915771
#Small Method Pandas: 0.0712950229645
#Large Method LC: 0.0321750640869
#Large Method LC2: 0.0206089019775
#Large Method Pandas: 5.81405615807

You could use a custom comparison function, or you could pass in a function that calculates a custom sort key. That's usually more efficient as the key is only calculated once per item, while the comparison function would be called many more times.

You could do it this way:

def mykey(adict): return adict['name']
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=mykey)

But the standard library contains a generic routine for getting items of arbitrary objects: itemgetter. So try this instead:

from operator import itemgetter
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=itemgetter('name'))

sometime we need to use lower() for example

lists = [{'name':'Homer', 'age':39},
  {'name':'Bart', 'age':10},
  {'name':'abby', 'age':9}]

lists = sorted(lists, key=lambda k: k['name'])
print(lists)
# [{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}, {'name':'abby', 'age':9}]

lists = sorted(lists, key=lambda k: k['name'].lower())
print(lists)
# [ {'name':'abby', 'age':9}, {'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]

a = [{'name':'Homer', 'age':39}, ...]

# This changes the list a
a.sort(key=lambda k : k['name'])

# This returns a new list (a is not modified)
sorted(a, key=lambda k : k['name']) 

import operator
a_list_of_dicts.sort(key=operator.itemgetter('name'))

'key' is used to sort by an arbitrary value and 'itemgetter' sets that value to each item's 'name' attribute.


my_list = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]

my_list.sort(lambda x,y : cmp(x['name'], y['name']))

my_list will now be what you want.

(3 years later) Edited to add:

The new key argument is more efficient and neater. A better answer now looks like:

my_list = sorted(my_list, key=lambda k: k['name'])

...the lambda is, IMO, easier to understand than operator.itemgetter, but YMMV.







data-structures