with - python reset index




What does `ValueError: cannot reindex from a duplicate axis` mean? (5)

As others have said, you've probably got duplicate values in your original index. To find them do this:

df[df.index.duplicated()]

I am getting a ValueError: cannot reindex from a duplicate axis when I am trying to set an index to a certain value. I tried to reproduce this with a simple example, but I could not do it.

Here is my session inside of ipdb trace. I have a DataFrame with string index, and integer columns, float values. However when I try to create sum index for sum of all columns I am getting ValueError: cannot reindex from a duplicate axis error. I created a small DataFrame with the same characteristics, but was not able to reproduce the problem, what could I be missing?

I don't really understand what ValueError: cannot reindex from a duplicate axis means, what does this error message mean? Maybe this will help me diagnose the problem, and this is most answerable part of my question.

ipdb> type(affinity_matrix)
<class 'pandas.core.frame.DataFrame'>
ipdb> affinity_matrix.shape
(333, 10)
ipdb> affinity_matrix.columns
Int64Index([9315684, 9315597, 9316591, 9320520, 9321163, 9320615, 9321187, 9319487, 9319467, 9320484], dtype='int64')
ipdb> affinity_matrix.index
Index([u'001', u'002', u'003', u'004', u'005', u'008', u'009', u'010', u'011', u'014', u'015', u'016', u'018', u'020', u'021', u'022', u'024', u'025', u'026', u'027', u'028', u'029', u'030', u'032', u'033', u'034', u'035', u'036', u'039', u'040', u'041', u'042', u'043', u'044', u'045', u'047', u'047', u'048', u'050', u'053', u'054', u'055', u'056', u'057', u'058', u'059', u'060', u'061', u'062', u'063', u'065', u'067', u'068', u'069', u'070', u'071', u'072', u'073', u'074', u'075', u'076', u'077', u'078', u'080', u'082', u'083', u'084', u'085', u'086', u'089', u'090', u'091', u'092', u'093', u'094', u'095', u'096', u'097', u'098', u'100', u'101', u'103', u'104', u'105', u'106', u'107', u'108', u'109', u'110', u'111', u'112', u'113', u'114', u'115', u'116', u'117', u'118', u'119', u'121', u'122', ...], dtype='object')

ipdb> affinity_matrix.values.dtype
dtype('float64')
ipdb> 'sums' in affinity_matrix.index
False

Here is the error:

ipdb> affinity_matrix.loc['sums'] = affinity_matrix.sum(axis=0)
*** ValueError: cannot reindex from a duplicate axis

I tried to reproduce this with a simple example, but I failed

In [32]: import pandas as pd

In [33]: import numpy as np

In [34]: a = np.arange(35).reshape(5,7)

In [35]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [36]: df.values.dtype
Out[36]: dtype('int64')

In [37]: df.loc['sums'] = df.sum(axis=0)

In [38]: df
Out[38]: 
      10  11  12  13  14  15   16
x      0   1   2   3   4   5    6
y      7   8   9  10  11  12   13
u     14  15  16  17  18  19   20
z     21  22  23  24  25  26   27
w     28  29  30  31  32  33   34
sums  70  75  80  85  90  95  100

For people who are still struggling with this error, it can also happen if you accidentally create a duplicate column with the same name. Remove duplicate columns like so:

df = df.loc[:,~df.columns.duplicated()]

In my case, this error popped up not because of duplicate values, but because I attempted to join a shorter Series to a Dataframe: both had the same index, but the Series had fewer rows (missing the top few). The following worked for my purposes:

df.head()
                          SensA
date                           
2018-04-03 13:54:47.274   -0.45
2018-04-03 13:55:46.484   -0.42
2018-04-03 13:56:56.235   -0.37
2018-04-03 13:57:57.207   -0.34
2018-04-03 13:59:34.636   -0.33

series.head()
date
2018-04-03 14:09:36.577    62.2
2018-04-03 14:10:28.138    63.5
2018-04-03 14:11:27.400    63.1
2018-04-03 14:12:39.623    62.6
2018-04-03 14:13:27.310    62.5
Name: SensA_rrT, dtype: float64

df = series.to_frame().combine_first(df)

df.head(10)
                          SensA  SensA_rrT
date                           
2018-04-03 13:54:47.274   -0.45        NaN
2018-04-03 13:55:46.484   -0.42        NaN
2018-04-03 13:56:56.235   -0.37        NaN
2018-04-03 13:57:57.207   -0.34        NaN
2018-04-03 13:59:34.636   -0.33        NaN
2018-04-03 14:00:34.565   -0.33        NaN
2018-04-03 14:01:19.994   -0.37        NaN
2018-04-03 14:02:29.636   -0.34        NaN
2018-04-03 14:03:31.599   -0.32        NaN
2018-04-03 14:04:30.779   -0.33        NaN
2018-04-03 14:05:31.733   -0.35        NaN
2018-04-03 14:06:33.290   -0.38        NaN
2018-04-03 14:07:37.459   -0.39        NaN
2018-04-03 14:08:36.361   -0.36        NaN
2018-04-03 14:09:36.577   -0.37       62.2

Indices with duplicate values often arise if you create a DataFrame by concatenating other DataFrames. IF you don't care about preserving the values of your index, and you want them to be unique values, when you concatenate the the data, set ignore_index=False .

Alternatively, to overwrite your current index with a new one, instead of using df.reindex() , set:

df.index = new_index

This error usually rises when you join / assign to a column when the index has duplicate values. Since you are assigning to a row, I suspect that there is a duplicate value in affinity_matrix.columns , perhaps not shown in your question.





pandas