pandas select columns - Python DataFrame get the first column




Python interaction between rows and columns in a DataFrame

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({
    'exam': [
        'French', 'English', 'German', 'Russian', 'Russian',
        'German', 'German', 'French', 'English', 'French'
    ],
    'student': [
        'john', 'ted', 'jason', 'marc', 'peter',
        'bob', 'robert', 'david', 'nik', 'kevin'
    ]
})

print(df)

              exam   student   
    0       French    john     
    1       English   ted        
    2       German    jason         
    3       Russian   marc         
    4       Russian   peter         
    5       German    bob         
    6       German    robert         
    7       French    david         
    8       English   nik          
    9       French    kevin         

Does anyone know how to create a new DataFrame with two columns, "student" and "shared_exam_with"?

I should get something like this:

                student   shared_exam_with      
        0       john       david                   
        1       john       kevin            
        2       ted        nik                    
        3       jason      bob                 
        4       jason      robert                   
        5       marc       peter              
        6       peter      marc             
        7       bob        jason                    
        8       bob        robert                    
        9       robert     jason                      
       10       robert     bob                   
       11       david      john             
       12       david      kevin                      
       13       nik        ted                     
       14       kevin      john                     
       15       kevin      david                   

For example: john takes French, and so do david and kevin!

Any ideas? Thanks in advance!


One way is:

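import numpy as np

# Student-by-exam indicator table: 1 where the student takes the exam
cross = pd.crosstab(df['student'], df['exam'])
# Student-by-student matrix counting the exams each pair has in common
res = cross.dot(cross.T)
# Keep only the strict upper triangle so each pair is listed once
res.where(np.triu(res, k=1).astype('bool')).stack()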
Out: 
student  student
bob      jason      1.0
         robert     1.0
david    john       1.0
         kevin      1.0
jason    robert     1.0
john     kevin      1.0
marc     peter      1.0
nik      ted        1.0
dtype: float64

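The dot product produces a co-occurrence matrix, which is binary here because each student takes exactly one exam. To avoid repeating the same pair, I filter with where (keeping only the upper triangle) and then stack. The index of the resulting Series holds the pairs of students who share an exam.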
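The co-occurrence idea above can also be turned directly into the two-column frame the question asks for. The following is only a sketch under the assumption that each unordered pair should appear once; the names `co`, `mask`, and `pairs` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'exam': ['French', 'English', 'German', 'Russian', 'Russian',
             'German', 'German', 'French', 'English', 'French'],
    'student': ['john', 'ted', 'jason', 'marc', 'peter',
                'bob', 'robert', 'david', 'nik', 'kevin'],
})

# Rows are students, columns are exams; 1 marks "student takes exam".
cross = pd.crosstab(df['student'], df['exam'])

# Entry (a, b) counts the exams that students a and b have in common.
co = cross.dot(cross.T)

# Positions strictly above the diagonal with a nonzero count:
# each unordered pair shows up exactly once there.
mask = np.triu(co.to_numpy(), k=1) > 0
rows, cols = np.nonzero(mask)

pairs = pd.DataFrame({
    'student': co.index.to_numpy()[rows],
    'shared_exam_with': co.columns.to_numpy()[cols],
})
print(pairs)
```

Because only the upper triangle is kept, this yields 8 rows rather than the 16 in the question; swapping `np.triu(..., k=1)` for a mask that excludes only the diagonal would give both directions of every pair.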


Self merge

df.merge(
    df, on='exam',
    suffixes=['', '_shared_with']
).query('student != student_shared_with')

       exam student student_shared_with
1    French    john               david
2    French    john               kevin
3    French   david                john
5    French   david               kevin
6    French   kevin                john
7    French   kevin               david
10  English     ted                 nik
11  English     nik                 ted
14   German   jason                 bob
15   German   jason              robert
16   German     bob               jason
18   German     bob              robert
19   German  robert               jason
20   German  robert                 bob
23  Russian    marc               peter
24  Russian   peter                marc
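To get from the merged result to exactly the two columns requested in the question, the exam column can be dropped and the suffixed column renamed. This is a minimal sketch, not part of the original answer; the `_other` suffix and the name `shared` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'exam': ['French', 'English', 'German', 'Russian', 'Russian',
             'German', 'German', 'French', 'English', 'French'],
    'student': ['john', 'ted', 'jason', 'marc', 'peter',
                'bob', 'robert', 'david', 'nik', 'kevin'],
})

shared = (
    # Pair every row with every row that has the same exam.
    df.merge(df, on='exam', suffixes=('', '_other'))
      # Drop the rows where a student is paired with themselves.
      .query('student != student_other')
      .rename(columns={'student_other': 'shared_exam_with'})
      # Keep only the two requested columns.
      .loc[:, ['student', 'shared_exam_with']]
      .reset_index(drop=True)
)
print(shared)
```

The row order of a merge can differ between pandas versions, so if the order in the question matters, an explicit sort is the safer option.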

Self join

d1 = df.set_index('exam')
d1.join(
    d1, rsuffix='_shared_with'
).query('student != student_shared_with')

        student student_shared_with
exam                               
English     ted                 nik
English     nik                 ted
French     john               david
French     john               kevin
French    david                john
French    david               kevin
French    kevin                john
French    kevin               david
German    jason                 bob
German    jason              robert
German      bob               jason
German      bob              robert
German   robert               jason
German   robert                 bob
Russian    marc               peter
Russian   peter                marc

itertools.permutations + groupby

from itertools import permutations as perm

cols = ['student', 'student_shared_with']
df.groupby('exam').student.apply(
    lambda x: pd.DataFrame(list(perm(x, 2)), columns=cols)
).reset_index(drop=True)

   student student_shared_with
0      ted                 nik
1      nik                 ted
2     john               david
3     john               kevin
4    david                john
5    david               kevin
6    kevin                john
7    kevin               david
8    jason                 bob
9    jason              robert
10     bob               jason
11     bob              robert
12  robert               jason
13  robert                 bob
14    marc               peter
15   peter                marc
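The permutations approach lists each pair in both directions, which matches the output the question asks for. If each pair is wanted only once, as in the crosstab answer, `itertools.combinations` can be used instead. The sketch below simply compares the two on the same data; the names `ordered` and `unordered` are illustrative.

```python
from itertools import combinations, permutations

import pandas as pd

df = pd.DataFrame({
    'exam': ['French', 'English', 'German', 'Russian', 'Russian',
             'German', 'German', 'French', 'English', 'French'],
    'student': ['john', 'ted', 'jason', 'marc', 'peter',
                'bob', 'robert', 'david', 'nik', 'kevin'],
})

groups = df.groupby('exam')['student']

# Ordered pairs: (a, b) and (b, a) are both listed.
ordered = [p for _, s in groups for p in permutations(s, 2)]

# Unordered pairs: each pair of students is listed once.
unordered = [p for _, s in groups for p in combinations(s, 2)]

print(len(ordered))    # 16
print(len(unordered))  # 8
```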



