C++ sorting and keeping track of indexes


Answers

You could sort std::pair instead of just ints - first int is original data, second int is original index. Then supply a comparator that only sorts on the first int. Example:

Your problem instance: v = [5 7 8]
New problem instance: v_prime = [<5,0>, <8,1>, <7,2>]

Sort the new problem instance using a comparator like:

typedef std::pair<int,int> mypair;
bool comparator ( const mypair& l, const mypair& r)
   { return l.first < r.first; }
// forgetting the syntax here but intent is clear enough

The result of std::sort on v_prime, using that comparator, should be:

v_prime = [<5,0>, <7,2>, <8,1>]

You can peel out the indices by walking the vector, grabbing .second from each std::pair.

Question

Using C++, and hopefully the standard library, I want to sort a sequence of samples in ascending order, but I also want to remember the original indexes of the newly samples.

For example, I have a set, or vector, or matrix of samples A : [5, 2, 1, 4, 3]. I want to sort these to be B : [1,2,3,4,5], but I also want to remember the original indexes of the values, so I can get another set which would be: C : [2, 1, 4, 3, 0 ] - which corresponds to the index of the each element in 'B', in the original 'A'.

For example, in Matlab you can do:

 [a,b]=sort([5, 8, 7])
 a = 5 7 8
 b = 1 3 2

Can anyone see a good way to do this?




Its easier than it seems to be.

Suppose Given vector is

A=[2,4,3]

Create A new vector

V=[0,1,2] // indicating positions

Sort V and while sorting instead of comparing elements of V , compare corresponding elements of A

 //Assume A is a given vector with N elements
 vector<int> V(N);
 int j=0;
 std::iota(V.begin(),V.end(),j++); //Initializing
 for(int i=0;i<N;i++)
      V[i]=i;
 sort( V.begin(),V.end(), [&](int x,int y){return A[x]<A[y];} );



Make a std::pair in function then sort pair :

generic version :

template< class RandomAccessIterator,class Compare >
auto sort2(RandomAccessIterator begin,RandomAccessIterator end,Compare cmp) ->
   std::vector<std::pair<std::uint32_t,RandomAccessIterator>>
{
    using valueType=typename std::iterator_traits<RandomAccessIterator>::value_type;
    using Pair=std::pair<std::uint32_t,RandomAccessIterator>;

    std::vector<Pair> index_pair;
    index_pair.reserve(std::distance(begin,end));

    for(uint32_t idx=0;begin!=end;++begin,++idx){
        index_pair.push_back(Pair(idx,begin));
    }

    std::sort( index_pair.begin(),index_pair.end(),[&](const Pair& lhs,const Pair& rhs){
          return cmp(*lhs.second,*rhs.second);
    });

    return index_pair;
}

ideone




Are the items in the vector unique? If so, copy the vector, sort one of the copies with STL Sort then you can find which index each item had in the original vector.

If the vector is supposed to handle duplicate items, I think youre better of implementing your own sort routine.




I came across this question, and figured out sorting the iterators directly would be a way to sort the values and keep track of indices; There is no need to define an extra container of pairs of ( value, index ) which is helpful when the values are large objects; The iterators provides the access to both the value and the index:

/*
 * a function object that allows to compare
 * the iterators by the value they point to
 */
template < class RAIter, class Compare >
class IterSortComp
{
    public:
        IterSortComp ( Compare comp ): m_comp ( comp ) { }
        inline bool operator( ) ( const RAIter & i, const RAIter & j ) const
        {
            return m_comp ( * i, * j );
        }
    private:
        const Compare m_comp;
};

template <class INIter, class RAIter, class Compare>
void itersort ( INIter first, INIter last, std::vector < RAIter > & idx, Compare comp )
{ 
    idx.resize ( std::distance ( first, last ) );
    for ( typename std::vector < RAIter >::iterator j = idx.begin( ); first != last; ++ j, ++ first )
        * j = first;

    std::sort ( idx.begin( ), idx.end( ), IterSortComp< RAIter, Compare > ( comp ) );
}

as for the usage example:

std::vector < int > A ( n );

// populate A with some random values
std::generate ( A.begin( ), A.end( ), rand );

std::vector < std::vector < int >::const_iterator > idx;
itersort ( A.begin( ), A.end( ), idx, std::less < int > ( ) );

now, for example, the 5th smallest element in the sorted vector would have value **idx[ 5 ] and its index in the original vector would be distance( A.begin( ), *idx[ 5 ] ) or simply *idx[ 5 ] - A.begin( ).




Well, my solution uses residue technique. We can place the values under sorting in the upper 2 bytes and the indices of the elements - in the lower 2 bytes:

int myints[] = {32,71,12,45,26,80,53,33};

for (int i = 0; i < 8; i++)
   myints[i] = myints[i]*(1 << 16) + i;

Then sort the array myints as usual:

std::vector<int> myvector(myints, myints+8);
sort(myvector.begin(), myvector.begin()+8, std::less<int>());

After that you can access the elements' indices via residuum. The following code prints the indices of the values sorted in the ascending order:

for (std::vector<int>::iterator it = myvector.begin(); it != myvector.end(); ++it)
   std::cout << ' ' << (*it)%(1 << 16);

Of course, this technique works only for the relatively small values in the original array myints (i.e. those which can fit into upper 2 bytes of int). But it has additional benefit of distinguishing identical values of myints: their indices will be printed in the right order.




Links