c++ - The most elegant way to iterate the words of a string


What is the most elegant way to iterate the words of a string? The string can be assumed to be composed of words separated by whitespace.

Note that I'm not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answer.

The best solution I have right now is:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    string s = "Somewhere down the road";
    istringstream iss(s);

    do
    {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}


Answers


For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};



I use this to split string by a delimiter. The first puts the results in a pre-constructed vector, the second returns a new vector.

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template<typename Out>
void split(const std::string &s, char delim, Out result) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        *(result++) = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Edit :

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');



A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

See the documentation for details.




Splitting a C++ std::string using tokens, e.g. “;”

I find std::getline() is often the simplest. The optional delimiter parameter means it's not just for reading "lines":

#include <sstream>
#include <iostream>
#include <vector>

using namespace std;

int main() {
    vector<string> strings;
    istringstream f("denmark;sweden;india;us");
    string s;    
    while (getline(f, s, ';')) {
        cout << s << endl;
        strings.push_back(s);
    }
}



You could use a string stream and read the elements into the vector.

Here are many different examples...

A copy of one of the examples:

std::vector<std::string> split(const std::string& s, char seperator)
{
   std::vector<std::string> output;

    std::string::size_type prev_pos = 0, pos = 0;

    while((pos = s.find(seperator, pos)) != std::string::npos)
    {
        std::string substring( s.substr(prev_pos, pos-prev_pos) );

        output.push_back(substring);

        prev_pos = ++pos;
    }

    output.push_back(s.substr(prev_pos, pos-prev_pos)); // Last word

    return output;
}



There are several libraries available solving this problem, but the simplest is probably to use Boost Tokenizer:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>

typedef boost::tokenizer<boost::char_separator<char> > tokenizer;

std::string str("denmark;sweden;india;us");
boost::char_separator<char> sep(";");
tokenizer tokens(str, sep);

BOOST_FOREACH(std::string const& token, tokens)
{
    std::cout << "<" << *tok_iter << "> " << "\n";
}



Split string by single spaces

You can even develop your own split function (I know, little old-fashioned):

unsigned int split(const std::string &txt, std::vector<std::string> &strs, char ch)
{
    unsigned int pos = txt.find( ch );
    unsigned int initialPos = 0;
    strs.clear();

    // Decompose statement
    while( pos != std::string::npos ) {
        strs.push_back( txt.substr( initialPos, pos - initialPos + 1 ) );
        initialPos = pos + 1;

        pos = txt.find( ch, initialPos );
    }

    // Add the last one
    strs.push_back( txt.substr( initialPos, std::min( pos, txt.size() ) - initialPos + 1 ) );

    return strs.size();
}

Then you just need to invoke it with a vector<string> as argument:

int main()
{
    std::vector<std::string> v;

    split( "This  is a  test", v, ' ' );
    show( v );

    return 0;
}



If strictly one space character is the delimiter, probably std::getline will be valid.
For example:

int main() {
  using namespace std;
  istringstream iss("This  is a string");
  string s;
  while ( getline( iss, s, ' ' ) ) {
    printf( "`%s'\n", s.c_str() );
  }
}



Can you use boost?

samm$ cat split.cc
#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>

#include <boost/foreach.hpp>

#include <iostream>
#include <string>
#include <vector>

int
main()
{
    std::string split_me( "hello world  how are   you" );

    typedef std::vector<std::string> Tokens;
    Tokens tokens;
    boost::split( tokens, split_me, boost::is_any_of(" ") );

    std::cout << tokens.size() << " tokens" << std::endl;
    BOOST_FOREACH( const std::string& i, tokens ) {
        std::cout << "'" << i << "'" << std::endl;
    }
}

sample execution:

samm$ ./a.out
8 tokens
'hello'
'world'
''
'how'
'are'
''
''
'you'
samm$ 



Split a string into an array in C++

#include <sstream>  //for std::istringstream
#include <iterator> //for std::istream_iterator
#include <vector>   //for std::vector

while(std::getline(in, line))
{
    std::istringstream ss(line);
    std::istream_iterator<std::string> begin(ss), end;

    //putting all the tokens in the vector
    std::vector<std::string> arrayTokens(begin, end); 

    //arrayTokens is containing all the tokens - use it!
}

By the way, use qualified-names such as std::getline, std::ifstream like I did. It seems you've written using namespace std somewhere in your code which is considered a bad practice. So don't do that:







I have written a function for a simlar requirement of mine. may be u can use it!

std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) 
{
    std::stringstream ss(s+' ');
    std::string item;
    while(std::getline(ss, item, delim)) 
    {
        elems.push_back(item);
    }
    return elems;
}