What can I use for input conversion instead of scanf?






Instead of scanf(some_format, ...) , consider fgets() with sscanf(buffer, some_format_and %n, ...)

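By appending " %n" to the format, code can simply detect whether the whole format was successfully scanned and that no extra non-white-space junk was at the end. %n stores the number of characters consumed so far, and it is only reached if everything before it matched, so n stays 0 on a partial match.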

// scanf("%d %f fred", &some_int, &some_float);
#define EXPECTED_LINE_MAX 100
char buffer[EXPECTED_LINE_MAX * 2];  // Suggest 2x, no real need to be stingy.
int some_int;
float some_float;

if (fgets(buffer, sizeof buffer, stdin)) {
  int n = 0;
  // add ------------->    " %n"
  sscanf(buffer, "%d %f fred %n", &some_int, &some_float, &n);
  // Did scan complete, and to the end?
  if (n > 0 && buffer[n] == '\0') {
    // success, use `some_int, some_float`
  } else {
    ; // Report bad input and handle as desired.
  }
}

I have very frequently seen people discouraging others from using scanf and saying that there are better alternatives. However, all I end up seeing is either "don't use scanf " or "here's a correct format string" , and never any examples of the "better alternatives" mentioned.

For example, let's take this snippet of code:

scanf("%c", &c);

This reads the whitespace that was left in the input stream after the last conversion. The usual suggested solution to this is to use:

scanf(" %c", &c);

or to not use scanf .

Since scanf is bad, what are some ANSI C options for converting input formats that scanf can usually handle (such as integers, floating-point numbers, and strings) without using scanf ?


Why is scanf bad?

The main problem is that scanf was never intended to deal with user input. It's intended to be used with "perfectly" formatted data. I put "perfectly" in quotes because that's not completely true, but scanf is not designed to parse data as unreliable as user input. By nature, user input is not predictable. Users misunderstand instructions, make typos, accidentally press enter before they are done, etc.

One might reasonably ask why a function that should not be used for user input reads from stdin. If you are an experienced *nix user, the explanation will not come as a surprise, but it might confuse Windows users. On *nix systems, it is very common to build programs that work via piping, which means that you send the output of one program to another by connecting the stdout of the first program to the stdin of the second. This way, you can make sure that the output and input are predictable. Under these circumstances, scanf actually works well. But when working with unpredictable input, you risk all sorts of trouble.

So why aren't there any easy-to-use standard functions for user input? One can only guess here, but I assume that old hardcore C hackers simply thought that the existing functions were good enough, even though they are very clunky. Also, when you look at typical terminal applications they very rarely read user input from stdin . Most often you pass all the user input as command line arguments. Sure, there are exceptions, but for most applications, user input is a very minor thing.

So what can you do?

My favorite is fgets in combination with sscanf . I once wrote an answer about that, but I will re-post the complete code. Here is an example with decent (but not perfect) error checking and parsing. It's good enough for debugging purposes.

Note

I don't particularly like asking the user to input two different things on one single line. I only do that when they belong to each other in a natural way. For instance, printf("Enter the price in the format <dollars>.<cent>: ") followed by sscanf(buffer, "%d.%d", &dollar, &cent). I would never do something like printf("Enter height and base of the triangle: "). The main point of using fgets below is to encapsulate the inputs to ensure that one input does not affect the next.

#include <stdio.h>
#include <stdlib.h>

#define bsize 100

void error_function(const char *buffer, int no_conversions) {
        fprintf(stderr, "An error occurred. You entered:\n%s\n", buffer);
        fprintf(stderr, "%d successful conversions\n", no_conversions);
        exit(EXIT_FAILURE);
}

int main(void) {
        char c, buffer[bsize] = ""; // Start empty so it is safe to print if fgets fails
        int x, y;
        float f, g;
        int r;

        printf("Enter two integers: ");
        fflush(stdout); // Make sure that the prompt is shown before reading
        if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
        if((r = sscanf(buffer, "%d%d", &x, &y)) != 2) error_function(buffer, r);

        // Unless the input buffer was too small we can be sure that stdin is empty
        // when we come here.
        printf("Enter two floats: ");
        fflush(stdout);
        if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
        if((r = sscanf(buffer, "%f%f", &f, &g)) != 2) error_function(buffer, r);

        // Reading single characters can be especially tricky if the input buffer
        // is not emptied before. But since we're using fgets, we're safe.
        printf("Enter a char: ");
        fflush(stdout);
        if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
        if((r = sscanf(buffer, "%c", &c)) != 1) error_function(buffer, r);

        printf("You entered %d %d %f %f %c\n", x, y, f, g, c);
        return 0;
}

Doing it like this eliminates a common problem, which is the trailing newline that can mess with the next input. But it has another issue, which is what happens if the line is longer than bsize. You can check for that with if(buffer[strlen(buffer)-1] != '\n') (this needs <string.h>). If you want to remove the newline, you can do that with buffer[strcspn(buffer, "\n")] = 0.
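As a rough sketch, the newline check and the newline removal can be combined in one helper that also throws away the rest of an overlong line, so it cannot leak into the next read. The name read_line and its return codes are made up for this illustration; they are not a standard API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a standard function).
 * Returns 1 if a complete line was read (newline stripped),
 *         0 on end-of-file or read error with nothing read,
 *        -1 if the line did not fit; the rest of it is discarded. */
int read_line(FILE *fp, char *buf, size_t size)
{
    if (size < 2 || fgets(buf, (int)size, fp) == NULL)
        return 0;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {          /* the whole line fit in the buffer */
        buf[len] = '\0';
        return 1;
    }

    /* No newline was stored. Either this is the last line of the input
     * and it has no trailing newline, or the buffer filled up first. */
    int ch = fgetc(fp);
    if (ch == EOF || ch == '\n')
        return 1;                    /* nothing but the line end was left */

    while ((ch = fgetc(fp)) != EOF && ch != '\n')
        ;                            /* drain the overlong line */
    return -1;
}
```

A caller could then write something like if (read_line(stdin, buffer, sizeof buffer) != 1) and report the problem, knowing that the next read starts on a fresh line either way.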

In general, I would advise to not expect the user to enter input in some weird format that you should parse to different variables. If you want to assign the variables height and width , don't ask for both at the same time. Allow the user to press enter between them. Also, this approach is very natural in one sense. You will never get the input from stdin until you hit enter, so why not always read the whole line? Of course this can still lead to issues if the line is longer than the buffer. Did I remember to mention that user input is clunky in C? :)

To avoid problems with lines longer than the buffer, you can use a function that automatically allocates a buffer of appropriate size, such as getline(). Note that getline() is a POSIX function, not part of standard C. The drawback is that you will need to free the result afterwards.
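Where getline() is not available, the same idea can be approximated in standard C by calling fgets repeatedly and growing the buffer with realloc. The following is only a sketch of that substitute technique, not getline() itself; read_long_line is a made-up name, and the code ignores embedded NUL bytes and lines too long for an int-sized chunk.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from fp into a
 * malloc'ed buffer, without the trailing newline. Returns NULL at
 * end-of-file (nothing read) or on allocation failure.
 * The caller must free the result. */
char *read_long_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';     /* complete line: strip the newline */
            return buf;
        }
        if (cap - len < 2) {         /* buffer is full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (len > 0)                     /* last line had no trailing newline */
        return buf;
    free(buf);
    return NULL;
}
```

Typical use would be char *line = read_long_line(stdin); followed by parsing and then free(line);.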

Stepping up the game

If you're serious about creating programs in C with user input, I would recommend having a look at a library like ncurses . Because then you likely also want to create applications with some terminal graphics. Unfortunately, you will lose some portability if you do that, but it gives you far better control of user input. For instance, it gives you the ability to read a key press instantly instead of waiting for the user to press enter.


In this answer I'm going to assume that you are reading and interpreting lines of text . Perhaps you're prompting the user, who is typing something and hitting RETURN. Or perhaps you're reading lines of structured text from a data file of some kind.

Since you're reading lines of text, it makes sense to organize your code around a library function that reads, well, a line of text. The Standard function is fgets() , although there are others (including getline ). And then the next step is to interpret that line of text somehow.

Here's the basic recipe for calling fgets to read a line of text:

char line[512];
printf("type something:\n");
fgets(line, 512, stdin);
printf("you typed: %s", line);

This simply reads in one line of text and prints it back out. As written it has a couple of limitations, which we'll get to in a minute. It also has a very great feature: that number 512 we passed as the second argument to fgets is the size of the array line we're asking fgets to read into. This fact -- that we can tell fgets how much it's allowed to read -- means that we can be sure that fgets won't overflow the array by reading too much into it.

So now we know how to read a line of text, but what if we really wanted to read an integer, or a floating-point number, or a single character, or a single word? (That is, what if the scanf call we're trying to improve on had been using a format specifier like %d , %f , %c , or %s ?)

It's easy to reinterpret a line of text -- a string -- as any of these things. To convert a string to an integer, the simplest (though imperfect) way to do it is to call atoi(). To convert to a floating-point number, there's atof(). (And there are also better ways, as we'll see in a minute.) Here's a very simple example:

printf("type an integer:\n");
fgets(line, 512, stdin);
int i = atoi(line);
printf("type a floating-point number:\n");
fgets(line, 512, stdin);
float f = atof(line);
printf("you typed %d and %f\n", i, f);

If you wanted the user to type a single character (perhaps y or n as a yes/no response), you can literally just grab the first character of the line, like this:

printf("type a character:\n");
fgets(line, 512, stdin);
char c = line[0];
printf("you typed %c\n", c);

(This ignores, of course, the possibility that the user typed a multi-character response; it quietly ignores any extra characters that were typed.)

Finally, if you wanted the user to type a string definitely not containing whitespace, if you wanted to treat the input line

hello world!

as the string "hello" followed by something else (which is what the scanf format %s would have done), well, in that case, I fibbed a little, it's not quite so easy to reinterpret the line in that way, after all, so the answer to that part of the question will have to wait for a bit.

But first I want to go back to three things I skipped over.

(1) We've been calling

fgets(line, 512, stdin);

to read into the array line , and where 512 is the size of the array line so fgets knows not to overflow it. But to make sure that 512 is the right number (especially, to check if maybe someone tweaked the program to change the size), you have to read back to wherever line was declared. That's a nuisance, so there are two much better ways to keep the sizes in sync. (a) use the preprocessor to make a name for the size:

#define MAXLINE 512
char line[MAXLINE];
fgets(line, MAXLINE, stdin);

Or, (b) use C's sizeof operator:

fgets(line, sizeof(line), stdin);

(2) The second problem is that we haven't been checking for error. When you're reading input, you should always check for the possibility of error. If for whatever reason fgets can't read the line of text you asked it to, it indicates this by returning a null pointer. So we should have been doing things like

printf("type something:\n");
if(fgets(line, 512, stdin) == NULL) {
    printf("Well, never mind, then.\n");
    exit(1);
}

(3) Finally, there's the issue that in order to read a line of text, fgets reads characters and fills them into your array until it finds the \n character that terminates the line, and it fills the \n character into your array, too. You can see this if you modify our earlier example slightly:

printf("you typed: \"%s\"\n", line);

If I run this and type "Steve" when it prompts me, it prints out

you typed: "Steve
"

That " on the second line is because the string it read and printed back out was actually "Steve\n" .

Sometimes that extra newline doesn't matter (like when we called atoi or atof , since they both ignore any extra non-numeric input after the number), but sometimes it matters a lot. So often we'll want to strip that newline off. There are several ways to do that, which I'll get to in a minute. (I know I've been saying that a lot. But I will get back to all those things, I promise.)

At this point, you may be thinking: "I thought you said scanf was no good, and this other way would be so much better. But fgets is starting to look like a nuisance. Calling scanf was so easy ! Can't I keep using it?"

Sure, you can keep using scanf, if you want. (And for really simple things, in some ways it is simpler.) But, please, don't come crying to me when it fails you due to one of its 17 quirks and foibles, or goes into an infinite loop because of input you didn't expect, or when you can't figure out how to use it to do something more complicated. And let's take a look at fgets's actual nuisances:

  1. You always have to specify the array size. Well, of course, that's not a nuisance at all -- that's a feature, because buffer overflow is a Really Bad Thing.

  2. You have to check the return value. Actually, that's a wash, because to use scanf correctly, you have to check its return value, too.

  3. You have to strip the \n back off. This is, I admit, a true nuisance. I wish there were a Standard function I could point you to that didn't have this little problem. (Please nobody bring up gets .) But compared to scanf's 17 different nuisances, I'll take this one nuisance of fgets any day.

So how do you strip that newline? Three ways:

(a) Obvious way:

char *p = strchr(line, '\n');
if(p != NULL) *p = '\0';

(b) Tricky & compact way:

strtok(line, "\n");

Unfortunately this one doesn't always work: if the line is empty (just "\n"), strtok finds no token and leaves the newline in place.

(c) Another compact and mildly obscure way:

line[strcspn(line, "\n")] = '\0';

And now that that's out of the way, we can get back to another thing I skipped over: the imperfections of atoi() and atof(). The problem with those is they don't give you any useful indication of success or failure: they quietly ignore trailing nonnumeric input, and they quietly return 0 if there's no numeric input at all. The preferred alternatives -- which also have certain other advantages -- are strtol and strtod. strtol also lets you use a base other than 10, meaning you can get the effect of (among other things) %o or %x with scanf. But showing how to use these functions correctly is a story in itself, and would be too much of a distraction from what is already turning into a pretty fragmented narrative, so I'm not going to say anything more about them now.
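For readers who want at least a starting point, here is a minimal sketch of one common way to call strtol so that the failures atoi hides become visible. The wrapper name parse_long and its 1/0 return convention are inventions for this example; the choice to tolerate trailing whitespace (such as the newline left by fgets) is also an assumption.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts all of s to a long in base 10.
 * Returns 1 on success and stores the value in *out.
 * Returns 0 if s contains no number, has trailing junk,
 * or the value does not fit in a long. */
int parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;                        /* strtol only sets errno on failure */
    long val = strtol(s, &end, 10);

    if (end == s)                     /* no digits were converted */
        return 0;
    if (errno == ERANGE)              /* value out of range for long */
        return 0;

    while (isspace((unsigned char)*end))
        end++;                        /* allow trailing whitespace */
    if (*end != '\0')                 /* anything else is junk */
        return 0;

    *out = val;
    return 1;
}
```

With this, input such as "12w4" or an empty line is rejected instead of silently becoming 12 or 0. A strtod-based version for floating-point input would follow the same pattern.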

The rest of the main narrative concerns input you might be trying to parse that's more complicated than just a single number or character. What if you want to read a line containing two numbers, or multiple whitespace-separated words, or specific framing punctuation? That's where things get interesting, and where things were probably getting complicated if you were trying to do things using scanf , and where there are vastly more options now that you've cleanly read one line of text using fgets , although the full story on all those options could probably fill a book, so we're only going to be able to scratch the surface here.

  1. My favorite technique is to break the line up into whitespace-separated "words", then do something further with each "word". One principal Standard function for doing this is strtok (which also has its issues, and which also rates a whole separate discussion). My own preference is a dedicated function for constructing an array of pointers to each broken-apart "word", a function I describe in these course notes . At any rate, once you've got "words", you can further process each one, perhaps with the same atoi / atof / strtol / strtod functions we've already looked at.

  2. Paradoxically, even though we've been spending a fair amount of time and effort here figuring out how to move away from scanf , another fine way to deal with the line of text we just read with fgets is to pass it to sscanf . In this way, you end up with most of the advantages of scanf , but without most of the disadvantages.

  3. If your input syntax is particularly complicated, it might be appropriate to use a "regexp" library to parse it.

  4. Finally, you can use whatever ad hoc parsing solutions suit you. You can move through the line a character at a time with a char * pointer checking for characters you expect. Or you can search for specific characters using functions like strchr or strrchr , or strspn or strcspn , or strpbrk . Or you can parse/convert and skip over groups of digit characters using the strtol or strtod functions that we skipped over earlier.
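The first technique in the list, breaking a line into whitespace-separated "words", might be sketched with strtok roughly as follows. This is not the function from the course notes mentioned there; split_words is a made-up name, and the set of delimiter characters is an assumption.

```c
#include <string.h>

/* Hypothetical helper: splits line in place into whitespace-separated
 * words, stores up to max pointers in words[], and returns the count.
 * strtok overwrites the delimiters in line with '\0', so line must be
 * a modifiable array, not a string literal. */
int split_words(char *line, char *words[], int max)
{
    int n = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && n < max) {
        words[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

Given the line "hello world!" this yields the two words "hello" and "world!", each of which can then be handed to strtol, strtod, or whatever suits the field.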

There's obviously much more that could be said, but hopefully this introduction will get you started.


Let's state the requirements of parsing as:

  • valid input must be accepted (and converted into some other form)

  • invalid input must be rejected

  • when any input is rejected, it is necessary to provide the user with a descriptive message that explains (in clear "easily understood by normal people who are not programmers" language) why it was rejected (so that people can figure out how to fix the problem)

To keep things very simple, let's consider parsing a single simple decimal integer (that was typed in by the user) and nothing else. Possible reasons for the user's input to be rejected are:

  • the input contained unacceptable characters
  • the input represents a number that is lower than the accepted minimum
  • the input represents a number that is higher than the accepted maximum
  • the input represents a number that has a non-zero fractional part

Let's also define "input contained unacceptable characters" properly; and say that:

  • leading whitespace and trailing whitespace will be ignored (e.g. "  5  " will be treated as "5")
  • zero or one decimal point is allowed (e.g. "1234." and "1234.000" are both treated the same as "1234")
  • there must be at least one digit (e.g. "." is rejected)
  • no more than one decimal point is allowed (e.g. "1.2.3" is rejected)
  • commas that are not between digits will be rejected (e.g. ",1234" is rejected)
  • commas that are after a decimal point will be rejected (e.g. "1234.000,000" is rejected)
  • commas that are after another comma are rejected (e.g. "1,,234" is rejected)
  • all other commas will be ignored (e.g. "1,234" will be treated as "1234")
  • a minus sign that is not the first non-whitespace character is rejected
  • a positive sign that is not the first non-whitespace character is rejected

From this we can determine that the following error messages are needed:

  • "Unknown character at start of input"
  • "Unknown character at end of input"
  • "Unknown character in middle of input"
  • "Number is too low (minimum is ....)"
  • "Number is too high (maximum is ....)"
  • "Number is not an integer"
  • "Too many decimal points"
  • "No decimal digits"
  • "Bad comma at start of number"
  • "Bad comma at end of number"
  • "Bad comma in middle of number"
  • "Bad comma after decimal point"

From this point we can see that a suitable function to convert a string into an integer would need to distinguish between very different types of errors; and that something like " scanf() " or " atoi() " or " strtoll() " is completely and utterly worthless because they fail to give you any indication of what was wrong with the input (and use a completely irrelevant and inappropriate definition of what is/isn't "valid input").

Instead, let's start writing something that isn't useless:

#include <stdio.h>
#include <stdlib.h>

// Returns NULL on success, or a message describing why the input was rejected.
const char *convertStringToInteger(int *outValue, const char *string, int minValue, int maxValue) {
    return "Code not implemented yet!";
}

int main(int argc, char *argv[]) {
    const char *errorString;
    int value;

    if(argc < 2) {
        printf("ERROR: No command line argument.\n");
        return EXIT_FAILURE;
    }
    errorString = convertStringToInteger(&value, argv[1], -10, 2000);
    if(errorString != NULL) {
        printf("ERROR: %s\n", errorString);
        return EXIT_FAILURE;
    }
    printf("SUCCESS: Your number is %d\n", value);
    return EXIT_SUCCESS;
}

To meet the stated requirements; this convertStringToInteger() function is likely to end up being several hundred lines of code all by itself.

Now, this was just "parsing a single simple decimal integer". Imagine if you wanted to parse something complex; like a list of "name, street address, phone number, email address" structures; or maybe like a programming language. For these cases you might need to write thousands of lines of code to create a parser that isn't a crippled joke.

In other words...

What can I use to parse input instead of scanf?

Write (potentially thousands of lines of) code yourself, to suit your requirements.


The most common ways of reading input are:

  • using fgets with a fixed size, which is what is usually suggested, and

  • using fgetc , which may be useful if you're only reading a single char .

To convert the input, there are a variety of functions that you can use:

  • strtoll , to convert a string into an integer

  • strtof / strtod / strtold , to convert a string into a floating-point number

  • sscanf , which is not as bad as simply using scanf , although it does have most of the downfalls mentioned below

  • There are no good ways to parse a delimiter-separated input in plain ANSI C. Either use strtok_r from POSIX or strtok_s from the not-widely-implemented Annex K. You could also roll your own using strcspn and strspn , as it doesn't involve any special OS support.

  • It may be overkill, but you can use lexers and parsers ( flex and bison being the most common examples).

  • No conversion, simply just use the string
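As an illustration of the "roll your own using strcspn and strspn" option in the list above, a small tokenizer that keeps its state in a caller-supplied pointer could look like the sketch below. The name next_token is hypothetical; the behaviour is only loosely modelled on strtok_r and, like it, skips empty fields.

```c
#include <string.h>

/* Hypothetical helper, similar in spirit to POSIX strtok_r: returns the
 * next token found at *cursor, or NULL when none remain. All state lives
 * in *cursor, so several strings can be tokenized at the same time.
 * The string is modified: the delimiter after each token becomes '\0'. */
char *next_token(char **cursor, const char *delims)
{
    /* Skip any leading delimiters. */
    char *start = *cursor + strspn(*cursor, delims);
    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }

    /* Find the end of the token and terminate it. */
    char *end = start + strcspn(start, delims);
    if (*end != '\0')
        *end++ = '\0';

    *cursor = end;
    return start;
}
```

Usage is along the lines of char *p = line; followed by while ((tok = next_token(&p, ",")) != NULL) { ... }. Because there is no hidden static state, interleaving two tokenizations is safe, which plain strtok cannot do.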

Since you didn't go into exactly why scanf is bad in your question, I'll elaborate:

  • With the conversion specifiers %[...] and %c , scanf does not eat up whitespace. This is apparently not widely known, as evidenced by the many duplicates of this question .

  • There is some confusion about when to use the unary & operator when referring to scanf 's arguments (specifically with strings).

  • It's very easy to ignore the return value from scanf . This could easily cause undefined behavior from reading an uninitialized variable.

  • It's very easy to forget to prevent buffer overflow in scanf . scanf("%s", str) is just as bad as, if not worse than, gets .

  • You cannot detect overflow when converting integers with scanf . In fact, overflow causes undefined behavior in these functions.


scanf is awesome when you know your input is always well-structured and well-behaved. Otherwise...

IMO, here are the biggest problems with scanf :

  • Risk of buffer overflow - if you do not specify a field width for the %s and %[ conversion specifiers, you risk a buffer overflow (trying to read more input than a buffer is sized to hold). Unfortunately, there's no good way to specify that as an argument (as with printf ) - you have to either hardcode it as part of the conversion specifier or do some macro shenanigans.

  • Accepts inputs that should be rejected - If you're reading an input with the %d conversion specifier and you type something like 12w4 , you would expect scanf to reject that input, but it doesn't - it successfully converts and assigns the 12 , leaving w4 in the input stream to foul up the next read.
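The "macro shenanigans" mentioned in the first bullet usually amount to turning a size constant into text and pasting it into the format string. A sketch of that trick, shown here with sscanf, follows; WORD_MAX, STR, STR_ and first_word are all names chosen for this example.

```c
#include <stdio.h>

#define WORD_MAX 15                 /* illustrative limit, chosen here */
#define STR_(x) #x
#define STR(x)  STR_(x)             /* expands WORD_MAX before stringizing */

/* Hypothetical helper: copies the first whitespace-delimited word of
 * line into word, which must have room for WORD_MAX + 1 chars.
 * Returns 1 if a word was found, 0 otherwise.
 * Words longer than WORD_MAX are cut off at WORD_MAX characters. */
int first_word(const char *line, char word[WORD_MAX + 1])
{
    /* Adjacent string literals are concatenated, so the format
     * becomes "%15s" at compile time. */
    return sscanf(line, "%" STR(WORD_MAX) "s", word) == 1;
}
```

The two-level macro is needed because a single #x would produce the text "WORD_MAX" rather than "15". Note that the field width counts characters only, so the array needs one extra element for the terminator.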

So, what should you use instead?

I usually recommend reading all interactive input as text using fgets - it allows you to specify a maximum number of characters to read at a time, so you can easily prevent buffer overflow:

char input[100];
if ( !fgets( input, sizeof input, stdin ) )
{
  // error reading from input stream, handle as appropriate
}
else
{
  // process input buffer
}

One quirk of fgets is that it will store the trailing newline in the buffer if there's room, so you can do an easy check to see if someone typed in more input than you were expecting:

char *newline = strchr( input, '\n' );
if ( !newline )
{
  // input longer than we expected
}

How you deal with that is up to you - you can either reject the whole input out of hand, and slurp up any remaining input with getchar :

int ch;
while ( ( ch = getchar() ) != '\n' && ch != EOF )
  ; // empty loop; the EOF check avoids spinning forever at end of input

Or you can process the input you got so far and read again. It depends on the problem you're trying to solve.

To tokenize the input (split it up based on one or more delimiters), you can use strtok , but beware - strtok modifies its input (it overwrites delimiters with the string terminator), and you can't preserve its state (i.e., you can't partially tokenize one string, then start to tokenize another, then pick up where you left off in the original string). There's a variant, strtok_s , that preserves the state of the tokenizer, but AFAIK its implementation is optional (you'll need to check that __STDC_LIB_EXT1__ is defined to see if it's available).

Once you've tokenized your input, if you need to convert strings to numbers (i.e., "1234" => 1234 ), you have options. strtol and strtod will convert string representations of integers and real numbers to their respective types. They also allow you to catch the 12w4 issue I mentioned above - one of their arguments is a pointer to the first character not converted in the string:

// requires <ctype.h> and <stdlib.h>
char *text = "12w4";
char *chk;
long val;
long tmp = strtol( text, &chk, 10 );
if ( chk == text || ( !isspace( (unsigned char) *chk ) && *chk != 0 ) )
{
  // no digits at all, or trailing junk: reject the entire input
}
else
{
  val = tmp;
}



