[c] Why is “while ( !feof (file) )” always wrong?


Answers

It's wrong because (in the absence of a read error) it enters the loop one more time than the author expects. If there is a read error, the loop never terminates.

Consider the following code:

/* WARNING: demonstration of bad coding technique*/

#include <stdio.h>
#include <stdlib.h>

FILE *Fopen( const char *path, const char *mode );

int main( int argc, char **argv )
{
    FILE *in;
    unsigned count;

    in = argc > 1 ? Fopen( argv[ 1 ], "r" ) : stdin;
    count = 0;

    /* WARNING: this is a bug */
    while( !feof( in )) {  /* This is WRONG! */
        (void) fgetc( in );
        count++;
    }
    printf( "Number of characters read: %u\n", count );
    return EXIT_SUCCESS;
}

FILE * Fopen( const char *path, const char *mode )
{
    FILE *f = fopen( path, mode );
    if( f == NULL ) {
        perror( path );
        exit( EXIT_FAILURE );
    }
    return f;
}

This program will consistently print one greater than the number of characters in the input stream (assuming no read errors). Consider the case where the input stream is empty:

$ ./a.out < /dev/null
Number of characters read: 1

In this case, feof() is called before any data has been read, so it returns false. The loop is entered, fgetc() is called (and returns EOF), and count is incremented. Then feof() is called and returns true, causing the loop to abort.

This happens in all such cases. feof() does not return true until after a read on the stream encounters the end of file. The purpose of feof() is NOT to check if the next read will reach the end of file. The purpose of feof() is to distinguish between a read error and having reached the end of the file. If fread() returns 0, you must use feof/ferror to decide. Similarly if fgetc returns EOF. feof() is only useful after fread has returned zero or fgetc has returned EOF. Before that happens, feof() will always return 0.

It is always necessary to check the return value of a read (either an fread(), or an fscanf(), or an fgetc()) before calling feof().

Even worse, consider the case where a read error occurs. In that case, fgetc() returns EOF, feof() returns false, and the loop never terminates. In all cases where while(!feof(p)) is used, there must be at least a check inside the loop for ferror(), or at the very least the while condition should be replaced with while(!feof(p) && !ferror(p)) or there is a very real possibility of an infinite loop, probably spewing all sorts of garbage as invalid data is being processed.

So, in summary, although I cannot state with certainty that there is never a situation in which it may be semantically correct to write "while(!feof(f))" (although there must be another check inside the loop with a break to avoid a infinite loop on a read error), it is the case that it is almost certainly always wrong. And even if a case ever arose where it would be correct, it is so idiomatically wrong that it would not be the right way to write the code. Anyone seeing that code should immediately hesitate and say, "that's a bug". And possibly slap the author (unless the author is your boss in which case discretion is advised.)

Question

I've seen people trying to read files like this in a lot of posts lately.

Code

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char * path = argc > 1 ? argv[1] : "input.txt";

    FILE * fp = fopen(path, "r");
    if( fp == NULL ) {
        perror(path);
        return EXIT_FAILURE;
    }

    while( !feof(fp) ) {  /* THIS IS WRONG */
        /* Read and process data from file… */
    }
    if( fclose(fp) == 0 ) {
        return EXIT_SUCCESS;
    } else {
        perror(path);
        return EXIT_FAILURE;
    }
}

What is wrong with this while( !feof(fp)) loop?




feof() indicates if one has tried to read past the end of file. That means it has little predictive effect: if it is true, you are sure that the next input operation will fail (you aren't sure the previous one failed BTW), but if it is false, you aren't sure the next input operation will succeed. More over, input operations may fail for other reasons than the end of file (a format error for formatted input, a pure IO failure -- disk failure, network timeout -- for all input kinds), so even if you could be predictive about the end of file (and anybody who has tried to implement Ada one, which is predictive, will tell you it can complex if you need to skip spaces, and that it has undesirable effects on interactive devices -- sometimes forcing the input of the next line before starting the handling of the previous one), you would have to be able to handle a failure.

So the correct idiom in C is to loop with the IO operation success as loop condition, and then test the cause of the failure. For instance:

while (fgets(line, sizeof(line), file)) {
    /* note that fgets don't strip the terminating \n, checking its
       presence allow to handle lines longer that sizeof(line), not showed here */
    ...
}
if (ferror(file)) {
   /* IO failure */
} else if (feof(file)) {
   /* format error (not possible with fgets, but would be with fscanf) or end of file */
} else {
   /* format error (not possible with fgets, but would be with fscanf) */
}



Related