Fastest way to check if a file exist using standard C++/C++11/C?



Answers

I use this piece of code, it works OK with me so far. This does not use many fancy features of C++:

bool is_file_exist(const char *fileName)
{
    std::ifstream infile(fileName);
    return infile.good();
}
Question

I would like to find the fastest way to check if a file exist in standard C++11, C++, or C. I have thousands of files and before doing something on them I need to check if all of them exist. What can I write instead of /* SOMETHING */ in the following function?

inline bool exist(const std::string& name)
{
    /* SOMETHING */
}



You may also do bool b = std::ifstream('filename').good();. Without the branch instructions(like if) it must perform faster as it needs to be called thousands of times.




inline bool exist(const std::string& name)
{
    ifstream file(name);
    if(!file)            // If the file was not found, then file is 0, i.e. !file=1 or true.
        return false;    // The file was not found.
    else                 // If the file was found, then file is non-0.
        return true;     // The file was found.
}



It depends on where the files reside. For instance, if they are all supposed to be in the same directory, you can read all the directory entries into a hash table and then check all the names against the hash table. This might be faster on some systems than checking each file individually. The fastest way to check each file individually depends on your system ... if you're writing ANSI C, the fastest way is fopen because it's the only way (a file might exist but not be openable, but you probably really want openable if you need to "do something on it"). C++, POSIX, Windows all offer additional options.

While I'm at it, let me point out some problems with your question. You say that you want the fastest way, and that you have thousands of files, but then you ask for the code for a function to test a single file (and that function is only valid in C++, not C). This contradicts your requirements by making an assumption about the solution ... a case of the XY problem. You also say "in standard c++11(or)c++(or)c" ... which are all different, and this also is inconsistent with your requirement for speed ... the fastest solution would involve tailoring the code to the target system. The inconsistency in the question is highlighted by the fact that you accepted an answer that gives solutions that are system-dependent and are not standard C or C++.




I need a fast function that can check if a file is exist or not and PherricOxide's answer is almost what I need except it does not compare the performance of boost::filesystem::exists and open functions. From the benchmark results we can easily see that :

  • Using stat function is the fastest way to check if a file is exist. Note that my results are consistent with that of PherricOxide's answer.

  • The performance of boost::filesystem::exists function is very close to that of stat function and it is also portable. I would recommend this solution if boost libraries is accessible from your code.

Benchmark results obtained with Linux kernel 4.17.0 and gcc-7.3:

2018-05-05 00:35:35
Running ./filesystem
Run on (8 X 2661 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
use_stat          815 ns        813 ns     861291
use_open         2007 ns       1919 ns     346273
use_access       1186 ns       1006 ns     683024
use_boost         831 ns        830 ns     831233

Below is my benchmark code:

#include <string.h>                                                                                                                                                                                                                                           
#include <stdlib.h>                                                                                                                                                                                                                                           
#include <sys/types.h>                                                                                                                                                                                                                                        
#include <sys/stat.h>                                                                                                                                                                                                                                         
#include <unistd.h>                                                                                                                                                                                                                                           
#include <dirent.h>                                                                                                                                                                                                                                           
#include <fcntl.h>                                                                                                                                                                                                                                            
#include <unistd.h>                                                                                                                                                                                                                                           

#include "boost/filesystem.hpp"                                                                                                                                                                                                                               

#include <benchmark/benchmark.h>                                                                                                                                                                                                                              

const std::string fname("filesystem.cpp");                                                                                                                                                                                                                    
struct stat buf;                                                                                                                                                                                                                                              

// Use stat function                                                                                                                                                                                                                                          
void use_stat(benchmark::State &state) {                                                                                                                                                                                                                      
    for (auto _ : state) {                                                                                                                                                                                                                                    
        benchmark::DoNotOptimize(stat(fname.data(), &buf));                                                                                                                                                                                                   
    }                                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                             
BENCHMARK(use_stat);                                                                                                                                                                                                                                          

// Use open function                                                                                                                                                                                                                                          
void use_open(benchmark::State &state) {                                                                                                                                                                                                                      
    for (auto _ : state) {                                                                                                                                                                                                                                    
        int fd = open(fname.data(), O_RDONLY);                                                                                                                                                                                                                
        if (fd > -1) close(fd);                                                                                                                                                                                                                               
    }                                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                             
BENCHMARK(use_open);                                  
// Use access function                                                                                                                                                                                                                                        
void use_access(benchmark::State &state) {                                                                                                                                                                                                                    
    for (auto _ : state) {                                                                                                                                                                                                                                    
        benchmark::DoNotOptimize(access(fname.data(), R_OK));                                                                                                                                                                                                 
    }                                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                             
BENCHMARK(use_access);                                                                                                                                                                                                                                        

// Use boost                                                                                                                                                                                                                                                  
void use_boost(benchmark::State &state) {                                                                                                                                                                                                                     
    for (auto _ : state) {                                                                                                                                                                                                                                    
        boost::filesystem::path p(fname);                                                                                                                                                                                                                     
        benchmark::DoNotOptimize(boost::filesystem::exists(p));                                                                                                                                                                                               
    }                                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                             
BENCHMARK(use_boost);                                                                                                                                                                                                                                         

BENCHMARK_MAIN();   



Without using other libraries, I like to use the following code snippet:

#ifdef _WIN32
   #include <io.h> 
   #define access    _access_s
#else
   #include <unistd.h>
#endif

bool FileExists( const std::string &Filename )
{
    return access( Filename.c_str(), 0 ) == 0;
}

This works cross-platform for Windows and POSIX-compliant systems.




If you need to distinguish between a file and a directory, consider the following which both use stat which the fastest standard tool as demonstrated by PherricOxide:

#include <sys/stat.h>
int FileExists(char *path)
{
    struct stat fileStat; 
    if ( stat(path, &fileStat) )
    {
        return 0;
    }
    if ( !S_ISREG(fileStat.st_mode) )
    {
        return 0;
    }
    return 1;
}

int DirExists(char *path)
{
    struct stat fileStat;
    if ( stat(path, &fileStat) )
    {
        return 0;
    }
    if ( !S_ISDIR(fileStat.st_mode) )
    {
        return 0;
    }
    return 1;
}





Related