What are _SUCCESS and part-r-00000 files in Hadoop?
Although I use Hadoop frequently on my Ubuntu machine, I have never thought about the _SUCCESS and part-r-00000 files. The output always resides in the part-r-00000 file, but what is the use of the _SUCCESS file? Why does the output file have the name part-r-00000? Is there any significance or naming convention behind it, or is the name just randomly chosen?
On the successful completion of a job, the MapReduce runtime creates a _SUCCESS file in the output directory. This may be useful for applications that need to see if a result set is complete just by inspecting HDFS. (MAPREDUCE-947)
This would typically be used by job scheduling systems (such as Oozie) to signal that follow-on processing of the contents of this directory can commence, as all the data has been output.
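As a rough illustration of how a downstream step might use the marker, here is a minimal Python sketch. It assumes the output directory is reachable as an ordinary local path (for example, Hadoop running in local mode, or output that has been copied off HDFS); the function name `job_succeeded` and the example path are hypothetical. On HDFS itself, the equivalent check would more likely be done with something like `hdfs dfs -test -e <output dir>/_SUCCESS`.

```python
from pathlib import Path


def job_succeeded(output_dir):
    """Return True if the output directory contains the _SUCCESS marker file.

    Assumption: output_dir is a local filesystem path, not an hdfs:// URI.
    """
    return (Path(output_dir) / "_SUCCESS").is_file()


if __name__ == "__main__":
    # Hypothetical output directory, used only for illustration.
    output_dir = "/tmp/wordcount-output"
    if job_succeeded(output_dir):
        print("Job output is complete; follow-on processing can start.")
    else:
        print("No _SUCCESS marker yet; the job failed or is still running.")
```

Only the presence of the file is checked, so the sketch does not depend on what the marker contains.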
Update (in response to comment)
The output files are by default named part-x-yyyyy where:
x is either 'm' or 'r', depending on whether the job was a map-only job or had a reduce phase
yyyyy is the mapper or reducer task number (zero-based)
So a job which has 32 reducers will have files named part-r-00000 to part-r-00031, one for each reducer task.
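The naming scheme can be reproduced in a few lines of Python. This is only a sketch of the default pattern described above, not Hadoop's own code; the helper name `part_file_name` is made up for illustration.

```python
def part_file_name(task_type, task_number):
    """Build a default-style output file name, e.g. part-r-00000.

    task_type: 'm' for a map-only job, 'r' for a job with reducers.
    task_number: zero-based mapper or reducer task number.
    """
    if task_type not in ("m", "r"):
        raise ValueError("task_type must be 'm' or 'r'")
    # The task number is zero-padded to five digits.
    return f"part-{task_type}-{task_number:05d}"


# A job with 32 reducers produces one file per reducer task.
reducer_outputs = [part_file_name("r", n) for n in range(32)]
print(reducer_outputs[0], reducer_outputs[-1])  # part-r-00000 part-r-00031
```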