hadoop (416)

  1. Difference between Pig and Hive? Why have both?
  2. Hadoop “Unable to load native-hadoop library for your platform” warning
  3. What is the difference between Apache Spark and Apache Flink?
  4. When to use Hadoop, HBase, Hive and Pig?
  5. Apache Spark: The number of cores vs. the number of executors
  6. Difference between HBase and Hadoop/HDFS
  7. Chaining multiple MapReduce jobs in Hadoop
  8. How does Hadoop process records split across block boundaries?
  9. Name node is in safe mode. Not able to leave
  10. How to turn off INFO logging in Spark?
  11. Is there a .NET equivalent to Apache Hadoop?
  12. How does the MapReduce sort algorithm work?
  13. What is the difference between partitioning and bucketing a table in Hive?
  14. What is the purpose of the shuffle and sort phase in the reducer in MapReduce programming?
  15. How to copy file from HDFS to the local file system
  16. Large-scale data processing: HBase vs. Cassandra
  17. Difference between Hive internal tables and external tables?
  18. Failed to locate the winutils binary in the hadoop binary path
  19. Hadoop: No FileSystem for scheme: file
  20. When do reduce tasks start in Hadoop?
  21. Spark - load CSV file as DataFrame?
  22. Merge output files after reduce phase
  23. Integration testing Hive jobs
  24. How do I output the results of a HiveQL query to CSV?
  25. What's the difference between “hadoop fs” shell commands and “hdfs dfs” shell commands?
  26. What are the pros and cons of parquet format compared to other formats?
  27. Building Hadoop with Eclipse / Maven - Missing artifact jdk.tools:jdk.tools:jar:1.6
  28. How to know Hive and Hadoop versions from command prompt?
  29. Where does Hive store files in HDFS?
  30. HDFS error: could only be replicated to 0 nodes, instead of 1
  31. Differences between Amazon S3 and S3n in Hadoop
  32. Technically what is the difference between s3n, s3a and s3?
  33. connect to host localhost port 22: Connection refused
  34. Life without JOINs… understanding, and common practices
  35. How to check an HDFS directory's size?
  36. Container is running beyond memory limits
  37. Parquet vs ORC vs ORC with Snappy
  38. Hadoop on OSX “Unable to load realm info from SCDynamicStore”
  39. Out of memory error in Hadoop
  40. How does Hive compare to HBase?
  41. How to get/generate the create statement for an existing hive table?
  42. Is there any way to get the column names along with the output when executing a query in Hive?
  43. Write to multiple outputs by key Spark - one Spark job
  44. Java vs Python on Hadoop
  45. Scalable Image Storage
  46. NameNode not getting started
  47. Can Apache Spark run without Hadoop?
  48. Pig: how to count the number of rows in an alias
  49. Why is there no 'hadoop fs -head' shell command?
  50. Where does the Hadoop MapReduce framework send my System.out.print() statements? (stdout)