apache-spark 351

  1. Apache Spark vs. Apache Storm
  2. What is the difference between Apache Spark and Apache Flink?
  3. Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects
  4. What is the difference between cache and persist?
  5. Spark java.lang.OutOfMemoryError: Java heap space
  6. How to read multiple text files into a single RDD?
  7. What is the difference between map and flatMap and a good use case for each?
  8. Difference between DataFrame (in Spark 2.0, i.e. Dataset[Row]) and RDD in Spark
  9. Apache Spark: The number of cores vs. the number of executors
  10. (Why) do we need to call cache or persist on an RDD?
  11. What are workers, executors, cores in Spark Standalone cluster?
  12. Spark performance for Scala vs Python
  13. How to turn off INFO logging in PySpark?
  14. How to change column types in Spark SQL's DataFrame?
  15. How to store custom objects in Dataset?
  16. How to convert an RDD object to a DataFrame in Spark
  17. How to define partitioning of DataFrame?
  18. How to print the contents of RDD?
  19. How to prevent java.lang.OutOfMemoryError: PermGen space at Scala compilation?
  20. Apache Spark: map vs mapPartitions?
  21. Spark - repartition() vs coalesce()
  22. How to set Apache Spark Executor memory
  23. How to stop messages displaying on spark console?
  24. Importing PySpark in the Python shell
  25. Errors when using OFF_HEAP Storage with Spark 1.4.0 and Tachyon 0.6.4
  26. Add jars to a Spark Job - spark-submit
  27. How to load local file in sc.textFile, instead of HDFS
  28. How to make saveAsTextFile NOT split output into multiple files?
  29. How to set up Spark on Windows?
  30. How to select the first row of each group?
  31. How to overwrite the output directory in spark
  32. How to show full column content in a Spark DataFrame?
  33. Write to multiple outputs by key Spark - one Spark job
  34. Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?
  35. What does “Stage Skipped” mean in Apache Spark web UI?
  36. How to sort by column in descending order in Spark SQL?
  37. What is the difference between Apache Mahout and Apache Spark's MLlib?
  38. Which cluster type should I choose for Spark?
  39. Spark: what's the best strategy for joining a 2-tuple-key RDD with single-key RDD?
  40. How to pass -D parameter or environment variable to Spark job?
  41. How do I convert a CSV file to an RDD?
  42. How do I add a new column to a Spark DataFrame (using PySpark)?
  43. Spark - load CSV file as DataFrame?
  44. How are stages split into tasks in Spark?
  45. How DAG works under the covers in RDD?
  46. Append a column to a DataFrame in Apache Spark 1.3
  47. Extract column values of a DataFrame as a List in Apache Spark
  48. Spark Driver in Apache Spark
  49. Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)
  50. What is the relationship between workers, worker instances, and executors?