apache-spark 351

  1. Apache Spark vs. Apache Storm
  2. What is the difference between Apache Spark and Apache Flink?
  3. Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects
  4. What is the difference between cache and persist?
  5. Spark java.lang.OutOfMemoryError: Java heap space
  6. How to read multiple text files into a single RDD?
  7. What is the difference between map and flatMap and a good use case for each?
  8. Difference between DataFrame (in Spark 2.0, i.e. Dataset[Row]) and RDD in Spark
  9. Apache Spark: The number of cores vs. the number of executors
  10. (Why) do we need to call cache or persist on an RDD?
  11. Spark performance for Scala vs Python
  12. What are workers, executors, cores in Spark Standalone cluster?
  13. How to turn off INFO logging in PySpark?
  14. How to change column types in Spark SQL's DataFrame?
  15. How to store custom objects in Dataset?
  16. How to convert an RDD object to a DataFrame in Spark?
  17. How to define partitioning of DataFrame?
  18. How to print the contents of RDD?
  19. How to prevent java.lang.OutOfMemoryError: PermGen space at Scala compilation?
  20. Apache Spark: map vs mapPartitions?
  21. Spark - repartition() vs coalesce()
  22. How to set Apache Spark Executor memory
  23. How to stop messages displaying on spark console?
  24. Importing PySpark in the Python shell
  25. Errors when using OFF_HEAP Storage with Spark 1.4.0 and Tachyon 0.6.4
  26. Add jars to a Spark Job - spark-submit
  27. How to load local file in sc.textFile, instead of HDFS
  28. How to make saveAsTextFile NOT split output into multiple files?
  29. How to set up Spark on Windows?
  30. How to overwrite the output directory in Spark?
  31. How to select the first row of each group?
  32. How to show full column content in a Spark Dataframe?
  33. Write to multiple outputs by key Spark - one Spark job
  34. What is the difference between Apache Mahout and Apache Spark's MLlib?
  35. Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?
  36. How to sort by column in descending order in Spark SQL?
  37. What does “Stage Skipped” mean in Apache Spark web UI?
  38. Which cluster type should I choose for Spark?
  39. Spark: what's the best strategy for joining a 2-tuple-key RDD with single-key RDD?
  40. How do I convert a CSV file to an RDD?
  41. How to pass -D parameter or environment variable to Spark job?
  42. Spark - load CSV file as DataFrame?
  43. How do I add a new column to a Spark DataFrame (using PySpark)?
  44. How DAG works under the covers in RDD?
  45. Append a column to a DataFrame in Apache Spark 1.3
  46. Extract column values of a DataFrame as a List in Apache Spark
  47. How are stages split into tasks in Spark?
  48. Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)
  49. Spark Driver in Apache Spark
  50. What is the relationship between workers, worker instances, and executors?