apache-spark 356

  1. Apache Spark vs. Apache Storm
  2. What is the difference between Apache Spark and Apache Flink?
  3. Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects
  4. What is the difference between cache and persist?
  5. Difference between DataFrame (in Spark 2.0, i.e. Dataset[Row]) and RDD in Spark
  6. Spark java.lang.OutOfMemoryError: Java heap space
  7. What is the difference between map and flatMap and a good use case for each?
  8. How to read multiple text files into a single RDD?
  9. Apache Spark: The number of cores vs. the number of executors
  10. What are workers, executors, cores in Spark Standalone cluster?
  11. Spark performance for Scala vs Python
  12. (Why) do we need to call cache or persist on an RDD?
  13. Spark - repartition() vs coalesce()
  14. How to change column types in Spark SQL's DataFrame?
  15. How to store custom objects in Dataset?
  16. How to turn off INFO logging in Spark?
  17. How to stop INFO messages displaying on spark console?
  18. How to print the contents of RDD?
  19. Add jars to a Spark Job - spark-submit
  20. How to convert an RDD object to a DataFrame in Spark
  21. How to define partitioning of DataFrame?
  22. How to set Apache Spark Executor memory
  23. Apache Spark: map vs mapPartitions?
  24. How to prevent java.lang.OutOfMemoryError: PermGen space at Scala compilation?
  25. Importing PySpark in the Python shell
  26. Errors when using OFF_HEAP Storage with Spark 1.4.0 and Tachyon 0.6.4
  27. How to load local file in sc.textFile, instead of HDFS
  28. How to make saveAsTextFile NOT split output into multiple files?
  29. How to set up Spark on Windows?
  30. How to overwrite the output directory in Spark
  31. How to select the first row of each group?
  32. How to show full column content in a Spark DataFrame?
  33. How to change dataframe column names in pyspark?
  34. Write to multiple outputs by key Spark - one Spark job
  35. What is the difference between Apache Mahout and Apache Spark's MLlib?
  36. Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?
  37. How to sort by column in descending order in Spark SQL?
  38. What does “Stage Skipped” mean in Apache Spark web UI?
  39. Which cluster type should I choose for Spark?
  40. Spark: what's the best strategy for joining a 2-tuple-key RDD with single-key RDD?
  41. How do I convert a CSV file to an RDD?
  42. How to pass -D parameter or environment variable to Spark job?
  43. How to link PyCharm with PySpark?
  44. Spark - load CSV file as DataFrame?
  45. How do I add a new column to a Spark DataFrame (using PySpark)?
  46. How DAG works under the covers in RDD?
  47. Append a column to Data Frame in Apache Spark 1.3
  48. Extract column values of Dataframe as List in Apache Spark
  49. How are stages split into tasks in Spark?
  50. Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)