This section of the blog focuses on Apache Spark. I’ll be using Scala as the programming language, but the concepts are the same in Python, and because the two APIs are very similar, converting the Scala code to Python is easy.
I’ve created a GitHub repository containing the code for each chapter of this series so that you can replicate the examples on your computer. Before diving into any of the articles, make sure to follow Spark @0 - Run Spark Applications to set up the right environment on your local machine for running Spark applications in Scala with the IntelliJ IDE.