Apache Spark and Scala Certification Training Course Details


Apache Spark and Scala is a training course that certifies you in building Spark applications using the Scala programming language. The training includes a clear comparison between Spark and Hadoop, covers techniques for improving application performance and enabling fast processing with Spark RDDs, and shows how Spark can be customized using Scala. Through the classes on this course, RCS Technologies delivers Apache Spark and Scala certification training that strengthens your profile with up-to-date skills in building and tuning Spark applications.

What will you learn

Key Features
  • 22 Hrs Self-paced Videos
  • 24 Hrs Instructor-led Training
  • 60 Hrs Project Work & Exercises
  • Flexible Schedule
  • 24 x 7 Lifetime Support & Access
  • Certification and Job Assistance


  • 23 Lessons
  • 1.1 Introducing Scala

    1.2 Deployment of Scala for Big Data applications and Apache Spark analytics

    1.3 Scala REPL, lazy values, and control structures in Scala

    1.4 Directed Acyclic Graph (DAG)

    1.5 First Spark application using SBT/Eclipse

    1.6 Spark Web UI

    1.7 Spark in the Hadoop ecosystem

  • 2.1 The importance of Scala

    2.2 The concept of REPL (Read Evaluate Print Loop)

    2.3 Deep dive into Scala pattern matching

    2.4 Type inference, higher-order functions, currying, traits, application space, and Scala for data analysis

  • 3.1 Learning about the Scala Interpreter

    3.2 Static object timer in Scala and testing string equality in Scala

    3.3 Implicit classes in Scala

    3.4 The concept of currying in Scala

    3.5 Various classes in Scala
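
    A minimal, illustrative sketch of the currying and implicit-class techniques listed above, assuming Scala 2.12 or later; the names CurryingDemo, RichString, and shout are made up for the example.

      object CurryingDemo extends App {
        // Currying: a method defined with multiple parameter lists
        def add(x: Int)(y: Int): Int = x + y
        val addFive: Int => Int = add(5)   // partially applied function
        println(addFive(3))                // 8

        // Implicit class: adds a method to String without modifying it
        implicit class RichString(s: String) {
          def shout: String = s.toUpperCase + "!"
        }
        println("hello".shout)             // HELLO!
      }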

  • 4.1 Learning about the Classes concept

    4.2 Understanding the constructor overloading

    4.3 Various abstract classes

    4.4 The type hierarchy in Scala

    4.5 The concept of object equality

    4.6 The val and var methods in Scala
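
    A rough sketch of the class-related topics above (constructor overloading, object equality, val vs. var), assuming plain Scala; all names are illustrative.

      object ClassesDemo extends App {
        // Constructor overloading via an auxiliary constructor
        class Point(val x: Int, val y: Int) {
          def this(x: Int) = this(x, 0)
        }
        println(new Point(3).y)              // 0

        // Case classes compare by value, illustrating object equality
        case class User(name: String)
        println(User("ada") == User("ada"))  // true

        // val is immutable; var can be reassigned
        val fixed = 10
        var counter = 0
        counter += 1
        println(fixed + counter)             // 11
      }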

  • 5.1 Understanding sealed traits and the wildcard, constructor, tuple, variable, and constant patterns
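
    A small, self-contained sketch of the pattern kinds named above (constructor, tuple, variable, constant, typed, and wildcard patterns), assuming plain Scala; the Shape hierarchy is invented for illustration.

      object PatternDemo extends App {
        // Sealed trait: the compiler can warn when a match is not exhaustive
        sealed trait Shape
        case class Circle(radius: Double) extends Shape
        case class Rectangle(width: Double, height: Double) extends Shape

        def area(s: Shape): Double = s match {
          case Circle(r)       => math.Pi * r * r   // constructor pattern
          case Rectangle(w, h) => w * h
        }
        println(area(Circle(2.0)))

        (1, "spark") match {
          case (n, name) => println(s"$name -> $n") // tuple + variable patterns
        }

        val x: Any = 42
        x match {
          case 0      => println("zero")            // constant pattern
          case i: Int => println(s"int $i")         // typed pattern
          case _      => println("something else")  // wildcard pattern
        }
      }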

  • 6.1 Understanding traits in Scala

    6.2 The advantages of traits

    6.3 Linearization of traits

    6.4 The Java equivalent

    6.5 Avoiding boilerplate code

  • 7.1 Implementation of traits in Scala and Java

    7.2 Handling of multiple traits extending
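
    A brief sketch of stacking multiple traits and how linearization decides the call order, assuming standard Scala; Logger, Timestamped, Upper, and Service are illustrative names.

      object TraitDemo extends App {
        trait Logger { def log(msg: String): Unit = println(msg) }

        trait Timestamped extends Logger {
          override def log(msg: String): Unit =
            super.log(s"[${java.time.Instant.now()}] $msg")
        }

        trait Upper extends Logger {
          override def log(msg: String): Unit = super.log(msg.toUpperCase)
        }

        // Linearization: Upper runs first, then Timestamped, then Logger
        class Service extends Logger with Timestamped with Upper
        new Service().log("spark job finished")
      }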

  • 8.1 Introduction to Scala collections

    8.2 Classification of collections

    8.3 The difference between iterator and iterable in Scala

    8.4 Example of list sequence in Scala

  • 9.1 The two types of collections in Scala

    9.2 Mutable and immutable collections

    9.3 Understanding lists and arrays in Scala

    9.4 The list buffer and array buffer

    9.6 Queue in Scala

    9.7 Double-ended queues (Deque), Stacks, Sets, Maps, and Tuples in Scala
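
    A short sketch contrasting the immutable and mutable collections mentioned above (List, ListBuffer, Queue, Map), using only the Scala standard library.

      import scala.collection.mutable

      object CollectionsDemo extends App {
        // Immutable collections return a new collection on every change
        val nums = List(1, 2, 3)
        val more = 0 :: nums                      // List(0, 1, 2, 3); nums is untouched

        // Mutable counterparts are modified in place
        val buf = mutable.ListBuffer(1, 2, 3)
        buf += 4

        val queue = mutable.Queue("a", "b")
        queue.enqueue("c")
        println(queue.dequeue())                  // a

        val capitals = Map("France" -> "Paris", "Japan" -> "Tokyo")
        println(capitals.getOrElse("Japan", "unknown"))

        println(more)
        println(buf)
      }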

  • 10.1 Introduction to Scala packages and imports

    10.2 The selective imports

    10.3 The Scala test classes

    10.4 Introduction to JUnit test class

    10.5 JUnit interface via a JUnit 3 suite for ScalaTest

    10.6 Packaging of Scala applications in the directory structure

    10.7 Examples of Spark Split and Spark Scala

  • 11.1 Introduction to Spark

    11.2 How Spark overcomes the drawbacks of MapReduce

    11.3 Understanding in-memory MapReduce

    11.4 Interactive operations on MapReduce

    11.5 Spark stack, fine-grained vs. coarse-grained updates, Spark on Hadoop YARN, HDFS revision, and YARN revision

    11.6 An overview of Spark and how it improves on Hadoop

    11.7 Deploying Spark without Hadoop

    11.8 Spark history server and Cloudera distribution

  • 12.1 Spark installation guide

    12.2 Spark configuration

    12.3 Memory management

    12.4 Executor memory vs. driver memory

    12.5 Working with Spark Shell

    12.6 The concept of resilient distributed datasets (RDD)

    12.7 Learning to do functional programming in Spark

    12.8 The architecture of Spark

  • 13.1 Spark RDD

    13.2 Creating RDDs

    13.3 RDD partitioning

    13.4 Operations and transformations on RDDs

    13.5 Deep dive into Spark RDDs

    13.6 The RDD general operations

    13.7 Read-only partitioned collection of records

    13.8 Using the concept of RDD for faster and efficient data processing

    13.9 RDD actions such as collect, count, collectAsMap, and saveAsTextFile, and pair RDD functions
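
    A minimal RDD sketch for the points above (creating an RDD, lazy transformations, actions), assuming a local Spark 2.x/3.x installation with spark-core on the classpath; the data and numbers are illustrative.

      import org.apache.spark.{SparkConf, SparkContext}

      object RddDemo extends App {
        val sc = new SparkContext(
          new SparkConf().setAppName("RddDemo").setMaster("local[*]"))

        // Create an RDD from a local collection, split into 4 partitions
        val rdd = sc.parallelize(1 to 100, numSlices = 4)

        // Transformations are lazy; nothing runs until an action is called
        val evens = rdd.filter(_ % 2 == 0).map(_ * 10)

        // Actions trigger execution
        println(evens.count())                 // 50
        println(evens.take(5).mkString(", "))  // 20, 40, 60, 80, 100

        sc.stop()
      }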

  • 14.1 Understanding the concept of key-value pair in RDDs

    14.2 Learning how Spark makes MapReduce operations faster

    14.3 Various operations of RDD

    14.4 MapReduce interactive operations

    14.5 Fine and coarse-grained update

    14.6 Spark stack
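
    A classic word count is a compact illustration of key-value pair RDDs and reduceByKey; this sketch assumes a local Spark master, and the input strings are invented.

      import org.apache.spark.{SparkConf, SparkContext}

      object PairRddDemo extends App {
        val sc = new SparkContext(
          new SparkConf().setAppName("PairRddDemo").setMaster("local[*]"))

        val lines = sc.parallelize(Seq("spark scala spark", "hadoop spark"))

        // Word count: map each word to (word, 1), then combine counts per key
        val counts = lines
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.collect().foreach { case (w, n) => println(s"$w -> $n") }
        sc.stop()
      }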

  • 15.1 Comparing the Spark applications with Spark Shell

    15.2 Creating a Spark application using Scala or Java

    15.3 Deploying a Spark application

    15.4 Scala built application

    15.5 Creating mutable lists, sets and set operations, lists, tuples, and list concatenation

    15.6 Creating an application using SBT

    15.7 Deploying an application using Maven

    15.8 The web user interface of Spark application

    15.9 A real-world example of Spark

    15.10 Configuring Spark
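
    A minimal build.sbt sketch for packaging a Spark application with SBT; the project name and version numbers are placeholders, and the Spark dependencies are marked provided because the cluster normally supplies them at runtime.

      // build.sbt
      name := "spark-demo"
      version := "0.1.0"
      scalaVersion := "2.12.18"

      // "provided" because the cluster supplies the Spark jars at runtime
      libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
        "org.apache.spark" %% "spark-sql"  % "3.5.1" % "provided"
      )

    The jar produced by sbt package would then typically be launched with spark-submit, pointing at the application's main class.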

  • 16.1 Learning about Spark parallel processing

    16.2 Deploying on a cluster

    16.3 Introduction to Spark partitions

    16.4 File-based partitioning of RDDs

    16.5 Understanding of HDFS and data locality

    16.6 Mastering the technique of parallel operations

    16.7 Comparing repartition and coalesce

    16.8 RDD action
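
    A small sketch contrasting repartition and coalesce as listed above, assuming a local Spark master; the partition counts are arbitrary.

      import org.apache.spark.{SparkConf, SparkContext}

      object PartitionTuningDemo extends App {
        val sc = new SparkContext(
          new SparkConf().setAppName("PartitionTuningDemo").setMaster("local[*]"))

        val rdd = sc.parallelize(1 to 1000, numSlices = 8)
        println(rdd.getNumPartitions)              // 8

        // repartition shuffles all data and can grow or shrink the partition count
        val wider = rdd.repartition(16)

        // coalesce avoids a full shuffle but can only reduce the partition count
        val narrower = rdd.coalesce(2)

        println(wider.getNumPartitions)            // 16
        println(narrower.getNumPartitions)         // 2
        sc.stop()
      }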

  • 17.1 The execution flow in Spark

    17.2 Understanding the RDD persistence overview

    17.3 Spark execution flow and Spark terminology

    17.4 Distributed shared memory vs. RDD

    17.5 RDD limitations

    17.6 Spark shell arguments

    17.7 Distributed persistence

    17.8 RDD lineage

    17.9 Key-value pair operations through implicit conversions, such as countByKey, reduceByKey, sortByKey, and aggregateByKey
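
    A sketch of RDD persistence, lineage inspection, and the *ByKey operations listed above, assuming a local Spark master; the sample pairs are invented.

      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.storage.StorageLevel

      object PersistenceDemo extends App {
        val sc = new SparkContext(
          new SparkConf().setAppName("PersistenceDemo").setMaster("local[*]"))

        // Persisted RDD: kept in memory after the first action computes it
        val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("a", 3)))
                      .persist(StorageLevel.MEMORY_ONLY)

        // Implicit conversions on (K, V) RDDs enable the *ByKey operations
        println(pairs.reduceByKey(_ + _).collect().toMap)   // a -> 4, b -> 2
        println(pairs.sortByKey().collect().toSeq)
        println(pairs.countByKey())

        // toDebugString prints the RDD lineage used for fault recovery
        println(pairs.reduceByKey(_ + _).toDebugString)
        sc.stop()
      }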

  • 18.1 Introduction to Machine Learning

    18.2 Types of Machine Learning

    18.3 Introduction to MLlib

    18.4 Various ML algorithms supported by MLlib

    18.5 Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques

    Hands-on Exercise: 

    1. Building a Recommendation Engine
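
    A condensed sketch in the spirit of the recommendation-engine exercise, using MLlib's ALS estimator; the toy ratings and parameter values are invented, and it assumes spark-sql and spark-mllib are on the classpath.

      import org.apache.spark.ml.recommendation.ALS
      import org.apache.spark.sql.SparkSession

      object AlsDemo extends App {
        val spark = SparkSession.builder
          .appName("AlsDemo").master("local[*]").getOrCreate()
        import spark.implicits._

        // Toy ratings: (userId, movieId, rating)
        val ratings = Seq(
          (0, 0, 4.0f), (0, 1, 2.0f),
          (1, 0, 5.0f), (1, 2, 3.0f),
          (2, 1, 1.0f), (2, 2, 5.0f)
        ).toDF("userId", "movieId", "rating")

        val als = new ALS()
          .setUserCol("userId").setItemCol("movieId").setRatingCol("rating")
          .setRank(5).setMaxIter(5).setRegParam(0.1)
          .setColdStartStrategy("drop")

        // Train the model and produce two recommendations per user
        val model = als.fit(ratings)
        model.recommendForAllUsers(2).show(truncate = false)

        spark.stop()
      }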

  • 19.1 Why Kafka and what is Kafka?

    19.2 Kafka architecture

    19.3 Kafka workflow

    19.4 Configuring Kafka cluster

    19.5 Operations

    19.6 Kafka monitoring tools

    19.7 Integrating Apache Flume and Apache Kafka

    Hands-on Exercise: 

    1. Configuring Single Node Single Broker Cluster

    2. Configuring Single Node Multi Broker Cluster

    3. Producing and consuming messages

    4. Integrating Apache Flume and Apache Kafka
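
    A minimal Kafka producer sketch in Scala using the standard Kafka clients library, assuming a broker at localhost:9092 and a topic named test-topic created beforehand; both names are placeholders.

      import java.util.Properties
      import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

      object KafkaProducerDemo extends App {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        // Send one message to the topic and wait for buffered sends to finish
        producer.send(new ProducerRecord[String, String]("test-topic", "key1", "hello kafka"))
        producer.flush()
        producer.close()
      }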

  • 20.1 Introduction to Spark Streaming

    20.2 Features of Spark Streaming

    20.3 Spark Streaming workflow

    20.4 Initializing StreamingContext, discretized streams (DStreams), input DStreams, and receivers

    20.5 Transformations on DStreams, output operations on DStreams, windowed operators and why they are useful

    20.6 Important windowed operators and stateful operators

    Hands-on Exercise: 

    1. Twitter Sentiment analysis

    2. Streaming using Netcat server

    3. Kafka–Spark streaming

    4. Spark–Flume streaming
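
    A bare-bones DStream sketch matching the Netcat exercise above; it assumes spark-streaming on the classpath and a Netcat server started separately with nc -lk 9999.

      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      object StreamingDemo extends App {
        // At least two local threads: one for the receiver, one for processing
        val conf = new SparkConf().setAppName("StreamingDemo").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))   // 5-second micro-batches

        // Reads lines from a Netcat server started with: nc -lk 9999
        val lines = ssc.socketTextStream("localhost", 9999)
        val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }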

  • 21.1 Introduction to shared variables in Spark, such as broadcast variables

    21.2 Learning about accumulators

    21.3 The common performance issues

    21.4 Troubleshooting the performance problems
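
    A short sketch of a broadcast variable and an accumulator working together, assuming a local Spark master; note that accumulator updates made inside transformations can be re-applied if tasks are retried, so they are best treated as approximate there.

      import org.apache.spark.{SparkConf, SparkContext}

      object SharedVarsDemo extends App {
        val sc = new SparkContext(
          new SparkConf().setAppName("SharedVarsDemo").setMaster("local[*]"))

        // Broadcast variable: a read-only lookup table shipped once per executor
        val countryNames = sc.broadcast(Map("FR" -> "France", "JP" -> "Japan"))

        // Accumulator: written by tasks, read back on the driver
        val unknown = sc.longAccumulator("unknown codes")

        val codes = sc.parallelize(Seq("FR", "JP", "XX", "FR"))
        val names = codes.map { c =>
          countryNames.value.getOrElse(c, { unknown.add(1); "unknown" })
        }

        println(names.collect().mkString(", "))
        println(s"unknown codes seen: ${unknown.value}")
        sc.stop()
      }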

  • 22.1 Learning about Spark SQL

    22.2 The SQL context in Spark for structured data processing

    22.3 JSON support in Spark SQL

    22.4 Working with XML data

    22.5 Parquet files

    22.6 Creating Hive context

    22.7 Writing data frame to Hive

    22.8 Reading JDBC files

    22.9 Understanding the data frames in Spark

    22.10 Creating Data Frames

    22.11 Manual inferring of schema

    22.12 Working with CSV files

    22.13 Reading JDBC tables

    22.14 Data frame to JDBC

    22.15 User-defined functions in Spark SQL

    22.16 Shared variables and accumulators

    22.17 Learning to query and transform data in data frames

    22.18 How data frames combine the benefits of Spark RDDs and Spark SQL

    22.19 Deploying Hive on Spark as the execution engine
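
    A compact DataFrame and Spark SQL sketch covering DataFrame creation, the DataFrame DSL, and a temp-view SQL query, assuming spark-sql on the classpath; the names and the file paths in the commented lines are placeholders.

      import org.apache.spark.sql.SparkSession

      object SparkSqlDemo extends App {
        val spark = SparkSession.builder
          .appName("SparkSqlDemo").master("local[*]").getOrCreate()
        import spark.implicits._

        // A small DataFrame built from a local collection
        val people = Seq(("Alice", 34), ("Bob", 29), ("Carol", 41)).toDF("name", "age")

        // DataFrame API and SQL express the same query
        people.filter($"age" > 30).show()

        people.createOrReplaceTempView("people")
        spark.sql("SELECT name FROM people WHERE age > 30").show()

        // Reading and writing common formats (paths are placeholders):
        // spark.read.option("header", "true").csv("/path/to/file.csv")
        // people.write.mode("overwrite").parquet("/tmp/people.parquet")

        spark.stop()
      }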

  • 23.1 Learning about the scheduling and partitioning in Spark

    23.2 Hash partition

    23.3 Range partition

    23.4 Scheduling within and around applications

    23.5 Static partitioning, dynamic sharing, and fair scheduling

    23.6 Map partition with index, the Zip, and GroupByKey

    23.7 Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, and higher-order functions
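
    A sketch of the hash and range partitioning topics above, plus mapPartitionsWithIndex to inspect the result, assuming a local Spark master; the four-partition counts are arbitrary.

      import org.apache.spark.{HashPartitioner, RangePartitioner, SparkConf, SparkContext}

      object PartitionerDemo extends App {
        val sc = new SparkContext(
          new SparkConf().setAppName("PartitionerDemo").setMaster("local[*]"))

        val pairs = sc.parallelize(('a' to 'z').map(c => (c.toString, 1)))

        // Hash partitioning: keys go to partitions based on their hash code
        val hashed = pairs.partitionBy(new HashPartitioner(4))

        // Range partitioning: sorted keys are split into roughly equal ranges
        val ranged = pairs.partitionBy(new RangePartitioner(4, pairs))

        // mapPartitionsWithIndex shows which keys landed in which partition
        ranged.mapPartitionsWithIndex { (i, it) =>
          Iterator(s"partition $i: ${it.map(_._1).mkString(",")}")
        }.collect().foreach(println)

        sc.stop()
      }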

