PySpark Training Course Details

Description

RCS Technologies offers a PySpark course aimed at helping you understand PySpark concepts and build custom, feature-rich applications using Python and Spark. The training is delivered online by leading PySpark experts who work at top companies, including MNCs. You will learn the details of Apache Spark and its ecosystem, including the Spark framework, PySpark SQL, PySpark Streaming, and much more. A virtual lab provides real-time projects so you can gain hands-on experience with PySpark, making this course well worth it for upgrading your technical skills.

What You Will Learn

Key Features
  • 24 Hrs Instructor-led Training
  • 22 Hrs Self-paced Videos
  • 60 Hrs Project Work & Exercises
  • 24 x 7 Lifetime Support & Access
  • Flexible Schedule
  • Certification and Job Assistance

Lessons

  • 12 Lessons
  • Lesson 1: Introduction to Python

    • Setting up the Python Environment and Discussing Flow Control

    • Running Python Scripts and Exploring Python Editors and IDEs

    • Sequence and File Operations

    • Defining Reserved Keywords and Command-Line Arguments

    • Describing Flow Control and Sequencing

    • Indexing and Slicing

    • Learning the xrange() Function (Python 2; replaced by range() in Python 3)

    • Working with Dictionaries and Sets

    • Working with Files
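
    For a quick taste of these topics, here is a minimal sketch of indexing, slicing, and basic file handling in Python (the file name example.txt is illustrative):

    # Indexing and slicing on a list (a Python sequence type)
    langs = ["Python", "Spark", "SQL", "Kafka"]
    print(langs[0])      # first element -> "Python"
    print(langs[-1])     # last element -> "Kafka"
    print(langs[1:3])    # slice -> ["Spark", "SQL"]

    # Basic file operations: write two lines, then read them back
    with open("example.txt", "w") as f:
        f.write("hello\nworld\n")

    with open("example.txt") as f:
        for line in f:
            print(line.strip())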

  • Lesson 2: Functions, Exception Handling, and Regular Expressions

    • Explaining Functions and Various Forms of Function Arguments

    • Learning Variable Scope, Function Parameters, and Lambda Functions

    • Sorting Using Python

    • Exception Handling

    • Package Installation

    • Regular Expressions
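
    A minimal sketch of what this lesson covers in practice – lambda-based sorting, exception handling, and a simple regular expression (standard-library Python only):

    import re

    # Sorting with a lambda as the key function: shortest word first
    words = ["streaming", "Py", "spark"]
    print(sorted(words, key=lambda w: len(w)))   # ['Py', 'spark', 'streaming']

    # Exception handling around a risky conversion
    try:
        value = int("not a number")
    except ValueError as exc:
        print("Conversion failed:", exc)

    # A simple regular expression: extract word tokens
    print(re.findall(r"\w+", "PySpark: SQL, streaming & MLlib"))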

  • Lesson 3: Object-Oriented Programming in Python

    • Using Classes, Objects, and Attributes

    • Developing Applications Based on OOP

    • Learning About Classes, Objects, and How They Function Together

    • Explaining OOP Concepts Including Inheritance, Encapsulation, and Polymorphism, Among Others
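
    The sketch below previews these OOP ideas – a class with attributes, a subclass (inheritance), and an overridden method (polymorphism); the class names are illustrative:

    class Employee:
        """A small class demonstrating attributes and encapsulation."""

        def __init__(self, name, salary):
            self.name = name
            self._salary = salary   # leading underscore: internal by convention

        def raise_salary(self, pct):
            self._salary *= 1 + pct / 100


    class Manager(Employee):           # inheritance
        def raise_salary(self, pct):   # polymorphism: overridden behavior
            super().raise_salary(pct + 5)


    m = Manager("Asha", 50000)
    m.raise_salary(10)                 # a Manager raise adds 5 extra percent
    print(m._salary)                   # 57500.0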

  • Lesson 4: Debugging, Testing, and Databases

    • Debugging Python Scripts Using pdb and an IDE

    • Classifying Errors and Developing Test Units

    • Implementing Databases Using SQLite

    • Performing CRUD Operations
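
    As a preview of the database portion, here is a minimal CRUD sketch using Python's built-in sqlite3 module (the in-memory database and table name are illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")   # throwaway in-memory database
    cur = conn.cursor()

    # Create
    cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("INSERT INTO students (name) VALUES (?)", ("Ravi",))

    # Read
    print(cur.execute("SELECT * FROM students").fetchall())

    # Update
    cur.execute("UPDATE students SET name = ? WHERE id = ?", ("Ravi K", 1))

    # Delete
    cur.execute("DELETE FROM students WHERE id = ?", (1,))
    conn.commit()
    conn.close()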

  • Lesson 5: Introduction to Big Data, Hadoop, and Apache Spark

    • What is Big Data?

    • The 5 V's of Big Data

    • Problems related to Big Data: Use Case

    • What tools are available for handling Big Data?

    • What is Hadoop?

    • Why do we need Hadoop?

    • Key Characteristics of Hadoop

    • Important Hadoop ecosystem concepts

    • MapReduce and HDFS

    • Introduction to Apache Spark

    • What is Apache Spark?

    • Why do we need Apache Spark?

    • Who uses Spark in the industry?

    • Apache Spark architecture

    • Spark vs. Hadoop

    • Various Big data applications using Apache Spark
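
    To show how small the entry point is, here is a minimal sketch that starts and stops a local Spark application from PySpark (the app name is illustrative; local[*] simply uses all available cores):

    from pyspark.sql import SparkSession

    # Start a local Spark application
    spark = (SparkSession.builder
             .appName("IntroToSpark")
             .master("local[*]")
             .getOrCreate())

    print(spark.version)   # confirm the session is up
    spark.stop()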

  • Lesson 6: Introduction to PySpark and Python Fundamentals

    • Introduction to PySpark

    • Who uses PySpark?

    • Why Python for Spark?

    • Values, Types, Variables

    • Operands and Expressions

    • Conditional Statements

    • Loops

    • Numbers

    • Python files I/O Functions

    • Strings and associated operations

    • Sets and associated operations

    • Lists and associated operations

    • Tuples and associated operations

    • Dictionaries and associated operations

    Hands-On:

    • Demonstrating Loops and Conditional Statements

    • Tuples – operations and related properties

    • Lists – operations and related properties

    • Sets – properties and associated operations

    • Dictionaries – operations and related properties
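
    A minimal sketch of the hands-on collection work above (all values are illustrative):

    # Tuple: immutable, ordered
    point = (3, 4)

    # List: mutable, ordered
    nums = [5, 3, 1]
    nums.append(4)
    nums.sort()                            # [1, 3, 4, 5]

    # Set: unique elements, unordered
    tags = {"spark", "python", "spark"}    # duplicate collapses -> 2 items

    # Dictionary: key-value mapping
    course = {"name": "PySpark", "hours": 24}
    course["level"] = "beginner"
    for key, value in course.items():
        print(key, value)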

  • Lesson 7: Python Modules and Packages

    • Modules Used in Python

    • The Import Statements

    • Module Search Path

    • Package Installation Ways

    Hands-On:

    • Lambda – Features, Syntax, and Options, Compared with Regular Functions

    • Functions – Syntax, Return Values, Arguments, and Keyword Arguments

    • Errors and Exceptions – Issue Types, Remediation

    • Packages and Modules – Import Options, Modules, and sys.path
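
    A minimal sketch of imports and the module search path (standard-library modules only):

    import sys
    import math                    # import a whole standard-library module
    from os import path            # import a single name from a module

    print(math.sqrt(16))           # 4.0
    print(path.basename("/tmp/data.csv"))

    # sys.path lists the directories Python scans when resolving imports
    for entry in sys.path:
        print(entry)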

  • Lesson 8: Apache Spark Framework and RDDs

    • Spark Components and Architecture

    • Spark Deployment Modes

    • Spark Web UI

    • Introduction to PySpark Shell

    • Submitting PySpark Job

    • Writing your first PySpark Job Using Jupyter Notebook

    • What are Spark RDDs?

    • Shortcomings of existing computing methodologies

    • How RDDs solve the problem

    • Ways to create RDDs in PySpark

    • RDD persistence and caching

    • General operations: Transformations, Actions, and Functions

    • The concept of Key-Value pairs in RDDs

    • Other pair RDDs and two-pair RDDs

    • RDD Lineage

    • RDD Persistence

    • WordCount Program Using RDD Concepts

    • RDD Partitioning & How it Helps Achieve Parallelization

    • Passing Functions to Spark

    Hands-On:

    • Building and Running Spark Application

    • Spark Application Web UI

    • Loading data in RDDs

    • Saving data through RDDs

    • RDD Transformations

    • RDD Actions and Functions

    • RDD Partitions

    • WordCount program using RDDs in Python
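
    A minimal sketch of the classic RDD WordCount covered above, assuming a local Spark installation and a text file named input.txt:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").master("local[*]").getOrCreate()
    sc = spark.sparkContext

    lines = sc.textFile("input.txt")                      # load data into an RDD

    counts = (lines.flatMap(lambda line: line.split())    # transformation
                   .map(lambda word: (word, 1))           # key-value pairs
                   .reduceByKey(lambda a, b: a + b))      # aggregate per key

    for word, count in counts.collect():                  # action
        print(word, count)

    spark.stop()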

  • Lesson 9: Spark SQL and Data Frames

    • The Need for Spark SQL

    • What is Spark SQL?

    • Spark SQL Architecture

    • SQL Context in Spark SQL

    • User-Defined Functions

    • Data Frames

    • Interoperating with RDDs

    • Loading Data through Different Sources

    • Performance Tuning

    • Spark-Hive Integration

    Hands-On:

    • Spark SQL – Creating data frames

    • Loading and transforming data through different sources

    • Spark-Hive Integration
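
    A minimal Spark SQL sketch – creating a data frame from in-memory rows (a CSV, JSON, or Hive source works the same way) and querying it with SQL; names and values are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SparkSQLDemo").master("local[*]").getOrCreate()

    df = spark.createDataFrame([("Asha", 34), ("Ravi", 28)], ["name", "age"])

    df.createOrReplaceTempView("people")   # expose the data frame to SQL
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    spark.stop()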

  • Lesson 10: Apache Kafka and Apache Flume

    • Why Kafka?

    • What is Kafka?

    • Kafka Workflow

    • Kafka Architecture

    • Configuring a Kafka Cluster

    • Kafka Monitoring tools

    • Basic operations

    • What is Apache Flume?

    • Integrating Apache Flume and Apache Kafka

    Hands-On:

    • Single Broker Kafka Cluster

    • Multi-Broker Kafka Cluster

    • Topic Operations

    • Integrating Apache Flume and Apache Kafka
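
    A minimal produce-and-consume sketch, assuming the third-party kafka-python package and a broker running on localhost:9092 (the topic name demo-topic is illustrative):

    from kafka import KafkaProducer, KafkaConsumer

    # Produce one message to a topic
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("demo-topic", b"hello from PySpark training")
    producer.flush()

    # Consume messages from the same topic, starting at the earliest offset
    consumer = KafkaConsumer(
        "demo-topic",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
    )
    for message in consumer:
        print(message.value)
        break   # stop after the first message for this demo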

  • Lesson 11: Spark Streaming

    • Introduction to Spark Streaming

    • Features of Spark Streaming

    • Spark Streaming Workflow

    • Initializing a StreamingContext

    • Discretized Streams (DStreams)

    • Input DStreams, Receivers

    • Transformations on DStreams

    • DStreams Output Operations

    • Windowed Operators and Why They Are Useful

    • Stateful Operators

    • Vital Windowed Operators

    • Twitter Sentiment Analysis

    • Streaming using Netcat server

    • WordCount program using Kafka-Spark Streaming

    Hands-On:

    • Twitter Sentiment Analysis

    • Streaming using Netcat server

    • WordCount program using Kafka-Spark Streaming

    • Spark-Flume Integration
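
    A minimal sketch of the Netcat streaming exercise using the classic DStream API (run nc -lk 9999 in another terminal first; note that newer Spark releases favor Structured Streaming over DStreams):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "NetcatWordCount")   # 2 threads: receiver + worker
    ssc = StreamingContext(sc, batchDuration=5)        # 5-second micro-batches

    lines = ssc.socketTextStream("localhost", 9999)    # text from the Netcat server

    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    counts.pprint()          # print each batch's word counts

    ssc.start()
    ssc.awaitTermination()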

  • Lesson 12: Machine Learning with Spark MLlib

    • Introduction to Machine Learning – What, Why, and Where?

    • Use Case

    • Types of Machine Learning Techniques

    • Why use Machine Learning with Spark?

    • Applications of Machine Learning (general)

    • Applications of Machine Learning with Spark

    • Introduction to MLlib

    • Features of MLlib and MLlib Tools

    • Various ML algorithms supported by MLlib

    • Supervised Learning Algorithms

    • Unsupervised Learning Algorithms

    • ML workflow utilities

    Hands-On:

    • K-Means Clustering

    • Linear Regression

    • Logistic Regression

    • Decision Tree

    • Random Forest
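
    A minimal MLlib sketch – K-Means clustering on a few toy 2-D points (real use would load a data frame from a file; the column names are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("KMeansDemo").master("local[*]").getOrCreate()

    df = spark.createDataFrame(
        [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)], ["x", "y"]
    )

    # MLlib estimators expect a single vector column of features
    features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

    model = KMeans(k=2, seed=42).fit(features)   # fit two clusters
    model.transform(features).select("x", "y", "prediction").show()

    spark.stop()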
