Data Engineering Course | Data Engineer Certification

Comprehensive Data Engineering Course with Python, Spark, Apache Hive and more…

This course is perfect for:
Aspiring Data Engineers
Data Analysts seeking to expand their skill set
Anyone interested in leveraging big data for deeper insights

Data Engineering Course

This course equips you with the essential skills and knowledge to become a proficient data engineer.
We’ll embark on a journey through the exciting world of data engineering, starting with Python programming. You’ll gain hands-on experience with industry-standard tools like Hadoop Distributed File System (HDFS), Apache Spark, Spark SQL, Hive, Sqoop, and more.

Key Features

Instructor-Led Online Training
Certification & Job Assistance
Flexible Schedule
24 x 7 Lifetime Support
100% Job Oriented Training
Work on Real-time Projects

Get ready to transform your data into valuable knowledge. Enroll today and become a data engineering master!

Here’s a glimpse of what you’ll conquer: 

  • Python Programming: Master the core concepts of Python, a versatile language widely used in data engineering.
  • Hadoop & Spark: Gain a solid understanding of HDFS, the foundation for storing massive datasets, and delve into Apache Spark, a powerful framework for large-scale data processing.
  • Spark RDDs & DataFrames: Work with Spark’s core abstractions, RDDs (Resilient Distributed Datasets), and explore DataFrames, a structured data format ideal for data manipulation.
  • Data Sources & Persistence: Learn how to access and utilize various data sources like CSV, Excel, JSON, and connect with cloud storage like S3. Understand different persistence levels to optimize data handling.
  • Database Connectivity: Explore connecting Spark with relational databases like MySQL and PostgreSQL, enabling you to seamlessly integrate data from various sources.
  • Data Cleaning & Transformation: Master data cleaning techniques to ensure data quality and accuracy. Learn how to transform data into a format suitable for analysis.
  • Apache Hive: Dive into Apache Hive, a data warehouse software that facilitates querying large datasets using familiar SQL syntax.
  • Sqoop: Learn how Sqoop efficiently transfers data between relational databases and HDFS, bridging the gap between traditional databases and the Hadoop ecosystem.
  • Azure Cloud & Databricks: Explore the world of cloud-based data engineering with Azure Cloud and Databricks. Learn how to set up clusters, create data lakes, and leverage cloud capabilities.

PYTHON

  • Environment Setup
  • Decision Making
  • Loops and Numbers
  • Strings
  • Lists
  • Tuples
  • Dictionaries
  • Date and Time
  • Regex
  • Functions
  • Object-Oriented Programming (OOP)
  • File I/O
  • Exceptions
  • Sets
  • Lambda, map, and filter
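
As a small taste of the last topic, here is a minimal sketch of lambda, map, and filter working together. The sample list and variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data: order amounts.
amounts = [120, 45, 300, 80]

# filter keeps only the items for which the lambda returns True.
large = list(filter(lambda a: a >= 100, amounts))

# map applies the lambda to every remaining item.
doubled = list(map(lambda a: a * 2, large))

print(large)    # [120, 300]
print(doubled)  # [240, 600]
```

Both map and filter return lazy iterators in Python 3, which is why the results are wrapped in list() before printing.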

HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

  • What is HDFS?
  • How is data stored in HDFS?
  • What is a block?
  • Replication factor in HDFS
  • HDFS commands
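
To make blocks and the replication factor concrete, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults but are configurable per cluster.

```python
import math

def hdfs_footprint(file_mb, block_mb=128, replication=3):
    """Return (number of blocks, raw storage in MB) for a single file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_mb / block_mb)
    # A partial block only uses the space it needs, so raw storage
    # is simply the file size multiplied by the replication factor.
    raw_mb = file_mb * replication
    return blocks, raw_mb

blocks, raw_mb = hdfs_footprint(500)
print(blocks, raw_mb)  # 4 1500
```

Under these assumptions, a 500 MB file is stored as 4 blocks, and its three replicas take roughly 1500 MB of raw disk across the cluster.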

PYSPARK

  • Spark and Hadoop
  • What is the Hadoop platform?
  • Why the Hadoop platform?
  • What is Spark?
  • Why Spark?
  • Evolution of Spark
  • Hadoop vs Spark (Spark benefits)
  • Architecture of Spark
  • Spark components
  • Lazy evaluation
  • spark-shell and spark-submit
  • Setting up memory (driver memory, executor memory)
  • Setting up cores (executor cores)
  • Running Spark in local mode
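
Lazy evaluation is easier to grasp with a small analogy. The sketch below uses plain Python generators, not the Spark API: defining the pipeline does no work, and the data is only read when a result is requested, much like Spark transformations wait for an action. All names here are hypothetical.

```python
log = []

def numbers():
    # Records each read so we can see when work actually happens.
    for n in [1, 2, 3, 4]:
        log.append(f"read {n}")
        yield n

# "Transformations": building the pipeline runs nothing yet.
pipeline = (n * n for n in numbers() if n % 2 == 0)
reads_before = len(log)

# "Action": consuming the pipeline triggers the reads.
result = list(pipeline)

print(reads_before)  # 0
print(result)        # [4, 16]
print(len(log))      # 4
```

This is only an analogy for the idea; Spark additionally plans and distributes the work across executors before running it.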

Spark RDD

  • Hadoop MapReduce vs Spark RDD
  • Benefits of RDDs over Hadoop MapReduce
  • RDD overview
  • Transformations and actions in the context of RDDs
  • Demonstration of each RDD API with real-time examples (such as cache, unpersist (uncache), count, filter, map)

SPARK DATAFRAME

  • Magic with DataFrames
  • Overview of DataFrames
  • Reading CSV/Excel files and creating a DataFrame
  • Cache/uncache operations on DataFrames
  • Persist/unpersist operations on DataFrames
  • Partition and repartition concepts of DataFrames
  • foreachPartition on DataFrames
  • Programming with DataFrames: how to use the DataFrame APIs effectively
  • A magic Spark job using the DataFrame concept (small project)
  • Defining a schema on a DataFrame
  • How to perform SQL operations on a DataFrame
  • Checkpointing in DataFrames
  • StructType and ArrayType in DataFrames
  • Complex data structures in DataFrames

VARIOUS DATA SOURCES

  • CSV files
  • Excel files
  • JSON files
  • Parquet files
  • Benefits of Parquet files
  • Text files

VARIOUS LEVELS OF PERSISTENCE

  • MEMORY_ONLY
  • MEMORY_ONLY_SER
  • MEMORY_AND_DISK
  • MEMORY_AND_DISK_SER
  • DISK_ONLY
  • OFF_HEAP

USER DEFINED FUNCTIONS

  • Benefits of UDFs over SQL
  • Writing UDFs and applying them to a DataFrame
  • Complex UDFs
  • Data cleaning using UDFs

CONNECTING SPARK WITH S3

  • Connect Spark with S3
  • Read a file from S3 and perform transformations
  • Write a file to S3
  • Preparation and close while writing the file to S3

MySQL DATABASE

  • Overview of MySQL database and benefits.
  • Partition key and collection concepts in MySQL
  • Connecting MySQL with Spark
  • Read a table from MySQL and perform transformations
  • Writing millions of rows to a MySQL table

PostgreSQL

  • Overview of PostgreSQL
  • How to connect Spark with PostgreSQL
  • Collection concepts of PostgreSQL and performing operations in Spark
  • Writing various keys to Redis using PostgreSQL

Spark SQL

  • Overview of Spark SQL.
  • How to write SQL in Spark
  • Various types of clauses in Spark SQL
  • Using UDFs inside Spark SQL
  • SQL fine-tuning using Spark

DATA CLEANING

  • What are the data column types?
  • How many fields match the data type?
  • How many fields are mismatched?
  • Which fields match?
  • Which fields are mismatched?
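
The questions above can be answered with a simple type check per field. The following is a minimal sketch in plain Python; the schema, the sample rows, and the helper name are all hypothetical, and a real job would typically run the same idea on a Spark DataFrame.

```python
def matches(value, expected):
    """Return True if the raw string parses as the expected type."""
    try:
        expected(value)
        return True
    except ValueError:
        return False

# Hypothetical schema and raw rows, as strings read from a file.
schema = {"id": int, "price": float, "name": str}
rows = [
    {"id": "1", "price": "9.99", "name": "pen"},
    {"id": "x2", "price": "abc", "name": "ink"},
]

# Count matches and mismatches for every column.
report = {col: {"match": 0, "mismatch": 0} for col in schema}
for row in rows:
    for col, typ in schema.items():
        key = "match" if matches(row[col], typ) else "mismatch"
        report[col][key] += 1

print(report["id"])    # {'match': 1, 'mismatch': 1}
print(report["name"])  # {'match': 2, 'mismatch': 0}
```

The report shows both how many fields match and which columns contain the mismatches, which is usually the starting point for deciding what to clean.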

HIVE - Introduction to HIVE

  • SQL vs Hive
  • What is Hive
  • Working of Hive
  • Architecture of Hive

HIVE COMMANDS AND TABLE CREATION

  • Hadoop and Hive Installation
  • Database Creation
  • Table Creation and inserting Data
  • Multi insert statement
  • Alter Table Schema
  • Sorting — sort by, order by, distribute by, cluster by

HIVE FUNCTIONS

  • Date and Mathematical functions
  • String functions
  • Split(), Substr(), instr() functions
  • Conditional statements and Explanation
  • Explode and Lateral view
  • Rlike function
  • Rank(), Dense_rank(), Row_number()
  • Mathematical Functions Exercises
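
The difference between Rank(), Dense_rank(), and Row_number() is easiest to see on tied values. The sketch below reproduces their semantics in plain Python rather than HiveQL; the function name and sample scores are hypothetical.

```python
def window_ranks(sorted_values):
    """Return (row_number, rank, dense_rank) for already-sorted values."""
    out = []
    rank = dense = 0
    prev = object()  # sentinel that equals no real value
    for i, v in enumerate(sorted_values, start=1):
        if v != prev:
            rank = i    # rank skips ahead to the row position after ties
            dense += 1  # dense_rank grows by one per distinct value
            prev = v
        out.append((i, rank, dense))
    return out

scores = sorted([80, 90, 70, 90], reverse=True)  # [90, 90, 80, 70]
result = window_ranks(scores)
print(result)  # [(1, 1, 1), (2, 1, 1), (3, 3, 2), (4, 4, 3)]
```

The two tied scores share rank 1 and dense rank 1, but the next row gets rank 3 and dense rank 2, while the row number simply keeps counting.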

PARTITIONING AND BUCKETING

  • What is Partitioning?
  • Static vs Dynamic partitioning
  • Alter Partitioned Table and MSCK Repair command
  • What is Bucketing?
  • Bucketed Table Creation and Table Sampling (TABLESAMPLE)
  • NO_DROP and OFFLINE commands
  • Exercises on Partitioning and Bucketing
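
As a rough mental model, partitioning decides which directory a row lands in, and bucketing decides which file within it. The sketch below is plain Python, not Hive itself. It assumes an integer bucketing column whose hash is the value itself, and the column names, bucket count, and sample rows are hypothetical.

```python
def bucket_for(user_id, num_buckets):
    # Assumption: for an integer column the hash is the value itself,
    # so the bucket is the value modulo the number of buckets.
    return user_id % num_buckets

def location(row, num_buckets=4):
    # Partition directories follow the column=value naming convention.
    partition = f"country={row['country']}"
    return partition, bucket_for(row["user_id"], num_buckets)

rows = [
    {"user_id": 10, "country": "IN"},
    {"user_id": 7, "country": "US"},
]
placed = [location(r) for r in rows]
print(placed)  # [('country=IN', 2), ('country=US', 3)]
```

Partitioning works best on low-cardinality columns such as country or date, while bucketing spreads a high-cardinality column such as a user ID evenly across a fixed number of files.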

JOINS IN HIVE

  • What is an inner join? (with explanation)
  • What is an outer join? (with explanation)
  • Memory Management & Optimization of Joins

SQOOP

  • Introduction to Apache Sqoop
  • Environment Setup for Apache Sqoop
  • Import in Apache Sqoop
  • Export in Apache Sqoop

AZURE CLOUD WITH DATABRICKS

  • Setting up the cluster on Azure
  • Creating the data lake on the cluster
  • Loading the data on Azure
  • Azure Synapse
  • Azure Data Factory
  • Pulling multiple datasets on Azure with Databricks
  • Monitoring jobs on Databricks

FAQs

Do I get any discount on the course?

Yes, you get two kinds of discounts. They are group discount and referral discount. Group discount is offered when you join as a group, and referral discount is offered when you are referred from someone who has already enrolled in our training.

Who will provide the environment to run the practicals?

The trainer will give students server access, and we make sure you get practical, hands-on training by providing every utility you need to understand the course.

What is the qualification of the trainer?

The trainer is a certified consultant with a significant amount of experience working with the technology.

Does MyyesM accept the course fees in installments?

Yes, we accept payments in two installments.

How does MyyesM Refund Policy work?

If you are enrolled in classes and/or have paid fees but want to cancel your registration for some reason, you can do so within the first 2 sessions of the training. Please note that refunds will be processed within 30 days of the request.

Disclaimer: Yes-M Systems and/or their instructors reserve the right to make any changes to the syllabus as deemed necessary to best fulfill the course objectives. Students registered for this course will be made aware of any changes in a timely fashion using reasonable means.