Data Engineering Training Bootcamp

  • Course Duration: 3 Months
  • Course Start: Enrollment Monthly

Description

Who is a Data Engineer?

A Data Engineer is someone with specialized skills in creating software solutions around data. Their skills are predominantly based around Hadoop, Spark, and the open source Big Data ecosystem projects. Data Engineers come from a Software Engineering background and program in Java, Scala, or Python.

A Data Engineer has recognized the need to move from being a general Software Engineer to specializing in Big Data. The Big Data field changes quickly, and they need to keep up with those changes. There is also a large body of knowledge that a Data Engineer must master, and there isn’t enough time to keep up with both Big Data and other general software topics.

A qualified Data Engineer’s value lies in knowing the right tool for the job. They understand the subtle differences between use cases and between technologies, and they can create data pipelines. This course will teach you the right skills for the job.

Introduction to Hadoop

  1. Understanding Big Data
  2. Distributed Hadoop architecture overview
  3. Hadoop releases and ecosystem overview

Hadoop Architecture and Concepts

  1. MapR and Apache Hadoop architectural concepts
  2. HDFS/MapR read and write
  3. HDFS commands

MapReduce

  1. Introduction to MapReduce
  2. MapReduce program

Introduction to Hadoop Ecosystem

  1. Introduction to Spark
  2. Writing and running a Spark job
  3. Introduction to Hive/Impala
  4. Introduction to Sqoop
  5. Introduction to Drill (SQL query engine)
  6. Cassandra: unstructured key-value pair data storage
  7. HBase: unstructured key-value pair data storage
  8. Introduction to Kafka and Flume

Advanced Spark Concepts

  1. Reading files with Spark
  2. Reading and writing to a Cassandra table
  3. Reading and writing to an HBase table
  4. Spark Streaming concepts

Data Ingestion

  1. Introduction to StreamSets
  2. Introduction to data ingestion using Zeppelin

Data Analytics & Visualization

  1. Introduction to Python for data analytics
  2. Introduction to data visualization using Zeppelin
  3. Introduction to Grafana, Elasticsearch and Kibana

Our Partners

Institutions we have partnered with or worked with previously

MapR Technologies
Kaggle
Dataiku
Nita
Kenya Tourism Board
Barclays
British American Tobacco
Coop Bank
Craft Silicon
CRDB Bank
ICPAK
IPSOS
KAM
Lapfund
National Land Commission
NSSF Uganda
Reinsurance
Safaricom
URA