Course description
Learn HDFS, Spark, Kafka, Machine Learning, Hadoop, Hadoop MapReduce, Cassandra, CAP, Predictive Analytics and much more.
What you will learn:
Big Data
Big Data Enabling Technologies
Hadoop Stack for Big Data
Hadoop Distributed File System (HDFS)
Hadoop MapReduce with Example
Spark
Parallel Programming with Spark
Spark Built-in Libraries
Data Placement Strategies
CAP Theorem
Design of Zookeeper
CQL (Cassandra Query Language)
Spark Streaming and Sliding Window Analytics
Kafka
Machine Learning
Machine Learning Algorithm K-means using Map Reduce for Big Data Analytics
Decision Trees for Big Data Analytics
Predictive Analytics
Spark GraphX & Graph Analytics
DESCRIPTION
Big data is a combination of structured, semi-structured and unstructured data collected by organisations that can be mined for information and used in machine learning projects, predictive modelling and other advanced analytics applications.
Systems that process and store big data have become a common component of data management architectures in organisations, where they are combined with tools that support big data analytics. Big data is often characterised by the three Vs:
the large volume of data in many environments;
the wide variety of data types frequently stored in big data systems; and
the velocity at which much of the data is generated, collected and processed.
Big data is a great quantity of diverse information that arrives in increasing volumes and with ever-higher velocity.
Big data can be structured (often numeric, easily formatted and stored) or unstructured (more free-form, less quantifiable).
Nearly every department in a company can utilise findings from big data analysis, but handling its clutter and noise can pose problems.
Big data can be collected from publicly shared comments on social networks and websites, gathered voluntarily from personal electronics and apps, or obtained through questionnaires, product purchases and electronic check-ins.
Big data is most often stored in computer databases and is analysed using software specifically designed to handle large, complex data sets.
DURATION OF SELF-PACED ONLINE COURSE DELIVERY
8 sections • 36 lectures • 18h 40m total length
Topics covered in this course are:
Big Data Enabling Technologies
Hadoop Stack for Big Data
Hadoop Distributed File System (HDFS)
Hadoop MapReduce
MapReduce Examples
Spark
Parallel Programming with Spark
Spark Built-in Libraries
Data Placement Strategies
Design of Zookeeper
CQL (Cassandra Query Language)
Design of HBase
Spark Streaming and Sliding Window Analytics
Kafka
Big Data Machine Learning
Machine Learning Algorithm K-means using Map Reduce for Big Data Analytics
Parallel K-means using Map Reduce on Big Data Cluster Analysis
Decision Trees for Big Data Analytics
Big Data Predictive Analytics
PageRank Algorithm in Big Data
Spark GraphX & Graph Analytics
Case Studies of big companies and how they operate.
Who is this course for?
Graduates
Software engineers
Developers
CERTIFICATION
Barony Certificate in Big Data Essentials – digital certificate.
The certificate can be downloaded once the course is completed.
This is not a tutor-delivered online course.
Instead, we give you online access to the course materials to go through in your own time.