
Big Data - Hadoop & PySpark

Language: English

Instructors: Blismos Academy



Why this course?


The Big Data - Hadoop & PySpark course takes you from the very basics of Big Data to the level needed to work on live projects. The course is a must for anyone in the IT industry and for prospective Big Data experts. It blends Big Data concepts with detailed hands-on practice in HDFS, MapReduce, Hive, Sqoop, NoSQL, Kafka, Python, and PySpark.

Course Curriculum

Welcome to the Course Video (2:00) Preview
The Fundamentals
Data VS Information (4:00)
Data Storage and Processing (8:00)
Data Sources (7:00)
Big Data Introduction (12:00) Preview
Fundamentals Assessment
Live Class on March 6 2023 (59:00)
The Foundations of Big Data
2.1 Emergence of Big Data
Emergence of Big Data (5:00) Preview
Basic Terminologies (4:00)
Foundations Assessment 1
2.2 Central Theme of Big Data
Central Theme of Big Data (8:00)
Requirements of Programming Model (13:00)
Understand Distributed Processing through a Story (6:00)
Foundations Assessment 2
LiveClassMarch82023 (67:00)
LiveClassMarch152023 (46:00)
Environment and Installations
1Oracle_VM_Installation_1 (3:00)
Google Cloud Platform Setup
How to install Ubuntu operating system on Virtual box (7:00)
How to install PySpark on Ubuntu with Java and Python_3 (10:00)
How to configure Pyspark with Pycharm_with_Installation (7:00)
Google Cloud Platform Setup (10:00)
Hadoop Ecosystem
3.1 Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem (12:00) Preview
Hadoop Ecosystem Assessment 1
3.2 Hadoop Distributed File System
What is HDFS? (8:00)
Nodes in HDFS (7:00)
LiveClassMarch172023 (51:00)
HDFS Assessment1
Storing File in HDFS (14:00)
Reading File from HDFS (5:00)
HDFS Assessment2
Challenges in Distributed Systems (5:00)
LiveclassMarch232023 (41:00)
Managing the Data Node Failure (17:00)
LiveClassMarch242023 (60:00)
HDFS Assessment3
Managing Name Node Failure (10:00)
LiveclassMarch272023 (82:00)
HDFS Commands Part 1 (11:00)
HDFS Commands Part2 (7:00)
LiveclassMarch282023 (77:00)
HDFS Assessment 4
3.3 Map Reduce
Introduction to Map Reduce (9:00)
Map Reduce Flow Example 1 (8:00)
Map Reduce Implementation (5:00)
LiveClassMarch292023 (65:00)
Map Reduce Example 2 - User View Count (8:00)
Map Reduce Mappers and Reducers (12:00)
MapReduce Assessment 1
Shuffle-Sort-Partitions (12:00)
LiveClassMarch302023 (75:00)
Map Reduce Combiners (11:00)
Combiner with Caution (9:00)
Map Reduce Wrap Up (2:00)
MapReduce Assessment 2
3.4 Hive
Transactional and Analytical Processing (10:00)
What is Data warehouse? (6:00)
Introducing Hive (10:00)
LiveclassMarch312023 (76:00)
Hive Assessment 1
Hive Hands-on1 (11:00)
Hive Hands-on2 (10:00)
Hive Hands-On Assessment 1
Hive vs RDBMS (15:00)
LiveClassApril32023 (63:00)
Hive Architecture (8:00)
LiveClassApril042023 (64:00)
Hive Metastore (4:00)
Hive Assessment 2
Hive Hands-On 3 (7:00)
Hive Hands_on Assessment 2
Primitive Datatypes in Hive (7:00)
How storage works in Hive (4:00)
Different types of Tables in Hive (7:00)
LiveApril52023 (58:00)
Hive Assessment 3
Hive Hands-on4 (12:00)
Hive Hands-on5 (13:00)
Hive Hands-on Assessment3
Inserting the Data into Hive Tables (4:00)
Hive Hands-on6-Inserting data into Tables (22:00)
LiveclassApril62023 (63:00)
LiveClassApril102023 (65:00)
Hive Complex Datatypes (5:00)
LiveclassApril112023 (47:00)
Hive User Defined Functions (7:00)
LiveclassApril122023 (71:00)
Hive Assessment 4
Hive Hands-on7 _Complex Datatype (12:00)
Hive Hands-on Retrieving Elements from Complex data types columns and Explode (12:00)
Hive Hands_on Assessment 4
Denormalized Storage in Hive (13:00)
Hive Optimization Of The Queries Theory (11:00)
Hive Partition & Bucketing Theorypart1 (13:00)
LiveClassApril132023 (66:00)
Hive Partition & Bucketing Theory Part2 (9:00)
Hive Assessment 5
Hive Hands-on9- Partitioning Part1 (24:00)
LiveClassApril142023 (38:00)
Hive Hands on10 Partitioning Part 2 (12:00)
Hive Hands on11-Bucketing (13:00)
LiveClassApril172023 (52:00)
Hive Hands Assessment-5
Python for PySpark
Introduction to Programming
Introduction to Programming (16:00)
LiveClassApril182023IntrotoProgramming (64:00)
Python Programming
Introduction to Python (7:00)
Environment for Python (4:00)
Executing Python Code (6:00)
LiveclassApril192023PythonIntro1 (24:00)
Python Assessment 1
Syntax, Indentation and Comments (6:00)
Syntax, Indentation and Comments - Practical (5:00)
LiveClassApril202023PythonIndentationVariables (60:00)
Variables (12:00)
Variable Practicals (11:00)
Python Datatypes (15:00)
LiveClassApril212023PythonDataTypes (56:00)
Python Datatypes Practicals (10:00)
Python Assessment 2
Python Operator Concepts (13:00)
Python Operator Practicals (9:00)
LiveClassApril252023PythonOperators (84:00)
Control Flows in Python (4:00)
LiveClassApril262023PythonControlFlowIntro (20:00)
Control Flows - IF ELSE Concepts (6:00)
If Else Practical (4:00)
Loops Theory (10:00)
Loops Practical (8:00)
LiveclassApril272023PythonControlFlow (64:00)
Python Assessment 3
Python Function Concepts (11:00)
Python Function Hands-on (9:00)
Apache Spark
Introduction to Spark
Why Spark? (6:00)
Advantages of Spark (7:00) Preview
LiveClassMay42023WhySparkandAdvantagesofSpark (46:00)
What is Spark? (6:00) Preview
Components of Spark (2:00)
LiveclassMay52023WhatisSpark (39:00)
History of Spark (5:00)
Introduction to Spark Assessment1
Overview of the Spark
Architecture of Spark (9:00)
LiveClassMay82023ArchitecutreofSpark (49:00)
Spark Session (6:00)
Spark Session Terminal & Jupyter notebook Hands-On (4:00)
Spark Language API (4:00)
Overview of the Spark Assessment1
Dataframes and Partitions (8:00)
How to Create Dataframe in Terminal and in Jupyter Notebook? (4:00)
Spark Transformations (9:00)
Spark Actions (8:00)
Overview of the Spark Assessment2
Structured API Overview
Structured APIs - Dataframes and Datasets (7:00)
Schema Definition (4:00)
Spark Types (5:00)
Structured API Execution (6:00)
Structured API Overview Assessment1
Operations on Dataframes
Dataframe Columns (11:00)
Columns as Expression (5:00)
Dataframe Rows (6:00)
Operations on Dataframe Assessment1
Ways of Creating Dataframe (16:00)
Methods to Manipulate Columns (20:00)
DataFrame Transformations (2:00)
Operations on Dataframe Assessment2
Dataframe Transformation - Columns (15:00)
Dataframe Transformations - Rows Part1 (14:00)
Dataframe Transformation - Rows Part2 (19:00)
Operations on Dataframe Assessment3
Working with Different Types of Data
Introduction to working with Different Types of Data (2:00)
Working with Booleans (15:00)
Working with Numbers (15:00)
Working with Strings (10:00)
Working with Strings Practical1 (8:00)
Working with String Practical2 (7:00)
Introduction to working with Different Types of Data Assessment 1
Working with Date and Time Stamps (17:00)
Working with Null Concepts (7:00)
Working with Nulls Practicals (15:00)
Working with Complex Types (9:00)
Working with Complex types practical (11:00)
User Defined Functions - Concepts (12:00)
User Defined Functions - Practicals (8:00)
Introduction to working with Different Types of Data Assessment 2
Creating Dataframes from different sources
Data Sources Introduction (4:00)
Read-API- Data Sources (4:00)
Read-API-Practical (12:00)
Write-API-Data Sources (3:00)
Write-API-Practical (13:00)
Creating Dataframes from different sources Assessment 1
Reading from CSV Files (10:00)
Writing into CSV Files (5:00)
Reading from JSON Files and Writing into JSON (9:00)
Reading from Parquet and writing into Parquet (11:00)
Reading from ORC and writing into ORC (9:00)
Unstructured Data - Text File - Reading and Writing (12:00)
Introduction to reading data from structured sources (6:00)
Reading data from structured sources - Database - Concepts (9:00)
Reading data from structured sources - Database - Practicals (15:00)
Query Pushdown Concepts (8:00)
Query Pushdown Practicals (7:00)
Writing into structured sources - Database - Concepts (5:00)
Writing into structured sources - Database - Practicals (11:00)
Creating Dataframes from different sources Assessment 2
Introduction to Aggregations (9:00)
Aggregation Concepts - Count (7:00)
Aggregation_Practical-1-Count (15:00)
Aggregation Concepts - First, Sum and Average (4:00)
Aggregation - Practical - 2FirstLastAverage (12:00)
Aggregation Assessment 1
Aggregation concepts - Statistical Functions (11:00)
Aggregation-Practical-3-StatisticalFunctions (11:00)
Aggregation Concepts - Grouping (5:00)
Aggregation-Practical-4-GroupBy (9:00)
Aggregation Concepts - Window Functions (8:00)
Aggregation-Practical-5-WindowFunctions (15:00)
Aggregation Concepts - RollUp and Cube (6:00)
Aggregation-Practical-6-RollupandCube (12:00)
Aggregation Assessment 2
Spark Joins
Spark Joins Theory-1-Introduction (5:00)
Spark Joins Theory-2-How Joins Work (5:00)
Spark Joins-Theory-3-Inner Joins (2:00)
Spark Joins -Practical -1-Innerjoins (7:00)
Spark Joins - Theory-4 - Outer Joins (5:00)
Spark Joins -Practical - OuterJoins (6:00)
Spark Joins -Theory - 5-Left Semi & Anti Joins (6:00)
Spark Joins - Practical - LeftSemiAntiJoins (4:00)
Spark Joins -Theory -6-CrossJoin (4:00)
Spark Joins - Practical- CrossJoins (3:00)
Spark Joins -Theory -7-ChallengesInJoins (7:00)
Spark Joins-5-Practical-TacklingtheChallengesinJoins (16:00)
Spark Joins -Theory -8-CommunicationStrategies (17:00)
Joins Assessment
Resilient Distributed Datasets-RDDs
What is an RDD ? (6:00)
Introduction to Low Level APIs (7:00)
Properties Of RDD (3:00)
When to use RDDs (3:00)
Creating RDDs (12:00)
RDD Practical-1-Creating RDDs (10:00)
RDD Assessment 1
RDD Lineage (6:00)
RDD Transformations (13:00)
RDD - Transformations Practical (10:00)
RDD Actions (12:00)
RDD Actions - Practical (11:00)
RDD Saving to a File (3:00)
RDD Saving to a File - Practical (3:00)
RDD Assessment 2
Distributed Variables
Distributed Variables - Introduction (3:00)
Broadcast Variables (13:00)
Broadcast Variables - Practical (6:00)
Accumulators (10:00)
Accumulators - Practical (6:00)
Distributed Variables Assessment
How Spark runs on a Cluster
Introduction (3:00)
How Spark runs on a Cluster - ClusterManager (3:00)
How Spark runs on a Cluster - ExecutionModes (4:00)
Life Cycle of a Spark Application - Outside Spark (7:00)
Life Cycle of a Spark Application - Inside Spark (12:00)
How Spark runs on a Cluster Assessment
LiveclassMay92023PySparkSparkSession (76:00)
LiveclassMay112023PysparkTransformations (56:00)
LiveclassMay122023PySparkActions (40:00)
LiveclassMay152023SparkStructuredAPIDatatypes (46:00)
LiveclassMay162023PySparkLogicalPhysicalCatalystOptimizer (78:00)
LiveclassMay182023PySparkColumnsandRows (26:00)
LiveclassMay192023PysparkCreatingDataframes (52:00)
LiveclassMay222023PySparkColumnManipulation (74:00)
LiveClassMay242023PySparkRowTransformationsSort (46:00)
LiveclassMay252023PySparkBooleans (49:00)
LiveclassMay262023PySparkNumbers&Spaces (61:00)
LiveclassMay292023PySparkStringManipulationDate (40:00)
LiveclassMay30PySparkNullpracticals&ComplexDataTypes (59:00)
LiveclassJune12023PySparkCompleTypesPracticalUDFTheory (44:00)
LiveclassJune22023PySparkUDFPracticals (61:00)
LiveclassJune52023PySparkDataSources1 (45:00)
LiveclassJune072023PySparkWritemode (83:00)
LiveclassJune082023PySparkCSVJSONParquet (71:00)
LiveclassJune092023DataSourceTextFileandplans (62:00)
LiveclassJune132023PySparkReadingfromDatabase (55:00)
LiveClassJune142023PySparkWritingtoDBandAggregationINtro (67:00)
LiveClassJune162023PySparkAggregationsGroupBY (47:00)
LiveclassJune192023PySparkWindowRollUPCube (77:00)
LiveclassJune202023PySparkJOins1 (59:00)
LiveClassJune212023PysparkJoins2 (52:00)
LiveClassJune272023PySparkCommunicationStrategies (43:00)
LiveClassJune282023PySparkJoinStrategiesHandson (64:00)
LiveclassJuly032023PySparkJoinStrategyHints (43:00)
LiveClassJuly42023PySparkRDD1 (57:00)
LiveClassJuly52023PySparkRDD2Transformation (57:00)
LiveClassJuly062023PySparkRDDActions (35:00)
LiveClassJuly72023PySparkDistributedVariables (46:00)
SparkExecutionModes (46:00)
Feb 5 2024 Batch Live Videos
IntrotoDataFeb52024 (79:00)
Feb7WhatisBigData (32:00)
3rdClassFebBigDataTerminolgies (42:00)
CentralThemeofBigDataFeb122024 (65:00)
PastaStoryHadoopEcsystemINtro (19:00)
HDFS1NameNodeDataNode (59:00)
TacklingtheChallengesofDataNodeandNamenodeFailure (57:00)
MapReducePart1 (72:00)
MapReducePart2Reducers&Combiners (58:00)
TransactionalVsAnalyticalDatawarehouseintrotohive (72:00)
IntroductiontoProgrammingLanguage (28:00)
PythonIntroductionVariables (49:00)
PythongIntrotoDatatypes (11:00)
PythonDataTypesPracticalOperatorsConcepts (25:00)
Python-ControlFlows (34:00)
PythonFunction (20:00)
SparkIntroduction (47:00)
SparkArchitecture (73:00)
SparkSession&Dataframecreation (43:00)
SparkTransformationandActions (50:00)
SparkTypesandSchema (34:00)
StructuredAPIExecutionandLogicalandPhysicalPlan (41:00)
ColumnsRowsandCreatingtheDataframes (26:00)
WorkingwithDataframes (50:00)
ColumnandRowManipulations (48:00)
RowsSortandUnion (35:00)
DifferentTypesofData-BooleanandNumericals (46:00)
SparkStringManipulations (37:00)
SparkDates (20:00)
SparkHandlingNull (36:00)
SparkHandlingComplexDataTypes (18:00)
SparkUDfs (36:00)
SparkPythonUDFProcessandDataFramereader (51:00)
video1972650244 (29:00)
SparkReadingCSVRepartition&coalesce (54:00)
SparkJSONParquetORCandtextfiles (60:00)
SparkReadingDataRDBMS (36:00)
SparkPushedDownQueryandWritingDataintoRDBMS (56:00)
SparkAggregation1 (37:00)
SparkAggregations2 (41:00)
SparkAggregationsGroupBy (26:00)
SparkAggregationsWindowFunctions (52:00)
SparkAggregationsRollUpandCube (35:00)
SparkJoins1 (47:00)
Sparkjoins2 (25:00)
SparkChallengesinJoins (35:00)
SparkJoinsCommunicationStrategies (54:00)
SparkRDD1 (58:00)
Spark-RDDManipulations (24:00)
SparkRDDTransformationsandActions (46:00)
SparkRDDsWrittingintoafile (17:00)
Spark-DistributedVaraibles (55:00)
SparkExecutionModes (46:00)
SparkLifeCycleoutsideandinside (44:00)
SparkPerformancetuning-Caching (23:00)
SparkTuningCachingPersistenceHands-on (32:00)
SparkPTJoinsHints (35:00)
SparkTuningCoalesceHints (32:00)
IntrotoPandaNumpy&Matplot (8:00)
SparkPerformanceTuning2 (43:00)
PerfromanceTuning_Hands-on1 (50:00)
PerformanceTuningHandsOnColumnPrunRowfilter (21:00)
PerformanceTuningSparkPartitioning (46:00)
SparkBucketingPerformanceTuning (27:00)
AQE-Intro (15:00)
SparkPerformanceTuningAQEConcepts (17:00)
SparkPerformanceTuningAQEHands-on (40:00)
PassbatchNamenodefailure (45:00)
APSSRequirementsofprogrammingmodel (55:00)
APSSMapReduceParallelism (53:00)
APSS (64:00)
PASSclassDatawarehouse (116:00)
PASSDataWarehouseSCDs (57:00)
PASSHive (75:00)
PASSWhySparkandWhatisSpark (70:00)
PASS-SparkArchitectureSessionTransformation (121:00)
PASSSparkActions-StructuredAPIs (66:00)
PASS-SparkSchemaLogicalandPhysicalPlan (114:00)
PASS-SparkTypeSafeExplainmodesColsRows (88:00)
PASSSparkColsManipulation (19:00)
PASSSparkSelectcolexpr (16:00)
PASSSparkColManipulations2 (115:00)
PASS-SparkROwManipulation1 (55:00)
PASSSparkRowManipulations (124:00)
PASSSparkUnion (46:00)
PASSDoubtsClearing (60:00)
PASSBoolean (103:00)
PASSSparkNumbers (44:00)

How to Use

After a successful purchase, this item will be added to your courses. You can access your courses in the following ways:

  • From a computer, you can access your courses after logging in.
  • From other devices, you can access your library through this web app in your device's browser.
