This event is now over

Event Details


Apache Hadoop, the open-source data management framework that helps organizations analyze massive volumes of structured and unstructured data, is a very hot topic across the tech industry. Employed by big-name websites such as eBay, Facebook, and Yahoo, Hadoop is widely tagged, along with cloud computing, as one of the most sought-after tech skills for 2013 and the coming years.

  •  Understand Big Data & Hadoop Ecosystem
  •  Hadoop Distributed File System – HDFS
  •  Use Map Reduce API and write common algorithms
  •  Best practices for developing and debugging map reduce programs
  •  Advanced Map Reduce Concepts & Algorithms
  •  Hadoop Best Practices, Tips and Techniques
  •  Managing and Monitoring Hadoop Cluster
  •  Importing and exporting data using Sqoop
  •  Leverage Hive & Pig for analysis
  •  Running Hadoop on Cloud
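
As a taste of the "write common algorithms" objective, here is a minimal word-count sketch in plain Python that mimics the map, shuffle/sort, and reduce phases. This is a pure-Python illustration of the programming model only, not the actual Hadoop Java API:

```python
import itertools

def mapper(line):
    # Map phase: emit (word, 1) for every word in the input line
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce phase: sum all counts that arrived for one key
    return (word, sum(counts))

def run_job(lines):
    # Collect all mapper output, then sort by key -- this sort stands in
    # for Hadoop's shuffle/sort phase between map and reduce
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=lambda kv: kv[0])
    return [reducer(key, (c for _, c in group))
            for key, group in itertools.groupby(pairs, key=lambda kv: kv[0])]

print(run_job(["Hadoop is fast", "hadoop scales"]))
# → [('fast', 1), ('hadoop', 2), ('is', 1), ('scales', 1)]
```

In real Hadoop, the mapper and reducer run on different machines and the framework handles the shuffle; the logic per record, however, is exactly this shape.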


• What is Big Data & Why Hadoop?

  • Big Data Characteristics, Challenges with traditional systems

• Hadoop Overview & its Ecosystem

  • Anatomy of Hadoop Cluster, Installing and Configuring Hadoop

  • Hands-On Exercise

• HDFS – Hadoop Distributed File System

  • Name Nodes and Data Nodes

  • Hands-On Exercise

• Map Reduce Anatomy

  • How Map Reduce Works

  • The Mapper & Reducer, InputFormats & OutputFormats, Data Types & Custom Writables
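
To make "InputFormats" concrete: a TextInputFormat-style record reader turns the raw bytes of an input split into (byte offset, line) key/value pairs for the mapper. A simplified pure-Python illustration (the real implementation lives in Hadoop's Java RecordReader classes):

```python
def text_records(data: bytes):
    # Emit (byte offset, line) pairs, the same key/value shape that
    # TextInputFormat hands to each mapper call
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip(b"\n").decode()
        offset += len(line)

print(list(text_records(b"one\ntwo\n")))
# → [(0, 'one'), (4, 'two')]
```

This is why the key in a classic Hadoop word-count mapper is a `LongWritable` offset that most programs simply ignore.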

• Developing Map Reduce Programs

  • Setting up Eclipse Development Environment, Creating Map Reduce Projects, Debugging and Unit Testing Map Reduce Code, Testing with MRUnit

  • Hands-On Exercise

• Advanced Map Reduce Concepts

  • Combiner, Partitioner, Counter, Compression, Setup and teardown, Speculative Execution, Zero Reducer and Distributed Cache

  • Hands-On Exercise 
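
Two of the concepts above, the combiner and the partitioner, can be sketched in a few lines of plain Python. This is an illustration of the idea under simplified assumptions, not Hadoop's own code: a combiner does reducer-style aggregation on each mapper's local output before it crosses the network, and a hash partitioner routes every occurrence of a key to the same reducer.

```python
from collections import Counter

def map_with_combiner(lines):
    # Combiner idea: aggregate word counts locally on the map side,
    # so far fewer (word, count) pairs are shipped during the shuffle
    local = Counter()
    for line in lines:
        for word in line.split():
            local[word] += 1
    return dict(local)

def default_partition(key, num_reducers):
    # HashPartitioner idea: the same key always lands on the same reducer
    return hash(key) % num_reducers

print(map_with_combiner(["a a b", "a"]))
# → {'a': 3, 'b': 1}
```

Note that a combiner is only safe when the reduce operation is associative and commutative (sums yes, averages no) -- a point the course's best-practices material typically stresses.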

• Advanced Map Reduce Algorithms

  • Sorting, Searching and Indexing, Multiple Inputs, Chaining multiple jobs

  • Joins, Handling Binary & Unstructured data

  • Hands-On Exercise
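
The join topic above can be illustrated with a reduce-side join sketch in plain Python (hypothetical user/order schemas, not from the course material): each mapper tags a record with its source table, the shuffle groups records by the join key, and the reducer pairs the tagged values.

```python
from itertools import groupby

def map_users(rec):
    # rec = (user_id, name); tag "U" marks the users table
    yield rec[0], ("U", rec[1])

def map_orders(rec):
    # rec = (user_id, item); tag "O" marks the orders table
    yield rec[0], ("O", rec[1])

def join(users, orders):
    pairs = [kv for r in users for kv in map_users(r)]
    pairs += [kv for r in orders for kv in map_orders(r)]
    pairs.sort(key=lambda kv: kv[0])          # shuffle/sort by join key
    out = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        vals = [v for _, v in group]
        names = [v for tag, v in vals if tag == "U"]
        items = [v for tag, v in vals if tag == "O"]
        out += [(key, n, i) for n in names for i in items]
    return out

print(join([(1, "ann")], [(1, "book"), (1, "pen")]))
# → [(1, 'ann', 'book'), (1, 'ann', 'pen')]
```

The tag is what lets a single reducer distinguish which side of the join each value came from once the two inputs are mixed together.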

• Advanced Tips & Techniques

  • Determining the optimal number of reducers, Skipping bad records

  • Partitioning into multiple output files & Passing parameters to tasks

  • Optimizing Hadoop Cluster & Performance Tuning
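
The "partitioning into multiple output files" technique above can be sketched as follows -- a MultipleOutputs-style illustration in plain Python, with a hypothetical routing rule (one output file per first letter of the key) standing in for whatever rule a real job would use:

```python
from collections import defaultdict

def write_partitioned(records):
    # Route each (key, value) record to a named output "file",
    # the way MultipleOutputs lets one reducer write several files
    files = defaultdict(list)
    for key, value in records:
        files[f"part-{key[0]}"].append((key, value))
    return dict(files)

print(write_partitioned([("apple", 1), ("ant", 2), ("bee", 3)]))
# → {'part-a': [('apple', 1), ('ant', 2)], 'part-b': [('bee', 3)]}
```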

• Monitoring & Management of Hadoop

  • Managing HDFS with Tools like fsck and dfsadmin

  • Using HDFS & Job Tracker Web UI

  • Routine Administration Procedures

  • Commissioning and decommissioning of nodes

  • Hands-On Exercise

• Using Hive & Pig

  • Hive Basics & Pig Basics

  • Hands-On Exercise

• Sqoop

  • Importing and Exporting data from an RDBMS

  • Hands-On Exercise

• Deploying Hadoop on Cloud

  • Deploying and Configuring Hadoop on Amazon EC2

  • Using Amazon EMR (Elastic MapReduce)

• Hadoop Best Practices and Use Cases

Instructor Bio

The trainer has 15+ years of industry experience, including 5+ years at customer locations (USA, UK, Japan, Singapore) and 3+ years of training experience overall. He worked with Wipro Technologies for over 13 years, growing from role to role, and has served prestigious customers such as Mitsubishi Corporation, Ministry of Defence (Singapore), DAIWA Institute of Research, Marsh & McLennan, Zurich Insurance Company, Novartis, and Pfizer. He has worked extensively with technologies such as J2EE, Oracle, PL/SQL, Data Warehousing, Analytics, Big Data, and Hadoop, and has sound consulting exposure to scalable multi-tier web applications and analytical and ERP solutions using J2EE and other competitive technologies in UNIX, Windows, and Linux environments. He has led teams in contemporary areas such as Big Data Analytics, mobile apps, SaaS, digital marketing, and web analytics, and has conducted 50+ trainings and consulting engagements for his team and organizations on technical topics (PL/SQL, Oracle, SQL Tuning, Big Data & Hadoop) as well as Process, Program & Project Management.


For more details, please call Mr. Zach: 9538878795


February 26, 2013 — 9:00 am to
June 30, 2013 — 6:00 pm



Pune, New Delhi