Hadoop Tutorial:
Developing Big-Data Applications with Apache Hadoop

Interested in live training from the author of these tutorials? See the upcoming Hadoop training course in Maryland, co-sponsored by Johns Hopkins Engineering for Professionals. Or, contact hall@coreservlets.com for info on customized Hadoop courses onsite at your location.


Following is an extensive series of tutorials on developing Big-Data Applications with Hadoop. Since each section includes exercises and exercise solutions, this can also be viewed as a self-paced Hadoop training course. All the slides, source code, exercises, and exercise solutions are free for unrestricted use. The relatively few parts on IDE development and deployment use Eclipse, but of course none of the actual code is Eclipse-specific. These tutorials assume that you already know Java; they definitely move too fast for those without at least moderate prior Java experience. If you don't already know the Java language, please see the Java programming tutorial series.

For options for customized Hadoop training onsite at your organization, please see the Hadoop training course page or email hall@coreservlets.com.

Overview of the Hadoop Tutorial Series

It is becoming increasingly common to have data sets that are too large to be handled by traditional databases, or by any technique that runs on a single computer or even a small cluster of computers. In the age of Big-Data, Hadoop has emerged as the framework of choice for storing and processing such data sets. This tutorial gives a thorough introduction to Hadoop, along with many of the supporting libraries and packages. It also includes a free downloadable virtual machine that already has Hadoop installed and configured, so that you can quickly write code and test it out. See the "Source Code and Virtual Machine" section at the bottom of this tutorial.

These tutorials are written by Hadoop expert Dima May, and are derived from the world-renowned coreservlets.com live training courses. Customized courses on Hadoop are usually taught on-site at customer locations, but JSF 2.2, PrimeFaces, Spring, Hibernate, RESTful Web Services, Android, Hadoop, Ajax/jQuery, GWT, Java 7, and Java 8 training courses at public venues are periodically scheduled for people with too few developers for an onsite course. For descriptions of the various other courses that are available, please see the Java EE and Ajax training course page.

If you find these free tutorials helpful, we would appreciate it if you would link to us.

The Development Environment

This section walks you through setting up and using the development environment, starting and stopping Hadoop, and so forth.

Overview of Hadoop

HDFS Part 1 -- Overview

HDFS Part 2 -- Installation and Shell

HDFS Part 3 -- Java API

HBase Part 1 -- Overview

HBase Part 2 -- Installation and Shell

HBase Part 3 -- Java Client API

HBase Part 4 -- Java Admin API

HBase Part 5 -- Java Client API Advanced Topics

HBase Part 6 -- Key Design

Map-Reduce on YARN Part 1 -- Overview and Installation

Map-Reduce Part 2 -- Developing First MapReduce Job

Map-Reduce Part 3 -- Running Jobs

Map-Reduce Part 4 -- Input and Output

Map-Reduce Part 5 -- MapReduce Features

Map-Reduce Part 6 -- Job Execution on YARN

Map-Reduce Part 7 -- Hadoop Streaming

Map-Reduce Part 8 -- MapReduce Workflows

Oozie

Pig Part 1 -- Introduction

Pig Part 2 -- Joining Data Sets and Other Advanced Topics

Hive

Source Code and Virtual Machine

Installing and configuring Hadoop is a tedious and time-consuming process, so we have provided an Ubuntu virtual machine with Hadoop already installed (plus Java, Eclipse, and all the code from this tutorial and its associated exercises). This VM can be installed for free on any Windows, MacOS, Linux, or Solaris platform. Click on the link below for details.

VM download and installation info

If you already have Hadoop installed, you can also download the source code separately:

More Information

Java

JSF (JavaServer Faces)

Servlets & JSP

Ajax, GWT, & JavaScript

Spring, Hibernate, & JPA

Struts