What is HDInsight?
- HDInsight is Microsoft's Hadoop-based service that brings a 100% Apache Hadoop solution to the cloud.
- HDInsight comes in two flavours:
  - HDInsight Server, for local on-premises installation of the Hadoop distribution
  - HDInsight Service on Azure, which is the easiest way to deploy, manage and scale a Hadoop-based solution
- It lets you build a Hadoop cluster in minutes when you need it and tear it down after your MapReduce jobs have run, which makes it cost-efficient.
- It works seamlessly with Windows Azure Blob storage, a mechanism for storing large amounts of unstructured data that can be accessed from anywhere.
- The HDInsight service is Microsoft's Azure-based cloud service that lets you gain deeper, richer insight for better decision-making from data of any size, in any location.
- It is basically used to manage, analyse, and report on big data.
Technology used & skillset required
- Windows Azure Cloud service subscription
- Apache Hadoop
- MapReduce & HDFS
- Pig, Hive, Sqoop
- Programming language (C#, Java, Python)
- Structured Query Language (SQL)
Scenarios where we can opt to use HDInsight
1. Problem statement: The famous word count problem
Assume we have a large text file, perhaps a novel. We need to find the number of occurrences of each word in the file.
Solution:
We can use Hadoop MapReduce in HDInsight to process the file quickly and get the desired result. Later, we can analyse that result using Power BI.
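To make the MapReduce idea concrete, here is a minimal sketch of word count in plain Python. It only imitates the map, shuffle/sort and reduce phases on one machine; it is not the HDInsight sample itself, and the function names (`map_words`, `reduce_counts`, `word_count`) and the simple rule for what counts as a word are assumptions made for illustration.

```python
import re
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    # Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle/sort and reduce over an iterable of text lines."""
    # Map: turn every line into (word, 1) pairs.
    pairs = [pair for line in lines for pair in map_words(line)]
    # Shuffle/sort: bring pairs with the same word next to each other,
    # which is what Hadoop does between the map and reduce phases.
    pairs.sort(key=itemgetter(0))
    # Reduce: one call per distinct word.
    return dict(
        reduce_counts(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    # Illustrative input only; a real job would read the novel from storage.
    for word, count in word_count(["The cat and the hat", "the end"]).items():
        print(word, count)
```

On a real cluster the map and reduce steps would run in parallel on many nodes, and the framework would take care of the shuffle; the logic of each step stays the same.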
For a sample demo on word count problem click here
2. Problem statement: Analysing crime data
Suppose we have crime data for the past 5 years containing billions of records. We need to perform a simple task such as calculating the number of crime incidents of each type.
Solution:
We can achieve this in HDInsight using Hadoop MapReduce or Hive. The output can look like the table below.
Table 1. Crime data summarized by type (showing top 5 crimes)
Crime type | Number of incidents
Theft | 823689
Murder | 746563
Malicious mischief | 445654
Drink & drive | 339823
Damage to property | 123377
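The kind of aggregation behind a summary like Table 1 can be sketched in a few lines of Python. This is a single-machine illustration of counting records by type, not the Hive or MapReduce job itself. The CSV layout, the `crime_type` column name and the sample rows are all hypothetical, since the source does not describe the format of the crime data.

```python
import csv
import io
from collections import Counter


def count_crimes_by_type(csv_file, type_column="crime_type"):
    """Count records per crime type in a CSV file that has a header row.

    Assumption: the type is stored in a column named `type_column`;
    the real data set may use a different name or layout.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        crime_type = row[type_column].strip()
        if crime_type:  # skip records with no type recorded
            counts[crime_type] += 1
    return counts


def top_crimes(counts, n=5):
    """Return the n most frequent crime types as (type, count) pairs."""
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = io.StringIO(
        "crime_type,year\n"
        "Theft,2012\n"
        "Murder,2012\n"
        "Theft,2013\n"
    )
    for crime_type, total in top_crimes(count_crimes_by_type(sample)):
        print(crime_type, "|", total)
```

With billions of records the same count-per-key logic would be spread across the cluster, for example as a Hive query that groups by crime type or as a MapReduce job that emits one pair per record.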
For more details visit azure.microsoft.com