Sunday 20 July 2014

Microsoft Windows Azure HDInsight - "A Cloud based Big Data Solution"



       What is HDInsight?

  • HDInsight is Microsoft's Hadoop-based service that brings a 100% Apache Hadoop solution to the cloud. 
  • HDInsight comes in two flavours: 
  1.  HDInsight Server, for local on-premises installation of the Hadoop distribution
  2.  HDInsight Service on Windows Azure, which is the easiest way to deploy, manage and scale a Hadoop-based solution
Why HDInsight?
  • Lets you build a Hadoop cluster in minutes when you need it and tear it down after your MapReduce jobs have run, which keeps costs down
  • Works seamlessly with Windows Azure Blob storage, a mechanism for storing large amounts of unstructured data that can be accessed from anywhere
Why it can be useful for a particular organization
  • HDInsight is Microsoft's Azure-based cloud service that lets you gain deeper, richer insight for better decision making from data of any kind, any size and any location.
  • It is mainly used to manage, analyse, and report on big data.

       Technology used & Skillset required
  • Windows Azure Cloud service subscription
  • Apache Hadoop
  • MapReduce & HDFS
  • Pig, Hive, Sqoop
  • Programming language (C#, Java, Python)
  • Structured Query Language (SQL)

      Scenarios where we can opt to use HDInsight

1.  Problem statement: The famous word count problem
Assume we have a large text file, perhaps a novel. We need to find the number of occurrences of each word in that file.
Solution:
We can use Hadoop MapReduce in HDInsight to process the file quickly and get the desired result.
Later we can analyse that result using Power BI.
                          For a sample demo on word count problem click here
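
To make the map and reduce steps of the word count concrete, here is a minimal in-memory sketch in Python. It is not the HDInsight sample itself: the function names (map_words, reduce_counts, word_count) and the rule for what counts as a word are assumptions made for illustration, and on a real cluster the framework would distribute these two steps across many nodes.

```python
import re
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    # Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def reduce_counts(pairs):
    """Reduce step: group the pairs by word and sum the counts."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample_lines = ["the cat and the hat", "The Cat"]
    for word, count in sorted(word_count(sample_lines).items()):
        print(word, count)
```

The same split into a mapper and a reducer is what lets Hadoop process a very large file in parallel: each node maps its own part of the input, and the counts for a given word are summed afterwards.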

2.  Problem statement: Analysing crime data
Suppose we have crime data for the past 5 years containing billions of records.
We need to achieve a simple goal, such as calculating the number of crime incidents of each type.
Solution:
We can achieve this in HDInsight using Hadoop MapReduce or Hive, with output similar to the table shown below.

Table 1. Crime data summarized by type (showing top 5 crimes)

Crime type            Number of incidents
Theft                 823689
Murder                746563
Malicious mischief    445654
Drink & drive         339823
Damage to property    123377
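
The summary by crime type is a group-and-count aggregation, roughly what a GROUP BY query in Hive would express. The Python sketch below shows the same logic on a tiny in-memory sample. The column names (incident_id, crime_type), the sample rows and the function name summarize_by_type are all hypothetical, since the layout of the real data set is not described here.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in CSV form; the real data set and its column
# names are not given in this post.
SAMPLE_CSV = """incident_id,crime_type
1,Theft
2,Murder
3,Theft
4,Drink & drive
5,Theft
6,Murder
"""


def summarize_by_type(csv_file, type_column="crime_type", top_n=5):
    """Count incidents per crime type and return the top_n most frequent."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[type_column] for row in reader)
    return counts.most_common(top_n)


top_crimes = summarize_by_type(io.StringIO(SAMPLE_CSV))
for crime_type, incidents in top_crimes:
    print(f"{crime_type}\t{incidents}")
```

With billions of records the counting would run on the cluster rather than in a single process, but the shape of the result (one row per crime type, sorted by count) stays the same.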

For more details visit azure.microsoft.com