Saturday 21 June 2014

Using Apache Pig to perform Hadoop MapReduce for WordCount Problem

1. Download the Apache Hadoop and Apache Pig packages and JDK 1.7, install them at specific locations, and add them to the path using > sudo gedit /etc/profile. For Pig installation, refer to http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.1/CDH4-Installation-Guide/cdh4ig_topic_16_2.html.
I have kept my WordCount files inside the pig-wordcount directory as shown above

2. There are two files inside the directory: one is the input file and the other is the Pig Latin code file for WordCount

3. As seen above, this is the input to the Pig script, which consists of different sentences, or more precisely, words

4. This is the heart of the tutorial: the Pig Latin WordCount code, which is much simpler and shorter than the equivalent Java code. This code counts the occurrences of each word
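For readers who cannot see the screenshot, a typical Pig Latin WordCount script looks roughly like the sketch below (the file name 'input.txt' and the alias names are assumptions; the output directory matches the 'wordcount' directory mentioned in the later steps):

```
-- load each line of the input file as a single chararray
lines = LOAD 'input.txt' AS (line:chararray);
-- split every line into individual words
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- group identical words together
grouped = GROUP words BY word;
-- count the occurrences of each word
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
-- write the result to the wordcount output directory
STORE counts INTO 'wordcount';
```

Each statement only declares a step in the dataflow; nothing executes until STORE, at which point Pig compiles the whole script into MapReduce jobs.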

5. Now open terminal and go to the location where the files are stored

6. Check whether Pig is working properly by running the 'pig' command

7. Now run the Pig script with the following command
    > pig wordcount.pig
where wordcount is the name of the Pig script file, with the .pig extension

8. This is how the MapReduce job starts running to perform the operation and get the desired result

9. After completion you will get the above message, and you will also observe that a wordcount directory has been created which contains the output, as seen above

10. So the final output is the count of each word from the input file

So what's the big deal...? We can achieve this using any normal programming language..!!!
Firstly, this is not a common programming language like Java, PHP, ASP or any other. It is 'Pig Latin', the language used with Apache Pig for processing and analyzing very large datasets containing billions of records. It processes data using Apache Hadoop's MapReduce functionality to produce the output.
For now we have considered a very small dataset, so you might not observe its beauty. It works great for processing very large datasets (in terabytes or petabytes). Many companies like Facebook, Twitter or Google use this architecture for processing their data.
Even in my last blog, where I showed how to analyze data using Microsoft Power BI, it took a huge amount of time to process and get the desired result. That's the benefit of using Hadoop.
Thank you..!

Friday 20 June 2014

Analyzing Facebook Data using Power BI

1. We will be using Excel 2013 with Power Query installed. From the 'Other Sources' option, click 'From Facebook' to import your Facebook data

2. You will be prompted to enter your Facebook credentials. Then specify an object ('me') and a connection (feeds, comments, likes, posts, notifications, etc.)


3. After you import, you will get a Query Editor as shown above with different fields (here it shows the pages which I have liked) and the related data

4. We perform some ETL (Extract, Transform & Load) operations to filter out the data which is required
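Behind the scenes, Power Query records each of these steps as Power Query M code, visible in the Advanced Editor. A minimal sketch of what a query like this might look like (the column names "name" and "category" are assumptions; your actual fields depend on the connection you chose):

```
let
    // connect to the Graph API "likes" connection for the logged-in user
    Source = Facebook.Graph("https://graph.facebook.com/me/likes"),
    // keep only the columns we are going to work with
    Kept = Table.SelectColumns(Source, {"name", "category"}),
    // drop rows that have no category value
    Filtered = Table.SelectRows(Kept, each [category] <> null)
in
    Filtered
```

Every filter or column removal you click in the editor simply appends another step like these, which is why the Applied Steps pane mirrors the code.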

 5. Here we keep only those fields which we are going to work with

6. The final table will be as shown, with the different operations applied (refer to Applied Steps at the extreme right)


7. Give this query a name ('My Facebook Like' here) and click 'Apply & Close' at the top left corner


8. You will see a window while it waits for the data to be loaded from Facebook.com over your internet connection


9. A total of 147 rows were loaded, as that is the number of different pages I have liked on Facebook


10. After loading is done, click on the Insert tab and then Power View, as shown, to analyze the data


11. Select the fields which you want to put on the different axes for analysis, and then select your visualization type (pie chart, bar graph, etc.). This view shows different personalities and their like counts on Facebook. Even PM Narendra Modi has fewer likes (approx. 19 million) than Hollywood singers :)

 
12. In a similar way, you can perform an analysis of the friends in your Facebook list by selecting 'Friends'

13. This view shows an analysis of the count of Facebook friends sharing the same name, for names shared by five or more friends

14. This last view shows an analysis of my 'Feeds' from the date I started using Facebook, i.e. status updates, links, photo updates, videos, etc., on my Facebook wall

      Thank you !!!

Thursday 19 June 2014

To create a dynamic input parameter in Power Query using SQL Server


1. Create a database and a table with some values inserted; we will fetch data from it into Excel 2013.

2. Check that the Power Query plugin is installed.

3. Import data from SQL Server into Excel 2013 using the Power Query option shown above.

4. Enter the necessary credentials and the SQL query to import the data.

5. After clicking OK, the above window will be displayed. Before doing dynamic filtering, we should first define a static filter. Therefore, select the arrow on Emp_address and apply a filter as per your criteria.

6. Here it shows a filter where Emp_address equals 'London'. Then click OK.

7. Now, when you click on the View tab and then Advanced Editor, you will find the following code.
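Since the Advanced Editor code is only visible in the screenshot, here is a sketch of what the generated M code for this static filter typically looks like (the server name, database name, and table name are assumptions; Emp_address and 'London' come from the steps above):

```
let
    // import the table from SQL Server with the query entered earlier
    Source = Sql.Database("localhost", "TestDB", [Query = "select * from Employee"]),
    // the static filter added through the column dropdown
    FilteredRows = Table.SelectRows(Source, each [Emp_address] = "London")
in
    FilteredRows
```

The key point for the next steps is that "London" is hard-coded here; the rest of this post replaces it with a value read from a parameter sheet.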

8. Create a new, simple Excel sheet with two columns [Parameter Name & Value] as shown, and save the sheet.

9. Give the new Excel sheet a name [Pass_Parameter] and make the first row the header.

10. Open the sheet and select any one value to perform a drill-down operation.

11. Now view the code in the Advanced Editor and copy the two selected lines.

12. You will see the sheet like this after performing the drill-down operation.

13. Now open the main sheet and paste the code copied from the previous editor, as shown. Then change the Source [here 'Par'] and the value name [here 'CityValue' (you can give one of your choice)] and hit OK.
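Putting the pieces together, the combined query might look like the following sketch in M (server, database, and table names are assumptions carried over from the earlier steps; Pass_Parameter and CityValue follow the names used in this post):

```
let
    // 'Par' reads the Pass_Parameter table from the workbook
    Par = Excel.CurrentWorkbook(){[Name = "Pass_Parameter"]}[Content],
    // drill down to the single parameter value in the Value column
    CityValue = Par{0}[Value],
    // the original SQL Server import
    Source = Sql.Database("localhost", "TestDB", [Query = "select * from Employee"]),
    // the hard-coded "London" is replaced by the parameter value
    FilteredRows = Table.SelectRows(Source, each [Emp_address] = CityValue)
in
    FilteredRows
```

Because the filter now references CityValue instead of a literal, changing the cell in the parameter sheet and refreshing re-runs the query with the new value.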

14. Now enter the parameter value to filter on [here I gave 'Mumbai'].


15. When you click Refresh on the left side, as shown in the circle, you will get the filtered result. You can give a different parameter value of your choice to filter the dataset. That's all!!!
 
        Thank you !!!