TheDeveloperBlog.com


Pig Example



Use case: Use Pig to find the letter that most often begins a word in a text file.

Solution:

Case 1: Load the data into a bag named "lines". Each line of the file is stored whole in a single field, line, of type chararray.

grunt> lines = LOAD '/user/Desktop/data.txt' AS (line:chararray);

Case 2: Tokenize the text in the bag lines. This produces one word per row.

grunt> tokens = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS token:chararray;

Case 3: Keep only the first letter of each word. The command below uses the SUBSTRING function to take the first character of each token.

grunt> letters = FOREACH tokens GENERATE SUBSTRING(token, 0, 1) AS letter:chararray;

Case 4: Group the letters so that each distinct character gets its own group. The bag in each group holds one entry for every occurrence of that character.

grunt> lettergrp = GROUP letters BY letter;
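
To picture the shape of the grouped relation, here is a small plain-Python analogue (it is not Pig code). The sample letters are made up for illustration. Each dictionary key plays the role of the group field, and each list plays the role of the bag named letters inside that group.

```python
from collections import defaultdict

# Hypothetical sample: one first letter per token, as the previous step would produce.
letters = ["a", "b", "a", "c", "b", "a"]

# Analogue of GROUP letters BY letter: one entry per distinct letter,
# holding every occurrence of that letter.
lettergrp = defaultdict(list)
for letter in letters:
    lettergrp[letter].append(letter)

# Print each group key next to the bag of its occurrences.
for group, bag in sorted(lettergrp.items()):
    print(group, bag)
```

Counting the entries in each list is then the analogue of COUNT(letters) in the next step.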

Case 5: Count the number of occurrences in each group.

grunt> countletter = FOREACH lettergrp GENERATE group, COUNT(letters);

Case 6: Sort the output by count in descending order using the command below. Here $1 refers to the second field of countletter, which is the count.

grunt> OrderCnt = ORDER countletter BY $1 DESC;

Case 7: Limit the output to one row to get the result.

grunt> result = LIMIT OrderCnt 1;

Case 8: Store the result in HDFS. The result is saved in the output directory under the sonoo folder.

grunt> STORE result INTO 'home/sonoo/output';
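
As a way to sanity-check the Pig output on a small file, the whole pipeline can be sketched in plain Python. This is only a local analogue under stated assumptions, not a replacement for the Pig script: the function name and the sample lines are hypothetical, and the delimiter set is assumed to match the defaults documented for TOKENIZE (space, double quote, comma, parentheses, asterisk).

```python
import re
from collections import Counter

# Assumed default TOKENIZE delimiters: space, double quote, comma,
# parentheses, and asterisk.
_DELIMS = re.compile(r'[ ",()*]+')

def most_common_start_letter(lines):
    """Return (letter, count) for the most frequent first character, or None."""
    counts = Counter()
    for line in lines:
        for token in _DELIMS.split(line):
            if token:  # skip empty strings left by leading/trailing delimiters
                counts[token[0]] += 1  # analogue of SUBSTRING(token, 0, 1)
    top = counts.most_common(1)  # analogue of ORDER ... DESC then LIMIT 1
    return top[0] if top else None

# Hypothetical sample input standing in for data.txt.
sample = ["the quick brown fox", "jumps over the lazy dog", "then turns back"]
print(most_common_start_letter(sample))  # prints ('t', 4)
```

Like the Pig version, this sketch is case-sensitive, so "T" and "t" are counted separately. When two letters tie for the highest count, which one is returned should not be relied on in either version.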