C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
Pig ExampleUse case: Using Pig find the most occurred start letter. Solution: Case 1: Load the data into bag named "lines". The entire line is stuck to element line of type character array. grunt> lines = LOAD "/user/Desktop/data.txt" AS (line: chararray); Case 2: The text in the bag lines needs to be tokenized this produces one word per row. grunt>tokens = FOREACH lines GENERATE flatten(TOKENIZE(line)) As token: chararray; Case 3: To retain the first letter of each word type the below command .This commands uses substring method to take the first character. grunt>letters = FOREACH tokens GENERATE SUBSTRING(0,1) as letter : chararray; Case 4: Create a bag for unique character where the grouped bag will contain the same character for each occurrence of that character. grunt>lettergrp = GROUP letters by letter; Case 5: The number of occurrence is counted in each group. grunt>countletter = FOREACH lettergrp GENERATE group , COUNT(letters); Case 6: Arrange the output according to count in descending order using the commands below. grunt>OrderCnt = ORDER countletter BY $1 DESC; Case 7: Limit to One to give the result. grunt> result =LIMIT OrderCnt 1; Case 8: Store the result in HDFS . The result is saved in output directory under sonoo folder. grunt> STORE result into 'home/sonoo/output';
Next TopicPig UDF
|