User:PRIYAM232/sandbox

Apache Pig is a high-level platform for analyzing large data sets. Pig was originally developed at Yahoo!, and it lets people focus on analyzing bulk data sets and spend less time writing MapReduce programs. Pig consists of two components: Pig Latin, the high-level language in which data analysis programs are written, and a runtime environment for running Pig Latin programs.
Pig Latin provides a rich set of operators for reading, writing, and processing data, and programmers can extend it with their own functions. To analyze data with Apache Pig, programmers write scripts in Pig Latin. A component called the Pig Engine accepts these scripts as input and internally converts them into MapReduce jobs.

Why Do We Need Apache Pig?
With Pig Latin, you can perform MapReduce tasks without writing complex Java code. Pig uses a multi-query approach, which reduces the amount of code needed. A commonly cited comparison is an operation that takes about 200 lines of code (LoC) in Java but as few as 10 LoC in Pig Latin, with development time reduced by roughly 16 times. Pig Latin is a SQL-like language, so it is easy to learn for anyone familiar with SQL.

Features
Pig's main features are a rich set of operators, ease of programming, optimization opportunities, extensibility, support for user-defined functions, and the ability to handle all kinds of data.

Pig Latin
A Pig Latin program consists of a series of operations, or transformations, that are applied to the input data to produce output. Together these operations describe a data flow, which the Pig execution environment translates into an executable representation. Underneath, that representation is a series of MapReduce jobs that the programmer does not need to know about. In this way, Pig lets the programmer focus on the data rather than on how the computation is executed.
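To make the data-flow idea concrete, here is a minimal Python sketch (not Pig's actual planner) of how a short Pig Latin script could map onto the map, shuffle, and reduce phases of a single MapReduce job. The input records, the file name `users.txt`, and the field names are hypothetical; the equivalent Pig Latin statements appear in the comments. The plan Pig really generates may differ, for example by adding a combiner for `COUNT`.

```python
from collections import defaultdict

# Hypothetical input records, standing in for:
#   users = LOAD 'users.txt' USING PigStorage(',') AS (name:chararray, age:int);
USERS = [("alice", 34), ("bob", 17), ("carol", 34), ("dave", 52)]


def map_phase(records):
    """Map: apply the filter and emit (key, value) pairs keyed on the grouping field.

    Corresponds to:
      adults = FILTER users BY age >= 18;
      by_age = GROUP adults BY age;
    """
    for name, age in records:
        if age >= 18:
            yield age, name


def shuffle_phase(pairs):
    """Shuffle: collect all values that share a key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each group, sorted by key.

    Corresponds to:
      counts = FOREACH by_age GENERATE group, COUNT(adults);
    """
    return sorted((key, len(values)) for key, values in groups.items())


if __name__ == "__main__":
    counts = reduce_phase(shuffle_phase(map_phase(USERS)))
    print(counts)  # [(34, 2), (52, 1)]
```

The programmer writes only the four Pig Latin statements; the Pig Engine is responsible for deciding which work runs in the map phase and which in the reduce phase.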
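Pig's extensibility through user-defined functions can be sketched with a small Python UDF. This assumes Pig's Jython UDF support, where a file registered with a statement such as `REGISTER 'udfs.py' USING jython AS myudfs;` can use the `@outputSchema` decorator that Pig supplies. The file name, the `myudfs` namespace, and the `email_domain` function are hypothetical, and the fallback decorator exists only so the file can also be run and tested as ordinary Python. The code avoids Python 3-only syntax because Jython follows Python 2.

```python
# Assumption: when this file is registered as a Jython UDF, Pig defines
# outputSchema; outside Pig we fall back to a no-op stand-in.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        """No-op stand-in for Pig's decorator that declares a UDF's output schema."""
        def decorator(func):
            return func
        return decorator


@outputSchema("domain:chararray")
def email_domain(email):
    """Hypothetical UDF: return the lower-cased part of an e-mail address after '@', or None."""
    if email is None or "@" not in email:
        return None
    return email.split("@", 1)[1].lower()
```

Inside a Pig Latin script, the function would then be called like a built-in, for example `myudfs.email_domain(email)` in a `FOREACH ... GENERATE` statement.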
Execution Modes
Pig has two execution modes:
Local mode: Pig runs in a single JVM and uses the local file system. This mode is suitable only for analyzing small data sets with Pig.
MapReduce mode: Queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster, which may be pseudo-distributed or fully distributed. MapReduce mode on a fully distributed cluster is suited to running Pig on large data sets.
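The execution mode is chosen with the `-x` option of the `pig` launcher (`-x local` or `-x mapreduce`, with MapReduce mode as the default). A small Python helper that builds and runs such a command might look like the following sketch. It assumes `pig` is installed and on the `PATH` (and, for MapReduce mode, that Hadoop is configured); the script name `wordcount.pig` is hypothetical.

```python
import subprocess

# Values accepted by the pig launcher's -x option for the two modes described above.
PIG_MODES = ("local", "mapreduce")


def pig_command(script_path, mode="mapreduce"):
    """Return the argv list that runs a Pig Latin script in the given execution mode."""
    if mode not in PIG_MODES:
        raise ValueError("unknown Pig execution mode: %r" % (mode,))
    return ["pig", "-x", mode, script_path]


def run_pig(script_path, mode="mapreduce"):
    """Run the script, raising CalledProcessError if Pig exits with an error.

    Assumes `pig` is on PATH; MapReduce mode also needs a configured Hadoop cluster.
    """
    return subprocess.run(pig_command(script_path, mode), check=True)


if __name__ == "__main__":
    # Build the command for a quick test on small local data.
    print(pig_command("wordcount.pig", "local"))  # ['pig', '-x', 'local', 'wordcount.pig']
```

A common workflow is to debug a script in local mode on a small sample and then run the same script unchanged in MapReduce mode on the full data set.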