Before we take a look at the operators that Pig Latin provides, we first need to understand Pig’s data model. This includes Pig’s data types, how it handles concepts such as missing data, and how you can describe your data to Pig.
Pig’s data types can be divided into two categories: scalar types, which contain a single value, and complex types, which contain other types.
Pig’s scalar types are simple types that appear in most programming
languages. With the exception of bytearrays, they are all represented in
Pig interfaces by
java.lang classes, making them
easy to work with in UDFs:
An integer. Ints are represented in interfaces by
java.lang.Integer. They store four-byte
signed integers. Constant integers are expressed as integer
numbers: for example,
A long integer. Longs are represented in interfaces by
They store eight-byte signed integers. Constant longs are
expressed as integer numbers with an
L appended: for example,
An integer of effectively infinite size (it is bounded only
by available memory). Bigintegers are represented in
There are no biginteger constants.1 Chararray and numeric types can be cast to biginteger to produce a constant value in the script. An important note: performance of bigintegers is significantly worse than ints or longs. Whenever your value will fit into one of those types you should use it rather than biginteger. ...