Cover by Alan Gates

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

O'Reilly logo

Testing Your Scripts with PigUnit

As part of your development, you will want to test your Pig Latin scripts. Even once they are finished, regular testing helps assure that changes to your UDFs, to your scripts, or in the versions of Pig and Hadoop that you are using do not break your code. PigUnit provides a unit-testing framework that plugs into JUnit to help you write unit tests that can be run on a regular basis. PigUnit was added in Pig 0.8.

Let’s walk through an example of how to test a script with PigUnit. First, you need a script to test:

--pigunit.pig
divs   = load 'NYSE_dividends' as (exchange, symbol, date, dividends);  
grpd   = group divs all;                                                
avgdiv = foreach grpd generate AVG(divs.dividends);                              
store avgdiv into 'average_dividend';

Second, you will need the pigunit.jar JAR file. This is not distributed as part of the standard Pig distribution, but you can build it from the source code included in your distribution. To do this, go to the directory your distribution is in and type ant jar pigunit-jar. Once this is finished, there should be two files in the directory: pig.jar and pigunit.jar. You will need to place these in your classpath when running PigUnit tests.

Third, you need data to run through your script. You can use an existing input file, or you can manufacture some input in your test and run that through your script. We will look at how to do both.

Finally, you need to write a Java class that JUnit can use to run your test. Let’s start with a simple example that runs the preceding script:

 // java/example/PigUnitExample.java
public class PigUnitExample {
    private PigTest test;
    private static Cluster cluster;

    @Test
    public void testDataInFile() throws ParseException, IOException {
        // Construct an instance of PigTest that will use the script
        // pigunit.pig.
        test = new PigTest("../pigunit.pig");

        // Specify our expected output.  The format is a string for each line.
        // In this particular case we expect only one line of output.
        String[] output = { "(0.27305267014925455)" };

        // Run the test and check that the output matches our expectation.
        // The "avgdiv" tells PigUnit what alias to check the output value
        // against.  It inserts a store for that alias and then checks the 
        // contents of the stored file against output.
        test.assertOutput("avgdiv", output);
    }
}

You can also specify the input inline in your test rather than relying on an existing datafile:

// java/example/PigUnitExample.java
    @Test
    public void testTextInput() throws ParseException, IOException  {
        test = new PigTest("../pigunit.pig");

        // Rather than read from a file, generate synthetic input.
        // Format is one record per line, tab-separated.
        String[] input = {
            "NYSE\tCPO\t2009-12-30\t0.14",
            "NYSE\tCPO\t2009-01-06\t0.14",
            "NYSE\tCCS\t2009-10-28\t0.414",
            "NYSE\tCCS\t2009-01-28\t0.414",
            "NYSE\tCIF\t2009-12-09\t0.029",
        };

        String[] output = { "(0.22739999999999996)" };

        // Run the example script using the input we constructed
        // rather than loading whatever the load statement says.
        // "divs" is the alias to override with the input data.
        // As with the previous example, "avgdiv" is the alias
        // to test against the value(s) in output.
        test.assertOutput("divs", input, "avgdiv", output);
    }

It is also possible to specify the Pig Latin script in your test and to test the output against an existing file that contains the expected results:

 // java/example/PigUnitExample.java
    @Test
    public void testFileOutput() throws ParseException, IOException {
        // The script as an array of strings, one line per string.
          String[] script = {
            "divs   = load '../../../data/NYSE_dividends' as (exchange, symbol, 
            "grpd   = group divs all;",
            "avgdiv = foreach grpd generate AVG(divs.dividends);",
            "store avgdiv into 'average_dividend';",
        };
        test = new PigTest(script);
           
        // Test output against an existing file that contains the
        // expected output.
        test.assertOutput(new File("../expected.out"));
    }

Finally, let’s look at how to integrate PigUnit with parameter substitution, and how to specify expected output that will be compared against the stored result (rather than specifying an alias to check):

 // java/example/PigUnitExample.java
    @Test
    public void testWithParams() throws ParseException, IOException {
        // Parameters to be substituted in Pig Latin script before the 
        // test is run.  Format is one string for each parameter,
        // parameter=value
        String[] params = {
            "input=../../../data/NYSE_dividends",
            "output=average_dividend2"
        };
        test = new PigTest("../pigunitwithparams.pig", params);

        String[] output = { "(0.27305267014925455)" };

        // Test output in stored file against specified result
        test.assertOutput(output);
    }

These examples can be run by using the build.xml file included in the examples from this chapter. These examples are not exhaustive; see the code itself for a complete listing. For more in-depth examples, you can check out the tests for PigUnit located in test/org/apache/pig/test/pigunit/TestPigTest.java in your Pig distribution. This file exercises most of the features of PigUnit.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required