Time for action – defining the schema

Let's now create this simplified UFO schema in a single Avro schema file.

Create the following as ufo.avsc:

{ "type": "record",
  "name": "UFO_Sighting_Record",
  "fields" : [
    {"name": "sighting_date", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "shape", "type": ["null", "string"]}, 
    {"name": "duration", "type": "float"}
  ]
}
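
Before going further, it can be worth confirming that the file is well-formed JSON and declares the fields you expect. The following is a minimal sketch that uses only Python's standard json module; it is not an Avro parser. The schema text is inlined so the example runs on its own, and the summarize_schema helper is a hypothetical name introduced purely for illustration.

```python
import json

# Assumption: the schema is inlined here so the sketch is self-contained;
# in practice you would read the text from ufo.avsc instead.
SCHEMA_TEXT = """
{ "type": "record",
  "name": "UFO_Sighting_Record",
  "fields": [
    {"name": "sighting_date", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "shape", "type": ["null", "string"]},
    {"name": "duration", "type": "float"}
  ]
}
"""


def summarize_schema(schema_text):
    """Parse the schema as JSON and return (record name, [(field, type), ...])."""
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("expected a record schema")
    fields = [(f["name"], f["type"]) for f in schema["fields"]]
    return schema["name"], fields


name, fields = summarize_schema(SCHEMA_TEXT)
print(name)
for field_name, field_type in fields:
    print(field_name, field_type)
```

A stray comma or an unbalanced bracket in the file would surface here as a json.JSONDecodeError, which is usually easier to diagnose than a failure later in the pipeline.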

What just happened?

Avro schemas are written in JSON and are usually saved with the .avsc extension. Here we define a record schema with four fields:

  • The sighting_date field of type string, holding a date of the form yyyy-mm-dd
  • The city field of type string, containing the name of the city where the sighting occurred
  • The shape field, an optional ...
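
To make the field types concrete, the sketch below checks a sample record against the types declared in ufo.avsc. It is a hand-rolled illustration covering only the types this schema uses, not the Avro library's own validation. The matches and conforms helpers and the sample values are hypothetical. In the schema, shape is declared as the union ["null", "string"], so it accepts either a null or a string.

```python
# Field types copied from ufo.avsc; a list denotes an Avro union.
FIELD_TYPES = {
    "sighting_date": "string",
    "city": "string",
    "shape": ["null", "string"],  # union: the value may be null or a string
    "duration": "float",
}


def matches(value, avro_type):
    """Return True if value fits avro_type (simplified: null, string, float, unions)."""
    if isinstance(avro_type, list):
        # A union matches if any one of its branches matches.
        return any(matches(value, branch) for branch in avro_type)
    if avro_type == "null":
        return value is None
    if avro_type == "string":
        return isinstance(value, str)
    if avro_type == "float":
        # Simplification: only Python floats are accepted here.
        return isinstance(value, float)
    raise ValueError("type not handled by this sketch: %r" % (avro_type,))


def conforms(record):
    """Assumption: every declared field must be present, with no extra keys."""
    if set(record) != set(FIELD_TYPES):
        return False
    return all(matches(record[name], t) for name, t in FIELD_TYPES.items())


# Made-up sample values, for illustration only.
sample = {
    "sighting_date": "1995-10-09",
    "city": "Iowa City",
    "shape": None,
    "duration": 2.5,
}

print(conforms(sample))
print(conforms(dict(sample, city=None)))
```

Only shape may be null in this schema; setting any of the other three fields to None makes the record fail the check.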
