Time for action – defining the schema
Let's now create this simplified UFO schema in a single Avro schema file.
Create the following as ufo.avsc:
{
  "type": "record",
  "name": "UFO_Sighting_Record",
  "fields": [
    {"name": "sighting_date", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "shape", "type": ["null", "string"]},
    {"name": "duration", "type": "float"}
  ]
}
What just happened?
As can be seen, Avro schemas are written in JSON and are usually saved with the .avsc extension. Here we define a schema for a record with four fields, as follows:
- The sighting_date field, of type string, holds a date of the form yyyy-mm-dd
- The city field, of type string, contains the name of the city where the sighting occurred
- The shape field, an optional ...
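To make the field descriptions concrete, the following is a minimal Python sketch that checks a record against the schema using only the standard json module. It is not the Avro library: the helpers matches_type and conforms are hypothetical names introduced here, the sample sighting values are made up for illustration, and the sketch handles only the types that appear in ufo.avsc. It also assumes that every field key must be present and that any Python number is acceptable for a float field; both are choices of this sketch rather than statements about Avro's own behavior.

```python
import json

# The schema text from ufo.avsc, embedded so the sketch is self-contained.
UFO_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "UFO_Sighting_Record",
  "fields": [
    {"name": "sighting_date", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "shape", "type": ["null", "string"]},
    {"name": "duration", "type": "float"}
  ]
}
""")


def matches_type(value, avro_type):
    """Hypothetical helper: does value fit a type used in ufo.avsc?"""
    if isinstance(avro_type, list):
        # A JSON array of types is a union: the value may match any member.
        return any(matches_type(value, member) for member in avro_type)
    if avro_type == "null":
        return value is None
    if avro_type == "string":
        return isinstance(value, str)
    if avro_type == "float":
        # Sketch assumption: accept any Python number except bool
        # (bool is a subclass of int, so it is excluded explicitly).
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"type not handled by this sketch: {avro_type!r}")


def conforms(record, schema):
    """Hypothetical helper: record has exactly the schema's fields, correctly typed."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    return all(matches_type(record[field["name"]], field["type"]) for field in fields)


# Made-up sample values, for illustration only.
sighting = {
    "sighting_date": "1995-10-09",
    "city": "Iowa City",
    "shape": None,      # allowed because shape is the union ["null", "string"]
    "duration": 3.5,
}

print(conforms(sighting, UFO_SCHEMA))
```

Because shape is declared as a union of null and string, the record is accepted whether shape is None or a string, whereas a None value in city would be rejected.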