Reading XML files

To read XML files, use the step named Get data from XML. To specify which fields to read from the file, you do two things:

  1. First, select the path that identifies the current node. Ideally, this is the repeating node in the file. You select the path by filling in the Loop XPath textbox in the Content tab.
  2. Then specify the fields to get. You do this by filling in the grid in the Fields tab using XPath notation. Each location is relative to the path indicated in the Content tab.
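
The two-part configuration above can be mimicked outside the tool to see how a loop path and relative field paths work together. The following is only a minimal sketch in Python using the standard library, not how the step itself is implemented. The sample document, the `LOOP_XPATH` and `FIELDS` names, and the `read_rows` helper are all hypothetical and exist purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; "country" is the repeating node.
XML = """<countries>
  <country name="Argentina">
    <capital>Buenos Aires</capital>
    <language>Spanish</language>
  </country>
  <country name="Brazil">
    <capital>Brasilia</capital>
    <language>Portuguese</language>
  </country>
</countries>"""

# Plays the role of the Loop XPath. ElementTree evaluates paths relative
# to the root element, so "./country" stands in for /countries/country.
LOOP_XPATH = "./country"

# Plays the role of the Fields grid: field name -> XPath relative to the loop node.
FIELDS = {"name": "@name", "capital": "capital", "language": "language"}


def read_rows(xml_text, loop_xpath, fields):
    """Return one dict per node matched by loop_xpath."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(loop_xpath):
        row = {}
        for field_name, rel_path in fields.items():
            if rel_path.startswith("@"):
                # Attribute of the current node
                row[field_name] = node.get(rel_path[1:])
            else:
                # Text of a child element, located relative to the current node
                row[field_name] = node.findtext(rel_path)
        rows.append(row)
    return rows


rows = read_rows(XML, LOOP_XPATH, FIELDS)
for row in rows:
    print(row)
```

Each match of the loop path produces one row, which is why the repeating node is usually the best choice for it.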

Get data from XML is the step that you will use for reading XML structures in most cases. However, when the data structures in your files are very big or complex, or when the file itself is very large, there is an alternative ...
