Practical RDF

Preface

The Resource Description Framework (RDF) offers developers a powerful toolkit for making statements and connecting those statements to derive meaning. The World Wide Web Consortium (W3C) has been developing RDF as a key component of its vision for a Semantic Web, but RDF’s capabilities fit well in many different computing contexts. RDF offers a different, and in some ways more powerful, framework for data representation than XML or relational databases, while remaining far more generic than object structures.

RDF’s foundations are built on a very simple model, but the basic logic can support large-scale information management and processing in a variety of different contexts. The assertions in different RDF files can be combined, providing far more information together than they contain separately. RDF supports flexible and powerful query structures, and developers have created a wide variety of tools for working with RDF.
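To make the idea of combining assertions concrete, here is a minimal sketch in plain Python rather than a real RDF toolkit. It assumes each statement can be treated as a simple (subject, predicate, object) triple and each file as a set of such triples. The example.org URIs, the property names, and the helper functions are all hypothetical, chosen only for illustration. A real RDF merge also has to keep blank nodes from different files distinct, which this sketch ignores.

```python
# A graph is modeled as a set of (subject, predicate, object) triples.
# All URIs and property names below are hypothetical, for illustration only.
EX = "http://example.org/"

BOOK = EX + "book/pracrdf"
PUBLISHER = EX + "terms/publisher"
LOCATION = EX + "terms/location"

# Statements as they might appear in one file: facts about the book.
graph_a = {
    (BOOK, EX + "terms/title", "Practical RDF"),
    (BOOK, PUBLISHER, EX + "org/oreilly"),
}

# Statements from a second, independent file: facts about the publisher.
graph_b = {
    (EX + "org/oreilly", LOCATION, "Sebastopol, CA"),
}


def merge(*graphs):
    """Combine graphs by taking the union of their triples."""
    return set().union(*graphs)


def objects(graph, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def publisher_locations(graph, book):
    """Follow book -> publisher -> location, a question that spans both files."""
    return {
        location
        for publisher in objects(graph, book, PUBLISHER)
        for location in objects(graph, publisher, LOCATION)
    }


merged = merge(graph_a, graph_b)
```

Neither set of statements can say where the book's publisher is located on its own. Calling `publisher_locations(merged, BOOK)` can, because the publisher URI appears in both sets and links them.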

While RDF is commonly described as an arcane tool for working with an enormous volume of complex information, organized with ontologies and other formal models, it also has tremendous value for smaller, more informal projects. I learned about RDF, specifically RDF/XML, when I started working with Mozilla back in the early days of development for this project. At the time, the Mozilla team was using RDF as a way of defining the XML used to provide the data for dynamic tables of contents (TOC) in the application framework. This included providing the data for the favorites, the sidebar, and so on.

I created a tutorial about developing applications using the Mozilla components as part of a presentation I was giving at an XML-related conference. Unfortunately, every time a new release of Mozilla was issued, my tutorial would break. The primary reason was the RDF/XML supported by the application; it kept changing to keep up with the changes then underway in the RDF specification itself. At that point I went to the RDF specifications, managed to read my way through the first specification document (the RDF Model and Syntax Specification), and have been following along with the changes related to RDF ever since.

One main reason I was so interested in RDF and the associated RDF/XML is that, ever since I started working with XML in its earliest days, I’ve longed for a metamodel to define vocabularies in XML that could then be merged with other vocabularies, all of which can be manipulated by the same APIs (Application Programming Interfaces) and tools. I found this with RDF and RDF/XML.

Because my introduction to RDF and RDF/XML had such pragmatic beginnings, my interest in the specification has always focused on how it can be used in business applications today, rather than in some Semantic Web someday. When I approached O’Reilly & Associates about the possibility of writing a book on RDF, I suggested a practical introduction to RDF, and the title and focus of the book were born.

This book attempts to present all the different viewpoints of RDF in such a way that we begin to see a complete picture of RDF from all of its various components. I say “attempt” because I’m finding that just when I think I have my arms around all the different aspects of the RDF specification, someone comes along with a new and interesting twist on a previously familiar concept. However, rather than weaken RDF’s overall utility, these new variations actually demonstrate the richness of the specification.

It is only fair to give you a warning ahead of time that I’m a practical person. When faced with a new technology, rather than ooh and aah and think to myself, “New toy!”, my first response tends to be, “Well, that’s great. But, what can I do with it?” I am, by nature, an engineer, and this book reflects that bias. Much of RDF is associated with some relatively esoteric efforts, including its use within the implementation of the so-called Semantic Web. However, rather than get heavily into the more theoretical aspects of RDF, in this book I focus more on the practical aspects of the RDF specification and the associated technologies.

This isn’t to say I won’t cover theory; all engineers have to have a good understanding of the concepts underlying any technology they use. However, the theory is presented as a basis for understanding, rather than as the primary focus. In other words, the focus of Practical RDF is on using RDF and the associated RDF/XML in our day-to-day technology efforts in order to meet our needs as programming, data, and markup technologists, in addition to the needs of the businesses we support.

This book provides comprehensive coverage of the current RDF specifications, as well as the use of RDF for Semantic Web activities such as the ontology efforts underway at the W3C. However, the focus of this book is on the use of RDF to manage data that may, or may not, be formatted in XML.

Audience

If you want to know how to apply RDF to information processing, this book is for you. Whether your interests lie in large-scale information aggregation and analysis or in smaller-scale projects like weblog syndication, this book will provide you with a solid foundation for working with RDF. If you are looking for a theoretical explanation of intelligent web bots, tutorials on how to create knowledge systems, or an in-depth look at topic maps and ontologies, you should probably look elsewhere. Also, a basic understanding of XML and web technologies is helpful for reading this book, so you may want to start with those first if you don’t have any background in them.

Structure of This Book

The first section of this book (Chapter 1 through Chapter 6) focuses on the RDF specifications. Chapter 1 introduces RDF and looks at some of the historical events leading up to the current RDF effort. It also examines when you would, and would not, use RDF/XML as compared to “standard” XML.

Following the introductory chapter, the rest of the first section covers the RDF specification documents themselves. This includes the RDF Semantics and Concepts and Abstract Model specifications (covered in Chapter 2); the basic XML syntax (covered in Chapter 3); some of the more unusual RDF constructs, namely containers, collections, and reification (covered in Chapter 4); and the RDF Schema (covered in Chapter 5). To pull all of this together, Chapter 6 then uses all we’ve learned about RDF to that point to create a relatively complex vocabulary, which is then used for demonstration purposes throughout the rest of the book.

The second section of the book focuses on programming language support, as well as the tools and utilities that allow a person to review, edit, parse, and generally work with RDF/XML. Chapter 7 focuses on various RDF editors, including those with graphical support for creating RDF models. In addition, the chapter also covers an RDF/XML browser, as well as a couple of the more popular RDF/XML parsers.

To be useful, any specification related to data requires tools to work with the data, and RDF is no exception. Chapter 8 provides an overview and examples of accessing and generating RDF/XML using Jena, a Java-based RDF API. Chapter 9 follows with coverage of APIs based in PHP, Perl, and Python (the three Ps).

After the programming language grounding, the book refocuses on RDF’s data roots with a chapter that examines some of the RDF query languages used to query RDF model data, in a database or as persisted to RDF/XML documents. Chapter 10 also has the code for the RDF Query-O-Matic, a utility that processes RDQL (RDF Query Language) queries.

The last chapter in the second section finishes the review of programming and framework support for RDF by looking at some other programming language support, as well as some of the frameworks, such as Redland and Redfoot.

The last section of the book then focuses on the use of RDF and RDF/XML, beginning with an overview of the W3C’s ontology language effort, OWL. If RDF is analogous to the relational data model, and RDF/XML is analogous to relational database systems, then OWL is equivalent to applications such as SAP and PeopleSoft, which implement a business domain model on top of the relational store.

The next chapter focuses on RSS, the most widely used implementation of RDF/XML, which supports syndication and aggregation of news sources. RSS is used to syndicate news sources as diverse as salon.com and Wired, as well as online personal journals known as weblogs, a web technology gaining popularity.

A specification is only as good as the applications that use it, and RDF is used in a surprising number of sophisticated commercial and noncommercial applications. I say “surprising” primarily because RDF is not a well-known specification. However, it is one of the older specifications. RDF’s maturity, combined with the specification’s data manipulation and organizational capabilities, makes it easy to see why interest in RDF is growing.

Note

The RDF Validator-generated graphs have been replaced with illustrations in order to fit the examples within the constraints imposed by the page width.

Conventions Used in This Book

The following font conventions are used in this book:

Italic is used for:

Pathnames, filenames, and program names
Internet addresses, such as domain names and URLs
New terms where they are defined

Constant width is used for:

Command lines and options that should be typed verbatim
Names and keywords in programs, including method names, variable names, and class names
XML element tags
URIs used as identifiers by RDF

Constant-width bold is used for emphasis in program code lines.

Constant-width italic is used for replaceable arguments within program code.

Tip

This icon indicates a tip, suggestion, or general note.

Warning

This icon indicates a warning or caution.

How to Contact Us

We have tested and verified the information in this book to the best of our ability, but you may find that features have changed (or even that we have made a few mistakes!). Please let us know about any errors you find, as well as your suggestions for future editions, by writing to:

O’Reilly & Associates, Inc.

1005 Gravenstein Highway North

Sebastopol, CA 95471

1-800-998-9938 (in the U.S. or Canada)

1-707-829-0515 (international/local)

1-707-829-0104 (fax)

You can also send us messages electronically. To be put on the mailing list or request a catalog, send email to:

info@oreilly.com

To ask technical questions or comment on the book, send email to:

bookquestions@oreilly.com

We have a web site for the book, where we’ll list examples, errata, and any plans for future editions. You can access this page at:

http://www.oreilly.com/catalog/pracrdf/

For more information about this book and others, see the O’Reilly web site:

http://www.oreilly.com/

Acknowledgments

First among the people I want to acknowledge is the RDF Working Group, the folks who have worked the last two-plus years to get the updated RDF specifications out on the street and into action. The listing of people is quite extensive, but I want to specifically mention a few who were particularly helpful to me while I worked on the book: Brian McBride, Pat Hayes, Dave Beckett, and Frank Manola.

This book would never have hit the streets if it weren’t for the patience and good humor of the lead editor, Simon St.Laurent. During the almost year and a half this book was in development, Simon never once lost patience, though other editors might have given up on RDF as a topic.

In addition to Simon, I want to extend my appreciation to the technical editors on the book including Dorothea Salo, Dave Beckett, Uche Ogbuji, and Andy Seaborne. Less formally, I want to also extend my appreciation to those from the RDF community who were so kind as to review one or more chapters in the book for completeness and accuracy:

Danny Ayers	Kevin Marks
Chris Parnell	Aaron Swartz
Chris Dolin	David Jacobs
Emmanual Pietriga	Bill Simoni
Ken MacLeod	Seth Ladd
York Sure	Bill Kerney
Ben Hammersley	Jens Jacob Andersen
David Allsop	Resty Cena
Barry Sheward	Tingley Chase

My apologies if I have inadvertently left someone off this list.

Finally, I want to extend my thanks and appreciation to the organizations and people responsible for the software and technologies covered in this book. These include:

Jena—Hewlett-Packard and Brian McBride, Janet Bruten, Jeremy Carroll, Steve Cayzer, Ian Dickinson, Chris Dollin, Martin Merry, Dave Reynolds, Andy Seaborne, Paul Shabajee, and Stuart Williams
Brownsauce—Hewlett-Packard and Damien Steer
IsaViz—Emmanual Pietriga
The RDF Validator—Art Barstow and Emmanual Pietriga
Intellidimension’s RDF Gateway and Geoff Chappell
AmphetaDesk and Morbus Iff
Ginger Alliance PerlRDF and Petr Cimprich
RDFLib and Redfoot from Daniel “elkeon” Krech
RDFStore and Alberto Reggiori
SMORE and Aditya Kalyanpur
RDF API for PHP and Chris Bizer
Redland and Dave Beckett
C# Drive and Rahul Singh
Wilbur from Ora Lassila and Nokia
Plugged In Software’s Tucana Knowledge Store and David Wood
Sidrean Software’s Seamark Server and Bradley Allan
Adobe’s XMP
Sesame’s Arjohn Kampan
Meerkat—O’Reilly and Rael Dornfest
Ranchero Software’s NetNewsWire and Brent Simmons
The Mozilla development team members
Stanford University’s Knowledge Modeling Group and Protégé
The Dublin Core effort
FOAF, FOAFbot, and FOAF-O-Matic by Leigh Dobbs, Edd Dumbill, Dan Brickley, Libby Miller, rdfweb-dev, and friends
The web sites from several weblogging friends including Allan Moult, Chris Kovacs, Jonathon Delacour, Loren Webster, and Dorothea Salo

Books don’t get written in a vacuum and this book is no exception. I’d like to thank some special friends for their support and encouragement during the long, long period this book was in development. This includes my best friend, Robert Porter, as well as AKM and Margaret Adam, Jonathon Delacour, Simon St.Laurent, Allan Moult, Chris Kovacks, Loren Webster, Jeneane Sessum, Chris Locke, Dorothea Salo, and others whom I met in the threaded void known as the Internet. Thanks, friends. It’s finally done.
