When URL and Doctitle Namespaces Don’t Suffice
You want to cram as much as possible into the URL and doctitle namespaces, because these are what you get back “for free” from search engines. But inevitably, there will be missing pieces. Consider the newsgroup results in Figure 8.2. There’s clearly a value that will map well to creation date: namely, the posting date of each newsgroup message. However, that value appears neither in the document’s URL nor in its document title. In fact, documents in the primary conferencing docbase had no HTML document titles, because they weren’t HTML documents; they were newsgroup messages that carried their fielded information in NNTP headers.
In these cases, you have to dig out the information another way. How? That depends on the nature of the docbase—whether it resides remotely on a server you don’t control or locally on a server you do control, whether you have access to the server’s file system, and possibly other factors.
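As one illustration of digging the posting date out of the headers, here is a minimal Python sketch. It assumes you can read the raw text of a newsgroup message (for example, from the server's file system) and that the message carries a standard Date header. The sample message, its addresses, and the function name `creation_date` are invented for illustration and do not come from the docbase described here.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message text; every header value below is invented.
SAMPLE_MESSAGE = """\
From: someone@example.com
Newsgroups: example.syscon
Subject: Re: groupware
Date: Mon, 23 Mar 1998 10:15:00 -0500
Message-ID: <example.1@example.com>

Body text goes here.
"""


def creation_date(raw_message):
    """Return the posting date as YYYY-MM-DD, or None if it cannot be read."""
    msg = message_from_string(raw_message)
    date_header = msg["Date"]
    if date_header is None:
        return None
    try:
        # The date is reported in the time zone given by the header itself.
        return parsedate_to_datetime(date_header).date().isoformat()
    except (TypeError, ValueError):
        # Malformed Date header: treat the creation date as unknown.
        return None


print(creation_date(SAMPLE_MESSAGE))
```

Whether this approach is practical depends on the factors just listed: it only works where the message headers are actually reachable.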
Note that the newsgroup search results in Figure 8.2 presented two links. The N link’s address was a news:// URL such as news://dev4.byte.com/358C707B.ED39B9C1@aol.com, and the W link pointed to a mirrored web page such as http://dev4.byte.com/syscon/02137.html. Given this mirrored-docbase situation, there were many possible ways to get hold of a creation date for a newsgroup search result.
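To make the N link concrete, the following Python sketch splits a news:// URL of the form shown above into a server name and a message-ID, which is the key you would need in order to look up that message's headers. The function name `parse_news_url` is hypothetical, and the sketch assumes the URL always has the shape news://server/message-id.

```python
from urllib.parse import urlsplit


def parse_news_url(url):
    """Split news://server/message-id into (server, '<message-id>')."""
    parts = urlsplit(url)
    if parts.scheme != "news" or not parts.hostname:
        raise ValueError("not a news:// URL with a server: " + url)
    message_id = parts.path.lstrip("/")
    if "@" not in message_id:
        raise ValueError("no message-ID found in: " + url)
    # Message-IDs are conventionally written inside angle brackets.
    return parts.hostname, "<" + message_id + ">"


server, message_id = parse_news_url(
    "news://dev4.byte.com/358C707B.ED39B9C1@aol.com"
)
print(server, message_id)
```

This only recovers the identifiers; fetching the Date header for that message-ID is a separate step that depends on how the docbase is accessed.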
One solution would have been to reengineer the doctitle namespace in the web mirror of the newsgroup, just as was done for the other ...