Finding the Index for Partially Matched Strings

The problem considered here concerns a large number of string keys that need to be accessed by full or partial match. Each string is unique, so the full-match access can easily be handled by a standard hash-table structure (e.g., java.util.HashMap). But the partial-match access needs to collect all objects that have string keys starting with a particular substring.

So, for example, if you had the hash consisting of keys and values:

"hello"          1
"bye"            2
"hi"             3

Then the full match for key "hi" retrieves 3, and the partial match against strings starting with "h" retrieves the collection {1,3}. Using a hash-table structure for the partial-match access is expensive because it requires that all keys be iterated over, and then each key matching the corresponding object needs to be collated.

Of course, I am considering here a large collection of strings. Alternatives are not usually necessary for a few (or even a few thousand) strings. But for large collections, performance-tuning techniques become necessary.

The tuning procedure here should be to look for data structures that quickly match any partial string. The task is somewhat simpler than the most generic version of this type of problem, because you need to match only the first few consecutive characters. This means that some sort of tree structure is probably ideal. Of the structures available from the JDK, TreeMap looks like it can provide exactly the required functionality; it gives a minimal ...

Get Java Performance Tuning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.