11.12 Conclusion and Outlook

This paper described the automatic extraction of hyponyms from the Wikipedia corpus using several deep and shallow patterns. The shallow patterns are defined over tokens, whereas the deep patterns are defined as semantic networks. Both types of patterns were applied to the German Wikipedia. The extracted hypotheses were then validated with a support vector machine and a graph kernel. Using a graph kernel improves F-measure, accuracy, and recall; the increase in recall is significant. A preliminary evaluation of the weighted graph kernel showed only a very slight (and not significant) improvement. Furthermore, we compared our method, SemQuire, with a GermaNet classifier and with the context feature of Cimiano; SemQuire clearly outperformed both. We plan to optimize the weights of the individual kernels (feature kernel and graph kernel) by a grid search, which we expect to improve the results further.
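The planned grid search over the kernel weights could look roughly like the following sketch. It assumes that the feature kernel and the graph kernel are available as precomputed Gram matrices and that they are combined as a convex combination with a single weight `w`; neither assumption is taken from the evaluation above. To keep the example self-contained, each candidate weight is scored by kernel-target alignment, a cheap stand-in for the cross-validated F-measure of the support vector machine that a real experiment would use. All function names and the toy matrices are hypothetical.

```python
import math


def combine_kernels(k_feat, k_graph, w):
    """Convex combination w * k_feat + (1 - w) * k_graph of two Gram matrices."""
    n = len(k_feat)
    return [[w * k_feat[i][j] + (1.0 - w) * k_graph[i][j] for j in range(n)]
            for i in range(n)]


def kernel_target_alignment(gram, labels):
    """Alignment between a Gram matrix and the ideal kernel y y^T.

    Labels must be +1 (hyponymy hypothesis correct) or -1 (incorrect).
    """
    n = len(labels)
    inner = sum(gram[i][j] * labels[i] * labels[j]
                for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(gram[i][j] ** 2
                           for i in range(n) for j in range(n)))
    norm_y = float(n)  # Frobenius norm of y y^T for labels in {-1, +1}
    return inner / (norm_k * norm_y) if norm_k > 0 else 0.0


def grid_search_weight(k_feat, k_graph, labels, steps=10):
    """Try w = 0, 1/steps, ..., 1 and keep the first weight with the best score."""
    best_w, best_score = None, float("-inf")
    for step in range(steps + 1):
        w = step / steps
        score = kernel_target_alignment(
            combine_kernels(k_feat, k_graph, w), labels)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Toy data (invented for illustration): four hypotheses, two per class.
labels = [1, 1, -1, -1]
k_feat = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
k_graph = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]

best_w, best_score = grid_search_weight(k_feat, k_graph, labels)
print(best_w, best_score)
```

In an actual experiment, the scoring step would be replaced by training the support vector machine on the combined Gram matrix and measuring F-measure on held-out data; the loop over candidate weights stays the same.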

Currently, the weighting is based solely on distance, measured in number of edges. Other factors, such as the edge labels, are not taken into account. Future work could therefore develop a more sophisticated weighting scheme.
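As a rough illustration of a purely distance-based weighting, the sketch below computes the number of edges between each node of a small network and the nearest anchor node by breadth-first search, then turns that distance into a weight. The geometric decay `decay ** distance`, the choice of anchor nodes, and the toy network are assumptions made for this example; the exact weighting function used with the weighted graph kernel is not reproduced here.

```python
from collections import deque


def edge_distances(adjacency, anchors):
    """Breadth-first search: number of edges from the nearest anchor to each node."""
    dist = {anchor: 0 for anchor in anchors}
    queue = deque(anchors)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def distance_weights(adjacency, anchors, decay=0.5):
    """Weight each reachable node by decay ** distance (assumed decay function)."""
    return {node: decay ** d
            for node, d in edge_distances(adjacency, anchors).items()}


# Hypothetical undirected network; a1 and a2 stand for the two candidate
# concepts of a hyponymy hypothesis, c1..c4 for surrounding nodes.
adjacency = {
    "a1": ["c1"],
    "c1": ["a1", "c2"],
    "c2": ["c1", "a2"],
    "a2": ["c2", "c3"],
    "c3": ["a2", "c4"],
    "c4": ["c3"],
}

weights = distance_weights(adjacency, anchors=["a1", "a2"])
print(weights)
```

Edge labels play no role in this sketch. A more sophisticated scheme of the kind suggested above could, for instance, multiply in a label-dependent factor, but how such factors should be chosen is left open here.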
