15

ALI BABA: A TEXT MINING TOOL FOR SYSTEMS BIOLOGY

Jörg Hakenberg

Computer Science and Engineering, Arizona State University, Tempe, AZ, USA

Conrad Plake

Biotechnological Centre, Technische Universität Dresden, Dresden, Germany

Ulf Leser

Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin, Germany

15.1 INTRODUCTION TO TEXT MINING

Text mining is the process of automatically deriving information from text (as opposed to data mining, which works on structured data). This process starts with accessing the relevant literature and ends with extracting the desired pieces of information. Access is mostly provided by Web-based search tools, the best known of which is PubMed [1]. PubMed currently contains citations for close to 18 million publications in the biomedical domain (biology, biochemistry, medicine, and related fields), drawn from approximately 5200 journals and dating back to 1865. Up to 4000 citations (abstract and bibliographical information) are added to PubMed per day, which makes automated means necessary to search efficiently for high-quality information.

Text mining comprises several tasks, most of which depend on one another, but few of which have been solved satisfactorily. The first task is information retrieval (IR): given a user's query, find the (most) relevant documents containing the keywords or, even better, providing an answer to the question the user actually has in mind. The latter part is also called question answering (QA), where the task is not ...
