Book description
Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis.
Winner of a 2012 PROSE Award in Computing and Information Sciences from the Association of American Publishers, this book is a comprehensive how-to reference that shows readers how to conduct text mining and statistically analyze the results. In addition to providing an in-depth examination of core text mining and link detection tools, methods, and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection through example tutorials in fields as varied as corporate, finance, business intelligence, genomics research, and counterterrorism activities.
The world contains an unimaginably vast amount of digital information, and that amount is growing ever faster. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime, and so on. Managed well, textual data can be used to unlock new sources of economic value, provide fresh insights into science, and hold governments to account. As the Internet expands and the unstructured text it contains increasingly outstrips our natural capacity to process it, the value of text mining for information retrieval and search will increase dramatically.
- Extensive case studies, most in a tutorial format, allow the reader to 'click through' each example using a software program, learning to conduct text mining analyses as quickly as possible
- Numerous examples, tutorials, PowerPoint slides, and datasets available via the companion website on Elsevierdirect.com
- Glossary of text mining terms provided in the appendix
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Dedication
- Endorsements for Practical Text Mining & Statistical Analysis for Non-structured Text Data Applications
- Foreword 1
- Foreword 2
- Foreword 3
- Acknowledgments
- Preface
- About the Authors
- Introduction
- List of Tutorials by Guest Authors
- Part I: Basic Text Mining Principles
- Chapter 1. The History of Text Mining
- Preamble
- The Roots of Text Mining: Information Retrieval, Extraction, and Summarization
- Information Extraction and Modern Text Mining
- Major Innovations in Text Mining since 2000
- The Development of Enabling Technology in Text Mining
- Emerging Applications in Text Mining
- Sentiment Analysis and Opinion Mining
- IBM’s Watson: An “Intelligent” Text Mining Machine?
- What’s Next?
- Postscript
- References
- Chapter 2. The Seven Practice Areas of Text Analytics
- Chapter 3. Conceptual Foundations of Text Mining and Preprocessing Steps
- Chapter 4. Applications and Use Cases for Text Mining
- Preamble
- Why Is Text Mining Useful?
- Extracting “Meaning” from Unstructured Text
- Summarizing Text
- Common Approaches to Extracting Meaning
- Extracting Information through Statistical Natural Language Processing
- Statistical Analysis of Dimensions of Meaning
- Beyond Statistical Analysis of Word Frequencies: Parsing and Analyzing Syntax
- Review
- Improving Accuracy in Predictive Modeling
- Using Statistical Natural Language Processing to Improve Lift
- Using Dictionaries to Improve Prediction
- Identifying Similarity and Relevance by Searching
- Part of Speech Tagging and Entity Extraction
- Summary
- Postscript
- References
- Chapter 5. Text Mining Methodology
- Chapter 6. Three Common Text Mining Software Tools
- Part II: Introduction to the Tutorial and Case Study Section of This Book
- Introduction
- Tutorial AA. Case Study: Using the Social Share of Voice to Predict Events That Are about to Happen
- Tutorial BB. Mining Twitter for Airline Consumer Sentiment
- Introduction
- What Is R?
- Loading Data into R
- The twitteR Package
- Extracting Text from Tweets
- The plyr Package
- Estimating Sentiment
- Loading the Opinion Lexicon
- Implementing Our Sentiment Scoring Algorithm
- Algorithm Sanity Check
- data.frames Hold Tabular Data
- Scoring the Tweets
- Repeat for Each Airline
- Compare the Score Distributions
- Ignore the Middle
- Compare with ACSI’s Customer Satisfaction Index
- Scrape the ACSI Website
- Compare Twitter Results with ACSI Scores
- Graph the Results
- Notes and Acknowledgments
- References
- Tutorial A. Using STATISTICA Text Miner to Monitor and Predict Success of Marketing Campaigns Based on Social Media Data
- Tutorial B. Text Mining Improves Model Performance in Predicting Airplane Flight Accident Outcome
- Tutorial C. Insurance Industry: Text Analytics Adds “Lift” to Predictive Models with STATISTICA Text and Data Miner
- Tutorial D. Analysis of Survey Data for Establishing the “Best Medical Survey Instrument” Using Text Mining
- Tutorial E. Analysis of Survey Data for Establishing “Best Medical Survey Instrument” Using Text Mining: Central Asian (Russian Language) Study Tutorial 2: Potential for Constructing Instruments That Have Increased Validity
- Tutorial F. Using eBay Text for Predicting ATLAS Instrumental Learning
- Tutorial G. Text Mining for Patterns in Children’s Sleep Disorders Using STATISTICA Text Miner
- Tutorial H. Extracting Knowledge from Published Literature Using RapidMiner
- Tutorial I. Text Mining Speech Samples: Can the Speech of Individuals Diagnosed with Schizophrenia Differentiate Them from Unaffected Controls?
- Tutorial J. Text Mining Using STM™, CART®, and TreeNet® from Salford Systems: Analysis of 16,000 iPod Auctions on eBay
- Tutorial K. Predicting Micro Lending Loan Defaults Using SAS® Text Miner
- Introduction
- About SAS® Text Miner
- Project Overview
- Preparing the Data and Setting Up the Diagram
- Creating a New Project
- Registering the Table
- Creating a New Diagram
- Text Filter Node
- Text Topic Node
- Creating the Text Mining Flow
- Inserting the Data
- Understanding Text Parsing
- Synonyms and Multiterm Words
- Defining Topics
- Other Uses of the Interactive Topic Viewer
- Making the Predictive Model
- Final Results
- Viewing the Reports
- Text Only Decision Tree
- All Variable Text and Relational
- Conclusion
- Tutorial L. Opera Lyrics: Text Analytics Compared by the Composer and the Century of Composition—Wagner versus Puccini
- Tutorial M. Case Study: Sentiment-Based Text Analytics to Better Predict Customer Satisfaction and Net Promoter® Score Using IBM® SPSS® Modeler
- Tutorial N. Case Study: Detecting Deception in Text with Freely Available Text and Data Mining Tools
- Tutorial O. Predicting Box Office Success of Motion Pictures with Text Mining
- Tutorial P. A Hands-On Tutorial of Text Mining in PASW: Clustering and Sentiment Analysis Using Tweets from Twitter
- Tutorial Q. A Hands-On Tutorial on Text Mining in SAS®: Analysis of Customer Comments for Clustering and Predictive Modeling
- Tutorial R. Scoring Retention and Success of Incoming College Freshmen Using Text Analytics
- Tutorial S. Searching for Relationships in Product Recall Data from the Consumer Product Safety Commission with STATISTICA Text Miner
- Tutorial T. Potential Problems That Can Arise in Text Mining: Example Using NALL Aviation Data
- Tutorial U. Exploring the Unabomber Manifesto Using Text Miner
- Tutorial V. Text Mining PubMed: Extracting Publications on Genes and Genetic Markers Associated with Migraine Headaches from PubMed Abstracts
- Tutorial W. Case Study: The Problem with the Use of Medical Abbreviations by Physicians and Health Care Providers
- The Present Problem in the Use of Medical Abbreviations by Physicians and Health Care Providers
- TJC (JCAHO) “Do Not Use” Abbreviations
- Additional Abbreviations, Acronyms, and Symbols
- Using the “Text Mining Project” Format of STATISTICA Text Miner
- Using TextMiner3.dbs
- Conclusion
- Intervention Training Needed
- References
- Tutorial X. Classifying Documents with Respect to “Earnings” and Then Making a Predictive Model for the Target Variable Using Decision Trees, MARSplines, Naïve Bayes Classifier, and K-Nearest Neighbors with STATISTICA Text Miner
- Introduction: Automatic Text Classification
- Data File with File References
- Specifying the Analysis
- Processing the Data Analysis
- Saving the Extracted Word Frequencies to the Input File
- Initial Feature Selection
- General Classification and Regression Trees
- K-Nearest Neighbors Modeling
- Conclusion
- Reference
- Tutorial Y. Case Study: Predicting Exposure of Social Messages: The Bin Laden Live Tweeter
- Tutorial Z. The InFLUence Model: Web Crawling, Text Mining, and Predictive Analysis with 2010–2011 Influenza Guidelines—CDC, IDSA, WHO, and FMC
- Part III: Advanced Topics
- Chapter 7. Text Classification and Categorization
- Chapter 8. Prediction in Text Mining: The Data Mining Algorithms of Predictive Analytics
- Preamble
- Introduction
- The Power of Simple Descriptive Statistics, Graphics, and Visual Text Mining
- Visual Data Mining
- Predictive Modeling (Supervised Learning)
- Statistical Models versus General Predictive Modeling
- Clustering (Unsupervised Learning)
- Singular Value Decomposition, Principal Components Analysis, and Dimension Reduction
- Association and Link Analysis
- Summary
- Postscript
- References
- Chapter 9. Entity Extraction
- Chapter 10. Feature Selection and Dimensionality Reduction
- Chapter 11. Singular Value Decomposition in Text Mining
- Preamble
- Introduction
- Redundancy in Text
- Dimensions of Meaning: Latent Semantic Indexing
- The Math of Singular Value Decomposition
- Graphical Representations and Simple Examples
- Singular Value Decomposition in Equation Form
- Singular Value Decomposition and Principal Components Analysis Eigenvalues
- Some Practical Considerations
- Extracting Dimensions
- Subjective Methods: Reviewing Graphs
- Analytical Methods: Building Models for Dimensions
- Useful Analyses Based on Singular Value Decomposition Scores
- Cluster Analysis
- Predictive Modeling
- When SVD Is Not Useful
- Summary
- Postscript
- References
- Chapter 12. Web Analytics and Web Mining
- Chapter 13. Clustering Words and Documents
- Chapter 14. Leveraging Text Mining in Property and Casualty Insurance
- Chapter 15. Focused Web Crawling
- Chapter 16. The Future of Text and Web Analytics
- Text Analytics and Text Mining
- The Pros and Cons of Commercial Software versus Open Source Software
- The Future of Text Mining
- The Future of Web Analytics
- Multisession Pathing
- Integration of Web Analytics with Standard BI Tools
- Attribution across Multiple Sessions
- The Future: What Does It Hold?
- New Areas That May Use Text Analytics in the Future
- IBM Watson
- Summary
- References
- IBM-Watson References
- Chapter 17. Summary
- Glossary
- Index
- How to Use the Data Sets and the Text Mining Software on the DVD or on Links for Practical Text Mining
Product information
- Title: Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications
- Author(s):
- Release date: January 2012
- Publisher(s): Academic Press
- ISBN: 9780123870117
You might also like
book
Text Mining with R
Much of the data available today is unstructured and text-heavy, making it challenging for analysts to …
book
Text Mining and Analysis
Big data: It's unstructured, it's coming at you fast, and there's lots of it. In fact, …
book
Mastering Text Mining with R
Master text-taming techniques and build effective text-processing applications with R About This Book Develop all the …
book
Essential Statistics for Non-STEM Data Analysts
Reinforce your understanding of data science and data analysis from a statistical perspective to extract meaningful …