Quantitative Corpus Linguistics with R

Book Description

The first textbook of its kind, Quantitative Corpus Linguistics with R demonstrates how to use the open source programming language R for corpus linguistic analyses. Computational and corpus linguists will find that R handles, in a single environment, a range of tasks that currently require several separate programs: searching and processing corpora, arranging and outputting the results of corpus searches, statistical evaluation, and graphing.

Table of Contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. Acknowledgments
  5. 1. Introduction
    1. 1.1 Why Another Introduction to Corpus Linguistics?
    2. 1.2 Outline of the Book
    3. 1.3 Recommendation for Instructors
  6. 2. The Three Central Corpus-linguistic Methods
    1. 2.1 Corpora
      1. 2.1.1 What is a Corpus?
      2. 2.1.2 What Kinds of Corpora are There?
    2. 2.2 Frequency Lists
    3. 2.3 Lexical Co-occurrence: Collocations
    4. 2.4 (Lexico-)Grammatical Co-occurrence: Concordances
  7. 3. An Introduction to R
    1. 3.1 A Few Central Notions: Data Structures, Functions, and Arguments
    2. 3.2 Vectors
      1. 3.2.1 Basics
      2. 3.2.2 Loading Vectors
      3. 3.2.3 Accessing and Processing (Parts of) Vectors
      4. 3.2.4 Saving Vectors
    3. 3.3 Factors
    4. 3.4 Data Frames
      1. 3.4.1 Generating Data Frames
      2. 3.4.2 Loading and Saving Data Frames
      3. 3.4.3 Accessing and Processing (Parts of) Data Frames
    5. 3.5 Lists
    6. 3.6 Elementary Programming Functions
      1. 3.6.1 Conditional Expressions
      2. 3.6.2 Loops
      3. 3.6.3 Rules of Programming
    7. 3.7 Character/String Processing
      1. 3.7.1 Getting Information from and Accessing (Vectors of) Character Strings
      2. 3.7.2 Elementary Ways to Change (Vectors of) Character Strings
      3. 3.7.3 Merging and Splitting (Vectors of) Character Strings without Regular Expressions
      4. 3.7.4 Searching and Replacing without Regular Expressions
      5. 3.7.5 Searching and Replacing with Regular Expressions
      6. 3.7.6 Merging and Splitting (Vectors of) Character Strings with Regular Expressions
    8. 3.8 File and Directory Operations
  8. 4. Using R in Corpus Linguistics
    1. 4.1 Frequency Lists
      1. 4.1.1 A Frequency List of an Unannotated Corpus
      2. 4.1.2 A Reverse Frequency List of an Unannotated Corpus
      3. 4.1.3 A Frequency List of an Annotated Corpus
      4. 4.1.4 A Frequency List of Tag-word Sequences from an Annotated Corpus
      5. 4.1.5 A Frequency List of Word Pairs from an Annotated Corpus
      6. 4.1.6 A Frequency List of an Annotated Corpus (with One Word Per Line)
      7. 4.1.7 A Frequency List of Word Pairs of an Annotated Corpus (with One Word Per Line)
    2. 4.2 Concordances
      1. 4.2.1 A Concordance of an Unannotated Text File
      2. 4.2.2 A Simple Concordance from Files of a POS-tagged (SGML) Corpus
      3. 4.2.3 More Complex Concordances from Files of a POS-tagged (SGML) Corpus
      4. 4.2.4 A Lemma-based Concordance from Files of a POS-tagged and Lemmatized (XML) Corpus
    3. 4.3 Collocations
    4. 4.4 Excursus 1: Processing Multi-tiered Corpora
    5. 4.5 Excursus 2: Unicode
      1. 4.5.1 Frequency Lists
      2. 4.5.2 Concordancing
  9. 5. Some Statistics for Corpus Linguistics
    1. 5.1 Introduction to Statistical Thinking
      1. 5.1.1 Variables and their Roles in an Analysis
      2. 5.1.2 Variables and their Information Value
      3. 5.1.3 Hypotheses: Formulation and Operationalization
      4. 5.1.4 Data Analysis
      5. 5.1.5 Hypothesis (and Significance) Testing
    2. 5.2 Categorical Dependent Variables
      1. 5.2.1 One Categorical Dependent Variable, No Independent Variable
      2. 5.2.2 One Categorical Dependent Variable, One Categorical Independent Variable
      3. 5.2.3 One Categorical Dependent Variable, 2+ Independent Variables
    3. 5.3 Interval/Ratio-scaled Dependent Variables
      1. 5.3.1 Descriptive Statistics for Interval/Ratio-scaled Dependent Variables
      2. 5.3.2 One Interval/Ratio-scaled Dependent Variable, One Categorical Independent Variable
      3. 5.3.3 One Interval/Ratio-scaled Dependent Variable, One Interval/Ratio-scaled Independent Variable
      4. 5.3.4 One Interval/Ratio-scaled Dependent Variable, 2+ Independent Variables
    4. 5.4 Customizing Statistical Plots
    5. 5.5 Reporting Results
  10. 6. Case Studies and Pointers to Other Applications
    1. 6.1 Introduction to the Case Studies
    2. 6.2 Some Pointers to Further Applications
  11. Appendix
  12. References
  13. Endnotes