Handbook of Research on Web Log Analysis

Book Description

The Handbook of Research on Web Log Analysis reflects on the multifaceted themes of Web use and presents various approaches to log analysis. Over 20 research contributions from international experts comprehensively cover the latest user-behavior analytic and log analysis methodologies, and consider new research directions and novel applications.

Table of Contents

  1. Copyright
  2. List of Contributors
  3. Preface
  4. I. Research and Methodological Foundations of Transaction Log Analysis
    1. ABSTRACT
    2. INTRODUCTION
    3. BEHAVIORISM
      1. Behaviors
      2. Trace Data
    4. UNOBTRUSIVE METHOD
      1. Transaction Log Analysis as Unobtrusive Method
    5. CONCLUSION
    6. REFERENCES
    7. KEY TERMS
  5. A. APPENDIX
  6. I. Web Log Analysis: Perspectives, Issues, and Directions
    1. II. Historic Perspective of Log Analysis
      1. ABSTRACT
      2. INTRODUCTION: GENERAL PERSPECTIVE AND OBJECTIVES OF CHAPTER
      3. BACKGROUND: INFORMATION RETRIEVAL GOES ONLINE
      4. TRANSACTION LOG ANALYSIS: THE EARLY YEARS
      5. USER TRACKING FOR ADAPTIVE PROMPTING
      6. OPACS ARE IMPLEMENTED AND STUDIED
      7. THE INTERNET AND THE WEB ENTER THE SCENE
      8. SUMMARIZING THE STAGES OF EVOLUTION OF TRANSACTION LOG ANALYSIS
      9. PRIVACY ISSUES
      10. CONCLUSION: WHAT WE CAN LEARN FROM HISTORY
      11. REFERENCES
      12. KEY TERMS
    2. III. Surveys as a Complementary Method for Web Log Analysis
      1. ABSTRACT
      2. INTRODUCTION
      3. REVIEW OF LITERATURE
      4. PLANNING AND CONDUCTING A SURVEY
      5. DESIGNING A SURVEY INSTRUMENT
        1. Multiple-Choice Question
        2. Likert-Scale Question
        3. Open-Ended Question
      6. A CASE STUDY USING SURVEY METHODOLOGY
        1. Pew Internet & American Life Project
        2. Exploratorium Survey Overview
        3. Design and Data Collection Procedures
          1. Sample Design
          2. Contact Procedures
          3. Weighting and Analysis
          4. Effects of Sample Design on Statistical Inference
          5. Response Rate
      7. CONCLUSION
      8. REFERENCES
      9. KEY TERMS
    3. B. APPENDIX
    4. IV. Watching the Web: An Ontological and Epistemological Critique of Web-Traffic Measurement
      1. ABSTRACT
      2. INTRODUCTION
      3. THE ORIGINATION OF WEB-TRAFFIC MEASUREMENT
      4. RECENT FORMS OF WEB-TRAFFIC MEASUREMENT: ASP TOOLS
      5. WEB-TRAFFIC MEASUREMENT'S TURN TO POSITIVISM
      6. ONTOLOGICAL CLAIMS OF WEB TRAFFIC RESEARCH
      7. EPISTEMOLOGICAL CLAIMS OF WEB TRAFFIC RESEARCH
      8. RECOMMENDATIONS FOR IMPROVEMENT
      9. REFERENCES
      10. KEY TERMS
    5. V. Privacy Concerns for Web Logging Data
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
        1. General Privacy Theory
        2. Privacy Concerns for Web Browsing Data
      4. PRIVACY CHALLENGES ASSOCIATED WITH THE LOGGING OF WEB DATA
        1. Governmental and Organizational Regulations
        2. Impact of Privacy Concerns for Data Collection on Natural Web Browsing Behaviors
      5. ENHANCING PRIVACY DURING OBSERVATIONAL DATA COLLECTION
        1. Recommendations for Increasing Understanding and Trust
          1. Recommendation 1: Educate Participants
          2. Recommendation 2: Only Record / Receive as Much Information as Needed
          3. Recommendation 3: Provide Opportunities for Inspection of Data
        2. Affording Privacy Preservation Through Action
          1. Recommendation 4: Provide the Ability to Pause Recording
          2. Recommendation 5: Provide the Ability for Participants to Mask Data
          3. Recommendation 6: Provide Participants with the Ability to Delete Data
      6. FUTURE TRENDS
      7. CONCLUSION
      8. REFERENCES
      9. Glossary
  7. II. Methodology and Metrics
    1. VI. The Methodology of Search Log Analysis
      1. ABSTRACT
      2. INTRODUCTION
      3. REVIEW OF LITERATURE
        1. What is a Search Log?
        2. How are These Interactions Collected?
        3. Why Collect This Data?
        4. What is the Theoretical Basis of TLA (and SLA)?
        5. How is SLA Used?
        6. How is SLA Critiqued?
        7. What are the Tools to Support SLA?
        8. How to Conduct TLA for Web Searching Research?
      4. SLA PROCESS
        1. Data Collection
          1. Fields in a Standard Search Log
        2. Data Preparation
          1. Cleaning the Data
          2. Parsing the Data
          3. Normalizing Searching Episodes
      5. DATA ANALYSIS
        1. Analysis Levels
          1. Term Level Analysis
          2. Query Level Analysis
          3. Session Level Analysis
        2. Conducting the Data Analysis
      6. DISCUSSION
      7. CONCLUSION
      8. REFERENCES
      9. KEY TERMS
    2. C. APPENDIX A
    3. VII. Uses, Limitations, and Trends in Web Analytics
      1. ABSTRACT
      2. INTRODUCTION
      3. CURRENT USES OF, AND PROBLEMS WITH, WEB ANALYTICS
        1. Data Included In, and Uses Of, Web Analytics
        2. Limitations of, and Remedies for, Log File Data
        3. Correcting Deficiencies in Log File Data
      4. NEW TECHNIQUES IN WEB ANALYTICS
      5. WEB 2.0 CONSIDERATIONS
      6. CONCLUSION
      7. REFERENCES
      8. ENDNOTES
      9. KEY TERMS
    4. VIII. A Review of Methodologies for Analyzing Websites
      1. ABSTRACT
      2. INTRODUCTION
      3. METRICS
        1. Visitor Type
        2. Visit Length
        3. Demographics and System Statistics
        4. Internal Search
        5. Visitor Path
        6. Top Pages
        7. Referrers and Keyword Analysis
        8. Errors
      4. GATHERING INFORMATION
        1. Log Files
        2. Page Tagging
        3. The Problems with Data
      5. CHOOSING KEY PERFORMANCE INDICATORS
        1. Knowing Your Business Goals
        2. Identifying KPIs Based on Website Type
          1. Commerce
          2. Lead Generation
          3. Content/Media
          4. Support/Self Service
      6. KEY BEST PRACTICES
        1. Identify Key Stakeholders
        2. Define Primary Goals for Your Website
        3. Identify the Most Important Site Visitors
        4. Determine the Key Performance Indicators
        5. Identify and Implement the Right Solution
        6. Use Multiple Technologies and Methods
        7. Make Improvements Iteratively
        8. Hire and Empower a Full-time Analyst
        9. Establish a Process of Continuous Improvement
      7. SPECIFIC TOOLS
        1. Choosing a Tool
        2. Free Tools
        3. Paid Tools
      8. CONCLUSION
      9. REFERENCES
      10. KEY TERMS
    5. IX. The Unit of Analysis and the Validity of Web Log Data
      1. ABSTRACT
      2. INTRODUCTION
      3. A UNIT OF ANALYSIS
        1. Measurement Units
          1. Time
          2. Frequency of Login vs. Page Request
          3. Defining an Episode of Use: Stand-Alone Software vs. Internet
          4. Attention to the Media
        2. Two Types of Log Files (Client vs. Server)
          1. Cost vs. Privacy
          2. Multiple Computer Access
        3. Time vs. Page Request (Page Access)
        4. Methodological Challenges
          1. Caching
          2. Individual User Recognition and Sessions
          3. Time Calculation Algorithm
      4. CONCLUSION
      5. REFERENCES
      6. KEY TERMS
    6. X. Recommendations for Reporting Web Usage Studies
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
        1. Standardized Reporting
        2. Seminal Works
        3. The Evolution of the Web Environment and Its Users
          1. In the Beginning
          2. Home Users and Browser Wars
          3. Work and Home: The Need for Speed
          4. In the Mainstream: Just Google It
          5. As a Daily Tool
          6. Web 2.0 and Widespread Social Networking
        4. Temporal Context
        5. Methods Of Observing Web Browsing Behavior
      4. IMPORTANCE OF REPORTING CONTEXTUAL INFORMATION
      5. RECOMMENDATIONS FOR REPORTING CONTEXTUAL INFORMATION
        1. Recommendation 1: Report User Characteristics
          1. Sample Size and Sex
          2. Age
          3. Participant Background/Occupation
          4. Web Experience
        2. Recommendation 2: Report Temporal Information About The Study
        3. Recommendation 3: Report Details Of The Study Web Browsing Environment
          1. Setting of the Study
          2. Browsing Software Used By Participants
        4. Recommendation 4: Report Details About The Nature Of The Web Browsing Task
          1. Task Motivation
          2. Task Details
        5. Recommendation 5: Report Details Of The Measures Used To Collect Data
          1. Data Collection Methods
          2. Study Metrics
        6. Recommendation 6: Provide Descriptive Reporting Of The Data
        7. Recommendation 7: Provide Details Of Statistical Analysis
        8. Recommendation 8: Report The Results In Context Of Prior Studies
      6. DISCUSSION
      7. CONCLUSION
      8. REFERENCES
      9. KEY TERMS
  8. III. Behavior Analysis
    1. XI. From Analysis to Estimation of User Behavior
      1. ABSTRACT
      2. INTRODUCTION
      3. SEARCH ENGINE USER BEHAVIOR ANALYSIS
        1. Literature Review of Search Engine User Behavior Studies
          1. Multimedia Queries
          2. Sexual Queries
          3. Question and Request Format Queries
          4. E-Commerce Searching
          5. Multitasking Searching
        2. Detailed Explanation of Methodologies Used for Web Log and User Behavior Analysis
          1. Exploratory Data Analysis
          2. Correlation and Test of Independence
          3. Markov Models
          4. Poisson Sampling
      4. ESTIMATION
        1. Literature Review of Studies Estimating Search Engine User Behavior
          1. Automatic New Topic Identification
          2. Topic Estimation
        2. Detailed Explanation of Methodologies Used for User Behavior Estimation
          1. Probabilistic and Statistical Methods
          2. Artificial Intelligence Methods
      5. DISCUSSION: CHALLENGES AND FUTURE DIRECTIONS
      6. CONCLUSION
      7. ACKNOWLEDGMENT
      8. REFERENCES
      9. KEY TERMS
    2. XII. An Integrated Approach to Interaction Design and Log Analysis
      1. ABSTRACT
      2. LOGGING THE USER INTERACTION: AN INTRODUCTION
        1. Log Analysis in IR and the Motivation for Our Work
      3. A FRAMEWORK FOR MODELING THE INTERACTION AND THE LOGGING
        1. Explicit vs. Implicit Logging of States
        2. Design Patterns for System Design and Log Analysis
        3. The Procedure
      4. CASE STUDY: MEDIATED INFORMATION RETRIEVAL
        1. The Mediated Retrieval Model
        2. The MIR Project
        3. State-Based Design of Interaction and Logging in MIR
        4. Discussion and Evaluation
      5. CONTRIBUTIONS AND FUTURE WORK
        1. Related Work
        2. Future Research Directions
      6. ACKNOWLEDGMENT
      7. REFERENCES
      8. ENDNOTES
      9. KEY TERMS
    3. XIII. Tips for Tracking Web Information Seeking Behavior
      1. ABSTRACT
      2. INTRODUCTION
      3. INDIVIDUAL DIFFERENCES, TASKS, AND INFORMATION SEEKING BEHAVIOR
        1. Cognitive Style
        2. Learning Style
        3. Cognitive Complexity
        4. Need for Cognition
        5. Self- and Other-Orientation
      4. METHODS FOR COLLECTING INDIVIDUAL DIFFERENCES, TASK, AND WEB TRACKING DATA
      5. WEB METRICS TO COLLECT AND TECHNIQUES FOR ANALYZING THEM
      6. ADDRESSING PRIVACY IN WEB INFORMATION SEEKING ANALYSIS
      7. CONCLUSION
      8. REFERENCES
      9. KEY TERMS
    4. D. APPENDIX A: INDIVIDUAL DIFFERENCES QUESTIONNAIRE
    5. E. APPENDIX B: THE TASK SURVEY
    6. XIV. Identifying Users Stereotypes for Dynamic Web Pages Customization
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
        1. Web Personalization
        2. Web Mining
        3. Semantic Web
        4. Adaptive Hypermedia
      4. RELATED WORKS
      5. AN INTEGRATION APPROACH
        1. Web Usage Mining and Semantic Information Integration
        2. Structure and Content Adaptation
        3. Experimental Results
      6. CONCLUSION
      7. REFERENCES
      8. KEY TERMS
    7. XV. Finding Meaning in Online, Very-Large Scale Conversations
      1. ABSTRACT
      2. INTRODUCTION
      3. RESEARCH FRAMEWORK
        1. Researcher as Participant Observer
      4. DATA COLLECTION
      5. INDIVIDUAL LEVEL ANALYSIS: ANALYZING PLAYER DISCUSSIONS AND ACTIONS
        1. Open Coding and Data Tagging
        2. Categorizing Data
      6. COMMUNITY LEVEL ANALYSIS: ANALYZING ACTOR RELATIONSHIPS AND PATTERNS OF INTERACTION
        1. Social Network Analysis
        2. Overview of Relationships and Patterns in Data
        3. Creating and Visualizing Social Networks
        4. Analyzing Group Cohesiveness
        5. Analyzing Individual Prominence
        6. Betweenness Centrality
      7. DISCUSSION
      8. REFERENCES
      9. AUTHOR NOTE
      10. KEY TERMS
  9. IV. Query Log Analysis
    1. XVI. Machine Learning Approach to Search Query Classification
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
      4. METHODOLOGY
        1. Bootstrapping from Known Class-Related Terms
        2. Automatic Retrieval of Background Sets
        3. Finding New Class-Related Terms
      5. DISCUSSION
        1. Creating Evaluation Scenarios
        2. Appraising Classification Results
      6. CONCLUSION AND FUTURE RESEARCH
      7. REFERENCES
      8. KEY TERMS
    2. XVII. Topic Analysis and Identification of Queries
      1. ABSTRACT
      2. INTRODUCTION
      3. LITERATURE REVIEW FOR TOPIC ANALYSIS AND IDENTIFICATION OF USER QUERIES
        1. Topic Analysis of Search Engine Queries
        2. Session Identification
        3. Query Clustering and Classification
        4. Automatic New Topic Identification and Topic Estimation
        5. Text Classification and Categorization Models
      4. EXPLANATION OF METHODOLOGIES USED FOR TOPIC IDENTIFICATION OF SEARCH ENGINE QUERIES
        1. Maximum Entropy Modeling
        2. Hidden Markov Models
        3. Conditional Random Fields
      5. DISCUSSION: CHALLENGES AND FUTURE DIRECTIONS
      6. CONCLUSION
      7. ACKNOWLEDGMENT
      8. REFERENCES
      9. KEY TERMS
    3. XVIII. Query Log Analysis in Biomedicine
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
      4. MEDLINE
        1. Biomedical Controlled Vocabularies
        2. MEDLINE Indexing Using the Medical Subject Headings (MeSH)
        3. Unified Medical Language System (UMLS)
        4. PubMed
      5. MAIN THRUST OF CHAPTER
        1. Questions Addressed by Query Log Analysis
        2. Techniques for Analyzing Biomedical Query Logs
        3. Semantic Analysis
        4. Understanding User Information Needs Using Semantic Analysis
        5. Session Boundary Determination Using Semantic Distance
        6. Navigational vs. Informational Queries
        7. Published Query Log Analysis in the Biomedical Domain (Brief Literature Review)
          1. Query Log Analyses Focused on the Information Needs of Clinicians
          2. Query Log Analyses of Search Engines Intended for Healthcare Consumers
      6. ISSUES AND CONTROVERSIES
      7. SOLUTIONS AND RECOMMENDATIONS
      8. FUTURE TRENDS
        1. Information Explosion in Biomedicine as an Information Challenge
      9. CONCLUSION
      10. REFERENCES
      11. KEY TERMS
    4. XIX. Processing and Analysis of Search Query Logs in Chinese
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
      4. ANALYSIS METHODS
      5. Data Collecting and Pre-Processing
        1. General Analysis
        2. Zipf Distribution Analysis
        3. Term Analysis
      6. TIMWAY: AN EXAMPLE
      7. CONCLUSION AND FUTURE RESEARCH
      8. REFERENCES
      9. KEY TERMS
    5. XX. Query Log Analysis for Adaptive Dialogue-Driven Search
      1. ABSTRACT
      2. INTRODUCTION
      3. RELATED WORK
        1. General Log Analysis
        2. Log Analysis for Improving IIS
        3. Log Analysis for Building Adaptive Domain Models
        4. Log Analysis for Modelling Human Interaction with Data
        5. Log Analysis for Evaluating IIS
      4. CASE STUDY: UKSEARCH
        1. Overview
        2. Modelling of Domain Structure
        3. Interaction with UKSearch
        4. Adaptive Modelling of Interactions using Query Logs
        5. Findings of the UKSearch Study
      5. CASE STUDY: HITIQA
        1. Overview
        2. Modelling of Domain Structure
        3. Interaction with HITIQA
        4. Adaptive Modelling of Interactions using Query Logs
        5. Findings of the HITIQA study
      6. CONCLUSION
      7. ACKNOWLEDGMENT
      8. REFERENCES
      9. KEY TERMS
  10. V. Contextual and Specialized Analysis
    1. XXI. Using Action-Object Pairs as a Conceptual Framework for Transaction Log Analysis
      1. ABSTRACT
      2. MOTIVATION
      3. SCIENTIFIC FOUNDATIONS
        1. Modeling in Information Searching
        2. Interaction
        3. Implicit Feedback
        4. Adaptive Hypermedia System
      4. ACTION–OBJECT PAIR APPROACH DESCRIPTION
      5. APPLICATION
        1. Transaction Log Collection
        2. Transaction Log Analysis
        3. User Modeling
      6. CASE STUDY
        1. Structuring Queries
          1. Agent Assistance
        2. Spelling
          1. Agent Assistance
        3. Query Refinement
          1. Agent Assistance
        4. Managing Results
          1. Agent Assistance
        5. Relevance Feedback
          1. Agent Assistance
      7. CONCLUSION
      8. REFERENCES
      9. KEY TERMS
    2. XXII. Analysis and Evaluation of the Connector Website
      1. ABSTRACT
      2. PREMISE
      3. INTRODUCTION
      4. THE CONNECTOR WEBSITE MODEL
        1. Research on Connector Websites
      5. PIONEERING CONNECTOR WEBSITES
        1. First-Generation Connectors
        2. Second-Generation Connectors: Emergence of Social Network Sites
      6. DATA & METHODS
      7. COMSCORE MEDIA METRIX ANALYSIS
        1. Traffic Volatility
        2. Website Age and Maturity
      8. DISCUSSION & IMPLICATIONS
        1. What Do the Lessons of Existing Connector Websites Imply for Future Startups?
      9. CONCLUSION
      10. ACKNOWLEDGMENT
      11. REFERENCES
      12. ENDNOTE
      13. KEY TERMS
    3. F. APPENDIX A
    4. G. APPENDIX B
    5. XXIII. Information Extraction from Blogs
      1. ABSTRACT
      2. INTRODUCTION
      3. BACKGROUND
      4. INFORMATION EXTRACTION TECHNIQUES
        1. Overview
        2. Tokenization and Lexical Analysis
        3. Natural Language Processing
        4. Information Extraction
        5. Additional Particularities of Blogs
      5. APPLICATIONS
        1. Topic and Thread Detection
        2. Opinion Mining
        3. Argumentation Mining
      6. FUTURE RESEARCH
      7. CONCLUSION
      8. ACKNOWLEDGMENT
      9. REFERENCES
      10. ENDNOTE
      11. KEY TERMS
    6. XXIV. Nethnography: A Naturalistic Approach Towards Online Interaction
      1. ABSTRACT
      2. INTRODUCTION
      3. THE ETHNOGRAPHIC TRADITION
      4. NETHNOGRAPHY: POSSIBILITIES AND LIMITS OF A NON-PARTICIPANT OBSERVATION
      5. THE LOGFILE TEMPTATION
      6. ETHNOGRAPHY APPLIED TO ONLINE COMMUNICATION: A METHODOLOGICAL PERSPECTIVE
      7. AN ANALYTIC APPLICATION
        1. The Entrance
        2. Conflicts
        3. Informal Theorization of Femininity
      8. FINAL REMARKS
      9. REFERENCES
      10. KEY TERMS
    7. XXV. Web Log Analysis: Diversity of Research Methodologies
      1. ABSTRACT
      2. INTRODUCTION
      3. RESEARCH METHODOLOGIES
      4. CONCEPTUAL FRAMEWORK / INQUIRY
        1. Transaction Log Analysis
        2. Complementing the Web Log Analysis Methodology
        3. Search Logs Analysis
        4. Website Analytics
        5. Website Key Performance Indicators
        6. Action-Object Pairs
      5. PHENOMENOLOGY / ETHNOMETHODOLOGY
        1. Estimating User Behavior
        2. Interaction Design for Studying User Behavior
        3. Tips for Tracking Web User Behavior
        4. User Profiling for Dynamic Page Customization
        5. Social Networks
      6. CONTENT ANALYSIS
        1. Query Classification
        2. Topic Analysis
        3. Domain Specific Log Analysis
        4. Language Specific Log Analysis
        5. Goal Specific Query Analysis
      7. ETHNOGRAPHY
        1. Nethnography
        2. The Blogs
        3. Finding Meaning in Online Discussions
      8. HISTORICAL METHOD
        1. Historic Perspective
      9. DISCOURSE ANALYSIS
        1. Web-Traffic Measurement
      10. CASE STUDY
        1. Unit of Analysis and Validity of Web Log Data
      11. DIVERSE RESEARCH METHODOLOGIES, COMMON ISSUES
        1. Privacy and Web Logging
        2. Recommendations for Reporting Web Usage Studies
      12. CONCLUSION
      13. REFERENCES
      14. KEY TERMS
  11. Glossary
  12. Compilation of References
  13. About the Contributors