You are previewing Programming Collective Intelligence.

Programming Collective Intelligence

Cover of Programming Collective Intelligence by Toby Segaran Published by O'Reilly Media, Inc.
  1. Programming Collective Intelligence
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. A Note Regarding Supplemental Files
    3. Praise for Programming Collective Intelligence
    4. Preface
      1. Prerequisites
      2. Style of Examples
      3. Why Python?
      4. Open APIs
      5. Overview of the Chapters
      6. Conventions
      7. Using Code Examples
      8. How to Contact Us
      9. Safari® Books Online
      10. Acknowledgments
    5. 1. Introduction to Collective Intelligence
      1. What Is Collective Intelligence?
      2. What Is Machine Learning?
      3. Limits of Machine Learning
      4. Real-Life Examples
      5. Other Uses for Learning Algorithms
    6. 2. Making Recommendations
      1. Collaborative Filtering
      2. Collecting Preferences
      3. Finding Similar Users
      4. Recommending Items
      5. Matching Products
      6. Building a del.icio.us Link Recommender
      7. Item-Based Filtering
      8. Using the MovieLens Dataset
      9. User-Based or Item-Based Filtering?
      10. Exercises
    7. 3. Discovering Groups
      1. Supervised versus Unsupervised Learning
      2. Word Vectors
      3. Hierarchical Clustering
      4. Drawing the Dendrogram
      5. Column Clustering
      6. K-Means Clustering
      7. Clusters of Preferences
      8. Viewing Data in Two Dimensions
      9. Other Things to Cluster
      10. Exercises
    8. 4. Searching and Ranking
      1. What's in a Search Engine?
      2. A Simple Crawler
      3. Building the Index
      4. Querying
      5. Content-Based Ranking
      6. Using Inbound Links
      7. Learning from Clicks
      8. Exercises
    9. 5. Optimization
      1. Group Travel
      2. Representing Solutions
      3. The Cost Function
      4. Random Searching
      5. Hill Climbing
      6. Simulated Annealing
      7. Genetic Algorithms
      8. Real Flight Searches
      9. Optimizing for Preferences
      10. Network Visualization
      11. Other Possibilities
      12. Exercises
    10. 6. Document Filtering
      1. Filtering Spam
      2. Documents and Words
      3. Training the Classifier
      4. Calculating Probabilities
      5. A Naïve Classifier
      6. The Fisher Method
      7. Persisting the Trained Classifiers
      8. Filtering Blog Feeds
      9. Improving Feature Detection
      10. Using Akismet
      11. Alternative Methods
      12. Exercises
    11. 7. Modeling with Decision Trees
      1. Predicting Signups
      2. Introducing Decision Trees
      3. Training the Tree
      4. Choosing the Best Split
      5. Recursive Tree Building
      6. Displaying the Tree
      7. Classifying New Observations
      8. Pruning the Tree
      9. Dealing with Missing Data
      10. Dealing with Numerical Outcomes
      11. Modeling Home Prices
      12. Modeling "Hotness"
      13. When to Use Decision Trees
      14. Exercises
    12. 8. Building Price Models
      1. Building a Sample Dataset
      2. k-Nearest Neighbors
      3. Weighted Neighbors
      4. Cross-Validation
      5. Heterogeneous Variables
      6. Optimizing the Scale
      7. Uneven Distributions
      8. Using Real Data—the eBay API
      9. When to Use k-Nearest Neighbors
      10. Exercises
    13. 9. Advanced Classification: Kernel Methods and SVMs
      1. Matchmaker Dataset
      2. Difficulties with the Data
      3. Basic Linear Classification
      4. Categorical Features
      5. Scaling the Data
      6. Understanding Kernel Methods
      7. Support-Vector Machines
      8. Using LIBSVM
      9. Matching on Facebook
      10. Exercises
    14. 10. Finding Independent Features
      1. A Corpus of News
      2. Previous Approaches
      3. Non-Negative Matrix Factorization
      4. Displaying the Results
      5. Using Stock Market Data
      6. Exercises
    15. 11. EVOLVING INTELLIGENCE
      1. What Is Genetic Programming?
      2. Programs As Trees
      3. Creating the Initial Population
      4. Testing a Solution
      5. Mutating Programs
      6. Crossover
      7. Building the Environment
      8. A Simple Game
      9. Further Possibilities
      10. Exercises
    16. 12. Algorithm Summary
      1. Bayesian Classifier
      2. Decision Tree Classifier
      3. Neural Networks
      4. Support-Vector Machines
      5. k-Nearest Neighbors
      6. Clustering
      7. Multidimensional Scaling
      8. Non-Negative Matrix Factorization
      9. Optimization
    17. A. Third-Party Libraries
      1. Universal Feed Parser
      2. Python Imaging Library
      3. Beautiful Soup
      4. pysqlite
      5. NumPy
      6. matplotlib
      7. pydelicious
    18. B. Mathematical Formulas
      1. Euclidean Distance
      2. Pearson Correlation Coefficient
      3. Weighted Mean
      4. Tanimoto Coefficient
      5. Conditional Probability
      6. Gini Impurity
      7. Entropy
      8. Variance
      9. Gaussian Function
      10. Dot-Products
    19. Index
    20. About the Author
    21. Colophon
    22. SPECIAL OFFER: Upgrade this ebook with O’Reilly
O'Reilly logo

Network Visualization

The final example in this chapter shows another way in which optimization can be used on problems that are completely unrelated to one another. In this case, the problem is the visualization of networks. A network in this case is any set of things that are connected together. A good example in online applications is a social network like MySpace, Facebook, or LinkedIn, where people are connected because they are friends or have a professional relationship. Each member of the site chooses to whom they are connected, and collectively this creates a network of people. It is interesting to visualize such networks to determine their structure, perhaps in order to find the people who are connectors (those who know a lot of people or who serve as a link between otherwise self-contained cliques).

The Layout Problem

When drawing a network to visualize a big group of people and the links between them, one problem is deciding where each name (or icon) should be placed in the picture. For example, consider the network in Figure 5-7.

A confusing network layout

Figure 5-7. A confusing network layout

In this figure, you can see that Augustus is friends with Willy, Violet, and Miranda. But the layout of the network is a bit messy, and adding more people would make it very confusing. A much cleaner layout is shown in Figure 5-8.

Figure 5-8. A clean network layout

This section will look at how optimization can ...

The best content for your career. Discover unlimited learning on demand for around $1/day.