You are previewing Programming Collective Intelligence.

Programming Collective Intelligence

Cover of Programming Collective Intelligence by Toby Segaran Published by O'Reilly Media, Inc.
  1. Programming Collective Intelligence
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. A Note Regarding Supplemental Files
    3. Praise for Programming Collective Intelligence
    4. Preface
      1. Prerequisites
      2. Style of Examples
      3. Why Python?
      4. Open APIs
      5. Overview of the Chapters
      6. Conventions
      7. Using Code Examples
      8. How to Contact Us
      9. Safari® Books Online
      10. Acknowledgments
    5. 1. Introduction to Collective Intelligence
      1. What Is Collective Intelligence?
      2. What Is Machine Learning?
      3. Limits of Machine Learning
      4. Real-Life Examples
      5. Other Uses for Learning Algorithms
    6. 2. Making Recommendations
      1. Collaborative Filtering
      2. Collecting Preferences
      3. Finding Similar Users
      4. Recommending Items
      5. Matching Products
      6. Building a del.icio.us Link Recommender
      7. Item-Based Filtering
      8. Using the MovieLens Dataset
      9. User-Based or Item-Based Filtering?
      10. Exercises
    7. 3. Discovering Groups
      1. Supervised versus Unsupervised Learning
      2. Word Vectors
      3. Hierarchical Clustering
      4. Drawing the Dendrogram
      5. Column Clustering
      6. K-Means Clustering
      7. Clusters of Preferences
      8. Viewing Data in Two Dimensions
      9. Other Things to Cluster
      10. Exercises
    8. 4. Searching and Ranking
      1. What's in a Search Engine?
      2. A Simple Crawler
      3. Building the Index
      4. Querying
      5. Content-Based Ranking
      6. Using Inbound Links
      7. Learning from Clicks
      8. Exercises
    9. 5. Optimization
      1. Group Travel
      2. Representing Solutions
      3. The Cost Function
      4. Random Searching
      5. Hill Climbing
      6. Simulated Annealing
      7. Genetic Algorithms
      8. Real Flight Searches
      9. Optimizing for Preferences
      10. Network Visualization
      11. Other Possibilities
      12. Exercises
    10. 6. Document Filtering
      1. Filtering Spam
      2. Documents and Words
      3. Training the Classifier
      4. Calculating Probabilities
      5. A Naïve Classifier
      6. The Fisher Method
      7. Persisting the Trained Classifiers
      8. Filtering Blog Feeds
      9. Improving Feature Detection
      10. Using Akismet
      11. Alternative Methods
      12. Exercises
    11. 7. Modeling with Decision Trees
      1. Predicting Signups
      2. Introducing Decision Trees
      3. Training the Tree
      4. Choosing the Best Split
      5. Recursive Tree Building
      6. Displaying the Tree
      7. Classifying New Observations
      8. Pruning the Tree
      9. Dealing with Missing Data
      10. Dealing with Numerical Outcomes
      11. Modeling Home Prices
      12. Modeling "Hotness"
      13. When to Use Decision Trees
      14. Exercises
    12. 8. Building Price Models
      1. Building a Sample Dataset
      2. k-Nearest Neighbors
      3. Weighted Neighbors
      4. Cross-Validation
      5. Heterogeneous Variables
      6. Optimizing the Scale
      7. Uneven Distributions
      8. Using Real Data—the eBay API
      9. When to Use k-Nearest Neighbors
      10. Exercises
    13. 9. Advanced Classification: Kernel Methods and SVMs
      1. Matchmaker Dataset
      2. Difficulties with the Data
      3. Basic Linear Classification
      4. Categorical Features
      5. Scaling the Data
      6. Understanding Kernel Methods
      7. Support-Vector Machines
      8. Using LIBSVM
      9. Matching on Facebook
      10. Exercises
    14. 10. Finding Independent Features
      1. A Corpus of News
      2. Previous Approaches
      3. Non-Negative Matrix Factorization
      4. Displaying the Results
      5. Using Stock Market Data
      6. Exercises
    15. 11. EVOLVING INTELLIGENCE
      1. What Is Genetic Programming?
      2. Programs As Trees
      3. Creating the Initial Population
      4. Testing a Solution
      5. Mutating Programs
      6. Crossover
      7. Building the Environment
      8. A Simple Game
      9. Further Possibilities
      10. Exercises
    16. 12. Algorithm Summary
      1. Bayesian Classifier
      2. Decision Tree Classifier
      3. Neural Networks
      4. Support-Vector Machines
      5. k-Nearest Neighbors
      6. Clustering
      7. Multidimensional Scaling
      8. Non-Negative Matrix Factorization
      9. Optimization
    17. A. Third-Party Libraries
      1. Universal Feed Parser
      2. Python Imaging Library
      3. Beautiful Soup
      4. pysqlite
      5. NumPy
      6. matplotlib
      7. pydelicious
    18. B. Mathematical Formulas
      1. Euclidean Distance
      2. Pearson Correlation Coefficient
      3. Weighted Mean
      4. Tanimoto Coefficient
      5. Conditional Probability
      6. Gini Impurity
      7. Entropy
      8. Variance
      9. Gaussian Function
      10. Dot-Products
    19. Index
    20. About the Author
    21. Colophon
    22. SPECIAL OFFER: Upgrade this ebook with O’Reilly
O'Reilly logo

Programs As Trees

In order to create programs that can be tested, mutated, and bred, you'll need a way to represent and run them from within your Python code. The representation has to lend itself to easy modification and, more importantly, has to be guaranteed to be an actual program—which means generating random strings and trying to treat them as Python code won't work. Researchers have come up with a few different ways to represent programs for genetic programming, and the most commonly used is a tree representation.

Most programming languages, when compiled or interpreted, are first turned into a parse tree, which is very similar to what you'll be working with here. (The programming language Lisp and its variants are essentially ways of entering a parse tree directly.) An example of a parse tree is shown in Figure 11-2.

Example program tree

Figure 11-2. Example program tree

Each node represents either an operation on its child nodes or an endpoint, such as a parameter with a constant value. For example, the circular node is a sum operation on its two branches, in this case, the values Y and 5. Once this point is evaluated, it is given to the node above it, which in turn applies its own operation to its branches. You'll also notice that one of the nodes has the operation "if," which specifies that if its leftmost branch evaluates to true, return its center branch; if it doesn't, return its rightmost ...

The best content for your career. Discover unlimited learning on demand for around $1/day.