Creating our own non-linear dataset

A good way to create a non-linear dataset is to mix sinusoids with different frequencies. The dataset we will work with in this chapter is created with the following Python script and exported to a CSV file:

    import numpy as np
    n_samples = 1000
    de_linearize = lambda X: np.cos(1.5 * np.pi * X) + np.cos(5 * np.pi * X)
    X = np.sort(np.random.rand(n_samples)) * 2
    y = de_linearize(X) + np.random.randn(n_samples) * 0.1

As usual, X is the predictor, and y the outcome. You can use variations on that script to easily generate other non-linear datasets. Note that we have used a lambda function, which is a Pythonic way of declaring a function on the spot when needed. Then we shuffle the dataset by sorting randomly (np.random.rand(n_samples) ...
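The script above stops before the export step, so here is a minimal sketch of one way to generate, shuffle, and write the dataset. Several things in it are assumptions rather than the book's own code: the file name nonlinear.csv, the column header X,y, the fixed seed, the helper names make_dataset and export_shuffled_csv, and the use of a random permutation to shuffle the rows. It also uses a named function in place of the lambda, which behaves the same way.

```python
import numpy as np


def de_linearize(X):
    """Sum of two cosines with different frequencies (same formula as the script above)."""
    return np.cos(1.5 * np.pi * X) + np.cos(5 * np.pi * X)


def make_dataset(n_samples=1000, noise=0.1, seed=0):
    """Return sorted predictors X in [0, 2) and noisy outcomes y.

    The seed parameter is an assumption added for reproducibility.
    """
    rng = np.random.default_rng(seed)
    X = np.sort(rng.random(n_samples)) * 2
    y = de_linearize(X) + rng.standard_normal(n_samples) * noise
    return X, y


def export_shuffled_csv(X, y, path="nonlinear.csv", seed=0):
    """Shuffle the rows with a random permutation and write them to a CSV file.

    The path and the "X,y" header are hypothetical choices for this sketch.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # random row order, keeps each (X, y) pair together
    data = np.column_stack([X[order], y[order]])
    np.savetxt(path, data, delimiter=",", header="X,y", comments="", fmt="%.6f")
    return data


X, y = make_dataset()
data = export_shuffled_csv(X, y)
```

Changing the frequencies inside de_linearize, the noise level, or the multiplier on X gives other non-linear datasets with the same structure.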
