A simple way to create a non-linear dataset is to add together sinusoids with different frequencies. The dataset we will work with in this chapter is generated with the following Python script and then exported to a CSV file:
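```python
import numpy as np

n_samples = 1000

# Non-linear signal: the sum of two cosines with different frequencies
de_linearize = lambda X: np.cos(1.5 * np.pi * X) + np.cos(5 * np.pi * X)

# Predictor: n_samples uniform draws on [0, 1), sorted, then scaled to [0, 2)
X = np.sort(np.random.rand(n_samples)) * 2

# Outcome: the non-linear signal plus Gaussian noise with standard deviation 0.1
y = de_linearize(X) + np.random.randn(n_samples) * 0.1
```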
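The script above stops before the CSV export. The following is a minimal sketch of that step using only NumPy's `np.savetxt`; the file name `nonlinear.csv` and the header `X,y` are hypothetical choices, not necessarily the ones used for the chapter's file.

```python
import numpy as np

# Regenerate the dataset as in the script above
n_samples = 1000
de_linearize = lambda X: np.cos(1.5 * np.pi * X) + np.cos(5 * np.pi * X)
X = np.sort(np.random.rand(n_samples)) * 2
y = de_linearize(X) + np.random.randn(n_samples) * 0.1

# Stack predictor and outcome as two columns and write them with a header row.
# The file name "nonlinear.csv" and the column names "X,y" are hypothetical.
np.savetxt(
    "nonlinear.csv",
    np.column_stack((X, y)),
    delimiter=",",
    header="X,y",
    comments="",  # without this, NumPy prefixes the header line with "# "
)
```

Passing `comments=""` keeps the header as a plain first row, which most CSV readers expect.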
As usual, X is the predictor and y is the outcome. Small variations on this script make it easy to generate other non-linear datasets. Note that de_linearize is defined with a lambda, Python's syntax for declaring a small anonymous function inline, right where it is needed. X itself is built by sorting uniform random draws on [0, 1) (np.random.rand(n_samples) ...
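To make such variations easier, the script can be wrapped in a reusable function. This is a minimal sketch, not the chapter's code: the helper `make_nonlinear_dataset` and its parameters (`func`, `noise`, `x_max`, `seed`) are hypothetical, and it swaps in NumPy's seeded `default_rng` generator so results are reproducible, which the original script does not do. A `def` equivalent of the chapter's lambda is included for comparison.

```python
import numpy as np

# Named-function equivalent of the chapter's de_linearize lambda
def de_linearize(X):
    return np.cos(1.5 * np.pi * X) + np.cos(5 * np.pi * X)

def make_nonlinear_dataset(func, n_samples=1000, noise=0.1, x_max=2.0, seed=None):
    """Hypothetical helper: sorted X on [0, x_max), y = func(X) + Gaussian noise."""
    rng = np.random.default_rng(seed)
    # Sorted uniform draws on [0, 1), scaled to [0, x_max)
    X = np.sort(rng.random(n_samples)) * x_max
    # Add zero-mean Gaussian noise with standard deviation `noise`
    y = func(X) + rng.normal(0.0, noise, n_samples)
    return X, y

# Same construction as the chapter's dataset (different random draws)
X, y = make_nonlinear_dataset(de_linearize, seed=0)

# A variation: another non-linear signal, passed inline as a lambda, with more noise
X2, y2 = make_nonlinear_dataset(lambda X: np.sin(3 * np.pi * X) * X**2, noise=0.3, seed=1)
```

Passing the signal as an argument lets a lambda stay a one-off expression at the call site, while the sampling and noise logic is written only once.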