Wrapping up

We can combine all these queries while also selecting the original attributes. Since the data is still ordered by pclass and then alphabetically by passenger name, we should also randomize the order of the rows. We end up with the following query:

SELECT pclass, survived, name, sex,
       COALESCE(age, 28) AS age,
       sibsp, parch, ticket,
       COALESCE(fare, 14.5) AS fare,
       cabin, embarked, boat, body, home_dest,
       CASE WHEN age IS NULL THEN 0 ELSE 1 END AS is_age_missing,
       log(fare + 1, 2) AS log_fare,
       split(name, ' ')[2] AS title,
       substr(cabin, 1, 1) AS deck,
       sibsp + parch + 1 AS family_size
FROM titanic
ORDER BY RAND();
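To make the derived columns concrete, here is a minimal Python sketch that mirrors them for a single row. It is an illustration under stated assumptions, not the way Athena evaluates the query: the helper name `derive_features` and the sample row are made up, and the sketch assumes that `age` and `fare` inside the CASE and log expressions refer to the original table columns rather than the COALESCE aliases, and that the array subscript `[2]` is 1-based (so it picks the second token of the name).

```python
import math
import random

AGE_DEFAULT = 28     # replacement value for a missing age, taken from the query
FARE_DEFAULT = 14.5  # replacement value for a missing fare, taken from the query


def derive_features(row):
    """Return the derived columns of the query for one row given as a dict.

    Hypothetical helper for illustration only; None stands in for SQL NULL.
    """
    age, fare, cabin = row["age"], row["fare"], row["cabin"]
    tokens = row["name"].split(" ")
    return {
        "age": AGE_DEFAULT if age is None else age,
        "fare": FARE_DEFAULT if fare is None else fare,
        # Same convention as the CASE expression: 0 when age is NULL, else 1.
        "is_age_missing": 0 if age is None else 1,
        # Assumption: fare here is the raw column, so NULL propagates.
        "log_fare": None if fare is None else math.log2(fare + 1),
        # Assumption: [2] is 1-based, i.e. the second whitespace-separated token.
        "title": tokens[1] if len(tokens) > 1 else None,
        # First character of the cabin, or None when the cabin is missing.
        "deck": None if cabin is None else cabin[:1],
        "family_size": row["sibsp"] + row["parch"] + 1,
    }


# Made-up sample row, not taken from the dataset.
sample = {"name": "Doe, Mr. John", "age": None, "fare": 7.0,
          "cabin": "C85", "sibsp": 1, "parch": 2}
features = derive_features(sample)

# Rough counterpart of ORDER BY RAND(): shuffle a copy of the rows.
rows = [sample, dict(sample, name="Roe, Mrs. Jane", age=35)]
shuffled = rows[:]
random.shuffle(shuffled)
```

For the sample row, the missing age is replaced by 28, the title comes out as `Mr.`, the deck as `C`, and the family size as 4.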

Let us run that query. The results are displayed in the results panel and also written to a CSV file in the query result location on ...
