Loading the data may not be the most fun part of a machine learning problem, but it is an important step. I'm going to cover my data loading methodology here so that you can get a feel for how I handle loading a dataset.
from sklearn.preprocessing import StandardScaler
import pandas as pd

TRAIN_DATA = "./data/train/train_data.csv"
VAL_DATA = "./data/val/val_data.csv"
TEST_DATA = "./data/test/test_data.csv"

def load_data():
    """Loads train, val, and test datasets from disk"""
    train = pd.read_csv(TRAIN_DATA)
    val = pd.read_csv(VAL_DATA)
    test = pd.read_csv(TEST_DATA)

    # we will use sklearn's StandardScaler to scale our data to 0 mean, unit variance.
    scaler = StandardScaler()
    train = scaler.fit_transform(train)
    val = scaler.transform(val)
    ...
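The scaling step in `load_data` fits the scaler on the training set only and then reuses those statistics for the validation set. Here is a minimal sketch of what that does, using small synthetic DataFrames in place of the CSV files. The column names, sizes, and random data are assumptions made up for illustration; they are not from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; the real CSV contents are not shown here.
rng = np.random.default_rng(0)
columns = ["a", "b", "c"]
train = pd.DataFrame(rng.normal(5.0, 2.0, size=(100, 3)), columns=columns)
val = pd.DataFrame(rng.normal(5.0, 2.0, size=(20, 3)), columns=columns)

scaler = StandardScaler()
# fit_transform learns the per-column mean and std from train, then scales train.
train_scaled = scaler.fit_transform(train)
# transform reuses the train statistics; nothing is learned from val.
val_scaled = scaler.transform(val)

# The same val scaling done by hand with the train mean and (population) std.
train_mean = train.to_numpy().mean(axis=0)
train_std = train.to_numpy().std(axis=0)
manual_val = (val.to_numpy() - train_mean) / train_std
```

After this, each column of `train_scaled` has mean 0 and unit variance, while `val_scaled` is only approximately standardized because it is scaled with the training statistics. Note that both results are NumPy arrays rather than DataFrames.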