Deduplication of nonconflicting data items
Duplication is a common problem when collecting large amounts of data. In this recipe, we will combine similar records in a way that ensures no information is lost.
Getting ready
Create an input.csv file with repeated data:
How to do it...
Create a new file, which we will call Main.hs, and perform the following steps:
- We will be using the CSV, Map, and Maybe packages:
  import Text.CSV (parseCSV, Record)
  import Data.Map (fromListWith)
  import Control.Applicative ((<|>))
- Define the Item data type corresponding to the CSV input:
  data Item = Item { name :: String
                   , color :: Maybe String
                   , cost :: Maybe Float
                   } deriving ...
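The imports above suggest the shape of the merge: fromListWith groups records by key, and (<|>) keeps whichever side of a Maybe field is present. The following is a minimal sketch of that idea, not the recipe's own listing. The helpers parseField, parseCost, toItem, combine, and dedupe, the deriving clause, and the rows in sampleRows are all hypothetical names and data introduced for illustration. To stay self-contained, the sketch works on rows already split into fields (the [[String]] shape that parseCSV produces on success) rather than reading input.csv.

```haskell
import Control.Applicative ((<|>))
import qualified Data.Map as M
import Data.Maybe (mapMaybe)

-- Mirrors the recipe's Item type. The deriving clause is an assumption,
-- because the original listing is truncated at that point.
data Item = Item
  { name  :: String
  , color :: Maybe String
  , cost  :: Maybe Float
  } deriving (Show, Eq)

-- Hypothetical helper: an empty text field is treated as unknown.
parseField :: String -> Maybe String
parseField "" = Nothing
parseField s  = Just s

-- Hypothetical helper: a field that is not a complete number is unknown.
parseCost :: String -> Maybe Float
parseCost s = case reads s of
  [(x, "")] -> Just x
  _         -> Nothing

-- Hypothetical helper: convert one three-column row into an Item,
-- skipping rows with any other number of columns.
toItem :: [String] -> Maybe Item
toItem [n, c, p] = Just (Item n (parseField c) (parseCost p))
toItem _         = Nothing

-- Merge two records that share a name. For each optional field,
-- (<|>) keeps the first Just it finds, so a known value fills a gap.
combine :: Item -> Item -> Item
combine a b = Item (name a) (color a <|> color b) (cost a <|> cost b)

-- Group items by name and merge every group into a single record.
dedupe :: [Item] -> [Item]
dedupe items = M.elems (M.fromListWith combine [(name i, i) | i <- items])

-- Hypothetical sample rows (name, color, cost), not the book's input.csv.
sampleRows :: [[String]]
sampleRows =
  [ ["glasses", "black", ""]
  , ["glasses", "", "60"]
  , ["shoes", "brown", "20"]
  ]
```

Loading this in GHCi and evaluating dedupe (mapMaybe toItem sampleRows) should merge the two "glasses" rows into one record that carries both the color and the cost. The sketch assumes duplicates never disagree: if two rows gave different values for the same field, (<|>) would silently keep one of them.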