Population scale clustering and geographic ethnicity

Next-generation genome sequencing (NGS) reduces the overhead and time required for genomic sequencing, producing data at an unprecedented scale. Analyzing this large-scale data, however, is computationally expensive and is increasingly the key bottleneck. The growth of NGS data, in both the number of samples and the number of features per sample, demands massively parallel data processing, which poses extraordinary challenges for machine learning and bioinformatics approaches. Using genomic information in medical practice requires efficient analytical methods that can cope with data from thousands of individuals and millions of their variants.

One of the ...
