Push the envelope of data science by exploring emerging topics such as data management, machine learning, natural language processing, crowdsourcing, and algorithm design with this O’Reilly video collection—taken from the Hardcore Data Science sessions at Strata + Hadoop World 2014 in New York.
This video collection includes:
Doing the Impossible (Almost)
Ted Dunning, Chief Application Architect, MapR Technologies
Computing quantities such as medians or the number of unique elements usually requires a lot of time, a lot of memory, or both. But not always. Ted describes how algorithms for these problems can be much simpler, and shows you how to apply them to applications like anomaly detection.
Tupleware: Redefining Modern Analytics
Tim Kraska, Professor, Brown University
Learn about Tupleware, a new system developed at Brown University specifically aimed at the challenges faced by the typical user. Tupleware automatically compiles analytical workflows into highly efficient distributed programs instead of interpreting the workflows at run-time.
Data Science for Humans, Not Robots
Alice Zheng, Director of Data Science, Dato
Data is intended for human consumption, yet governed, analyzed, and processed by machines. In this session, you’ll take the perspective of how data appears to machines in order to become more effective at using machines to model and analyze data for people.
Big Data: Efficient Collection and Processing
Anna Gilbert, Professor, University of Michigan
You could spend your time collecting a ton of data from scientific applications, but there are more efficient ways to answer questions of interest. In this session, you’ll learn how to acquire data in summarized or compressed measurements.
Computational Problems in Managing Social Information
Jon Kleinberg, Professor, Cornell University
Social media networks aren’t just venues for people to come together; they’re also explicitly designed environments whose architectures serve to shape behavior. You’ll learn about several computational challenges that illustrate this tension between organic interaction and algorithmic design.
Small Data Problems
Kira Radinsky, CTO, SalesPredict
What if you don’t have enough data and still want to make predictions? Small data brings a completely different set of problems than big data. Instead of dealing with scale and efficiency, the game here is to draw statistically significant results from very few noisy examples.
Building and Deploying Large-scale Machine Learning Pipelines Using the Berkeley Data Analytics Stack
Ben Recht, Assistant Professor, University of California, Berkeley
Focus on scalable computational tools for large-scale data analysis, statistical signal processing, and machine learning. Ben explores the intersections of convex optimization, mathematical statistics, and randomized algorithms.
Learning About Music and Listeners
Brian Whitman, Principal Scientist, Spotify
Understand how services such as Spotify merge machine-learning and knowledge-based approaches to music understanding with unprecedented amounts of user activity data to unlock the meaning of music taste and preference at a large scale.
Statistical Topic Modeling
Hanna Wallach, Researcher & Professor, Microsoft Research NYC & University of Massachusetts Amherst
Understand how this state-of-the-art machine-learning framework helps you analyze massive document collections. Statistical topic models automatically infer groups of semantically related words (topics) from word co-occurrence patterns in documents without human intervention.
The Aha! Moment: From Data to Insight
Dafna Shahaf, Postdoctoral Fellow, Stanford University
Large-scale data has the potential to transform almost every aspect of our world, from science to business. But for this potential to be realized, we must turn data into insight. In this talk, Dafna will describe two of her efforts to address this problem computationally.