Data Science

Humble Book Bundle – A Collection of 16 O’Reilly Data Science Books

The Triple B’s (Barry Bargain Books)

We recently showcased the finest open source notebook software. Notebooks offer a more exploratory way to write code than Integrated Development Environments. They provide a handy way to run ad-hoc queries, perform complex data analysis, and build data visualizations. They’re great software for data scientists.

Data scientists will also be interested in this rather special collection of data science books, all published by O’Reilly. It’s a time-limited offer, so don’t delay.

Pay $1 or more to get the following books:

  • Data Science at the Command Line: Facing the Future with Time-Tested Tools – this hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.
  • Graph Databases: New Opportunities for Connected Data – learn how to design and implement a graph database that brings the power of graphs to bear on a broad range of problem domains.
  • Practical Machine Learning: A New Look at Anomaly Detection – uses practical examples to explain how the underlying concepts of anomaly detection work.
  • Practical Machine Learning: Innovations in Recommendation – this report explains innovations that make machine learning practical for business production settings—and demonstrates how even a small-scale development team can design an effective large-scale recommendation system.
  • Time Series Databases: New Ways to Store and Access Data – shows you effective ways to collect, persist, and access large-scale time series data for analysis.

Pay $8 or more to also get the following books:

  • Doing Data Science: Straight Talk from the Frontline – based on Columbia University’s Introduction to Data Science class
  • Practical Machine Learning with H2O: Powerful, Scalable Techniques for Deep Learning and AI – explore several modern machine-learning techniques such as deep learning, random forests, unsupervised learning, and ensemble learning.
  • Learning Spark: Lightning-Fast Big Data Analysis – this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates. 
  • Head First Data Analysis: A learner’s guide to big numbers, statistics, and good decisions – learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others.
  • Think Stats: Exploratory Data Analysis – this concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. 
  • Think Bayes: Bayesian Statistics in Python – learn how to solve statistical problems with Python code instead of mathematical notation, and use discrete probability distributions instead of continuous mathematics.

Pay $15 or more to also get the following books:

  • High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark – demonstrates performance optimizations that help your Spark queries run faster and handle larger data sizes, while using fewer resources.
  • Thoughtful Machine Learning with Python – shows you how to integrate and test machine learning algorithms in your code, without the academic subtext. 
  • R in a Nutshell: A Desktop Quick Reference – provides a quick and practical guide to just about everything you can do with the open source R language and software environment. You’ll learn how to write R functions and use R packages to help you prepare, visualize, and analyze data. 
  • Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale – learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. 
  • Cassandra: The Definitive Guide: Distributed Data at Web Scale – learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers.

Grab the books from Humble Bundle’s website.





