R is an open-source language and environment that has become the de facto standard among statisticians for developing statistical software, and it is widely used for data analysis. R is a modern implementation of S, one of several statistical programming languages designed at Bell Laboratories.
R is much more than a programming language. It’s an interactive environment for performing statistics. R offers a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The ability to download and install R packages is a key factor which makes R an excellent language to learn. Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. The CRAN package repository hosts over 10,000 packages, and Bioconductor hosts nearly 1,300 packages.
R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.
We have published a series covering the best open source programming books for other popular languages.
By William N. Venables, David M. Smith, and the R Core Team (105 pages)
This tutorial manual provides a comprehensive introduction to R, a software package for statistical computing and graphics.
R supports a wide range of statistical techniques and is easily extensible via user-defined functions. One of R’s strengths is the ease with which publication-quality plots can be produced in a wide variety of formats.
The manual is released under an open source license.
An Introduction to R is one of the R Manuals. Visit the Comprehensive R Archive Network to read the others.
By Hadley Wickham and Garrett Grolemund (522 pages)
R for Data Science teaches you how to do data science with R. It introduces the reader to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
Learn how to:
This book is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.
By Julia Silge and David Robinson (HTML)
This book serves as an introduction to text mining using the tidytext package and other tidy tools in R. The authors of this book developed the tidytext package.
The functions provided by the tidytext package are relatively simple; what is important are the possible applications. This book provides compelling examples of real text mining problems. The chapters cover:
Text Mining with R is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License.
By Patrick Burns (126 pages)
The R Inferno is an essential guide to the trouble spots and oddities of R. The book shares a lot of useful information and maintains the reader’s interest. The book provides many useful techniques and tips for reducing memory usage, improving performance, and avoiding errors in computational analysis.
R is regarded as an excellent computing environment for most data analysis tasks. R is free, released under an open-source license, and has thousands of contributed packages. It is used in such diverse fields as ecology, finance, genomics and music.
Chapters are headed:
The book is illuminated with famous Botticelli artwork: The Giants, The Sowers of Discord, and The Thieves.
Note: The book is not officially released under an open source license, but Patrick Burns seems fine about it being treated as open.
By G. Jay Kerns (412 pages)
Introduction to Probability and Statistics Using R is a textbook for an undergraduate course in probability and statistics. The approximate prerequisites are two or three semesters of calculus and some linear algebra. Students attending the class include mathematics, engineering, and computer science majors.
Introduction to Probability and Statistics Using R is licensed under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation.
By Trevor Martin (68 pages)
The Undergraduate Guide to R is a beginner’s introduction to the R programming language.
After reading this book, you’ll be able to perform the most common data manipulation, analysis, comparison and viewing tasks with R. The book also provides the foundation the reader needs to progress to more advanced R techniques, and offers general tips and suggestions about how to code in R.
The Undergraduate Guide to R is written so that the reader needs no prior knowledge of programming, although general computer skills and a basic knowledge of statistics are essential.
The book is freely available, licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0).
By Benjamin Yakir (324 pages)
Introduction to Statistical Thinking is targeted at college students who need to learn statistics: students with little background in mathematics and often no motivation to learn more. The book follows the basic structure of a generic introductory statistics course.
Much of the book is based on material from the online book “Collaborative Statistics” by Barbara Illowsky and Susan Dean.
The book is licensed under the conditions of the Creative Commons Attribution License (CC-BY 3.0).
By Chester Ismay and Albert Y. Kim (HTML)
ModernDive is a textbook that teaches students how to:
The book uses many R packages, and makes effective use of real-world data sets to communicate key concepts. It offers good treatment of the basics of data analysis (data wrangling, data exploration and data visualization, including an elegant roadmap for selecting a chart type) and of statistical concepts including simulation, regression and hypothesis testing.
The book also aims to give students an understanding of the overarching data analysis process, including concepts like reproducibility and telling stories with data.
This book is released under the CC0 1.0 Universal License.
By Avril Coghlan (35 pages)
Little Book of R for Biomedical Statistics is a simple introduction to biomedical statistics using the R statistics software.
This booklet tells you how to use the R software to carry out some simple analyses that are common in biomedical statistics. In particular, the focus is on randomised trials, meta-analysis, and cohort and case-control studies that aim to test whether particular factors are associated with disease.
This booklet assumes the reader has some basic knowledge of biomedical statistics, and the principal focus of the booklet is not to explain biomedical statistics analyses, but instead to explain how to carry out these analyses using R.
The booklet examines:
The book is licensed under a Creative Commons Attribution 3.0 License.
By Julian J. Faraway (213 pages)
Practical Regression and Anova in R is an intermediate text on the practice of regression and analysis of variance. The objective is to learn what methods are available and, more importantly, when they should be applied. The book is not an introduction to R.
Permission to reproduce individual copies of this book for personal use is granted.
Here are other good R books that are free to download.
- Learning Statistics with R – covers the contents of an introductory statistics class, as typically taught to undergraduate psychology students, focusing on the use of the R statistical software.
The license terms of these books are unclear.
- Population Health Data Science with R – introduces R to public health epidemiologists and health care analysts conducting population health analyses.
- Spatial Epidemiology Notes: Applications and Vignettes in R – presents a relatively brief, non-jargony overview of how practicing epidemiologists can apply some of the extremely powerful spatial analytic tools that are easily available to them.