David G. Robinson (data scientist)

David G. Robinson is a data scientist at Heap, an analytics company. He is a co-author of the tidytext package for the R programming language and of the O'Reilly book Text Mining with R. Robinson previously worked as chief data scientist at DataCamp and as a data scientist at Stack Overflow, and in 2019 he was a data engineer at Flatiron Health.

Education
Robinson graduated from Harvard University with a Bachelor of Arts degree in Statistics in 2010. He received a PhD in Quantitative and Computational Biology from Princeton University.

Career
Robinson previously worked on the Data Insights Engineering team at Flatiron Health, where he applied data science to the fight against cancer. He has published three courses on DataCamp that teach R and data science. With Julia Silge, he co-authored Text Mining with R: A Tidy Approach. The book, published by O'Reilly in July 2017, is a guide to drawing insights from text using the tidytext package in R. Robinson is also the author of Introduction to Empirical Bayes: Examples from Baseball Statistics, an e-book that demonstrates the statistical method of empirical Bayes, using the estimation of baseball batting averages as its running example.

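Robinson is known for his 2016 analysis of Donald Trump's tweets, in which he used sentiment analysis and author profiling to show that posts from Trump's official account came from multiple sources: tweets sent from an Android phone and from an iPhone differed systematically, suggesting they were written by different people.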

Publications
His publications include "Widespread changes in mRNA stability contribute to quiescence-specific gene expression patterns in a fibroblast model of quiescence", "broom: An R package for converting statistical analysis objects into tidy data frames", "A nested parallel experiment demonstrates differences in intensity-dependence between RNA-seq and microarrays", "subSeq: Determining appropriate sequencing depth through efficient read subsampling", "Design and Analysis of Bar-seq Experiments", and "OASIS: an automated program for global investigation of bacterial and archaeal insertion sequences".

His book Introduction to Empirical Bayes introduces Bayesian methods for estimating binomial proportions through a series of examples drawn from baseball statistics.