A Window into the R Packages Mentioned at the R New York Conference
I attended the R New York Conference (https://www.rstats.nyc) on May 10 and 11, 2019. The conference was packed with wonderful presentations, and many of the speakers were kind enough to share the R packages they used in their work. I was familiar with a few of the packages, but most were new to me. Here is a short summary of the R packages the speakers mentioned. All the package names are linked to their documentation, and the use cases correspond to the examples the speakers presented.
- tidyverse: Collection of R packages designed for data science
- Speaker: Ludmilla Janda, Amplify
- Use Case: "Building Tidyverse from Scratch": using Scratch Box to teach data visualization to middle school students. A separate block is created for each tidyverse component, and the blocks can be chained together to complete tasks. Students can process data and build complex visualizations by interacting only with the blocks.
- h2o: Open-source machine learning platform with parallelized implementations of supervised and unsupervised algorithms
- Speaker: Emily Dodwell, AT&T Labs Research
- Use Case: Building recommendation systems for DIRECTV using tangled lasso and boosted trees
- sparklyr: R interface to Apache Spark
- Speaker: Emily Dodwell, AT&T Labs Research
- Use Case: Building recommendation systems for DIRECTV using tangled lasso and boosted trees
- usethis: A workflow package for creating R packages
- Speaker: Emily Robinson, DataCamp
- Use Case: Building R packages
- Resources: Git With R, R-Pkgs
- drake: Pipeline toolkit for building reproducible and replicable projects, an alternative to Make
- Resource: NYC FIRES
- Speaker: Amanda Dobbyn, Earlybird Software
- Use Case: Workflow for analyzing when and where fires happen in NYC
- arrow: R interface to Apache Arrow, a cross-language development platform for in-memory data
- Resource: Apache Arrow
- Speaker: Wes McKinney, Ursa Labs
- Use Case: Building interoperability between R, Python, MATLAB, and other platforms
- rnn: Recurrent neural network
- Speaker: Dr. Michelle Gill, Benevolent AI
- Use Case: AI driven drug discovery
- glmnet: Fit a generalized linear model via penalized maximum likelihood
- Speaker: Dr. Adam Chekroud, Spring Health
- Use Case: As part of personalizing mental health care, fit a model that measures risk by predicting suicide attempts and suicide deaths following outpatient visits
- xgboost: Extreme Gradient Boosting
- Speaker: Dr. Adam Chekroud, Spring Health
- Use Case: As part of personalizing mental health care, rank the most likely reasons why an individual won't get treatment
- lme4, nlme: Fit linear and nonlinear mixed-effects models
- Speaker: Dr. Adam Chekroud, Spring Health
- Use Case: Compare new informed treatment versus psychological treatment for depression and anxiety
- keras: High-level neural network API that uses Python and TensorFlow in the background
- Speaker: Jacqueline Nolis, Nolis LLC
- Use Case: Create a sample neural network to predict new pet names
- plumber: An R package that converts your existing R code into a web API using a handful of special one-line comments
- rocker: Docker images for R, used here to deploy an R plumber API in containers
- Speaker: Heather Nolis, T-Mobile
- Use Case: Use Amazon Elastic Container Service (ECS) on AWS to run the Docker containers that deploy the neural network application built with plumber
- Resource: Detailed instructions on setting up Docker
- rvest: Scrape data from HTML web pages
- Speaker: Namita Nandakumar, Philadelphia Eagles
- Use Case: Scrape draft data from Hockey-Reference
- rstanarm: Bayesian applied regression modeling via Stan
- Speaker: Namita Nandakumar, Philadelphia Eagles
- Use Case: Use Bayesian statistics and logistic regression to estimate the probability of picking overagers during the draft
- Resource: Code and slides on GitHub
- neuroconductor: An open-source platform in R for rapid testing and analysis of neuroimaging data
- Speaker: Elizabeth Sweeney, Weill Cornell
- Use Case: Analyze structural MRI images
- fslr: Wrapper functions for FSL (the FMRIB Software Library), open-source, scriptable software for neuroimaging analysis
- Speaker: Elizabeth Sweeney, Weill Cornell
- Use Case: Analyze functional MRI images of the brain
- brms: Bayesian regression models using Stan
- Speaker: Jim Savage, Schmidt Futures
- Use Case: Integrate a loss function over the posterior distribution
- bart: Machine learning with Bayesian additive regression trees
- Speaker: Jim Savage, Schmidt Futures
- Use Case: Machine learning with a Bayesian alternative to xgboost
- rchie: A parser for ArchieML
- Speaker: Dr. Noam Ross, rOpenSci & EcoHealth Alliance
- Resource: ArchieML (Archie Markup Language), a structured text format optimized for human writability
- Use Case: Write structured content in Google Docs with a little markup
- redoc: Two-way R Markdown-Microsoft Word workflow. It generates Word documents that can be de-rendered back into R Markdown, retaining edits on the Word document, including tracked changes.
- Speaker: Dr. Noam Ross, rOpenSci & EcoHealth Alliance
- Use Case: Collaboration between data science and management teams that work in separate workflows (R versus Microsoft Office)
- parsnip: A tidymodels package from the author of caret that provides a tidy, unified interface to models implemented across different packages
- Speaker: Dr. Max Kuhn, RStudio
- Use Case: Fit many different machine learning models through a single interface
- tabulizer: Provides R bindings to the Tabula Java library, which can be used to computationally extract tables from PDF documents
- Speaker: Brooke Watson, Senior Data Scientist, ACLU
- Use Case: The ACLU extracts tables from PDF immigration reports
- daff: Identify changes in data frames over time
- Speaker: Brooke Watson, Senior Data Scientist, ACLU
- Use Case: The ACLU tracks changes in the data extracted from immigration reports
- readr: Tidyverse package providing a fast and friendly way to read rectangular data (such as CSV, TSV, and fixed-width files)
- Speaker: Brooke Watson, Senior Data Scientist, ACLU
- Use Case: Read multiple file types
- visdat: Visual display of observations with missing data
- Speaker: Brooke Watson, Senior Data Scientist, ACLU
- Use Case: Visualize missing data
- rtweet: R client for accessing Twitter’s REST and stream APIs
- Speaker: Amanda Dobbyn, Earlybird Software
- Use Case: Gather reports of NYC fires from a Twitter feed
- ggmap: Extension of ggplot2 to visualize spatial data and models on top of static maps from online sources such as Google Maps
- Speaker: Amanda Dobbyn, Earlybird Software
- Use Case: Map the locations of the NYC fires
- survival: R package for survival analysis
- Speaker: Elizabeth Sweeney, Weill Cornell
- Use Case: Survival analysis of data from clinical trials
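
The survival package ships with standard R installations, so it is easy to try. The sketch below is not taken from the talk. It fits Kaplan-Meier curves, runs a log-rank test, and fits a Cox proportional hazards model on the `lung` dataset bundled with the package. The object names `km_fit` and `cox_fit` are illustrative choices.

```r
library(survival)

# Kaplan-Meier survival curves by sex (1 = male, 2 = female) in the bundled lung cancer data;
# status is coded 1 = censored, 2 = dead, which Surv() accepts directly
km_fit <- survfit(Surv(time, status) ~ sex, data = lung)
print(km_fit)  # records, events, and median survival time per group

# Log-rank test for a difference in survival between the two groups
survdiff(Surv(time, status) ~ sex, data = lung)

# Cox proportional hazards model with age and sex as covariates
cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox_fit)  # hazard ratios (exp(coef)) with confidence intervals
```

A clinical-trial analysis would typically swap `sex` for the treatment arm, so that the log-rank test and the Cox hazard ratio compare the arms directly.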