1

Bayesian Inference

abc

Tools for Approximate Bayesian Computation (ABC)

Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates and for calculating the misclassification probabilities of different models.

2

Bayesian Inference

abn

Modelling Multivariate Data with Additive Bayesian Networks

Bayesian network analysis is a form of probabilistic graphical modelling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. An additive Bayesian network model is a DAG in which each node comprises a generalized linear model (GLM). Additive Bayesian network models are equivalent to Bayesian multivariate regression using graphical modelling; they generalise the usual multivariable regression (GLM) to multiple dependent variables. ‘abn’ provides routines to help determine optimal Bayesian network models for a given data set, where these models are used to identify statistical dependencies in messy, complex data. The additive formulation of these models is equivalent to multivariate generalised linear modelling (including mixed models with iid random effects). The usual term for this model selection process is structure discovery. The core functionality is concerned with model selection: determining the most robust empirical model of data from interdependent variables. Laplace approximations are used to estimate goodness-of-fit metrics and model parameters, and wrappers are also included to the INLA package, which can be obtained from <http://www.r-inla.org>. A comprehensive set of documented case studies, numerical accuracy/quality assurance exercises, and additional documentation are available from the ‘abn’ website <http://r-bayesian-networks.org>.

3

Bayesian Inference

AdMit

Adaptive Mixture of Student-t Distributions

Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.

4

Bayesian Inference

arm (core)

Data Analysis Using Regression and Multilevel/Hierarchical Models

Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.
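A minimal sketch of the package in use, assuming simulated data: bayesglm() is arm's Bayesian drop-in for glm(), whose weakly informative default priors stabilise estimates under near-separation.

```r
## Hypothetical example with simulated data, for illustration only.
library(arm)

set.seed(7)
df <- data.frame(x = rnorm(100))
df$y <- rbinom(100, 1, plogis(1.5 * df$x))

# Logistic regression with arm's default weakly informative priors
fit <- bayesglm(y ~ x, family = binomial, data = df)
display(fit)  # compact coefficient summary from arm
```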

5

Bayesian Inference

AtelieR

A GTK GUI for teaching basic concepts in statistical inference, and doing elementary Bayesian tests

A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (Central Limit Theorem and the Normal Distribution, Distribution of a sample mean, Distribution of a sample variance, Probability calculator for common distributions), and II. Elementary Bayesian Statistics (Bayesian inference on proportions, contingency tables, means and variances, with informative and non-informative priors).

6

Bayesian Inference

BaBooN

Bayesian Bootstrap Predictive Mean Matching - Multiple and Single Imputation for Discrete Data

Included are two variants of Bayesian Bootstrap Predictive Mean Matching to multiply impute missing data. The first variant is a variable-by-variable imputation combining sequential regression and Predictive Mean Matching (PMM) that has been extended for unordered categorical data. The Bayesian Bootstrap allows for generating approximately proper multiple imputations. The second variant is also based on PMM, but the focus is on imputing several variables at the same time. The suggestion is to use this variant, if the missing-data pattern resembles a data fusion situation, or any other missing-by-design pattern, where several variables have identical missing-data patterns. Both variants can be run as ‘single imputation’ versions, in case the analysis objective is of a purely descriptive nature.

7

Bayesian Inference

BACCO (core)

Bayesian Analysis of Computer Code Output (BACCO)

The BACCO bundle of packages is replaced by the BACCO package, which provides a vignette that illustrates the constituent packages (emulator, approximator, calibrator) in use.

8

Bayesian Inference

BaM

Functions and Datasets for Books by Jeff Gill

Functions and datasets for Jeff Gill: “Bayesian Methods: A Social and Behavioral Sciences Approach”. First, Second, and Third Edition. Published by Chapman and Hall/CRC (2002, 2007, 2014).

9

Bayesian Inference

bamlss

Bayesian Additive Models for Location Scale and Shape (and Beyond)

Infrastructure for estimating probabilistic distributional regression models in a Bayesian framework. The distribution parameters may capture location, scale, shape, etc. and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.) similar to a generalized additive model. The conceptual and computational framework is introduced in Umlauf, Klein, Zeileis (2018) <doi:10.1080/10618600.2017.1407325>.

10

Bayesian Inference

BART

Bayesian Additive Regression Trees

Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285> and Sparapani, Logan, McCulloch and Laud (2016) <doi:10.1002/sim.6893>.

11

Bayesian Inference

BAS

Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling

Package for Bayesian Variable Selection and Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors or the mixture of g-priors from Liang et al (2008) <doi:10.1198/016214507000001337> for linear models, or mixtures of g-priors in GLMs of Li and Clyde (2018) <arXiv:1503.06913>. Other model selection criteria include AIC, BIC and Empirical Bayes estimates of g. Sampling probabilities may be updated based on the sampled models using sampling without replacement, or an MCMC algorithm may sample models using the BAS tree structure as an efficient hash table. Uniform priors over all models or beta-binomial prior distributions on model size are allowed, and for large p truncated priors on the model space may be used. The user may force variables to always be included. Details behind the sampling algorithm are provided in Clyde, Ghosh and Littman (2010) <doi:10.1198/jcgs.2010.09049>. This material is based upon work supported by the National Science Foundation under Grant DMS-1106891. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

12

Bayesian Inference

BayesDA

Functions and Datasets for the book “Bayesian Data Analysis”

Functions for Bayesian Data Analysis, with datasets from the book “Bayesian Data Analysis (second edition)” by Gelman, Carlin, Stern and Rubin. Not all datasets are included yet; the collection will hopefully be completed soon.

13

Bayesian Inference

BayesFactor

Computation of Bayes Factors for Common Designs

A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one and twosample designs, oneway designs, general ANOVA designs, and linear regression.
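A minimal one-sample sketch, assuming simulated data: ttestBF() computes the Bayes factor for a nonzero mean against the point null under its default JZS prior.

```r
## Hypothetical example with simulated data, for illustration only.
library(BayesFactor)

set.seed(42)
x <- rnorm(50, mean = 0.4)  # simulated data with a true effect

# Bayes factor for the alternative (nonzero mean) vs. the point null
bf <- ttestBF(x = x)
bf  # prints the Bayes factor and the prior used
```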

14

Bayesian Inference

bayesGARCH

Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations

Provides the bayesGARCH() function which performs the Bayesian estimation of the GARCH(1,1) model with Student’s t innovations as described in Ardia (2008) <doi:10.1007/978-3-540-78657-3>.

15

Bayesian Inference

bayesImageS

Bayesian Methods for Image Segmentation using a Potts Model

Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior of Moores et al. (2015) <doi:10.1016/j.csda.2014.12.001>. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, approximate Bayesian computation (ABC-MCMC and ABC-SMC), and the parametric functional approximate Bayesian (PFAB) algorithm. Refer to <doi:10.1007/s11222-014-9525-6> and <doi:10.1214/18-BA1130> for further details.

16

Bayesian Inference

bayesm (core)

Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005), and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

17

Bayesian Inference

bayesmeta

Bayesian Random-Effects Meta-Analysis

A collection of functions allowing one to derive the posterior distribution of the two parameters in a random-effects meta-analysis, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc.

18

Bayesian Inference

bayesmix

Bayesian Mixture Models with JAGS

Fits finite mixture models of univariate Gaussian distributions within a Bayesian framework, using JAGS.

19

Bayesian Inference

bayesQR

Bayesian Quantile Regression

Bayesian quantile regression using the asymmetric Laplace distribution; both continuous and binary dependent variables are supported. The package consists of implementations of the methods of Yu & Moyeed (2001) <doi:10.1016/S0167-7152(01)00124-9>, Benoit & Van den Poel (2012) <doi:10.1002/jae.1216> and Al-Hamzawi, Yu & Benoit (2012) <doi:10.1177/1471082X1101200304>. To speed up the calculations, the Markov Chain Monte Carlo core of all algorithms is programmed in Fortran and called from R.

20

Bayesian Inference

BayesSummaryStatLM

MCMC Sampling of Bayesian Linear Models via Summary Statistics

Methods for generating Markov Chain Monte Carlo (MCMC) posterior samples of Bayesian linear regression model parameters that require only summary statistics of data as input. Summary statistics are useful for systems with very limited amounts of physical memory. The package provides two functions: one function that computes summary statistics of data and one function that carries out the MCMC posterior sampling for Bayesian linear regression models where summary statistics are used as input. The function read.regress.data.ff utilizes the R package ‘ff’ to handle data sets that are too large to fit into a user’s physical memory, by reading in data in chunks.

21

Bayesian Inference

bayesSurv (core)

Bayesian Survival Regression with Flexible Error and Random Effects Distributions

Contains Bayesian implementations of Mixed-Effects Accelerated Failure Time (MEAFT) models for censored data. These can be not only right-censored but also interval-censored, doubly-interval-censored or misclassified interval-censored.

22

Bayesian Inference

BayesTree

Bayesian Additive Regression Trees

This is an implementation of BART: Bayesian Additive Regression Trees, by Chipman, George and McCulloch (2010).

23

Bayesian Inference

BayesValidate

BayesValidate Package

BayesValidate implements the software validation method described in the paper “Validation of Software for Bayesian Models using Posterior Quantiles” (Cook, Gelman, and Rubin, 2005). It inputs a function to perform Bayesian inference as well as functions to generate data from the Bayesian model being fit, and repeatedly generates and analyzes data to check that the Bayesian inference program works properly.

24

Bayesian Inference

BayesVarSel

Bayes Factors, Model Choice and Variable Selection in Linear Models

Conceived to calculate Bayes factors in linear models and thereby to provide a formal Bayesian answer to testing and variable selection problems. On the theoretical side, the emphasis of this package is on the prior distributions, and it allows a wide range of them: Jeffreys (1961); Zellner and Siow (1980) <doi:10.1007/bf02888369>; Zellner and Siow (1984); Zellner (1986) <doi:10.2307/2233941>; Fernandez et al. (2001) <doi:10.1016/s0304-4076(00)00076-2>; Liang et al. (2008) <doi:10.1198/016214507000001337> and Bayarri et al. (2012) <doi:10.1214/12-aos1013>. The interaction with the package is through a friendly interface that syntactically mimics the well-known lm() command of R. The resulting objects can be easily explored, providing the user with valuable information (such as marginal, joint and conditional inclusion probabilities of potential variables; the highest posterior probability model, HPM; and the median probability model, MPM) about the structure of the true data-generating model. Additionally, this package can handle problems with a large number of potential explanatory variables through parallel and heuristic versions of the main commands; see Garcia-Donato and Martinez-Beneito (2013) <doi:10.1080/01621459.2012.742443>.

25

Bayesian Inference

BayesX

R Utilities Accompanying the Software Package BayesX

Functions for exploring and visualising estimation results obtained with BayesX, free software for estimating structured additive regression models (<http://www.BayesX.org>). In addition, functions that allow one to read, write and manipulate the map objects that are required in spatial analyses performed with BayesX.

26

Bayesian Inference

BayHaz

R Functions for Bayesian Hazard Rate Estimation

A suite of R functions for Bayesian estimation of smooth hazard rates via Compound Poisson Process (CPP) and Bayesian Penalized Spline (BPS) priors.

27

Bayesian Inference

BAYSTAR

On Bayesian Analysis of Threshold Autoregressive Models (BAYSTAR)

Provides functionality for Bayesian estimation of threshold autoregressive models.

28

Bayesian Inference

bbemkr

Bayesian Bandwidth Estimation for Multivariate Kernel Regression with Gaussian Error

Bayesian bandwidth estimation for Nadaraya-Watson type multivariate kernel regression with Gaussian error density.

29

Bayesian Inference

BCBCSF

Bias-Corrected Bayesian Classification with Selected Features

Fully Bayesian classification with a subset of high-dimensional features, such as expression levels of genes. The data are modeled with a hierarchical Bayesian model using heavy-tailed t distributions as priors. When a large number of features are available, one may wish to select only a subset of features to use, typically those features strongly correlated with the response in training cases. Such a feature selection procedure is, however, invalid, since the relationship between the response and the features will have been exaggerated by the selection. This package provides a way to avoid this bias and yield better-calibrated predictions for future cases when one uses the F-statistic to select features.

30

Bayesian Inference

BCE

Bayesian Composition Estimator: Estimating Sample (Taxonomic) Composition from Biomarker Data

Function to estimate taxonomic compositions from biomarker data, using a Bayesian approach.

31

Bayesian Inference

bclust

Bayesian Hierarchical Clustering Using Spike and Slab Models

Builds a dendrogram using the log posterior as a natural distance defined by the model, while weighting the clustering variables. It can also compute equivalent Bayesian discrimination probabilities. The adopted method suits the small-sample, large-dimension setting. Model parameter estimation may be difficult, depending on the data structure and the chosen distribution family.

32

Bayesian Inference

bcp

Bayesian Analysis of Change Point Problems

Provides an implementation of the Barry and Hartigan (1993) product partition model for the normal errors change point problem using Markov Chain Monte Carlo. It also extends the methodology to regression models on a connected graph (Wang and Emerson, 2015); this allows estimation of change point models with multivariate responses. Parallel MCMC, previously available in bcp v.3.0.0, is currently not implemented.
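A minimal sketch, assuming simulated data with a single mean shift (location and sizes are illustrative only):

```r
## Hypothetical example: one mean shift in a univariate series.
library(bcp)

set.seed(3)
y <- c(rnorm(50, mean = 0), rnorm(50, mean = 3))

fit <- bcp(y)             # Barry-Hartigan product partition model via MCMC
head(fit$posterior.prob)  # posterior probability of a change at each position
head(fit$posterior.mean)  # posterior mean of the underlying signal
```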

33

Bayesian Inference

BDgraph

Bayesian Structure Learning in Graphical Models using Birth-Death MCMC

Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements in the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Letac et al. (2018) <arXiv:1706.04416>, Dobra and Mohammadi (2018) <doi:10.1214/18-AOAS1164>, and Mohammadi et al. (2017) <doi:10.1111/rssc.12171>. To speed up the computations, the BDMCMC sampling algorithms are implemented in parallel using OpenMP in C++.

34

Bayesian Inference

BLR

Bayesian Linear Regression

Bayesian Linear Regression.

35

Bayesian Inference

BMA

Bayesian Model Averaging

Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).

36

Bayesian Inference

Bmix

Bayesian Sampling for StickBreaking Mixtures

This is a bare-bones implementation of sampling algorithms for a variety of Bayesian stick-breaking (marginally DP) mixture models, including particle learning and Gibbs sampling for static DP mixtures, particle learning for dynamic BAR stick-breaking, and DP mixture regression. The software is designed to be easy to customize to suit different situations and for experimentation with stick-breaking models. Since particles are repeatedly copied, it is not an especially efficient implementation.

37

Bayesian Inference

bmixture

Bayesian Estimation for Finite Mixture of Distributions

Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t distributions. The package implements recent improvements in the Bayesian literature for finite mixtures of distributions, including Mohammadi et al. (2013) <doi:10.1007/s00180-012-0323-3> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.

38

Bayesian Inference

BMS

Bayesian Model Averaging Library

Bayesian model averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, flexible and hyper-g priors) and five kinds of model priors; model sampling is by enumeration or various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, and coefficient and predictive densities. Plotting functions are available for posterior model size, MCMC convergence, predictive and coefficient densities, best-models representation, and BMA comparison.

39

Bayesian Inference

bnlearn

Bayesian Network Structure Learning, Parameter Learning and Inference

Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <http://www.bnlearn.com>.
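A minimal sketch of the structure learning workflow, using the learning.test data set shipped with bnlearn: score-based search, then parameter estimation.

```r
## Score-based structure learning followed by parameter fitting.
library(bnlearn)

data(learning.test)                # six discrete variables, bundled with bnlearn
dag <- hc(learning.test)           # hill-climbing structure search
fit <- bn.fit(dag, learning.test)  # maximum likelihood conditional probability tables
arcs(dag)                          # the learned arc set
```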

40

Bayesian Inference

BNSP

Bayesian Non- and Semi-Parametric Model Fitting

MCMC algorithms & processing functions for non- and semi-parametric models: 1. Dirichlet process mixtures & 2. spike-and-slab models for multivariate (and univariate) response analysis, with nonparametric models for the means, the variances and the correlation matrix.

41

Bayesian Inference

boa (core)

Bayesian Output Analysis Program (BOA) for MCMC

A menu-driven program and library of functions for carrying out convergence diagnostics and statistical and graphical analysis of Markov chain Monte Carlo sampling output.

42

Bayesian Inference

Bolstad

Functions for Elementary Bayesian Inference

A set of R functions and data sets for the book Introduction to Bayesian Statistics, Bolstad, W.M. (2017), John Wiley & Sons, ISBN 978-1-118-09156-2.

43

Bayesian Inference

Boom

Bayesian Object Oriented Modeling

A C++ library for Bayesian modeling, with an emphasis on Markov chain Monte Carlo. Although boom contains a few R utilities (mainly plotting functions), its primary purpose is to install the BOOM C++ library on your system so that other packages can link against it.

44

Bayesian Inference

BoomSpikeSlab

MCMC for Spike and Slab Regression

Spike and slab regression a la McCulloch and George (1997).

45

Bayesian Inference

bqtl

Bayesian QTL Mapping Toolkit

QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools.

46

Bayesian Inference

bridgesampling

Bridge Sampling for Marginal Likelihoods and Bayes Factors

Provides functions for estimating marginal likelihoods, Bayes factors, posterior model probabilities, and normalizing constants in general, via different versions of bridge sampling (Meng & Wong, 1996, <http://www3.stat.sinica.edu.tw/statistica/j6n4/j6n43/j6n43.htm>).

47

Bayesian Inference

brms

Bayesian Regression Models using ‘Stan’

Fit Bayesian generalized (non-)linear multivariate multilevel models using ‘Stan’ for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit, among others, linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models, all in a multilevel context. Further modeling options include non-linear and smooth terms, autocorrelation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.

48

Bayesian Inference

bsamGP

Bayesian Spectral Analysis Models using Gaussian Process Priors

Contains functions to perform Bayesian inference using a spectral analysis of Gaussian process priors. Gaussian processes are represented with a Fourier series based on cosine basis functions. Currently the package includes parametric linear models, partial linear additive models with/without shape restrictions, generalized linear additive models with/without shape restrictions, and density estimation models. To maximize computational efficiency, the actual Markov chain Monte Carlo sampling for each model is done using code written in Fortran 90. This software has been developed using funding supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B03932178 and no. NRF-2017R1D1A3B03035235).

49

Bayesian Inference

bspec

Bayesian Spectral Inference

Bayesian inference on the (discrete) power spectrum of time series.

50

Bayesian Inference

bspmma

Bayesian Semiparametric Models for Meta-Analysis

The main functions carry out Gibbs sampler routines for nonparametric and semiparametric Bayesian models for random-effects meta-analysis.

51

Bayesian Inference

bsts

Bayesian Structural Time Series

Time series regression using dynamic linear models fit using MCMC. See Scott and Varian (2014) <doi:10.1504/IJMMNO.2014.059942>, among many other sources.

52

Bayesian Inference

BVS

Bayesian Variant Selection: Bayesian Model Uncertainty Techniques for Genetic Association Studies

The functions in this package focus on analyzing casecontrol association studies involving a group of genetic variants. In particular, we are interested in modeling the outcome variable as a function of a multivariate genetic profile using Bayesian model uncertainty and variable selection techniques. The package incorporates functions to analyze data sets involving common variants as well as extensions to model rare variants via the Bayesian Risk Index (BRI) as well as haplotypes. Finally, the package also allows the incorporation of external biological information to inform the marginal inclusion probabilities via the iBMU.

53

Bayesian Inference

catnet

Categorical Bayesian Network Inference

Structure learning and parameter estimation of discrete Bayesian networks using likelihoodbased criteria. Exhaustive search for fixed node orders and stochastic search of optimal orders via simulated annealing algorithm are implemented.

54

Bayesian Inference

coalescentMCMC

MCMC Algorithms for the Coalescent

Flexible framework for coalescent analyses in R. It includes a main function running the MCMC algorithm, auxiliary functions for tree rearrangement, and some functions to compute population genetic parameters.

55

Bayesian Inference

coda (core)

Output Analysis and Diagnostics for MCMC

Provides functions for summarizing and plotting the output from Markov Chain Monte Carlo (MCMC) simulations, as well as diagnostic tests of convergence to the equilibrium distribution of the Markov chain.
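A minimal sketch of the workflow: any numeric sampler output can be wrapped as an ‘mcmc’ object and passed to coda's summaries and diagnostics (the "chain" here is simulated white noise, for illustration only).

```r
## Wrap simulated draws as an mcmc object and run standard diagnostics.
library(coda)

set.seed(1)
draws <- mcmc(rnorm(1000))  # stand-in for real sampler output

summary(draws)              # posterior mean, sd and quantiles
effectiveSize(draws)        # effective sample size
geweke.diag(draws)          # Geweke convergence diagnostic
```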

56

Bayesian Inference

dclone

Data Cloning and MCMC Tools for Maximum Likelihood Methods

Low-level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 (R Journal 2(2):29-37). Sequential and parallel MCMC support for ‘JAGS’, ‘WinBUGS’, ‘OpenBUGS’, and ‘Stan’.

57

Bayesian Inference

deBInfer

Bayesian Inference for Differential Equations

A Bayesian framework for parameter inference in differential equations. This approach offers a rigorous methodology for parameter inference as well as modeling the link between unobservable model states and parameters, and observable quantities. Provides templates for the DE model, the observation model and data likelihood, and the model parameters and their prior distributions. A Markov chain Monte Carlo (MCMC) procedure processes these inputs to estimate the posterior distributions of the parameters and any derived quantities, including the model trajectories. Further functionality is provided to facilitate MCMC diagnostics and the visualisation of the posterior distributions of model parameters and trajectories.

58

Bayesian Inference

dlm

Bayesian and Likelihood Analysis of Dynamic Linear Models

Provides routines for Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
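A minimal sketch, assuming simulated data and illustrative variance values: a local level model (random walk plus noise), filtered and smoothed with dlm.

```r
## Hypothetical local level model with fixed variances.
library(dlm)

set.seed(5)
y <- cumsum(rnorm(100)) + rnorm(100)  # simulated random walk plus noise

mod  <- dlmModPoly(order = 1, dV = 1, dW = 1)  # observation / system variances
filt <- dlmFilter(y, mod)                      # Kalman filter
smth <- dlmSmooth(filt)                        # Kalman smoother
head(dropFirst(smth$s))                        # smoothed level estimates
```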

59

Bayesian Inference

EbayesThresh

Empirical Bayes Thresholding and Related Methods

Empirical Bayes thresholding using the methods developed by I. M. Johnstone and B. W. Silverman. The basic problem is to estimate a mean vector given a vector of observations of the mean vector plus white noise, taking advantage of possible sparsity in the mean vector. Within a Bayesian formulation, the elements of the mean vector are modelled as having, independently, a distribution that is a mixture of an atom of probability at zero and a suitable heavytailed distribution. The mixing parameter can be estimated by a marginal maximum likelihood approach. This leads to an adaptive thresholding approach on the original data. Extensions of the basic method, in particular to wavelet thresholding, are also implemented within the package.
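The basic problem above can be sketched as follows, assuming simulated data with unit white noise; ebayesthresh() is used with its default Laplace mixture prior.

```r
## Hypothetical sparse-mean recovery example.
library(EbayesThresh)

set.seed(9)
mu <- c(rep(0, 90), rep(5, 10))  # mostly-zero mean vector
x  <- mu + rnorm(100)            # observations = mean + white noise

muhat <- ebayesthresh(x)         # empirical Bayes thresholding estimate
sum(muhat != 0)                  # count of estimates not shrunk to zero
```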

60

Bayesian Inference

ebdbNet

Empirical Bayes Estimation of Dynamic Bayesian Networks

Infer the adjacency matrix of a network from time course data using an empirical Bayes estimation procedure based on Dynamic Bayesian Networks.

61

Bayesian Inference

eco

Ecological Inference in 2x2 Tables

Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 <doi:10.1093/pan/mpm017>) and (2011 <doi:10.18637/jss.v042.i05>) for ecological inference in 2 by 2 tables, as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides functionality that allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.

62

Bayesian Inference

eigenmodel

Semiparametric Factor and Regression Models for Symmetric Relational Data

Estimation of the parameters in a model for symmetric relational data (e.g., the abovediagonal part of a square matrix), using a modelbased eigenvalue decomposition and regression. Missing data is accommodated, and a posterior mean for missing data is calculated under the assumption that the data are missing at random. The marginal distribution of the relational data can be arbitrary, and is fit with an ordered probit specification. See Hoff (2007) <arXiv:0711.1146> for details on the model.

63

Bayesian Inference

ensembleBMA

Probabilistic Forecasting using Ensembles and Bayesian Model Averaging

Bayesian Model Averaging to create probabilistic forecasts from ensemble forecasts and weather observations.

64

Bayesian Inference

EntropyMCMC

MCMC Simulation and Convergence Evaluation using Entropy and Kullback-Leibler Divergence Estimation

Tools for Markov Chain Monte Carlo (MCMC) simulation and performance analysis. Simulate MCMC algorithms including adaptive MCMC, evaluate their convergence rate, and compare candidate MCMC algorithms for the same target density, based on entropy and Kullback-Leibler divergence criteria. MCMC algorithms can be simulated using provided functions or imported from external code. This package is based upon work starting with Chauveau, D. and Vandekerkhove, P. (2013) <doi:10.1051/ps/2012004> and subsequent articles.

65

Bayesian Inference

evdbayes

Bayesian Analysis in Extreme Value Theory

Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.

66

Bayesian Inference

exactLoglinTest

Monte Carlo Exact Tests for Log-linear Models

Monte Carlo and MCMC goodness-of-fit tests for log-linear models.

67

Bayesian Inference

factorQR

Bayesian quantile regression factor models

Package to fit Bayesian quantile regression models that assume a factor structure for at least part of the design matrix.

68

Bayesian Inference

FME

A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis

Provides functions to help in fitting models to data and to perform Monte Carlo, sensitivity, and identifiability analysis. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package ‘deSolve’ or a steady-state solver from package ‘rootSolve’. However, the methods can also be used with other types of functions.

69

Bayesian Inference

geoR

Analysis of Geostatistical Data

Geostatistical analysis including traditional, likelihoodbased and Bayesian methods.

70

Bayesian Inference

geoRglm

A Package for Generalised Linear Spatial Models

Functions for inference in generalised linear spatial models. The posterior and predictive inference is based on Markov chain Monte Carlo methods. Package geoRglm is an extension to the package geoR, which must be installed first.

71

Bayesian Inference

ggmcmc

Tools for Analyzing MCMC Simulations from Bayesian Inference

Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically displaying results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables.

72

Bayesian Inference

gRain

Graphical Independence Networks

Probability propagation in graphical independence networks, also known as Bayesian networks or probabilistic expert systems.

73

Bayesian Inference

hbsae

Hierarchical Bayesian Small Area Estimation

Functions to compute small area estimates based on a basic area- or unit-level model. The model is fit using restricted maximum likelihood, or in a hierarchical Bayesian way. In the latter case numerical integration is used to average over the posterior density for the between-area variance. The output includes the model fit, small area estimates and corresponding MSEs, as well as some model selection measures. Additional functions provide means to compute aggregate estimates and MSEs, to minimally adjust the small area estimates to benchmarks at a higher aggregation level, and to graphically compare different sets of small area estimates.

74

Bayesian Inference

HI

Simulation from distributions supported by nested hyperplanes

Simulation from distributions supported by nested hyperplanes, using the algorithm described in Petris & Tardella, “A geometric approach to trans-dimensional Markov chain Monte Carlo”, Canadian Journal of Statistics, v. 31, n. 4 (2003). Also provides random-direction multivariate adaptive rejection Metropolis sampling.

75

Bayesian Inference

Hmisc

Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.

76

Bayesian Inference

iterLap

Approximate Probability Densities by Iterated Laplace Approximations

The iterLap (iterated Laplace approximation) algorithm approximates a general (possibly non-normalized) probability density on R^p by repeated Laplace approximations to the difference between the current approximation and the true density (on the log scale). The final approximation is a mixture of multivariate normal distributions and might be used, for example, as a proposal distribution for importance sampling (e.g., in Bayesian applications). The algorithm can be seen as a computational generalization of the Laplace approximation, suitable for skew or multimodal densities.

77

Bayesian Inference

LaplacesDemon

Complete Environment for Bayesian Inference

Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview). The README describes the history of the package development process.

78

Bayesian Inference

LearnBayes

Functions for Learning Bayesian Inference

A collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one- and two-parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.

79

Bayesian Inference

lme4

Linear Mixed-Effects Models using ‘Eigen’ and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
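As a minimal sketch (assuming lme4 is installed), a mixed model with correlated random intercepts and slopes can be fit with lmer(); sleepstudy is an example dataset shipped with the package:

```r
library(lme4)

# Reaction time with a fixed effect of Days and a correlated
# random intercept and slope for each Subject
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

summary(fm)   # fixed effects and variance components
ranef(fm)     # per-subject random-effect estimates
```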

80

Bayesian Inference

lmm

Linear Mixed Models

Implements Expectation/Conditional Maximization Either (ECME) and rapidly converging algorithms, as well as Bayesian inference, for linear mixed models, as described in Schafer, J.L. (1998) “Some improved procedures for linear mixed models”. Dept. of Statistics, The Pennsylvania State University.

81

Bayesian Inference

MasterBayes

ML and MCMC Methods for Pedigree Reconstruction and Analysis

The primary aim of MasterBayes is to use MCMC techniques to integrate over uncertainty in pedigree configurations estimated from molecular markers and phenotypic data. Emphasis is put on the marginal distribution of parameters that relate the phenotypic data to the pedigree. All simulation is done in compiled C++ for efficiency.

82

Bayesian Inference

matchingMarkets

Analysis of Stable Matchings

Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.

83

Bayesian Inference

mcmc (core)

Markov Chain Monte Carlo

Simulates continuous distributions of random vectors using Markov chain Monte Carlo (MCMC). Users specify the distribution by an R function that evaluates the log unnormalized density. Algorithms are random walk Metropolis algorithm (function metrop), simulated tempering (function temper), and morphometric random walk Metropolis (Johnson and Geyer, 2012, <doi:10.1214/12-AOS1048>, function morph.metrop), which achieves geometric ergodicity by change of variable.
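A minimal sketch of metrop() (assuming the mcmc package is installed): the user supplies a function returning the log unnormalized density, here a standard bivariate normal:

```r
library(mcmc)

# Log unnormalized density of a standard bivariate normal
lud <- function(x) -sum(x^2) / 2

set.seed(42)
out <- metrop(lud, initial = c(0, 0), nbatch = 2000, scale = 2)

out$accept            # Metropolis acceptance rate
colMeans(out$batch)   # estimated posterior means (near 0)
```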

84

Bayesian Inference

MCMCglmm

MCMC Generalised Linear Mixed Models

Fits generalised linear mixed models using Markov chain Monte Carlo techniques.

85

Bayesian Inference

MCMCpack (core)

Markov Chain Monte Carlo (MCMC) Package

Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudorandom number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
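A brief sketch (assuming MCMCpack is installed): MCMCregress() simulates from the posterior of a Bayesian linear regression and returns a ‘coda’ mcmc object:

```r
library(MCMCpack)

set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

# Posterior simulation for a Bayesian linear regression
post <- MCMCregress(y ~ x, burnin = 1000, mcmc = 5000)

summary(post)   # coda summary of intercept, slope, sigma2
</imports>
```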

86

Bayesian Inference

MCMCvis

Tools to Visualize, Manipulate, and Summarize MCMC Output

Performs key functions for MCMC analysis using minimal code: visualizes, manipulates, and summarizes MCMC output. Functions support simple and straightforward subsetting of model parameters within the calls, and produce presentable and ‘publication-ready’ output. MCMC output may be derived from Bayesian model output fit with JAGS, Stan, or other MCMC samplers.

87

Bayesian Inference

mgcv

Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
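A minimal sketch (assuming mgcv is installed); gamSim() generates the package's standard simulated example data:

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400)   # standard mgcv example data

# Additive model with smooth terms; smoothness selected by REML
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

summary(b)
plot(b, pages = 1)   # estimated smooth functions
```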

88

Bayesian Inference

mlogitBMA

Bayesian Model Averaging for Multinomial Logit Models

Provides a modified function bic.glm of the BMA package that can be applied to multinomial logit (MNL) data. The data is converted to binary logit using the Begg & Gray approximation. The package also contains functions for maximum likelihood estimation of MNL.

89

Bayesian Inference

MNP

R Package for Fitting the Multinomial Probit Model

Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using the Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. <doi:10.1016/j.jeconom.2004.02.002> Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. <doi:10.18637/jss.v014.i03>.

90

Bayesian Inference

mombf

Bayesian Model Selection and Averaging for Non-Local and Local Priors

Bayesian model selection and averaging for regression and mixtures for non-local and selected local priors.

91

Bayesian Inference

monomvn

Estimation for Multivariate Normal and Student-t Data with Monotone Missingness

Estimation of multivariate normal and Student-t data of arbitrary dimension where the pattern of missing data is monotone. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and Student-t errors (from Geweke) is also provided.

92

Bayesian Inference

NetworkChange

Bayesian Package for Network Changepoint Analysis

Network changepoint analysis for undirected network data. The package implements a hidden Markov network change point model (Park and Sohn 2019). Functions for break number detection using the approximate marginal likelihood and WAIC are also provided.

93

Bayesian Inference

nimble (core)

MCMC, Particle Filtering, and Programmable Hierarchical Modeling

A system for writing hierarchical statistical models largely compatible with ‘BUGS’ and ‘JAGS’, writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. ‘NIMBLE’ includes default methods for MCMC, particle filtering, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers ‘NIMBLE’ provides. ‘NIMBLE’ extends the ‘BUGS’/‘JAGS’ language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the ‘BUGS’/‘JAGS’ language for writing models, one can use ‘NIMBLE’ for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.

94

Bayesian Inference

openEBGM

EBGM Disproportionality Scores for Adverse Event Data Mining

An implementation of DuMouchel’s (1999) <doi:10.1080/00031305.1999.10474456> Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and quantile scores from the posterior distribution using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the ‘PhViD’ package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency. Now includes an implementation of the EM algorithm for hyperparameter estimation loosely derived from the ‘mederrRank’ package.

95

Bayesian Inference

pacbpred

PAC-Bayesian Estimation and Prediction in Sparse Additive Models

Performs estimation and prediction in high-dimensional additive models, using a sparse PAC-Bayesian point of view and an MCMC algorithm. The method is fully described in Guedj and Alquier (2013), ‘PAC-Bayesian Estimation and Prediction in Sparse Additive Models’, Electronic Journal of Statistics, 7, 264-291.

96

Bayesian Inference

PAWL

Implementation of the PAWL algorithm

Implementation of the Parallel Adaptive Wang-Landau algorithm. Also implemented for comparison: parallel adaptive Metropolis-Hastings and an SMC sampler.

97

Bayesian Inference

pcFactorStan

Stan Models for the Pairwise Comparison Factor Model

Provides convenience functions and pre-programmed Stan models related to the paired comparison factor model. Its purpose is to make fitting paired comparison data using Stan easy.

98

Bayesian Inference

plotMCMC

MCMC Diagnostic Plots

Markov chain Monte Carlo diagnostic plots. The purpose of the package is to combine existing tools from the ‘coda’ and ‘lattice’ packages, and make it easy to adjust graphical details.

99

Bayesian Inference

predmixcor

Classification rule based on Bayesian mixture models with feature selection bias corrected

The function “train_predict_mix” predicts a binary response from binary features, using Bayesian mixture models with feature selection bias corrected.

100

Bayesian Inference

PReMiuM

Dirichlet Process Bayesian Clustering, Profile Regression

Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical response, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can also be included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. The main reference for the package is Liverani, Hastie, Azizi, Papathomas and Richardson (2015) <doi:10.18637/jss.v064.i07>.

101

Bayesian Inference

prevalence

Tools for Prevalence Assessment Studies

The prevalence package provides Frequentist and Bayesian methods for prevalence assessment studies. IMPORTANT: the truePrev functions in the prevalence package call on JAGS (Just Another Gibbs Sampler), which therefore has to be available on the user’s system. JAGS can be downloaded from http://mcmc-jags.sourceforge.net/.

102

Bayesian Inference

profdpm

Profile Dirichlet Process Mixtures

This package facilitates profile inference (inference at the posterior mode) for a class of product partition models (PPM). The Dirichlet process mixture is currently the only available member of this class. These methods search for the maximum a posteriori (MAP) estimate for the data partition in a PPM.

103

Bayesian Inference

pscl

Political Science Computational Laboratory

Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.

104

Bayesian Inference

R2BayesX

Estimate Structured Additive Regression Models with ‘BayesX’

An R interface to estimate structured additive regression (STAR) models with ‘BayesX’.

105

Bayesian Inference

R2jags

Using R to Run ‘JAGS’

Provides wrapper functions to implement Bayesian analysis in JAGS. Major features include monitoring convergence of an MCMC model using the Gelman and Rubin Rhat statistic, automatically running an MCMC model until it converges, and implementing parallel processing of an MCMC model for multiple chains.

106

Bayesian Inference

R2WinBUGS

Running ‘WinBUGS’ and ‘OpenBUGS’ from ‘R’ / ‘S-PLUS’

Invoke a ‘BUGS’ model in ‘OpenBUGS’ or ‘WinBUGS’; provides a class “bugs” for ‘BUGS’ results and functions to work with that class. Function write.model() allows a ‘BUGS’ model file to be written. The class and auxiliary functions could be used with other MCMC programs, including ‘JAGS’.

107

Bayesian Inference

ramps

Bayesian Geostatistical Modeling with RAMPS

Bayesian geostatistical modeling of Gaussian processes using a reparameterized and marginalized posterior sampling (RAMPS) algorithm designed to lower autocorrelation in MCMC samples. Package performance is tuned for large spatial datasets.

108

Bayesian Inference

revdbayes

Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis

Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. Also provided are d,p,q,r functions for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.

109

Bayesian Inference

RJaCGH

Reversible Jump MCMC for the Analysis of CGH Arrays

Bayesian analysis of CGH microarrays fitting Hidden Markov Chain models. The selection of the number of states is made via their posterior probability computed by Reversible Jump Markov Chain Monte Carlo Methods. Also returns probabilistic common regions for gains/losses.

110

Bayesian Inference

rjags

Bayesian Graphical Models using MCMC

Interface to the JAGS MCMC library.
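A minimal sketch (assuming rjags and a system installation of JAGS): a normal model is written in the BUGS language, compiled with jags.model(), and sampled with coda.samples():

```r
library(rjags)

model_string <- "model {
  for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
  mu  ~ dnorm(0, 1.0E-4)
  tau ~ dgamma(0.01, 0.01)
}"

set.seed(3)
dat <- list(y = rnorm(50, mean = 5, sd = 2), N = 50)

jm   <- jags.model(textConnection(model_string), data = dat, n.chains = 2)
samp <- coda.samples(jm, variable.names = c("mu", "tau"), n.iter = 2000)

summary(samp)   # coda summary of the posterior draws
```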

111

Bayesian Inference

RSGHB

Functions for Hierarchical Bayesian Estimation: A Flexible Approach

Functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well as ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train’s Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train’s original Gauss and Matlab code can be found here: <http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html> See Train’s chapter on HB in Discrete Choice with Simulation here: <http://elsa.berkeley.edu/books/choice2.html>; and his paper on using HB with non-normal distributions here: <http://eml.berkeley.edu//~train/trainsonnier.pdf>. The authors would also like to thank the invaluable contributions of Stephane Hess and the Choice Modelling Centre: <https://cmc.leeds.ac.uk/>.

113

Bayesian Inference

rstan

R Interface to Stan

Userfacing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the headeronly Stan library provided by the ‘StanHeaders’ package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via ‘variational’ approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.
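A minimal sketch (assuming rstan and a working C++ toolchain are installed): the model is written in the Stan language and compiled on first use:

```r
library(rstan)

model_code <- "
data { int<lower=0> N; vector[N] y; }
parameters { real mu; real<lower=0> sigma; }
model { y ~ normal(mu, sigma); }
"

set.seed(4)
fit <- stan(model_code = model_code,
            data = list(N = 30, y = rnorm(30, mean = 1, sd = 2)),
            chains = 2, iter = 1000)

print(fit)   # posterior summaries for mu and sigma
```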

114

Bayesian Inference

rstiefel

Random Orthonormal Matrix Generation and Optimization on the Stiefel Manifold

Simulation of random orthonormal matrices from linear and quadratic exponential family distributions on the Stiefel manifold. The most general type of distribution covered is the matrix-variate Bingham-von Mises-Fisher distribution. Most of the simulation methods are presented in Hoff (2009) “Simulation of the Matrix Bingham-von Mises-Fisher Distribution, With Applications to Multivariate and Relational Data” <doi:10.1198/jcgs.2009.07177>. The package also includes functions for optimization on the Stiefel manifold based on algorithms described in Wen and Yin (2013) “A feasible method for optimization with orthogonality constraints” <doi:10.1007/s10107-012-0584-1>.

115

Bayesian Inference

runjags

Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS

User-friendly interface utilities for MCMC models via Just Another Gibbs Sampler (JAGS), facilitating the use of parallel (or distributed) processors for multiple chains, automated control of convergence and sample length diagnostics, and evaluation of the performance of a model using drop-k validation or against simulated data. Template model specifications can be generated using a standard lme4-style formula interface to assist users less familiar with the BUGS syntax. A JAGS extension module provides additional distributions including the Pareto family of distributions, the DuMouchel prior and the half-Cauchy prior.

116

Bayesian Inference

Runuran

R Interface to the ‘UNU.RAN’ Random Variate Generators

Interface to the ‘UNU.RAN’ library for Universal Non-Uniform RANdom variate generators. It allows one to build non-uniform random number generators from quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles for a couple of distributions.

117

Bayesian Inference

RxCEcolInf

‘R x C Ecological Inference With Optional Incorporation of Survey Information’

Fits the R x C inference model described in Greiner and Quinn (2009). Allows incorporation of survey results.

118

Bayesian Inference

SamplerCompare

A Framework for Comparing the Performance of MCMC Samplers

A framework for running sets of MCMC samplers on sets of distributions with a variety of tuning parameters, along with plotting functions to visualize the results of those simulations.

119

Bayesian Inference

SampleSizeMeans

Sample size calculations for normal means

A set of R functions for calculating sample size requirements using three different Bayesian criteria in the context of designing an experiment to estimate a normal mean or the difference between two normal means. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of normal means are provided. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided.

120

Bayesian Inference

SampleSizeProportions

Calculating sample size requirements when estimating the difference between two binomial proportions

A set of R functions for calculating sample size requirements using three different Bayesian criteria in the context of designing an experiment to estimate the difference between two binomial proportions. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of binomial observations are provided. In all cases, estimation of the difference between two binomial proportions is considered. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided.

121

Bayesian Inference

sbgcop

Semiparametric Bayesian Gaussian Copula Estimation and Imputation

Estimation and inference for parameters in a Gaussian copula model, treating the univariate marginal distributions as nuisance parameters as described in Hoff (2007) <doi:10.1214/07-AOAS107>. This package also provides a semiparametric imputation procedure for missing multivariate data.

122

Bayesian Inference

SimpleTable

Bayesian Inference and Sensitivity Analysis for Causal Effects from 2 x 2 and 2 x 2 x K Tables in the Presence of Unmeasured Confounding

SimpleTable provides a series of methods to conduct Bayesian inference and sensitivity analysis for causal effects from 2 x 2 and 2 x 2 x K tables when unmeasured confounding is present or suspected.

123

Bayesian Inference

sna

Tools for Social Network Analysis

A range of tools for social network analysis, including node and graphlevel indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.

124

Bayesian Inference

spBayes

Univariate and Multivariate Spatial-Temporal Modeling

Fits univariate and multivariate spatio-temporal random effects models for point-referenced data using Markov chain Monte Carlo (MCMC). Details are given in Finley, Banerjee, and Gelfand (2015) <doi:10.18637/jss.v063.i13> and Finley, Banerjee, and Cook (2014) <doi:10.1111/2041-210X.12189>.

125

Bayesian Inference

spikeslab

Prediction and variable selection using spike and slab regression

Spike and slab for prediction and variable selection in linear regression models. Uses a generalized elastic net for variable selection.

126

Bayesian Inference

spikeSlabGAM

Bayesian Variable Selection and Model Choice for Generalized Additive Mixed Models

Bayesian variable selection, model choice, and regularized estimation for (spatial) generalized additive mixed regression models via stochastic search variable selection with spike-and-slab priors.

127

Bayesian Inference

spTimer

Spatio-Temporal Bayesian Modelling

Fits, spatially predicts and temporally forecasts large amounts of space-time data using [1] Bayesian Gaussian Process (GP) Models, [2] Bayesian Auto-Regressive (AR) Models, and [3] Bayesian Gaussian Predictive Processes (GPP) based AR Models for spatio-temporal big-n problems. Bakar and Sahu (2015) <doi:10.18637/jss.v063.i15>.

128

Bayesian Inference

ssgraph

Bayesian Graphical Estimation using Spike-and-Slab Priors

Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data. To speed up the computations, the computationally intensive tasks of the package are implemented in C++ in parallel using OpenMP.

129

Bayesian Inference

ssMousetrack

Bayesian State-Space Modeling of Mouse-Tracking Experiments via Stan

Estimates previously compiled state-space modeling for mouse-tracking experiments using the ‘rstan’ package, which provides the R interface to the Stan C++ library for Bayesian estimation.

130

Bayesian Inference

stochvol

Efficient Bayesian Inference for Stochastic Volatility (SV) Models

Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods. Methodological details are given in Kastner and Fruhwirth-Schnatter (2014) <doi:10.1016/j.csda.2013.01.002>; the most common use cases are described in Kastner (2016) <doi:10.18637/jss.v069.i05>. Also incorporates SV with leverage.

131

Bayesian Inference

tgp

Bayesian Treed Gaussian Process Models

Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1d and 2d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multiresolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.

132

Bayesian Inference

zic

Bayesian Inference for Zero-Inflated Count Models

Provides MCMC algorithms for the analysis of zeroinflated count models. The case of stochastic search variable selection (SVS) is also considered. All MCMC samplers are coded in C++ for improved efficiency. A data set considering the demand for health care is provided.

133

Chemometrics and Computational Physics

ALS (core)

Multivariate Curve Resolution Alternating Least Squares (MCR-ALS)

Alternating least squares is often used to resolve components contributing to data with a bilinear structure; the basic technique may be extended to alternating constrained least squares. Commonly applied constraints include unimodality, non-negativity, and normalization of components. Several data matrices may be decomposed simultaneously by assuming that one of the two matrices in the bilinear decomposition is shared between datasets.

134

Chemometrics and Computational Physics

AnalyzeFMRI

Functions for Analysis of fMRI Datasets Stored in the ANALYZE or NIFTI Format

Functions for I/O, visualisation and analysis of functional Magnetic Resonance Imaging (fMRI) datasets stored in the ANALYZE or NIFTI format. Note that the latest version of XQuartz seems to be necessary under MacOS.

135

Chemometrics and Computational Physics

AquaEnv

Integrated Development Toolbox for Aquatic Chemical Model Generation

Toolbox for the experimental aquatic chemist, focused on acidification and CO2 air-water exchange. It contains all elements to model the pH, the related CO2 air-water exchange, and aquatic acid-base chemistry for an arbitrary marine, estuarine or freshwater system. It contains a suite of tools for sensitivity analysis, visualisation, modelling of chemical batches, and can be used to build dynamic models of aquatic systems. As from version 1.0-4, it also contains functions to calculate the buffer factors.

136

Chemometrics and Computational Physics

astro

Astronomy Functions, Tools and Routines

The astro package provides a series of functions, tools and routines in everyday use within astronomy. Broadly speaking, one may group these functions into 7 main areas, namely: cosmology, FITS file manipulation, the Sersic function, plotting, data manipulation, statistics and general convenience functions and scripting tools.

137

Chemometrics and Computational Physics

astrochron

A Computational Tool for Astrochronology

Routines for astrochronologic testing, astronomical time scale construction, and time series analysis. Also included are a range of statistical analysis and modeling routines that are relevant to time scale development and paleoclimate analysis.

138

Chemometrics and Computational Physics

astrodatR

Astronomical Data

A collection of 19 datasets from contemporary astronomical research. They are described in the textbook ‘Modern Statistical Methods for Astronomy with R Applications’ by Eric D. Feigelson and G. Jogesh Babu (Cambridge University Press, 2012, Appendix C) or on the website of Penn State’s Center for Astrostatistics (http://astrostatistics.psu.edu/datasets). These datasets can be used to exercise methodology involving: density estimation; heteroscedastic measurement errors; contingency tables; two-sample hypothesis tests; spatial point processes; nonlinear regression; mixture models; censoring and truncation; multivariate analysis; classification and clustering; inhomogeneous Poisson processes; periodic and stochastic time series analysis.

139

Chemometrics and Computational Physics

astroFns

Astronomy: time and position functions, misc. utilities

Miscellaneous astronomy functions, utilities, and data.

140

Chemometrics and Computational Physics

astrolibR

Astronomy Users Library

Several dozen lowlevel utilities and codes from the Interactive Data Language (IDL) Astronomy Users Library (http://idlastro.gsfc.nasa.gov) are implemented in R. They treat: time, coordinate and proper motion transformations; terrestrial precession and nutation, atmospheric refraction and aberration, barycentric corrections, and related effects; utilities for astrometry, photometry, and spectroscopy; and utilities for planetary, stellar, Galactic, and extragalactic science.

141

Chemometrics and Computational Physics

ATmet

Advanced Tools for Metrology

This package provides functions for smart sampling and sensitivity analysis for metrology applications, including computationally expensive problems.

142

Chemometrics and Computational Physics

Bchron

Radiocarbon Dating, Age-Depth Modelling, Relative Sea Level Rate Estimation, and Non-Parametric Phase Modelling

Enables quick calibration of radiocarbon dates under various calibration curves (including user generated ones); age-depth modelling as per the algorithm of Haslett and Parnell (2008) <doi:10.1111/j.1467-9876.2008.00623.x>; relative sea level rate estimation incorporating time uncertainty in polynomial regression models (Parnell and Gehrels 2015) <doi:10.1002/9781118452547.ch32>; non-parametric phase modelling via Gaussian mixtures as a means to determine the activity of a site (and as an alternative to the Oxcal function SUM; currently unpublished), and reverse calibration of dates from calibrated into uncalibrated years (also unpublished).

143

Chemometrics and Computational Physics

BioMark

Find Biomarkers in Two-Class Discrimination Problems

Variable selection methods are provided for several classification methods: the lasso/elastic net, PC-LDA, PLS-DA, and several t-tests. Two approaches for selecting cutoffs can be used, one based on the stability of model coefficients under perturbation, and the other on higher criticism.

144

Chemometrics and Computational Physics

bvls

The Stark-Parker algorithm for bounded-variable least squares

An R interface to the Stark-Parker implementation of an algorithm for bounded-variable least squares.

145

Chemometrics and Computational Physics

celestial

Collection of Common Astronomical Conversion Routines and Functions

Contains a number of common astronomy conversion routines, particularly the HMS and degrees schemes, which can be fiddly to convert between en masse due to the textual nature of the former. It allows users to coordinate match datasets quickly. It also contains functions for various cosmological calculations.

146

Chemometrics and Computational Physics

chemCal (core)

Calibration Functions for Analytical Chemistry

Simple functions for plotting linear calibration functions and estimating standard errors for measurements according to the Handbook of Chemometrics and Qualimetrics: Part A by Massart et al. There are also functions for estimating the limit of detection (LOD) and limit of quantification (LOQ). The functions work on model objects from (optionally weighted) linear regression (lm) or robust linear regression (‘rlm’ from the ‘MASS’ package).

147

Chemometrics and Computational Physics

chemometrics

Multivariate Statistical Analysis in Chemometrics

R companion to the book “Introduction to Multivariate Statistical Analysis in Chemometrics” written by K. Varmuza and P. Filzmoser (2009).

148

Chemometrics and Computational Physics

ChemometricsWithR

Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences

Functions and scripts used in the book “Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences” by Ron Wehrens, Springer (2011). Data used in the package are available from github.

149

Chemometrics and Computational Physics

ChemoSpec

Exploratory Chemometrics for Spectroscopy

A collection of functions for top-down exploratory data analysis of spectral data including nuclear magnetic resonance (NMR), infrared (IR), Raman, X-ray fluorescence (XRF) and other similar types of spectroscopy. Includes functions for plotting and inspecting spectra, peak alignment, hierarchical cluster analysis (HCA), principal components analysis (PCA) and model-based clustering. Robust methods appropriate for this type of high-dimensional data are available. ChemoSpec is designed for structured experiments, such as metabolomics investigations, where the samples fall into treatment and control groups. Graphical output is formatted consistently for publication quality plots. ChemoSpec is intended to be very user friendly and to help you get usable results quickly. A vignette covering typical operations is available.

150

Chemometrics and Computational Physics

ChemoSpec2D

Exploratory Chemometrics for 2D Spectroscopy

A collection of functions for exploratory chemometrics of 2D spectroscopic data sets such as COSY (correlated spectroscopy) and HSQC (heteronuclear single quantum coherence) 2D NMR (nuclear magnetic resonance) spectra. ‘ChemoSpec2D’ deploys methods aimed primarily at classification of samples and the identification of spectral features which are important in distinguishing samples from each other. Each 2D spectrum (a matrix) is treated as the unit of observation, and thus the physical sample in the spectrometer corresponds to the sample from a statistical perspective. In addition to chemometric tools, a few tools are provided for plotting 2D spectra, but these are not intended to replace the functionality typically available on the spectrometer. ‘ChemoSpec2D’ takes many of its cues from ‘ChemoSpec’ and tries to create consistent graphical output and to be very user friendly.

151

Chemometrics and Computational Physics

CHNOSZ

Thermodynamic Calculations and Diagrams for Geochemistry

An integrated set of tools for thermodynamic calculations in aqueous geochemistry and geobiochemistry. Functions are provided for writing balanced reactions to form species from user-selected basis species and for calculating the standard molal properties of species and reactions, including the standard Gibbs energy and equilibrium constant. Calculations of the non-equilibrium chemical affinity and equilibrium chemical activity of species can be portrayed on diagrams as a function of temperature, pressure, or activity of basis species; in two dimensions, this gives a maximum affinity or predominance diagram. The diagrams have formatted chemical formulas and axis labels, and water stability limits can be added to Eh-pH, oxygen fugacity-temperature, and other diagrams with a redox variable. The package has been developed to handle common calculations in aqueous geochemistry, such as solubility due to complexation of metal ions, mineral buffers of redox or pH, and changing the basis species across a diagram (“mosaic diagrams”). CHNOSZ also has unique capabilities for comparing the compositional and thermodynamic properties of different proteins.

152

Chemometrics and Computational Physics

clustvarsel

Variable Selection for Gaussian ModelBased Clustering

Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology allows finding the (locally) optimal subset of variables in a data set that have group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without subsampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.

153

Chemometrics and Computational Physics

compositions

Compositional Data Analysis

Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.

154

Chemometrics and Computational Physics

constants

Reference on Constants, Units and Uncertainty

CODATA internationally recommended values of the fundamental physical constants, provided as symbols for direct use within the R language. Optionally, the values with errors and/or the values with units are also provided if the ‘errors’ and/or the ‘units’ packages are installed. The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee of the International Council for Science which periodically provides the internationally accepted set of values of the fundamental physical constants. This package contains the “2014 CODATA” version, published on 25 June 2015: Mohr, P. J., Newell, D. B. and Taylor, B. N. (2016) <doi:10.1103/RevModPhys.88.035009>, <doi:10.1063/1.4954402>.

155

Chemometrics and Computational Physics

cosmoFns

Functions for cosmological distances, times, luminosities, etc

Encapsulates standard expressions for distances, times, luminosities, and other quantities useful in observational cosmology, including molecular line observations. Currently coded for a flat universe only.

156

Chemometrics and Computational Physics

CRAC

Cosmology R Analysis Code

R functions for cosmological research. The main functions are similar to the python library, cosmolopy.

157

Chemometrics and Computational Physics

dielectric

Defines some physical constants and dielectric functions commonly used in optics, plasmonics

Physical constants. Gold, silver and glass permittivities, together with spline interpolation functions.

158

Chemometrics and Computational Physics

drc

Analysis of Dose-Response Curves

Analysis of dose-response data is made available through a suite of flexible and versatile model fitting and after-fitting functions.
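A minimal usage sketch for ‘drc’, assuming the package is installed; drm() and the four-parameter log-logistic model LL.4() are its core interface, and ‘ryegrass’ is a dataset shipped with the package:

```r
# Four-parameter log-logistic dose-response fit with 'drc'.
library(drc)
fit <- drm(rootl ~ conc, data = ryegrass, fct = LL.4())
summary(fit)   # slope, lower/upper limits, ED50 parameter
ED(fit, 50)    # estimated effective dose ED50 with standard error
```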

159

Chemometrics and Computational Physics

eChem

Simulations for Electrochemistry Experiments

Simulates cyclic voltammetry, linear-sweep voltammetry (both with and without stirring of the solution), and single-pulse and double-pulse chronoamperometry and chronocoulometry experiments using the implicit finite difference method outlined in Gosser (1993, ISBN: 9781560810261) and in Brown (2015) <doi:10.1021/acs.jchemed.5b00225>. Additional functions provide ways to display and to examine the results of these simulations. The primary purpose of this package is to provide tools for use in courses in analytical chemistry.

160

Chemometrics and Computational Physics

EEM

Read and Preprocess Fluorescence Excitation-Emission Matrix (EEM) Data

Reads raw EEM data and prepares it for further analysis.

161

Chemometrics and Computational Physics

elasticnet

Elastic-Net for Sparse Estimation and Sparse PCA

Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.

162

Chemometrics and Computational Physics

enpls

Ensemble Partial Least Squares Regression

An algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.

163

Chemometrics and Computational Physics

errors

Uncertainty Propagation for R Vectors

Support for measurement errors in R vectors, matrices and arrays: automatic uncertainty propagation and reporting. Documentation about ‘errors’ is provided in the paper by Ucar, Pebesma & Azcorra (2018, <doi:10.32614/RJ-2018-075>), included in this package as a vignette; see ‘citation(“errors”)’ for details.

164

Chemometrics and Computational Physics

fastICA

FastICA Algorithms to Perform ICA and Projection Pursuit

Implementation of FastICA algorithm to perform Independent Component Analysis (ICA) and Projection Pursuit.
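A minimal usage sketch for ‘fastICA’, assuming the package is installed; the toy source signals and mixing matrix below are purely illustrative:

```r
# Unmix two linearly mixed signals with fastICA().
library(fastICA)
set.seed(42)
S <- cbind(sin((1:1000) / 20),                               # smooth source
           rep(c(-1, 1), each = 50, length.out = 1000))      # square-wave source
A <- matrix(c(0.6, 0.4, 0.4, 0.6), 2, 2)                     # mixing matrix
X <- S %*% A                                                 # observed mixtures
ica <- fastICA(X, n.comp = 2)
str(ica$S)  # estimated independent components (up to sign/order)
```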

165

Chemometrics and Computational Physics

fingerprint

Functions to Operate on Binary Fingerprint Data

Functions to manipulate binary fingerprints of arbitrary length. A fingerprint is represented by an object of S4 class ‘fingerprint’ which is internally represented as a vector of integers, such that each element represents the position in the fingerprint that is set to 1. The bitwise logical functions in R are overridden so that they can be used directly with ‘fingerprint’ objects. A number of distance metrics are also available (many contributed by Michael Fadock). Fingerprints can be converted to Euclidean vectors (i.e., points on the unit hypersphere) and can also be folded using OR. Arbitrary fingerprint formats can be handled via line handlers. Currently handlers are provided for CDK, MOE and BCI fingerprint data.

166

Chemometrics and Computational Physics

FITSio

FITS (Flexible Image Transport System) Utilities

Utilities to read and write files in the FITS (Flexible Image Transport System) format, a standard format in astronomy (see e.g. <https://en.wikipedia.org/wiki/FITS> for more information). Present lowlevel routines allow: reading, parsing, and modifying FITS headers; reading FITS images (multidimensional arrays); reading FITS binary and ASCII tables; and writing FITS images (multidimensional arrays). Higherlevel functions allow: reading files composed of one or more headers and a single (perhaps multidimensional) image or single table; reading tables into data frames; generating vectors for image array axes; scaling and writing images as 16bit integers. Known incompletenesses are reading random group extensions, as well as bit, complex, and array descriptor data types in binary tables.

167

Chemometrics and Computational Physics

fmri

Analysis of fMRI Experiments

Contains R functions to perform an fMRI analysis as described in Tabelow et al. (2006) <doi:10.1016/j.neuroimage.2006.06.029>, Polzehl et al. (2010) <doi:10.1016/j.neuroimage.2010.04.241>, Tabelow and Polzehl (2011) <doi:10.18637/jss.v044.i11>.

168

Chemometrics and Computational Physics

fpca

Restricted MLE for Functional Principal Components Analysis

A geometric approach to MLE for functional principal components.

169

Chemometrics and Computational Physics

FTICRMS

Programs for Analyzing Fourier Transform-Ion Cyclotron Resonance Mass Spectrometry Data

This package was developed partially with funding from the NIH Training Program in Biomolecular Technology (2T32GM08799).

170

Chemometrics and Computational Physics

homals

Gifi Methods for Optimal Scaling

Performs a homogeneity analysis (multiple correspondence analysis) and various extensions. Rank restrictions on the category quantifications can be imposed (nonlinear PCA). The categories are transformed by means of optimal scaling with options for nominal, ordinal, and numerical scale levels (for rank-1 restrictions). Variables can be grouped into sets, in order to emulate regression analysis and canonical correlation analysis.

171

Chemometrics and Computational Physics

hyperSpec

Work with Hyperspectral Data, i.e. Spectra + Meta Information (Spatial, Time, Concentration, …)

Comfortable ways to work with hyperspectral data sets, i.e. spatially or time-resolved spectra, or spectra with any other kind of information associated with each of the spectra. The spectra can be data as obtained in XRF, UV/VIS, Fluorescence, AES, NIR, IR, Raman, NMR, MS, etc. More generally, any data that is recorded over a discretized variable, e.g. absorbance = f(wavelength), stored as a vector of absorbance values for discrete wavelengths is suitable.

172

Chemometrics and Computational Physics

investr

Inverse Estimation/Calibration Functions

Functions to facilitate inverse estimation (e.g., calibration) in linear, generalized linear, nonlinear, and (linear) mixed-effects models. A generic function is also provided for plotting fitted regression models with or without confidence/prediction bands that may be of use to the general user.

173

Chemometrics and Computational Physics

Iso (core)

Functions to Perform Isotonic Regression

Linear order and unimodal order (univariate) isotonic regression; bivariate isotonic regression with linear order on both variables.

174

Chemometrics and Computational Physics

kohonen (core)

Supervised and Unsupervised Self-Organising Maps

Functions to train self-organising maps (SOMs). Interrogation of the maps and prediction using trained maps are also supported. The name of the package refers to Teuvo Kohonen, the inventor of the SOM.

175

Chemometrics and Computational Physics

leaps

Regression Subset Selection

Regression subset selection, including exhaustive search.
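A minimal usage sketch for ‘leaps’, assuming the package is installed; regsubsets() is its documented best-subset search interface, and mtcars is used purely for illustration:

```r
# Exhaustive best-subset search with regsubsets() from 'leaps'.
library(leaps)
fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5)
s <- summary(fit)
s$which[which.max(s$adjr2), ]  # predictors in the best model by adjusted R^2
```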

176

Chemometrics and Computational Physics

lira

LInear Regression in Astronomy

Performs Bayesian linear regression and forecasting in astronomy. The method accounts for heteroscedastic errors in both the independent and the dependent variables, intrinsic scatters (in both variables) and scatter correlation, time evolution of slopes, normalization, scatters, Malmquist and Eddington bias, upper limits and break of linearity. The posterior distribution of the regression parameters is sampled with a Gibbs method exploiting the JAGS library.

177

Chemometrics and Computational Physics

lspls

LS-PLS Models

Implements the LS-PLS (least squares - partial least squares) method described in for instance Jorgensen, K., Segtnan, V. H., Thyholt, K., Naes, T. (2004) “A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables”, Journal of Chemometrics, 18(10), 451-464, <doi:10.1002/cem.890>.

178

Chemometrics and Computational Physics

MALDIquant

Quantitative Analysis of Mass Spectrometry Data

A complete analysis pipeline for matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) and other two-dimensional mass spectrometry data. In addition to commonly used plotting and processing methods it includes distinctive features, namely baseline subtraction methods such as morphological filters (TopHat) or the statistics-sensitive non-linear iterative peak-clipping algorithm (SNIP), peak alignment using warping functions, handling of replicated measurements as well as allowing spectra with different resolutions.

179

Chemometrics and Computational Physics

MALDIrppa

MALDI Mass Spectrometry Data Robust Pre-Processing and Analysis

Provides methods for quality control and robust preprocessing and analysis of MALDI mass spectrometry data.

180

Chemometrics and Computational Physics

measurements

Tools for Units of Measurement

Collection of tools to make working with physical measurements easier. Convert between metric and imperial units, or calculate a dimension’s unknown value from other dimensions’ measurements.
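A minimal usage sketch for ‘measurements’, assuming the package is installed; conv_unit() is its documented conversion function, and the unit codes below follow its naming scheme:

```r
# Metric/imperial and temperature conversions with conv_unit().
library(measurements)
conv_unit(100, "m", "ft")  # metres to feet
conv_unit(0, "C", "F")     # Celsius to Fahrenheit
```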

181

Chemometrics and Computational Physics

metRology

Support for Metrological Applications

Provides classes and calculation and plotting functions for metrology applications, including measurement uncertainty estimation and interlaboratory metrology comparison studies.

182

Chemometrics and Computational Physics

minpack.lm

R Interface to the Levenberg-Marquardt Nonlinear Least-Squares Algorithm Found in MINPACK, Plus Support for Bounds

The nls.lm function provides an R interface to lmder and lmdif from the MINPACK library, for solving nonlinear least-squares problems by a modification of the Levenberg-Marquardt algorithm, with support for lower and upper parameter bounds. The implementation can be used via nls-like calls using the nlsLM function.
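A minimal usage sketch of the nls-like interface, assuming the package is installed; the exponential-decay model and simulated data are purely illustrative:

```r
# Exponential decay fit via nlsLM() from 'minpack.lm', with a lower bound.
library(minpack.lm)
set.seed(1)
x <- seq(0, 5, length.out = 50)
y <- 2 * exp(-1.3 * x) + rnorm(50, sd = 0.05)
fit <- nlsLM(y ~ a * exp(-b * x), start = list(a = 1, b = 1),
             lower = c(0, 0))  # Levenberg-Marquardt with parameter bounds
coef(fit)  # estimates should be near a = 2, b = 1.3
```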

183

Chemometrics and Computational Physics

NISTunits

Fundamental Physical Constants and Unit Conversions from NIST

Fundamental physical constants (Quantity, Value, Uncertainty, Unit) for SI (International System of Units) and non-SI units, plus unit conversions. Based on data from NIST (National Institute of Standards and Technology, USA).

184

Chemometrics and Computational Physics

nlme

Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixedeffects models.
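A minimal usage sketch for ‘nlme’, assuming the package is installed; lme() is its documented linear mixed-effects fitter and ‘Orthodont’ is a dataset shipped with the package:

```r
# Random-intercept growth model on the 'Orthodont' data shipped with 'nlme'.
library(nlme)
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
summary(fit)$tTable  # fixed-effect estimates with standard errors
```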

185

Chemometrics and Computational Physics

nlreg

Higher Order Inference for Nonlinear Heteroscedastic Models

Likelihood inference based on higher order approximations for nonlinear models with possibly non-constant variance.

186

Chemometrics and Computational Physics

nnls (core)

The Lawson-Hanson algorithm for non-negative least squares (NNLS)

An R interface to the Lawson-Hanson implementation of an algorithm for non-negative least squares (NNLS). Also allows the combination of non-negative and non-positive constraints.
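A minimal usage sketch, assuming the package is installed; nnls() is its documented entry point, and the noiseless toy problem below is purely illustrative:

```r
# Solve min ||A x - b|| subject to x >= 0 with nnls().
library(nnls)
set.seed(1)
A <- matrix(runif(30), 10, 3)
xtrue <- c(2, 0, 1)          # one coefficient pinned at zero
b <- drop(A %*% xtrue)       # noiseless right-hand side
sol <- nnls(A, b)
sol$x  # non-negative coefficient estimates, recovering xtrue
```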

187

Chemometrics and Computational Physics

OrgMassSpecR

Organic Mass Spectrometry

Organic/biological mass spectrometry data analysis.

188

Chemometrics and Computational Physics

pcaPP

Robust PCA by Projection Pursuit

Provides functions for robust PCA by projection pursuit. The methods are described in Croux et al. (2006) <doi:10.2139/ssrn.968376>, Croux et al. (2013) <doi:10.1080/00401706.2012.727746>, Todorov and Filzmoser (2013) <doi:10.1007/978-3-642-33042-1_31>.

189

Chemometrics and Computational Physics

PET

Simulation and Reconstruction of PET Images

Implementation of different analytic/direct and iterative reconstruction methods for Radon-transformed data such as PET data. It also offers the possibility to simulate PET data.

190

Chemometrics and Computational Physics

planar

Multilayer Optics

Solves the electromagnetic problem of reflection and transmission at a planar multilayer interface. Also computed are the decay rates and emission profile for a dipolar emitter.

191

Chemometrics and Computational Physics

pls (core)

Partial Least Squares and Principal Component Regression

Multivariate regression methods: Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
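A minimal usage sketch for ‘pls’, assuming the package is installed; plsr() and RMSEP() are its documented interface, and ‘yarn’ is an NIR dataset shipped with the package:

```r
# Cross-validated PLSR on the 'yarn' NIR data shipped with 'pls'.
library(pls)
data(yarn)
fit <- plsr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV")
RMSEP(fit)  # choose ncomp from cross-validated prediction error
```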

192

Chemometrics and Computational Physics

plspm

Tools for Partial Least Squares Path Modeling (PLSPM)

Partial Least Squares Path Modeling (PLSPM) analysis for both metric and nonmetric data, as well as REBUS analysis.

193

Chemometrics and Computational Physics

ppls

Penalized Partial Least Squares

Contains linear and nonlinear regression methods based on Partial Least Squares and Penalization Techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be conducted via jackknifing.

194

Chemometrics and Computational Physics

prospectr

Miscellaneous functions for processing and sample selection of vis-NIR diffuse reflectance data

The package provides functions for pretreatment and sample selection of visible and near infrared (vis-NIR) diffuse reflectance spectra.

195

Chemometrics and Computational Physics

psy

Various procedures used in psychometry

Kappa, ICC, Cronbach alpha, screeplot, mtmm

196

Chemometrics and Computational Physics

PTAk (core)

Principal Tensor Analysis on k Modes

A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package also includes some other multiway methods: PCA-n (Tucker-n) and PARAFAC/CANDECOMP with these extensions.

197

Chemometrics and Computational Physics

rcdk

Interface to the ‘CDK’ Libraries

Allows the user to access functionality in the ‘CDK’, a Java framework for chemoinformatics. This allows the user to load molecules, evaluate fingerprints, calculate molecular descriptors and so on. In addition, the ‘CDK’ API allows the user to view structures in 2D.

198

Chemometrics and Computational Physics

rcdklibs

The CDK Libraries Packaged for R

An R interface to the Chemistry Development Kit, a Java library for chemoinformatics. Given the size of the library itself, this package is not expected to change very frequently. To make use of the CDK within R, it is suggested that you use the ‘rcdk’ package. Note that it is possible to directly interact with the CDK using ‘rJava’. However ‘rcdk’ exposes functionality in a more idiomatic way. The CDK library itself is released as LGPL and the sources can be obtained from <https://github.com/cdk/cdk>.

199

Chemometrics and Computational Physics

represent

Determine the representativity of two multidimensional data sets

Contains workhorse function jrparams(), as well as two helper functions Mboxtest() and JRsMahaldist(), and four example data sets.

200

Chemometrics and Computational Physics

resemble

Regression and Similarity Evaluation for MemoryBased Learning in Spectral Chemometrics

Implementation of functions for spectral similarity/dissimilarity analysis and memorybased learning (MBL) for nonlinear modeling in complex spectral datasets. In chemometrics MBL is also known as local modeling.

201

Chemometrics and Computational Physics

RobPer

Robust Periodogram and Periodicity Detection Methods

Calculates periodograms based on (robustly) fitting periodic functions to light curves (irregularly observed time series, possibly with measurement accuracies, occurring in astroparticle physics). Three main functions are included: RobPer() calculates the periodogram. Outlying periodogram bars (indicating a period) can be detected with betaCvMfit(). Artificial light curves can be generated using the function tsgen(). For more details see the corresponding article: Thieler, Fried and Rathjens (2016), Journal of Statistical Software, 69(9), 1-36, <doi:10.18637/jss.v069.i09>.

202

Chemometrics and Computational Physics

rpubchem

An Interface to the PubChem Collection

Access PubChem data (compounds, substances, assays) using R. Structural information is provided in the form of SMILES strings. It currently only provides access to a subset of the precalculated data stored by PubChem. Bioassay data can be accessed to obtain descriptions as well as the actual data. It is also possible to search for assay IDs by keyword.

203

Chemometrics and Computational Physics

sapa

Spectral Analysis for Physical Applications

Software for the book Spectral Analysis for Physical Applications, Donald B. Percival and Andrew T. Walden, Cambridge University Press, 1993.

204

Chemometrics and Computational Physics

SCEPtER

Stellar CharactEristics Pisa Estimation gRid

SCEPtER pipeline for estimating the stellar age, mass, and radius given observational effective temperature, [Fe/H], and asteroseismic parameters. The results are obtained adopting a maximum likelihood technique over a grid of precomputed stellar models.

205

Chemometrics and Computational Physics

SCEPtERbinary

Stellar CharactEristics Pisa Estimation gRid for Binary Systems

SCEPtER pipeline for estimating the stellar age for double-lined detached binary systems. The observational constraints adopted in the recovery are the effective temperature, the metallicity [Fe/H], the mass, and the radius of the two stars. The results are obtained adopting a maximum likelihood technique over a grid of precomputed stellar models.

206

Chemometrics and Computational Physics

simecol

Simulation of Ecological (and Other) Dynamic Systems

An object oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and reusability of code.

207

Chemometrics and Computational Physics

snapshot

Gadget N-body cosmological simulation code snapshot I/O utilities

Functions for reading and writing Gadget N-body snapshots. The Gadget code is popular in astronomy for running N-body / hydrodynamical cosmological and merger simulations. To find out more about Gadget see the main distribution page at www.mpa-garching.mpg.de/gadget/

208

Chemometrics and Computational Physics

solaR

Radiation and Photovoltaic Systems

Calculation methods of solar radiation and performance of photovoltaic systems from daily and intradaily irradiation data sources.

209

Chemometrics and Computational Physics

som

Self-Organizing Map

Self-Organizing Map (with application in gene clustering).

210

Chemometrics and Computational Physics

SPADAR

Spherical Projections of Astronomical Data

Provides easy-to-use functions to create all-sky grid plots of widely used astronomical coordinate systems (equatorial, ecliptic, galactic) and scatter plots of data on any of these systems, including on-the-fly system conversion. It supports any type of spherical projection to the plane defined by the ‘mapproj’ package.

211

Chemometrics and Computational Physics

speaq

Tools for Nuclear Magnetic Resonance (NMR) Spectra Alignment, Peak Based Processing, Quantitative Analysis and Visualizations

Makes Nuclear Magnetic Resonance spectroscopy (NMR spectroscopy) data analysis as easy as possible by only requiring a small set of functions to perform an entire analysis. ‘speaq’ offers the possibility of raw spectra alignment and quantitation, but also an analysis based on features, whereby the spectra are converted to peaks which are then grouped and turned into features. These features can be processed with any number of statistical tools, either included in ‘speaq’ or available elsewhere on CRAN. More details can be found in Vu et al. (2011) <doi:10.1186/1471-2105-12-405> and Beirnaert et al. (2018) <doi:10.1371/journal.pcbi.1006018>.

212

Chemometrics and Computational Physics

spectralAnalysis

Pre-Process, Visualize and Analyse Process Analytical Data, by Spectral Data Measurements Made During a Chemical Process

Infrared, near-infrared and Raman spectroscopic data measured during chemical reactions provide structural fingerprints by which molecules can be identified and quantified. The application of these spectroscopic techniques as in-line process analytical tools (PAT) provides the (pharma)chemical industry with novel tools, allowing them to monitor their chemical processes, resulting in a better process understanding through insight into reaction rates, mechanisms, stability, etc. Data can be read into R via the generic ‘spc’ format, which is generally supported by spectrometer vendor software. Versatile pre-processing functions are available to perform baseline correction by linking to the ‘baseline’ package; noise reduction via the ‘signal’ package; as well as time alignment, normalization, differentiation, integration and interpolation. Implementation based on the S4 object system allows storing a pre-processing pipeline as part of a spectral data object and easily transferring it to other datasets. Interactive plotting tools are provided based on the ‘plotly’ package. Non-negative matrix factorization (NMF) has been implemented to perform multivariate analyses on individual spectral datasets or on multiple datasets at once. NMF provides a parts-based representation of the spectral data in terms of spectral signatures of the chemical compounds and their relative proportions. The functionality to read in ‘spc’ files was adapted from the ‘hyperSpec’ package.

213

Chemometrics and Computational Physics

spls

Sparse Partial Least Squares (SPLS) Regression and Classification

Provides functions for fitting a sparse partial least squares (SPLS) regression and classification (Chun and Keles (2010) <doi:10.1111/j.1467-9868.2009.00723.x>).

214

Chemometrics and Computational Physics

stellaR

stellar evolution tracks and isochrones

A package to manage and display stellar tracks and isochrones from the Pisa low-mass database. Includes tools for isochrone construction and track interpolation.

215

Chemometrics and Computational Physics

stepPlr

L2 Penalized Logistic Regression with Stepwise Variable Selection

L2 penalized logistic regression for both continuous and discrete predictors, with forward stagewise/forward stepwise variable selection procedure.

216

Chemometrics and Computational Physics

subselect

Selecting Variable Subsets

A collection of functions which (i) assess the quality of variable subsets as surrogates for a full data set, in either an exploratory data analysis or in the context of a multivariate linear model, and (ii) search for subsets which are optimal under various criteria.

217

Chemometrics and Computational Physics

TIMP

Fitting Separable Nonlinear Models in Spectroscopy and Microscopy

A problem-solving environment (PSE) for fitting separable nonlinear models to measurements arising in physics and chemistry experiments; has been extensively applied to time-resolved spectroscopy and FLIM-FRET data.

218

Chemometrics and Computational Physics

titan

Titration analysis for mass spectrometry data

GUI to analyze mass spectrometric data on the relative abundance of two substances from a titration series.

219

Chemometrics and Computational Physics

titrationCurves

Acid/Base, Complexation, Redox, and Precipitation Titration Curves

A collection of functions to plot acid/base titration curves (pH vs. volume of titrant), complexation titration curves (pMetal vs. volume of EDTA), redox titration curves (potential vs. volume of titrant), and precipitation titration curves (either pAnalyte or pTitrant vs. volume of titrant). Options include the titration of mixtures, the ability to overlay two or more titration curves, and the ability to show equivalence points.

220

Chemometrics and Computational Physics

units

Measurement Units for R Vectors

Support for measurement units in R vectors, matrices and arrays: automatic propagation, conversion, derivation and simplification of units; raising errors in case of unit incompatibility. Compatible with the POSIXct, Date and difftime classes. Uses the UNIDATA udunits library and unit database for unit compatibility checking and conversion. Documentation about ‘units’ is provided in the paper by Pebesma, Mailund & Hiebert (2016, <doi:10.32614/RJ-2016-061>), included in this package as a vignette; see ‘citation(“units”)’ for details.
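As a small illustration of the automatic propagation and conversion described above, here is a minimal sketch (assuming ‘units’ is installed from CRAN):

```r
# Attach physical units to a numeric vector and convert between
# compatible units; incompatible conversions raise an error.
library(units)

speed <- set_units(c(10, 20, 30), m/s)   # velocities in metres per second
speed_kmh <- set_units(speed, km/h)      # automatic conversion: 1 m/s = 3.6 km/h
as.numeric(speed_kmh)                    # 36 72 108

# set_units(speed, kg)   # would fail: m/s cannot be converted to kg
```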

221

Chemometrics and Computational Physics

UPMASK

Unsupervised Photometric Membership Assignment in Stellar Clusters

An implementation of the UPMASK method for performing membership assignment in stellar clusters in R. It is prepared to use photometry and spatial positions, but it can take into account other types of data. The method is able to take into account arbitrary error models, and it is unsupervised, data-driven, physical-model-free and relies on as few assumptions as possible. The approach followed for membership assessment is based on an iterative process, dimensionality reduction, a clustering algorithm and a kernel density estimation.

222

Chemometrics and Computational Physics

varSelRF

Variable Selection using Random Forests

Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).

223

Chemometrics and Computational Physics

webchem

Chemical Information from the Web

Chemical information from around the web. This package interacts with a suite of web APIs for chemical information.

224

Chemometrics and Computational Physics

WilcoxCV

Wilcoxon-based variable selection in cross-validation

Provides functions to perform fast variable selection based on the Wilcoxon rank sum test in cross-validation or Monte Carlo cross-validation settings, for use in microarray-based binary classification.

225

Clinical Trial Design, Monitoring, and Analysis

adaptTest (core)

Adaptive two-stage tests

The functions defined in this program serve for implementing adaptive two-stage tests. Currently, four tests are included: Bauer and Koehne (1994), Lehmacher and Wassmer (1999), Vandemeulebroecke (2006), and the horizontal conditional error function. User-defined tests can also be implemented. Reference: Vandemeulebroecke, An investigation of two-stage tests, Statistica Sinica 2006.

226

Clinical Trial Design, Monitoring, and Analysis

AGSDest

Estimation in Adaptive Group Sequential Trials

Calculation of repeated confidence intervals as well as confidence intervals based on the stagewise ordering in group sequential designs and adaptive group sequential designs. For adaptive group sequential designs the confidence intervals are based on the conditional rejection probability principle. Currently the procedures do not support the use of futility boundaries or more than one adaptive interim analysis.

227

Clinical Trial Design, Monitoring, and Analysis

asd (core)

Simulations for Adaptive Seamless Designs

Package runs simulations for adaptive seamless designs with and without early outcomes for treatment selection and subpopulation type designs.

228

Clinical Trial Design, Monitoring, and Analysis

asypow

Calculate Power Utilizing Asymptotic Likelihood Ratio Methods

A set of routines written in the S language that calculate power and related quantities utilizing asymptotic likelihood ratio methods.

229

Clinical Trial Design, Monitoring, and Analysis

bcrm (core)

Bayesian Continual Reassessment Method for Phase I DoseEscalation Trials

Implements a wide variety of one- and two-parameter Bayesian CRM designs. The program can run interactively, allowing the user to enter outcomes after each cohort has been recruited, or via simulation to assess operating characteristics.

230

Clinical Trial Design, Monitoring, and Analysis

binomSamSize

Confidence Intervals and Sample Size Determination for a Binomial Proportion under Simple Random Sampling and Pooled Sampling

A suite of functions to compute confidence intervals and necessary sample sizes for the parameter p of the Bernoulli B(p) distribution under simple random sampling or under pooled sampling. Such computations are of interest, e.g., when investigating the incidence or prevalence in populations. The package contains functions to compute coverage probabilities and coverage coefficients of the provided confidence interval procedures. Sample size calculations are based on expected length.

231

Clinical Trial Design, Monitoring, and Analysis

blockrand (core)

Randomization for block random clinical trials

Create randomizations for block random clinical trials. Can also produce a PDF file of randomization cards.

232

Clinical Trial Design, Monitoring, and Analysis

clinfun (core)

Clinical Trial Design and Data Analysis Functions

Utilities to make your clinical collaborations easier, if not fun. It contains functions for designing studies, such as Simon 2-stage and group sequential designs, and for data analysis, such as the Jonckheere-Terpstra test and estimation of survival quantiles.

233

Clinical Trial Design, Monitoring, and Analysis

clinsig

Clinical Significance Functions

Functions for calculating clinical significance.

234

Clinical Trial Design, Monitoring, and Analysis

clusterPower

Power Calculations for ClusterRandomized and ClusterRandomized Crossover Trials

Calculate power for cluster randomized trials (CRTs) that compare two means, two proportions, or two counts using closedform solutions. In addition, calculate power for cluster randomized crossover trials using Monte Carlo methods. For more information, see Reich et al. (2012) <doi:10.1371/journal.pone.0035564>.

235

Clinical Trial Design, Monitoring, and Analysis

coin

Conditional Inference Procedures in a Permutation Test Framework

Conditional inference procedures for the general independence problem, including two-sample, K-sample (nonparametric ANOVA), correlation, censored, ordered and multivariate problems.
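A minimal sketch of a two-sample problem handled by ‘coin’ (assuming the package is installed), using the built-in ToothGrowth data:

```r
# Exact Wilcoxon-Mann-Whitney test run as a conditional inference
# procedure: the null distribution is computed exactly by permutation.
library(coin)

wt <- wilcox_test(len ~ supp, data = ToothGrowth,
                  distribution = "exact")
pvalue(wt)   # exact two-sided p-value for the group comparison
```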

236

Clinical Trial Design, Monitoring, and Analysis

conf.design

Construction of factorial designs

This small library contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs.

237

Clinical Trial Design, Monitoring, and Analysis

CRM

Continual Reassessment Method (CRM) for Phase I Clinical Trials

Functions for phase I clinical trials using the continual reassessment method.

238

Clinical Trial Design, Monitoring, and Analysis

CRTSize (core)

Sample Size Estimation Functions for Cluster Randomized Trials

Sample size estimation in cluster (group) randomized trials. Contains traditional power-based methods, empirical smoothing (Rotondi and Donner, 2009), and updated meta-analysis techniques (Rotondi and Donner, 2012).

239

Clinical Trial Design, Monitoring, and Analysis

dfcrm (core)

DoseFinding by the Continual Reassessment Method

Provides functions to run the CRM and TITE-CRM in phase I trials and calibration tools for trial planning purposes.

240

Clinical Trial Design, Monitoring, and Analysis

dfped

Extrapolation and Bridging of Adult Information in Early Phase Dose-Finding Paediatrics Studies

A unified method for designing and analysing dose-finding trials in paediatrics, while bridging information from adults, is proposed in the ‘dfped’ package. The dose range can be calculated under three extrapolation methods: linear, allometry and maturation adjustment, using pharmacokinetic (PK) data. To do this, it is assumed that target exposures are the same in both populations. The working model and prior distribution parameters of the dose-toxicity and dose-efficacy relationships can be obtained using early phase adult toxicity and efficacy data at several dose levels through the ‘dfped’ package. Priors are incorporated into the dose-finding process through Bayesian model selection or adaptive priors, to facilitate adjusting the amount of prior information to differences between adults and children. This calibrates the model to adjust for misspecification if the adult and paediatric data are very different. Users can supply their own Bayesian model written in Stan code through the ‘dfped’ package; a template of this model is proposed in the examples of the corresponding R functions. Finally, the package provides a simulation function for one trial or for several trials. These methods are proposed by Petit et al. (2016) <doi:10.1177/0962280216671348>.

241

Clinical Trial Design, Monitoring, and Analysis

dfpk

Bayesian DoseFinding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials

Statistical methods involving PK measures in the dose allocation process during Phase I clinical trials. These methods, proposed by Ursino et al. (2017) <doi:10.1002/bimj.201600084>, incorporate pharmacokinetics (PK) into dose-finding designs in different ways, including covariate models, dependent-variable models and hierarchical models. This package provides functions to generate data from several scenarios and to run simulations whose objective is to determine the maximum tolerated dose (MTD).

242

Clinical Trial Design, Monitoring, and Analysis

DoseFinding

Planning and Analyzing Dose Finding Experiments

The DoseFinding package provides functions for the design and analysis of dose-finding experiments (with a focus on pharmaceutical Phase II clinical trials). It provides functions for: multiple contrast tests, fitting nonlinear dose-response models (using Bayesian and non-Bayesian estimation), calculating optimal designs, and an implementation of the MCP-Mod methodology.

243

Clinical Trial Design, Monitoring, and Analysis

epibasix

Elementary Epidemiological Functions for Epidemiology and Biostatistics

Contains elementary tools for analysis of common epidemiological problems, ranging from sample size estimation, through 2x2 contingency table analysis and basic measures of agreement (kappa, sensitivity/specificity). Appropriate print and summary statements are also written to facilitate interpretation wherever possible. Source code is commented throughout to facilitate modification. The target audience includes advanced undergraduate and graduate students in epidemiology or biostatistics courses, and clinical researchers.

244

Clinical Trial Design, Monitoring, and Analysis

ewoc

Escalation with Overdose Control

An implementation of a variety of escalation with overdose control designs introduced by Babb, Rogatko and Zacks (1998) <doi:10.1002/(SICI)1097-0258(19980530)17:10%3C1103::AID-SIM793%3E3.0.CO;2-9>. It calculates the next dose as a clinical trial proceeds as well as performs simulations to obtain operating characteristics.

245

Clinical Trial Design, Monitoring, and Analysis

experiment (core)

R Package for Designing and Analyzing Randomized Experiments

Provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments, including cluster randomized experiments, two-stage randomized experiments, randomized experiments with non-compliance, and randomized experiments with missing data.

246

Clinical Trial Design, Monitoring, and Analysis

FrF2

Fractional Factorial Designs with 2-Level Factors

Regular and non-regular Fractional Factorial 2-level designs can be created. Furthermore, analysis tools for Fractional Factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).

247

Clinical Trial Design, Monitoring, and Analysis

GroupSeq (core)

A GUI-Based Program to Compute Probabilities Regarding Group Sequential Designs

A graphical user interface to compute group sequential designs based on normally distributed test statistics, particularly critical boundaries, power, drift, and confidence intervals of such designs. All computations are based on the alpha spending approach by Lan-DeMets, with various alpha spending functions available to choose from.

248

Clinical Trial Design, Monitoring, and Analysis

gsbDesign

Group Sequential Bayes Design

Group sequential operating characteristics for clinical, Bayesian two-arm trials with known sigma and normal endpoints.

249

Clinical Trial Design, Monitoring, and Analysis

gsDesign (core)

Group Sequential Design

Derives group sequential designs and describes their properties.

250

Clinical Trial Design, Monitoring, and Analysis

HH

Statistical Analysis and Data Display: Heiberger and Holland

Support software for Statistical Analysis and Data Display (Second Edition, Springer, ISBN 9781493921218, 2015) and (First Edition, Springer, ISBN 0387402705, 2004) by Richard M. Heiberger and Burt Holland. This contemporary presentation of statistical methods features extensive use of graphical displays for exploring data and for displaying the analysis. The second edition includes redesigned graphics and additional chapters. The authors emphasize how to construct and interpret graphs, discuss principles of graphical design, and show how accompanying traditional tabular results are used to confirm the visual impressions derived directly from the graphs. Many of the graphical formats are novel and appear here for the first time in print. All chapters have exercises. All functions introduced in the book are in the package. R code for all examples, both graphs and tables, in the book is included in the scripts directory of the package.

251

Clinical Trial Design, Monitoring, and Analysis

Hmisc (core)

Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and HTML code, and recoding variables.

252

Clinical Trial Design, Monitoring, and Analysis

InformativeCensoring

Multiple Imputation for Informative Censoring

Multiple imputation for informative censoring. The package implements two methods: gamma imputation from Jackson et al. (2014) <doi:10.1002/sim.6274> and risk score imputation from Hsu et al. (2009) <doi:10.1002/sim.3480>.

253

Clinical Trial Design, Monitoring, and Analysis

ldbounds (core)

Lan-DeMets Method for Group Sequential Boundaries

Computations related to group sequential boundaries. Includes calculation of bounds using the Lan-DeMets alpha spending function approach.

254

Clinical Trial Design, Monitoring, and Analysis

MCPMod

Design and Analysis of Dose-Finding Studies

Implements a methodology for the design and analysis of dose-response studies that combines aspects of multiple comparison procedures and modelling approaches (Bretz, Pinheiro and Branson, 2005, Biometrics 61, 738-748, <doi:10.1111/j.1541-0420.2005.00344.x>). The package provides tools for the analysis of dose-finding trials as well as a variety of tools necessary to plan a trial to be conducted with the MCP-Mod methodology. Please note: the ‘MCPMod’ package will not be developed further; all future development of the MCP-Mod methodology will be done in the ‘DoseFinding’ R package.

255

Clinical Trial Design, Monitoring, and Analysis

Mediana

Clinical Trial Simulations

Provides a general framework for clinical trial simulations based on the Clinical Scenario Evaluation (CSE) approach. The package supports a broad class of data models (including clinical trials with continuous, binary, survival-type and count-type endpoints as well as multivariate outcomes that are based on combinations of different endpoints), analysis strategies and commonly used evaluation criteria.

256

Clinical Trial Design, Monitoring, and Analysis

meta

General Package for Meta-Analysis

User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rucker, “Meta-Analysis with R” (2015) <doi:10.1007/978-3-319-21416-0>: fixed effect and random effects meta-analysis; several plots (forest, funnel, Galbraith / radial, L’Abbe, Baujat, bubble); statistical tests and trim-and-fill method to evaluate bias in meta-analysis; import of data from ‘RevMan 5’; prediction interval, Hartung-Knapp and Paule-Mandel methods for the random effects model; cumulative meta-analysis and leave-one-out meta-analysis; meta-regression; generalised linear mixed models; forest plots summarising several (subgroup) meta-analyses.

257

Clinical Trial Design, Monitoring, and Analysis

metafor

Meta-Analysis Package for R

A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L’Abbe, Baujat, GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto’s method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted.
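A minimal sketch of the workflow just described (assuming ‘metafor’ is installed; the dat.bcg example dataset ships with the package): compute effect sizes with escalc(), then fit a random-effects model with rma().

```r
# Random-effects meta-analysis of the BCG vaccine trials.
library(metafor)

# Log risk ratios and sampling variances from 2x2 table counts
dat <- escalc(measure = "RR", ai = tpos, bi = tneg,
              ci = cpos, di = cneg, data = dat.bcg)

res <- rma(yi, vi, data = dat)   # random-effects model (REML by default)
summary(res)                     # pooled log risk ratio, tau^2, Q-test
forest(res)                      # forest plot of study and pooled estimates
```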

258

Clinical Trial Design, Monitoring, and Analysis

metaLik

Likelihood Inference in Meta-Analysis and Meta-Regression Models

First- and higher-order likelihood inference in meta-analysis and meta-regression models.

259

Clinical Trial Design, Monitoring, and Analysis

metasens

Advanced Statistical Methods to Model and Adjust for Bias in Meta-Analysis

The following methods are implemented to evaluate how sensitive the results of a meta-analysis are to potential bias, and to support Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 5 ‘Small-Study Effects in Meta-Analysis’: Copas selection model described in Copas & Shi (2001) <doi:10.1177/096228020101000402>; limit meta-analysis by Rucker et al. (2011) <doi:10.1093/biostatistics/kxq046>; upper bound for outcome reporting bias by Copas & Jackson (2004) <doi:10.1111/j.0006-341X.2004.00161.x>; imputation methods for missing binary data by Gamble & Hollis (2005) <doi:10.1016/j.jclinepi.2004.09.013> and Higgins et al. (2008) <doi:10.1177/1740774508091600>.

260

Clinical Trial Design, Monitoring, and Analysis

multcomp

Simultaneous Inference in General Parametric Models

Simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects, and survival models. The package includes demos reproducing analyses presented in the book “Multiple Comparisons Using R” (Bretz, Hothorn, Westfall, 2010, CRC Press).
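A minimal sketch of simultaneous inference with ‘multcomp’ (assuming the package is installed), following the package's standard all-pairwise comparison pattern on the base-R warpbreaks data:

```r
# Tukey all-pairwise comparisons of tension levels after a one-way ANOVA.
library(multcomp)

amod <- aov(breaks ~ tension, data = warpbreaks)
tuk <- glht(amod, linfct = mcp(tension = "Tukey"))  # general linear hypotheses

summary(tuk)   # adjusted p-values (single-step method)
confint(tuk)   # simultaneous confidence intervals
```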

261

Clinical Trial Design, Monitoring, and Analysis

nppbib

Nonparametric Partially-Balanced Incomplete Block Design Analysis

Implements a nonparametric statistical test for rank or score data from partially-balanced incomplete block design experiments.

262

Clinical Trial Design, Monitoring, and Analysis

PIPS (core)

Predicted Interval Plots

Generate Predicted Interval Plots. Simulate and plot confidence intervals of an effect estimate given observed data and a hypothesis about the distribution of future data.

263

Clinical Trial Design, Monitoring, and Analysis

PowerTOST (core)

Power and Sample Size Based on Two One-Sided t-Tests (TOST) for (Bio)Equivalence Studies

Contains functions to calculate power and sample size for various study designs used in bioequivalence studies; see the function known.designs() for the study designs covered. Moreover, the package contains functions for power and sample size based on ‘expected’ power in the case of uncertain (estimated) variability and/or uncertain theta0. Added are functions for the power and sample size for the ratio of two means with normally distributed data on the original scale (based on Fieller’s confidence (‘fiducial’) interval). Further functions perform power and sample size calculations based on a non-inferiority t-test; this is not a TOST procedure but eventually useful if the question of ‘non-superiority’ must be evaluated. These calculations may also be performed via ‘expected’ power in the case of uncertain (estimated) variability and/or uncertain theta0. The functions power.scABEL() and sampleN.scABEL() calculate power and sample size for the BE decision via scaled (widened) BE acceptance limits (EMA recommended) based on simulations; scABEL.ad() and sampleN.scABEL.ad() iteratively adjust alpha in order to maintain the overall consumer risk in ABEL studies and adapt the sample size for the loss in power. The functions power.RSABE() and sampleN.RSABE() calculate power and sample size for the BE decision via the reference-scaled ABE criterion according to the FDA procedure, based on simulations. The functions power.NTIDFDA() and sampleN.NTIDFDA() do the same for the BE decision via the FDA procedure for NTIDs, and power.HVNTID() and sampleN.HVNTID() for the FDA procedure for highly variable NTIDs (see the FDA dabigatran / rivaroxaban guidances). Functions for power analysis of a sample size plan for ABE (pa.ABE()), scaled ABE (pa.scABE()) and scaled ABE for NTIDs (pa.NTIDFDA()) analyse power when deviating from the assumptions of the plan. Further functions cover power calculations / sample size estimation for dose proportionality studies using the power model.

264

Clinical Trial Design, Monitoring, and Analysis

pwr (core)

Basic Functions for Power Analysis

Power analysis functions along the lines of Cohen (1988).
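A minimal sketch of a Cohen-style power calculation with ‘pwr’ (assuming the package is installed): leave one of n, d, power, or sig.level unspecified and the function solves for it.

```r
# Per-group sample size for a two-sample t-test detecting a medium
# standardized effect (Cohen's d = 0.5) with 80% power at alpha = 0.05.
library(pwr)

res <- pwr.t.test(d = 0.5,
                  sig.level = 0.05,
                  power = 0.80,
                  type = "two.sample")   # n is left NULL and solved for
ceiling(res$n)                           # per-group n, rounded up
```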

265

Clinical Trial Design, Monitoring, and Analysis

PwrGSD (core)

Power in a Group Sequential Design

Tools for the evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint; tools to construct efficacy and futility boundaries and to derive the power of a sequential design at a specified alternative; and a template for evaluating the performance of candidate plans at a set of time-varying alternatives. See Izmirlian, G. (2014) <doi:10.4310/SII.2014.v7.n1.a4>.

266

Clinical Trial Design, Monitoring, and Analysis

qtlDesign (core)

Design of QTL experiments

Tools for the design of QTL experiments.

267

Clinical Trial Design, Monitoring, and Analysis

rmeta

MetaAnalysis

Functions for simple fixed and random effects meta-analysis for two-sample comparisons and cumulative meta-analyses. Draws standard summary plots and funnel plots, and computes summaries and tests for association and heterogeneity.

268

Clinical Trial Design, Monitoring, and Analysis

samplesize

Sample Size Calculation for Various t-Tests and Wilcoxon-Test

Computes sample size for Student’s t-test and for the Wilcoxon-Mann-Whitney test for categorical data. The t-test function allows paired and unpaired (balanced / unbalanced) designs as well as homogeneous and heterogeneous variances. The Wilcoxon function allows for ties.

269

Clinical Trial Design, Monitoring, and Analysis

speff2trial (core)

Semiparametric efficient estimation for a two-sample treatment effect

The package performs estimation and testing of the treatment effect in a 2-group randomized clinical trial with a quantitative, dichotomous, or right-censored time-to-event endpoint. The method improves efficiency by leveraging baseline predictors of the endpoint. The inverse probability weighting technique of Robins, Rotnitzky, and Zhao (JASA, 1994) is used to provide unbiased estimation when the endpoint is missing at random.

270

Clinical Trial Design, Monitoring, and Analysis

ssanv

Sample Size Adjusted for Nonadherence or Variability of Input Parameters

A set of functions to calculate sample size for two-sample difference-in-means tests. Adjusts for either non-adherence or variability that comes from using data to estimate parameters.

271

Clinical Trial Design, Monitoring, and Analysis

survival (core)

Survival Analysis

Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
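A minimal sketch of the core routines listed above (‘survival’ is a recommended package that ships with R; the aml and lung datasets are included):

```r
# Kaplan-Meier curves and a Cox proportional hazards model.
library(survival)

# Surv() builds the censored-outcome object; survfit() estimates KM curves
fit <- survfit(Surv(time, status) ~ x, data = aml)
summary(fit)   # survival estimates by maintenance group

# Cox model with two baseline covariates
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox)   # hazard ratios with confidence intervals
```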

272

Clinical Trial Design, Monitoring, and Analysis

TEQR (core)

Target Equivalence Range Design

The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity-based cumulative cohort design with added safety rules. The ACT (activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules. The unique feature of this design is that dose is escalated based on lack of activity rather than on lack of toxicity, and is de-escalated only if an unacceptable level of toxicity is experienced.

273

Clinical Trial Design, Monitoring, and Analysis

ThreeArmedTrials

Design and Analysis of Clinical Non-Inferiority or Superiority Trials with Active and Placebo Control

Design and analyze three-arm non-inferiority or superiority trials which follow a gold-standard design, i.e. trials with an experimental treatment, an active control, and a placebo control. Methods for the following distributions are implemented: Poisson (Mielke and Munk (2009) <arXiv:0912.4169>), negative binomial (Muetze et al. (2016) <doi:10.1002/sim.6738>), normal (Pigeot et al. (2003) <doi:10.1002/sim.1450>; Hasler et al. (2009) <doi:10.1002/sim.3052>), binary (Friede and Kieser (2007) <doi:10.1002/sim.2543>), nonparametric (Muetze et al. (2017) <doi:10.1002/sim.7176>), and exponential (Mielke and Munk (2009) <arXiv:0912.4169>).

274

Clinical Trial Design, Monitoring, and Analysis

ThreeGroups

ML Estimator for Baseline-Placebo-Treatment (Three-Group) Experiments

Implements the Maximum Likelihood estimator for baseline, placebo, and treatment groups (three-group) experiments with noncompliance proposed by Gerber, Green, Kaplan, and Kern (2010).

275

Clinical Trial Design, Monitoring, and Analysis

TrialSize (core)

R Functions in Chapters 3, 4, 6, 7, 9, 10, 11, 12, 14, 15

Functions and examples from the book Sample Size Calculation in Clinical Research.

276

Cluster Analysis & Finite Mixture Models

AdMit

Adaptive Mixture of Student-t Distributions

Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.

277

Cluster Analysis & Finite Mixture Models

ADPclust

Fast Clustering Using Adaptive Density Peak Detection

An implementation of ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection), built on and improving the idea of Rodriguez and Laio (2014) <doi:10.1126/science.1242072>. ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid-selection and parameter-optimization algorithm, which finds the number of clusters and the cluster centroids by comparing average silhouettes on a grid of candidate clusterings. It also includes an interactive algorithm that lets the user manually select cluster centroids from a two-dimensional density-distance plot. The associated research article is: Wang, Xiao-Feng, and Yifan Xu (2015) <doi:10.1177/0962280215609948>, "Fast clustering using adaptive density peak detection", Statistical Methods in Medical Research. url: http://smm.sagepub.com/content/early/2015/10/15/0962280215609948.abstract.

278

Cluster Analysis & Finite Mixture Models

amap

Another Multidimensional Analysis Package

Tools for Clustering and Principal Component Analysis (with robust methods and parallelized functions).

279

Cluster Analysis & Finite Mixture Models

apcluster

Affinity Propagation Clustering

Implements Affinity Propagation clustering introduced by Frey and Dueck (2007) <doi:10.1126/science.1136800>. The algorithms are largely analogous to the 'Matlab' code published by Frey and Dueck. The package further provides leveraged affinity propagation and an algorithm for exemplar-based agglomerative clustering that can also be used to join clusters obtained from affinity propagation. Various plotting functions are available for analyzing clustering results.

280

Cluster Analysis & Finite Mixture Models

BayesLCA

Bayesian Latent Class Analysis

Bayesian Latent Class Analysis using several different methods.

281

Cluster Analysis & Finite Mixture Models

bayesm

Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

282

Cluster Analysis & Finite Mixture Models

bayesmix

Bayesian Mixture Models with JAGS

The fitting of finite mixture models of univariate Gaussian distributions using JAGS within a Bayesian framework is provided.

283

Cluster Analysis & Finite Mixture Models

bclust

Bayesian Hierarchical Clustering Using Spike and Slab Models

Builds a dendrogram using the log posterior as a natural distance defined by the model, and meanwhile weights the clustering variables. It is also capable of computing equivalent Bayesian discrimination probabilities. The adopted method suits the small-sample, large-dimension setting. Model parameter estimation may be difficult, depending on the data structure and the chosen distribution family.

284

Cluster Analysis & Finite Mixture Models

bgmm

Gaussian Mixture Modeling Algorithms and the Belief-Based Mixture Modeling

Two partially supervised mixture modeling methods, soft-label and belief-based modeling, are implemented. For completeness, the package is also equipped with the functionality of unsupervised, semi- and fully supervised mixture modeling. The package can also be applied to select the best-fitting model from a set of models with different component numbers or constraints on their structures. For a detailed introduction see: Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software <doi:10.18637/jss.v047.i03>.

285

Cluster Analysis & Finite Mixture Models

biclust

BiCluster Algorithms

The main function biclust() provides several algorithms to find biclusters in two-dimensional data: Cheng and Church (2000, ISBN:1577351150), spectral (2003) <doi:10.1101/gr.648603>, plaid model (2005) <doi:10.1016/j.csda.2004.02.003>, xmotifs (2003) <doi:10.1142/9789812776303_0008> and bimax (2006) <doi:10.1093/bioinformatics/btl060>. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.

286

Cluster Analysis & Finite Mixture Models

Bmix

Bayesian Sampling for Stick-Breaking Mixtures

This is a bare-bones implementation of sampling algorithms for a variety of Bayesian stick-breaking (marginally DP) mixture models, including particle learning and Gibbs sampling for static DP mixtures, particle learning for dynamic BAR stick-breaking, and DP mixture regression. The software is designed to be easy to customize to suit different situations and for experimentation with stick-breaking models. Since particles are repeatedly copied, it is not an especially efficient implementation.

287

Cluster Analysis & Finite Mixture Models

bmixture

Bayesian Estimation for Finite Mixture of Distributions

Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements from the Bayesian literature on finite mixtures of distributions, including Mohammadi et al. (2013) <doi:10.1007/s00180-012-0323-3> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.

288

Cluster Analysis & Finite Mixture Models

cba

Clustering for Business Analytics

Implements clustering techniques such as Proximus and Rock, as well as utility functions for the efficient computation of cross distances and for data manipulation.

289

Cluster Analysis & Finite Mixture Models

cclust

Convex Clustering Methods and Clustering Indexes

Convex clustering methods, including the K-means algorithm, the Online Update algorithm (Hard Competitive Learning) and the Neural Gas algorithm (Soft Competitive Learning), and calculation of several indexes for finding the number of clusters in a data set.
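The k-means procedure that several packages in this section build on follows a simple two-step iteration: assign each point to its nearest center, then recompute each center as its cluster mean. A minimal Python sketch for 2-D points, assuming a fixed random seed (`kmeans` is a hypothetical name, unrelated to any package API):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means: assign to nearest center, recompute centers as means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)        # pick k initial centers from the data
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # recompute centers; keep the old center when a cluster ends up empty
        centers = [(sum(p[0] for p in cl) / len(cl),
                    sum(p[1] for p in cl) / len(cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

# two well-separated groups of three points each
centers, clusters = kmeans(
    [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
```

On well-separated data like this, the iteration settles on the two group means after a few passes.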

290

Cluster Analysis & Finite Mixture Models

CEC

Cross-Entropy Clustering

CEC divides data into Gaussian-type clusters. The implementation allows the simultaneous use of various types of Gaussian mixture models, performs the reduction of unnecessary clusters, and is able to discover new groups. Based on Spurek, P. and Tabor, J. (2014) <doi:10.1016/j.patcog.2014.03.006>.

291

Cluster Analysis & Finite Mixture Models

CHsharp

Choi and Hall Style Data Sharpening

Functions for use in perturbing data prior to use of nonparametric smoothers and clustering.

292

Cluster Analysis & Finite Mixture Models

clue

Cluster Ensembles

CLUster Ensembles.

293

Cluster Analysis & Finite Mixture Models

cluster (core)

“Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al.

Methods for cluster analysis. Much extends the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".

294

Cluster Analysis & Finite Mixture Models

clusterCrit

Clustering Indices

Compute clustering validation indices.
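One widely used validation index, the average silhouette width, is easy to state directly: for each point, compare its mean within-cluster distance a to its smallest mean distance b to another cluster; the silhouette is (b - a) / max(a, b). A naive Python sketch for 1-D data, assuming at least two clusters (`mean_silhouette` is a hypothetical name, not the package's API):

```python
def mean_silhouette(xs, labels):
    """Average silhouette width for 1-D data (naive O(n^2) version).

    a = mean distance to the point's own cluster,
    b = smallest mean distance to any other cluster,
    silhouette s = (b - a) / max(a, b); assumes >= 2 clusters.
    """
    n = len(xs)
    label_set = set(labels)
    total = 0.0
    for i in range(n):
        own = [abs(xs[i] - xs[j]) for j in range(n)
               if labels[j] == labels[i] and j != i]
        if not own:            # singleton cluster: silhouette taken as 0
            continue
        a = sum(own) / len(own)
        b = min(sum(abs(xs[i] - xs[j]) for j in range(n) if labels[j] == c) /
                sum(1 for j in range(n) if labels[j] == c)
                for c in label_set if c != labels[i])
        total += (b - a) / max(a, b)
    return total / n

# two tight, well-separated 1-D clusters score close to the maximum of 1
score = mean_silhouette([0.0, 0.2, 0.1, 5.0, 5.1, 5.2], [0, 0, 0, 1, 1, 1])
```

Values near 1 indicate tight, well-separated clusters; values near 0 or below indicate overlapping or misassigned points.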

295

Cluster Analysis & Finite Mixture Models

clusterfly

Explore clustering interactively using R and GGobi

Visualise clustering algorithms with GGobi. Contains both general code for visualising clustering results and specific visualisations for model-based, hierarchical and SOM clustering.

296

Cluster Analysis & Finite Mixture Models

clusterGeneration

Random Cluster Generation (with Specified Degree of Separation)

We developed the clusterGeneration package to provide functions for generating random clusters, generating random covariance/correlation matrices, calculating a separation index (data and population version) for pairs of clusters or cluster distributions, and 1-D and 2-D projection plots to visualize clusters. The package also contains a function to generate random clusters based on factorial designs with factors such as degree of separation, number of clusters, number of variables, and number of noisy variables.

297

Cluster Analysis & Finite Mixture Models

ClusterR

Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering

Gaussian mixture models, k-means, mini-batch k-means, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.

298

Cluster Analysis & Finite Mixture Models

clusterRepro

Reproducibility of Gene Expression Clusters

This is a function for validating microarray clusters via reproducibility, based on the paper referenced below.

299

Cluster Analysis & Finite Mixture Models

clusterSim

Searching for Optimal Clustering Procedure for a Data Set

Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas (metric data, interval-valued symbolic data), data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) <doi:10.1007/BF02294245>, HUBERT, L., ARABIE, P. (1985) <doi:10.1007/BF01908075>, RAND, W.M. (1971) <doi:10.1080/01621459.1971.10482356>, JAJUGA, K., WALESIAK, M. (2000) <doi:10.1007/978-3-642-57280-7_11>, MILLIGAN, G.W., COOPER, M.C. (1988) <doi:10.1007/BF01897163>, JAJUGA, K., WALESIAK, M., BAK, A. (2003) <doi:10.1007/978-3-642-55721-7_12>, DAVIES, D.L., BOULDIN, D.W. (1979) <doi:10.1109/TPAMI.1979.4766909>, CALINSKI, T., HARABASZ, J. (1974) <doi:10.1080/03610927408827101>, HUBERT, L. (1974) <doi:10.1080/01621459.1974.10480191>, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) <doi:10.1111/1467-9868.00293>, BRECKENRIDGE, J.N. (2000) <doi:10.1207/S15327906MBR3502_5>, WALESIAK, M., DUDEK, A. (2008) <doi:10.1007/978-3-540-78246-9_11>).

300

Cluster Analysis & Finite Mixture Models

clustMixType

k-Prototypes Clustering for Mixed Variable-Type Data

Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z. Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304, <doi:10.1023/A:1009769707641>.

301

Cluster Analysis & Finite Mixture Models

clustvarsel

Variable Selection for Gaussian Model-Based Clustering

Variable selection for Gaussian model-based clustering as implemented in the 'mclust' package. The methodology finds the (locally) optimal subset of variables in a data set that carry group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without subsampling at the hierarchical clustering stage for starting 'mclust' models. By default the algorithm uses a sequential search, but parallelisation is also available.

302

Cluster Analysis & Finite Mixture Models

clv

Cluster Validation Techniques

The package contains most of the popular internal and external cluster validation methods, ready to use with most of the outputs produced by functions from the "cluster" package. It also contains functions and usage examples for a cluster stability approach that can be applied to the algorithms implemented in the "cluster" package as well as to user-defined clustering algorithms.

303

Cluster Analysis & Finite Mixture Models

clValid

Validation of Clustering Results

Statistical and biological validation of clustering results.

304

Cluster Analysis & Finite Mixture Models

CoClust

Copula Based Cluster Analysis

A copula-based clustering algorithm that finds clusters according to the complex multivariate dependence structure of the data generating process. The updated version of the algorithm is described in Di Lascio, F.M.L. and Giannerini, S. (2016). "Clustering dependent observations with copula functions". Statistical Papers, pp. 1-17. <doi:10.1007/s00362-016-0822-3>.

305

Cluster Analysis & Finite Mixture Models

compHclust

Complementary Hierarchical Clustering

Performs the complementary hierarchical clustering procedure and returns X’ (the expected residual matrix) and a vector of the relative gene importances.

306

Cluster Analysis & Finite Mixture Models

dbscan

Density-Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms

A fast reimplementation of several density-based algorithms of the DBSCAN family for spatial data. Includes the DBSCAN (density-based spatial clustering of applications with noise) and OPTICS (ordering points to identify the clustering structure) clustering algorithms, HDBSCAN (hierarchical DBSCAN), and the LOF (local outlier factor) algorithm. The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.
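The core DBSCAN idea (grow clusters from core points that have at least min_pts neighbors within radius eps; points reachable only from non-core points become noise or border points) can be sketched naively in Python. This O(n^2) sketch is for illustration only and is unrelated to the package's fast kd-tree implementation:

```python
def dbscan(points, eps, min_pts):
    """Naive O(n^2) DBSCAN: returns cluster labels 0, 1, ... or -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if (points[i][0] - q[0]) ** 2
                 + (points[i][1] - q[1]) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:       # not a core point (may become border later)
            labels[i] = -1
            continue
        labels[i] = cid               # start a new cluster from this core point
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:       # noise reached from a core point: border
                labels[j] = cid
                continue
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:    # j is itself a core point: keep expanding
                queue.extend(jn)
        cid += 1
    return labels

# two chains of points plus one isolated outlier
labels = dbscan([(0, 0), (0.5, 0), (1, 0),
                 (10, 0), (10.5, 0), (11, 0), (50, 50)], eps=1.0, min_pts=2)
```

The two chains each form a cluster, while the isolated point is labeled noise (-1) because it has too few neighbors within eps.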

307

Cluster Analysis & Finite Mixture Models

dendextend

Extending ‘dendrogram’ Functionality in R

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) adjust a tree's graphical parameters (the color, size, type, etc. of its branches, nodes and labels), and (2) visually and statistically compare different 'dendrograms' to one another.

308

Cluster Analysis & Finite Mixture Models

depmix

Dependent Mixture Models

Fits (multi-group) mixtures of latent or hidden Markov models on mixed categorical and continuous (time-series) data. The Rdonlp2 package can optionally be used for optimization of the log-likelihood and is available from R-Forge.

309

Cluster Analysis & Finite Mixture Models

depmixS4

Dependent Mixture Models - Hidden Markov Models of GLMs and Other Distributions in S4

Fits latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models, see Visser & Speekenbrink (2010, <doi:10.18637/jss.v036.i07>).

310

Cluster Analysis & Finite Mixture Models

dpmixsim

Dirichlet Process Mixture Model Simulation for Clustering and Image Segmentation

The ‘dpmixsim’ package implements a Dirichlet Process Mixture (DPM) model for clustering and image segmentation. The DPM model is a Bayesian nonparametric methodology that relies on MCMC simulations for exploring mixture models with an unknown number of components. The code implements conjugate models with normal structure (conjugate normal-normal DP mixture model). The package’s applications are oriented towards the classification of magnetic resonance images according to tissue type or region of interest.

311

Cluster Analysis & Finite Mixture Models

dynamicTreeCut

Methods for Detection of Clusters in Hierarchical Clustering Dendrograms

Contains methods for detection of clusters in hierarchical clustering dendrograms.

312

Cluster Analysis & Finite Mixture Models

e1071

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …

313

Cluster Analysis & Finite Mixture Models

edci

Edge Detection and Clustering in Images

Detection of edge points in images based on the difference of two asymmetric M-kernel estimators. Linear and circular regression clustering based on redescending M-estimators. Detection of linear edges in images.

314

Cluster Analysis & Finite Mixture Models

EMCluster

EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution

EM algorithms and several efficient initialization methods for model-based clustering of finite mixture Gaussian distributions with unstructured dispersion, in both unsupervised and semi-supervised learning.

315

Cluster Analysis & Finite Mixture Models

evclust

Evidential Clustering

Various clustering algorithms that produce a credal partition, i.e., a set of Dempster-Shafer mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership uncertainty of the objects. The algorithms are: Evidential c-Means (ECM), Relational Evidential c-Means (RECM), Constrained Evidential c-Means (CECM), EVCLUS and EK-NNclus.

316

Cluster Analysis & Finite Mixture Models

FactoClass

Combination of Factorial Methods and Cluster Analysis

Some functions of 'ade4' and 'stats' are combined in order to obtain a partition of the rows of a data table, with columns representing variables of quantitative, qualitative or frequency scales. First, a principal axes method is performed, and then a combination of Ward agglomerative hierarchical classification and K-means is performed, using some of the first coordinates obtained from the principal axes method. See, for example: Lebart, L., Piron, M. and Morineau, A. (2006). Statistique Exploratoire Multidimensionnelle, Dunod, Paris. To permit different weights of the elements to be clustered, the function 'kmeansW', programmed in C++, is included; it is a modification of 'kmeans'. Some graphical functions include the option 'gg=FALSE'; when 'gg=TRUE', they use the 'ggplot2' and 'ggrepel' packages to avoid the superposition of labels.

317

Cluster Analysis & Finite Mixture Models

fastcluster

Fast Hierarchical Clustering Routines for R and ‘Python’

This is a two-in-one package which provides interfaces to both R and 'Python'. It implements fast hierarchical, agglomerative clustering routines. Part of the functionality is designed as a drop-in replacement for existing routines: linkage() in the 'SciPy' package 'scipy.cluster.hierarchy', hclust() in R's 'stats' package, and the 'flashClust' package. It provides the same functionality with the benefit of a much faster implementation. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide. For information on how to install the 'Python' files, see the file INSTALL in the source distribution. Based on the present package, Christoph Dalitz also wrote a pure 'C++' interface to 'fastcluster': <http://informatik.hsnr.de/~dalitz/data/hclust>.

318

Cluster Analysis & Finite Mixture Models

fclust

Fuzzy Clustering

Algorithms for fuzzy clustering, cluster validity indices and plots for cluster validity and visualizing fuzzy clustering results.

319

Cluster Analysis & Finite Mixture Models

flashClust

Implementation of optimal hierarchical clustering

Fast implementation of hierarchical clustering.

320

Cluster Analysis & Finite Mixture Models

flexclust (core)

Flexible Cluster Algorithms

The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, barcharts of centroids, ...), and bootstrap methods for the analysis of cluster stability.

321

Cluster Analysis & Finite Mixture Models

flexmix (core)

Flexible Mixture Modeling

A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
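The E-step/M-step alternation that such packages build on can be illustrated with a two-component 1-D Gaussian mixture in plain Python. This is a hedged sketch with a crude quartile-based initialization; `em_gmm_1d` is a hypothetical name, not any package's API:

```python
import math

def em_gmm_1d(xs, iters=200):
    """EM for a two-component 1-D Gaussian mixture.

    E-step: responsibilities from the current parameters;
    M-step: weighted means, variances and mixing weights.
    """
    xs = sorted(xs)
    n = len(xs)
    mu = [xs[n // 4], xs[3 * n // 4]]     # crude quartile-based init
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability of each component for each point
        resp = []
        for x in xs:
            d = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = d[0] + d[1]
            resp.append([d[0] / s, d[1] / s])
        # M-step: re-estimate parameters from the responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    return mu, var, pi

# two well-separated groups near 0 and near 10
mu, var, pi = em_gmm_1d([0.0, 0.1, -0.1, 0.05, 10.0, 10.1, 9.9, 10.05])
```

On well-separated data the responsibilities become effectively hard assignments, and the component means converge to the two group means.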

322

Cluster Analysis & Finite Mixture Models

fpc

Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance-based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Clusterwise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

323

Cluster Analysis & Finite Mixture Models

FunCluster

Functional Profiling of Microarray Expression Data

FunCluster performs a functional analysis of microarray expression data based on Gene Ontology & KEGG functional annotations. From expression data and functional annotations FunCluster builds classes of putatively co-regulated biological processes through a specially designed clustering procedure.

324

Cluster Analysis & Finite Mixture Models

funFEM

Clustering in the Discriminative Functional Subspace

The funFEM algorithm (Bouveyron et al., 2014) clusters functional data by modeling the curves within a common and discriminative functional subspace.

325

Cluster Analysis & Finite Mixture Models

funHDDC

Univariate and Multivariate Model-Based Clustering in Group-Specific Functional Subspaces

The funHDDC algorithm clusters functional univariate (Bouveyron and Jacques, 2011, <doi:10.1007/s11634-011-0095-6>) or multivariate data (Schmutz et al., 2018) by modeling each group within a specific functional subspace.

326

Cluster Analysis & Finite Mixture Models

gamlss.mx

Fitting Mixture Distributions with GAMLSS

The main purpose of this package is to allow fitting of mixture distributions with GAMLSS models.

327

Cluster Analysis & Finite Mixture Models

genie

A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm

A new hierarchical clustering linkage criterion: the Genie algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini index) of the cluster sizes does not increase drastically above a given threshold. Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed, see (Gagolewski et al. 2016a <doi:10.1016/j.ins.2016.05.003>, 2016b <doi:10.1007/978-3-319-45656-0_16>) for more details.
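The inequity criterion at the heart of the Genie linkage can be illustrated with a small Gini-index function over cluster sizes (a generic sketch of the Gini index, not the package's implementation):

```python
def gini(sizes):
    """Gini index of a list of positive cluster sizes: 0 = perfectly balanced."""
    xs = sorted(sizes)
    n = len(xs)
    total = sum(xs)
    # classic formula: sum_i (2i - n - 1) * x_(i) / (n * sum_j x_j), i = 1..n
    return sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs)) / (n * total)
```

Balanced partitions score 0 and highly skewed ones approach 1, so a merge that would push the index of the cluster sizes past the threshold is penalized in favor of merges involving the smaller clusters.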

328

Cluster Analysis & Finite Mixture Models

GLDEX

Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods

The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approaches based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method, and L-moment matching are also provided. Diagnostics on goodness of fit can be done via QQ plots, KS-resample tests, and comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution.

329

Cluster Analysis & Finite Mixture Models

GMCM

Fast Estimation of Gaussian Mixture Copula Models

Unsupervised clustering and meta-analysis using Gaussian mixture copula models.

330

Cluster Analysis & Finite Mixture Models

GSM

Gamma Shape Mixture

Implementation of a Bayesian approach for estimating a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions; it is computationally efficient, and it only requires specifying a prior distribution for a single parameter.

331

Cluster Analysis & Finite Mixture Models

HDclassif

High Dimensional Supervised Classification and Clustering

Discriminant analysis and data clustering methods for high-dimensional data, based on the assumption that high-dimensional data live in different subspaces with low dimensionality, proposing a new parametrization of the Gaussian mixture model which combines the ideas of dimension reduction and constraints on the model.

332

Cluster Analysis & Finite Mixture Models

hybridHclust

Hybrid Hierarchical Clustering

Hybrid hierarchical clustering via mutual clusters. A mutual cluster is a set of points closer to each other than to all other points. Mutual clusters are used to enrich top-down hierarchical clustering.

333

Cluster Analysis & Finite Mixture Models

idendr0

Interactive Dendrograms

Interactive dendrogram that enables the user to select and color clusters, to zoom and pan the dendrogram, and to visualize the clustered data not only in a built-in heat map, but also in 'GGobi' interactive plots and user-supplied plots. This is a backport of Qt-based 'idendro' (<https://github.com/tsieger/idendro>) to base R graphics and Tcl/Tk GUI.

334

Cluster Analysis & Finite Mixture Models

IMIFA

Infinite Mixtures of Infinite Factor Analysers and Related Models

Provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers and related models, for nonparametrically clustering highdimensional data, introduced by Murphy et al. (2018) <arXiv:1701.07010v4>. The IMIFA model conducts Bayesian nonparametric modelbased clustering with factor analytic covariance structures without recourse to model selection criteria to choose the number of clusters or clusterspecific latent factors, mostly via efficient Gibbs updates. Modelspecific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty.

335

Cluster Analysis & Finite Mixture Models

isopam

Isopam (Clustering)

Isopam clustering algorithm and utilities. Isopam optimizes clusters and optionally cluster numbers in a brute force style and aims at an optimum separation by all or some descriptors (typically species).

336

Cluster Analysis & Finite Mixture Models

kernlab

Kernel-Based Machine Learning Lab

Kernelbased machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.

337

Cluster Analysis & Finite Mixture Models

kml

K-Means for Longitudinal Data

An implementation of k-means specifically designed to cluster longitudinal data. It provides facilities to deal with missing values, computes several quality criteria (Calinski and Harabasz, Ray and Turi, Davies and Bouldin, BIC, ...), and proposes a graphical interface for choosing the 'best' number of clusters.

338

Cluster Analysis & Finite Mixture Models

latentnet

Latent Position and Cluster Models for Statistical Networks

Fit and simulate latent position and cluster models for statistical networks.

339

Cluster Analysis & Finite Mixture Models

LCAvarsel

Variable Selection for Latent Class Analysis

Variable selection for latent class analysis for model-based clustering of multivariate categorical data. The package implements a general framework for selecting the subset of variables with relevant clustering information and discarding those that are redundant and/or not informative. The variable selection method is based on the approach of Fop et al. (2017) <doi:10.1214/17-AOAS1061> and Dean and Raftery (2010) <doi:10.1007/s10463-009-0258-9>. Different algorithms are available to perform the selection: stepwise, swap-stepwise and evolutionary stochastic search. Concomitant covariates used to predict the class membership probabilities can also be included in the latent class analysis model. The selection procedure can be run in parallel on multi-core machines.

340

Cluster Analysis & Finite Mixture Models

lcmm

Extended Mixed Models Using Latent Classes and Latent Processes

Estimation of various extensions of mixed models, including latent class mixed models, joint latent class mixed models, and mixed models for curvilinear univariate or multivariate longitudinal outcomes, using a maximum likelihood estimation method.

341

Cluster Analysis & Finite Mixture Models

mcclust

Process an MCMC Sample of Clusterings

Implements methods for processing a sample of (hard) clusterings, e.g. the MCMC output of a Bayesian clustering model. Among them are methods that find a single best clustering to represent the sample, which are based on the posterior similarity matrix or a relabelling algorithm.

342

Cluster Analysis & Finite Mixture Models

mclust (core)

Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via the EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
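For illustration, a minimal sketch of model-based clustering with ‘mclust’ (package assumed installed); Mclust() selects the number of components and the covariance model by BIC:

```r
# Minimal sketch, assuming the mclust package is installed.
library(mclust)

fit <- Mclust(iris[, 1:4])  # fits Gaussian mixtures over a range of G by default
summary(fit)                # chosen covariance model, number of components, BIC
head(fit$classification)    # hard cluster assignments for the first observations
```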

343

Cluster Analysis & Finite Mixture Models

MetabolAnalyze

Probabilistic latent variable models for metabolomic data

Fits probabilistic principal components analysis, probabilistic principal components and covariates analysis and mixtures of probabilistic principal components models to metabolomic spectral data.

344

Cluster Analysis & Finite Mixture Models

mixAK

Multivariate Normal Mixture Models and Mixtures of Generalized Linear Mixed Models Including Model Based Clustering

Contains a mixture of statistical methods including the MCMC methods to analyze normal mixtures. Additionally, model based clustering methods are implemented to perform classification based on (multivariate) longitudinal (or otherwise correlated) data. The basis for such clustering is a mixture of multivariate generalized linear mixed models.

345

Cluster Analysis & Finite Mixture Models

MixAll

Clustering and Classification using Model-Based Mixture Models

Algorithms and methods for model-based clustering and classification. It supports various types of data: continuous, categorical and count data, and can handle mixed data of these types. It can fit Gaussian (with diagonal covariance structure), gamma, categorical and Poisson models. The algorithms also support missing values. This package can be used as an independent alternative to the (not free) ‘mixtcomp’ software available at <https://massiccc.lille.inria.fr/>.

346

Cluster Analysis & Finite Mixture Models

mixdist

Finite Mixture Distribution Models

Fit finite mixture distribution models to grouped data and conditional data by maximum likelihood using a combination of a Newtontype algorithm and the EM algorithm.

347

Cluster Analysis & Finite Mixture Models

mixPHM

Mixtures of Proportional Hazard Models

Fits multiple-variable mixtures of various parametric proportional hazard models using the EM algorithm. Proportionality restrictions can be imposed on the latent groups and/or on the variables. Several survival distributions can be specified. Missing values and censored values are allowed. Independence is assumed over the single variables.

348

Cluster Analysis & Finite Mixture Models

mixRasch

Mixture Rasch Models with JMLE

Estimates Rasch models and mixture Rasch models, including the dichotomous Rasch model, the rating scale model, and the partial credit model.

349

Cluster Analysis & Finite Mixture Models

mixreg

Functions to Fit Mixtures of Regressions

Fits mixtures of (possibly multivariate) regressions (which has been described as doing ANCOVA when you don’t know the levels).

350

Cluster Analysis & Finite Mixture Models

MixSim

Simulating Data to Study Performance of Clustering Algorithms

The utility of this package is in simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of ‘MixSim’ include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.

351

Cluster Analysis & Finite Mixture Models

mixsmsn

Fitting Finite Mixture of Scale Mixture of Skew-Normal Distributions

Functions to fit finite mixtures of scale mixtures of skew-normal (FM-SMSN) distributions.

352

Cluster Analysis & Finite Mixture Models

mixtools

Tools for Analyzing Finite Mixture Models

Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
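As a hedged sketch (package assumed installed), fitting a two-component univariate normal mixture with mixtools::normalmixEM() might look like:

```r
# Minimal sketch, assuming the mixtools package is installed.
library(mixtools)

set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 4))  # simulated two-component data
fit <- normalmixEM(x, k = 2)  # EM fit of a 2-component normal mixture
fit$mu      # estimated component means (near 0 and 4)
fit$lambda  # estimated mixing proportions (near 0.5 each)
```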

353

Cluster Analysis & Finite Mixture Models

mixture

Finite Gaussian Mixture Models for Clustering and Classification

An implementation of all 14 Gaussian parsimonious clustering models (GPCMs) for model-based clustering and model-based classification.

354

Cluster Analysis & Finite Mixture Models

MOCCA

Multi-Objective Optimization for Collecting Cluster Alternatives

Provides methods to analyze cluster alternatives based on multi-objective optimization of cluster validation indices. For details see Kraus et al. (2011) <doi:10.1007/s00180-011-0244-6>.

355

Cluster Analysis & Finite Mixture Models

MoEClust

Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2018) <arXiv:1711.05632v2>. This package fits finite Gaussian mixture models with a formula interface for supplying gating and/or expert network covariates using a range of parsimonious covariance parameterisations from the GPCM family via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots and the inclusion of an additional noise component is also facilitated. A greedy forward stepwise search algorithm is provided for identifying the optimal model in terms of the number of components, the GPCM covariance parameterisation, and the subsets of gating/expert network covariates.

356

Cluster Analysis & Finite Mixture Models

movMF

Mixtures of von Mises-Fisher Distributions

Fit and simulate mixtures of von Mises-Fisher distributions.

357

Cluster Analysis & Finite Mixture Models

mritc

MRI Tissue Classification

Various methods for MRI tissue classification.

358

Cluster Analysis & Finite Mixture Models

NbClust

Determining the Best Number of Clusters in a Data Set

Provides 30 indices for determining the optimal number of clusters in a data set and proposes to the user the best clustering scheme from the different results.
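A hedged sketch (package assumed installed) of asking ‘NbClust’ for the optimal number of k-means clusters according to its battery of indices:

```r
# Minimal sketch, assuming the NbClust package is installed.
library(NbClust)

set.seed(1)
res <- NbClust(scale(iris[, 1:4]), min.nc = 2, max.nc = 8, method = "kmeans")
table(res$Best.partition)  # sizes of the clusters in the recommended partition
```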

359

Cluster Analysis & Finite Mixture Models

nor1mix

Normal aka Gaussian (1d) Mixture Models (S3 Classes and Methods)

One-dimensional Normal (i.e. Gaussian) mixture model classes for, e.g., density estimation or clustering algorithms research and teaching; provides the widely used Marron-Wand densities. Efficient random number generation and graphics. Fitting to data by efficient ML (maximum likelihood) or traditional EM estimation.

360

Cluster Analysis & Finite Mixture Models

optpart

Optimal Partitioning of Similarity Relations

Contains a set of algorithms for creating partitions and coverings of objects largely based on operations on (dis)similarity relations (or matrices). There are several iterative reassignment algorithms optimizing different goodness-of-clustering criteria. In addition, there are covering algorithms: ‘clique’, which derives maximal cliques, and ‘maxpact’, which creates a covering of maximally compact sets. Graphical analyses and conversion routines are also included.

361

Cluster Analysis & Finite Mixture Models

ORIClust

Order-Restricted Information Criterion-Based Clustering Algorithm

ORIClust is a user-friendly R-based software package for gene clustering. Clusters are given by genes matched to prespecified profiles across various ordered treatment groups. It is particularly useful for analyzing data obtained from short time-course or dose-response microarray experiments.

362

Cluster Analysis & Finite Mixture Models

pdfCluster

Cluster Analysis via Nonparametric Density Estimation

Cluster analysis via nonparametric density estimation is performed. Operationally, the kernel method is used throughout to estimate the density. Diagnostic methods for evaluating the quality of the clustering are available. The package also includes a routine to estimate the probability density function obtained by the kernel method, given a set of data with arbitrary dimensions.

363

Cluster Analysis & Finite Mixture Models

pmclust

Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model

Aims to utilize model-based clustering (unsupervised) for high-dimensional and ultra-large data, especially in a distributed manner. The code employs ‘pbdMPI’ to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. The implementation defaults to the single-program multiple-data programming model. The code can be executed through ‘pbdMPI’ and MPI implementations such as ‘OpenMPI’ and ‘MPICH’. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.

364

Cluster Analysis & Finite Mixture Models

poLCA

Polytomous variable Latent Class Analysis

Latent class analysis and latent class regression models for polytomous outcome variables. Also known as latent structure analysis.

365

Cluster Analysis & Finite Mixture Models

prabclus

Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data

Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures; clustering of presence-absence, abundance and multilocus genetic data for species delimitation; nearest-neighbor-based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview.

366

Cluster Analysis & Finite Mixture Models

prcr

Person-Centered Analysis

Provides an easy-to-use yet adaptable set of tools to conduct person-centered analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) <doi:10.1002/(SICI)1521-4036(199910)41:6%3C753::AID-BIMJ753%3E3.0.CO;2-K>, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure.

367

Cluster Analysis & Finite Mixture Models

PReMiuM

Dirichlet Process Bayesian Clustering, Profile Regression

Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical response, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can also be included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. The main reference for the package is Liverani, Hastie, Azizi, Papathomas and Richardson (2015) <doi:10.18637/jss.v064.i07>.

368

Cluster Analysis & Finite Mixture Models

profdpm

Profile Dirichlet Process Mixtures

This package facilitates profile inference (inference at the posterior mode) for a class of product partition models (PPM). The Dirichlet process mixture is currently the only available member of this class. These methods search for the maximum a posteriori (MAP) estimate of the data partition in a PPM.

369

Cluster Analysis & Finite Mixture Models

protoclust

Hierarchical Clustering with Prototypes

Performs minimax linkage hierarchical clustering. Every cluster has an associated prototype element that represents that cluster, as described in Bien, J., and Tibshirani, R. (2011), “Hierarchical Clustering with Prototypes via Minimax Linkage,” Journal of the American Statistical Association, 106(495), 1075-1084.

370

Cluster Analysis & Finite Mixture Models

psychomix

Psychometric Mixture Models

Psychometric mixture models based on ‘flexmix’ infrastructure. At the moment Rasch mixture models with different parameterizations of the score distribution (saturated vs. mean/variance specification), Bradley-Terry mixture models, and MPT mixture models are implemented. These mixture models can be estimated with or without concomitant variables. See vignette(‘raschmix’, package = ‘psychomix’) for details on the Rasch mixture models.

371

Cluster Analysis & Finite Mixture Models

pvclust

Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling

An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides an AU (approximately unbiased) p-value as well as a BP (bootstrap probability) value for each cluster in a dendrogram.
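A hedged sketch (package assumed installed) of clustering the columns of a data matrix with bootstrap p-values via ‘pvclust’:

```r
# Minimal sketch, assuming the pvclust package is installed.
# Note that pvclust clusters the COLUMNS of the input matrix.
library(pvclust)

set.seed(1)
res <- pvclust(USArrests, method.hclust = "average",
               method.dist = "euclidean", nboot = 100)
plot(res)                   # dendrogram annotated with AU/BP values
pvrect(res, alpha = 0.95)   # highlight clusters with AU >= 0.95
```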

372

Cluster Analysis & Finite Mixture Models

randomLCA

Random Effects Latent Class Analysis

Fits standard and random effects latent class models. The single-level random effects model is described in Qu et al. <doi:10.2307/2533043> and the two-level random effects model in Beath and Heller <doi:10.1177/1471082X0800900302>. Examples are given for their use in diagnostic testing.

373

Cluster Analysis & Finite Mixture Models

rebmix

Finite Mixture Modeling, Clustering & Classification

R functions for random univariate and multivariate finite mixture model generation, estimation, clustering, latent class analysis and classification. Variables can be continuous, discrete, independent or dependent and may follow normal, lognormal, Weibull, gamma, binomial, Poisson, Dirac or circular von Mises parametric families.

374

Cluster Analysis & Finite Mixture Models

rjags

Bayesian Graphical Models using MCMC

Interface to the JAGS MCMC library.

375

Cluster Analysis & Finite Mixture Models

Rmixmod (core)

Classification with Mixture Modelling

Interface of ‘MIXMOD’ software for supervised, unsupervised and semisupervised classification with mixture modelling.

376

Cluster Analysis & Finite Mixture Models

RPMM

Recursively Partitioned Mixture Model

Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.

377

Cluster Analysis & Finite Mixture Models

seriation

Infrastructure for Ordering Objects Using Seriation

Infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).

378

Cluster Analysis & Finite Mixture Models

sigclust

Statistical Significance of Clustering

SigClust is a statistical method for testing the significance of clustering results. SigClust can be applied to assess the statistical significance of splitting a data set into two clusters. For more than two clusters, SigClust can be used iteratively.

379

Cluster Analysis & Finite Mixture Models

skmeans

Spherical k-Means Clustering

Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a fixed-point algorithm and an interface to the CLUTO vcluster program.

380

Cluster Analysis & Finite Mixture Models

som

Self-Organizing Map

Self-Organizing Map (with application in gene clustering).

381

Cluster Analysis & Finite Mixture Models

somspace

Spatial Analysis with SelfOrganizing Maps

Application of the Self-Organizing Maps technique for spatial classification of time series. The package uses spatial data, point or gridded, to create clusters with similar characteristics. The clusters can be further refined to a smaller number of regions by hierarchical clustering and their spatial dependencies can be presented as complex networks. Thus, meaningful maps can be created, representing the regional heterogeneity of a single variable. More information and an example of implementation can be found in Markonis and Strnad (2019).

382

Cluster Analysis & Finite Mixture Models

tclust

Robust Trimmed Clustering

Provides functions for robust trimmed clustering. The methods are described in Garcia-Escudero et al. (2008) <doi:10.1214/07-AOS515>, Fritz et al. (2012) <doi:10.18637/jss.v047.i12> and others.

383

Cluster Analysis & Finite Mixture Models

teigen

Model-Based Clustering and Classification with the Multivariate t Distribution

Fits mixtures of multivariate t-distributions (with eigen-decomposed covariance structure) via the expectation conditional-maximization algorithm under a clustering or classification paradigm.

384

Cluster Analysis & Finite Mixture Models

treeClust

Cluster Distances Through Trees

Create a measure of interpoint dissimilarity useful for clustering mixed data, and, optionally, perform the clustering.

385

Cluster Analysis & Finite Mixture Models

trimcluster

Cluster Analysis with Trimming

Trimmed k-means clustering.

386

Cluster Analysis & Finite Mixture Models

VarSelLCM

Variable Selection for Model-Based Clustering of Mixed-Type Data Set with Missing Values

Full model selection (detection of the relevant features and estimation of the number of clusters) for model-based clustering (see reference here <doi:10.1007/s11222-016-9670-1>). Data to analyze can be continuous, categorical, integer or mixed. Moreover, missing values can occur and do not necessitate any pre-processing. A Shiny application permits an easy interpretation of the results.

387

Databases with R

bigrquery

An Interface to Google’s ‘BigQuery’ ‘API’

Easily talk to Google’s ‘BigQuery’ database from R.

388

Databases with R

dbfaker

A Tool to Ensure the Validity of Database Writes

A tool to ensure the validity of database writes. It provides a set of utilities to analyze and type check the properties of data frames that are to be written to databases with SQL support.

389

Databases with R

DBI (core)

R Database Interface

A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.

390

Databases with R

DBItest

Testing ‘DBI’ Back Ends

A helper that tests ‘DBI’ back ends for conformity to the interface.

391

Databases with R

dbplyr

A ‘dplyr’ Back End for Databases

A ‘dplyr’ back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a ‘DBI’ back end; more advanced features require ‘SQL’ translation to be provided by the package author.

392

Databases with R

elastic

General Purpose Interface to ‘Elasticsearch’

Connect to ‘Elasticsearch’, a ‘NoSQL’ database built on the ‘Java’ Virtual Machine. Interacts with the ‘Elasticsearch’ ‘HTTP’ API (<https://www.elastic.co/products/elasticsearch>), including functions for setting connection details to ‘Elasticsearch’ instances, loading bulk data, searching for documents with both ‘HTTP’ query variables and ‘JSON’-based body requests. In addition, ‘elastic’ provides functions for interacting with APIs for ‘indices’, documents, nodes, clusters, an interface to the cat API, and more.

393

Databases with R

filehashSQLite

Simple key-value database using SQLite

Simple key-value database using SQLite as the backend.

394

Databases with R

implyr

R Interface for Apache Impala

‘SQL’ backend to ‘dplyr’ for Apache Impala, the massively parallel processing query engine for Apache ‘Hadoop’. Impala enables low-latency ‘SQL’ queries on data stored in the ‘Hadoop’ Distributed File System ‘(HDFS)’, Apache ‘HBase’, Apache ‘Kudu’, Amazon Simple Storage Service ‘(S3)’, Microsoft Azure Data Lake Store ‘(ADLS)’, and Dell ‘EMC’ ‘Isilon’. See <https://impala.apache.org> for more information about Impala.

395

Databases with R

influxdbr

R Interface to InfluxDB

An R interface to the InfluxDB time series database <https://www.influxdata.com>. This package allows you to fetch and write time series data from/to an InfluxDB server. Additionally, handy wrappers for the Influx Query Language (IQL) to manage and explore a remote database are provided.

396

Databases with R

liteq

Lightweight Portable Message Queue Using ‘SQLite’

Temporary and permanent message queues for R. Built on top of ‘SQLite’ databases. ‘SQLite’ provides locking, and makes it possible to detect crashed consumers. Crashed jobs can be automatically marked as “failed”, or put in the queue again, potentially a limited number of times.

397

Databases with R

mongolite

Fast and Simple ‘MongoDB’ Client for R

High-performance MongoDB client based on ‘mongo-c-driver’ and ‘jsonlite’. Includes support for aggregation, indexing, map-reduce, streaming, encryption, enterprise authentication, and GridFS. The online user manual provides an overview of the available methods in the package: <https://jeroen.github.io/mongolite/>.

398

Databases with R

odbc (core)

Connect to ODBC Compatible Databases (using the DBI Interface)

A DBI-compatible interface to ODBC databases.

399

Databases with R

ora

Convenient Tools for Working with Oracle Databases

Easytouse functions to explore Oracle databases and import data into R. User interface for the ROracle package.

400

Databases with R

pointblank

Validation of Local and Remote Data Tables

Validate data in data frames, ‘tibble’ objects, in ‘CSV’ and ‘TSV’ files, and in database tables (‘PostgreSQL’ and ‘MySQL’). Validation pipelines can be made using easily readable, consecutive validation steps, and such pipelines allow for switching of the data table context. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions.

401

Databases with R

pool

Object Pooling

Enables the creation of object pools, which make it less computationally expensive to fetch a new object. Currently the only supported pooled objects are ‘DBI’ connections.

402

Databases with R

R4CouchDB

An R Convenience Layer for CouchDB 2.0

Provides a collection of functions for basic database and document management operations such as add, get, list access or delete. Every cdbFunction() gets and returns a list() containing the connection setup. Such a list can be generated by cdbIni().

403

Databases with R

RCassandra

R/Cassandra interface

This package provides a direct interface (without the use of Java) to the most basic functionality of Apache Cassandra, such as login, updates and queries.

404

Databases with R

RcppRedis

‘Rcpp’ Bindings for ‘Redis’ using the ‘hiredis’ Library

Connection to the ‘Redis’ key/value store using the C-language client library ‘hiredis’ (included as a fallback) with ‘MsgPack’ encoding provided via ‘RcppMsgPack’ headers.

405

Databases with R

redux

R Bindings to ‘hiredis’

A ‘hiredis’ wrapper that includes support for transactions, pipelining, blocking subscription, serialisation of all keys and values, and ‘Redis’ error handling with R errors. Includes an automatically generated ‘R6’ interface to the full ‘hiredis’ ‘API’. Generated functions are faithful to the ‘hiredis’ documentation while attempting to match R’s argument semantics. Serialisation must be explicitly done by the user, but both binary and text-mode serialisation are supported.

406

Databases with R

RGreenplum

Interface to ‘Greenplum’ Database

Fully ‘DBI’-compliant interface to ‘Greenplum’ <https://greenplum.org/>, an open-source parallel database. This is an extension of the ‘RPostgres’ package <https://github.com/r-dbi/RPostgres>.

407

Databases with R

RH2

DBI/RJDBC Interface to H2 Database

DBI/RJDBC interface to h2 database. h2 version 1.3.175 is included.

408

Databases with R

RJDBC

Provides Access to Databases Through the JDBC Interface

The RJDBC package is an implementation of R’s DBI interface using JDBC as a backend. This allows R to connect to any DBMS that has a JDBC driver.

409

Databases with R

RMariaDB

Database Interface and ‘MariaDB’ Driver

Implements a ‘DBI’-compliant interface to ‘MariaDB’ (<https://mariadb.org/>) and ‘MySQL’ (<https://www.mysql.com/>) databases.

410

Databases with R

RMySQL

Database Interface and ‘MySQL’ Driver for R

Legacy ‘DBI’ interface to ‘MySQL’ / ‘MariaDB’ based on old code ported from S-PLUS. A modern ‘MySQL’ client based on ‘Rcpp’ is available from the ‘RMariaDB’ package.

411

Databases with R

RODBC

ODBC Database Access

An ODBC database interface.

412

Databases with R

ROracle

OCI Based Oracle Database Interface for R

Oracle Database interface (DBI) driver for R. This is a DBI-compliant Oracle driver based on the OCI.

413

Databases with R

rpostgis

R Interface to a ‘PostGIS’ Database

Provides an interface between R and ‘PostGIS’-enabled ‘PostgreSQL’ databases to transparently transfer spatial data. Both vector (points, lines, polygons) and raster data are supported in read and write modes. Also provides convenience functions to execute common procedures in ‘PostgreSQL/PostGIS’.

414

Databases with R

RPostgres

‘Rcpp’ Interface to ‘PostgreSQL’

Fully ‘DBI’-compliant ‘Rcpp’-backed interface to ‘PostgreSQL’ <https://www.postgresql.org/>, an open-source relational database.

415

Databases with R

RPostgreSQL

R Interface to the ‘PostgreSQL’ Database System

Database interface and ‘PostgreSQL’ driver for ‘R’. This package provides a Database Interface (‘DBI’) compliant driver for ‘R’ to access ‘PostgreSQL’ database systems. In order to build and install this package from source, ‘PostgreSQL’ itself must be present on your system to provide ‘PostgreSQL’ functionality via its libraries and header files. These files are provided as the ‘postgresql-devel’ package under some Linux distributions. On ‘macOS’ and ‘Microsoft Windows’ systems the attached ‘libpq’ library source will be used.

416

Databases with R

RPresto

DBI Connector to Presto

Implements a ‘DBI’ compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes: <https://prestodb.io/>.

417

Databases with R

RSQLite

‘SQLite’ Interface for R

Embeds the ‘SQLite’ database engine in R and provides an interface compliant with the ‘DBI’ package. The source for the ‘SQLite’ engine is included.
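A hedged sketch (‘DBI’ and ‘RSQLite’ assumed installed) of the standard DBI workflow against an in-memory SQLite database:

```r
# Minimal sketch, assuming the DBI and RSQLite packages are installed.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # in-memory database
dbWriteTable(con, "mtcars", mtcars)              # copy a data frame into a table
res <- dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")
print(res)                                       # one row per cylinder count
dbDisconnect(con)                                # release the connection
```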

418

Databases with R

sqldf

Manipulate R Data Frames Using SQL

The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. ‘RSQLite’, ‘RH2’, ‘RMySQL’ and ‘RPostgreSQL’ backends are supported.
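For illustration, a hedged sketch (package assumed installed) of querying a data frame directly with sqldf(), which by default uses the ‘RSQLite’ backend:

```r
# Minimal sketch, assuming the sqldf package is installed.
library(sqldf)

# The data frame name is used directly as the table name in the SQL statement.
big_cars <- sqldf("SELECT * FROM mtcars WHERE cyl = 8 ORDER BY mpg DESC")
nrow(big_cars)  # number of 8-cylinder rows in mtcars
```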

419

Databases with R

TScompare

‘TSdbi’ Database Comparison

Utilities for comparing the equality of series on two databases. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.

420

Databases with R

uptasticsearch

Get Data Frame Representations of ‘Elasticsearch’ Results

‘Elasticsearch’ is an opensource, distributed, documentbased datastore (<https://www.elastic.co/products/elasticsearch>). It provides an ‘HTTP’ ‘API’ for querying the database and extracting datasets, but that ‘API’ was not designed for common data science workflows like pulling large batches of records and normalizing those documents into a data frame that can be used as a training dataset for statistical models. ‘uptasticsearch’ provides an interface for ‘Elasticsearch’ that is explicitly designed to make these data science workflows easy and fun.

421

Differential Equations

adaptivetau

TauLeaping Stochastic Simulation

Implements adaptive tau leaping to approximate the trajectory of a continuous-time stochastic process as described by Cao et al. (2007) The Journal of Chemical Physics <doi:10.1063/1.2745299> (a.k.a. the Gillespie stochastic simulation algorithm). This package is based upon work supported by NSF DBI-0906041 and NIH K99-GM104158 to Philip Johnson and NIH R01-AI049334 to Rustom Antia.

422

Differential Equations

bvpSolve (core)

Solvers for Boundary Value Problems of Differential Equations

Functions that solve boundary value problems (‘BVP’) of systems of ordinary differential equations (‘ODE’) and differential algebraic equations (‘DAE’). The functions provide an interface to the FORTRAN functions ‘twpbvpC’, ‘colnew/colsys’, and an R implementation of the shooting method.

423

Differential Equations

cOde

Automated C Code Generation for ‘deSolve’, ‘bvpSolve’

Generates all necessary C functions allowing the user to work with the compiled-code interface of ode() and bvptwp(). The implementation supports “forcings” and “events”. Also provides functions to symbolically compute Jacobians, sensitivity equations and adjoint sensitivities being the basis for sensitivity analysis.

424

Differential Equations

CollocInfer

Collocation Inference for Dynamic Systems

These functions implement collocation inference for continuous-time and discrete-time stochastic processes. They provide model-based smoothing, gradient-matching, generalized profiling and forward prediction error methods.

425

Differential Equations

dde

Solve Delay Differential Equations

Solves ordinary and delay differential equations, where the objective function is written in either R or C. Suitable only for non-stiff equations, the solver uses a ‘Dormand-Prince’ method that allows interpolation of the solution at any point. This approach is as described by Hairer, Norsett and Wanner (1993) <ISBN:3540604529>. Support is also included for iterating difference equations.

426

Differential Equations

deSolve (core)

Solvers for Initial Value Problems of Differential Equations (‘ODE’, ‘DAE’, ‘DDE’)

Functions that solve initial value problems of a system of first-order ordinary differential equations (‘ODE’), of partial differential equations (‘PDE’), of differential algebraic equations (‘DAE’), and of delay differential equations. The functions provide an interface to the FORTRAN functions ‘lsoda’, ‘lsodar’, ‘lsode’, ‘lsodes’ of the ‘ODEPACK’ collection, to the FORTRAN functions ‘dvode’, ‘zvode’ and ‘daspk’, and a C implementation of solvers of the ‘Runge-Kutta’ family with fixed or variable time steps. The package contains routines designed for solving ‘ODEs’ resulting from 1-D, 2-D and 3-D partial differential equations (‘PDE’) that have been converted to ‘ODEs’ by numerical differencing.
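A hedged sketch (package assumed installed) of solving the logistic growth ODE dN/dt = r*N*(1 - N/K) with deSolve::ode():

```r
# Minimal sketch, assuming the deSolve package is installed.
library(deSolve)

logistic <- function(t, y, parms) {
  # The derivative function must return a list whose first element
  # holds the derivatives, in the same order as the state vector y.
  with(as.list(c(y, parms)), list(r * N * (1 - N / K)))
}

out <- ode(y = c(N = 1), times = seq(0, 20, by = 0.5),
           func = logistic, parms = c(r = 0.5, K = 100))
tail(out)  # N approaches the carrying capacity K = 100
```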

427

Differential Equations

deTestSet

Test Set for Differential Equations

Solvers and test set for stiff and non-stiff differential equations, and differential algebraic equations.

428

Differential Equations

diffeqr

Solving Differential Equations (ODEs, SDEs, DDEs, DAEs)

An interface to ‘DifferentialEquations.jl’ <http://docs.juliadiffeq.org/latest/> from the R programming language. It has unique high performance methods for solving ordinary differential equations (ODE), stochastic differential equations (SDE), delay differential equations (DDE), differential-algebraic equations (DAE), and more. Much of the functionality, including features like adaptive time stepping in SDEs, is unique and allows for multiple orders of magnitude speedup over more common methods. ‘diffeqr’ attaches an R interface onto the package, allowing seamless use of this tooling by R users.

429

Differential Equations

dMod

Dynamic Modeling and Parameter Estimation in ODE Models

The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives.

430

Differential Equations

ecolMod

“A practical guide to ecological modelling - using R as a simulation platform”

Figures, data sets and examples from the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter MJ Herman (2009). Springer. All figures from chapter x can be generated by “demo(chapx)”, where x = 1 to 11. The R scripts of the model examples discussed in the book are in subdirectory “examples”, ordered per chapter. Solutions to model projects are in the same subdirectories.

431

Differential Equations

FME

A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis

Provides functions to help in fitting models to data, and to perform Monte Carlo, sensitivity and identifiability analyses. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package ‘deSolve’, or a steady-state solver from package ‘rootSolve’. However, the methods can also be used with other types of functions.

432

Differential Equations

GillespieSSA

Gillespie’s Stochastic Simulation Algorithm (SSA)

Provides a simple-to-use, intuitive, and extensible interface to several stochastic simulation algorithms for generating simulated trajectories of finite-population continuous-time models. Currently it implements Gillespie’s exact stochastic simulation algorithm (Direct method) and several approximate methods (Explicit tau-leap, Binomial tau-leap, and Optimized tau-leap). The package also contains a library of template models that can be run as demo models and can easily be customized and extended. Currently the following models are included: ‘Decaying-Dimerization’ reaction set, linear chain system, logistic growth model, ‘Lotka’ predator-prey model, Rosenzweig-MacArthur predator-prey model, ‘Kermack-McKendrick’ SIR model, and a ‘metapopulation’ SIRS model. Pineda-Krch et al. (2008) <doi:10.18637/jss.v025.i12>.

433

Differential Equations

mkin

Kinetic Evaluation of Chemical Degradation Data

Calculation routines based on the FOCUS Kinetics Report (2006, 2014). Includes a function for conveniently defining differential equation models, model solution based on eigenvalues if possible or using numerical solvers, and a choice of the optimisation methods made available by the ‘FME’ package. If a C compiler (on Windows: ‘Rtools’) is installed, differential equation models are solved using compiled C functions. Please note that no warranty is implied for correctness of results or fitness for a particular purpose.

434

Differential Equations

nlmeODE

Nonlinear mixed-effects modelling in nlme using differential equations

This package combines the odesolve and nlme packages for mixedeffects modelling using differential equations.

435

Differential Equations

odeintr

C++ ODE Solvers Compiled on-Demand

Wraps the Boost odeint library for integration of differential equations.

436

Differential Equations

PBSddesolve

Solver for Delay Differential Equations

Routines for solving systems of delay differential equations by interfacing numerical routines written by Simon N. Wood, with contributions by Benjamin J. Cairns. These numerical routines first appeared in Simon Wood’s ‘solv95’ program. This package includes a vignette and a complete user’s guide. ‘PBSddesolve’ originally appeared on CRAN under the name ‘ddesolve’. That version is no longer supported. The current name emphasizes a close association with other PBS packages, particularly ‘PBSmodelling’.

437

Differential Equations

PBSmodelling

GUI Tools Made Easy: Interact with Models and Explore Data

Provides software to facilitate the design, testing, and operation of computer models. It focuses particularly on tools that make it easy to construct and edit a customized graphical user interface (‘GUI’). Although our simplified ‘GUI’ language depends heavily on the R interface to the ‘Tcl/Tk’ package, a user does not need to know ‘Tcl/Tk’. Examples illustrate models built with other R packages, including ‘PBSmapping’, ‘PBSddesolve’, and ‘BRugs’. A complete user’s guide ‘PBSmodellingUG.pdf’ shows how to use this package effectively.

438

Differential Equations

phaseR

Phase Plane Analysis of One and Two Dimensional Autonomous ODE Systems

Performs a qualitative analysis of one- and two-dimensional autonomous ODE systems, using phase plane methods. Programs are available to identify and classify equilibrium points, plot the direction field, and plot trajectories for multiple initial conditions. In the one-dimensional case, a program is also available to plot the phase portrait, whilst in the two-dimensional case, programs are additionally available to plot nullclines and the stable/unstable manifolds of saddle points. Many example systems are provided for the user.

439

Differential Equations

pomp

Statistical Inference for Partially Observed Markov Processes

Tools for data analysis with partially observed Markov process (POMP) models (also known as stochastic dynamical systems, hidden Markov models, and nonlinear, non-Gaussian, state-space models). The package provides facilities for implementing POMP models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. It is also a versatile platform for implementation of inference methods for general POMP models.

440

Differential Equations

pracma

Practical Numerical Math Functions

Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses ‘MATLAB’ function names where appropriate to simplify porting.
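A short sketch of the MATLAB-flavoured naming (the specific function choices are illustrative, not exhaustive):

```r
library(pracma)

x <- linspace(0, pi, 101)            # MATLAB-style linspace
trapz(x, sin(x))                     # trapezoidal integration of sin on [0, pi], close to 2
fzero(function(x) cos(x) - x, 1)$x   # root of cos(x) = x, about 0.739
```

Users porting MATLAB scripts can often keep function names unchanged and adjust only indexing and assignment syntax.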

441

Differential Equations

primer

Functions and data for A Primer of Ecology with R

Functions are primarily for systems of ordinary differential equations, difference equations, and eigenanalysis and projection of demographic matrices; data sets are for examples.

442

Differential Equations

QPot

Quasi-Potential Analysis for Stochastic Differential Equations

Tools to 1) simulate and visualize stochastic differential equations and 2) determine stability of equilibria using the ordered-upwind method to compute the quasi-potential.

443

Differential Equations

ReacTran

Reactive Transport Modelling in 1d, 2d and 3d

Routines for developing models that describe reaction and advective-diffusive transport in one, two or three dimensions. Includes transport routines in porous media, in estuaries, and in bodies with variable shape.

444

Differential Equations

rODE

Ordinary Differential Equation (ODE) Solvers Written in R Using S4 Classes

Show physics, math and engineering students how an ODE solver is made and how effective R classes can be for the construction of the equations that describe natural phenomena. Inspiration for this work comes from the book on “Computer Simulations in Physics” by Harvey Gould, Jan Tobochnik, and Wolfgang Christian. Book link: <http://www.compadre.org/osp/items/detail.cfm?ID=7375>.

445

Differential Equations

rodeo

A Code Generator for ODE-Based Models

Provides an R6 class and several utility methods to facilitate the implementation of models based on ordinary differential equations. The heart of the package is a code generator that creates compiled ‘Fortran’ (or ‘R’) code which can be passed to a numerical solver. There is direct support for solvers contained in packages ‘deSolve’ and ‘rootSolve’.

446

Differential Equations

rootSolve (core)

Nonlinear Root Finding, Equilibrium and SteadyState Analysis of Ordinary Differential Equations

Routines to find the root of nonlinear functions, and to perform steady-state and equilibrium analysis of ordinary differential equations (ODE). Includes routines that: (1) generate gradient and Jacobian matrices (full and banded), (2) find roots of nonlinear equations by the ‘Newton-Raphson’ method, (3) estimate steady-state conditions of a system of (differential) equations in full, banded or sparse form, using the ‘Newton-Raphson’ method, or by dynamically running, (4) solve the steady-state conditions for uni- and multi-component 1-D, 2-D, and 3-D partial differential equations, that have been converted to ordinary differential equations by numerical differencing (using the method-of-lines approach). Includes Fortran code.
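A minimal sketch of Newton-Raphson root finding with multiroot() on a small nonlinear system (the system itself is arbitrary, chosen for illustration):

```r
library(rootSolve)

# Solve x^2 + y^2 = 1 together with x = y.
model <- function(x) c(F1 = x[1]^2 + x[2]^2 - 1,
                       F2 = x[1] - x[2])

sol <- multiroot(f = model, start = c(1, 1))
sol$root    # approximately c(0.7071, 0.7071)
sol$f.root  # residuals at the root, close to zero
```

The same function passed to deSolve-style solvers can be handed to steady() or runsteady() to obtain steady-state conditions instead of time courses.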

447

Differential Equations

scaRabee

Optimization Toolkit for Pharmacokinetic-Pharmacodynamic Models

scaRabee is a port of the Scarabee toolkit originally written as a Matlab-based application. It provides a framework for simulation and optimization of pharmacokinetic-pharmacodynamic models at the individual and population level. It is built on top of the neldermead package, which provides the direct search algorithm proposed by Nelder and Mead for model optimization.

448

Differential Equations

sde (core)

Simulation and Inference for Stochastic Differential Equations

Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 9780387758381, Springer, NY.

449

Differential Equations

Sim.DiffProc

Simulation of Diffusion Processes

Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both Ito and Stratonovich forms. Offers statistical analysis of SDEs via parallel Monte Carlo and moment-equation methods. These tools have enabled researchers in different domains to use such equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulation of the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1562523422.

450

Differential Equations

simecol

Simulation of Ecological (and Other) Dynamic Systems

An object-oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and reusability of code.

451

Differential Equations

sundialr

An Interface to ‘SUNDIALS’ Ordinary Differential Equation (ODE) Solvers

Provides a way to call the functions in the ‘SUNDIALS’ C ODE solving library (<https://computation.llnl.gov/projects/sundials>). Currently the serial version of the ODE solver, ‘CVODE’, and the sensitivity calculator ‘CVODES’ from the ‘SUNDIALS’ library are implemented. The package requires the ODE to be written as an ‘R’ or ‘Rcpp’ function and does not require the ‘SUNDIALS’ library to be installed on the local machine.

452

Probability Distributions

actuar (core)

Actuarial Functions and Heavy Tailed Distributions

Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss amounts and loss frequency: 19 continuous heavy tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities.
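A brief sketch of the d/p/q/r-style families the package adds (parameter values are arbitrary; this assumes the Pareto and zero-truncated Poisson families provided by actuar):

```r
library(actuar)

# Pareto loss severities: density/distribution/quantile/random generation
x <- rpareto(1000, shape = 3, scale = 2000)
ppareto(5000, shape = 3, scale = 2000)  # P(X <= 5000)

# zero-truncated Poisson claim frequencies
dztpois(1:3, lambda = 2)                # probability mass at 1, 2, 3 claims
```

These functions follow the same naming convention as base R’s dnorm/pnorm/qnorm/rnorm, so they slot directly into fitting routines that expect that interface.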

453

Probability Distributions

AdMit

Adaptive Mixture of Student-t Distributions

Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.

454

Probability Distributions

agricolae

Statistical Procedures for Agricultural Research

The original idea was presented in the thesis “A statistical analysis tool for agricultural research” to obtain the degree of Master of Science, National Engineering University (UNI), Lima-Peru. Some experimental data for the examples come from the CIP and other research. Agricolae offers extensive functionality on experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Square, augmented block, factorial, split and strip plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures and several non-parametric test comparisons, biodiversity indexes and consensus clustering.

455

Probability Distributions

ald

The Asymmetric Laplace Distribution

Provides the density, distribution function, quantile function, random number generator, likelihood function, moments and maximum likelihood estimators for a given sample, all for the three-parameter Asymmetric Laplace Distribution defined in Koenker and Machado (1999). This is a special case of the skewed family of distributions available in Galarza et al. (2017) <doi:10.1002/sta4.140> useful for quantile regression.

456

Probability Distributions

AtelieR

A GTK GUI for teaching basic concepts in statistical inference, and doing elementary Bayesian tests

A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (Central Limit Theorem and the Normal Distribution, distribution of a sample mean, distribution of a sample variance, probability calculator for common distributions), and II. Elementary Bayesian Statistics (Bayesian inference on proportions, contingency tables, means and variances, with informative and non-informative priors).

457

Probability Distributions

bayesm

Bayesian Inference for Marketing/MicroEconometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

458

Probability Distributions

benchden

28 benchmark densities from Berlinet/Devroye (1994)

Full implementation of the 28 distributions introduced as benchmarks for nonparametric density estimation by Berlinet and Devroye (1994). Includes densities, cdfs, quantile functions and generators for samples as well as additional information on features of the densities. Also contains the 4 histogram densities used in Rozenholc/Mildenberger/Gather (2010).

459

Probability Distributions

BiasedUrn

Biased Urn Model Distributions

Statistical models of biased sampling in the form of univariate and multivariate noncentral hypergeometric distributions, including Wallenius’ noncentral hypergeometric distribution and Fisher’s noncentral hypergeometric distribution (also called extended hypergeometric distribution). See vignette(“UrnTheory”) for explanation of these distributions.

460

Probability Distributions

bivariate

Bivariate Probability Distributions

Contains convenience functions for constructing, plotting and evaluating bivariate probability distributions, including their probability mass functions, probability density functions and cumulative distribution functions. Supports uniform (discrete and continuous), binomial, Poisson, categorical, normal, bimodal and Dirichlet (trivariate) distributions, and kernel smoothing and empirical cumulative distribution functions.

461

Probability Distributions

Bivariate.Pareto

Bivariate Pareto Models

Perform competing risks analysis under bivariate Pareto models. See Shih et al. (2018) <doi:10.1080/03610926.2018.1425450> for details.

462

Probability Distributions

BivarP

Estimating the Parameters of Some Bivariate Distributions

Parameter estimation for bivariate distribution functions modeled as an Archimedean copula function. The input data may contain right-censored values. The marginal distributions used are two-parameter distributions. Methods for density, distribution, survival, and random sample generation are provided.

463

Probability Distributions

bivgeom

Roy’s Bivariate Geometric Distribution

Implements Roy’s bivariate geometric model (Roy (1993) <doi:10.1006/jmva.1993.1065>): joint probability mass function, distribution function, survival function, random generation, parameter estimation, and more.

464

Probability Distributions

bmixture

Bayesian Estimation for Finite Mixture of Distributions

Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t distributions. The package implements recent improvements in the Bayesian literature for the finite mixture of distributions, including Mohammadi et al. (2013) <doi:10.1007/s0018001203233> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.

465

Probability Distributions

BMT

The BMT Distribution

Density, distribution, quantile function, random number generation for the BMT (Bezier-Montenegro-Torres) distribution. Torres-Jimenez C.J. and Montenegro-Diaz A.M. (2017) <arXiv:1709.05534>. Moments, descriptive measures and parameter conversion for different parameterizations of the BMT distribution. Fit of the BMT distribution to non-censored data by maximum likelihood, moment matching, quantile matching, maximum goodness-of-fit, also known as minimum distance, maximum product of spacing, also called maximum spacing, and minimum quantile distance, which can also be called maximum quantile goodness-of-fit. Fit of univariate distributions for non-censored data using maximum product of spacing estimation and minimum quantile distance estimation is also included.

466

Probability Distributions

bridgedist

An Implementation of the Bridge Distribution with Logit-Link as in Wang and Louis (2003)

An implementation of the bridge distribution with logit-link in R. In Wang and Louis (2003) <doi:10.1093/biomet/90.4.765>, such a univariate bridge distribution was derived as the distribution of the random intercept that ‘bridged’ a marginal logistic regression and a conditional logistic regression. The conditional and marginal regression coefficients are a scalar multiple of each other. Such is not the case if the random intercept distribution is Gaussian.

467

Probability Distributions

cbinom

Continuous Analog of a Binomial Distribution

Implementation of the d/p/q/r family of functions for a continuous analog to the standard discrete binomial with continuous size parameter and continuous support with x in [0, size + 1], following Ilienko (2013) <arXiv:1303.5990>.

468

Probability Distributions

CDVine

Statistical Inference of C- and D-Vine Copulas

Functions for statistical inference of canonical vine (C-vine) and D-vine copulas. Tools for bivariate exploratory data analysis and for bivariate as well as vine copula selection are provided. Models can be estimated either sequentially or by joint maximum likelihood estimation. Sampling algorithms and plotting methods are also included. Data is assumed to lie in the unit hypercube (so-called copula data).

469

Probability Distributions

CircStats

Circular Statistics, from “Topics in Circular Statistics” (2001)

Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.

470

Probability Distributions

circular

Circular Statistics

Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.

471

Probability Distributions

cmvnorm

The Complex Multivariate Gaussian Distribution

Various utilities for the complex multivariate Gaussian distribution.

472

Probability Distributions

coga

Convolution of Gamma Distributions

Evaluates the density and distribution function of the convolution of gamma distributions in R. Two related exact methods and one approximate method are implemented with efficient algorithms and C++ code. A quick guide for choosing the correct method and usage of this package is given in the package vignette.

473

Probability Distributions

CompGLM

Conway-Maxwell-Poisson GLM and Distribution Functions

A function (which uses a similar interface to the ‘glm’ function) for the fitting of a Conway-Maxwell-Poisson GLM. There are also various methods for analysis of the model fit. The package also contains functions for the Conway-Maxwell-Poisson distribution in a similar interface to functions ‘dpois’, ‘ppois’ and ‘rpois’. The functions are generally quick, since the workhorse functions are written in C++ (thanks to the Rcpp package).

474

Probability Distributions

CompLognormal

Functions for actuarial scientists

Computes the probability density function, cumulative distribution function, quantile function, and random numbers of any composite model based on the lognormal distribution.

475

Probability Distributions

compoisson

Conway-Maxwell-Poisson Distribution

Provides routines for density and moments of the Conway-Maxwell-Poisson distribution as well as functions for fitting the COM-Poisson model for over/under-dispersed count data.

476

Probability Distributions

Compositional

Compositional Data Analysis

Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. The standard textbook for such data is John Aitchison’s (1986) “The statistical analysis of compositional data”. Relevant papers include: a) Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. Fourth International Workshop on Compositional Data Analysis. b) Tsagris M. (2014). The k-NN algorithm for compositional data: a revised approach with and without zero values present. Journal of Data Science, 12(3): 519-534. c) Tsagris M. (2015). Regression analysis with compositional data containing zero values. Chilean Journal of Statistics, 6(2): 47-57. d) Tsagris M., Preston S. and Wood A.T.A. (2016). Improved supervised classification for compositional data using the alpha-transformation. Journal of Classification, 33(2): 243-261. <doi:10.1007/s0035701692075>. e) Tsagris M., Preston S. and Wood A.T.A. (2017). Nonparametric hypothesis testing for equality of means on the simplex. Journal of Statistical Computation and Simulation, 87(2): 406-422. <doi:10.1080/00949655.2016.1216554>. f) Tsagris M. and Stewart C. (2018). A Dirichlet regression model for compositional data with zeros. Lobachevskii Journal of Mathematics, 39(3): 398-412. <doi:10.1134/S1995080218030198>. g) Alenazi A. (2019). Regression for compositional data with compositional data as predictor variables with or without zero values. Journal of Data Science, 17(1): 219-238. <doi:10.6339/JDS.201901_17(1).0010>. Further, we include functions for percentages (or proportions).

477

Probability Distributions

compositions

Compositional Data Analysis

Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.

478

Probability Distributions

Compounding

Computing Continuous Distributions

Computes continuous distributions obtained by compounding a continuous and a discrete distribution.

479

Probability Distributions

CompQuadForm

Distribution Function of Quadratic Forms in Normal Variables

Computes the distribution function of quadratic forms in normal variables using Imhof’s method, Davies’s algorithm, Farebrother’s algorithm or Liu et al.’s algorithm.

480

Probability Distributions

condMVNorm

Conditional Multivariate Normal Distribution

Computes conditional multivariate normal probabilities, random deviates and densities.

481

Probability Distributions

copBasic

General Bivariate Copula Theory and Many Utility Functions

Extensive functions for bivariate copula (bicopula) computations and related operations for bicopula theory. The lower, upper, product, and select other bicopula are implemented along with operations including the diagonal, survival copula, dual of a copula, cocopula, and numerical bicopula density. Level sets, horizontal and vertical sections are supported. Numerical derivatives and inverses of a bicopula are provided through which simulation is implemented. Bicopula composition, convex combination, and products also are provided. Support extends to the Kendall Function as well as the Lmoments thereof. Kendall Tau, Spearman Rho and Footrule, Gini Gamma, Blomqvist Beta, Hoeffding Phi, Schweizer Wolff Sigma, tail dependency, tail order, skewness, and bivariate Lmoments are implemented, and positive/negative quadrant dependency, left (right) increasing (decreasing) are available. Other features include KullbackLeibler divergence, Vuong procedure, spectral measure, and Lcomoments for inference, maximum likelihood, and AIC, BIC, and RMSE for goodnessoffit.

482

Probability Distributions

copula (core)

Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extremevalue and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extremevalue dependence, goodnessoffit) and model selection based on crossvalidation. Empirical copula, smoothed versions, and nonparametric estimators of the Pickands dependence function.
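A minimal sketch of the typical simulate-then-fit workflow (the copula family and parameter are arbitrary, chosen for illustration):

```r
library(copula)

cc <- claytonCopula(param = 2, dim = 2)
u  <- rCopula(500, cc)          # sample from the copula
cor(u, method = "kendall")      # dependence induced by the copula

# fit a Clayton copula to pseudo-observations by maximum pseudo-likelihood
fit <- fitCopula(claytonCopula(dim = 2), pobs(u), method = "mpl")
coef(fit)                       # estimated parameter, near the true value 2
```

Converting raw data to pseudo-observations with pobs() before fitting is the standard route when the margins are unknown.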

483

Probability Distributions

csn

Closed SkewNormal Distribution

Provides functions for computing the density and the log-likelihood function of closed skew-normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42.

484

Probability Distributions

Davies

The Davies Quantile Function

Various utilities for the Davies distribution.

485

Probability Distributions

degreenet

Models for Skewed Count Distributions Relevant to Networks

Likelihoodbased inference for skewed count distributions used in network modeling. “degreenet” is a part of the “statnet” suite of packages for network analysis.

486

Probability Distributions

Delaporte

Statistical Functions for the Delaporte Distribution

Provides probability mass, distribution, quantile, random-variate generation, and method-of-moments parameter-estimation functions for the Delaporte distribution. The Delaporte is a discrete probability distribution which can be considered the convolution of a negative binomial distribution with a Poisson distribution. Alternatively, it can be considered a counting distribution with both Poisson and negative binomial components. It has been studied in actuarial science as a frequency distribution which has more variability than the Poisson, but less than the negative binomial.

487

Probability Distributions

dirmult

Estimation in Dirichlet-Multinomial distribution

Estimate parameters in the Dirichlet-Multinomial distribution and compute profile log-likelihoods.

488

Probability Distributions

disclap

Discrete Laplace Exponential Family

Discrete Laplace exponential family for models such as a generalized linear model.

489

Probability Distributions

DiscreteInverseWeibull

Discrete Inverse Weibull Distribution

Probability mass function, distribution function, quantile function, random generation and parameter estimation for the discrete inverse Weibull distribution.

490

Probability Distributions

DiscreteLaplace

Discrete Laplace Distributions

Probability mass function, distribution function, quantile function, random generation and estimation for the skew discrete Laplace distributions.

491

Probability Distributions

DiscreteWeibull

Discrete Weibull Distributions (Type 1 and 3)

Probability mass function, distribution function, quantile function, random generation and parameter estimation for the type I and III discrete Weibull distributions.

492

Probability Distributions

distcrete

Discrete Distribution Approximations

Creates discretised versions of continuous distribution functions by mapping continuous values to an underlying discrete grid, based on a (uniform) frequency of discretisation, a valid discretisation point, and an integration range. For a review of discretisation methods, see Chakraborty (2015) <doi:10.1186/s4048801500286>.

493

Probability Distributions

distr (core)

Object Oriented Implementation of Distributions

S4 classes and methods for distributions.
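A short sketch of the object-oriented style: distributions are S4 objects that support arithmetic (via convolution) and expose their d/p/q/r functions as accessors (example values are illustrative):

```r
library(distr)

N <- Norm(mean = 1, sd = 2)   # an S4 distribution object
D <- N + N                    # sum of two independent copies, via operator overloading

d(D)(0)     # density of D at 0
p(D)(2)     # P(D <= 2); D is Norm(mean = 2, sd = sqrt(8))
r(D)(5)     # five random draws from D
```

The same accessor pattern applies to any distribution object in the distr family, which is what lets downstream packages such as ‘distrEx’ and ‘distrMod’ operate generically.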

494

Probability Distributions

distrDoc

Documentation for ‘distr’ Family of R Packages

Provides documentation in form of a common vignette to packages ‘distr’, ‘distrEx’, ‘distrMod’, ‘distrSim’, ‘distrTEst’, ‘distrTeach’, and ‘distrEllipse’.

495

Probability Distributions

distrEllipse

S4 Classes for Elliptically Contoured Distributions

Distribution (S4)classes for elliptically contoured distributions (based on package ‘distr’).

496

Probability Distributions

distrEx

Extensions of Package ‘distr’

Extends package ‘distr’ by functionals, distances, and conditional distributions.

497

Probability Distributions

DistributionUtils

Distribution Utilities

Utilities are provided which are of use in the packages I have developed for dealing with distributions. Currently these packages are GeneralizedHyperbolic, VarianceGamma, SkewHyperbolic, and NormalLaplace. Each of these packages requires DistributionUtils. Functionality includes sample skewness and kurtosis, log-histogram, tail plots, moments by integration, changing the point about which a moment is calculated, functions for testing distributions using inversion tests and the Massart inequality. Also includes an implementation of the incomplete Bessel K function.

498

Probability Distributions

distrMod

Object Oriented Implementation of Probability Models

Implements S4 classes for probability models based on packages ‘distr’ and ‘distrEx’.

499

Probability Distributions

distrSim

Simulation Classes Based on Package ‘distr’

S4 classes for setting up a coherent framework for simulation within the ‘distr’ family of packages.

500

Probability Distributions

distrTeach

Extensions of Package ‘distr’ for Teaching Stochastics/Statistics in Secondary School

Provides flexible examples of LLN and CLT for teaching purposes in secondary school.

501

Probability Distributions

distrTEst

Estimation and Testing Classes Based on Package ‘distr’

Evaluation (S4) classes based on package ‘distr’ for evaluating procedures (estimators/tests) on data/simulations in a unified way.

502

Probability Distributions

dng

Distributions and Gradients

Provides density, distribution function, quantile function and random generation for the split-normal and split-t distributions, and computes their mean, variance, skewness and kurtosis for the two distributions (Li, F., Villani, M. and Kohn, R. (2010) <doi:10.1016/j.jspi.2010.04.031>).

503

Probability Distributions

dqrng

Fast Pseudo Random Number Generators

Several fast random number generators are provided as header-only C++ libraries: the PCG family by O’Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as Xoroshiro128+ and Xoshiro256+ by Blackman and Vigna (2018 <arXiv:1805.01407>). In addition, fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). These functions are exported to R and as a C++ interface and are enabled for use with the default 64-bit generator from the PCG family, Xoroshiro128+ and Xoshiro256+ as well as the 64-bit version of the 20-round Threefry engine (Salmon et al., 2011 <doi:10.1145/2063384.2063405>) as provided by the package ‘sitmo’.

504

Probability Distributions

e1071

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …

505

Probability Distributions

ecd

Elliptic Lambda Distribution and Option Pricing Model

The elliptic lambda distribution and lambda option pricing model have evolved into a framework of stable-law-inspired distributions, such as the extended stable lambda distribution for asset returns, the stable count distribution for volatility, and the Lihn-Laplace process as a leptokurtic extension of the Wiener process. This package contains functions for the computation of density, probability, quantile, random variates, fitting procedures, option prices and volatility smiles. It also comes with sample financial data and plotting routines.

506

Probability Distributions

Emcdf

Computation and Visualization of Empirical Joint Distribution (Empirical Joint CDF)

Computes and visualizes empirical joint distribution of multivariate data with optimized algorithms and multithread computation. There is a faster algorithm using dynamic programming to compute the whole empirical joint distribution of a bivariate data. There are optimized algorithms for computing empirical joint CDF function values for other multivariate data. Visualization is focused on bivariate data. Levelplots and wireframes are included.
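The quantity being computed is, for bivariate data, the empirical joint CDF H(x, y) = (1/n) #{i : X_i <= x, Y_i <= y}. A naive O(n)-per-query Python sketch of the definition (the package's contribution is computing this efficiently, e.g. via dynamic programming and multi-thread computation):

```python
def emp_joint_cdf(data, x, y):
    """Empirical joint CDF: fraction of observations with X_i <= x and Y_i <= y."""
    n = len(data)
    return sum(1 for (xi, yi) in data if xi <= x and yi <= y) / n

pts = [(1, 2), (2, 1), (3, 3), (0, 0)]
```

Evaluating H on a full grid with this naive approach costs O(n) per grid point, which is what dynamic-programming algorithms avoid.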

507

Probability Distributions

emdbook

Support Functions and Data for “Ecological Models and Data”

Auxiliary functions and data sets for “Ecological Models and Data”, a book presenting maximum likelihood estimation and related topics for ecologists (ISBN 9780691125220).

508

Probability Distributions

emg

Exponentially Modified Gaussian (EMG) Distribution

Provides basic distribution functions for the exponentially modified Gaussian (EMG) distribution, the convolution of a Gaussian and an exponential random variable.

509

Probability Distributions

empichar

Evaluates the Empirical Characteristic Function for Multivariate Samples

Evaluates the empirical characteristic function of univariate and multivariate samples. This package uses ‘RcppArmadillo’ for fast evaluation. It is also possible to export the code to be used in other packages at ‘C++’ level.
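The quantity being evaluated is phi_n(t) = (1/n) * sum_j exp(i t x_j). A univariate Python sketch of the definition (the package vectorises this for multivariate samples at the C++ level):

```python
import cmath

def ecf(sample, t):
    """Empirical characteristic function of a univariate sample at point t."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)

vals = [0.2, -1.0, 0.5, 1.3]
```

By construction phi_n(0) = 1, and |phi_n(t)| <= 1 for all t, since it averages unit-modulus complex numbers.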

510

Probability Distributions

EnvStats

Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with a focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and the environmental statistics literature. Includes scripts reproducing analyses presented in the book “EnvStats: An R Package for Environmental Statistics” (Millard, 2013, Springer, ISBN 9781461484554, <http://www.springer.com/book/9781461484554>).

511

Probability Distributions

evd

Functions for Extreme Value Distributions

Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.

512

Probability Distributions

evdbayes

Bayesian Analysis in Extreme Value Theory

Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.

513

Probability Distributions

evir

Extreme Values in R

Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, GEV/GPD distributions.

514

Probability Distributions

evmix

Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

The usual distribution functions, maximum likelihood inference and model diagnostics for univariate stationary extreme value mixture models are provided. Kernel density estimation including various boundary-corrected kernel density estimation methods and a wide choice of kernels, with a cross-validation likelihood-based bandwidth estimator. Reasonable consistency with the base functions in the ‘evd’ package is provided, so that users can safely interchange most code.

515

Probability Distributions

extraDistr

Additional Univariate and Multivariate Distributions

Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.
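As one concrete example from the list, the zero-inflated Poisson mixes a point mass at zero with a Poisson(lambda) component. A Python sketch of its pmf, illustrating the construction rather than the package's d/p/q/r interface:

```python
import math

def dzip(k, lam, pi):
    """Zero-inflated Poisson pmf: point mass pi at zero mixed with Poisson(lam)."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * pois

# mass sums to one over the (effectively finite) support
total = sum(dzip(k, lam=2.0, pi=0.3) for k in range(50))
```

The extra mass at zero is what distinguishes the zero-inflated model from a plain Poisson with the same rate.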

516

Probability Distributions

extremefit

Estimation of Extreme Conditional Quantiles and Probabilities

Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.

517

Probability Distributions

FAdist

Distributions that are Sometimes Used in Hydrology

Probability distributions that are sometimes useful in hydrology.

518

Probability Distributions

FatTailsR

Kiener Distributions and Fat Tails in Finance

Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in market finance, neuroscience and other disciplines. Two algorithms to estimate distribution parameters, quantiles, value-at-risk and expected shortfall with high accuracy. Includes power hyperbolas and power hyperbolic functions.

519

Probability Distributions

fBasics

Rmetrics - Markets and Basic Statistics

Provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. In addition, there are several utility functions for data handling and management.

520

Probability Distributions

fCopulae (core)

Rmetrics - Bivariate Dependence Structures with Copulae

Provides a collection of functions to manage, to investigate and to analyze bivariate financial returns by Copulae. Included are the families of Archimedean, Elliptical, Extreme Value, and Empirical Copulae.

521

Probability Distributions

fExtremes

Rmetrics - Modelling Extreme Events in Finance

Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data preprocessing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.

522

Probability Distributions

fgac

Generalized Archimedean Copula

Bivariate data fitting involves two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm is implemented for seven families of copulas (Generalized Archimedean Copulas); the best fit can be obtained by examining all copula options (totally positive of order 2 and stochastically increasing models).

523

Probability Distributions

fitdistrplus

Help to Fit a Parametric Distribution to Non-Censored or Censored Data

Extends the fitdistr() function (of the MASS package) with several functions to help fit a parametric distribution to non-censored or censored data. Censored data may contain left-censored, right-censored and interval-censored values, with several lower and upper bounds. In addition to maximum likelihood estimation (MLE), the package provides moment matching (MME), quantile matching (QME) and maximum goodness-of-fit estimation (MGE) methods (available only for non-censored data). Weighted versions of MLE, MME and QME are available. See e.g. Casella & Berger (2002), Statistical Inference, Pacific Grove.
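The difference between estimation methods is easiest to see on distributions with closed-form estimators. A Python sketch of MLE for the exponential rate and moment matching (MME) for the gamma — conceptual only, not the package's fitdist() interface, which handles general distributions and censoring:

```python
def fit_exp_mle(x):
    """MLE of the exponential rate: lambda_hat = n / sum(x) = 1 / mean."""
    return len(x) / sum(x)

def fit_gamma_mme(x):
    """Moment matching for the gamma: shape = mean^2 / var, rate = mean / var."""
    n = len(x)
    m = sum(x) / n
    v = sum((xi - m) ** 2 for xi in x) / n
    return m * m / v, m / v

data = [0.5, 1.2, 0.3, 2.1, 0.9, 1.7]
rate = fit_exp_mle(data)
shape, gamma_rate = fit_gamma_mme(data)
```

MME equates sample moments with theoretical ones; MLE maximises the likelihood. For the gamma, MLE has no closed form and requires numerical optimisation, which is where general-purpose fitting functions earn their keep.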

524

Probability Distributions

fitteR

Fit Hundreds of Theoretical Distributions to Empirical Data

Systematic fit of hundreds of theoretical univariate distributions to empirical data via maximum likelihood estimation. Fits are reported and summarized by a data.frame, a csv file or a ‘shiny’ app (here with additional features like visual representation of fits). All output formats provide assessment of goodness-of-fit by the following methods: Kolmogorov-Smirnov test, Shapiro-Wilk test, Anderson-Darling test.

525

Probability Distributions

flexsurv

Flexible Parametric Survival and MultiState Models

Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models.

526

Probability Distributions

FMStable

Finite Moment Stable Distributions

This package implements some basic procedures for dealing with log maximally skew stable distributions, which are also called finite moment log stable distributions.

527

Probability Distributions

fpow

Computing the noncentrality parameter of the noncentral F distribution

Returns the noncentrality parameter of the noncentral F distribution if the probability of type I and type II error and the degrees of freedom of the numerator and the denominator are given. It may be useful for computing minimal detectable differences for general ANOVA models. This program is documented in the paper by A. Baharev and S. Kemeny, On the computation of the noncentral F and noncentral beta distribution; Statistics and Computing, 2008, 18 (3), 333-340.

528

Probability Distributions

frmqa

The Generalized Hyperbolic Distribution, Related Distributions and Their Applications in Finance

A collection of R and C++ functions to work with the generalized hyperbolic distribution, related distributions and their applications in financial risk management and quantitative analysis.

529

Probability Distributions

fromo

Fast Robust Moments

Fast, numerically robust computation of weighted moments via ‘Rcpp’. Supports computation on vectors and matrices, and monoidal append of moments. Moments and cumulants over running fixed-length windows can be computed, as well as over time-based windows. Moment computations are via a generalization of Welford’s method, as described by Bennett et al. (2009) <doi:10.1109/CLUSTR.2009.5289161>.
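Welford's method updates the mean and the sum of squared deviations in a single numerically stable pass. A minimal Python sketch of the scalar case (the package generalises this to weighted data, higher moments and running windows):

```python
def welford(xs):
    """Single-pass mean and unbiased variance via Welford's update."""
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n          # update the running mean
        m2 += delta * (x - mean)   # accumulate sum of squared deviations
    return mean, m2 / (n - 1) if n > 1 else float("nan")

m, v = welford([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Unlike the textbook formula sum(x^2)/n - mean^2, this never subtracts two large nearly-equal quantities, which is the source of its numerical robustness.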

530

Probability Distributions

gambin

Fit the Gambin Model to Species Abundance Distributions

Fits unimodal and multimodal gambin distributions to species-abundance distributions from ecological data, as in Matthews et al. (2014) <doi:10.1111/ecog.00861>. ‘gambin’ is short for ‘gamma-binomial’. The main function is fit_abundances(), which estimates the ‘alpha’ parameter(s) of the gambin distribution using maximum likelihood. Functions are also provided to generate the gambin distribution and for calculating likelihood statistics.

531

Probability Distributions

gamlss.dist (core)

Distributions for Generalized Additive Models for Location Scale and Shape

A set of distributions which can be used for modelling the response variables in Generalized Additive Models for Location Scale and Shape, Rigby and Stasinopoulos (2005) <doi:10.1111/j.1467-9876.2005.00510.x>. The distributions can be continuous, discrete or mixed distributions. Extra distributions can be created by transforming any continuous distribution defined on the real line to a distribution defined on the range 0 to infinity or 0 to 1, using a “log” or a “logit” transformation respectively.

532

Probability Distributions

gamlss.mx

Fitting Mixture Distributions with GAMLSS

The main purpose of this package is to allow fitting of mixture distributions with GAMLSS models.

533

Probability Distributions

gaussDiff

Difference measures for multivariate Gaussian probability density functions

A collection of difference measures for multivariate Gaussian probability density functions, such as the Euclidean mean, the Mahalanobis distance, the Kullback-Leibler divergence, the J-coefficient, the Minkowski L2-distance, the Chi-square divergence and the Hellinger coefficient.

534

Probability Distributions

gb

Generalized Lambda Distribution and Generalized Bootstrapping

A collection of algorithms and functions for fitting data to a generalized lambda distribution via moment matching methods, and generalized bootstrapping.

535

Probability Distributions

GB2

Generalized Beta Distribution of the Second Kind: Properties, Likelihood, Estimation

Package GB2 explores the Generalized Beta distribution of the second kind. Density, cumulative distribution function, quantiles and moments of the distributions are given. Functions for the full log-likelihood, the profile log-likelihood and the scores are provided. Formulas for various indicators of inequality and poverty under the GB2 are implemented. The GB2 is fitted by the methods of maximum pseudo-likelihood estimation using the full and profile log-likelihood, and nonlinear least squares estimation of the model parameters. Various plots for the visualization and analysis of the results are provided. Variance estimation of the parameters is provided for the method of maximum pseudo-likelihood estimation. A mixture distribution based on the compounding property of the GB2 is presented (denoted as “compound” in the documentation). This mixture distribution is based on the discretization of the distribution of the underlying random scale parameter. The discretization can be left or right tail. Density, cumulative distribution function, moments and quantiles for the mixture distribution are provided. The compound mixture distribution is fitted using the method of maximum pseudo-likelihood estimation. The fit can also incorporate the use of auxiliary information. In this new version of the package, the mixture case is complemented with new functions for variance estimation by linearization and comparative density plots.

536

Probability Distributions

GenBinomApps

Clopper-Pearson Confidence Interval and Generalized Binomial Distribution

Density, distribution function, quantile function and random generation for the Generalized Binomial Distribution. Functions to compute the Clopper-Pearson Confidence Interval and the required sample size. Enhanced model for burn-in studies, where failures are tackled by countermeasures.

537

Probability Distributions

gendist

Generated Probability Distribution Models

Computes the probability density function (pdf), cumulative distribution function (cdf), quantile function (qf) and generates random values (rg) for the following general models : mixture models, composite models, folded models, skewed symmetric models and arc tan models.
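A mixture model's pdf is a weighted sum of component pdfs, f(x) = w·f1(x) + (1 − w)·f2(x). A Python sketch with two normal components, illustrating the construction only (the package also covers composite, folded, skewed symmetric and arc tan models):

```python
import math

def dnorm(x, mu, sd):
    """Normal density."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def dmix(x, w, c1, c2):
    """Two-component mixture density: w * f1(x) + (1 - w) * f2(x)."""
    return w * dnorm(x, *c1) + (1 - w) * dnorm(x, *c2)

# crude Riemann check that the mixture still integrates to ~1
step = 0.001
area = sum(dmix(-20.0 + i * step, 0.3, (0.0, 1.0), (4.0, 2.0)) * step
           for i in range(40000))
```

The cdf and quantile function of a mixture follow the same weighting of the component cdfs, while random generation draws the component index first and then samples from it.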

538

Probability Distributions

GeneralizedHyperbolic

The Generalized Hyperbolic Distribution

Functions for the hyperbolic and related distributions. Density, distribution and quantile functions and random number generation are provided for the hyperbolic distribution, the generalized hyperbolic distribution, the generalized inverse Gaussian distribution and the skew-Laplace distribution. Additional functionality is provided for the hyperbolic distribution, normal inverse Gaussian distribution and generalized inverse Gaussian distribution, including fitting of these distributions to data. Linear models with hyperbolic errors may be fitted using hyperblmFit.

539

Probability Distributions

GenOrd

Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions

A Gaussian-copula-based procedure for generating samples from discrete random variables with a prescribed correlation matrix and marginal distributions.

540

Probability Distributions

geoR

Analysis of Geostatistical Data

Geostatistical analysis including traditional, likelihood-based and Bayesian methods.

541

Probability Distributions

ghyp

A Package on Generalized Hyperbolic Distribution and Its Special Cases

Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). In particular, it contains fitting procedures, an AIC-based model selection routine, and functions for the computation of density, quantile, probability, random variates, expected shortfall and some portfolio optimization and plotting routines as well as the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution.

542

Probability Distributions

GIGrvg

Random Variate Generator for the GIG Distribution

Generator and density function for the Generalized Inverse Gaussian (GIG) distribution.

543

Probability Distributions

gk

g-and-k and g-and-h Distribution Functions

Functions for the g-and-k and generalised g-and-h distributions.

544

Probability Distributions

gld

Estimation and Use of the Generalised (Tukey) Lambda Distribution

The generalised lambda distribution, or Tukey lambda distribution, provides a wide variety of shapes with one functional form. This package provides random numbers, quantiles, probabilities, densities and density quantiles for four different types of the distribution; see documentation for details. It provides the density function, distribution function, and Quantile-Quantile plots. It implements a variety of estimation methods for the distribution, including diagnostic plots. Estimation methods include the starship (all 4 types), method of L-moments for the GPD and FKML types, and a number of methods for only the FKML parameterisation. These include maximum likelihood, maximum product of spacings, Titterington’s method, Moments, Trimmed L-moments and Distributional Least Absolutes.

545

Probability Distributions

GLDEX

Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods

The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method, and L-moment matching are also provided. Diagnostics on goodness of fit can be done via Q-Q plots, KS-resample tests and by comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution.

546

Probability Distributions

glogis

Fitting and Testing Generalized Logistic Distributions

Tools for the generalized logistic distribution (Type I, also known as skew-logistic distribution), encompassing basic distribution functions (p, q, d, r, score), maximum likelihood estimation, and structural change methods.

547

Probability Distributions

greybox

Toolbox for Model Building and Forecasting

Implements functions and instruments for regression model building and its application to forecasting. The main focus of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross-validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes, so there are several methods for producing forecasts from these models and visualising them.

548

Probability Distributions

GSM

Gamma Shape Mixture

Implementation of a Bayesian approach for estimating a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions, it is computationally efficient, and it only requires specifying a prior distribution for a single parameter.

549

Probability Distributions

gumbel

The Gumbel-Hougaard Copula

Provides probability functions (cumulative distribution and density functions), simulation function (Gumbel copula multivariate simulation) and estimation functions (Maximum Likelihood Estimation, Inference For Margins, Moment Based Estimation and Canonical Maximum Likelihood).

550

Probability Distributions

HAC

Estimation, Simulation and Visualization of Hierarchical Archimedean Copulae (HAC)

Provides estimation of the structure and parameters, sampling methods, and structural plots of Hierarchical Archimedean Copulae (HAC).

551

Probability Distributions

hermite

Generalized Hermite Distribution

Probability functions and other utilities for the generalized Hermite distribution.

552

Probability Distributions

HI

Simulation from distributions supported by nested hyperplanes

Simulation from distributions supported by nested hyperplanes, using the algorithm described in Petris & Tardella, “A geometric approach to transdimensional Markov chain Monte Carlo”, Canadian Journal of Statistics, v.31, n.4, (2003). Also random direction multivariate Adaptive Rejection Metropolis Sampling.

553

Probability Distributions

HistogramTools

Utility Functions for R Histograms

Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.

554

Probability Distributions

hyper2

The Hyperdirichlet Distribution, Mark 2

A suite of routines for the hyperdirichlet distribution; supersedes the ‘hyperdirichlet’ package.

555

Probability Distributions

HyperbolicDist

The hyperbolic distribution

This package provides functions for the hyperbolic and related distributions. Density, distribution and quantile functions and random number generation are provided for the hyperbolic distribution, the generalized hyperbolic distribution, the generalized inverse Gaussian distribution and the skew-Laplace distribution. Additional functionality is provided for the hyperbolic distribution, including fitting of the hyperbolic to data.

556

Probability Distributions

ihs

Inverse Hyperbolic Sine Distribution

Density, distribution function, quantile function and random generation for the inverse hyperbolic sine distribution. This package also provides a function that can fit data to the inverse hyperbolic sine distribution using maximum likelihood estimation.

557

Probability Distributions

kdist

K-Distribution and Weibull Paper

Density, distribution function, quantile function and random generation for the K-distribution. A plotting function plots data on Weibull paper, and another function draws additional lines. See results from the package in T. Lamont-Smith (2018), submitted to J. R. Stat. Soc.

558

Probability Distributions

kernelboot

Smoothed Bootstrap and Random Generation from Kernel Densities

Smoothed bootstrap and functions for random generation from univariate and multivariate kernel densities. It does not estimate kernel densities.

559

Probability Distributions

kolmim

An Improved Evaluation of Kolmogorov’s Distribution

Provides an alternative, more efficient evaluation of extreme probabilities of Kolmogorov’s goodness-of-fit measure, Dn, when compared to the original implementation of Wang, Marsaglia, and Tsang. These probabilities are used in Kolmogorov-Smirnov tests when comparing two samples.
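The statistic whose tail probabilities are evaluated here is D_n = sup_x |F_n(x) − F(x)|; computing it from a sorted sample is straightforward. A Python sketch of the one-sample statistic (kolmim's contribution is the efficient evaluation of the distribution of D_n, not the statistic itself):

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov statistic D_n = sup_x |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # the sup is attained just before or at an order statistic
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# a small sample checked against the uniform(0, 1) cdf F(x) = x
d = ks_statistic([0.1, 0.4, 0.45, 0.8], lambda x: x)
```

Because F_n is a step function, checking both i/n − F(x_(i)) and F(x_(i)) − (i−1)/n at each order statistic suffices to find the supremum exactly.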

560

Probability Distributions

KScorrect

Lilliefors-Corrected Kolmogorov-Smirnov Goodness-of-Fit Tests

Implements the Lilliefors-corrected Kolmogorov-Smirnov test for use in goodness-of-fit tests, suitable when population parameters are unknown and must be estimated by sample statistics. P-values are estimated by simulation. Can be used with a variety of continuous distributions, including normal, lognormal, univariate mixtures of normals, uniform, log-uniform, exponential, gamma, and Weibull distributions. Functions to generate random numbers and calculate density, distribution, and quantile functions are provided for use with the log-uniform and mixture distributions.

561

Probability Distributions

LambertW

Probabilistic Models to Analyze and Gaussianize HeavyTailed, Skewed Data

Lambert W x F distributions are a generalized framework to analyze skewed, heavy-tailed data. It is based on an input/output system, where the output random variable (RV) Y is a nonlinearly transformed version of an input RV X ~ F with similar properties as X, but slightly skewed (heavy-tailed). The transformed RV Y has a Lambert W x F distribution. This package contains functions to model and analyze skewed, heavy-tailed data the Lambert Way: simulate random samples, estimate parameters, compute quantiles, and plot/print results nicely. Probably the most important function is ‘Gaussianize’, which works similarly to ‘scale’, but actually makes the data Gaussian. A do-it-yourself toolkit allows users to define their own Lambert W x ‘MyFavoriteDistribution’ and use it in their analysis right away.

562

Probability Distributions

LaplacesDemon

Complete Environment for Bayesian Inference

Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview). The README describes the history of the package development process.

563

Probability Distributions

LearnBayes

Functions for Learning Bayesian Inference

A collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one- and two-parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.

564

Probability Distributions

lhs

Latin Hypercube Samples

Provides a number of methods for creating and augmenting Latin Hypercube Samples.

565

Probability Distributions

LIHNPSD

Poisson Subordinated Distribution

A Poisson Subordinated Distribution to capture major leptokurtic features in log-return time series of financial data.

566

Probability Distributions

llogistic

The L-Logistic Distribution

Density, distribution function, quantile function and random generation for the L-Logistic distribution with parameters m and phi. The parameter m is the median of the distribution.

567

Probability Distributions

lmom

L-Moments

Functions related to L-moments: computation of L-moments and trimmed L-moments of distributions and data samples; parameter estimation; L-moment ratio diagram; plot vs. quantiles of an extreme-value distribution.
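Sample L-moments are linear combinations of probability-weighted moments b_r computed from the order statistics. A Python sketch of the first four, using the standard unbiased estimators (illustrating the math, not the package's interface):

```python
from math import comb

def sample_lmoments(x):
    """First four sample L-moments via probability-weighted moments b_r."""
    xs = sorted(x)
    n = len(xs)
    # b_r = n^-1 * sum_i C(i-1, r) / C(n-1, r) * x_(i) over 1-indexed order stats;
    # comb(i, r) is zero for i < r, so the sum naturally starts at i = r + 1
    b = [sum(comb(i, r) * xs[i] for i in range(n)) / (n * comb(n - 1, r))
         for r in range(4)]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4

l1, l2, l3, l4 = sample_lmoments([1.0, 2.0, 3.0, 4.0])
```

l1 is the mean and l2 a scale measure; l3 vanishes for a symmetric sample, as the test data illustrate. L-moment ratios l3/l2 and l4/l2 are what the ratio diagram plots.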

568

Probability Distributions

lmomco (core)

L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions

Extensive functions for L-moments (LMs) and probability-weighted moments (PWMs), parameter estimation for distributions, LM computation for distributions, and L-moment ratio diagrams. Maximum likelihood and maximum product of spacings estimation are also available. LMs for right-tail and left-tail censoring by known or unknown threshold and by indicator variable are available. Asymmetric (asy) trimmed LMs (TL-moments, TLMs) are supported. LMs of residual (resid) and reversed (rev) resid life are implemented along with 13 quantile function operators for reliability and survival analyses. Exact analytical bootstrap estimates of order statistics, LMs, and variances/covariances of LMs are provided. The Harri-Coble Tau34-squared Normality Test is available. Distribution support with “L” (LMs), “TL” (TLMs) and added (+) support for right-tail censoring (RC) encompasses: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L, TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Resid. Quantile Function [L], Normal [L], 3-p log-Normal [L], Pearson Type III [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice/Rician [L], Slash [TL], 3-p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L]. Multivariate sample L-comoments (LCMs) are implemented to measure asymmetric associations.

569

Probability Distributions

Lmoments

L-Moments and Quantile Mixtures

Contains functions to estimate L-moments and trimmed L-moments from the data. Also contains functions to estimate the parameters of the normal polynomial quantile mixture and the Cauchy polynomial quantile mixture from L-moments and trimmed L-moments.

570

Probability Distributions

logitnorm

Functions for the Logitnormal Distribution

Density, distribution, quantile and random generation functions for the logitnormal distribution. Estimation of the mode and the first two moments. Estimation of distribution parameters.
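A minimal sketch of typical usage, assuming the package follows the usual d/r naming convention (`dlogitnorm`, `rlogitnorm`) with `mu` and `sigma` parameters on the logit scale:

```r
# Sketch: density and sampling for a logit-normal variable.
# Assumes dlogitnorm()/rlogitnorm() with mu/sigma on the logit scale.
library(logitnorm)

# Density at a few points in (0, 1) for mu = 0, sigma = 1
dlogitnorm(c(0.1, 0.5, 0.9), mu = 0, sigma = 1)

# Draw samples; for mu = 0 the distribution is symmetric around 0.5
x <- rlogitnorm(1e4, mu = 0, sigma = 1)
mean(x)
```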

571

Probability Distributions

loglognorm

Double log normal distribution functions

d, p, q, and r functions for the double log normal distribution.

572

Probability Distributions

marg

Approximate Marginal Inference for Regression-Scale Models

Likelihood inference based on higher-order approximations for linear non-normal regression models.

573

Probability Distributions

MASS

Support Functions and Datasets for Venables and Ripley’s MASS

Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
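Among its support functions, `MASS::fitdistr()` fits a univariate distribution by maximum likelihood; a minimal sketch:

```r
# Fit a gamma distribution to simulated data by maximum likelihood
library(MASS)

set.seed(1)
x <- rgamma(500, shape = 2, rate = 3)

fit <- fitdistr(x, densfun = "gamma")
fit$estimate  # shape and rate estimates, near the true (2, 3)
fit$sd        # their asymptotic standard errors
```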

574

Probability Distributions

matrixNormal

The Matrix Normal Distribution

Computes densities, probabilities, and random deviates of the Matrix Normal (Iranmanesh et al. (2010) <doi:10.7508/ijmsi.2010.02.004>). Also includes simple but useful matrix functions. See the vignette for more information.

575

Probability Distributions

matrixsampling

Simulations of Matrix Variate Distributions

Provides samplers for various matrix variate distributions: Wishart, inverse-Wishart, normal, t, inverted-t, Beta type I and Beta type II. Allows simulating the noncentral Wishart distribution without the integer restriction on the degrees of freedom.

576

Probability Distributions

mbbefd

Maxwell Boltzmann Bose Einstein Fermi Dirac Distribution and Destruction Rate Modelling

Distributions that are typically used for exposure rating in general insurance, in particular to price reinsurance contracts. The vignettes show code snippets to fit the distribution to empirical data.

577

Probability Distributions

MBSP

Multivariate Bayesian Model with Shrinkage Priors

Implements a sparse Bayesian multivariate linear regression model using shrinkage priors from the three parameter beta normal family. The method is described in Bai and Ghosh (2018) <arXiv:1711.07635>.

578

Probability Distributions

mc2d

Tools for Two-Dimensional Monte-Carlo Simulations

A complete framework to build and study Two-Dimensional Monte-Carlo simulations, aka Second-Order Monte-Carlo simulations. Also includes various distributions (pert, triangular, Bernoulli, empirical discrete and continuous).

579

Probability Distributions

mclust

Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via the EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
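For instance, the main entry point `Mclust()` selects both the number of mixture components and the covariance model by BIC; a short sketch on the built-in Old Faithful data:

```r
# Model-based clustering of the Old Faithful geyser data with mclust
library(mclust)

fit <- Mclust(faithful)    # bivariate data: eruptions, waiting
summary(fit)               # chosen model, number of components, BIC
head(fit$classification)   # hard cluster assignments per observation
```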

580

Probability Distributions

MCMCpack

Markov Chain Monte Carlo (MCMC) Package

Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.

581

Probability Distributions

mded

Measuring the Difference Between Two Empirical Distributions

Provides a function for measuring the difference between two independent or non-independent empirical distributions and returning a significance level of the difference.

582

Probability Distributions

MEPDF

Creation of Empirical Density Functions Based on Multivariate Data

Based on the input data, an n-dimensional cube with sub-cells of user-specified side length is created. The number of sample points which fall in each sub-cell is counted, and with the cell volume and overall sample size an empirical probability can be computed. A number of cubes of higher resolution can be superimposed. The basic method stems from J. L. Bentley, “Multidimensional Divide and Conquer” (1980) <doi:10.1145/358841.358850>. Furthermore, a simple kernel density estimation method is made available, as well as an expansion of Bentley’s method, which offers a kernel approach for the grid method.

583

Probability Distributions

mgpd

mgpd: Functions for multivariate generalized Pareto distribution (MGPD of Type II)

Extends distribution and density functions to parametric multivariate generalized Pareto distributions (MGPD of Type II), and provides fitting functions which calculate maximum likelihood estimates for bivariate and trivariate models. (Help is in progress.)

584

Probability Distributions

minimax

Minimax distribution family

The minimax family of distributions is a two-parameter family like the beta family, but computationally a lot more tractable.

585

Probability Distributions

MitISEM

Mixture of Student t Distributions using Importance Sampling and Expectation Maximization

Flexible multivariate function approximation using an adapted Mixture of Student t Distributions. The mixture of t distributions is obtained using the Importance Sampling weighted Expectation Maximization algorithm.

586

Probability Distributions

MittagLeffleR

Mittag-Leffler Family of Distributions

Implements the Mittag-Leffler function, distribution, random variate generation, and estimation. Based on the Laplace-Inversion algorithm by Garrappa, R. (2015) <doi:10.1137/140971191>.

587

Probability Distributions

MixedTS

Mixed Tempered Stable Distribution

Provides detailed functions for the univariate Mixed Tempered Stable distribution.

588

Probability Distributions

mixtools

Tools for Analyzing Finite Mixture Models

Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.

589

Probability Distributions

MM

The Multiplicative Multinomial Distribution

Various utilities for the Multiplicative Multinomial distribution.

590

Probability Distributions

mnormpow

Multivariate Normal Distributions with Power Integrand

Computes the integral of f(x)*x_i^k on a product of intervals, where f is the density of a Gaussian law. This is a small alteration of the mnormt code from A. Genz and A. Azzalini.

591

Probability Distributions

mnormt (core)

The Multivariate Normal and t Distributions

Functions are provided for computing the density and the distribution function of multivariate normal and “t” random variables, and for generating random vectors sampled from these distributions. Probabilities are computed via non-Monte Carlo methods; different routines are used for the cases d=1, d=2, and d>2, where d denotes the number of dimensions.
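A minimal sketch of the d/p/r interface (the `mean`/`varcov` argument names follow the package documentation):

```r
# Density, probabilities and random vectors for a bivariate normal
library(mnormt)

mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

dmnorm(c(0, 0), mean = mu, varcov = Sigma)    # density at the origin
pmnorm(c(0, 0), mean = mu, varcov = Sigma)    # P(X1 <= 0, X2 <= 0)
x <- rmnorm(1000, mean = mu, varcov = Sigma)  # 1000 random vectors
```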

592

Probability Distributions

modeest

Mode Estimation

Provides estimators of the mode of univariate data or univariate distributions.

593

Probability Distributions

moments

Moments, cumulants, skewness, kurtosis and related tests

Functions to calculate: moments, Pearson’s kurtosis, Geary’s kurtosis and skewness; tests related to them (Anscombe-Glynn, D’Agostino, Bonett-Seier).
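For example, the sample skewness and kurtosis of right-skewed data, together with the tests named above:

```r
# Sample skewness, kurtosis and related tests from 'moments'
library(moments)

set.seed(42)
x <- rexp(1000)   # right-skewed data

skewness(x)       # positive for exponential data (theoretical value 2)
kurtosis(x)       # Pearson kurtosis (theoretical value 9 for exponential)

agostino.test(x)  # D'Agostino test of skewness
anscombe.test(x)  # Anscombe-Glynn test of kurtosis
```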

594

Probability Distributions

movMF

Mixtures of von Mises-Fisher Distributions

Fit and simulate mixtures of von Mises-Fisher distributions.

595

Probability Distributions

msm

Multi-State Markov and Hidden Markov Models in Continuous Time

Functions for fitting continuous-time Markov and hidden Markov multi-state models to longitudinal data. Designed for processes observed at arbitrary times in continuous time (panel data) but some other observation schemes are supported. Both Markov transition rates and the hidden Markov output process can be modelled in terms of covariates, which may be constant or piecewise-constant in time.

596

Probability Distributions

MultiRNG

Multivariate Pseudo-Random Number Generation

Pseudo-random number generation for 11 multivariate distributions: Normal, t, Uniform, Bernoulli, Hypergeometric, Beta (Dirichlet), Multinomial, Dirichlet-Multinomial, Laplace, Wishart, and Inverted Wishart. The details of the method are explained in Demirtas (2004) <doi:10.22237/jmasm/1099268340>.

597

Probability Distributions

mvprpb

Orthant Probability of the Multivariate Normal Distribution

Computes orthant probabilities for the multivariate normal distribution.

598

Probability Distributions

mvrtn

Mean and Variance of Truncated Normal Distribution

Mean, variance, and random variates for left/right truncated normal distributions.

599

Probability Distributions

mvtnorm (core)

Multivariate Normal and t Distributions

Computes multivariate normal and t probabilities, quantiles, random deviates and densities.
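As an illustration, the orthant probability of a correlated bivariate normal, which has a closed form to check against (P(X1 ≤ 0, X2 ≤ 0) = 1/4 + arcsin(ρ)/(2π), i.e. exactly 1/3 for ρ = 0.5):

```r
# Bivariate normal probabilities and random deviates with mvtnorm
library(mvtnorm)

Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

# P(X1 <= 0, X2 <= 0) for correlation 0.5; analytically 1/3
pmvnorm(lower = c(-Inf, -Inf), upper = c(0, 0), sigma = Sigma)

# random deviates from the same distribution
x <- rmvnorm(500, sigma = Sigma)
```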

600

Probability Distributions

nCDunnett

Noncentral Dunnett’s Test Distribution

Computes the noncentral Dunnett’s test distribution (pdf, cdf and quantile) and generates random numbers.

601

Probability Distributions

nCopula

Hierarchical Archimedean Copulas Constructed with Multivariate Compound Distributions

Construct and manipulate hierarchical Archimedean copulas with multivariate compound distributions. The model used is the one of Cossette et al. (2017) <doi:10.1016/j.insmatheco.2017.06.001>.

602

Probability Distributions

Newdistns

Computes Pdf, Cdf, Quantile and Random Numbers, Measures of Inference for 19 General Families of Distributions

Computes the probability density function, cumulative distribution function, quantile function, random numbers and measures of inference for the following general families of distributions (each family defined in terms of an arbitrary cdf G): Marshall Olkin G distributions, exponentiated G distributions, beta G distributions, gamma G distributions, Kumaraswamy G distributions, generalized beta G distributions, beta extended G distributions, gamma G distributions, gamma uniform G distributions, beta exponential G distributions, Weibull G distributions, log gamma G I distributions, log gamma G II distributions, exponentiated generalized G distributions, exponentiated Kumaraswamy G distributions, geometric exponential Poisson G distributions, truncated-exponential skew-symmetric G distributions, modified beta G distributions, and exponentiated exponential Poisson G distributions.

603

Probability Distributions

nor1mix

Normal aka Gaussian (1d) Mixture Models (S3 Classes and Methods)

One-dimensional Normal (i.e. Gaussian) mixture model classes, for, e.g., density estimation or clustering algorithms research and teaching; providing the widely used Marron-Wand densities. Efficient random number generation and graphics. Fitting to data by efficient ML (Maximum Likelihood) or traditional EM estimation.

604

Probability Distributions

NormalGamma

Normal-gamma convolution model

The functions proposed in this package compute the density of the sum of a Gaussian and a gamma random variable, estimate the parameters, and correct the noise effect in a gamma-signal and Gaussian-noise model. This package has been used to implement the background correction method for Illumina microarray data presented in Plancade S., Rozenholc Y. and Lund E., “Generalization of the normal-exponential model: exploration of a more accurate parameterization for the signal distribution on Illumina BeadArrays”, BMC Bioinformatics 2012, 13(329).

605

Probability Distributions

NormalLaplace

The Normal Laplace Distribution

Functions for the normal Laplace distribution. The package is under development and provides only limited functionality. Density, distribution and quantile functions, random number generation, and moments are provided.

606

Probability Distributions

normalp

Routines for Exponential Power Distribution

Collection of utilities for the Exponential Power distribution, also known as the General Error Distribution (see Mineo, A.M. and Ruggieri, M. (2005), A Software Tool for the Exponential Power Distribution: The normalp Package, Journal of Statistical Software, Vol. 12, Issue 4).

607

Probability Distributions

ORDER2PARENT

Estimate parent distributions with data of several order statistics

This package uses B-spline based nonparametric smooth estimators to estimate parent distributions given observations on multiple order statistics.

608

Probability Distributions

OrdNor

Concurrent Generation of Ordinal and Normal Data with Given Correlation Matrix and Marginal Distributions

Implementation of a procedure for generating samples from a mixed distribution of ordinal and normal random variables with pre-specified correlation matrix and marginal distributions.

609

Probability Distributions

ParetoPosStable

Computing, Fitting and Validating the PPS Distribution

Statistical functions to describe a Pareto Positive Stable (PPS) distribution and fit it to real data. Graphical and statistical tools to validate the fits are included.

610

Probability Distributions

pbv

Probabilities for Bivariate Normal Distribution

Computes probabilities of the bivariate normal distribution in a vectorized R function (Drezner & Wesolowsky, 1990, <doi:10.1080/00949659008811236>).

611

Probability Distributions

PDQutils

PDQ Functions via Gram Charlier, Edgeworth, and Cornish Fisher Approximations

A collection of tools for approximating the ‘PDQ’ functions (respectively, the cumulative distribution, density, and quantile) of probability distributions via classical expansions involving moments and cumulants.

612

Probability Distributions

PearsonDS (core)

Pearson Distribution System

Implementation of the Pearson distribution system, including full support for the (d,p,q,r)-family of functions for probability distributions and fitting via method of moments and maximum likelihood method.

613

Probability Distributions

PhaseType

Inference for Phase-type Distributions

Functions to perform Bayesian inference on absorption time data for Phase-type distributions. Plans to expand this to include frequentist inference and simulation tools.

614

Probability Distributions

pmultinom

One-Sided Multinomial Probabilities

Implements multinomial CDF (P(N1<=n1, …, Nk<=nk)) and tail probabilities (P(N1>n1, …, Nk>nk)), as well as probabilities with both constraints (P(l1<N1<=u1, …, lk<Nk<=uk)). Uses a method suggested by Bruce Levin (1981) <doi:10.1214/aos/1176345593>.

615

Probability Distributions

poibin

The Poisson Binomial Distribution

Implementation of both the exact and approximation methods for computing the cdf of the Poisson binomial distribution. It also provides the pmf, quantile function, and random number generation for the Poisson binomial distribution.
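A short sketch for three independent but non-identical Bernoulli trials (the `kk`/`pp` argument names follow the package documentation):

```r
# Poisson binomial: number of successes among independent,
# non-identically distributed Bernoulli trials
library(poibin)

pp <- c(0.1, 0.5, 0.9)      # success probability of each trial

dpoibin(kk = 0:3, pp = pp)  # pmf over all possible success counts
ppoibin(kk = 1, pp = pp)    # P(at most 1 success)
```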

616

Probability Distributions

poilog

Poisson lognormal and bivariate Poisson lognormal distribution

Functions for obtaining the density, random deviates and maximum likelihood estimates of the Poisson lognormal distribution and the bivariate Poisson lognormal distribution.

617

Probability Distributions

poistweedie

Poisson-Tweedie exponential family models

Simulation of Poisson-Tweedie models.

618

Probability Distributions

polyaAeppli

Implementation of the Polya-Aeppli distribution

Functions for evaluating the mass density, cumulative distribution function, quantile function and random variate generation for the Polya-Aeppli distribution, also known as the geometric compound Poisson distribution.

619

Probability Distributions

poweRlaw

Analysis of Heavy Tailed Distributions

An implementation of maximum likelihood estimators for a variety of heavy tailed distributions, including both the discrete and continuous power law distributions. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.
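The typical workflow, fitting a discrete power law to the word-frequency data bundled with the package and estimating the lower cutoff:

```r
# Fit a discrete power law and estimate the lower cut-off x_min
library(poweRlaw)

data("moby")            # word frequencies from Moby Dick

m <- displ$new(moby)    # discrete power-law distribution object
est <- estimate_xmin(m) # KS-distance based estimate of x_min and alpha
m$setXmin(est)          # store the estimates in the object

est$xmin                # estimated lower cut-off
est$pars                # estimated scaling exponent
```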

620

Probability Distributions

probhat

Generalized Kernel Smoothing

Computes nonparametric probability distributions (probability density functions, cumulative distribution functions and quantile functions) using kernel smoothing. Supports univariate, multivariate and conditional distributions, and weighted data (possibly useful in combination with fuzzy clustering or frequency data). Also supports empirical continuous cumulative distribution functions and their inverses, and random number generation.

621

Probability Distributions

qmap

Statistical Transformations for Post-Processing Climate Model Output

Empirical adjustment of the distribution of variables originating from (regional) climate model simulations using quantile mapping.

622

Probability Distributions

qqid

Generation and Support of QQIDs - A Human-Compatible Representation of 128-Bit Numbers

The string “bird.carp.7TsBWtwqtKAeCTNk8f” is a “QQID”: a representation of a 128-bit number, constructed from two “cues” of short, common, English words, and Base64 encoded characters. The primary intended use of QQIDs is as random unique identifiers, e.g. database keys like the “UUIDs” defined in the RFC 4122 Internet standard. QQIDs can be identically interconverted with UUIDs, IPv6 addresses, MD5 hashes etc., and are suitable for a host of applications in which identifiers are read by humans. They are compact, can safely be transmitted in binary and text form, can be used as components of URLs, and it can be established at a glance whether two QQIDs are different or potentially identical. The qqid package contains functions to retrieve true, quantum-random QQIDs, to generate pseudo-random QQIDs, to validate them, and to interconvert them with other 128-bit number representations.

623

Probability Distributions

qrandom

True Random Numbers using the ANU Quantum Random Numbers Server

The ANU Quantum Random Number Generator provided by the Australian National University generates true random numbers in real-time by measuring the quantum fluctuations of the vacuum. This package offers an interface using their API. The electromagnetic field of the vacuum exhibits random fluctuations in phase and amplitude at all frequencies. By carefully measuring these fluctuations, one is able to generate ultra-high bandwidth random numbers. The Quantum Random Number Generator is based on the papers by Symul et al. (2011) <doi:10.1063/1.3597793> and Haw et al. (2015) <doi:10.1103/PhysRevApplied.3.054004>. The package offers functions to retrieve a sequence of random integers or hexadecimals and true random samples from a normal or uniform distribution.

624

Probability Distributions

QRM

Provides R-Language Code to Examine Quantitative Risk Management Concepts

Accompanying package to the book Quantitative Risk Management: Concepts, Techniques and Tools by Alexander J. McNeil, Rudiger Frey, and Paul Embrechts.

625

Probability Distributions

qrmtools

Tools for Quantitative Risk Management

Functions and data sets for reproducing selected results from the book “Quantitative Risk Management: Concepts, Techniques and Tools”. Furthermore, new developments and auxiliary functions for Quantitative Risk Management practice.

626

Probability Distributions

qrng

(Randomized) Quasi-Random Number Generators

Functionality for generating (randomized) quasi-random numbers in high dimensions.

627

Probability Distributions

randaes

Random number generator based on AES cipher

The deterministic part of the Fortuna cryptographic pseudo-random number generator, described in Schneier & Ferguson, “Practical Cryptography”.

628

Probability Distributions

random

True Random Numbers using RANDOM.ORG

The true random number service provided by the RANDOM.ORG website created by Mads Haahr samples atmospheric noise via radio tuned to an unused broadcasting frequency together with a skew correction algorithm due to John von Neumann. More background is available in the included vignette based on an essay by Mads Haahr. In its current form, the package offers functions to retrieve random integers, randomized sequences and random strings.

629

Probability Distributions

randtoolbox

Toolbox for Pseudo and Quasi Random Number Generation and Random Generator Tests

Provides (1) pseudo-random generators: general linear congruential generators, multiple recursive generators and generalized feedback shift register (SF-Mersenne Twister algorithm and WELL generators); (2) quasi-random generators: the Torus algorithm, the Sobol sequence, the Halton sequence (including the Van der Corput sequence); and (3) some generator tests: the gap test, the serial test, the poker test. See e.g. Gentle (2003) <doi:10.1007/b97336>. The package can be provided without the rngWELL dependency on demand. Take a look at the Distribution task view for types and tests of random number generators. Version in memoriam of Diethelm and Barbara Wuertz.

630

Probability Distributions

RDieHarder

R Interface to the ‘DieHarder’ RNG Test Suite

The ‘RDieHarder’ package provides an R interface to the ‘DieHarder’ suite of random number generators and tests that was developed by Robert G. Brown and David Bauer, extending earlier work by George Marsaglia and others. The ‘DieHarder’ library is included, but if a version is already installed it will be used instead.

631

Probability Distributions

ReIns

Functions from “Reinsurance: Actuarial and Statistical Aspects”

Functions from the book “Reinsurance: Actuarial and Statistical Aspects” (2017) by Hansjoerg Albrecher, Jan Beirlant and Jef Teugels <http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470772689.html>.

632

Probability Distributions

reliaR (core)

Package for some probability distributions

A collection of utilities for some reliability models/probability distributions.

633

Probability Distributions

Renext

Renewal Method for Extreme Values Extrapolation

Peaks Over Threshold (POT) or ‘methode du renouvellement’. The distribution for the exceedances can be chosen, and heterogeneous data (including historical data or block data) can be used in a Maximum-Likelihood framework.

634

Probability Distributions

retimes

Reaction Time Analysis

Reaction time analysis by maximum likelihood.

635

Probability Distributions

revdbayes

Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis

Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>, and d, p, q, r functions for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.

636

Probability Distributions

rlecuyer

R Interface to RNG with Multiple Streams

Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L’Ecuyer et al (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.

637

Probability Distributions

RMKdiscrete

Sundry Discrete Probability Distributions

Sundry discrete probability distributions and helper functions.

638

Probability Distributions

RMTstat

Distributions, Statistics and Tests derived from Random Matrix Theory

Functions for working with the Tracy-Widom laws and other distributions related to the eigenvalues of large Wishart matrices. The tables for computing the Tracy-Widom densities and distribution functions were computed by Momar Dieng’s MATLAB package “RMLab” (formerly available on his homepage at http://math.arizona.edu/~momar/research.htm). This package is part of a collaboration between Iain Johnstone, Zongming Ma, Patrick Perry, and Morteza Shahram. It will soon be replaced by a package with more accuracy and built-in support for relevant statistical tests.

639

Probability Distributions

rmutil

Utilities for Nonlinear Regression and Repeated Measurements Models

A toolkit of functions for nonlinear regression and repeated measurements not to be used by itself but called by other Lindsey packages such as ‘gnlm’, ‘stable’, ‘growth’, ‘repeated’, and ‘event’ (available at <http://www.commanster.eu/rcode.html>).

640

Probability Distributions

rngwell19937

Random number generator WELL19937a with 53 or 32 bit output

Long period linear random number generator WELL19937a by F. Panneton, P. L’Ecuyer and M. Matsumoto. The initialization algorithm allows seeding the generator with a numeric vector of arbitrary length and uses MRG32k5a by P. L’Ecuyer to achieve good quality of the initialization. The output function may be set to provide numbers from the interval (0,1) with 53 (the default) or 32 random bits. WELL19937a is of a similar type to Mersenne Twister and has the same period. WELL19937a is slightly slower than Mersenne Twister, but has better equidistribution and “bit-mixing” properties and faster recovery from states with prevailing zeros than Mersenne Twister. All WELL generators with orders 512, 1024, 19937 and 44497 can be found in the randtoolbox package.

641

Probability Distributions

rstream

Streams of Random Numbers

Unified object-oriented interface for multiple independent streams of random numbers from different sources.

642

Probability Distributions

RTDE

Robust Tail Dependence Estimation

Robust tail dependence estimation for bivariate models. This package is based on two papers by the authors: ‘Robust and bias-corrected estimation of the coefficient of tail dependence’ and ‘Robust and bias-corrected estimation of probabilities of extreme failure sets’. This work was supported by a research grant (VKR023480) from VILLUM FONDEN and an international project for scientific cooperation (PICS-6416).

643

Probability Distributions

rtdists

Response Time Distributions

Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model (Ratcliff & McKoon, 2008, <doi:10.1162/neco.2008.12-06-420>) based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA; Brown & Heathcote, 2008, <doi:10.1016/j.cogpsych.2007.12.002>) with different distributions underlying the drift rate.

644

Probability Distributions

Runuran

R Interface to the ‘UNU.RAN’ Random Variate Generators

Interface to the ‘UNU.RAN’ library for Universal Non-Uniform RANdom variate generators. Thus it allows one to build non-uniform random number generators from quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with given density function. In addition, the package contains densities, distribution functions and quantiles for a couple of distributions.

645

Probability Distributions

rust

Ratio-of-Uniforms Simulation with Transformation

Uses the generalized ratio-of-uniforms (RU) method to simulate from univariate and (low-dimensional) multivariate continuous distributions. The user specifies the log-density, up to an additive constant. The RU algorithm is applied after relocation of the mode of the density to zero, and the user can choose a tuning parameter r. For details see Wakefield, Gelfand and Smith (1991) <doi:10.1007/BF01889987>, “Efficient generation of random variates via the ratio-of-uniforms method”, Statistics and Computing 1, 129-133. A Box-Cox variable transformation can be used to make the input density suitable for the RU method and to improve efficiency. In the multivariate case rotation of axes can also be used to improve efficiency. From version 1.2.0 the ‘Rcpp’ package <https://cran.r-project.org/package=Rcpp> can be used to improve efficiency.

646

Probability Distributions

s20x

Functions for University of Auckland Course STATS 201/208 Data Analysis

A set of functions used in teaching STATS 201/208 Data Analysis at the University of Auckland. The functions are designed to make parts of R more accessible to a large undergraduate population who are mostly not statistics majors.

647

Probability Distributions

sadists

Some Additional Distributions

Provides the density, distribution, quantile and generation functions of some obscure probability distributions, including the doubly non-central t, F, Beta, and Eta distributions; the lambda-prime and K-prime; the upsilon distribution; the (weighted) sum of non-central chi-squares to a power; the (weighted) sum of log non-central chi-squares; the product of non-central chi-squares to powers; the product of doubly non-central F variables; the product of independent normals.

648

Probability Distributions

SCI

Standardized Climate Indices Such as SPI, SRI or SPEI

Functions for generating Standardized Climate Indices (SCI). SCI is a transformation of (smoothed) climate (or environmental) time series that removes seasonality and forces the data to take values of the standard normal distribution. SCI was originally developed for precipitation. In this case it is known as the Standardized Precipitation Index (SPI).

649

Probability Distributions

setRNG

Set (Normal) Random Number Generator and Seed

SetRNG provides utilities to help set and record the setting of the seed and the uniform and normal generators used when a random experiment is run. The utilities can be used in other functions that do random experiments to simplify recording and/or setting all the necessary information for reproducibility. See the vignette and reference manual for examples.

650

Probability Distributions

sfsmisc

Utilities from ‘Seminar fuer Statistik’ ETH Zurich

Useful utilities [‘goodies’] from Seminar fuer Statistik ETH Zurich, some of which were ported from S-plus in the 1990s. For graphics, have pretty (log-scale) axes, an enhanced Tukey-Anscombe plot, combining histogram and boxplot, 2d-residual plots, a ‘tachoPlot()’, pretty arrows, etc. For robustness, have a robust F test and robust range(). For system support, notably on Linux, provides ‘Sys.*()’ functions with more access to system and CPU information. Finally, miscellaneous utilities such as simple efficient prime numbers, integer codes, Duplicated(), toLatex.numeric() and is.whole().

651

Probability Distributions

sgt

Skewed Generalized T Distribution Tree

Density, distribution function, quantile function and random generation for the skewed generalized t distribution. This package also provides a function that can fit data to the skewed generalized t distribution using maximum likelihood estimation.

652

Probability Distributions

skellam

Densities and Sampling for the Skellam Distribution

Functions for the Skellam distribution, including: density (pmf), cdf, quantiles and regression.

653

Probability Distributions

SkewHyperbolic

The Skew Hyperbolic Student t-Distribution

Functions are provided for the density function, distribution function, quantiles and random number generation for the skew hyperbolic t-distribution. There are also functions that fit the distribution to data. There are functions for the mean, variance, skewness, kurtosis and mode of a given distribution and to calculate moments of any order about any centre. To assess goodness of fit, there are functions to generate a Q-Q plot, a P-P plot and a tail plot.

654

Probability Distributions

skewt

The Skewed Student-t Distribution

Density, distribution function, quantile function and random generation for the skewed t distribution of Fernandez and Steel.
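
The Fernandez and Steel construction is simple enough to write out directly: one tail of a symmetric t density is stretched and the other shrunk by a skewness factor γ. The sketch below is a hand-rolled Python illustration of that density, not the package's R interface.

```python
# Fernandez & Steel (1998) skewed-t density: gamma > 1 skews right,
# gamma < 1 skews left, gamma = 1 recovers the symmetric t.
import numpy as np
from scipy.stats import t

def dskt(x, df, gamma):
    """f(x) = 2/(gamma + 1/gamma) * [ t(x/gamma) if x >= 0 else t(gamma*x) ]"""
    x = np.asarray(x, dtype=float)
    scale = np.where(x >= 0, 1.0 / gamma, gamma)  # stretch one tail, shrink the other
    return 2.0 / (gamma + 1.0 / gamma) * t.pdf(x * scale, df)

print(dskt(0.0, 5, 1.0))  # gamma = 1: the ordinary t density at 0
```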

655

Probability Distributions

sld

Estimation and Use of the Quantile-Based Skew Logistic Distribution

The skew logistic distribution is a quantile-defined generalisation of the logistic distribution (van Staden and King 2015). Provides random numbers, quantiles, probabilities, densities and density quantiles for the distribution, as well as Quantile-Quantile plots and method of L-Moments estimation (including asymptotic standard errors).

656

Probability Distributions

smoothmest

Smoothed M-estimators for 1-dimensional location

Some M-estimators for 1-dimensional location (bisquare, ML for the Cauchy distribution, and the estimators from application of the smoothing principle introduced in Hampel, Hennig and Ronchetti (2011) to the above, the Huber M-estimator, and the median; the main function is smoothm), and the Pitman estimator.

657

Probability Distributions

SMR

Externally Studentized Midrange Distribution

Computes the studentized midrange distribution (pdf, cdf and quantile) and generates random numbers.

658

Probability Distributions

sn

The Skew-Normal and Related Distributions Such as the Skew-t

Build and manipulate probability distributions of the skew-normal family and some related ones, notably the skew-t family, and provide related statistical methods for data fitting and model diagnostics, in the univariate and the multivariate case.
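
The univariate skew-normal described here also exists in SciPy, which is convenient for cross-checking; the sketch below uses SciPy's `skewnorm` (where `a` is the slant parameter) and is illustrative only, not the sn package's API.

```python
# Univariate skew-normal: density 2*phi(x)*Phi(a*x); a = 0 is the normal.
from scipy.stats import skewnorm, norm

a = 4.0                      # positive a skews right
d = skewnorm.pdf(0.5, a)     # density at 0.5
p = skewnorm.cdf(0.5, a)     # distribution function
m = skewnorm.mean(a)         # the mean shifts with the skew
print(d, p, m)
```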

659

Probability Distributions

sparseMVN

Multivariate Normal Functions for Sparse Covariance and Precision Matrices

Computes multivariate normal (MVN) densities, and samples from MVN distributions, when the covariance or precision matrix is sparse.
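
The computational trick behind this kind of package is that with a sparse precision matrix Q the MVN log-density needs only a sparse factorization, never the dense covariance. A minimal Python sketch of that idea (assumptions: unit-mean-zero example, SPD tridiagonal Q; not the sparseMVN R interface):

```python
# MVN log-density from a sparse *precision* matrix via a sparse LU factor:
# log N(x | mu, Q^{-1}) = 0.5 log|Q| - (n/2) log(2*pi) - 0.5 (x-mu)' Q (x-mu)
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def mvn_logpdf_prec(x, mu, Q):
    r = x - mu
    lu = splu(Q.tocsc())
    # for SPD Q, det(Q) > 0, so log det = sum of log |U_ii|
    logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
    return 0.5 * logdet - 0.5 * len(x) * np.log(2 * np.pi) - 0.5 * r @ (Q @ r)

n = 5  # tridiagonal (AR(1)-like) precision matrix
Q = sp.diags([-0.5 * np.ones(n - 1), 2.0 * np.ones(n), -0.5 * np.ones(n - 1)],
             offsets=[-1, 0, 1])
print(mvn_logpdf_prec(np.zeros(n), np.zeros(n), Q))
```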

660

Probability Distributions

spd

Semi Parametric Distribution

The Semi Parametric Piecewise Distribution blends the Generalized Pareto Distribution for the tails with a kernel-based interior.

661

Probability Distributions

stabledist

Stable Distribution Functions

Density, Probability and Quantile functions, and random number generation for (skew) stable distributions, using the parametrizations of Nolan.

662

Probability Distributions

STAR

Spike Train Analysis with R

Functions to analyze neuronal spike trains from a single neuron or from several neurons recorded simultaneously.

663

Probability Distributions

statmod

Statistical Modeling

A collection of algorithms and functions to aid statistical modeling. Includes limiting dilution analysis (aka ELDA), growth curve comparisons, mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. Also includes a number of advanced generalized linear model functions, including new Tweedie and Digamma glm families and a secure convergence algorithm.
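
Gauss quadrature of the kind statmod provides (via gauss.quad in R) is easy to illustrate: Gauss-Hermite nodes and weights turn a normal expectation into a short weighted sum. A Python sketch, not the package's interface:

```python
# Gauss-Hermite quadrature for E[g(X)] with X ~ N(0,1).
# hermgauss targets integrals of e^{-t^2} f(t); substituting x = sqrt(2) t
# gives E[g(X)] = (1/sqrt(pi)) * sum_i w_i g(sqrt(2) t_i).
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(30)
ex2 = np.sum(weights * (np.sqrt(2) * nodes) ** 2) / np.sqrt(np.pi)
print(ex2)   # E[X^2] = 1 for a standard normal
```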

664

Probability Distributions

SuppDists

Supplementary Distributions

Ten distributions supplementing those built into R: inverse Gauss, Kruskal-Wallis, Kendall’s Tau, Friedman’s chi squared, Spearman’s rho, maximum F ratio, the Pearson product moment correlation coefficient, Johnson distributions, normal scores and generalized hypergeometric distributions. In addition, two random number generators of George Marsaglia are included.

665

Probability Distributions

symmoments

Symbolic central and noncentral moments of the multivariate normal distribution

Symbolic central and noncentral moments of the multivariate normal distribution. Computes a standard representation, LaTeX code, and values at specified mean and covariance matrices.

666

Probability Distributions

TLMoments

Calculate TL-Moments and Convert Them to Distribution Parameters

Calculates empirical TL-moments (trimmed L-moments) of arbitrary order and trimming, and converts them to distribution parameters.
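
The untrimmed special case (ordinary sample L-moments) can be computed directly from probability-weighted moments; the following Python sketch shows that zero-trimming case only and is not the TLMoments API.

```python
# Sample L-moments from probability-weighted moments b_r:
#   b_r = (1/n) sum_i x_(i) * C(i-1, r) / C(n-1, r)
#   l1 = b0,  l2 = 2 b1 - b0,  l3 = 6 b2 - 6 b1 + b0
import numpy as np

def lmoments(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0

rng = np.random.default_rng(1)
print(lmoments(rng.uniform(0, 1, 100000)))  # uniform(0,1): 1/2, 1/6, 0
```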

667

Probability Distributions

tmvmixnorm

Sampling from Truncated Multivariate Normal and t Distributions

Efficient sampling of truncated multivariate (scale) mixtures of normals under linear inequality constraints is non-trivial because the normalizing constant is analytically intractable. Meanwhile, traditional methods may be subject to numerical issues, especially when the dimension is high and the dependence is strong. Algorithms proposed by Li and Ghosh (2015) <doi:10.1080/15598608.2014.996690> are adopted for overcoming difficulties in simulating truncated distributions. Efficient rejection sampling for simulating the truncated univariate normal distribution is included in the package, which shows superiority in terms of acceptance rate and numerical stability compared to existing methods and R packages. An efficient function for sampling from the truncated multivariate normal distribution subject to convex polytope restriction regions, based on a Gibbs sampler for the conditional truncated univariate distribution, is provided. By extending the sampling method, a function for sampling the truncated multivariate Student’s t distribution is also developed. Moreover, the proposed method and computation remain valid for high-dimensional and strong-dependence scenarios. Empirical results in Li and Ghosh (2015) illustrate the superior performance in terms of various criteria (e.g. mixing and integrated autocorrelation time).
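
The Gibbs-sampler idea is easy to see in two dimensions: each full conditional of a box-truncated bivariate normal is itself a truncated univariate normal. The Python sketch below illustrates that scheme in the spirit of Li and Ghosh (2015); it is a toy illustration (unit-variance pair, box constraints), not the package's implementation.

```python
# Gibbs sampler for a correlation-rho bivariate normal truncated to a box:
# each coordinate is updated from its truncated univariate normal conditional.
import numpy as np
from scipy.stats import truncnorm

def gibbs_tmvn(rho, lower, upper, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.clip(np.zeros(2), lower, upper)   # start inside the region
    out = np.empty((n_iter, 2))
    sd = np.sqrt(1.0 - rho ** 2)             # conditional sd for a unit-variance pair
    for it in range(n_iter):
        for j in range(2):
            m = rho * x[1 - j]               # conditional mean given the other coord
            a, b = (lower[j] - m) / sd, (upper[j] - m) / sd
            x[j] = truncnorm.rvs(a, b, loc=m, scale=sd, random_state=rng)
        out[it] = x
    return out

draws = gibbs_tmvn(0.7, np.array([0.0, -1.0]), np.array([2.0, 1.0]))
print(draws.mean(axis=0))
```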

668

Probability Distributions

tmvtnorm

Truncated Multivariate Normal and Student t Distribution

Random number generation for the truncated multivariate normal and Student t distribution. Computes probabilities, quantiles and densities, including one-dimensional and bivariate marginal densities. Computes first and second moments (i.e. mean and covariance matrix) for the double-truncated multinormal case.

669

Probability Distributions

tolerance

Statistical Tolerance Intervals and Regions

Statistical tolerance limits provide the limits between which we can expect to find a specified proportion of a sampled population with a given level of confidence. This package provides functions for estimating tolerance limits (intervals) for various univariate distributions (binomial, Cauchy, discrete Pareto, exponential, two-parameter exponential, extreme value, hypergeometric, Laplace, logistic, negative binomial, negative hypergeometric, normal, Pareto, Poisson-Lindley, Poisson, uniform, and Zipf-Mandelbrot), Bayesian normal tolerance limits, multivariate normal tolerance regions, nonparametric tolerance intervals, tolerance bands for regression settings (linear regression, nonlinear regression, nonparametric regression, and multivariate regression), and analysis of variance tolerance intervals. Visualizations are also available for most of these settings.

670

Probability Distributions

trapezoid

The Trapezoidal Distribution

The trapezoid package provides dtrapezoid, ptrapezoid, qtrapezoid, and rtrapezoid functions for the trapezoidal distribution.
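
SciPy also ships a trapezoidal distribution, which is handy for cross-checking the R functions; a sketch (SciPy parameterizes the flat top by fractions c, d of the support), not the trapezoid package's interface.

```python
# Trapezoidal distribution on [loc, loc+scale] with flat top from
# loc + c*scale to loc + d*scale.
from scipy.stats import trapezoid

c, d, loc, scale = 0.2, 0.8, 0.0, 10.0   # support [0, 10], flat top on [2, 8]
p = trapezoid.cdf(5.0, c, d, loc=loc, scale=scale)
print(p)   # symmetric trapezoid: cdf at the midpoint is 0.5
```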

671

Probability Distributions

triangle

Provides the Standard Distribution Functions for the Triangle Distribution

Provides the “r, q, p, and d” distribution functions for the triangle distribution.

672

Probability Distributions

truncnorm

Truncated Normal Distribution

Density, probability, quantile and random number generation functions for the truncated normal distribution.
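
SciPy offers the same four functions for the truncated normal; the sketch below is illustrative, and note the gotcha that SciPy's `truncnorm` takes the truncation points on the *standardized* scale.

```python
# Truncated normal on [lo, hi] with underlying mean mu and sd sigma.
from scipy.stats import truncnorm

lo, hi, mu, sigma = 0.0, 3.0, 1.0, 2.0
a, b = (lo - mu) / sigma, (hi - mu) / sigma   # standardized truncation points
d = truncnorm.pdf(1.0, a, b, loc=mu, scale=sigma)
x = truncnorm.rvs(a, b, loc=mu, scale=sigma, size=1000, random_state=1)
print(d, x.min(), x.max())
```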

673

Probability Distributions

TSA

Time Series Analysis

Contains R functions and datasets detailed in the book “Time Series Analysis with Applications in R (second edition)” by Jonathan Cryer and Kung-Sik Chan.

674

Probability Distributions

tsallisqexp

Tsallis q-Exp Distribution

The Tsallis distribution, also known as the q-exponential family distribution. Provides d, p, q, r distribution functions, plus fitting and testing functions. Project initiated by Paul Higbie and based on Cosma Shalizi’s code.

675

Probability Distributions

TTmoment

Sampling and Calculating the First and Second Moments for the Doubly Truncated Multivariate t Distribution

Computes the first two moments of the truncated multivariate t (TMVT) distribution under double truncation. Applies the slice sampling algorithm to generate random variates from the TMVT distribution.

676

Probability Distributions

tweedie

Evaluation of Tweedie Exponential Family Models

Maximum likelihood computations for Tweedie families, including the series expansion (Dunn and Smyth, 2005; <doi:10.1007/s11222-005-4070-y>) and the Fourier inversion (Dunn and Smyth, 2008; <doi:10.1007/s11222-007-9039-6>), and related methods.

677

Probability Distributions

UnivRNG

Univariate Pseudo-Random Number Generation

Pseudorandom number generation of 17 univariate distributions.

678

Probability Distributions

VarianceGamma

The Variance Gamma Distribution

Provides functions for the variance gamma distribution. Density, distribution and quantile functions. Functions for random number generation and fitting of the variance gamma to data. Also, functions for computing moments of the variance gamma distribution of any order about any location. In addition, there are functions for checking the validity of parameters and to interchange different sets of parameterizations for the variance gamma distribution.

679

Probability Distributions

VGAM (core)

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

680

Probability Distributions

VineCopula

Statistical Inference of Vine Copulas

Provides tools for the statistical analysis of vine copula models. The package includes tools for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. Tools for estimation, selection and exploratory data analysis of bivariate copula models are also provided.

681

Probability Distributions

vines

Multivariate Dependence Modeling with Vines

Implementation of the vine graphical model for building high-dimensional probability distributions as a factorization of bivariate copulas and marginal density functions. This package provides S4 classes for vines (C-vines and D-vines) and methods for inference, goodness-of-fit tests, density/distribution function evaluation, and simulation.

682

Probability Distributions

vistributions

Visualize Probability Distributions

Visualize and compute percentiles/probabilities of normal, t, f, chi square and binomial distributions.

683

Probability Distributions

visualize

Graph Probability Distributions with User Supplied Parameters and Statistics

Graphs the pdf or pmf and highlights what area or probability is present in user-defined locations. Visualize is able to provide lower tail, bounded, upper tail, and two tail calculations. Supports strict and equal-to inequalities. Also provided on the graph are the mean and variance of the distribution.

684

Probability Distributions

Wrapped

Computes Pdf, Cdf, Quantile, Random Numbers and Provides Estimation for any Univariate Wrapped Distributions

Computes the pdf, cdf, quantile function and random numbers for any wrapped G distribution. Also computes maximum likelihood estimates of the parameters, standard errors, 95 percent confidence intervals, the Cramér-von Mises statistic, the Anderson-Darling statistic, the Kolmogorov-Smirnov test statistic and its p-value, the Akaike Information Criterion, the Consistent Akaike Information Criterion, the Bayesian Information Criterion, the Hannan-Quinn information criterion, the minimum value of the negative log-likelihood function, and the convergence status when the wrapped distribution is fitted to some data.

685

Probability Distributions

zipfextR

Zipf Extended Distributions

Implementation of three extensions of the Zipf distribution: the Marshall-Olkin Extended Zipf (MOEZipf) (Pérez-Casany, M., & Casellas, A. (2013) <arXiv:1304.4540>), the Zipf-Poisson Extreme (Zipf-PE) and the Zipf-Poisson Stopped Sum (Zipf-PSS) distributions. In log-log scale, the first two extensions allow for top-concavity and top-convexity, while the third one only allows for top-concavity. All the extensions maintain the linearity associated with the Zipf model in the tail.

686

Probability Distributions

zipfR

Statistical Models for Word Frequency Distributions

Statistical models and utilities for the analysis of word frequency distributions. The utilities include functions for loading, manipulating and visualizing word frequency data and vocabulary growth curves. The package also implements several statistical models for the distribution of word frequencies in a population. (The name of this package derives from the most famous word frequency distribution, Zipf’s law.)

687

Econometrics

AER (core)

Applied Econometrics with R

Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. (See the vignette “AER” for a package overview.)

688

Econometrics

alpaca

Fit GLMs with High-Dimensional k-Way Fixed Effects

Provides a routine to concentrate out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is based on the algorithm proposed by Stammann (2018) <arXiv:1707.01815> and is restricted to glms that are nonlinear and based on maximum likelihood estimation. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Further, the package provides an analytical bias correction for binary choice models (logit and probit) derived by Fernandez-Val and Weidner (2016) <doi:10.1016/j.jeconom.2015.12.014>.

689

Econometrics

aod

Analysis of Overdispersed Data

Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).

690

Econometrics

apollo

Tools for Choice Model Estimation and Application

The Choice Modelling Centre (CMC) at the University of Leeds has developed flexible code for the estimation and application of choice models in R. Users are able to write their own model functions or use a mix of already available ones. Random heterogeneity, both continuous and discrete, and at the level of individuals and choices, can be incorporated for all models. There is support for both standalone models and hybrid model structures. Both classical and Bayesian estimation are available, and multiple discrete continuous models are covered in addition to discrete choice. Multi-threaded processing is supported for estimation, and a large number of pre- and post-estimation routines, including for computing posterior (individual-level) distributions, are available. For examples, a manual, and a support forum, visit www.ApolloChoiceModelling.com. For more information on choice models see Train, K. (2009) <isbn:9780521747387> and Hess, S. & Daly, A.J. (2014) <isbn:9781781003145> for an overview of the field.

691

Econometrics

apt

Asymmetric Price Transmission

Asymmetric price transmission between two time series is assessed. Several functions are available for linear and nonlinear threshold cointegration and, furthermore, for symmetric and asymmetric error correction models. A graphical user interface is also included for the major functions of the package, so users can also use them in a more intuitive way.

692

Econometrics

bayesm

Bayesian Inference for Marketing/MicroEconometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

693

Econometrics

betareg

Beta Regression

Beta regression for modeling beta-distributed dependent variables, e.g., rates and proportions. In addition to maximum likelihood regression (for both mean and precision of a beta-distributed response), bias-corrected and bias-reduced estimation as well as finite mixture models and recursive partitioning for beta regressions are provided.

694

Econometrics

bife

Binary Choice Models with Fixed Effects

Estimates fixed effects binary choice models (logit and probit) with potentially many individual fixed effects and computes average partial effects. Incidental parameter bias can be reduced with an asymptotic bias-correction proposed by Fernandez-Val (2009) <doi:10.1016/j.jeconom.2009.02.007>.

695

Econometrics

bimets

Time Series and Econometric Modeling

Time series analysis, (dis)aggregation and manipulation, e.g. time series extension, merge, projection, lag, lead, delta, moving and cumulative average and product, selection by index, date and year-period, conversion to daily, monthly, quarterly, (semi)annually. Simultaneous equation models definition, estimation, simulation and forecasting with coefficient restrictions, error autocorrelation, exogenization, add-factors, impact and interim multipliers analysis, conditional equation evaluation, endogenous targeting and model renormalization.

696

Econometrics

BMA

Bayesian Model Averaging

Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).

697

Econometrics

BMS

Bayesian Model Averaging Library

Bayesian model averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, flexible and hyper-g priors) and 5 kinds of model priors; moreover, model sampling is by enumeration or various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, and coefficient and predictive densities. Plotting functions are available for posterior model size, MCMC convergence, predictive and coefficient densities, best models representation, and BMA comparison.

698

Econometrics

boot

Bootstrap Functions (Originally by Angelo Canty for S)

Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.

699

Econometrics

bootstrap

Functions for the Book “An Introduction to the Bootstrap”

Software (bootstrap, cross-validation, jackknife) and data for the book “An Introduction to the Bootstrap” by B. Efron and R. Tibshirani, 1993, Chapman and Hall. This package is primarily provided for projects already based on it, and for support of the book. New projects should preferentially use the recommended package “boot”.
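
The core resampling idea behind boot and bootstrap is compact enough to sketch outside R; the following Python illustration computes a nonparametric percentile bootstrap confidence interval for the mean (an illustrative sketch, not either package's API).

```python
# Percentile bootstrap: resample the data with replacement, recompute the
# statistic on each resample, and take quantiles of the replicates.
import numpy as np

def percentile_ci(x, stat=np.mean, n_boot=5000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    reps = np.array([stat(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

x = np.random.default_rng(7).normal(10.0, 2.0, size=100)
lo, hi = percentile_ci(x)
print(lo, hi)
```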

700

Econometrics

brglm

Bias Reduction in BinomialResponse Generalized Linear Models

Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood, where penalization is by Jeffreys' invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite, even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as ‘glm’. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.

701

Econometrics

CADFtest

A Package to Perform Covariate Augmented Dickey-Fuller Unit Root Tests

Hansen’s (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, then an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).

702

Econometrics

car (core)

Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.

703

Econometrics

CDNmoney

Components of Canadian Monetary and Credit Aggregates

Components of Canadian Credit Aggregates and Monetary Aggregates with continuity adjustments.

704

Econometrics

censReg

Censored Regression (Tobit) Models

Maximum Likelihood estimation of censored regression (Tobit) models with crosssectional and panel data.

705

Econometrics

clubSandwich

Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections

Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) <http://www.statcan.gc.ca/pub/12-001-x/2002002/article/9058-eng.pdf> and developed further by Pustejovsky and Tipton (2017) <doi:10.1080/07350015.2016.1247004>. The package includes functions for estimating the variance-covariance matrix and for testing single and multiple contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddlepoint corrections. Tests of multiple contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), ivreg() (from package ‘AER’), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and rma.mv() (from ‘metafor’).
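
The basic cluster-robust sandwich estimator that these corrections build on is short enough to write out; the Python sketch below implements the plain CR0 form (clubSandwich's bias-reduced CR2 adds small-sample adjustments on top) and is illustrative only.

```python
# CR0 cluster-robust variance for OLS: bread * meat * bread, where the
# "meat" sums the score contributions within each cluster before squaring.
import numpy as np

def cluster_robust_vcov(X, resid, clusters):
    bread = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        Xg, eg = X[clusters == g], resid[clusters == g]
        sg = Xg.T @ eg                     # within-cluster score sum
        meat += np.outer(sg, sg)
    return bread @ meat @ bread

rng = np.random.default_rng(0)
n, G = 200, 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])
clusters = np.repeat(np.arange(G), n // G)
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
V = cluster_robust_vcov(X, y - X @ beta, clusters)
print(np.sqrt(np.diag(V)))                 # cluster-robust standard errors
```

With every observation in its own cluster, CR0 collapses to the familiar heteroskedasticity-robust HC0 estimator, which makes a convenient sanity check.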

706

Econometrics

clusterSEs

Calculate Cluster-Robust p-Values and Confidence Intervals

Calculate p-values and confidence intervals using cluster-adjusted t-statistics (based on Ibragimov and Muller (2010) <doi:10.1198/jbes.2009.08046>), pairs cluster bootstrapped t-statistics, and wild cluster bootstrapped t-statistics (the latter two techniques based on Cameron, Gelbach, and Miller (2008) <doi:10.1162/rest.90.3.414>). Procedures are included for use with GLM, ivreg, plm (pooling or fixed effects), and mlogit models.

707

Econometrics

crch

Censored Regression with Conditional Heteroscedasticity

Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of models, the intercepts depend on the thresholds that define the intervals.

708

Econometrics

decompr

Global-Value-Chain Decomposition

Two global-value-chain decompositions are implemented. Firstly, the Wang-Wei-Zhu (Wang, Wei, and Zhu, 2013) algorithm splits bilateral gross exports into 16 value-added components. Secondly, the Leontief decomposition (default) derives the value-added origin of exports by country and industry, which is also based on Wang, Wei, and Zhu (Wang, Z., S.-J. Wei, and K. Zhu. 2013. “Quantifying International Production Sharing at the Bilateral and Sector Levels.”).

709

Econometrics

dlsem

Distributed-Lag Linear Structural Equation Models

Inference functionalities for distributed-lag linear structural equation models (DLSEMs). DLSEMs are Markovian structural causal models where each factor of the joint probability distribution is a distributed-lag linear regression model (Magrini, 2018) <doi:10.2478/bile-2018-0012>. DLSEMs account for temporal delays in the dependence relationships among the variables and allow dynamic causal inference by assessing causal effects at different time lags. Endpoint-constrained quadratic, quadratic decreasing and gamma lag shapes are available.

710

Econometrics

dynlm

Dynamic Linear Regression

Dynamic linear models and time series regression.

711

Econometrics

Ecdat

Data Sets for Econometrics

Data sets for econometrics.

712

Econometrics

effects

Effect Displays for Linear, Generalized Linear, and Other Models

Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.

713

Econometrics

erer

Empirical Research in Economics with R

Functions, datasets, and sample code related to the book ‘Empirical Research in Economics: Growing up with R’ by Dr. Changyou Sun are included. Marginal effects for binary or ordered choice models can be calculated. Static and dynamic Almost Ideal Demand System (AIDS) models can be estimated. A typical event analysis in finance can be conducted with several included functions.

714

Econometrics

estimatr

Fast Estimators for DesignBased Inference

Fast procedures for a small set of commonly used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, instrumental variables regression, difference-in-means, Horvitz-Thompson estimation, and regression improving precision of experimental estimates by interacting treatment with centered pre-treatment covariates, introduced by Lin (2013) <doi:10.1214/12-AOAS583>.

715

Econometrics

expsmooth

Data Sets from “Forecasting with Exponential Smoothing”

Data sets from the book “Forecasting with exponential smoothing: the state space approach” by Hyndman, Koehler, Ord and Snyder (Springer, 2008).

716

Econometrics

ExtremeBounds

Extreme Bounds Analysis (EBA)

An implementation of Extreme Bounds Analysis (EBA), a global sensitivity analysis that examines the robustness of determinants in regression models. The package supports both Leamer’s and SalaiMartin’s versions of EBA, and allows users to customize all aspects of the analysis.

717

Econometrics

feisr

Estimating Fixed Effects Individual Slope Models

Provides the function feis() to estimate fixed effects individual slope (FEIS) models. The FEIS model constitutes a more general version of the often-used fixed effects (FE) panel model, as implemented in the package ‘plm’ by Croissant and Millo (2008) <doi:10.18637/jss.v027.i02>. In FEIS models, data are not only person “demeaned” like in conventional FE models, but “detrended” by the predicted individual slope of each person or group. Estimation is performed by applying least squares lm() to the transformed data. For more details on FEIS models see Bruederl and Ludwig (2015, ISBN:1446252442); Frees (2001) <doi:10.2307/3316008>; Polachek and Kim (1994) <doi:10.1016/0304-4076(94)90075-2>; Wooldridge (2010, ISBN:0262294354). To test consistency of conventional FE and random effects estimators against heterogeneous slopes, the package also provides the functions feistest() for an artificial regression test and bsfeistest() for a bootstrapped version of the Hausman test.

718

Econometrics

fma

Data Sets from “Forecasting: Methods and Applications” by Makridakis, Wheelwright & Hyndman (1998)

All data sets from “Forecasting: methods and applications” by Makridakis, Wheelwright & Hyndman (Wiley, 3rd ed., 1998).

719

Econometrics

forecast (core)

Forecasting Functions for Time Series and Linear Models

Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.

720

Econometrics

frm

Regression Analysis of Fractional Responses

Estimation and specification analysis of one- and two-part fractional regression models and calculation of partial effects.

721

Econometrics

frontier

Stochastic Frontier Analysis

Maximum Likelihood Estimation of Stochastic Frontier Production and Cost Functions. Two specifications are available: the error components specification with time-varying efficiencies (Battese and Coelli, 1992) and a model specification in which the firm effects are directly influenced by a number of variables (Battese and Coelli, 1995).

722

Econometrics

fxregime

Exchange Rate Regime Analysis

Exchange rate regression and structural change tools for estimating, testing, dating, and monitoring (de facto) exchange rate regimes.

723

Econometrics

gam

Generalized Additive Models

Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).

724

Econometrics

gamlss

Generalised Additive Models for Location Scale and Shape

Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.

725

Econometrics

geepack

Generalized Estimating Equation Package

Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses.

726

Econometrics

gets

General-to-Specific (GETS) Modelling and Indicator Saturation Methods

Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean.

727

Econometrics

glmx

Generalized Linear Models Extended

Extended techniques for generalized linear models (GLMs), especially for binary responses, including parametric links and heteroskedastic latent variables.

728

Econometrics

gmm

Generalized Method of Moments and Generalized Empirical Likelihood

A complete suite to estimate models based on moment conditions. It includes the two-step Generalized Method of Moments (Hansen 1982; <doi:10.2307/1912775>), the iterated GMM and continuous updated estimator (Hansen, Heaton and Yaron 1996; <doi:10.2307/1392442>) and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>).

729

Econometrics

gmnl

Multinomial Logit Models with Random Parameters

An implementation of the maximum simulated likelihood method for the estimation of multinomial logit models with random coefficients. Specifically, it allows estimating models with continuous heterogeneity such as the mixed multinomial logit and the generalized multinomial logit. It also allows estimating models with discrete heterogeneity such as the latent class and the mixed-mixed multinomial logit model.

730

Econometrics

gravity

Estimation Methods for Gravity Models

A wrapper of different standard estimation methods for gravity models. This package provides estimation methods for log-log models and multiplicative models.

731

Econometrics

gvc

Global Value Chains Tools

Several tools for Global Value Chain (‘GVC’) analysis are implemented.

732

Econometrics

Hmisc

Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and HTML code, and recoding variables.

733

Econometrics

ineq

Measuring Inequality, Concentration, and Poverty

Inequality, concentration, and poverty measures. Lorenz curves (empirical and theoretical).

734

Econometrics

ivfixed

Instrumental Fixed Effect Panel Data Model

Fits an instrumental least squares dummy variable model.

735

Econometrics

ivpack

Instrumental Variable Estimation

This package contains functions for carrying out instrumental variable estimation of causal effects and power analyses for instrumental variable studies.

736

Econometrics

ivpanel

Instrumental Panel Data Models

Fit the instrumental panel data models: the fixed effects, random effects and between models.

737

Econometrics

ivprobit

Instrumental Variables Probit Model

Computes the instrumental variables probit model using Amemiya's Generalized Least Squares estimators (Amemiya, Takeshi, (1978) <doi:10.2307/1911443>).

738

Econometrics

LARF

Local Average Response Functions for Instrumental Variable Estimation of Treatment Effects

Provides instrumental variable estimation of treatment effects when both the endogenous treatment and its instrument are binary. Applicable to both binary and continuous outcomes.

739

Econometrics

lavaan

Latent Variable Analysis

Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.

740

Econometrics

lfe

Linear Group Fixed Effects

Transforms away factors with many levels prior to doing an OLS. Useful for estimating linear models with multiple group fixed effects, and for estimating linear models that use factors with many levels as pure control variables. Includes support for instrumental variables, conditional F statistics for weak instruments, robust and multi-way clustered standard errors, as well as limited mobility bias correction.

741

Econometrics

LinRegInteractive

Interactive Interpretation of Linear Regression Models

Interactive visualization of effects, response functions and marginal effects for different kinds of regression models. In this version linear regression models, generalized linear models, generalized additive models and linear mixed-effects models are supported. Major features are the interactive approach and the handling of the effects of categorical covariates: if two or more factors are used as covariates, every combination of the levels of each factor is treated separately. The automatic calculation of marginal effects and a number of possibilities to customize the graphical output are useful features as well.

742

Econometrics

lme4

Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".
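As a minimal sketch of the interface (assuming 'lme4' is installed; the 'sleepstudy' data set ships with the package), a model with correlated random intercepts and slopes can be fitted with lmer():

```r
library(lme4)

# Reaction time as a function of Days of sleep deprivation, with
# correlated random intercepts and slopes for each Subject
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fm)
```

The `(Days | Subject)` term is lme4's formula notation for subject-level random effects.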

743

Econometrics

lmtest (core)

Testing Linear Regression Models

A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.
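For illustration, a sketch (assuming 'lmtest' is installed; 'cars' is a base-R data set) of two common diagnostic tests applied to a fitted lm() object:

```r
library(lmtest)

fit <- lm(dist ~ speed, data = cars)
bptest(fit)  # Breusch-Pagan test for heteroskedasticity
dwtest(fit)  # Durbin-Watson test for autocorrelation
```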

744

Econometrics

margins

Marginal Effects for Model Objects

An R port of Stata’s ‘margins’ command, which can be used to calculate marginal (or partial) effects from model objects.

745

Econometrics

MASS

Support Functions and Datasets for Venables and Ripley’s MASS

Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).

746

Econometrics

matchingMarkets

Analysis of Stable Matchings

Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.

747

Econometrics

Matrix

Sparse and Dense Matrix Classes and Methods

A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using ‘LAPACK’ and ‘SuiteSparse’ libraries.
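A small sketch (assuming 'Matrix' is installed) of constructing a sparse matrix in triplet form and using sparse arithmetic:

```r
library(Matrix)

# 3 x 3 sparse matrix with nonzero entries at (1,2) and (3,3)
M <- sparseMatrix(i = c(1, 3), j = c(2, 3), x = c(5, 7), dims = c(3, 3))
M %*% t(M)  # methods for sparse classes dispatch automatically
```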

748

Econometrics

Mcomp

Data from the M-Competitions

The 1001 time series from the M-competition (Makridakis et al. 1982) <doi:10.1002/for.3980010202> and the 3003 time series from the IJF-M3 competition (Makridakis and Hibon, 2000) <doi:10.1016/S0169-2070(00)00057-1>.

749

Econometrics

meboot

Maximum Entropy Bootstrap for Time Series

Maximum entropy density based dependent data bootstrap. An algorithm is provided to create a population of time series (ensemble) without assuming stationarity. The reference paper (Vinod, H.D., 2004) explains how the algorithm satisfies the ergodic theorem and the central limit theorem.

750

Econometrics

mgcv

Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
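A minimal sketch (assuming 'mgcv' is installed; the data below are simulated) of gam() with an automatically smoothed term:

```r
library(mgcv)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# s(x) requests a smooth of x; the degree of smoothness is
# estimated automatically rather than fixed in advance
fit <- gam(y ~ s(x))
summary(fit)
```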

751

Econometrics

micEcon

Microeconomic Analysis and Modelling

Various tools for microeconomic analysis and microeconomic modelling, e.g. estimating quadratic, Cobb-Douglas and Translog functions, calculating partial derivatives and elasticities of these functions, and calculating Hessian matrices, checking curvature and preparing restrictions for imposing monotonicity of Translog functions.

752

Econometrics

micEconAids

Demand Analysis with the Almost Ideal Demand System (AIDS)

Functions and tools for analysing consumer demand with the Almost Ideal Demand System (AIDS) suggested by Deaton and Muellbauer (1980).

753

Econometrics

micEconCES

Analysis with the Constant Elasticity of Substitution (CES) Function

Tools for economic analysis and economic modelling with a Constant Elasticity of Substitution (CES) function.

754

Econometrics

micEconSNQP

Symmetric Normalized Quadratic Profit Function

Production analysis with the Symmetric Normalized Quadratic (SNQ) profit function.

755

Econometrics

midasr

Mixed Data Sampling Regression

Methods and tools for mixed frequency time series data analysis. Allows estimation, model selection and forecasting for MIDAS regressions.

756

Econometrics

mlogit

Multinomial Logit Models

Maximum Likelihood estimation of random utility discrete choice models, as described in Kenneth Train (2009), Discrete Choice Methods with Simulation <doi:10.1017/CBO9780511805271>.

757

Econometrics

MNP

R Package for Fitting the Multinomial Probit Model

Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). "A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation," Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. <doi:10.1016/j.jeconom.2004.02.002> Detailed examples are given in Imai and van Dyk (2005). "MNP: R Package for Fitting the Multinomial Probit Model." Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. <doi:10.18637/jss.v014.i03>.

758

Econometrics

multiwayvcov

Multi-Way Standard Error Clustering

Exports two functions implementing multi-way clustering using the method suggested by Cameron, Gelbach, & Miller (2011) and cluster (or block) bootstrapping for estimating variance-covariance matrices. Normal one- and two-way clustering matches the results of other common statistical packages. Missing values are handled transparently and rudimentary parallelization support is provided.

759

Econometrics

mvProbit

Multivariate Probit Models

Tools for estimating multivariate probit models, calculating conditional and unconditional expectations, and calculating marginal effects on conditional and unconditional expectations.

760

Econometrics

nlme

Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixed-effects models.

761

Econometrics

nnet

Feed-Forward Neural Networks and Multinomial Log-Linear Models

Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.

762

Econometrics

nonnest2

Tests of Non-Nested Models

Testing non-nested models via theory supplied by Vuong (1989) <doi:10.2307/1912557>. Includes tests of model distinguishability and of model fit that can be applied to both nested and non-nested models. Also includes functionality to obtain confidence intervals associated with AIC and BIC. This material is partially based on work supported by the National Science Foundation under Grant Number SES-1061334.

763

Econometrics

np

Nonparametric Kernel Smoothing Methods for Mixed Data Types

Nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor data types. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC, <http://www.nserc-crsng.gc.ca>), the Social Sciences and Humanities Research Council of Canada (SSHRC, <http://www.sshrc-crsh.gc.ca>), and the Shared Hierarchical Academic Research Computing Network (SHARCNET, <http://www.sharcnet.ca>).

764

Econometrics

nse

Numerical Standard Errors Computation in R

Collection of functions designed to calculate numerical standard error (NSE) of univariate time series as described in Ardia et al. (2018) <doi:10.2139/ssrn.2741587> and Ardia and Bluteau (2017) <doi:10.21105/joss.00172>.

765

Econometrics

ordinal

Regression Models for Ordinal Data

Implementation of cumulative link (mixed) models also known as ordered regression models, proportional odds models, proportional hazards models for grouped survival times and ordered logit/probit/… models. Estimation is via maximum likelihood and mixed models are fitted with the Laplace approximation and adaptive Gauss-Hermite quadrature. Multiple random effect terms are allowed and they may be nested, crossed or partially nested/crossed. Restrictions of symmetry and equidistance can be imposed on the thresholds (cut-points/intercepts). Standard model methods are available (summary, anova, drop-methods, step, confint, predict etc.) in addition to profile methods and slice methods for visualizing the likelihood function and checking convergence.

766

Econometrics

OrthoPanels

Dynamic Panel Models with Orthogonal Reparameterization of Fixed Effects

Implements the orthogonal reparameterization approach recommended by Lancaster (2002) to estimate dynamic panel models with fixed effects (and optionally: panel-specific intercepts). The approach uses a likelihood-based estimator and produces estimates that are asymptotically unbiased as N goes to infinity, with a T as low as 2.

767

Econometrics

pampe

Implementation of the Panel Data Approach Method for Program Evaluation

Implements the Panel Data Approach Method for program evaluation as developed in Hsiao, Ching and Ki Wan (2012). pampe estimates the effect of an intervention by comparing the evolution of the outcome for a unit affected by an intervention or treatment to the evolution of the unit had it not been affected by the intervention.

768

Econometrics

panelAR

Estimation of Linear AR(1) Panel Data Models with Cross-Sectional Heteroskedasticity and/or Correlation

The package estimates linear models on panel data structures in the presence of AR(1)-type autocorrelation as well as panel heteroskedasticity and/or contemporaneous correlation. First, AR(1)-type autocorrelation is addressed via a two-step Prais-Winsten feasible generalized least squares (FGLS) procedure, where the autocorrelation coefficients may be panel-specific. A number of common estimators for the autocorrelation coefficient are supported. In case of panel heteroskedasticity, one can choose to use a sandwich-type robust standard error estimator with OLS or a panel weighted least squares estimator after the two-step Prais-Winsten estimator. Alternatively, if panels are both heteroskedastic and contemporaneously correlated, the package supports panel-corrected standard errors (PCSEs) as well as the Parks-Kmenta FGLS estimator.

769

Econometrics

Paneldata

Linear Models for Panel Data

Linear models for panel data: the fixed effect model and the random effect model.

770

Econometrics

panelr

Regression Models and Utilities for Repeated Measures and Panel Data

Provides an object type and associated tools for storing and wrangling panel data. Implements several methods for creating regression models that take advantage of the unique aspects of panel data. Among other capabilities, automates the "within-between" (also known as "between-within" and "hybrid") panel regression specification that combines the desirable aspects of both fixed effects and random effects econometric models and fits them as multilevel models (Allison, 2009 <doi:10.4135/9781412993869.d33>; Bell & Jones, 2015 <doi:10.1017/psrm.2014.7>). These models can also be estimated via generalized estimating equations (GEE; McNeish, 2019 <doi:10.1080/00273171.2019.1602504>) and Bayesian estimation is (optionally) supported via 'Stan'. Supports estimation of asymmetric effects models via first differences (Allison, 2019 <doi:10.1177/2378023119826441>) as well as a generalized linear model extension thereof using GEE.

771

Econometrics

panelvar

Panel Vector Autoregression

We extend two general method of moments estimators to panel vector autoregression models (PVAR) with p lags of endogenous variables, predetermined and strictly exogenous variables. This general PVAR model contains the first-difference GMM estimator by Holtz-Eakin et al. (1988) <doi:10.2307/1913103> and Arellano and Bond (1991) <doi:10.2307/2297968>, and the system GMM estimator by Blundell and Bond (1998) <doi:10.1016/S0304-4076(98)00009-8>. We also provide specification tests (Hansen overidentification test, lag selection criterion and stability test of the PVAR polynomial) and classical structural analysis for PVAR models such as orthogonal and generalized impulse response functions, bootstrapped confidence intervals for impulse response analysis and forecast error variance decompositions.

772

Econometrics

PANICr

PANIC Tests of Nonstationarity

A methodology that makes use of the factor structure of large dimensional panels to understand the nature of nonstationarity inherent in data. This is referred to as PANIC, Panel Analysis of Nonstationarity in Idiosyncratic and Common Components. PANIC (2004) <doi:10.1111/j.1468-0262.2004.00528.x> includes valid pooling methods that allow panel tests to be constructed. PANIC (2004) can detect whether the nonstationarity in a series is pervasive, or variable specific, or both. PANIC (2010) <doi:10.1017/s0266466609990478> includes two new tests on the idiosyncratic component that estimate the pooled autoregressive coefficient and sample moment, respectively. The PANIC model approximates the number of factors based on Bai and Ng (2002) <doi:10.1111/1468-0262.00273>.

773

Econometrics

pco

Panel Cointegration Tests

Computation of the Pedroni (1999) panel cointegration test statistics. Reported are the empirical and the standardized values.

774

Econometrics

pcse

Panel-Corrected Standard Error Estimation in R

A function to estimate panelcorrected standard errors. Data may contain balanced or unbalanced panels.

775

Econometrics

pder

Panel Data Econometrics with R

Data sets for the book 'Panel Data Econometrics with R'.

776

Econometrics

pdR

Threshold Model and Unit Root Tests in Cross-Section and Time Series Data

Threshold model, panel version of the Hylleberg et al. (1990) <doi:10.1016/0304-4076(90)90080-D> seasonal unit root tests, and the panel unit root test of Chang (2002) <doi:10.1016/S0304-4076(02)00095-7>.

777

Econometrics

pglm

Panel Generalized Linear Models

Estimation of panel models for glm-like models: this includes binomial models (logit and probit), count models (Poisson and NegBin), and ordered models (logit and probit).

778

Econometrics

phtt

Panel Data Analysis with Heterogeneous Time Trends

The package provides estimation procedures for panel data with large dimensions n, T, and general forms of unobservable heterogeneous effects. Particularly, the estimation procedures are those of Bai (2009) and Kneip, Sickles, and Song (2012), which complement one another very well: both models assume the unobservable heterogeneous effects to have a factor structure. The method of Bai (2009) assumes that the factors are stationary, whereas the method of Kneip et al. (2012) allows the factors to be nonstationary. Additionally, the ‘phtt’ package provides a wide range of dimensionality criteria in order to estimate the number of the unobserved factors simultaneously with the remaining model parameters.

779

Econometrics

plm (core)

Linear Models for Panel Data

A set of estimators and tests for panel data econometrics, as described in Baltagi (2013), Econometric Analysis of Panel Data, ISBN-13: 978-1-118-67232-7; Hsiao (2014), Analysis of Panel Data <doi:10.1017/CBO9781139839327>; and Croissant and Millo (2018), Panel Data Econometrics with R, ISBN: 978-1-118-94918-4.
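A brief sketch (assuming 'plm' is installed; the 'Grunfeld' data set ships with the package) of the fixed-effects ("within") estimator:

```r
library(plm)
data("Grunfeld", package = "plm")

# Investment regressed on firm value and capital stock,
# with firm fixed effects ("within" transformation)
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")
summary(fe)
```

Swapping `model = "within"` for `"random"` or `"pooling"` fits the corresponding alternative estimators.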

780

Econometrics

pscl

Political Science Computational Laboratory

Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.

781

Econometrics

psidR

Build Panel Data Sets from PSID Raw Data

Makes it easy to build panel data in wide format from Panel Survey of Income Dynamics ('PSID') delivered raw data. Downloads data directly from the PSID server using the 'SAScii' package. 'psidR' takes care of merging data from each wave onto a cross-period index file, so that individuals can be followed over time. The user must specify which years they are interested in, and the 'PSID' variable names (e.g. ER21003) for each year (they differ in each year). The package offers helper functions to retrieve variable names from different waves. There are different panel data designs and sample subsetting criteria implemented ("SRC", "SEO", "immigrant" and "latino" samples).

782

Econometrics

PSTR

Panel Smooth Transition Regression Modelling

Provides Panel Smooth Transition Regression (PSTR) modelling. The modelling procedure consists of three stages: specification, estimation and evaluation. The package offers sharp tools to help the user conduct model specification tests, estimate PSTR models, and evaluate them. The tests implemented in the package allow for cluster-dependency and are heteroskedasticity-consistent. The wild bootstrap and wild cluster bootstrap tests are also implemented. Parallel computation is implemented (as an option) in some functions, especially the bootstrap tests, so the package is well suited to running on many cores of supercomputing servers.

783

Econometrics

pwt

Penn World Table (Versions 5.6, 6.x, 7.x)

The Penn World Table provides purchasing power parity and national income accounts converted to international prices for 189 countries for some or all of the years 1950-2010.

784

Econometrics

pwt8

Penn World Table (Version 8.x)

The Penn World Table 8.x provides information on relative levels of income, output, inputs, and productivity for 167 countries between 1950 and 2011.

785

Econometrics

pwt9

Penn World Table (Version 9.x)

The Penn World Table 9.x (<http://www.ggdc.net/pwt/>) provides information on relative levels of income, output, inputs, and productivity for 182 countries between 1950 and 2017.

786

Econometrics

quantreg

Quantile Regression

Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and nonparametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
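As an illustrative sketch (assuming 'quantreg' is installed; the 'engel' data set ships with the package), median regression via rq():

```r
library(quantreg)
data(engel)

# Conditional median (tau = 0.5) of food expenditure given income
fit <- rq(foodexp ~ income, tau = 0.5, data = engel)
summary(fit)
```

Passing a vector such as `tau = c(0.25, 0.5, 0.75)` fits several conditional quantiles at once.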

787

Econometrics

Rchoice

Discrete Choice (Binary, Poisson and Ordered) Models with Random Parameters

An implementation of the simulated maximum likelihood method for the estimation of Binary (Probit and Logit), Ordered (Probit and Logit) and Poisson models with random parameters for cross-sectional and longitudinal data.

788

Econometrics

rdd

Regression Discontinuity Estimation

Provides the tools to undertake estimation in Regression Discontinuity Designs. Both sharp and fuzzy designs are supported. Estimation is accomplished using local linear regression. A provided function will utilize the Imbens-Kalyanaraman optimal bandwidth calculation. A function is also included to test the assumption of no-sorting effects.

789

Econometrics

rddapp

Regression Discontinuity Design Application

Estimation of both single- and multiple-assignment Regression Discontinuity Designs (RDDs). Provides both parametric (global) and nonparametric (local) estimation choices for both sharp and fuzzy designs, along with power analysis and assumption checks. Introductions to the underlying logic and analysis of RDDs are in Thistlethwaite, D. L., Campbell, D. T. (1960) <doi:10.1037/h0044319> and Lee, D. S., Lemieux, T. (2010) <doi:10.1257/jel.48.2.281>.

790

Econometrics

rddtools

Toolbox for Regression Discontinuity Design (‘RDD’)

Set of functions for Regression Discontinuity Design (‘RDD’), for data visualisation, estimation and testing.

791

Econometrics

rdlocrand

Local Randomization Methods for RD Designs

The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. Under the local randomization approach, RD designs can be interpreted as randomized experiments inside a window around the cutoff. This package provides tools to perform randomization inference for RD designs under local randomization: rdrandinf() to perform hypothesis testing using randomization inference, rdwinselect() to select a window around the cutoff in which randomization is likely to hold, rdsensitivity() to assess the sensitivity of the results to different window lengths and null hypotheses and rdrbounds() to construct Rosenbaum bounds for sensitivity to unobserved confounders.

792

Econometrics

rdmulti

Analysis of RD Designs with Multiple Cutoffs or Scores

The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The 'rdmulti' package provides tools to analyze RD designs with multiple cutoffs or scores: rdmc() estimates pooled and cutoff-specific effects for multi-cutoff designs, rdmcplot() draws RD plots for multi-cutoff designs and rdms() estimates effects in cumulative cutoffs or multi-score designs. See Cattaneo, Titiunik and Vazquez-Bare (2018) <https://sites.google.com/site/rdpackages/rdmulti/CattaneoTitiunikVazquezBare_2018_rdmulti.pdf> for further methodological details.

793

Econometrics

rdpower

Power Calculations for RD Designs

The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The 'rdpower' package provides tools to perform power and sample size calculations in RD designs: rdpower() calculates the power of an RD design and rdsampsi() calculates the required sample size to achieve a desired power. See Cattaneo, Titiunik and Vazquez-Bare (2018) <https://sites.google.com/site/rdpackages/rdpower/CattaneoTitiunikVazquezBare_2018_Stata.pdf> for further methodological details.

794

Econometrics

rdrobust

Robust Data-Driven Statistical Inference in Regression-Discontinuity Designs

Regression-discontinuity (RD) designs are quasi-experimental research designs popular in social, behavioral and natural sciences. The RD design is usually employed to study the (local) causal effect of a treatment, intervention or policy. This package provides tools for data-driven graphical and analytical statistical inference in RD designs: rdrobust() to construct local-polynomial point estimators and robust confidence intervals for average treatment effects at the cutoff in Sharp, Fuzzy and Kink RD settings, rdbwselect() to perform bandwidth selection for the different procedures implemented, and rdplot() to conduct exploratory data analysis (RD plots).

795

Econometrics

reldist

Relative Distribution Methods

Tools for the comparison of distributions. This includes nonparametric estimation of the relative distribution PDF and CDF and numerical summaries as described in "Relative Distribution Methods in the Social Sciences" by Mark S. Handcock and Martina Morris, Springer-Verlag, 1999, ISBN 0-387-98778-9.

796

Econometrics

REndo

Fitting Linear Models with Endogenous Regressors using Latent Instrumental Variables

Fits linear models with endogenous regressors using latent instrumental variable approaches. The methods included in the package are Lewbel's (1997) <doi:10.2307/2171884> higher moments approach as well as Lewbel's (2012) <doi:10.1080/07350015.2012.643126> heteroscedasticity approach, Park and Gupta's (2012) <doi:10.1287/mksc.1120.0718> joint estimation method that uses Gaussian copula and Kim and Frees's (2007) <doi:10.1007/s11336-007-9008-1> multilevel generalized method of moments approach that deals with endogeneity in a multilevel setting. These are statistical techniques to address the endogeneity problem where no external instrumental variables are needed. Note that with version 2.0.0 sweeping changes were introduced which greatly improve functionality and usability but break backwards compatibility.

797

Econometrics

rms

Regression Modeling Strategies

Regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. 'rms' is a collection of functions that assist with and streamline modeling. It also contains functions for binary and ordinal logistic regression models, ordinal models for continuous Y with a variety of distribution families, and the Buckley-James multiple regression model for right-censored responses; it implements penalized maximum likelihood estimation for logistic and ordinary linear models. 'rms' works with almost any regression model, but it was especially written to work with binary or ordinal regression models, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.

798

Econometrics

RSGHB

Functions for Hierarchical Bayesian Estimation: A Flexible Approach

Functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well as ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train's Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train's original Gauss and Matlab code can be found here: <http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html> See Train's chapter on HB in Discrete Choice Methods with Simulation here: <http://elsa.berkeley.edu/books/choice2.html>; and his paper on using HB with non-normal distributions here: <http://eml.berkeley.edu//~train/trainsonnier.pdf>. The authors would also like to thank the invaluable contributions of Stephane Hess and the Choice Modelling Centre: <https://cmc.leeds.ac.uk/>.

799

Econometrics

rUnemploymentData

Data and Functions for USA State and County Unemployment Data

Contains data and visualization functions for USA unemployment data. Data comes from the US Bureau of Labor Statistics (BLS). State data is in ?df_state_unemployment and covers 2000-2013. County data is in ?df_county_unemployment and covers 1990-2013. Choropleth maps of the data can be generated with ?state_unemployment_choropleth() and ?county_unemployment_choropleth() respectively.

800

Econometrics

sampleSelection

Sample Selection Models

Two-step and maximum likelihood estimation of Heckman-type sample selection models: standard sample selection models (Tobit-2), endogenous switching regression models (Tobit-5), sample selection models with a binary dependent outcome variable, interval regression with sample selection (only ML estimation), and endogenous treatment effects models. These methods are described in the three vignettes that are included in this package and in econometric textbooks such as Greene (2011, Econometric Analysis, 7th edition, Pearson).

801

Econometrics

sandwich (core)

Robust Covariance Matrix Estimators

Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
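A short sketch (assuming 'sandwich' is installed; 'cars' is a base-R data set) of a heteroskedasticity-consistent covariance estimate for an lm() fit:

```r
library(sandwich)

fit <- lm(dist ~ speed, data = cars)
V <- vcovHC(fit, type = "HC3")  # HC3 covariance matrix estimator
sqrt(diag(V))                   # robust standard errors
```

The robust covariance matrix can be passed to inference functions (e.g. coeftest() from 'lmtest') in place of the default estimate.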

802

Econometrics

segmented

Regression Models with Break-Points / Change-Points Estimation

Given a regression model, segmented 'updates' it by adding one or more segmented (i.e., piecewise linear) relationships. Several variables with multiple breakpoints are allowed. The estimation method is discussed in Muggeo (2003, <doi:10.1002/sim.1545>) and illustrated in Muggeo (2008, <https://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf>). An approach for hypothesis testing is presented in Muggeo (2016, <doi:10.1080/00949655.2016.1149855>), and interval estimation for the breakpoint is discussed in Muggeo (2017, <doi:10.1111/anzs.12200>).
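A minimal sketch (assuming 'segmented' is installed; the data below are simulated with a true breakpoint at 60) of adding one breakpoint to a fitted linear model:

```r
library(segmented)

set.seed(2)
x <- 1:100
y <- 2 + 0.5 * x - 0.4 * pmax(x - 60, 0) + rnorm(100)

fit <- lm(y ~ x)
# seg.Z names the segmented variable; psi gives a starting breakpoint guess
seg <- segmented(fit, seg.Z = ~x, psi = 50)
summary(seg)
```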

803

Econometrics

sem

Structural Equation Models

Functions for fitting general linear structural equation models (with observed and latent variables) using the RAM approach, and for fitting structural equations in observed-variable models by two-stage least squares.

804

Econometrics

SemiParSampleSel

Semi-Parametric Sample Selection Modelling with Continuous or Discrete Response

Routine for fitting continuous or discrete response copula sample selection models with semiparametric predictors, including linear and nonlinear effects.

805

Econometrics

semsfa

Semiparametric Estimation of Stochastic Frontier Models

Semiparametric estimation of stochastic frontier models following a two-step procedure: in the first step, semiparametric or nonparametric regression techniques are used to relax parametric restrictions on the functional form representing technology, and in the second step, variance parameters are obtained by pseudo-likelihood estimators or by the method of moments.

806

Econometrics

sfa

Stochastic Frontier Analysis

Stochastic Frontier Analysis introduced by Aigner, Lovell and Schmidt (1976) and Battese and Coelli (1992, 1995).

807

Econometrics

simpleboot

Simple Bootstrap Routines

Simple bootstrap routines.

808

Econometrics

SparseM

Sparse Linear Algebra

Some basic linear algebra functionality for sparse matrices is provided, including Cholesky decomposition and backsolving, as well as standard R subsetting and Kronecker products.

809

Econometrics

spatialprobit

Spatial Probit Models

Bayesian Estimation of Spatial Probit and Tobit Models.

810

Econometrics

spatialreg

Spatial Regression Analysis

A collection of all the estimation functions for spatial cross-sectional models (on lattice/areal data using spatial weights matrices) contained up to now in ‘spdep’, ‘sphet’ and ‘spse’. These model fitting functions include maximum likelihood methods for cross-sectional models proposed by ‘Cliff’ and ‘Ord’ (1973, ISBN:0850860369) and (1981, ISBN:0850860814), fitting methods initially described by ‘Ord’ (1975) <doi:10.1080/01621459.1975.10480272>. The models are further described by ‘Anselin’ (1988) <doi:10.1007/978-94-015-7799-1>. Spatial two-stage least squares and spatial general method of moments models initially proposed by ‘Kelejian’ and ‘Prucha’ (1998) <doi:10.1023/A:1007707430416> and (1999) <doi:10.1111/1468-2354.00027> are provided. Impact methods and MCMC fitting methods proposed by ‘LeSage’ and ‘Pace’ (2009) <doi:10.1201/9781420064254> are implemented for the family of cross-sectional spatial regression models. Methods for fitting the log determinant term in maximum likelihood and MCMC fitting are compared by ‘Bivand et al.’ (2013) <doi:10.1111/gean.12008>, and model fitting methods by ‘Bivand’ and ‘Piras’ (2015) <doi:10.18637/jss.v063.i18>; both of these articles include extensive lists of references. ‘spatialreg’ >= 1.1-* correspond to ‘spdep’ >= 1.1-1, in which the model fitting functions are deprecated and pass through to ‘spatialreg’, but will mask those in ‘spatialreg’. From versions 1.2-*, the functions will be made defunct in ‘spdep’.

811

Econometrics

spfrontier

Spatial Stochastic Frontier Models

A set of tools for estimation of various spatial specifications of stochastic frontier models.

812

Econometrics

sphet

Estimation of Spatial Autoregressive Models with and without Heteroscedasticity

Generalized Method of Moments estimation of Cliff-Ord-type spatial autoregressive models with and without heteroskedasticity.

813

Econometrics

splm

Econometric Models for Spatial Panel Data

ML and GM estimation and diagnostic testing of econometric models for spatial panel data.

814

Econometrics

ssfa

Spatial Stochastic Frontier Analysis

Spatial Stochastic Frontier Analysis (SSFA) is an original method for controlling the spatial heterogeneity in Stochastic Frontier Analysis (SFA) models, for cross-sectional data, by splitting the inefficiency term into three terms: the first related to spatial peculiarities of the territory in which each single unit operates, the second related to the specific production features, and the third representing the error term.

815

Econometrics

strucchange

Testing, Monitoring, and Dating Structural Changes

Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
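A minimal sketch of the test-then-date workflow described above, on simulated data with a single mean shift (function names follow the package documentation; the data are illustrative):

```r
library(strucchange)

set.seed(42)
# A series with a mean shift at observation 70
y <- c(rnorm(70, mean = 0), rnorm(30, mean = 2))

# Generalized fluctuation test: OLS-based CUSUM process
proc <- efp(y ~ 1, type = "OLS-CUSUM")
sctest(proc)   # small p-value signals parameter instability
plot(proc)     # visualize the fluctuation process against its boundary

# Date the structural change(s) and get a confidence interval
bp <- breakpoints(y ~ 1)
confint(bp)
```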

816

Econometrics

survival

Survival Analysis

Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.

817

Econometrics

systemfit

Estimating Systems of Simultaneous Equations

Econometric estimation of simultaneous systems of linear and nonlinear equations using Ordinary Least Squares (OLS), Weighted Least Squares (WLS), Seemingly Unrelated Regressions (SUR), Two-Stage Least Squares (2SLS), Weighted Two-Stage Least Squares (W2SLS), and Three-Stage Least Squares (3SLS).

818

Econometrics

truncreg

Truncated Gaussian Regression Models

Estimation of models for truncated Gaussian variables by maximum likelihood.

819

Econometrics

tsDyn

Nonlinear Time Series Models with Regime Switching

Implements nonlinear autoregressive (AR) time series models. For univariate series, a nonparametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).

820

Econometrics

tseries (core)

Time Series Analysis and Computational Finance

Time series analysis and computational finance.

821

Econometrics

tsfa

Time Series Factor Analysis

Extraction of Factors from Multivariate Time Series. See ?00tsfaIntro for more details.

822

Econometrics

urca (core)

Unit Root and Cointegration Tests for Time Series Data

Unit root and cointegration tests encountered in applied econometric analysis are implemented.

823

Econometrics

vars

VAR Modelling

Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.
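The workflow listed above (lag selection, estimation, diagnostics, causality, IRFs, FEVD) can be sketched on the `Canada` data set shipped with the package; function names follow the package documentation, but treat this as an illustrative sketch:

```r
library(vars)

data(Canada)  # quarterly Canadian labour-market series shipped with the package

VARselect(Canada, lag.max = 4, type = "const")  # lag-order selection criteria

fit <- VAR(Canada, p = 2, type = "const")       # estimate a VAR(2)
serial.test(fit)                                # residual autocorrelation test
causality(fit, cause = "e")                     # Granger causality of employment
irf(fit, impulse = "e", response = "U", n.ahead = 8)  # impulse responses
fevd(fit, n.ahead = 8)                          # forecast error variance decomposition
```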

824

Econometrics

VGAM

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

825

Econometrics

wahc

Autocorrelation and Heteroskedasticity Correction in Fixed Effect Panel Data Model

Fit the fixed effect panel data model with heteroskedasticity and autocorrelation correction.

826

Econometrics

wbstats

Programmatic Access to Data and Statistics from the World Bank API

Tools for searching and downloading data and statistics from the World Bank Data API (<http://data.worldbank.org/developers/api-overview>) and the World Bank Data Catalog API (<http://data.worldbank.org/developers/data-catalog-api>).

827

Econometrics

wooldridge

111 Data Sets from “Introductory Econometrics: A Modern Approach, 6e” by Jeffrey M. Wooldridge

Students learning both econometrics and R may find the introduction to both challenging. However, if the text is “Introductory Econometrics: A Modern Approach” by Jeffrey M. Wooldridge, they are in luck! The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have all been compressed to a fraction of their original size and are well documented. Documentation files contain the page numbers of the text where each set is used, the original source, time of publication, and notes suggesting ideas for further exploratory data analysis and research. If one needs to brush up on model syntax, a vignette contains R solutions to examples from each chapter of the text. Data sets are from the 6th edition (Wooldridge 2016, ISBN-13: 978-1-305-27010-7), and are backwards compatible with all versions of the text.

828

Econometrics

xts

eXtensible Time Series

Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
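A minimal sketch of what xts adds on top of zoo (the toy hourly series is an illustrative assumption; function names follow the package documentation):

```r
library(xts)

# Six hourly observations; xts requires a time-based index
idx <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 3600 * 0:5
x <- xts(1:6, order.by = idx)

# ISO-8601-style range subsetting is one of xts's conveniences over plain zoo
x["2024-01-01 02:00:00/2024-01-01 04:00:00"]   # hours 02:00 through 04:00

# Apply a function over 2-hour periods
period.apply(x, endpoints(x, on = "hours", k = 2), mean)
```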

829

Econometrics

Zelig

Everyone’s Statistical Software

A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical workflow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and reweighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.

830

Econometrics

zoo (core)

S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)

An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
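A minimal sketch of the index-independence described above, aligning two irregular Date-indexed series (the toy data are an illustrative assumption; function names follow the package documentation):

```r
library(zoo)

# Two irregular daily series indexed by Date
d1 <- zoo(c(1.2, 1.5, 1.1),
          order.by = as.Date(c("2024-01-03", "2024-01-07", "2024-01-12")))
d2 <- zoo(c(10, 20),
          order.by = as.Date(c("2024-01-07", "2024-01-12")))

# merge() aligns on the union of the two indexes, filling gaps with NA
m <- merge(d1, d2)

# Standard generics carry over from ts: lag(), diff(), window(), aggregate()
na.approx(m$d2, na.rm = FALSE)  # linearly interpolate interior gaps
```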

831

Econometrics

zTree

Functions to Import Data from ‘zTree’ into R

Read ‘.xls’ and ‘.sbj’ files which are written by the Microsoft Windows program ‘zTree’. The latter is software for developing and carrying out economic experiments (see <http://www.ztree.uzh.ch/> for more information).

832

Analysis of Ecological and Environmental Data

ade4 (core)

Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.

833

Analysis of Ecological and Environmental Data

amap

Another Multidimensional Analysis Package

Tools for clustering and principal component analysis (with robust methods, and parallelized functions).

834

Analysis of Ecological and Environmental Data

analogue

Analogue and Weighted Averaging Methods for Palaeoecology

Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.

835

Analysis of Ecological and Environmental Data

aod

Analysis of Overdispersed Data

Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).

836

Analysis of Ecological and Environmental Data

ape

Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

837

Analysis of Ecological and Environmental Data

aqp

Algorithms for Quantitative Pedology

The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information; freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps/>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offer a convenient platform for bridging the gap between pedometric theory and practice.

838

Analysis of Ecological and Environmental Data

BiodiversityR

Package for Community Ecology and Suitability Analysis

Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.

839

Analysis of Ecological and Environmental Data

boussinesq

Analytic Solutions for (groundwater) Boussinesq Equation

This package is a collection of R functions implemented from published and available analytic solutions for the One-Dimensional Boussinesq Equation (groundwater). In particular, the function “beq.lin” is the analytic solution of the linearized form of the Boussinesq Equation between two different head-based boundary (Dirichlet) conditions; “beq.song” is the nonlinear power-series analytic solution of the motion of a wetting front over a dry bedrock (Song et al., 2007; see the complete reference in the function documentation). Bugs/comments/questions/collaboration of any kind are warmly welcomed.

840

Analysis of Ecological and Environmental Data

bReeze

Functions for Wind Resource Assessment

A collection of functions to analyse, visualize and interpret wind data and to calculate the potential energy production of wind turbines.

841

Analysis of Ecological and Environmental Data

CircStats

Circular Statistics, from “Topics in Circular Statistics” (2001)

Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.

842

Analysis of Ecological and Environmental Data

circular

Circular Statistics

Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.

843

Analysis of Ecological and Environmental Data

cluster (core)

“Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al.

Methods for cluster analysis. A much extended version of the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.

844

Analysis of Ecological and Environmental Data

cocorresp

Co-Correspondence Analysis Methods

Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximises the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities.

845

Analysis of Ecological and Environmental Data

Distance

Distance Sampling Detection Function and Abundance Estimation

A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided.

846

Analysis of Ecological and Environmental Data

diveMove

Dive Analysis and Calibration

Utilities to represent, visualize, filter, analyse, and summarize time-depth recorder (TDR) data. Miscellaneous functions for handling location data are also provided.

847

Analysis of Ecological and Environmental Data

dse

Dynamic Systems Estimation (Time Series Package)

Tools for multivariate, linear, time-invariant, time series models. This includes ARMA and state-space representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state space model, and state-space model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.

848

Analysis of Ecological and Environmental Data

DSpat

Spatial Modelling for Distance Sampling Data

Fits inhomogeneous Poisson process spatial models to line transect sampling data and provides estimates of abundance within a region.

849

Analysis of Ecological and Environmental Data

dyn

Time Series Regression

Time series regression. The dyn class interfaces ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series, including specifications that may contain lags, diffs and missing values.

850

Analysis of Ecological and Environmental Data

dynatopmodel

Implementation of the Dynamic TOPMODEL Hydrological Model

A native R implementation and enhancement of the Dynamic TOPMODEL semi-distributed hydrological model. Includes routines for preprocessing, utilities, and displaying outputs.

851

Analysis of Ecological and Environmental Data

dynlm

Dynamic Linear Regression

Dynamic linear models and time series regression.

852

Analysis of Ecological and Environmental Data

e1071

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …

853

Analysis of Ecological and Environmental Data

earth

Multivariate Adaptive Regression Splines

Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines” <doi:10.1214/aos/1176347963>. (The term “MARS” is trademarked and thus not used in the name of the package.)

854

Analysis of Ecological and Environmental Data

eco

Ecological Inference in 2x2 Tables

Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 <doi:10.1093/pan/mpm017>) and (2011 <doi:10.18637/jss.v042.i05>) for ecological inference in 2 by 2 tables as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides a functionality which allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.

855

Analysis of Ecological and Environmental Data

ecodist

Dissimilarity-Based Functions for Ecological Analysis

Dissimilarity-based analysis functions including ordination and Mantel test functions, intended for use with spatial and community data.

856

Analysis of Ecological and Environmental Data

EcoHydRology

A Community Modeling Foundation for EcoHydrology

Provides a flexible foundation for scientists, engineers, and policy makers, both for teaching exercises and for more applied modelling of complex eco-hydrological interactions.

857

Analysis of Ecological and Environmental Data

EnvStats

Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book “EnvStats: An R Package for Environmental Statistics” (Millard, 2013, Springer, ISBN 9781461484554, <http://www.springer.com/book/9781461484554>).

858

Analysis of Ecological and Environmental Data

equivalence

Provides Tests and Graphics for Assessing Tests of Equivalence

Provides statistical tests and graphics for assessing tests of equivalence. Such tests have similarity as the alternative hypothesis instead of the null. Sample data sets are included.

859

Analysis of Ecological and Environmental Data

evd

Functions for Extreme Value Distributions

Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.

860

Analysis of Ecological and Environmental Data

evdbayes

Bayesian Analysis in Extreme Value Theory

Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.

861

Analysis of Ecological and Environmental Data

evir

Extreme Values in R

Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, gev/gpd distributions.

862

Analysis of Ecological and Environmental Data

extRemes

Extreme Value Analysis

Functions for performing extreme value analysis.

863

Analysis of Ecological and Environmental Data

fast

Implementation of the Fourier Amplitude Sensitivity Test (FAST)

The Fourier Amplitude Sensitivity Test (FAST) is a method to determine global sensitivities of a model on parameter changes with relatively few model runs. This package implements this sensitivity analysis method.

864

Analysis of Ecological and Environmental Data

FD

Measuring functional diversity (FD) from multiple traits, and other tools for functional ecology

FD is a package to compute different multidimensional FD indices. It implements a distancebased framework to measure FD that allows any number and type of functional traits, and can also consider species relative abundances. It also contains other useful tools for functional ecology.

865

Analysis of Ecological and Environmental Data

flexmix

Flexible Mixture Modeling

A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.

866

Analysis of Ecological and Environmental Data

forecast

Forecasting Functions for Time Series and Linear Models

Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
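The two capabilities named above (automatic ARIMA selection and exponential smoothing via state-space models) can be sketched on a built-in series; function names follow the package documentation, but treat this as an illustrative sketch:

```r
library(forecast)

# Automatic ARIMA order selection on a classic monthly series
fit <- auto.arima(AirPassengers)
fc <- forecast(fit, h = 12)   # 12 steps ahead, with prediction intervals
accuracy(fit)                 # in-sample accuracy measures
plot(fc)                      # display point forecasts and intervals

# Exponential smoothing via the ETS state-space framework
ets_fit <- ets(AirPassengers)
```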

867

Analysis of Ecological and Environmental Data

fso

Fuzzy Set Ordination

Fuzzy set ordination is a multivariate analysis used in ecology to relate the composition of samples to possible explanatory variables. While differing in theory and method, in practice, the use is similar to ‘constrained ordination.’ The package contains plotting and summary functions as well as the analyses.

868

Analysis of Ecological and Environmental Data

gam

Generalized Additive Models

Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).

869

Analysis of Ecological and Environmental Data

gamair

Data for “GAMs: An Introduction with R”

Data sets and scripts used in the book “Generalized Additive Models: An Introduction with R”, Wood (2006) CRC.

870

Analysis of Ecological and Environmental Data

hydroGOF

Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series

S3 functions implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented to be used during the calibration, validation, and application of hydrological models. Missing values in observed and/or simulated values can be removed before computations. Comments / questions / collaboration of any kind are very welcome.

871

Analysis of Ecological and Environmental Data

HydroMe

R codes for estimating water retention and infiltration model parameters using experimental data

This package is version 2 of the HydroMe v.1 package. It estimates the parameters of infiltration and water retention models by curve-fitting methods. The models considered are those commonly used in soil science. It adds new models for the water retention characteristic curve and fixes errors present in HydroMe v.1.

872

Analysis of Ecological and Environmental Data

hydroPSO

Particle Swarm Optimisation, with Focus on Environmental Models

State-of-the-art version of the Particle Swarm Optimisation (PSO) algorithm (SPSO-2011 and SPSO-2007 capable). hydroPSO can be used as a replacement of the ‘optim’ R function for (global) optimization of non-smooth and non-linear functions. However, the main focus of hydroPSO is the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the calibration engine (PSO). hydroPSO communicates with the model through the model’s own input and output files, without requiring access to the model’s source code. Several PSO variants and controlling options are included to fine-tune the performance of the calibration engine to different calibration problems. An advanced sensitivity analysis function together with user-friendly plotting summaries facilitate the interpretation and assessment of the calibration results. hydroPSO is parallel-capable, to alleviate the computational burden of complex models with “long” execution time. Bugs reports/comments/questions are very welcomed (in English, Spanish or Italian). See Zambrano-Bigiarini and Rojas (2013) <doi:10.1016/j.envsoft.2013.01.004> for more details.

873

Analysis of Ecological and Environmental Data

hydroTSM

Time Series Management, Analysis and Interpolation for Hydrological Modelling

S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks. The focus of this package is on providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs / comments / questions / collaboration of any kind are very welcome, and in particular, datasets that can be included in this package for academic purposes.

874

Analysis of Ecological and Environmental Data

Interpol.T

Hourly interpolation of multiple temperature daily series

Hourly interpolation of daily minimum and maximum temperature series. Carries out interpolation on multiple series at once. Requires some hourly series for calibration (alternatively, a default calibration table can be used).

875

Analysis of Ecological and Environmental Data

ipred

Improved Predictors

Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.

876

Analysis of Ecological and Environmental Data

ismev

An Introduction to Statistical Modeling of Extreme Values

Functions to support the computations carried out in ‘An Introduction to Statistical Modeling of Extreme Values’ by Stuart Coles. The functions may be divided into the following groups: maxima/minima, order statistics, peaks over thresholds and point processes.

877

Analysis of Ecological and Environmental Data

labdsv (core)

Ordination and Multivariate Analysis for Ecology

A variety of ordination and community analyses useful in analysis of data sets in community ecology. Includes many of the common ordination methods, with graphical routines to facilitate their interpretation, as well as several novel analyses.

878

Analysis of Ecological and Environmental Data

latticeDensity

Density Estimation and Nonparametric Regression on Irregular Regions

Functions that compute the lattice-based density estimator of Barry and McIntyre, which accounts for point processes in two-dimensional regions with irregular boundaries and holes. The package also implements two-dimensional nonparametric regression for similar regions.

879

Analysis of Ecological and Environmental Data

lme4

Linear Mixed-Effects Models using ‘Eigen’ and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
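As a brief sketch of the formula interface (assuming lme4 is installed; the `sleepstudy` data ships with the package), a model with correlated random intercepts and slopes per subject might look like:

```r
library(lme4)

# Random intercept and slope for Days, grouped by Subject
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fm)   # fixed effects and variance components
```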

880

Analysis of Ecological and Environmental Data

maptree

Mapping, pruning, and graphing tree models

Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees.

881

Analysis of Ecological and Environmental Data

marked

Mark-Recapture Analysis for Survival and Abundance Estimation

Functions for fitting various models to capture-recapture data, including mixed-effects Cormack-Jolly-Seber (CJS) and multistate models, the multivariate state model structure for survival estimation, and POPAN-structured Jolly-Seber models for abundance estimation. There are also Hidden Markov model (HMM) implementations of CJS and multistate models with and without state uncertainty, and a simulation capability for HMM models.

882

Analysis of Ecological and Environmental Data

MASS (core)

Support Functions and Datasets for Venables and Ripley’s MASS

Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).

883

Analysis of Ecological and Environmental Data

mclust

Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via the EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.

884

Analysis of Ecological and Environmental Data

mda

Mixture and Flexible Discriminant Analysis

Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, …

885

Analysis of Ecological and Environmental Data

mefa

Multivariate Data Handling in Ecology and Biogeography

A framework package aimed at providing a standardized computational environment for specialist work, via object classes that represent data coded by samples, taxa and segments (i.e. subpopulations, repeated measures). It supports easy processing of the data, along with cross-tabulation and relational data tables for samples and taxa. An object of class ‘mefa’ is a project-specific compendium of the data and can easily be used in further analyses. Methods are provided for extraction, aggregation, conversion, plotting, summary and reporting of ‘mefa’ objects. Reports can be generated in plain text or LaTeX format. The vignette contains worked examples.

886

Analysis of Ecological and Environmental Data

metacom

Analysis of the ‘Elements of Metacommunity Structure’

Functions to analyze coherence, boundary clumping, and turnover following the pattern-based metacommunity analysis of Leibold and Mikkelson (2002) <doi:10.1034/j.1600-0706.2002.970210.x>. The package also includes functions to visualize ecological networks and to calculate modularity as a replacement for boundary clumping.

887

Analysis of Ecological and Environmental Data

mgcv (core)

Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
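A minimal sketch of the gam() interface described above, on simulated data (assuming mgcv is installed; the smoothness of `s(x)` is estimated by REML):

```r
library(mgcv)

# Simulate a smooth signal with noise
set.seed(1)
dat <- data.frame(x = runif(200))
dat$y <- sin(2 * pi * dat$x) + rnorm(200, sd = 0.3)

# One smooth term; smoothing parameter chosen by REML
fit <- gam(y ~ s(x), data = dat, method = "REML")
plot(fit)   # estimated smooth with confidence band
```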

888

Analysis of Ecological and Environmental Data

mrds

Mark-Recapture Distance Sampling

Animal abundance estimation via conventional, multiple covariate and mark-recapture distance sampling (CDS/MCDS/MRDS). Detection function fitting is performed via maximum likelihood. Also included are diagnostics and plotting for fitted detection functions. Abundance estimation is via a Horvitz-Thompson-like estimator.

889

Analysis of Ecological and Environmental Data

nlme

Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixedeffects models.

890

Analysis of Ecological and Environmental Data

nsRFA

Non-Supervised Regional Frequency Analysis

A collection of statistical tools for objective (non-supervised) applications of Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available.

891

Analysis of Ecological and Environmental Data

oce

Analysis of Oceanographic Data

Supports the analysis of oceanographic data, including ‘ADCP’ measurements, measurements made with ‘argo’ floats, ‘CTD’ measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the ‘UNESCO’ or ‘TEOS-10’ equation of state. Produces graphical displays that conform to the conventions of the oceanographic literature. This package is discussed extensively in Dan Kelley’s book Oceanographic Analysis with R, published in 2018 by ‘Springer-Verlag’ with ISBN 9781493988426.

892

Analysis of Ecological and Environmental Data

openair

Tools for the Analysis of Air Pollution Data

Tools to analyse, interpret and understand air pollution data. Data are typically hourly time series and both monitoring data and dispersion model output can be analysed. Many functions can also be applied to other data, including meteorological and traffic data.

893

Analysis of Ecological and Environmental Data

ouch

Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses

Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.

894

Analysis of Ecological and Environmental Data

party

A Laboratory for Recursive Partytioning

A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well-defined theory of conditional inference procedures. This nonparametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.

895

Analysis of Ecological and Environmental Data

pastecs

Package for Analysis of SpaceTime Ecological Series

Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <Benoit.Beliaeff@ifremer.fr>) initiative to bring PASSTEC 2000 functionalities to R.

896

Analysis of Ecological and Environmental Data

pgirmess

Spatial Analysis and Data Mining for Field Ecologists

Set of tools for reading, writing and transforming spatial and seasonal data in ecology, model selection and specific statistical tests. It includes functions to discretize polylines into regular point intervals, link observations to those points, compute geographical coordinates at regular intervals between waypoints, read subsets of big rasters, and compute zonal statistics or tables of categories within polygons or circular buffers from a raster. The package also provides miscellaneous functions for model selection, spatial statistics, geometries, writing data.frame objects with Chinese characters, and some other functions for field ecologists.

897

Analysis of Ecological and Environmental Data

popbio

Construction and Analysis of Matrix Population Models

Construct and analyze projection matrix models from a demography study of marked individuals classified by age or stage. The package covers methods described in Matrix Population Models by Caswell (2001) and Quantitative Conservation Biology by Morris and Doak (2002).

898

Analysis of Ecological and Environmental Data

prabclus

Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data

Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures. Clustering of presence-absence, abundance and multilocus genetic data for species delimitation; nearest-neighbor-based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview.

899

Analysis of Ecological and Environmental Data

primer

Functions and data for A Primer of Ecology with R

Functions are primarily for systems of ordinary differential equations, difference equations, and eigenanalysis and projection of demographic matrices; data are for examples.

900

Analysis of Ecological and Environmental Data

pscl

Political Science Computational Laboratory

Bayesian analysis of item-response theory (IRT) models; roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.

901

Analysis of Ecological and Environmental Data

pvclust

Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling

An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides the AU (approximately unbiased) p-value as well as the BP (bootstrap probability) value for each cluster in a dendrogram.

902

Analysis of Ecological and Environmental Data

qualV

Qualitative Validation Methods

Qualitative methods for the validation of dynamic models. It contains (i) an orthogonal set of deviance measures for absolute, relative and ordinal scale and (ii) approaches accounting for time shifts. The first approach transforms time to take time delays and speed differences into account. The second divides the time series into interval units according to their main features and finds the longest common subsequence (LCS) using a dynamic programming algorithm.

903

Analysis of Ecological and Environmental Data

quantreg

Quantile Regression

Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and nonparametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
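A brief sketch of quantile regression with rq() on the built-in stackloss data (assuming quantreg is installed), fitting the median and the upper quartile:

```r
library(quantreg)

# Median (tau = 0.5) and upper-quartile (tau = 0.75) regression
fit50 <- rq(stack.loss ~ ., data = stackloss, tau = 0.50)
fit75 <- rq(stack.loss ~ ., data = stackloss, tau = 0.75)

coef(fit50)   # coefficients of the conditional median model
```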

904

Analysis of Ecological and Environmental Data

quantregGrowth

Growth Charts via Regression Quantiles

Fits non-crossing regression quantiles as a function of linear covariates and multiple smooth terms via B-splines with L1-norm difference penalties. Monotonicity constraints on the fitted curves are allowed. See Muggeo, Sciandra, Tomasello and Calvo (2013) <doi:10.1007/s10651-012-0232-1> and <doi:10.13140/RG.2.2.12924.85122> for some code examples.

905

Analysis of Ecological and Environmental Data

randomForest

Breiman and Cutler’s Random Forests for Classification and Regression

Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <doi:10.1023/A:1010933404324>.
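A minimal classification sketch on the built-in iris data (assuming randomForest is installed); the forest reports an out-of-bag (OOB) error estimate and variable importance:

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

print(rf)        # OOB error estimate and confusion matrix
importance(rf)   # mean decrease in Gini per predictor
```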

906

Analysis of Ecological and Environmental Data

Rcapture

Loglinear Models for Capture-Recapture Experiments

Estimation of abundance and other demographic parameters for closed populations, open populations and the robust design in capture-recapture experiments using loglinear models.

907

Analysis of Ecological and Environmental Data

RMark

R Code for Mark Analysis

An interface to the software package MARK that constructs input files for MARK and extracts the output. MARK was developed by Gary White and is freely available at <http://www.phidot.org/software/mark/downloads/> but is not open source.

908

Analysis of Ecological and Environmental Data

RMAWGEN

Multi-Site Auto-Regressive Weather GENerator

S3 and S4 functions are implemented for spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector Auto-Regressive models (VARs). The weather generator model is then saved as an object and is calibrated by daily instrumental “Gaussianized” time series through the ‘vars’ package tools. Once this model is obtained, it can be used for weather generation and be adapted to work with several climatic monthly time series.

909

Analysis of Ecological and Environmental Data

rpart

Recursive Partitioning and Regression Trees

Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
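A brief sketch of a classification tree on the kyphosis data shipped with rpart (assuming the package is installed):

```r
library(rpart)

# Classification tree predicting presence of kyphosis after surgery
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

printcp(fit)          # complexity-parameter table, useful for pruning
plot(fit); text(fit)  # draw the tree with split labels
```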

910

Analysis of Ecological and Environmental Data

rtop

Interpolation of Data with Variable Spatial Support

Geostatistical interpolation of data with irregular spatial support, such as runoff-related data or data from administrative units.

911

Analysis of Ecological and Environmental Data

seacarb

Seawater Carbonate Chemistry

Calculates parameters of the seawater carbonate system and assists the design of ocean acidification perturbation experiments.

912

Analysis of Ecological and Environmental Data

seas

Seasonal Analysis and Graphics, Especially for Climatology

Capable of deriving seasonal statistics, such as “normals”, and analysis of seasonal data, such as departures. This package also has graphics capabilities for representing seasonal data, including boxplots for seasonal parameters and bars for summed normals. There are many specific functions related to climatology, including precipitation normals, temperature normals, cumulative precipitation departures and precipitation inter-arrivals. However, this package is designed to represent any time-varying parameter with a discernible seasonal signal, such as found in hydrology and ecology.

913

Analysis of Ecological and Environmental Data

secr

Spatially Explicit Capture-Recapture

Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distance-dependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.

914

Analysis of Ecological and Environmental Data

segmented

Regression Models with Break-Points / Change-Points Estimation

Given a regression model, segmented ‘updates’ it by adding one or more segmented (i.e., piecewise linear) relationships. Several variables with multiple break-points are allowed. The estimation method is discussed in Muggeo (2003, <doi:10.1002/sim.1545>) and illustrated in Muggeo (2008, <https://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf>). An approach for hypothesis testing is presented in Muggeo (2016, <doi:10.1080/00949655.2016.1149855>), and interval estimation for the break-point is discussed in Muggeo (2017, <doi:10.1111/anzs.12200>).
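A minimal sketch of the ‘updating’ workflow on simulated data with a known slope change (assuming segmented is installed; `psi` is a starting guess for the break-point):

```r
library(segmented)

# Simulate a piecewise-linear relationship: slope changes at x = 60
set.seed(1)
x <- 1:100
y <- 2 + 0.5 * pmin(x, 60) + 0.1 * pmax(x - 60, 0) + rnorm(100)

o  <- lm(y ~ x)                          # ordinary linear fit
os <- segmented(o, seg.Z = ~x, psi = 50) # add one break-point in x
summary(os)                              # estimated break-point and slopes
```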

915

Analysis of Ecological and Environmental Data

sensitivity

Global Sensitivity Analysis of Model Outputs

A collection of functions for factor screening, global sensitivity analysis and reliability sensitivity analysis. Most of the functions have to be applied to models with scalar output, but several functions support multidimensional outputs.

916

Analysis of Ecological and Environmental Data

simba

A Collection of functions for similarity analysis of vegetation data

Besides functions for the calculation of similarity and multiple-plot similarity measures with binary data (for instance presence/absence species data), the package contains some simple wrapper functions for reshaping species lists into matrices and vice versa, some other functions for further processing of similarity data (Mantel-like permutation procedures), as well as some other useful tools for vegetation analysis.

917

Analysis of Ecological and Environmental Data

simecol

Simulation of Ecological (and Other) Dynamic Systems

An object-oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and reusability of code.

918

Analysis of Ecological and Environmental Data

siplab

Spatial Individual-Plant Modelling

A platform for experimenting with spatially explicit individualbased vegetation models.

919

Analysis of Ecological and Environmental Data

soiltexture

Functions for Soil Texture Plot, Classification and Transformation

“The Soil Texture Wizard” is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams or texture ternary plots), and to classify and transform soil texture data. These functions allow plotting of virtually any soil texture triangle (classification) in any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions is expected to be useful to people using soil texture data from different soil texture classifications or different particle-size systems. Many (> 15) texture triangles from all around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().

920

Analysis of Ecological and Environmental Data

SPACECAP

A Program to Estimate Animal Abundance and Density using Bayesian Spatially-Explicit Capture-Recapture Models

SPACECAP is a user-friendly software package for estimating animal densities using closed-model capture-recapture sampling based on photographic captures, using Bayesian spatially-explicit capture-recapture models. This approach offers advantages such as substantially dealing with problems posed by individual heterogeneity in capture probabilities in conventional capture-recapture analyses. It also offers non-asymptotic inferences, which are more appropriate for the small samples of capture data typical of photo-capture studies.

921

Analysis of Ecological and Environmental Data

SpatialExtremes

Modelling Spatial Extremes

Tools for the statistical modelling of spatial extremes using max-stable processes, copula or Bayesian hierarchical models. More precisely, this package allows (conditional) simulations from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with extreme value theory) are available, such as the use of (spatial) copulas and Bayesian hierarchical models assuming the so-called conditional assumptions. The latter approaches are handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012) <doi:10.1214/11-STS376>, Padoan et al. (2010) <doi:10.1198/jasa.2009.tm08577>, Dombry et al. (2013) <doi:10.1093/biomet/ass067>.

922

Analysis of Ecological and Environmental Data

StreamMetabolism

Calculate Single Station Metabolism from Diurnal Oxygen Curves

Provides functions to calculate Gross Primary Productivity, Net Ecosystem Production, and Ecosystem Respiration from single-station diurnal oxygen curves.

923

Analysis of Ecological and Environmental Data

strucchange

Testing, Monitoring, and Dating Structural Changes

Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
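A brief sketch of dating structural change, on the classic Nile annual flow series from base R (assuming strucchange is installed); the `~ 1` formula specifies a pure mean-shift model:

```r
library(strucchange)

# Estimate break dates in the mean of the Nile series
bp <- breakpoints(Nile ~ 1)
summary(bp)   # BIC/RSS across numbers of breaks
confint(bp)   # confidence interval for each break date
```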

924

Analysis of Ecological and Environmental Data

surveillance

Temporal and SpatioTemporal Modeling and Monitoring of Epidemic Phenomena

Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or the social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hohle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hohle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.

925

Analysis of Ecological and Environmental Data

tiger

TIme series of Grouped ERrors

Temporally resolved groups of typical differences (errors) between two time series are determined and visualized.

926

Analysis of Ecological and Environmental Data

topmodel

Implementation of the Hydrological Model TOPMODEL in R

Set of hydrological functions including an R implementation of the hydrological model TOPMODEL, which is based on the 1995 FORTRAN version by Keith Beven. From version 0.7.0, the package has been put into maintenance mode.

927

Analysis of Ecological and Environmental Data

tseries

Time Series Analysis and Computational Finance

Time series analysis and computational finance.

928

Analysis of Ecological and Environmental Data

unmarked

Models for Data from Unmarked Animals

Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates.

929

Analysis of Ecological and Environmental Data

untb

Ecological Drift under the UNTB

Hubbell’s Unified Neutral Theory of Biodiversity.

930

Analysis of Ecological and Environmental Data

vegan (core)

Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
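A brief sketch of two common tasks (assuming vegan is installed; the `BCI` Barro Colorado Island tree counts ship with the package):

```r
library(vegan)
data(BCI)

# Per-site Shannon diversity
H <- diversity(BCI, index = "shannon")

# NMDS ordination on Bray-Curtis dissimilarities
ord <- metaMDS(BCI, distance = "bray", trace = FALSE)
plot(ord)
```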

931

Analysis of Ecological and Environmental Data

vegetarian

Jost Diversity Measures for Community Data

Computes diversity for community data sets using the methods outlined by Jost (2006, 2007). While there are differing opinions on the ideal way to calculate diversity (e.g. Magurran 2004), this method offers the advantage of providing diversity numbers equivalents, independent alpha and beta diversities, and the ability to incorporate ‘order’ (q) as a continuous measure of the importance of rare species in the metrics. The functions provided in this package largely correspond with the equations offered by Jost in the cited papers. The package computes alpha diversities, beta diversities, gamma diversities, and similarity indices. Confidence intervals for diversity measures are calculated using a bootstrap method described by Chao et al. (2008). For datasets with many samples (sites, plots), sim.table creates tables of all possible pairwise comparisons, and for grouped samples sim.groups calculates pairwise combinations of within- and between-group comparisons.

932

Analysis of Ecological and Environmental Data

VGAM

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for the latest changes.

933

Analysis of Ecological and Environmental Data

wasim

Visualisation and analysis of output files of the hydrological model WASIM

Helpful tools for data processing and visualisation of results of the hydrological model WASIM-ETH.

934

Analysis of Ecological and Environmental Data

zoo

S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)

An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
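A minimal sketch of an irregular series (assuming zoo is installed); the index here is a set of non-consecutive dates:

```r
library(zoo)

# Irregular daily series indexed by Date
z <- zoo(c(1.2, 2.3, 1.8, 2.9, 3.1),
         as.Date(c("2020-01-01", "2020-01-02", "2020-01-05",
                   "2020-01-06", "2020-01-09")))

window(z, start = as.Date("2020-01-02"))  # subset by index
rollmean(z, 3)                            # centred rolling mean over 3 observations
```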

935

Design of Experiments (DoE) & Analysis of Experimental Data

acebayes

Optimal Bayesian Experimental Design using the ACE Algorithm

Optimal Bayesian experimental design using the approximate coordinate exchange (ACE) algorithm.

936

Design of Experiments (DoE) & Analysis of Experimental Data

agricolae (core)

Statistical Procedures for Agricultural Research

The original idea was presented in the thesis “A statistical analysis tool for agricultural research”, submitted for the degree of Master of Science at the National Engineering University (UNI), Lima, Peru. Some experimental data for the examples come from CIP and other research. agricolae offers extensive functionality on experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Square, augmented block, factorial, split- and strip-plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures, several non-parametric comparison tests, biodiversity indexes and consensus clustering.

937

Design of Experiments (DoE) & Analysis of Experimental Data

agridat

Agricultural Datasets

Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.

938

Design of Experiments (DoE) & Analysis of Experimental Data

AlgDesign (core)

Algorithmic Experimental Design

Algorithmic experimental designs. Calculates exact and approximate theory experimental designs for D, A, and I criteria. Very large designs may be created. Experimental designs may be blocked, or blocked designs created from a candidate list, using several criteria. The blocking can be done when whole and within-plot factors interact.

939

Design of Experiments (DoE) & Analysis of Experimental Data

ALTopt

Optimal Experimental Designs for Accelerated Life Testing

Creates optimal (D, U and I) designs for accelerated life testing with right censoring or interval censoring. It uses a generalized linear model (GLM) approach to derive the asymptotic variance-covariance matrix of the regression coefficients. The failure time distribution is assumed to follow a Weibull distribution with a known shape parameter, and log-linear link functions are used to model the relationship between failure time parameters and stress variables. The acceleration model may have multiple stress factors, although most ALTs involve two or fewer stress factors. The ALTopt package also provides several plotting functions, including contour plots, Fraction of Use Space (FUS) plots and Variance Dispersion graphs of Use Space (VDUS) plots.

940

Design of Experiments (DoE) & Analysis of Experimental Data

asd

Simulations for Adaptive Seamless Designs

Runs simulations for adaptive seamless designs with and without early outcomes, for treatment selection and subpopulation-type designs.

941

Design of Experiments (DoE) & Analysis of Experimental Data

BatchExperiments

Statistical Experiments on Batch Computing Clusters

Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.

942

Design of Experiments (DoE) & Analysis of Experimental Data

BayesMAMS

Designing Bayesian Multi-Arm Multi-Stage Studies

Calculating Bayesian sample sizes for multi-arm trials where several experimental treatments are compared to a common control, perhaps even at multiple stages.

943

Design of Experiments (DoE) & Analysis of Experimental Data

bcrm

Bayesian Continual Reassessment Method for Phase I Dose-Escalation Trials

Implements a wide variety of one- and two-parameter Bayesian CRM designs. The program can run interactively, allowing the user to enter outcomes after each cohort has been recruited, or via simulation to assess operating characteristics.

944

Design of Experiments (DoE) & Analysis of Experimental Data

BHH2

Useful Functions for Box, Hunter and Hunter II

Functions and data sets reproducing some examples in Box, Hunter and Hunter II. Useful for statistical design of experiments, especially factorial experiments.

945

Design of Experiments (DoE) & Analysis of Experimental Data

binseqtest

Exact Binary Sequential Designs and Analysis

For a series of binary responses, creates stopping boundaries with exact results after stopping, allowing updating for missing assessments.

946

Design of Experiments (DoE) & Analysis of Experimental Data

bioOED

Sensitivity Analysis and Optimum Experiment Design for Microbial Inactivation

Extends the bioinactivation package with functions for Sensitivity Analysis and Optimum Experiment Design.

947

Design of Experiments (DoE) & Analysis of Experimental Data

blocksdesign

Nested and Crossed Block Designs for Factorial, Fractional Factorial and Unstructured Treatment Sets

Constructs D-optimal or near D-optimal nested and crossed block designs for unstructured or general factorial treatment designs. The treatment design, if required, is found from a model-matrix design formula and can be built up sequentially. The block design is found from a defined set of block factors and is conditional on the defined treatment design. The block factors are added in sequence, and each added block factor is optimized conditional on all previously added block factors. The block design can have repeated nesting down to any required depth, with either simple nested blocks or a crossed-blocks design at each level of nesting. Outputs include a table showing the allocation of treatments to blocks and tables showing the achieved D-efficiency factors for each block and treatment design.

948

Design of Experiments (DoE) & Analysis of Experimental Data

blockTools

Block, Assign, and Diagnose Potential Interference in Randomized Experiments

Blocks units into experimental blocks, with one unit per treatment condition, by creating a measure of multivariate distance between all possible pairs of units. Maximum, minimum, or an allowable range of differences between units on one variable can be set. Randomly assign units to treatment conditions. Diagnose potential interference between units assigned to different treatment conditions. Write outputs to .tex and .csv files.

949

Design of Experiments (DoE) & Analysis of Experimental Data

BOIN

Bayesian Optimal INterval (BOIN) Design for Single-Agent and Drug-Combination Phase I Clinical Trials

The Bayesian optimal interval (BOIN) design is a novel phase I clinical trial design for finding the maximum tolerated dose (MTD). It can be used to design both single-agent and drug-combination trials. The BOIN design is motivated by the top priority and concern of clinicians when testing a new drug, which is to effectively treat patients and minimize the chance of exposing them to subtherapeutic or overly toxic doses. The prominent advantage of the BOIN design is that it achieves simplicity and superior performance at the same time. The BOIN design is algorithm-based and can be implemented in a simple way similar to the traditional 3+3 design. The BOIN design yields an average performance that is comparable to that of the continual reassessment method (CRM, one of the best model-based designs) in terms of selecting the MTD, but has a substantially lower risk of assigning patients to subtherapeutic or overly toxic doses.

950

Design of Experiments (DoE) & Analysis of Experimental Data

BsMD

Bayes Screening and Model Discrimination

Bayes screening and model discrimination follow-up designs.

951

Design of Experiments (DoE) & Analysis of Experimental Data

choiceDes

Design Functions for Choice Studies

Design functions for DCMs and other types of choice studies (including MaxDiff and other tradeoffs).

952

Design of Experiments (DoE) & Analysis of Experimental Data

CombinS

Construction Methods of some Series of PBIB Designs

Series of partially balanced incomplete block (PBIB) designs based on the combinatory method (S) introduced in Imane Rezgui et al. (2014) <doi:10.3844/jmssp.2014.45.48>, together with their associated U-type designs.

953

Design of Experiments (DoE) & Analysis of Experimental Data

conf.design (core)

Construction of factorial designs

This small library contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs.

954

Design of Experiments (DoE) & Analysis of Experimental Data

crmPack

Object-Oriented Implementation of CRM Designs

Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison, or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models and escalation or stopping rules.

955

Design of Experiments (DoE) & Analysis of Experimental Data

crossdes (core)

Construction of Crossover Designs

Contains functions for the construction of carryover-balanced crossover designs. In addition, it contains functions to check given designs for balance.

956

Design of Experiments (DoE) & Analysis of Experimental Data

Crossover

Analysis and Search of Crossover Designs

Package Crossover provides different crossover designs from combinatorial or search algorithms as well as from literature and a GUI to access them.

957

Design of Experiments (DoE) & Analysis of Experimental Data

dae

Functions Useful in the Design and ANOVA of Experiments

The content falls into the following groupings: (i) data, (ii) factor manipulation functions, (iii) design functions, (iv) ANOVA functions, (v) matrix functions, (vi) projector and canonical efficiency functions, and (vii) miscellaneous functions. A vignette called ‘DesignNotes’ describes how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the ‘Error’ function has been used in the call to ‘aov’. The package ‘dae’ can also be installed from <http://chris.brien.name/rpackages/>.

958

Design of Experiments (DoE) & Analysis of Experimental Data

daewr

Design and Analysis of Experiments with R

Contains data frames and functions used in the book “Design and Analysis of Experiments with R”.

959

Design of Experiments (DoE) & Analysis of Experimental Data

designGG

Computational tool for designing genetical genomics experiments

The package provides R scripts for designing genetical genomics experiments.

960

Design of Experiments (DoE) & Analysis of Experimental Data

designGLMM

Finding Optimal Block Designs for a Generalised Linear Mixed Model

Use simulated annealing to find optimal designs for Poisson regression models with blocks.

961

Design of Experiments (DoE) & Analysis of Experimental Data

designmatch

Matched Samples that are Balanced and Representative by Design

Includes functions for the construction of matched samples that are balanced and representative by design. Among others, these functions can be used for matching in observational studies with treated and control units, with cases and controls, in related settings with instrumental variables, and in discontinuity designs. Also, they can be used for the design of randomized experiments, for example, for matching before randomization. By default, ‘designmatch’ uses the ‘GLPK’ optimization solver, but its performance is greatly enhanced by the ‘Gurobi’ optimization solver and its associated R interface. For their installation, please follow the instructions at <http://user.gurobi.com/download/gurobi-optimizer> and <http://www.gurobi.com/documentation/7.0/refman/r_api_overview.html>. We have also included directions in the gurobi_installation file in the inst folder.

962

Design of Experiments (DoE) & Analysis of Experimental Data

desirability

Function Optimization and Ranking via Desirability Functions

S3 classes for multivariate optimization using the desirability function by Derringer and Suich (1980).

963

Design of Experiments (DoE) & Analysis of Experimental Data

desplot

Plotting Field Plans for Agricultural Experiments

A function for plotting maps of agricultural field experiments that are laid out in grids.

964

Design of Experiments (DoE) & Analysis of Experimental Data

dfcomb

Phase I/II Adaptive Dose-Finding Design for Combination Studies

Phase I/II adaptive dose-finding design for combination studies where toxicity rates are supposed to increase with both agents.

965

Design of Experiments (DoE) & Analysis of Experimental Data

dfcrm

Dose-Finding by the Continual Reassessment Method

Provides functions to run the CRM and TITE-CRM in phase I trials and calibration tools for trial planning purposes.

966

Design of Experiments (DoE) & Analysis of Experimental Data

dfmta

Phase I/II Adaptive Dose-Finding Design for MTA

Phase I/II adaptive dose-finding design for single-agent Molecularly Targeted Agent (MTA), according to the paper “Phase I/II Dose-Finding Design for Molecularly Targeted Agent: Plateau Determination using Adaptive Randomization”, Riviere Marie-Karelle et al. (2016) <doi:10.1177/0962280216631763>.

967

Design of Experiments (DoE) & Analysis of Experimental Data

dfpk

Bayesian Dose-Finding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials

Statistical methods involving PK measures are provided for use in the dose-allocation process during Phase I clinical trials. These methods, proposed by Ursino et al. (2017) <doi:10.1002/bimj.201600084>, incorporate pharmacokinetics (PK) into dose-finding designs in different ways, including covariate models, dependent-variable models and hierarchical models. This package provides functions to generate data from several scenarios, and functions to run simulations whose objective is to determine the maximum tolerated dose (MTD).

968

Design of Experiments (DoE) & Analysis of Experimental Data

DiceDesign

Designs of Computer Experiments

Space-filling designs and uniformity criteria.

969

Design of Experiments (DoE) & Analysis of Experimental Data

DiceEval

Construction and Evaluation of Metamodels

Estimation, validation and prediction of models of different types: linear models, additive models, MARS, PolyMARS and kriging.

970

Design of Experiments (DoE) & Analysis of Experimental Data

DiceKriging

Kriging Methods for Computer Experiments

Estimation, validation and prediction of kriging models. Important functions: km, print.km, plot.km, predict.km.
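A hedged sketch of the km/predict.km workflow named above, on a toy one-dimensional function; the design size and covariance kernel are illustrative choices, not package defaults to rely on.

```r
# Illustrative only: fit a kriging model and predict at new points.
library(DiceKriging)

design <- data.frame(x = seq(0, 1, length.out = 7))
y <- sin(4 * pi * design$x)

fit <- km(design = design, response = y, covtype = "matern5_2")
pred <- predict(fit, newdata = data.frame(x = c(0.25, 0.75)), type = "UK")
pred$mean  # kriging mean at the new points
pred$sd    # kriging standard deviation (prediction uncertainty)
```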

971

Design of Experiments (DoE) & Analysis of Experimental Data

DiceView

Plot Methods for Computer Experiments Design and Surrogate

View 2D/3D sections or contours of computer experiments designs, surrogates or test functions.

972

Design of Experiments (DoE) & Analysis of Experimental Data

docopulae

Optimal Designs for Copula Models

A direct approach to optimal designs for copula models based on the Fisher information. Provides flexible functions for building joint PDFs, evaluating the Fisher information and finding optimal designs. It includes an extensible solution to summation and integration called ‘nint’, functions for transforming, plotting and comparing designs, as well as a set of tools for common low-level tasks.

973

Design of Experiments (DoE) & Analysis of Experimental Data

DoE.base (core)

Full Factorials, Orthogonal Arrays and Base Utilities for DoE Packages

Creates full factorial experimental designs and designs based on orthogonal arrays for (industrial) experiments. Provides diverse quality criteria. Provides utility functions for the class design, which is also used by other packages for designed experiments.
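As a hedged sketch (the factor levels and names below are illustrative assumptions), a full factorial of class design can be created like this:

```r
# Illustrative only: a randomized 2 x 2 x 3 full factorial (12 runs).
library(DoE.base)

plan <- fac.design(nfactors = 3, nlevels = c(2, 2, 3),
                   factor.names = c("A", "B", "C"))
plan    # prints the randomized run order with factor settings
```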

974

Design of Experiments (DoE) & Analysis of Experimental Data

DoE.MIParray

Creation of Arrays by Mixed Integer Programming

‘CRAN’ packages ‘DoE.base’ and ‘Rmosek’ and the non-‘CRAN’ package ‘gurobi’ are enhanced with functionality for the creation of optimized arrays for experimentation, where optimization is in terms of generalized minimum aberration. It is also possible to optimally extend existing arrays to a larger run size. Optimization requires the availability of at least one of the commercial products ‘Gurobi’ or ‘Mosek’ (free academic licenses available for both). For installing ‘Gurobi’ and its R package ‘gurobi’, follow instructions at <http://www.gurobi.com/downloads/gurobi-optimizer> and <http://www.gurobi.com/documentation/7.5/refman/r_api_overview.html> (or higher version). For installing ‘Mosek’ and its R package ‘Rmosek’, follow instructions at <https://www.mosek.com/downloads/> and <http://docs.mosek.com/8.1/rmosek/install-interface.html>, or use the functionality in the stump CRAN R package ‘Rmosek’.

975

Design of Experiments (DoE) & Analysis of Experimental Data

DoE.wrapper (core)

Wrapper Package for Design of Experiments Functionality

Various kinds of designs for (industrial) experiments can be created. The package uses, and sometimes enhances, design generation routines from other packages. So far, response surface designs from package rsm, Latin hypercube samples from packages lhs and DiceDesign, and D-optimal designs from package AlgDesign have been implemented.

976

Design of Experiments (DoE) & Analysis of Experimental Data

DoseFinding

Planning and Analyzing Dose Finding Experiments

The DoseFinding package provides functions for the design and analysis of dose-finding experiments (with a focus on pharmaceutical Phase II clinical trials). It provides functions for: multiple contrast tests, fitting non-linear dose-response models (using Bayesian and non-Bayesian estimation), calculating optimal designs, and an implementation of the MCP-Mod methodology.

977

Design of Experiments (DoE) & Analysis of Experimental Data

dynaTree

Dynamic Trees for Learning and Design

Inference by sequential Monte Carlo for dynamic tree regression and classification models with hooks provided for sequential design and optimization, fully online learning with drift, variable selection, and sensitivity analysis of inputs. Illustrative examples from the original dynamic trees paper are facilitated by demos in the package; see demo(package="dynaTree").

978

Design of Experiments (DoE) & Analysis of Experimental Data

easypower

Sample Size Estimation for Experimental Designs

Power analysis is used in the estimation of sample sizes for experimental designs. Most programs and R packages will only output the highest recommended sample size to the user. Often the user input can be complicated, and computing multiple power analyses for different treatment comparisons can be time consuming. This package simplifies the user input and allows the user to view all of the sample size recommendations, or just the ones they want to see. The recommended sample sizes are calculated using the ‘pwr’ package.

979

Design of Experiments (DoE) & Analysis of Experimental Data

edesign

Maximum Entropy Sampling

An implementation of maximum entropy sampling for spatial data is provided. An exact branch-and-bound algorithm as well as greedy and dual greedy heuristics are included.

980

Design of Experiments (DoE) & Analysis of Experimental Data

EngrExpt

Data sets from “Introductory Statistics for Engineering Experimentation”

Datasets from Nelson, Coffin and Copeland “Introductory Statistics for Engineering Experimentation” (Elsevier, 2003) with sample code.

981

Design of Experiments (DoE) & Analysis of Experimental Data

experiment

R Package for Designing and Analyzing Randomized Experiments

Provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, two-stage randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.

982

Design of Experiments (DoE) & Analysis of Experimental Data

ez

Easy Analysis and Visualization of Factorial Experiments

Facilitates easy analysis of factorial experiments, including purely within-Ss designs (a.k.a. “repeated measures”), purely between-Ss designs, and mixed within-and-between-Ss designs. The functions in this package aim to provide simple, intuitive and consistent specification of data analysis and visualization. Visualization functions also include design visualization for pre-analysis data auditing, and correlation matrix visualization. Finally, this package includes functions for nonparametric analysis, including permutation tests and bootstrap resampling. The bootstrap function obtains predictions either by cell means or by more advanced/powerful mixed effects models, yielding predictions and confidence intervals that may be easily visualized at any level of the experiment’s design.

983

Design of Experiments (DoE) & Analysis of Experimental Data

FMC

Factorial Experiments with Minimum Level Changes

Generates cost-effective, minimally changed run sequences for symmetrical as well as asymmetrical factorial designs.

984

Design of Experiments (DoE) & Analysis of Experimental Data

FrF2 (core)

Fractional Factorial Designs with 2-Level Factors

Regular and non-regular fractional factorial 2-level designs can be created. Furthermore, analysis tools for fractional factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).
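A hedged sketch of generating a regular 2^(4-1) fractional factorial in 8 runs; the run size and factor names are illustrative, not from the task view.

```r
# Illustrative only: a half-fraction of a 2^4 design.
library(FrF2)

plan <- FrF2(nruns = 8, nfactors = 4,
             factor.names = c("A", "B", "C", "D"))
summary(plan)   # prints the design, its generators and alias structure
```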

985

Design of Experiments (DoE) & Analysis of Experimental Data

FrF2.catlg128

Catalogues of resolution IV 128-run 2-level fractional factorials up to 33 factors that do have 5-letter words

This package provides catalogues of resolution IV regular fractional factorial designs in 128 runs for up to 33 2-level factors. The catalogues are complete, excluding resolution IV designs without 5-letter words, because these do not add value for a search for clear designs. The previous package version 1.0 with complete catalogues up to 24 runs (24 runs and a namespace added later) can be downloaded from the author’s website.

986

Design of Experiments (DoE) & Analysis of Experimental Data

GAD

GAD: Analysis of variance from general principles

This package analyses complex ANOVA models with any combination of orthogonal/nested and fixed/random factors, as described by Underwood (1997). There are two restrictions: (i) data must be balanced; (ii) fixed nested factors are not allowed. Homogeneity of variances is checked using Cochran’s C test, and ‘a posteriori’ comparisons of means are done using the Student-Newman-Keuls (SNK) procedure.

987

Design of Experiments (DoE) & Analysis of Experimental Data

geospt

Geostatistical Analysis and Design of Optimal Spatial Sampling Networks

Estimation of the variogram through trimmed mean, radial basis functions (optimization, prediction and cross-validation), summary statistics from cross-validation, pocket plot, and design of optimal sampling networks through sequential and simultaneous points methods.

988

Design of Experiments (DoE) & Analysis of Experimental Data

granova

Graphical Analysis of Variance

This small collection of functions provides what we call elemental graphics for display of anova results. The term elemental derives from the fact that each function is aimed at construction of graphical displays that afford direct visualizations of data with respect to the fundamental questions that drive the particular anova methods. The two main functions are granova.1w (a graphic for one-way anova) and granova.2w (a corresponding graphic for two-way anova). These functions were written to display data for any number of groups, regardless of their sizes (however, very large data sets or numbers of groups can be problematic). For these two functions a specialized approach is used to construct data-based contrast vectors for which anova data are displayed. The result is that the graphics use straight lines, and when appropriate flat surfaces, to facilitate clear interpretations while being faithful to the standard effect tests in anova. The graphic results are complementary to standard summary tables for these two basic kinds of analysis of variance; numerical summary results of analyses are also provided as side effects. Two additional functions are granova.ds (for comparing two dependent samples), and granova.contr (which provides graphic displays for a priori contrasts). All functions provide relevant numerical results to supplement the graphic displays of anova data. The graphics based on these functions should be especially helpful for learning how the methods have been applied to answer the question(s) posed. This means they can be particularly helpful for students and non-statistician analysts. But these methods should be quite generally helpful for workaday applications of all kinds, as they can help to identify outliers, clusters or patterns, as well as highlight the role of nonlinear transformations of data. In the case of granova.1w and granova.ds especially, several arguments are provided to facilitate flexibility in the construction of graphics that accommodate diverse features of data, according to their corresponding display requirements. See the help files for individual functions.

989

Design of Experiments (DoE) & Analysis of Experimental Data

GroupSeq

A GUI-Based Program to Compute Probabilities Regarding Group Sequential Designs

A graphical user interface to compute group sequential designs based on normally distributed test statistics, particularly critical boundaries, power, drift, and confidence intervals of such designs. All computations are based on the alpha spending approach by Lan-DeMets, with various alpha spending functions available to choose from.

990

Design of Experiments (DoE) & Analysis of Experimental Data

gsbDesign

Group Sequential Bayes Design

Group sequential operating characteristics for clinical, Bayesian two-arm trials with known sigma and normal endpoints.

991

Design of Experiments (DoE) & Analysis of Experimental Data

gsDesign

Group Sequential Design

Derives group sequential designs and describes their properties.
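A hedged sketch of deriving a three-look group sequential design with gsDesign; the number of analyses, error rates, and spending function below are illustrative assumptions, not package recommendations.

```r
# Illustrative only: a one-sided, three-analysis group sequential design.
library(gsDesign)

d <- gsDesign(k = 3,            # number of analyses
              test.type = 1,    # one-sided efficacy testing
              alpha = 0.025, beta = 0.1,
              sfu = sfLDOF)     # Lan-DeMets O'Brien-Fleming-like spending
d$upper$bound                   # efficacy boundaries on the z-scale
d$n.I                           # relative information at each analysis
```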

992

Design of Experiments (DoE) & Analysis of Experimental Data

gset

Group Sequential Design in Equivalence Studies

Calculates equivalence and futility boundaries based on the exact bivariate \(t\) test statistics for group sequential designs in studies with equivalence hypotheses.

993

Design of Experiments (DoE) & Analysis of Experimental Data

hiPOD

hierarchical Pooled Optimal Design

Based on hierarchical modeling, this package provides a few practical functions to find and present the optimal designs for a pooled NGS design.

994

Design of Experiments (DoE) & Analysis of Experimental Data

ibd

Incomplete Block Designs

A collection of several utility functions related to binary incomplete block designs. The package contains functions to generate A- and D-efficient binary incomplete block designs with given numbers of treatments, number of blocks and block size. It also contains a function to generate an incomplete block design with a specified concurrence matrix. There are functions to generate balanced treatment incomplete block designs and incomplete block designs for test-versus-control treatment comparisons with a specified concurrence matrix. The package also allows performing analysis of variance of data and computing estimated marginal means of factors from experiments using a connected incomplete block design. Tests of hypotheses of treatment contrasts in an incomplete block design set-up are supported.

995

Design of Experiments (DoE) & Analysis of Experimental Data

ICAOD

Optimal Designs for Nonlinear Models

Finds optimal designs for nonlinear models using a metaheuristic algorithm called the imperialist competitive algorithm (ICA). See, for details, Masoudi et al. (2017) <doi:10.1016/j.csda.2016.06.014> and Masoudi et al. (2019) <doi:10.1080/10618600.2019.1601097>.

996

Design of Experiments (DoE) & Analysis of Experimental Data

idefix

Efficient Designs for Discrete Choice Experiments

Generates efficient designs for discrete choice experiments based on the multinomial logit model, and individually adapted designs for the mixed multinomial logit model. The generated designs can be presented on screen and choice data can be gathered using a shiny application. Crabbe M, Akinc D and Vandebroek M (2014) <doi:10.1016/j.trb.2013.11.008>.

997

Design of Experiments (DoE) & Analysis of Experimental Data

JMdesign

Joint Modeling of Longitudinal and Survival Data - Power Calculation

Performs power calculations for joint modeling of longitudinal and survival data with kth order trajectories when the variancecovariance matrix, Sigma_theta, is unknown.

998

Design of Experiments (DoE) & Analysis of Experimental Data

LDOD

Finding Locally D-optimal Designs for some Nonlinear and Generalized Linear Models

This package provides functions for finding locally D-optimal designs for Logistic, Negative Binomial, Poisson, Michaelis-Menten, Exponential, Log-Linear, Emax, Richards, Weibull and Inverse Quadratic regression models, as well as functions for auto-constructing the Fisher information matrix and Frechet derivative based on some input variables, without user interference.

999

Design of Experiments (DoE) & Analysis of Experimental Data

lhs

Latin Hypercube Samples

Provides a number of methods for creating and augmenting Latin Hypercube Samples.
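A small hedged sketch of creating and then augmenting a Latin hypercube sample with lhs; the sample sizes and dimension are arbitrary illustrative choices.

```r
# Illustrative only: a 10-point Latin hypercube in 3 dimensions, augmented.
library(lhs)

X  <- randomLHS(n = 10, k = 3)   # 10 points in [0,1]^3, one per row
X2 <- augmentLHS(X, m = 5)       # add 5 points, preserving the LHS property
dim(X2)                          # 15 rows, 3 columns
```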

1000

Design of Experiments (DoE) & Analysis of Experimental Data

MAMS

Designing Multi-Arm Multi-Stage Studies

Designing multi-arm multi-stage studies with (asymptotically) normal endpoints and known variance.

1001

Design of Experiments (DoE) & Analysis of Experimental Data

MaxPro

Maximum Projection Designs

Generate maximum projection (MaxPro) designs for quantitative and/or qualitative factors. Details of the MaxPro criterion can be found in: (1) Joseph, Gul, and Ba. (2015) “Maximum Projection Designs for Computer Experiments”, Biometrika, 102, 371-380, and (2) Joseph, Gul, and Ba. (2018) “Designing Computer Experiments with Multiple Types of Factors: The MaxPro Approach”, Journal of Quality Technology, to appear.

1002

Design of Experiments (DoE) & Analysis of Experimental Data

MBHdesign

Spatial Designs for Ecological and Environmental Surveys

Provides spatially balanced designs from a set of (contiguous) potential sampling locations in a study region. Accommodates, without detrimental effects on spatial balance, sites that the researcher wishes to include in the survey for reasons other than the current randomisation (legacy sites).

1003

Design of Experiments (DoE) & Analysis of Experimental Data

minimalRSD

Minimally Changed CCD and BBD

Generates central composite designs (CCD) with full as well as fractional factorial points (half replicate) and Box-Behnken designs (BBD) with minimally changed run sequences.

1004

Design of Experiments (DoE) & Analysis of Experimental Data

minimaxdesign

Minimax and Minimax Projection Designs

Provides two main functions, minimax() and miniMaxPro(), for computing minimax and minimax projection designs using the minimax clustering algorithm in Mak and Joseph (2018) <doi:10.1080/10618600.2017.1302881>. Current design region options include the unit hypercube (“hypercube”), the unit simplex (“simplex”), the unit ball (“ball”), as well as user-defined constraints on the unit hypercube (“custom”). Minimax designs can also be computed on user-provided images using the function minimax.map(). Design quality can be assessed using the function mMdist(), which computes the minimax (fill) distance of a design.

1005

Design of Experiments (DoE) & Analysis of Experimental Data

mixexp

Design and Analysis of Mixture Experiments

Functions for creating designs for mixture experiments, making ternary contour plots, and making mixture effect plots.

1006

Design of Experiments (DoE) & Analysis of Experimental Data

mkssd

Efficient multi-level k-circulant supersaturated designs

mkssd is a package that generates efficient balanced non-aliased multi-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package tries to generate a supersaturated design that has chi-square efficiency greater than a user-specified efficiency level (mef). The package also displays the progress of the generation of an efficient multi-level k-circulant design through a progress bar. A progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.

1007

Design of Experiments (DoE) & Analysis of Experimental Data

mxkssd

Efficient mixed-level k-circulant supersaturated designs

mxkssd is a package that generates efficient balanced mixed-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package tries to generate a supersaturated design that has EfNOD efficiency greater than a user-specified efficiency level (mef). The package also displays the progress of the generation of an efficient mixed-level k-circulant design through a progress bar. A progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.

1008

Design of Experiments (DoE) & Analysis of Experimental Data

OBsMD

Objective Bayesian Model Discrimination in Follow-Up Designs

Implements the objective Bayesian methodology proposed in Consonni and Deldossi in order to choose the optimal experiment that best discriminates between competing models. G. Consonni, L. Deldossi (2014) Objective Bayesian Model Discrimination in Follow-up Experimental Designs, Test. <doi:10.1007/s11749-015-0461-3>.

1009

Design of Experiments (DoE) & Analysis of Experimental Data

odr

Optimal Design and Statistical Power of Multilevel Randomized Trials

Calculate the optimal sample allocation that minimizes the variance of treatment effect in multilevel randomized trials under fixed budget and cost structure, and perform power analyses with and without accommodating costs and budget. The references for the proposed methods are: (1) Shen, Z. (in progress). Using optimal sample allocation to improve statistical precision and design efficiency for multilevel randomized trials. (unpublished doctoral dissertation). University of Cincinnati, Cincinnati, OH. (2) Shen, Z., & Kelcey, B. (revise & resubmit). Optimal sample allocation accounts for the full variation of sampling costs in cluster-randomized trials. Journal of Educational and Behavioral Statistics. (3) Shen, Z., & Kelcey, B. (2018, April). Optimal design of cluster randomized trials under condition- and unit-specific cost structures. Roundtable discussion presented at American Educational Research Association (AERA) annual conference. (4) Champely, S. (2018). pwr: Basic functions for power analysis (Version 1.2-2) [Software]. Available from <https://CRAN.R-project.org/package=pwr>.

1010

Design of Experiments (DoE) & Analysis of Experimental Data

OPDOE

Optimal Design of Experiments

Several functions related to experimental design are implemented here; see "Optimal Experimental Design with R" by Rasch, D. et al. (ISBN 9781439816974).

1011

Design of Experiments (DoE) & Analysis of Experimental Data

optbdmaeAT

Optimal Block Designs for Two-Colour cDNA Microarray Experiments

Computes A-, MV-, D- and E-optimal or near-optimal block designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models, where the interest is in a comparison of all possible elementary treatment contrasts. The algorithms used in this package are based on the treatment exchange and array exchange algorithms of Debusho, Gemechu and Haines (2016, unpublished). The package also provides an optional method of using the graphical user interface (GUI) R package tcltk to ensure that it is user friendly.

1012

Design of Experiments (DoE) & Analysis of Experimental Data

optDesignSlopeInt

Optimal Designs for Estimating the Slope Divided by the Intercept

Computes optimal experimental designs for estimating the slope divided by the intercept.

1013

Design of Experiments (DoE) & Analysis of Experimental Data

OptGS

Near-Optimal and Balanced Group-Sequential Designs for Clinical Trials with Continuous Outcomes

Functions to find near-optimal multistage designs for continuous outcomes.

1014

Design of Experiments (DoE) & Analysis of Experimental Data

OptimalDesign

Algorithms for D-, A-, and IV-Optimal Designs

Algorithms for D-, A- and IV-optimal designs of experiments. Some of the functions in this package require the 'gurobi' software and its accompanying R package. For their installation, please follow the instructions at <www.gurobi.com> and the file gurobi_inst.txt, respectively.
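D-optimality here means choosing runs that maximize det(X'X), the determinant of the model's information matrix. A toy Python/NumPy comparison of two candidate four-run designs under a first-order two-factor model (a sketch of the criterion only, not the 'OptimalDesign' API):

```python
import numpy as np

def d_criterion(design, model=lambda r: [1.0, r[0], r[1]]):
    """det(X'X) for a design (rows = runs) under a given model expansion."""
    X = np.array([model(run) for run in design])
    return float(np.linalg.det(X.T @ X))

full_factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]    # 2^2 design
poor_design    = [(-1, -1), (1, -1), (-1, 1), (-1, -1)]  # duplicated run

d_good = d_criterion(full_factorial)  # X'X = 4*I, so det = 64
d_bad  = d_criterion(poor_design)     # smaller determinant
```

Exchange algorithms improve a design by swapping runs whenever the swap increases this determinant.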

1015

Design of Experiments (DoE) & Analysis of Experimental Data

OptimaRegion

Confidence Regions for Optima

Computes confidence regions on the location of response surface optima.

1016

Design of Experiments (DoE) & Analysis of Experimental Data

OptInterim

Optimal Two and Three Stage Designs for Single-Arm and Two-Arm Randomized Controlled Trials with a Long-Term Binary Endpoint

Optimal two and three stage designs monitoring time-to-event endpoints at a specified timepoint.

1017

Design of Experiments (DoE) & Analysis of Experimental Data

optrcdmaeAT

Optimal Row-Column Designs for Two-Colour cDNA Microarray Experiments

Computes A-, MV-, D- and E-optimal or near-optimal row-column designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models, where the interest is in a comparison of all pairwise treatment contrasts. The algorithms used in this package are based on the array exchange and treatment exchange algorithms of Debusho, Gemechu and Haines (2016, unpublished), adjusted for the row-column design setup. The package also provides an optional method of using the graphical user interface (GUI) R package tcltk to ensure that it is user friendly.

1018

Design of Experiments (DoE) & Analysis of Experimental Data

osDesign

Design and analysis of observational studies

The osDesign package serves for planning an observational study. Currently, functionality is focused on the two-phase and case-control designs. Functions in this package provide Monte Carlo based evaluation of operating characteristics, such as power, for estimators of the components of a logistic regression model.

1019

Design of Experiments (DoE) & Analysis of Experimental Data

PBIBD

Partially Balanced Incomplete Block Designs

PBIB designs are an important class of incomplete block designs with a wide area of application, for example in agricultural experiments, plant breeding and sample surveys. This package constructs various series of PBIB designs and assists in checking all the necessary conditions of PBIB designs and the association scheme on which these designs are based. It also assists in calculating the efficiencies of PBIB designs with any number of associate classes. The package also constructs Youden-m square designs, which are row-column designs for the two-way elimination of heterogeneity; the incomplete columns of these Youden-m square designs constitute PBIB designs. The package will thus help researchers to construct PBIB designs, to check whether a PBIB design and its association scheme satisfy the necessary conditions for existence, to calculate the efficiencies of PBIB designs based on any association scheme, and to construct Youden-m square designs for the two-way elimination of heterogeneity. R. C. Bose and K. R. Nair (1939) <http://www.jstor.org/stable/40383923>.

1020

Design of Experiments (DoE) & Analysis of Experimental Data

PGM2

Nested Resolvable Designs and their Associated Uniform Designs

Construction method for nested resolvable designs from a projective geometry defined on a Galois field of order 2. The resolvable designs obtained are used to build uniform designs. The presented results are based on <https://eudml.org/doc/219563> and A. Boudraa et al. (see references).

1021

Design of Experiments (DoE) & Analysis of Experimental Data

ph2bayes

Bayesian Single-Arm Phase II Designs

An implementation of Bayesian single-arm phase II design methods for binary outcomes based on posterior probability (Thall and Simon (1994) <doi:10.2307/2533377>) and predictive probability (Lee and Liu (2008) <doi:10.1177/1740774508089279>).
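The posterior-probability rule has a simple closed form: with a Beta(a, b) prior on the response rate and x responses among n patients, the posterior is Beta(a + x, b + n - x), and the design monitors P(p > p0 | data). A stdlib-only Python sketch (illustrative, not the 'ph2bayes' API; it uses the beta-binomial identity, so it assumes integer prior parameters):

```python
import math

def beta_sf(p0, a, b):
    """P(X > p0) for X ~ Beta(a, b) with integer a, b, via the identity
    P(Beta(a, b) <= p) = P(Binomial(a + b - 1, p) >= a)."""
    n = a + b - 1
    return sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(a))

def posterior_prob(x, n, p0, a=1, b=1):
    """P(response rate > p0 | x responses in n patients), Beta(a, b) prior."""
    return beta_sf(p0, a + x, b + n - x)

# e.g. a futility monitor might stop when P(p > 0.2) drops below a cutoff
probs = [posterior_prob(x, n=20, p0=0.2) for x in range(21)]
```

The probability is increasing in the number of responses, which is what makes threshold-based stopping boundaries well defined.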

1022

Design of Experiments (DoE) & Analysis of Experimental Data

ph2bye

Phase II Clinical Trial Design Using Bayesian Methods

Calculates the Bayesian posterior/predictive probability and determines the sample size and stopping boundaries for single-arm Phase II designs.

1023

Design of Experiments (DoE) & Analysis of Experimental Data

pid

Process Improvement using Data

A collection of scripts and data files for the statistics text: “Process Improvement using Data” <https://learnche.org/pid> and the online course “Experimentation for Improvement” found on Coursera. The package contains code for designed experiments, data sets and other convenience functions used in the book.

1024

Design of Experiments (DoE) & Analysis of Experimental Data

pipe.design

Dual-Agent Dose Escalation for Phase I Trials using the PIPE Design

Implements the Product of Independent beta Probabilities dose Escalation (PIPE) design for dualagent Phase I trials as described in Mander AP, Sweeting MJ (2015) <doi:10.1002/sim.6434>.

1025

Design of Experiments (DoE) & Analysis of Experimental Data

plgp

Particle Learning of Gaussian Processes

Sequential Monte Carlo inference for fully Bayesian Gaussian process (GP) regression and classification models by particle learning (PL). The sequential nature of inference and the active learning (AL) hooks provided facilitate thrifty sequential design (by entropy) and optimization (by improvement) for classification and regression models, respectively. This package essentially provides a generic PL interface, and functions (arguments to the interface) which implement the GP models and AL heuristics. Functions for a special, linked, regression/classification GP model and an integrated expected conditional improvement (IECI) statistic are provided for optimization in the presence of unknown constraints. Separable and isotropic Gaussian, and single-index correlation functions are supported. See the examples section of ?plgp and demo(package="plgp") for an index of demos.

1026

Design of Experiments (DoE) & Analysis of Experimental Data

PopED

Population (and Individual) Optimal Experimental Design

Optimal experimental designs for both population and individual studies based on nonlinear mixed-effect models. Often this is based on a computation of the Fisher Information Matrix. This package was developed for pharmacometric problems, and examples and predefined models are available for these types of systems. The methods are described in Nyberg et al. (2012) <doi:10.1016/j.cmpb.2012.05.005>, and Foracchia et al. (2004) <doi:10.1016/S0169-2607(03)00073-7>.

1027

Design of Experiments (DoE) & Analysis of Experimental Data

powerAnalysis

Power Analysis in Experimental Design

Basic functions for power analysis and effect size calculation.
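As a flavour of what such functions compute: the power of a two-sided two-sample t-test can be approximated with a normal approximation to the noncentral t distribution. An illustrative Python sketch (not the 'powerAnalysis' API; the critical value 1.96 assumes a two-sided 5% level):

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ttest_power(d, n_per_group, z_crit=1.959964):
    """Approximate power of a two-sided two-sample t-test with effect size d
    (normal approximation; z_crit = 1.96 corresponds to alpha = 0.05)."""
    ncp = d * math.sqrt(n_per_group / 2.0)      # noncentrality parameter
    return norm_cdf(ncp - z_crit) + norm_cdf(-ncp - z_crit)

power = ttest_power(d=0.5, n_per_group=64)      # Cohen's classic benchmark
```

With a medium effect (d = 0.5) and 64 subjects per group, power comes out near the conventional 0.80 target.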

1028

Design of Experiments (DoE) & Analysis of Experimental Data

powerbydesign

Power Estimates for ANOVA Designs

Functions for bootstrapping the power of ANOVA designs based on estimated means and standard deviations of the conditions. Please refer to the documentation of the boot.power.anova() function for further details.

1029

Design of Experiments (DoE) & Analysis of Experimental Data

powerGWASinteraction

Power Calculations for GxE and GxG Interactions for GWAS

Analytical power calculations for GxE and GxG interactions for case-control studies of candidate genes and genome-wide association studies (GWAS). This includes power calculations for four two-step screening and testing procedures. It can also calculate power for GxE and GxG without any screening.

1030

Design of Experiments (DoE) & Analysis of Experimental Data

PwrGSD

Power in a Group Sequential Design

Tools for the evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint; tools to construct efficacy and futility boundaries, for deriving the power of a sequential design at a specified alternative, and a template for evaluating the performance of candidate plans at a set of time-varying alternatives. See Izmirlian, G. (2014) <doi:10.4310/SII.2014.v7.n1.a4>.

1031

Design of Experiments (DoE) & Analysis of Experimental Data

qtlDesign

Design of QTL experiments

Tools for the design of QTL experiments.

1032

Design of Experiments (DoE) & Analysis of Experimental Data

qualityTools

Statistical Methods for Quality Science

Contains methods associated with the Define, Measure, Analyze, Improve and Control (i.e. DMAIC) cycle of the Six Sigma Quality Management methodology. It covers distribution fitting, normal and non-normal process capability indices, techniques for Measurement Systems Analysis, especially gage capability indices and Gage Repeatability and Reproducibility (i.e. Gage RR) studies, factorial and fractional factorial designs, as well as response surface methods including the use of desirability functions. Improvement via Six Sigma is a project-based strategy that covers five phases: Define - Pareto Chart; Measure - Probability and Quantile-Quantile Plots, Process Capability Indices for various distributions, and Gage RR; Analyze - Pareto Chart, Multi-Vari Chart, Dot Plot; Improve - full and fractional factorial, response surface and mixture designs, the desirability approach for simultaneous optimization of more than one response variable, and Normal, Pareto and Lenth Plots of effects as well as Interaction Plots; Control - Quality Control Charts can be found in the 'qcc' package. The focus is on teaching the statistical methodology used in the Quality Sciences.

1033

Design of Experiments (DoE) & Analysis of Experimental Data

RcmdrPlugin.DoE

R Commander Plugin for (industrial) Design of Experiments

The package provides a platform-independent GUI for design of experiments. It is implemented as a plugin to the R Commander, which is a more general graphical user interface for statistics in R based on tcl/tk. DoE functionality can be accessed through the menu Design that is added to the R Commander menus.

1034

Design of Experiments (DoE) & Analysis of Experimental Data

rodd

Optimal Discriminating Designs

A collection of functions for the numerical construction of optimal discriminating designs. Currently, T-optimal designs (which maximize the lower bound for the power of the F-test for regression model discrimination), KL-optimal designs (for lognormal errors) and their robust analogues can be calculated with the package.

1035

Design of Experiments (DoE) & Analysis of Experimental Data

RPPairwiseDesign

Resolvable partially pairwise balanced design and Space-filling design via association scheme

Using some association schemes to obtain a new series of resolvable partially pairwise balanced designs (RPPBD) and space-filling designs.

1036

Design of Experiments (DoE) & Analysis of Experimental Data

rsm (core)

ResponseSurface Analysis

Provides functions to generate response-surface designs, fit first- and second-order response-surface models, make surface plots, obtain the path of steepest ascent, and do canonical analysis. A good reference on these methods is Chapter 10 of Wu, C-F J and Hamada, M (2009) "Experiments: Planning, Analysis, and Parameter Design Optimization" ISBN 9780471699460.

1037

Design of Experiments (DoE) & Analysis of Experimental Data

rsurface

Design of Rotatable Central Composite Experiments and Response Surface Analysis

Produces tables with the level of replication (number of replicates) and the experimental uncoded values of the quantitative factors to be used for rotatable Central Composite Design (CCD) experimentation and a 2D contour plot of the corresponding variance of the predicted response according to Mead et al. (2012) <doi:10.1017/CBO9781139020879>, design_ccd(), and analyzes CCD data with response surface methodology, ccd_analysis(). A rotatable CCD provides values of the variance of the predicted response that are concentrically distributed around the average treatment combination used in the experimentation, which, with uniform precision (implied by the use of several replicates at the average treatment combination), greatly improves the search for an optimum response. These properties of a rotatable CCD represent undeniable advantages over the classical factorial design, as discussed by Panneton et al. (1999) <doi:10.13031/2013.13267> and Mead et al. (2012) <doi:10.1017/CBO9781139020879.018>, among others.

1038

Design of Experiments (DoE) & Analysis of Experimental Data

SensoMineR

Sensory Data Analysis

Statistical Methods to Analyse Sensory Data. SensoMineR: A package for sensory data analysis. S. Le and F. Husson (2008) <doi:10.1111/j.1745459X.2007.00137.x>.

1039

Design of Experiments (DoE) & Analysis of Experimental Data

seqDesign

Simulation and Group Sequential Monitoring of Randomized Two-Stage Treatment Efficacy Trials with Time-to-Event Endpoints

A modification of the preventive vaccine efficacy trial design of Gilbert, Grove et al. (2011, Statistical Communications in Infectious Diseases) is implemented, with application generally to individual-randomized clinical trials with multiple active treatment groups and a shared control group, and a study endpoint that is a time-to-event endpoint subject to right-censoring. The design accounts for the issues that the efficacy of the treatment/vaccine groups may take time to accrue while the multiple treatment administrations/vaccinations are given; there is interest in assessing the durability of treatment efficacy over time; and group sequential monitoring of each treatment group for potential harm, non-efficacy/efficacy futility, and high efficacy is warranted. The design divides the trial into two stages of time periods, where each treatment is first evaluated for efficacy in the first stage of follow-up, and, if and only if it shows significant treatment efficacy in stage one, it is evaluated for longer-term durability of efficacy in stage two. The package produces plots and tables describing operating characteristics of a specified design, including unconditional power for intention-to-treat and per-protocol/as-treated analyses; trial duration; probabilities of the different possible trial monitoring outcomes (e.g., stopping early for non-efficacy); unconditional power for comparing treatment efficacies; and distributions of numbers of endpoint events occurring after the treatments/vaccinations are given, useful as input parameters for the design of studies of the association of biomarkers with a clinical outcome (surrogate endpoint problem). The code can be used for a single active treatment versus control design and for a single-stage design.

1040

Design of Experiments (DoE) & Analysis of Experimental Data

sFFLHD

Sequential Full Factorial-Based Latin Hypercube Design

Gives design points from a sequential full factorial-based Latin hypercube design, as described in Duan, Ankenman, Sanchez, and Sanchez (2015, Technometrics, <doi:10.1080/00401706.2015.1108233>).

1041

Design of Experiments (DoE) & Analysis of Experimental Data

simrel

Simulation of Multivariate Linear Model Data

Simulating multivariate linear model data is useful in research and education, whether for method comparison or for creating data with specific properties. This package lets the user simulate linear model data with a wide range of properties using a few tuning parameters. The package also contains functions to create plots for the simulation objects, and a Shiny app as an RStudio gadget. It can be a handy tool for model comparison, testing and many other purposes.

1042

Design of Experiments (DoE) & Analysis of Experimental Data

skpr (core)

Design of Experiments Suite: Generate and Evaluate Optimal Designs

Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of split/split-split/…/N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design, to improve ease of use and make analyses more reproducible.

1043

Design of Experiments (DoE) & Analysis of Experimental Data

SLHD

Maximin-Distance (Sliced) Latin Hypercube Designs

Generates optimal Latin Hypercube Designs (LHDs) for computer experiments with quantitative factors and optimal Sliced Latin Hypercube Designs (SLHDs) for computer experiments with both quantitative and qualitative factors. Details of the algorithm can be found in Ba, S., Brenneman, W. A. and Myers, W. R. (2015), "Optimal Sliced Latin Hypercube Designs," Technometrics. The most important function in this package is "maximinSLHD".
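The underlying idea can be sketched quickly: a Latin hypercube design places one point in each of n equal strata per dimension, and the maximin criterion prefers the design whose closest pair of points is farthest apart. A crude random-search Python illustration (not the algorithm of Ba et al. and not the 'SLHD' API):

```python
import itertools, math, random

def latin_hypercube(n, k, rng):
    """n points in k dimensions: each coordinate is a random permutation
    of the n equal strata of [0, 1], taken at stratum midpoints."""
    cols = []
    for _ in range(k):
        perm = list(range(n))
        rng.shuffle(perm)
        cols.append([(p + 0.5) / n for p in perm])
    return list(zip(*cols))

def min_pairwise_distance(points):
    """Maximin criterion: the smallest inter-point distance of the design."""
    return min(math.dist(a, b) for a, b in itertools.combinations(points, 2))

rng = random.Random(42)
# crude search: keep the best of many random LHDs (real algorithms do better)
best = max((latin_hypercube(8, 2, rng) for _ in range(200)),
           key=min_pairwise_distance)
```

Dedicated optimizers replace this brute-force search with element exchanges and simulated annealing, but the criterion being maximized is the same.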

1044

Design of Experiments (DoE) & Analysis of Experimental Data

soptdmaeA

Sequential Optimal Designs for TwoColour cDNA Microarray Experiments

Computes sequential A-, MV-, D- and E-optimal or near-optimal block and row-column designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models, where the interest is in a comparison of all possible elementary treatment contrasts. The package also provides an optional method of using the graphical user interface (GUI) R package 'tcltk' to ensure that it is user friendly.

1045

Design of Experiments (DoE) & Analysis of Experimental Data

sp23design

Design and Simulation of Seamless Phase II-III Clinical Trials

Provides methods for generating, exploring and executing seamless Phase II-III designs of Lai, Lavori and Shih using generalized likelihood ratio statistics. Includes pdf and source files that describe the entire R implementation with the relevant mathematical details.

1046

Design of Experiments (DoE) & Analysis of Experimental Data

ssize.fdr

Sample Size Calculations for Microarray Experiments

This package contains a set of functions that calculate appropriate sample sizes for one-sample t-tests, two-sample t-tests, and F-tests for microarray experiments, based on desired power while controlling for false discovery rates. For all tests, the standard deviations (variances) among genes can be assumed fixed or random. This is also true for effect sizes among genes in one-sample and two-sample experiments. Functions also output a chart of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes.

1047

Design of Experiments (DoE) & Analysis of Experimental Data

ssizeRNA

Sample Size Calculation for RNASeq Experimental Design

We propose a procedure for sample size calculation while controlling the false discovery rate for RNA-seq experimental design. Our procedure depends on the Voom method proposed for RNA-seq data analysis by Law et al. (2014) <doi:10.1186/gb-2014-15-2-r29> and the sample size calculation method proposed for microarray experiments by Liu and Hwang (2007) <doi:10.1093/bioinformatics/btl664>. We develop a set of functions that calculate appropriate sample sizes for the two-sample t-test for RNA-seq experiments with fixed or varied sets of parameters. The outputs also contain a plot of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes. To install this package, please use 'source("http://bioconductor.org/biocLite.R"); biocLite("ssizeRNA")'.

1048

Design of Experiments (DoE) & Analysis of Experimental Data

support.CEs

Basic Functions for Supporting an Implementation of Choice Experiments

Provides seven basic functions that support an implementation of choice experiments.

1049

Design of Experiments (DoE) & Analysis of Experimental Data

TEQR

Target Equivalence Range Design

The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity-based cumulative cohort design with added safety rules. The ACT (activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules; its unique feature is that dose is escalated based on lack of activity rather than on lack of toxicity, and is de-escalated only if an unacceptable level of toxicity is experienced.

1050

Design of Experiments (DoE) & Analysis of Experimental Data

tgp

Bayesian Treed Gaussian Process Models

Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1d and 2d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multiresolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy blackbox functions.

1051

Design of Experiments (DoE) & Analysis of Experimental Data

ThreeArmedTrials

Design and Analysis of Clinical Non-Inferiority or Superiority Trials with Active and Placebo Control

Design and analyze three-arm non-inferiority or superiority trials which follow a gold-standard design, i.e. trials with an experimental treatment, an active, and a placebo control. Methods for the following distributions are implemented: Poisson (Mielke and Munk (2009) <arXiv:0912.4169>), negative binomial (Muetze et al. (2016) <doi:10.1002/sim.6738>), normal (Pigeot et al. (2003) <doi:10.1002/sim.1450>; Hasler et al. (2009) <doi:10.1002/sim.3052>), binary (Friede and Kieser (2007) <doi:10.1002/sim.2543>), nonparametric (Muetze et al. (2017) <doi:10.1002/sim.7176>), exponential (Mielke and Munk (2009) <arXiv:0912.4169>).

1052

Design of Experiments (DoE) & Analysis of Experimental Data

toxtestD

Experimental design for binary toxicity tests

Calculates sample size and dose allocation for binary toxicity tests, using the Fish Embryo Toxicity Test as an example. An optimal test design is obtained by running (i) spoD (calculate the number of individuals to test under control conditions), (ii) setD (estimate the minimal sample size per treatment given the user's precision requirements) and (iii) doseD (construct an individual dose scheme).

1053

Design of Experiments (DoE) & Analysis of Experimental Data

unrepx

Analysis and Graphics for Unreplicated Experiments

Provides half-normal plots, reference plots, and Pareto plots of effects from an unreplicated experiment, along with various pseudo-standard-error measures, simulated reference distributions, and other tools. Many of these methods are described in Daniel C. (1959) <doi:10.1080/00401706.1959.10489866> and/or Lenth R.V. (1989) <doi:10.1080/00401706.1989.10488595>, but some new approaches are added and integrated in one package.

1054

Design of Experiments (DoE) & Analysis of Experimental Data

vdg

Variance Dispersion Graphs and Fraction of Design Space Plots

Facilities for constructing variance dispersion graphs, fraction-of-design-space plots and similar graphics for exploring the properties of experimental designs. The design region is explored via random sampling, which allows for more flexibility than traditional variance dispersion graphs. A formula interface is leveraged to provide access to complex model formulae. Graphics can be constructed simultaneously for multiple experimental designs and/or multiple model formulae. Instead of using pointwise optimization to find the minimum and maximum scaled prediction variance curves, which can be inaccurate and time consuming, this package uses quantile regression as an alternative.
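The quantity these graphs summarize is the scaled prediction variance SPV(x) = N f(x)' (X'X)^-1 f(x), evaluated over random points in the design region. A minimal Python/NumPy sketch for a 2^2 factorial with a first-order model, where SPV(x) = 1 + x1^2 + x2^2 (illustrative only, not the 'vdg' API):

```python
import random
import numpy as np

# 2^2 factorial design, first-order model f(x) = (1, x1, x2)
X = np.array([[1.0, x1, x2] for x1, x2 in
              [(-1, -1), (1, -1), (-1, 1), (1, 1)]])
XtX_inv = np.linalg.inv(X.T @ X)
N = X.shape[0]

def spv(point):
    """Scaled prediction variance N * f(x)' (X'X)^-1 f(x)."""
    f = np.array([1.0, *point])
    return float(N * f @ XtX_inv @ f)

# explore the design region by random sampling, as the package description says
rng = random.Random(1)
samples = [spv((rng.uniform(-1, 1), rng.uniform(-1, 1))) for _ in range(1000)]
```

A variance dispersion graph then plots summaries (min/mean/max or quantiles) of these sampled SPV values against the distance from the design centre.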

1055

Design of Experiments (DoE) & Analysis of Experimental Data

Vdgraph

Variance dispersion graphs and Fraction of design space plots for response surface designs

Uses a modification of the published FORTRAN code in “A Computer Program for Generating Variance Dispersion Graphs” by G. Vining, Journal of Quality Technology, Vol. 25 No. 1 January 1993, to produce variance dispersion graphs. Also produces fraction of design space plots, and contains data frames for several minimal run response surface designs.

1056

Design of Experiments (DoE) & Analysis of Experimental Data

VdgRsm

Plots of Scaled Prediction Variances for Response Surface Designs

Functions for creating variance dispersion graphs, fraction of design space plots, and contour plots of scaled prediction variances for secondorder response surface designs in spherical and cuboidal regions. Also, some standard response surface designs can be generated.

1057

Design of Experiments (DoE) & Analysis of Experimental Data

VNM

Finding Multiple-Objective Optimal Designs for the 4-Parameter Logistic Model

Provides tools for finding multiple-objective optimal designs for estimating the shape of dose-response curves, the ED50 (the dose producing an effect midway between the expected responses at the extreme doses) and the MED (the minimum effective dose level) for the 2-, 3-, and 4-parameter logistic models, and for evaluating the efficiencies of a design for the three objectives. The acronym VNM stands for the V-algorithm using the Newton-Raphson method to search for multiple-objective optimal designs.

1058

Extreme Value Analysis

copula

Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and nonparametric estimators of the Pickands dependence function.
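Empirical-copula methods start by rank-transforming each margin to pseudo-observations in (0, 1), conventionally rank/(n + 1). A small stdlib-only Python sketch of that transform (illustrative of the concept, not the 'copula' package API; assumes no ties):

```python
def pseudo_observations(sample):
    """Rank-transform a multivariate sample to (0,1)^d pseudo-observations,
    the standard input to the empirical copula (rank / (n + 1))."""
    n, d = len(sample), len(sample[0])
    pobs = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: sample[i][j])
        for rank, i in enumerate(order, start=1):
            pobs[i][j] = rank / (n + 1)
    return pobs

u = pseudo_observations([(1.2, 10.0), (0.5, 30.0), (2.2, 20.0)])
```

Because the transform depends only on ranks, the resulting pseudo-observations isolate the dependence structure from the marginal distributions.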

1059

Extreme Value Analysis

evd (core)

Functions for Extreme Value Distributions

Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.

1060

Extreme Value Analysis

evdbayes

Bayesian Analysis in Extreme Value Theory

Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.

1061

Extreme Value Analysis

evir (core)

Extreme Values in R

Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, and GEV/GPD distributions.
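Block maxima, one of the groups listed above, are obtained by splitting the series into consecutive blocks (e.g. years) and keeping each block's maximum; a GEV distribution is then fitted to those maxima. A minimal Python sketch of the extraction step only (illustrative, not the 'evir' API):

```python
def block_maxima(series, block_size):
    """Split a series into consecutive blocks and take each block's maximum;
    an incomplete trailing block is dropped."""
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]

maxima = block_maxima([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], block_size=4)
# blocks [3,1,4,1], [5,9,2,6], [5,3,5,8] -> maxima [4, 9, 8]
```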

1062

Extreme Value Analysis

evmix

Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

The usual distribution functions, maximum likelihood inference and model diagnostics for univariate stationary extreme value mixture models are provided. Kernel density estimation, including various boundary-corrected kernel density estimation methods and a wide choice of kernels, with a cross-validation likelihood based bandwidth estimator. Reasonable consistency with the base functions in the 'evd' package is provided, so that users can safely interchange most code.

1063

Extreme Value Analysis

extremefit

Estimation of Extreme Conditional Quantiles and Probabilities

Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.

1064

Extreme Value Analysis

extRemes

Extreme Value Analysis

Functions for performing extreme value analysis.

1065

Extreme Value Analysis

extremeStat

Extreme Value Statistics and Quantile Estimation

Code to fit, plot and compare several (extreme value) distribution functions. Can also compute (truncated) distribution quantile estimates and draw a plot with return periods on a linear scale.
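A return period T corresponds to the quantile exceeded with probability 1/T per block; under a Gumbel(mu, beta) model for block maxima the return level is x_T = mu - beta * ln(-ln(1 - 1/T)). An illustrative Python sketch of that formula (not the 'extremeStat' API):

```python
import math

def gumbel_return_level(mu, beta, T):
    """Level exceeded on average once every T blocks when block maxima
    follow a Gumbel(mu, beta) distribution."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))

# return levels grow roughly linearly in log(T) for the Gumbel case
levels = [gumbel_return_level(0.0, 1.0, T) for T in (10, 100, 1000)]
```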

1066

Extreme Value Analysis

fExtremes

Rmetrics - Modelling Extreme Events in Finance

Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data preprocessing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.

1067

Extreme Value Analysis

in2extRemes

Into the extRemes Package

Provides a Graphical User Interface (GUI) to some of the functions in the extRemes package (version >= 2.0).

1068

Extreme Value Analysis

ismev

An Introduction to Statistical Modeling of Extreme Values

Functions to support the computations carried out in 'An Introduction to Statistical Modeling of Extreme Values' by Stuart Coles. The functions may be divided into the following groups: maxima/minima, order statistics, peaks over thresholds and point processes.

1069

Extreme Value Analysis

lmom

L-Moments

Functions related to L-moments: computation of L-moments and trimmed L-moments of distributions and data samples; parameter estimation; L-moment ratio diagram; plot vs. quantiles of an extreme-value distribution.
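Sample L-moments are linear combinations of probability-weighted moments b_r computed from the order statistics (Hosking, 1990): l1 = b0, l2 = 2*b1 - b0, l3 = 6*b2 - 6*b1 + b0, l4 = 20*b3 - 30*b2 + 12*b1 - b0. An illustrative stdlib-only Python sketch (not the 'lmom' package API; requires n >= 4):

```python
def sample_lmoments(data):
    """First four sample L-moments via the unbiased probability-weighted
    moment estimators b_r (Hosking, 1990). Requires len(data) >= 4."""
    x = sorted(data)
    n = len(x)
    b = [0.0] * 4
    for j, xj in enumerate(x, start=1):
        b[0] += xj
        b[1] += xj * (j - 1) / (n - 1)
        b[2] += xj * (j - 1) * (j - 2) / ((n - 1) * (n - 2))
        b[3] += xj * (j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3))
    b = [bi / n for bi in b]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4

l1, l2, l3, l4 = sample_lmoments([1, 2, 3, 4, 5])
```

l1 is the sample mean and l2 is half the Gini mean difference; for this symmetric, uniform-like sample, l3 and l4 come out (numerically) zero.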

1070

Extreme Value Analysis

lmomco

L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions

Extensive functions for L-moments (LMs) and probability-weighted moments (PWMs), parameter estimation for distributions, LM computation for distributions, and L-moment ratio diagrams. Maximum likelihood and maximum product of spacings estimation are also available. LMs for right-tail and left-tail censoring by known or unknown threshold and by indicator variable are available. Asymmetric (asy) trimmed LMs (TL-moments, TLMs) are supported. LMs of residual (resid) and reversed (rev) resid life are implemented along with 13 quantile function operators for reliability and survival analyses. Exact analytical bootstrap estimates of order statistics, LMs, and variances/covariances of LMs are provided. The Harri-Coble Tau34-squared Normality Test is available. Distribution support with "L" (LMs), "TL" (TLMs) and added (+) support for right-tail censoring (RC) encompasses: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L, TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Resid. Quantile Function [L], Normal [L], 3-p log-Normal [L], Pearson Type III [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice/Rician [L], Slash [TL], 3-p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L]. Multivariate sample L-comoments (LCMs) are implemented to measure asymmetric associations.

1071

Extreme Value Analysis

lmomRFA

Regional Frequency Analysis using L-Moments

Functions for regional frequency analysis using the methods of J. R. M. Hosking and J. R. Wallis (1997), “Regional frequency analysis: an approach based on L-moments”.

1072

Extreme Value Analysis

mev

Multivariate Extreme Value Distributions

Exact simulation from max-stable processes and R-Pareto processes for various parametric models. Threshold selection methods. Multivariate extreme diagnostics. Estimation and likelihoods for univariate extremes.

1073

Extreme Value Analysis

POT

Generalized Pareto Distribution and Peaks Over Threshold

Functions for performing Peaks Over Threshold analyses in univariate and bivariate cases; see Beirlant et al. (2004) <doi:10.1002/0470012382>. A user’s guide is available.

1074

Extreme Value Analysis

ptsuite

Tail Index Estimation for Power Law Distributions

Various estimation methods for the shape parameter of Pareto-distributed data: maximum likelihood (Newman, 2005) <doi:10.1016/j.cities.2012.03.001>, Hill’s estimator (Hill, 1975) <doi:10.1214/aos/1176343247>, least squares (Zaher et al., 2014) <doi:10.9734/BJMCS/2014/10890>, method of moments (Rytgaard, 1990) <doi:10.2143/AST.20.2.2005443>, percentiles (Bhatti et al., 2018) <doi:10.1371/journal.pone.0196456>, and weighted least squares (Nair et al., 2019). It also provides a heuristic method (Hubert et al., 2013) <doi:10.1016/j.csda.2012.07.011> and a goodness-of-fit test (Gulati and Shapiro, 2008) <doi:10.1007/978-0-8176-4619-6> for testing for Pareto data, as well as a method for generating Pareto-distributed data.
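Hill's estimator averages the log excesses of the k largest order statistics over the (k+1)-th largest; the reciprocal of that average estimates the tail index. A hedged Python sketch of the textbook formula (the package's own interface is R, and `hill_estimator` is a hypothetical name):

```python
import math

def hill_estimator(x, k):
    """Hill's (1975) estimator of the Pareto shape (tail index) alpha,
    based on the k largest order statistics."""
    xs = sorted(x, reverse=True)           # descending order statistics
    if not 0 < k < len(xs):
        raise ValueError("need 0 < k < n")
    # mean log excess over the (k+1)-th largest observation
    gamma = sum(math.log(xs[i] / xs[k]) for i in range(k)) / k
    return 1.0 / gamma                     # alpha = 1 / gamma
```

In practice the estimate is plotted against k (a Hill plot) and read off where it stabilises.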

1075

Extreme Value Analysis

QRM

Provides R-Language Code to Examine Quantitative Risk Management Concepts

Accompanying package to the book Quantitative Risk Management: Concepts, Techniques and Tools by Alexander J. McNeil, Rudiger Frey, and Paul Embrechts.

1076

Extreme Value Analysis

ReIns

Functions from “Reinsurance: Actuarial and Statistical Aspects”

Functions from the book “Reinsurance: Actuarial and Statistical Aspects” (2017) by Hansjoerg Albrecher, Jan Beirlant and Jef Teugels <http://www.wiley.com/WileyCDA/WileyTitle/productCd0470772689.html>.

1077

Extreme Value Analysis

Renext

Renewal Method for Extreme Values Extrapolation

Peaks Over Threshold (POT) or ‘méthode du renouvellement’. The distribution for the exceedances can be chosen, and heterogeneous data (including historical data or block data) can be used in a Maximum-Likelihood framework.

1078

Extreme Value Analysis

revdbayes

Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis

Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. In addition, d, p, q, r functions are provided for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.
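The shape-near-zero issue is a cancellation problem: the GEV quantile formula divides by the shape parameter, so a careful implementation switches to the Gumbel limit when the shape is tiny. An illustrative Python sketch (the package itself is R; `qgev` and the `eps` cutoff are assumptions of this sketch):

```python
import math

def qgev(p, loc=0.0, scale=1.0, shape=0.0, eps=1e-6):
    """GEV quantile function.  For |shape| below eps the Gumbel limit is
    used, avoiding cancellation in ((-log p)**(-shape) - 1) / shape."""
    y = -math.log(p)
    if abs(shape) < eps:
        return loc - scale * math.log(y)               # shape -> 0 (Gumbel) limit
    return loc + scale * (y ** (-shape) - 1.0) / shape
```

For small shapes the two branches agree to high accuracy, which is exactly the continuity the cutoff exploits.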

1079

Extreme Value Analysis

RTDE

Robust Tail Dependence Estimation

Robust tail dependence estimation for bivariate models. This package is based on two papers by the authors: ‘Robust and bias-corrected estimation of the coefficient of tail dependence’ and ‘Robust and bias-corrected estimation of probabilities of extreme failure sets’. This work was supported by a research grant (VKR023480) from VILLUM FONDEN and an international project for scientific cooperation (PICS-6416).

1080

Extreme Value Analysis

SpatialExtremes

Modelling Spatial Extremes

Tools for the statistical modelling of spatial extremes using max-stable processes, copula or Bayesian hierarchical models. More precisely, this package allows (conditional) simulations from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with extreme value theory) are available, such as the use of (spatial) copula and Bayesian hierarchical models assuming the so-called conditional independence assumption. The latter approaches are handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012) <doi:10.1214/11-STS376>, Padoan et al. (2010) <doi:10.1198/jasa.2009.tm08577>, Dombry et al. (2013) <doi:10.1093/biomet/ass067>.

1081

Extreme Value Analysis

texmex

Statistical Modelling of Extreme Values

Statistical extreme value modelling of threshold excesses, maxima and multivariate extremes. Univariate models for threshold excesses and maxima are the Generalised Pareto and Generalised Extreme Value models, respectively. These models may be fitted by maximum (optionally penalised) likelihood or Bayesian estimation, and both classes of models may be fitted with covariates in any/all model parameters. Model diagnostics support the fitting process. Graphical output for visualising fitted models and return level estimates is provided. For serially dependent sequences, the intervals declustering algorithm of Ferro and Segers (2003) <doi:10.1111/1467-9868.00401> is provided, with diagnostic support to aid selection of threshold and declustering horizon. Multivariate modelling is performed via the conditional approach of Heffernan and Tawn (2004) <doi:10.1111/j.1467-9868.2004.02050.x>, with graphical tools for threshold selection and to diagnose estimation convergence.

1082

Extreme Value Analysis

threshr

Threshold Selection and Uncertainty for Extreme Value Analysis

Provides functions for the selection of thresholds for use in extreme value models, based mainly on the methodology in Northrop, Attalides and Jonathan (2017) <doi:10.1111/rssc.12159>. It also performs predictive inferences about future extreme values, based either on a single threshold or on a weighted average of inferences from multiple thresholds, using the ‘revdbayes’ package <https://cran.r-project.org/package=revdbayes>. At the moment only the case where the data can be treated as independent, identically distributed observations is considered.

1083

Extreme Value Analysis

VGAM

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for the latest changes.

1084

Empirical Finance

actuar

Actuarial Functions and Heavy Tailed Distributions

Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss amounts and loss frequency: 19 continuous heavy-tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities.

1085

Empirical Finance

AmericanCallOpt

This package includes pricing functions for selected American call options with underlying assets that generate payouts

This package includes a set of pricing functions for American call options. The following cases are covered: Pricing of an American call using the standard binomial approximation; Hedge parameters for an American call with a standard binomial tree; Binomial pricing of an American call with continuous payout from the underlying asset; Binomial pricing of an American call with an underlying stock that pays proportional dividends in discrete time; Pricing of an American call on futures using a binomial approximation; Pricing of a currency futures American call using a binomial approximation; Pricing of a perpetual American call. Note that this material is for educational purposes only; the code is not optimized for computational efficiency, as it is meant to illustrate standard cases of analytical and numerical solution.
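The binomial approximation cases above all follow the same recipe: build a lattice of asset prices, then roll option values backwards, checking for early exercise at each node. A minimal Python sketch of a Cox-Ross-Rubinstein tree for an American call on an asset paying a continuous yield q (illustrative only; the package's own functions are R, and this function name is hypothetical):

```python
import math

def american_call_binomial(S, K, r, sigma, T, steps, q=0.0):
    """CRR binomial tree for an American call with continuous payout rate q."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp((r - q) * dt) - d) / (u - d)   # risk-neutral up probability
    # terminal payoffs at the last time step
    values = [max(S * u**j * d**(steps - j) - K, 0.0) for j in range(steps + 1)]
    # backward induction with an early-exercise check at every node
    for n in range(steps - 1, -1, -1):
        values = [
            max(disc * (p * values[j + 1] + (1 - p) * values[j]),  # continuation
                S * u**j * d**(n - j) - K)                         # exercise now
            for j in range(n + 1)
        ]
    return values[0]
```

With q = 0 early exercise is never optimal, so the result converges to the European Black-Scholes price as steps grows.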

1086

Empirical Finance

backtest

Exploring PortfolioBased Conjectures About Financial Instruments

The backtest package provides facilities for exploring portfoliobased conjectures about financial instruments (stocks, bonds, swaps, options, et cetera).

1087

Empirical Finance

bayesGARCH

Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations

Provides the bayesGARCH() function which performs the Bayesian estimation of the GARCH(1,1) model with Student’s t innovations as described in Ardia (2008) <doi:10.1007/978-3-540-78657-3>.

1088

Empirical Finance

BCC1997

Calculation of Option Prices Based on a Universal Solution

Calculates the prices of European options based on the universal solution provided by Bakshi, Cao and Chen (1997) <doi:10.1111/j.1540-6261.1997.tb02749.x>. This solution considers stochastic volatility, stochastic interest rates and random jumps. Please cite their work if this package is used.

1089

Empirical Finance

BenfordTests

Statistical Tests for Evaluating Conformity to Benford’s Law

Several specialized statistical tests and support functions for determining if numerical data could conform to Benford’s law.
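The simplest such test compares the observed first-digit frequencies with Benford's expected proportions p(d) = log10(1 + 1/d) via a Pearson chi-squared statistic. A Python sketch of that idea (the package's own tests are R functions with different names; this is an assumption-laden illustration, and nonzero data are assumed):

```python
import math

def benford_chisq(data):
    """Pearson chi-squared statistic of first significant digits
    against Benford's law p(d) = log10(1 + 1/d), d = 1..9."""
    nonzero = [x for x in data if x != 0]
    counts = [0] * 9
    for x in nonzero:
        counts[int(f"{abs(x):e}"[0]) - 1] += 1   # leading digit via scientific notation
    n = len(nonzero)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d - 1] - expected) ** 2 / expected
    return stat   # compare against a chi-squared distribution with 8 df
```

A large statistic relative to the chi-squared(8) distribution suggests the data do not conform to Benford's law.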

1090

Empirical Finance

betategarch

Simulation, Estimation and Forecasting of Beta-Skew-t-EGARCH Models

Simulation, estimation and forecasting of first-order Beta-Skew-t-EGARCH models with leverage (one-component, two-component and skewed versions).

1091

Empirical Finance

bizdays

Business Days Calculations and Utilities

Business days calculations based on a list of holidays and non-working weekdays. Quite useful for fixed income and derivatives pricing.
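The core calculation is just a filtered day count: iterate over calendar days and keep those that are neither weekend days nor listed holidays. A Python miniature of that calendar logic (the package itself is R and its API differs; the function name here is hypothetical):

```python
from datetime import date, timedelta

def business_days(start, end, holidays=()):
    """Number of business days in [start, end): skip Saturdays, Sundays
    and any date in `holidays`."""
    hol = set(holidays)
    count, d = 0, start
    while d < end:
        if d.weekday() < 5 and d not in hol:   # weekday() 0-4 = Mon-Fri
            count += 1
        d += timedelta(days=1)
    return count
```

For fixed-income work the same count divided by a day-count denominator gives accrual fractions between coupon dates.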

1092

Empirical Finance

BLModel

Black-Litterman Posterior Distribution

Posterior distribution in the Black-Litterman model is computed from a prior distribution given in the form of a time series of asset returns and a continuous distribution of views provided by the user as an external function.

1093

Empirical Finance

BurStFin

Burns Statistics Financial

A suite of functions for finance, including the estimation of variance matrices via a statistical factor model or Ledoit-Wolf shrinkage.

1094

Empirical Finance

BurStMisc

Burns Statistics Miscellaneous

Script search, corner, genetic optimization, permutation tests, write expect test.

1095

Empirical Finance

CADFtest

A Package to Perform Covariate-Augmented Dickey-Fuller Unit Root Tests

Hansen’s (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).

1096

Empirical Finance

car

Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.

1097

Empirical Finance

ChainLadder

Statistical Methods and Models for Claims Reserving in General Insurance

Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.
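The classical starting point among these methods is the chain-ladder: estimate volume-weighted development factors from a cumulative run-off triangle, then apply the remaining factors to each origin year's latest diagonal to project ultimate claims. A Python sketch of that basic mechanism (the package's own R functions are richer; names here are hypothetical):

```python
def chain_ladder(triangle):
    """Basic chain-ladder on a cumulative run-off triangle (rows of
    decreasing length): volume-weighted development factors, then
    projected ultimate claims per origin year."""
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # ratio of column j+1 to column j, over rows where both are observed
        num = sum(row[j + 1] for row in triangle if len(row) > j + 1)
        den = sum(row[j] for row in triangle if len(row) > j + 1)
        factors.append(num / den)
    ultimates = []
    for row in triangle:
        value = row[-1]
        for f in factors[len(row) - 1:]:   # apply the factors still outstanding
            value *= f
        ultimates.append(value)
    return factors, ultimates
```

The outstanding reserve per origin year is then the ultimate minus the latest observed cumulative amount.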

1098

Empirical Finance

copula

Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and nonparametric estimators of the Pickands dependence function.

1099

Empirical Finance

CreditMetrics

Functions for calculating the CreditMetrics risk model

A set of functions for computing the CreditMetrics risk model.

1100

Empirical Finance

credule

Credit Default Swap Functions

Provides functions to bootstrap credit curves from market quotes (Credit Default Swap (CDS) spreads) and to price Credit Default Swaps (CDS).

1101

Empirical Finance

crp.CSFP

CreditRisk+ Portfolio Model

Modelling credit risks based on the concept of “CreditRisk+”, First Boston Financial Products, 1997 and “CreditRisk+ in the Banking Industry”, Gundlach & Lehrbass, Springer, 2003.

1102

Empirical Finance

crseEventStudy

A Robust and Powerful Test of Abnormal Stock Returns in Long-Horizon Event Studies

Based on Dutta et al. (2018) <doi:10.1016/j.jempfin.2018.02.004>, this package provides their standardized test for abnormal returns in long-horizon event studies. The methods used improve on the major weaknesses of size, power, and robustness of long-run statistical tests described in Kothari/Warner (2007) <doi:10.1016/B978-0-444-53265-7.50015-9>. Abnormal returns are weighted by their statistical precision (i.e., standard deviation), resulting in abnormal standardized returns. This procedure efficiently captures the heteroskedasticity problem. Clustering techniques following Cameron et al. (2011) <doi:10.1198/jbes.2010.07136> are adopted for computing cross-sectional correlation robust standard errors. The statistical tests in this package therefore account for potential biases arising from returns’ cross-sectional correlation, autocorrelation, and volatility clustering without power loss.

1103

Empirical Finance

cvar

Compute Expected Shortfall and Value at Risk for Continuous Distributions

Compute expected shortfall (ES) and Value at Risk (VaR) from a quantile function, distribution function, random number generator or probability density function. ES is also known as Conditional Value at Risk (CVaR). Virtually any continuous distribution can be specified. The functions are vectorized over the arguments. The computations are done directly from the definitions; see e.g. Acerbi and Tasche (2002) <doi:10.1111/1468-0300.00091>. Some support for GARCH models is provided as well.
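Working "directly from the definitions" means: VaR is a quantile of the loss distribution, and ES is the average of that quantile over the tail. A Python sketch of the quantile-function route for standard normal losses (the package is R; this function name and the crude midpoint integration are assumptions of this sketch):

```python
from statistics import NormalDist

def var_es(qf, level, n=50000):
    """Definition-based VaR and ES of a loss distribution at tail
    probability `level`, given its quantile function qf:
    VaR = qf(1 - level); ES = mean of qf over the upper tail
    (midpoint rule on the integral of qf from 1-level to 1)."""
    var = qf(1 - level)
    h = level / n
    es = sum(qf(1 - level + (i + 0.5) * h) for i in range(n)) * h / level
    return var, es

# standard normal losses at the 5% level
var, es = var_es(NormalDist().inv_cdf, 0.05)
```

For the standard normal at the 5% level the closed-form values are VaR ≈ 1.6449 and ES ≈ 2.0627, a useful sanity check for any definition-based implementation.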

1104

Empirical Finance

data.table

Extension of ‘data.frame’

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, and friendly and fast character-separated-value read/write. Offers a natural and flexible syntax for faster development.

1105

Empirical Finance

derivmkts

Functions and R Code to Accompany Derivatives Markets

A set of pricing and expository functions that should be useful in teaching a course on financial derivatives.

1106

Empirical Finance

dlm

Bayesian and Likelihood Analysis of Dynamic Linear Models

Provides routines for maximum likelihood estimation, Kalman filtering and smoothing, and Bayesian analysis of Normal linear state space models, also known as Dynamic Linear Models.

1107

Empirical Finance

Dowd

Functions Ported from ‘MMR2’ Toolbox Offered in Kevin Dowd’s Book Measuring Market Risk

‘Kevin Dowd’s’ book Measuring Market Risk is widely read in the area of risk measurement by students and practitioners alike. As he notes, ‘MATLAB’ may well have been the most suitable language when he originally wrote the functions, but with the growing popularity of R that is no longer entirely the case. As ‘Dowd’s’ code was not intended to be error free and was mainly for reference, some functions in this package have inherited those errors; an attempt will be made in future releases to identify and correct them. ‘Dowd’s’ original code can be downloaded from www.kevindowd.org/measuringmarketrisk/. Note that ‘Dowd’ offers both ‘MMR2’ and ‘MMR1’ toolboxes; only ‘MMR2’, the more recent of the two, was ported to R, and the two have mostly similar functions. The toolbox mainly contains different parametric and non-parametric methods for the measurement of market risk, as well as backtesting of risk measurement methods.

1108

Empirical Finance

DriftBurstHypothesis

Calculates the Test Statistic for the Drift Burst Hypothesis

Calculates the T-Statistic for the drift burst hypothesis from the working paper Christensen, Oomen and Reno (2018) <doi:10.2139/ssrn.2842535>. The authors’ MATLAB code is available upon request; see <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2842535>.

1109

Empirical Finance

dse

Dynamic Systems Estimation (Time Series Package)

Tools for multivariate, linear, time-invariant time series models. This includes ARMA and state-space representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state-space model, and state-space model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.

1110

Empirical Finance

DtD

Distance to Default

Provides fast methods to work with Merton’s distance to default model introduced in Merton (1974) <doi:10.1111/j.1540-6261.1974.tb03058.x>. The methods include simulation and estimation of the parameters.
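In Merton's model the firm defaults when asset value falls below the face value of debt at horizon T, so the distance to default is the number of asset-volatility standard deviations separating the expected log asset value from the log debt barrier. A Python sketch of the formula (the package's R interface differs; the function name is hypothetical):

```python
import math

def distance_to_default(V, D, mu, sigma, T=1.0):
    """Merton-model distance to default: standard deviations between the
    expected log asset value and the default barrier (debt face value D)."""
    return (math.log(V / D) + (mu - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
```

Under the model's normality assumption, the implied default probability is the standard normal CDF evaluated at minus this distance.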

1111

Empirical Finance

dyn

Time Series Regression

Time series regression. The dyn class interfaces ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series, including specifications that may contain lags, diffs and missing values.

1112

Empirical Finance

dynlm

Dynamic Linear Regression

Dynamic linear models and time series regression.

1113

Empirical Finance

ESG

ESG - A Package for Asset Projection

The package provides a “Scenarios” class containing general parameters, risk parameters and projection results. Risk parameters are gathered together into a ParamsScenarios sub-object. The general workflow is to set all needed parameters in a Scenarios object, call the customPathsGeneration method to run the projection, then use the xxx_PriceDistribution() methods to get asset prices.

1114

Empirical Finance

estudy2

An Implementation of Parametric and Nonparametric Event Study

An implementation of the most commonly used event study methodology, including both parametric and nonparametric tests. It covers various aspects of rate-of-return estimation (the core calculation is done in C++), as well as three classical market models for event studies: mean-adjusted returns, market-adjusted returns and single-index market models. There are 6 parametric and 6 nonparametric tests provided, which examine cross-sectional daily abnormal returns (see the documentation of the functions for more information). Parametric tests include those proposed by Brown and Warner (1980) <doi:10.1016/0304-405X(80)90002-1>, Brown and Warner (1985) <doi:10.1016/0304-405X(85)90042-X>, Boehmer et al. (1991) <doi:10.1016/0304-405X(91)90032-F>, Patell (1976) <doi:10.2307/2490543>, and Lamb (1995) <doi:10.2307/253695>. Nonparametric tests covered in estudy2 are described in Corrado and Zivney (1992) <doi:10.2307/2331331>, McConnell and Muscarella (1985) <doi:10.1016/0304-405X(85)90006-6>, Boehmer et al. (1991) <doi:10.1016/0304-405X(91)90032-F>, Cowan (1992) <doi:10.1007/BF00939016>, Corrado (1989) <doi:10.1016/0304-405X(89)90064-0>, Campbell and Wasley (1993) <doi:10.1016/0304-405X(93)90025-7>, Savickas (2003) <doi:10.1111/1475-6803.00052>, and Kolari and Pynnonen (2010) <doi:10.1093/rfs/hhq072>. Furthermore, tests for cumulative abnormal returns proposed by Brown and Warner (1985) <doi:10.1016/0304-405X(85)90042-X> and Lamb (1995) <doi:10.2307/253695> are included.

1115

Empirical Finance

factorstochvol

Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models

Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models with interweaving <doi:10.1080/10618600.2017.1322091>. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix <doi:10.1016/j.jeconom.2018.11.007>.

1116

Empirical Finance

fame

Interface for FAME Time Series Database

Read and write FAME databases.

1117

Empirical Finance

fAssets (core)

Rmetrics - Analysing and Modelling Financial Assets

Provides a collection of functions to manage, to investigate and to analyze data sets of financial assets from different points of view.

1118

Empirical Finance

FatTailsR

Kiener Distributions and Fat Tails in Finance

Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in market finance, neuroscience and other disciplines. Two algorithms to estimate with a high accuracy distribution parameters, quantiles, valueatrisk and expected shortfall. Include power hyperbolas and power hyperbolic functions.

1119

Empirical Finance

fBasics (core)

Rmetrics - Markets and Basic Statistics

Provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. In addition, there are several utility functions for data handling and management.

1120

Empirical Finance

fBonds (core)

Rmetrics - Pricing and Evaluating Bonds

It implements the Nelson-Siegel and the Nelson-Siegel-Svensson term structures.
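The Nelson-Siegel curve expresses the zero rate at maturity t as a level term plus slope and curvature terms with a common decay parameter tau; the Svensson extension adds a second curvature hump. A Python sketch of the basic three-factor form (illustrative only; the package's R functions and argument names differ):

```python
import math

def nelson_siegel(t, beta0, beta1, beta2, tau):
    """Nelson-Siegel zero rate at maturity t: beta0 is the long-run level,
    beta1 the slope and beta2 the curvature, with decay parameter tau."""
    x = t / tau
    loading = (1.0 - math.exp(-x)) / x          # slope factor loading
    return beta0 + beta1 * loading + beta2 * (loading - math.exp(-x))
```

The limits make the parameters interpretable: the short rate tends to beta0 + beta1 as t goes to 0, and the curve flattens to beta0 at the long end.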

1121

Empirical Finance

fCopulae (core)

Rmetrics - Bivariate Dependence Structures with Copulae

Provides a collection of functions to manage, to investigate and to analyze bivariate financial returns by Copulae. Included are the families of Archimedean, Elliptical, Extreme Value, and Empirical Copulae.

1122

Empirical Finance

fExoticOptions (core)

Rmetrics - Pricing and Evaluating Exotic Options

Provides a collection of functions to evaluate barrier options, Asian options, binary options, currency translated options, lookback options, multiple asset options and multiple exercise options.

1123

Empirical Finance

fExtremes (core)

Rmetrics - Modelling Extreme Events in Finance

Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data preprocessing, (ii) explorative data analysis, (iii) peaks-over-threshold modelling, (iv) block-maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.

1124

Empirical Finance

fgac

Generalized Archimedean Copula

Bivariate data fitting involves two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm was implemented considering seven families of copulas (Generalized Archimedean Copulas); the best fit can be obtained by examining all copula options (totally positive of order 2 and stochastically increasing models).

1125

Empirical Finance

fGarch (core)

Rmetrics - Autoregressive Conditional Heteroskedastic Modelling

Provides a collection of functions to analyze and model heteroskedastic behavior in financial time series models.

1126

Empirical Finance

fImport (core)

Rmetrics - Importing Economic and Financial Data

Provides a collection of utility functions to download and manage data sets from the Internet or from other sources.

1127

Empirical Finance

FinancialMath

Financial Mathematics for Actuaries

Contains financial math functions and introductory derivative functions included in the Society of Actuaries and Casualty Actuarial Society ‘Financial Mathematics’ exam, and some topics in the ‘Models for Financial Economics’ exam.

1128

Empirical Finance

FinAsym

Classifies implicit trading activity from market quotes and computes the probability of informed trading

This package accomplishes two tasks: (a) it classifies implicit trading activity from quotes in OTC markets using the algorithm of Lee and Ready (1991); (b) based on information for trade initiation, it computes the probability of informed trading of Easley and O’Hara (1987).
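The classification step is built on a tick test: an uptick marks a trade buyer-initiated, a downtick seller-initiated, and a zero tick inherits the previous classification. A Python sketch of that core rule (the package is R; full Lee-Ready also compares trades with quote midpoints, which is omitted here):

```python
def tick_test(prices):
    """Tick test in the spirit of Lee and Ready (1991): +1 for a
    buyer-initiated (uptick) trade, -1 for seller-initiated (downtick);
    zero-tick trades inherit the previous classification."""
    signs, last = [], 0
    for prev, cur in zip(prices, prices[1:]):
        if cur > prev:
            last = 1
        elif cur < prev:
            last = -1
        signs.append(last)   # unchanged price keeps the last sign
    return signs
```

The resulting buy/sell indicators are exactly the trade-initiation data the PIN computation consumes.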

1129

Empirical Finance

finreportr

Financial Data from U.S. Securities and Exchange Commission

Download and display company financial data from the U.S. Securities and Exchange Commission’s EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See <https://www.sec.gov/edgar/searchedgar/companysearch.html> for more information.

1130

Empirical Finance

fmdates

Financial Market Date Calculations

Implements common date calculations relevant for specifying the economic nature of financial market contracts that are typically defined by International Swap Dealer Association (ISDA, <http://www2.isda.org>) legal documentation. This includes methods to check whether dates are business days in certain locales, functions to adjust and shift dates and time length (or day counter) calculations.

1131

Empirical Finance

fMultivar (core)

Rmetrics - Analysing and Modeling Multivariate Financial Return Distributions

Provides a collection of functions to manage, to investigate and to analyze bivariate and multivariate data sets of financial returns.

1132

Empirical Finance

fNonlinear (core)

Rmetrics - Nonlinear and Chaotic Time Series Modelling

Provides a collection of functions for testing various aspects of univariate time series including independence and neglected nonlinearities. Further provides functions to investigate the chaotic behavior of time series processes and to simulate different types of chaotic time series maps.

1133

Empirical Finance

fOptions (core)

Rmetrics - Pricing and Evaluating Basic Options

Provides a collection of functions to value basic options. This includes the generalized Black-Scholes option, options on futures and options on commodity futures.
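The "generalized" Black-Scholes formula unifies these cases through a cost-of-carry rate b: b = r recovers the standard stock-option model, while b = 0 prices options on futures (Black's model). A Python sketch of the call-price formula (illustrative; the package's own functions are R):

```python
import math
from statistics import NormalDist

def gbs_call(S, K, r, b, sigma, T):
    """Generalized Black-Scholes call price with cost-of-carry rate b:
    b = r gives the standard Black-Scholes model, b = 0 options on futures."""
    N = NormalDist().cdf
    d1 = (math.log(S / K) + (b + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * math.exp((b - r) * T) * N(d1) - K * math.exp(-r * T) * N(d2)
```

A standard check: an at-the-money call with S = K = 100, r = b = 0.05, sigma = 0.2 and T = 1 prices at about 10.45.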

1134

Empirical Finance

forecast

Forecasting Functions for Time Series and Linear Models

Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.

1135

Empirical Finance

fPortfolio (core)

Rmetrics - Portfolio Selection and Optimization

Provides a collection of functions to optimize portfolios and to analyze them from different points of view.

1136

Empirical Finance

fracdiff

Fractionally Differenced ARIMA aka ARFIMA(p,d,q) Models

Maximum likelihood estimation of the parameters of a fractionally differenced ARIMA(p,d,q) model (Haslett and Raftery, Appl. Statistics, 1989).

1137

Empirical Finance

fractal

A Fractal Time Series Modeling and Analysis Package

Stochastic fractal and deterministic chaotic time series analysis.

1138

Empirical Finance

FRAPO

Financial Risk Modelling and Portfolio Optimisation with R

Accompanying package of the book ‘Financial Risk Modelling and Portfolio Optimisation with R’, second edition. The data sets used in the book are contained in this package.

1139

Empirical Finance

fRegression (core)

Rmetrics - Regression Based Decision and Prediction

A collection of functions for linear and nonlinear regression modelling. It implements a wrapper for several regression models available in the base and contributed packages of R.

1140

Empirical Finance

frmqa

The Generalized Hyperbolic Distribution, Related Distributions and Their Applications in Finance

A collection of R and C++ functions to work with the generalized hyperbolic distribution, related distributions and their applications in financial risk management and quantitative analysis.

1141

Empirical Finance

fTrading (core)

Rmetrics - Trading and Rebalancing Financial Instruments

A collection of functions for trading and rebalancing financial instruments. It implements various technical indicators to analyse time series such as moving averages or stochastic oscillators.

1142

Empirical Finance

GCPM

Generalized Credit Portfolio Model

Analyze the default risk of credit portfolios. Commonly known models, like CreditRisk+ or the CreditMetrics model, are implemented in their very basic settings. The portfolio loss distribution can be obtained either by simulation or analytically in the case of the classic CreditRisk+ model. The models only account for losses caused by defaults, i.e. migration risk is not included. The package structure is kept flexible, especially with respect to distributional assumptions, in order to quantify the sensitivity of risk figures to those assumptions. Therefore the package can be used to determine the credit risk of a given portfolio as well as to quantify model sensitivities.

1143

Empirical Finance

GetHFData

Download and Aggregate High Frequency Trading Data from Bovespa

Downloads and aggregates high frequency trading data for Brazilian instruments directly from Bovespa ftp site <ftp://ftp.bmf.com.br/MarketData/>.

1144

Empirical Finance

gets

General-to-Specific (GETS) Modelling and Indicator Saturation Methods

Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean.

1145

Empirical Finance

GetTDData

Get Data for Brazilian Bonds (Tesouro Direto)

Downloads and aggregates data for Brazilian government-issued bonds directly from the website of Tesouro Direto <http://www.tesouro.fazenda.gov.br/tesourodiretobalancoeestatisticas>.

1146

Empirical Finance

GEVStableGarch

ARMA-GARCH/APARCH Models with GEV and Stable Distributions

Package for simulation and estimation of ARMA-GARCH/APARCH models with GEV and stable distributions.

1147

Empirical Finance

ghyp

A Package on Generalized Hyperbolic Distribution and Its Special Cases

Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). In particular, it contains fitting procedures, an AIC-based model selection routine, functions for the computation of density, quantile, probability, random variates and expected shortfall, some portfolio optimization and plotting routines, as well as the likelihood-ratio test. In addition, it contains the Generalized Inverse Gaussian distribution.

1148

Empirical Finance

gmm

Generalized Method of Moments and Generalized Empirical Likelihood

A complete suite to estimate models based on moment conditions. It includes the two-step Generalized Method of Moments (Hansen 1982; <doi:10.2307/1912775>), the iterated GMM and continuously updated estimator (Hansen, Heaton and Yaron 1996; <doi:10.2307/1392442>) and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005; <doi:10.1111/j.1468-0262.2005.00601.x>).

1149

Empirical Finance

gogarch

Generalized Orthogonal GARCH (GO-GARCH) Models

Implementation of the GO-GARCH model class.

1150

Empirical Finance

GUIDE

GUI for DErivatives in R

A GUI for financial derivatives in R.

1151

Empirical Finance

highfrequency

Tools for High-Frequency Data Analysis

Provides functionality to manage, clean and match high-frequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps, and investigate microstructure noise and intraday periodicity.

1152

Empirical Finance

IBrokers

R API to Interactive Brokers Trader Workstation

Provides native R access to the Interactive Brokers Trader Workstation API.

1153

Empirical Finance

InfoTrad

Calculates the Probability of Informed Trading (PIN)

Estimates the probability of informed trading (PIN), initially introduced by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x>. The contribution of the package is that it uses the likelihood factorizations of Easley et al. (2010) <doi:10.1017/S0022109010000074> (EHO factorization) and Lin and Ke (2011) <doi:10.1016/j.finmar.2011.03.001> (LK factorization). Moreover, the package offers different estimation algorithms: specifically, the grid-search algorithm proposed by Yan and Zhang (2012) <doi:10.1016/j.jbankfin.2011.08.003>, and the hierarchical agglomerative clustering approach proposed by Gan et al. (2015) <doi:10.1080/14697688.2015.1023336> and later extended by Ersan and Alici (2016) <doi:10.1016/j.intfin.2016.04.001>.

1154

Empirical Finance

lgarch

Simulation and Estimation of Log-GARCH Models

Simulation and estimation of univariate and multivariate log-GARCH models. The main functions of the package are: lgarchSim(), mlgarchSim(), lgarch() and mlgarch(). The first two functions simulate from a univariate and a multivariate log-GARCH model, respectively, whereas the latter two estimate a univariate and multivariate log-GARCH model, respectively.

1155

Empirical Finance

lifecontingencies

Financial and Actuarial Mathematics for Life Contingencies

Classes and methods that allow the user to manage life tables and actuarial tables (including multiple-decrement tables). Moreover, it contains functions to easily perform demographic, financial and actuarial calculations for life contingency insurances.

1156

Empirical Finance

lmtest

Testing Linear Regression Models

A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.

1157

Empirical Finance

longmemo

Statistics for Long-Memory Processes (Book by Jan Beran), and Related Functionality

Datasets and functionality from Jan Beran (1994), Statistics for Long-Memory Processes, Chapman & Hall. Estimation of Hurst (and more) parameters for fractional Gaussian noise, ‘fARIMA’ and ‘FEXP’ models.

1158

Empirical Finance

LSMonteCarlo

American options pricing with Least Squares Monte Carlo method

Provides functions for calculating prices of American put options with the Least Squares Monte Carlo method. The option types are plain vanilla American put, Asian American put, and Quanto American put. The pricing algorithms include variance reduction techniques such as antithetic variates and control variates. Additional functions are given to derive “price surfaces” at different volatilities and strikes, create 3D plots, quickly generate geometric Brownian motion, and calculate prices of European options with the Black-Scholes analytical solution.
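Two of the building blocks this entry mentions, the Black-Scholes analytical solution and geometric Brownian motion generation, have compact closed forms. A minimal Python sketch of both (the package itself is written in R; these are textbook formulas, not the package's API):

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_put(S, K, r, sigma, T):
    """Black-Scholes price of a European put option."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return K * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)

def gbm_path(S0, r, sigma, T, steps, rng):
    """One geometric Brownian motion path under the risk-neutral measure."""
    dt = T / steps
    path = [S0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp((r - 0.5 * sigma**2) * dt
                                        + sigma * math.sqrt(dt) * z))
    return path

print(round(bs_put(100, 100, 0.05, 0.2, 1.0), 3))  # ~5.574
```

The analytic put price serves as the natural benchmark for the American-exercise prices the Least Squares Monte Carlo method produces.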

1159

Empirical Finance

markovchain

Easy Handling Discrete Time Markov Chains

Functions and S4 methods to create and manage discrete time Markov chains more easily. In addition, functions to perform statistical analysis (fitting and drawing random variates) and probabilistic analysis (of their structural properties) are provided.
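One structural property such a package computes is the stationary distribution of a transition matrix. A Python sketch of the idea by power iteration (the package exposes this through R S4 methods; this is an illustration only):

```python
def stationary(P, iters=200):
    """Stationary distribution of a transition matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Two-state chain: P[i][j] = Pr(next state is j | current state is i)
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary(P))  # approaches [5/6, 1/6]
```

For this chain the balance condition 0.1·π1 = 0.5·π2 gives π = (5/6, 1/6), which the iteration recovers.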

1160

Empirical Finance

MarkowitzR

Statistical Significance of the Markowitz Portfolio

A collection of tools for analyzing significance of Markowitz portfolios.

1161

Empirical Finance

matchingMarkets

Analysis of Stable Matchings

Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.

1162

Empirical Finance

MSGARCH

Markov-Switching GARCH Models

Fit (by Maximum Likelihood or MCMC/Bayesian), simulate, and forecast various Markov-Switching GARCH models as described in Ardia et al. (2017) <https://ssrn.com/abstract=2845809>.

1163

Empirical Finance

mvtnorm

Multivariate Normal and t Distributions

Computes multivariate normal and t probabilities, quantiles, random deviates and densities.

1164

Empirical Finance

NetworkRiskMeasures

Risk Measures for (Financial) Networks

Implements some risk measures for (financial) networks, such as DebtRank, Impact Susceptibility, Impact Diffusion and Impact Fluidity.

1165

Empirical Finance

nlme

Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixedeffects models.

1166

Empirical Finance

NMOF

Numerical Methods and Optimization in Finance

Functions, examples and data from the book “Numerical Methods and Optimization in Finance” by M. Gilli, D. Maringer and E. Schumann (2011), ISBN 9780123756626. The package provides implementations of several optimisation heuristics, such as Differential Evolution, Genetic Algorithms and Threshold Accepting. There are also functions for the valuation of financial instruments, such as bonds and options, and functions that help with stochastic simulations.

1167

Empirical Finance

obAnalytics

Limit Order Book Analytics

Data processing, visualisation and analysis of Limit Order Book event data.

1168

Empirical Finance

OptHedging

Estimation of value and hedging strategy of call and put options

Estimation of the value and hedging strategy of call and put options, based on optimal hedging and the Monte Carlo method, from Chapter 3 of ‘Statistical Methods for Financial Engineering’ by Bruno Remillard, CRC Press (2013).

1169

Empirical Finance

OptionPricing

Option Pricing with Efficient Simulation Algorithms

Efficient Monte Carlo Algorithms for the price and the sensitivities of Asian and European Options under Geometric Brownian Motion.
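As a rough illustration of what such a pricer does, here is a Python sketch of Monte Carlo pricing of a European call under geometric Brownian motion with antithetic variates, checked against the Black-Scholes value (the package's own R algorithms are considerably more refined):

```python
import math
import random

def mc_european_call(S0, K, r, sigma, T, n_pairs=100_000, seed=42):
    """Monte Carlo price of a European call under GBM, with antithetic variates."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    disc = math.exp(-r * T)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        for s in (z, -z):  # antithetic pair: reuses each draw with flipped sign
            ST = S0 * math.exp(drift + vol * s)
            total += max(ST - K, 0.0)
    return disc * total / (2 * n_pairs)

print(round(mc_european_call(100, 100, 0.05, 0.2, 1.0), 2))  # close to the Black-Scholes value 10.45
```

Pairing each normal draw with its negation cancels the leading odd-order error terms, which is one of the variance reduction techniques the packages in this view implement.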

1170

Empirical Finance

pa

Performance Attribution for Equity Portfolios

Provides tools for conducting performance attribution for equity portfolios, using two methods: the Brinson method and a regression-based analysis.

1171

Empirical Finance

parma

Portfolio Allocation and Risk Management Applications

Provision of a set of models and methods for use in the allocation and management of capital in financial portfolios.

1172

Empirical Finance

pbo

Probability of Backtest Overfitting

Following the method of Bailey et al., computes for a collection of candidate models the probability of backtest overfitting, the performance degradation and probability of loss, and the stochastic dominance.

1173

Empirical Finance

PeerPerformance

Luck-Corrected Peer Performance Analysis in R

Provides functions to perform the peer performance analysis of funds’ returns as described in Ardia and Boudt (2018) <doi:10.1016/j.jbankfin.2017.10.014>.

1174

Empirical Finance

PerformanceAnalytics (core)

Econometric Tools for Performance and Risk Analysis

Collection of econometric functions for performance and risk analysis. In addition to standard risk and performance metrics, this package aims to aid practitioners and researchers in utilizing the latest research in analysis of nonnormal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.

1175

Empirical Finance

pinbasic

Fast and Stable Estimation of the Probability of Informed Trading (PIN)

Utilities for fast and stable estimation of the probability of informed trading (PIN) in the model introduced by Easley et al. (2002) <doi:10.1111/1540-6261.00493> are implemented. Since the basic model developed by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x> is nested in the former, due to equating the intensities of uninformed buys and sells, the functions can also be applied to this simpler model structure, if needed. State-of-the-art factorizations of the model likelihood function as well as the most recent algorithms for generating initial values for optimization routines are implemented. In total, two likelihood factorizations and three methodologies for starting values are included. Furthermore, functions for simulating datasets of daily aggregated buys and sells, calculating confidence intervals for the probability of informed trading, and computing posterior probabilities of trading days’ conditions are available.

1176

Empirical Finance

portfolio

Analysing equity portfolios

Classes for analysing and implementing equity portfolios.

1177

Empirical Finance

PortfolioEffectHFT

High Frequency Portfolio Analytics by PortfolioEffect

R interface to the PortfolioEffect cloud service for backtesting high frequency trading (HFT) strategies, intraday portfolio analysis and optimization. Includes an auto-calibrating model pipeline for market microstructure noise, risk factors, price jumps/outliers, tail risk (high-order moments) and price fractality (long memory). Constructed portfolios can use client-side market data or access HF intraday price history for all major US equities. See <https://www.portfolioeffect.com/> for more information on the PortfolioEffect high frequency portfolio analytics platform.

1178

Empirical Finance

PortfolioOptim

Small/Large Sample Portfolio Optimization

Two functions for financial portfolio optimization by linear programming are provided. One implements the Benders decomposition algorithm and can be used for very large data sets. The other, applicable for moderate sample sizes, finds the optimal portfolio with the smallest distance to a given benchmark portfolio.

1179

Empirical Finance

portfolioSim

Framework for simulating equity portfolio strategies

Classes that serve as a framework for designing equity portfolio simulations.

1180

Empirical Finance

PortRisk

Portfolio Risk Analysis

Risk Attribution of a portfolio with Volatility Risk Analysis.

1181

Empirical Finance

quantmod

Quantitative Financial Modelling Framework

Specify, build, trade, and analyse quantitative financial trading strategies.

1182

Empirical Finance

QuantTools

Enhanced Quantitative Trading Modelling

Download and organize historical market data from multiple sources like Yahoo (<https://finance.yahoo.com>), Google (<https://www.google.com/finance>), Finam (<https://www.finam.ru/profile/moexakcii/sberbank/export/>), MOEX (<https://www.moex.com/en/derivatives/contracts.aspx>) and IQFeed (<https://www.iqfeed.net/symbolguide/index.cfm?symbolguide=lookup>). Code your trading algorithms in modern C++11 with a powerful event-driven tick processing API, including trading costs and exchange communication latency, and transform detailed data seamlessly into R. In just a few lines of code you will be able to visualize every step of your trading model, from tick data to multi-dimensional heat maps.

1183

Empirical Finance

ragtop

Pricing Equity Derivatives with Extensions of Black-Scholes

Algorithms to price American and European equity options, convertible bonds and a variety of other financial derivatives. It uses an extension of the usual Black-Scholes model in which jump to default may occur at a probability specified by a power-law link between stock price and hazard rate as found in the paper by Takahashi, Kobayashi, and Nakagawa (2001) <doi:10.3905/jfi.2001.319302>. We use ideas and techniques from Andersen and Buffum (2002) <doi:10.2139/ssrn.355308> and Linetsky (2006) <doi:10.1111/j.1467-9965.2006.00271.x>.

1184

Empirical Finance

Rbitcoin

R & bitcoin integration

Utilities related to Bitcoin. Unified markets API interface (bitstamp, kraken, btce, bitmarket). Both public and private API calls. Integration of data structures for all markets. Support SSL. Read Rbitcoin documentation (command: ?btc) for more information.

1185

Empirical Finance

Rblpapi

R Interface to ‘Bloomberg’

An R Interface to ‘Bloomberg’ is provided via the ‘Blp API’.

1186

Empirical Finance

Rcmdr

R Commander

A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.

1187

Empirical Finance

RcppQuantuccia

R Bindings to the ‘Quantuccia’ Header-Only Essentials of ‘QuantLib’

‘QuantLib’ bindings are provided for R using ‘Rcpp’ and the header-only ‘Quantuccia’ variant (put together by Peter Caspers) offering an essential subset of ‘QuantLib’. See the included file ‘AUTHORS’ for a full list of contributors to both ‘QuantLib’ and ‘Quantuccia’.

1188

Empirical Finance

reinsureR

Reinsurance Treaties Application

Application of reinsurance treaties to claims portfolios. The package creates a class Claims whose objective is to store claims and premiums, on which different treaties can be applied. A statistical analysis can then be applied to measure the impact of reinsurance, producing a table or graphical output. This package can be used for estimating the impact of reinsurance on several portfolios or for pricing treaties through statistical analysis. Documentation for the implemented methods can be found in “Reinsurance: Actuarial and Statistical Aspects” by Hansjoerg Albrecher, Jan Beirlant, Jozef L. Teugels (2017, ISBN: 9780470772683) and “REINSURANCE: A Basic Guide to Facultative and Treaty Reinsurance” by Munich Re (2010) <https://www.munichre.com/site/mram/get/documents_E96160999/mram/assetpool.mr_america/PDFs/3_Publications/reinsurance_basic_guide.pdf>.

1189

Empirical Finance

restimizeapi

Functions for Working with the ‘www.estimize.com’ Web Services

Provides the user with functions to develop their trading strategy, uncover actionable trading ideas, and monitor consensus shifts with crowdsourced earnings and economic estimate data directly from <www.estimize.com>. Further information regarding the web services this package invokes can be found at <www.estimize.com/api>.

1190

Empirical Finance

Risk

Computes 26 Financial Risk Measures for Any Continuous Distribution

Computes 26 financial risk measures for any continuous distribution. The 26 financial risk measures include value at risk, expected shortfall due to Artzner et al. (1999) <doi:10.1007/s1095701199682>, tail conditional median due to Kou et al. (2013) <doi:10.1287/moor.1120.0577>, expectiles due to Newey and Powell (1987) <doi:10.2307/1911031>, beyond value at risk due to Longin (2001) <doi:10.3905/jod.2001.319161>, expected proportional shortfall due to Belzunce et al. (2012) <doi:10.1016/j.insmatheco.2012.05.003>, elementary risk measure due to Ahmadi-Javid (2012) <doi:10.1007/s1095701199682>, Omega due to Shadwick and Keating (2002), Sortino ratio due to Rollinger and Hoffman (2013), kappa due to Kaplan and Knowles (2004), Wang (1998)’s <doi:10.1080/10920277.1998.10595708> risk measures, Stone (1973)’s <doi:10.2307/2978638> risk measures, Luce (1980)’s <doi:10.1007/BF00135033> risk measures, Sarin (1987)’s <doi:10.1007/BF00126387> risk measures, and Bronshtein and Kurelenkova (2009)’s risk measures.
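For a standard normal loss, the first two of these measures have simple closed forms: VaR at level α is the α-quantile z_α, and expected shortfall is φ(z_α)/(1−α). A Python sketch for that special case (the R package handles arbitrary continuous distributions):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_quantile(p, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF by bisection."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def var_es_normal(alpha):
    """VaR and expected shortfall at level alpha for a standard normal loss."""
    z = norm_quantile(alpha)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return z, pdf / (1.0 - alpha)  # ES = phi(z_alpha) / (1 - alpha)

var95, es95 = var_es_normal(0.95)
print(round(var95, 3), round(es95, 3))  # 1.645 2.063
```

ES always exceeds VaR at the same level because it averages the losses beyond the quantile rather than reporting the quantile itself.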

1191

Empirical Finance

riskParityPortfolio

Design of Risk Parity Portfolios

Fast design of risk parity portfolios for financial investment. The goal of the risk parity portfolio formulation is to equalize or distribute the risk contributions of the different assets, which is missing if we simply consider the overall volatility of the portfolio as in the mean-variance Markowitz portfolio. In addition to the vanilla formulation, where the risk contributions are perfectly equalized subject to no short-selling and budget constraints, many other formulations are considered that allow for box constraints and short-selling, as well as the inclusion of additional objectives like the expected return and overall variance. See the vignette for detailed documentation and comparison, with several illustrative examples. The package is based on the papers: Y. Feng and D. P. Palomar (2015). SCRIP: Successive Convex Optimization Methods for Risk Parity Portfolio Design. IEEE Trans. on Signal Processing, vol. 63, no. 19, pp. 5285-5300. <doi:10.1109/TSP.2015.2452219>. F. Spinu (2013), An Algorithm for Computing Risk Parity Weights. <doi:10.2139/ssrn.2297383>. T. Griveau-Billion, J. Richard, and T. Roncalli (2013). A fast algorithm for computing high-dimensional risk parity portfolios. <arXiv:1311.4057>.
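The quantity being equalized is the risk contribution RC_i = w_i·(Σw)_i / sqrt(w'Σw), which sums to the portfolio volatility. A Python sketch of the check (the optimization that finds such weights is what the R package provides):

```python
import math

def risk_contributions(w, Sigma):
    """Per-asset risk contributions; they sum to the portfolio volatility."""
    n = len(w)
    Sw = [sum(Sigma[i][j] * w[j] for j in range(n)) for i in range(n)]
    vol = math.sqrt(sum(w[i] * Sw[i] for i in range(n)))
    return [w[i] * Sw[i] / vol for i in range(n)]

# Uncorrelated assets with volatilities 20% and 30%: weights inversely
# proportional to volatility, (0.6, 0.4), give equal risk contributions.
Sigma = [[0.04, 0.0],
         [0.0, 0.09]]
rc = risk_contributions([0.6, 0.4], Sigma)
print(rc)  # two equal contributions
```

With correlations present the inverse-volatility weights are no longer risk parity, which is why a dedicated solver is needed.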

1192

Empirical Finance

RiskPortfolios

Computation of RiskBased Portfolios

Collection of functions designed to compute risk-based portfolios as described in Ardia et al. (2017) <doi:10.1007/s10479-017-2474-7> and Ardia et al. (2017) <doi:10.21105/joss.00171>.

1193

Empirical Finance

riskSimul

Risk Quantification for Stock Portfolios under the t-Copula Model

Implements efficient simulation procedures to estimate tail loss probabilities and conditional excess for a stock portfolio. The log-returns are assumed to follow a t-copula model with generalized hyperbolic or t marginals.

1194

Empirical Finance

RM2006

RiskMetrics 2006 Methodology

Estimation of the conditional covariance matrix using the RiskMetrics 2006 methodology of Zumbach (2007) <doi:10.2139/ssrn.1420185>.
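The 2006 methodology aggregates exponentially weighted estimators over many horizons; its building block is the classic single-λ RiskMetrics recursion, sketched here in Python (an illustration of the recursion only, not the full 2006 scheme implemented by the package):

```python
def ewma_variance(returns, lam=0.94, init=0.0):
    """RiskMetrics-style EWMA variance: s2_t = lam*s2_{t-1} + (1-lam)*r_t^2."""
    s2 = init
    out = []
    for r in returns:
        s2 = lam * s2 + (1.0 - lam) * r * r
        out.append(s2)
    return out

# Constant 1% daily returns: the variance estimate converges to 0.01^2 = 1e-4.
est = ewma_variance([0.01] * 500)
print(est[-1])  # ~1e-4
```

The decay factor λ controls how quickly old squared returns are forgotten; the 2006 scheme replaces the single λ with a weighted mixture of decay horizons.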

1195

Empirical Finance

rmgarch

Multivariate GARCH Models

Feasible multivariate GARCH models including DCC, GO-GARCH and Copula-GARCH.

1196

Empirical Finance

RND

Risk Neutral Density Extraction Package

Extract the implied risk neutral density from options using various methods.

1197

Empirical Finance

rpatrec

Recognising Visual Charting Patterns in Time Series Data

Generating visual charting patterns and noise, smoothing to find a signal in noisy time series and enabling users to apply their findings to real life data.

1198

Empirical Finance

RQuantLib

R Interface to the ‘QuantLib’ Library

The ‘RQuantLib’ package makes parts of ‘QuantLib’ accessible from R. The ‘QuantLib’ project aims to provide a comprehensive software framework for quantitative finance. The goal is to provide a standard open source library for quantitative analysis, modeling, trading, and risk management of financial assets.

1199

Empirical Finance

rugarch (core)

Univariate GARCH Models

ARFIMA, in-mean, external regressors and various GARCH flavors, with methods for fit, forecast, simulation, inference and plotting.

1200

Empirical Finance

rwt

Rice Wavelet Toolbox wrapper

Provides a set of functions for performing digital signal processing.

1201

Empirical Finance

sandwich

Robust Covariance Matrix Estimators

Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.

1202

Empirical Finance

sde

Simulation and Inference for Stochastic Differential Equations

Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 9780387758381, Springer, NY.

1203

Empirical Finance

SharpeR

Statistical Significance of the Sharpe Ratio

A collection of tools for analyzing the significance of assets, funds, and trading strategies, based on the Sharpe ratio and overfit of the same. Provides density, distribution, quantile and random generation of the Sharpe ratio distribution based on normal returns, as well as the optimal Sharpe ratio over multiple assets. Computes confidence intervals on the Sharpe ratio and provides a test of equality of Sharpe ratios based on the Delta method.
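Significance testing of a Sharpe ratio rests on its sampling distribution; under i.i.d. normal returns the standard error is approximately sqrt((1 + SR²/2)/n) (Lo, 2002). A Python sketch of the point estimate and a normal-approximation confidence interval (the R package implements the exact distribution and multi-asset extensions):

```python
import math

def sharpe_ratio(returns):
    """Sample Sharpe ratio (per period, no annualization, zero risk-free rate)."""
    n = len(returns)
    mu = sum(returns) / n
    var = sum((r - mu) ** 2 for r in returns) / (n - 1)
    return mu / math.sqrt(var)

def sharpe_ci(returns, z=1.96):
    """Approximate 95% CI using the i.i.d.-normal standard error."""
    n = len(returns)
    sr = sharpe_ratio(returns)
    se = math.sqrt((1.0 + 0.5 * sr * sr) / n)
    return sr - z * se, sr + z * se

rets = [0.01, -0.01, 0.02, 0.00]
print(round(sharpe_ratio(rets), 4))  # 0.3873
```

With so few observations the interval is wide, which is exactly the point such significance tools make precise.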

1204

Empirical Finance

sharpeRratio

Moment-Free Estimation of Sharpe Ratios

An efficient moment-free estimator of the Sharpe ratio, or signal-to-noise ratio, for heavy-tailed data (see <https://arxiv.org/abs/1505.01333>).

1205

Empirical Finance

Sim.DiffProc

Simulation of Diffusion Processes

Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both the Ito and Stratonovich forms. Includes statistical analysis of SDEs with parallel Monte Carlo and moment-equation methods. These tools enable researchers in different domains to use such equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulating the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1562523422.

1206

Empirical Finance

SmithWilsonYieldCurve

Smith-Wilson Yield Curve Construction

Constructs a yield curve by the Smith-Wilson method from a table of LIBOR and swap rates.

1207

Empirical Finance

stochvol

Efficient Bayesian Inference for Stochastic Volatility (SV) Models

Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods. Methodological details are given in Kastner and Fruhwirth-Schnatter (2014) <doi:10.1016/j.csda.2013.01.002>; the most common use cases are described in Kastner (2016) <doi:10.18637/jss.v069.i05>. Also incorporates SV with leverage.

1208

Empirical Finance

strucchange

Testing, Monitoring, and Dating Structural Changes

Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.

1209

Empirical Finance

TAQMNGR

Manage TickbyTick Transaction Data

Manager of tickbytick transaction data that performs ‘cleaning’, ‘aggregation’ and ‘import’ in an efficient and fast way. The package engine, written in C++, exploits the ‘zlib’ and ‘gzstream’ libraries to handle gzipped data without need to uncompress them. ‘Cleaning’ and ‘aggregation’ are performed according to Brownlees and Gallo (2006) <doi:10.1016/j.csda.2006.09.030>. Currently, TAQMNGR processes raw data from WRDS (Wharton Research Data Service, <https://wrdsweb.wharton.upenn.edu/wrds/>).

1210

Empirical Finance

tawny

Clean Covariance Matrices Using Random Matrix Theory and Shrinkage Estimators for Portfolio Optimization

Portfolio optimization typically requires an estimate of a covariance matrix of asset returns. There are many approaches for constructing such a covariance matrix, some using the sample covariance matrix as a starting point. This package provides implementations for two such methods: random matrix theory and shrinkage estimation. Each method attempts to clean or remove noise related to the sampling process from the sample covariance matrix.

1211

Empirical Finance

TFX

R API to TrueFX(tm)

Connects R to TrueFX(tm) for free streaming real-time and historical tick-by-tick market data for dealable interbank foreign exchange rates with millisecond detail.

1212

Empirical Finance

tidyquant

Tidy Quantitative Financial Analysis

Bringing financial analysis to the ‘tidyverse’. The ‘tidyquant’ package provides a convenient wrapper to various ‘xts’, ‘zoo’, ‘quantmod’, ‘TTR’ and ‘PerformanceAnalytics’ package functions and returns the objects in the tidy ‘tibble’ format. The main advantage is being able to use quantitative functions with the ‘tidyverse’ functions including ‘purrr’, ‘dplyr’, ‘tidyr’, ‘ggplot2’, ‘lubridate’, etc. See the ‘tidyquant’ website for more information, documentation and examples.

1213

Empirical Finance

timeDate (core)

Rmetrics  Chronological and Calendar Objects

The ‘timeDate’ class fulfils the conventions of the ISO 8601 standard as well as of the ANSI C and POSIX standards. Beyond these standards it provides the “Financial Center” concept, which allows data records collected in different time zones to be handled and combined so that time stamps are always correct with respect to your personal financial center, or alternatively to the GMT reference time. It can thus also handle time stamps from historical data records from the same time zone, even if the financial centers changed daylight saving time at different calendar dates.

1214

Empirical Finance

timeSeries (core)

Rmetrics  Financial Time Series Objects

Provides a class and various tools for financial time series. This includes basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.

1215

Empirical Finance

timsac

Time Series Analysis and Control Package

Functions for statistical analysis, prediction and control of time series based mainly on Akaike and Nakagawa (1988) <ISBN 9789027727862>.

1216

Empirical Finance

tis

Time Indexes and Time Indexed Series

Functions and S3 classes for time indexes and time indexed series, which are compatible with FAME frequencies.

1217

Empirical Finance

TSdbi

Time Series Database Interface

Provides a common interface to time series databases. The objective is to define a standard interface so users can retrieve time series data from various sources with a simple, common set of commands, and so programs can be written to be portable with respect to the data source. The SQL implementations also provide a database table design, so users needing to set up a time series database have a reasonably complete way to do this easily. The interface provides for a variety of options with respect to the representation of time series in R. The interface, and the SQL implementations, also handle vintages of time series data (sometimes called editions or real-time data). There is also a (not yet well tested) mechanism to handle multilingual data documentation. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.

1218

Empirical Finance

tsDyn

Nonlinear Time Series Models with Regime Switching

Implements nonlinear autoregressive (AR) time series models. For univariate series, a nonparametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).

1219

Empirical Finance

tseries (core)

Time Series Analysis and Computational Finance

Time series analysis and computational finance.

1220

Empirical Finance

tseriesChaos

Analysis of Nonlinear Time Series

Routines for the analysis of nonlinear time series. This work is largely inspired by the TISEAN project, by Rainer Hegger, Holger Kantz and Thomas Schreiber: <http://www.mpipksdresden.mpg.de/~tisean/>.

1221

Empirical Finance

tsfa

Time Series Factor Analysis

Extraction of Factors from Multivariate Time Series. See ?00tsfaIntro for more details.

1222

Empirical Finance

TTR

Technical Trading Rules

Functions and data to construct technical trading rules with R.

1223

Empirical Finance

tvm

Time Value of Money Functions

Functions for managing cashflows and interest rate curves.
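The two workhorse calculations behind any time-value-of-money toolkit are net present value and internal rate of return. A Python sketch of both (function names here are illustrative, not the package's R API):

```python
def npv(rate, cashflows):
    """Net present value of cashflows occurring at t = 0, 1, 2, ..."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=10.0):
    """Internal rate of return by bisection on NPV (assumes one sign change)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if npv(mid, cashflows) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

cfs = [-100.0, 60.0, 60.0]
print(round(npv(0.10, cfs), 3))  # 4.132
print(round(irr(cfs), 4))        # ~0.1307
```

The IRR is the discount rate at which the NPV of the cashflow stream is exactly zero; bisection works here because NPV is monotone decreasing in the rate for this sign pattern.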

1224

Empirical Finance

urca (core)

Unit Root and Cointegration Tests for Time Series Data

Unit root and cointegration tests encountered in applied econometric analysis are implemented.

1225

Empirical Finance

vars

VAR Modelling

Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.

1226

Empirical Finance

VarSwapPrice

Pricing a variance swap on an equity index

Computes a portfolio of European options that replicates the cost of capturing the realised variance of an equity index.

1227

Empirical Finance

vrtest

Variance Ratio Tests and Other Tests for the Martingale Difference Hypothesis

A collection of statistical tests for the martingale difference hypothesis.
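The classic member of this family is the Lo-MacKinlay variance ratio: under a random walk, the variance of q-period returns is q times the one-period variance, so VR(q) ≈ 1. A Python sketch of the statistic only (the R package supplies the test distributions and many refinements):

```python
import random

def variance_ratio(x, q):
    """Lo-MacKinlay variance ratio VR(q) for a return series x."""
    n = len(x)
    mu = sum(x) / n
    var1 = sum((r - mu) ** 2 for r in x) / n
    # Overlapping q-period sums, centered at q*mu
    sums = [sum(x[i:i + q]) for i in range(n - q + 1)]
    varq = sum((s - q * mu) ** 2 for s in sums) / (q * len(sums))
    return varq / var1

rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print(round(variance_ratio(iid, 2), 3))  # close to 1 for i.i.d. data
```

Positive return autocorrelation pushes VR(q) above 1 and mean reversion pushes it below, which is what the formal tests measure against the sampling distribution of the statistic.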

1228

Empirical Finance

wavelets

Functions for Computing Wavelet Filters, Wavelet Transforms and Multiresolution Analyses

Contains functions for computing and plotting discrete wavelet transforms (DWT) and maximal overlap discrete wavelet transforms (MODWT), as well as their inverses. Additionally, it contains functionality for computing and plotting wavelet transform filters that are used in the above decompositions as well as multiresolution analyses.

1229

Empirical Finance

waveslim

Basic Wavelet Routines for One-, Two- and Three-Dimensional Signal Processing

Basic wavelet routines for time series (1D), image (2D) and array (3D) analysis. The code provided here is based on wavelet methodology developed in Percival and Walden (2000); Gencay, Selcuk and Whitcher (2001); the dual-tree complex wavelet transform (DTCWT) from Kingsbury (1999, 2001) as implemented by Selesnick; and Hilbert wavelet pairs (Selesnick 2001, 2002). All figures in chapters 4-7 of GSW (2001) are reproducible using this package and R code available at the book website(s) below.

1230

Empirical Finance

wavethresh

Wavelets Statistics and Transforms

Performs 1D, 2D and 3D real and complex-valued wavelet transforms, nondecimated transforms, wavelet packet transforms, nondecimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, nonstationary multiscale transfer function modeling, and density estimation.

1231

Empirical Finance

XBRL

Extraction of Business Financial Information from ‘XBRL’ Documents

Functions to extract business financial information from an Extensible Business Reporting Language (‘XBRL’) instance file and the associated collection of files that defines its ‘Discoverable’ Taxonomy Set (‘DTS’).

1232

Empirical Finance

xts (core)

eXtensible Time Series

Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.

1233

Empirical Finance

ycinterextra

Yield curve or zero-coupon prices interpolation and extrapolation

Yield curve or zero-coupon prices interpolation and extrapolation using the Nelson-Siegel, Svensson and Smith-Wilson models, and Hermite cubic splines.
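The Nelson-Siegel parametrization writes the yield at maturity τ as y(τ) = β0 + β1·(1−e^(−τ/λ))/(τ/λ) + β2·[(1−e^(−τ/λ))/(τ/λ) − e^(−τ/λ)], so β0 is the long-run level and β0+β1 the short end. A Python sketch of the curve function only (the fitting routines and the other models are what the R package supplies):

```python
import math

def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Nelson-Siegel zero yield at maturity tau (lam > 0 is the decay scale)."""
    x = tau / lam
    loading = (1.0 - math.exp(-x)) / x  # -> 1 as tau -> 0, -> 0 as tau -> inf
    return beta0 + beta1 * loading + beta2 * (loading - math.exp(-x))

# Long-run level 4%, short end 2% (beta0 + beta1), mild medium-term hump
b0, b1, b2, lam = 0.04, -0.02, 0.01, 2.0
print(round(nelson_siegel(0.001, b0, b1, b2, lam), 4))   # ~0.02 (short end)
print(round(nelson_siegel(1000.0, b0, b1, b2, lam), 4))  # ~0.04 (long-run level)
```

The β2 loading vanishes at both ends of the maturity axis, which is what lets it shape a hump without moving the level or the short rate.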

1234

Empirical Finance

YieldCurve

Modelling and estimation of the yield curve

Modelling the yield curve with some parametric models. The models implemented are: Nelson-Siegel, Diebold-Li and Svensson. The package also includes data on the term structure of interest rates from the Federal Reserve Bank and the European Central Bank.

1235

Empirical Finance

Zelig

Everyone’s Statistical Software

A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical workflow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and reweighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.

1236

Empirical Finance

zoo (core)

S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)

An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
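The index-class independence described above can be sketched in a few lines: two irregular series with Date indexes are merged on the union of their indexes, with NA where either series has no observation (a minimal sketch; na.locf() is zoo's last-observation-carried-forward utility).

```r
library(zoo)

# Irregular series: the index can be any totally ordered class
z1 <- zoo(c(1, 2, 3), as.Date(c("2020-01-01", "2020-01-03", "2020-01-07")))
z2 <- zoo(c(10, 20),  as.Date(c("2020-01-03", "2020-01-05")))

m <- merge(z1, z2)                     # union of indexes, NA where missing
m_filled <- na.locf(m, na.rm = FALSE)  # carry last observation forward
m_filled
```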

1237

Functional Data Analysis

classiFunc

Classification of Functional Data

Efficient implementation of k-nearest neighbor estimation and kernel estimation for functional data classification.

1238

Functional Data Analysis

covsep

Tests for Determining if the Covariance Structure of 2-Dimensional Data is Separable

Functions for testing if the covariance structure of 2-dimensional data (e.g. samples of surfaces X_i = X_i(s,t)) is separable, i.e. if covariance(X) = C_1 x C_2. A complete description of the implemented tests can be found in the paper Aston, John A. D.; Pigoli, Davide; Tavakoli, Shahin. Tests for separability in nonparametric covariance operators of random surfaces. Ann. Statist. 45 (2017), no. 4, 1431-1461. <doi:10.1214/16-AOS1495> <https://projecteuclid.org/euclid.aos/1498636862> <arXiv:1505.02023>.

1239

Functional Data Analysis

dbstats

Distance-Based Statistics

Prediction methods where explanatory information is coded as a matrix of distances between individuals. Distances can either be directly input as a distances matrix, a squared distances matrix, an inner-products matrix, or computed from observed predictors.

1240

Functional Data Analysis

ddalpha

Depth-Based Classification and Calculation of Data Depth

Contains procedures for depth-based supervised learning, which are entirely nonparametric, in particular the DDalpha-procedure (Lange, Mosler and Mozharovskyi, 2014 <doi:10.1007/s00362-012-0488-4>). The training data sample is transformed by a statistical depth function to a compact low-dimensional space, where the final classification is done. It also offers an extension to functional data and routines for calculating certain notions of statistical depth functions. 50 multivariate and 5 functional classification problems are included.

1241

Functional Data Analysis

denseFLMM

Functional Linear Mixed Models for Densely Sampled Data

Estimation of functional linear mixed models for densely sampled data based on functional principal component analysis.

1242

Functional Data Analysis

fda (core)

Functional Data Analysis

These functions were developed to support functional data analysis as described in Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer. They were ported from earlier versions in Matlab and S-PLUS. An introduction appears in Ramsay, J. O., Hooker, Giles, and Graves, Spencer (2009) Functional Data Analysis with R and Matlab (Springer). The package includes data sets and script files working through many examples, including all but one of the 76 figures in this latter book. Matlab versions of the code and sample analyses are no longer distributed through CRAN, as they were when the book was published. For those, ftp from <http://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/>. There you will find a set of .zip files containing the functions and sample analyses, as well as two .txt files giving instructions for installation and some additional information. The changes from Version 2.4.1 are fixes of bugs in density.fd and removal of functions create.polynomial.basis, polynompen, and polynomial. These were deleted because the monomial basis does the same thing and because there were errors in the code.
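A basic fda workflow — represent discrete observations as a smooth functional data object via a basis expansion — can be sketched as follows (a minimal sketch; basis size and grid are illustrative choices):

```r
library(fda)

# Noisy evaluations of a curve on a grid
argvals <- seq(0, 1, length.out = 101)
y <- sin(2 * pi * argvals) + rnorm(101, sd = 0.1)

# Represent the data in a B-spline basis and smooth
basis <- create.bspline.basis(rangeval = c(0, 1), nbasis = 15)
fdobj <- smooth.basis(argvals, y, basis)$fd   # functional data object

# Evaluate the fitted curve back on the grid
yhat <- eval.fd(argvals, fdobj)
```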

1243

Functional Data Analysis

fda.usc (core)

Functional Data Analysis and Utilities for Statistical Computing

Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.

1244

Functional Data Analysis

fdadensity

Functional Data Analysis for Density Functions by Transformation to a Hilbert Space

An implementation of the methodology described in Petersen and Mueller (2016) <doi:10.1214/15-AOS1363> for the functional data analysis of samples of density functions. Densities are first transformed to their corresponding log quantile densities, followed by ordinary Functional Principal Components Analysis (FPCA). Transformation modes of variation yield improved interpretation of the variability in the data as compared to FPCA on the densities themselves. The standard fraction of variance explained (FVE) criterion commonly used for functional data is adapted to the transformation setting, also allowing for an alternative quantification of variability for density data through the Wasserstein metric of optimal transport.

1245

Functional Data Analysis

fdakma

Functional Data Analysis: K-Mean Alignment

It simultaneously performs clustering and alignment of a multidimensional or unidimensional functional dataset by means of k-mean alignment.

1246

Functional Data Analysis

fdapace (core)

Functional Data Analysis and Empirical Dynamics

Provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm or numerical integration. PACE is useful for the analysis of data that have been generated by a sample of underlying (but usually not fully observed) random trajectories. It does not rely on pre-smoothing of trajectories, which is problematic if functional data are sparsely sampled. PACE provides options for functional regression and correlation, for Longitudinal Data Analysis, the analysis of stochastic processes from samples of realized trajectories, and for the analysis of underlying dynamics. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.

1247

Functional Data Analysis

fdaPDE

Functional Data Analysis and Partial Differential Equations; Statistical Analysis of Functional and Spatial Data, Based on Regression with Partial Differential Regularizations

An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization.

1248

Functional Data Analysis

fdasrvf (core)

Elastic Functional Data Analysis

Performs alignment, PCA, and modeling of multidimensional and unidimensional functions using the square-root velocity framework (Srivastava et al., 2011 <arXiv:1103.3817> and Tucker et al., 2014 <doi:10.1016/j.csda.2012.12.001>). This framework allows for elastic analysis of functional data through phase and amplitude separation.

1249

Functional Data Analysis

fdatest

Interval Testing Procedure for Functional Data

Implementation of the Interval Testing Procedure for functional data in different frameworks (i.e., one- or two-population frameworks, functional linear models) by means of different basis expansions (i.e., B-spline, Fourier, and phase-amplitude Fourier). The current version of the package requires functional data evaluated on a uniform grid; it automatically projects each function on a chosen functional basis; it performs the entire family of multivariate tests; and, finally, it provides the matrix of the p-values of the previous tests and the vector of the corrected p-values. The functional basis, the coupled or uncoupled scenario, and the kind of test can be chosen by the user. The package provides also a plotting function creating a graphical output of the procedure: the p-value heatmap, the plot of the corrected p-values, and the plot of the functional data.

1250

Functional Data Analysis

FDboost (core)

Boosting Functional Regression Models

Regression models for functional data, i.e., scalar-on-function, function-on-scalar and function-on-function regression models, are fitted by a component-wise gradient boosting algorithm.

1251

Functional Data Analysis

fdcov

Analysis of Covariance Operators

Provides a variety of tools for the analysis of covariance operators including k-sample tests for equality and classification and clustering methods found in the works of Cabassi et al (2017) <doi:10.1214/17-EJS1347>, Kashlak et al (2017) <arXiv:1604.06310>, Pigoli et al (2014) <doi:10.1093/biomet/asu008>, and Panaretos et al (2010) <doi:10.1198/jasa.2010.tm09239>.

1252

Functional Data Analysis

fds (core)

Functional Data Sets

Functional data sets.

1253

Functional Data Analysis

flars

Functional LARS

Variable selection algorithm for functional linear regression with scalar response variable and mixed scalar/functional predictors.

1254

Functional Data Analysis

fpca

Restricted MLE for Functional Principal Components Analysis

A geometric approach to maximum likelihood estimation for functional principal components analysis.

1255

Functional Data Analysis

freqdom

Frequency Domain Based Analysis: Dynamic PCA

Implementation of dynamic principal component analysis (DPCA), simulation of VAR and VMA processes and frequency domain tools. These frequency domain methods for dimensionality reduction of multivariate time series were introduced by David Brillinger in his book Time Series (1974). We follow implementation guidelines as described in Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Components <doi:10.1111/rssb.12076>.

1256

Functional Data Analysis

freqdom.fda

Functional Time Series: Dynamic Functional Principal Components

Implementations of functional dynamic principal components analysis, along with related graphical tools and frequency domain methods. These methods directly use the multivariate dynamic principal components implementation, following the guidelines from Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Components <doi:10.1111/rssb.12076>.

1257

Functional Data Analysis

ftsa (core)

Functional Time Series Analysis

Functions for visualizing, modeling, forecasting and hypothesis testing of functional time series.

1258

Functional Data Analysis

ftsspec

Spectral Density Estimation and Comparison for Functional Time Series

Functions for estimating spectral density operator of functional time series (FTS) and comparing the spectral density operator of two functional time series, in a way that allows detection of differences of the spectral density operator in frequencies and along the curve length.

1259

Functional Data Analysis

funData

An S4 Class for Functional Data

S4 classes for univariate and multivariate functional data with utility functions.

1260

Functional Data Analysis

funFEM

Clustering in the Discriminative Functional Subspace

The funFEM algorithm (Bouveyron et al., 2014) clusters functional data by modeling the curves within a common and discriminative functional subspace.

1261

Functional Data Analysis

funHDDC

Univariate and Multivariate ModelBased Clustering in GroupSpecific Functional Subspaces

The funHDDC algorithm clusters functional univariate (Bouveyron and Jacques, 2011, <doi:10.1007/s11634-011-0095-6>) or multivariate data (Schmutz et al., 2018) by modeling each group within a specific functional subspace.

1262

Functional Data Analysis

funLBM

ModelBased CoClustering of Functional Data

The funLBM algorithm simultaneously clusters the rows and the columns of a data matrix where each entry of the matrix is a function or a time series.

1263

Functional Data Analysis

geofd

Spatial Prediction for Function Value Data

Kriging-based methods are used for predicting functional data (curves) with spatial dependence.

1264

Functional Data Analysis

GPFDA

Apply Gaussian Process in Functional data analysis

Use functional regression as the mean structure and Gaussian Process as the covariance structure.

1265

Functional Data Analysis

growfunctions

Bayesian Non-Parametric Dependent Models for Time-Indexed Functional Data

Estimates a collection of time-indexed functions under either of Gaussian process (GP) or intrinsic Gaussian Markov random field (iGMRF) prior formulations where a Dirichlet process mixture allows subgroupings of the functions to share the same covariance or precision parameters. The GP and iGMRF formulations both support any number of additive covariance or precision terms, respectively, expressing either or both of multiple trend and seasonality.

1266

Functional Data Analysis

pcdpca

Dynamic Principal Components for Periodically Correlated Functional Time Series

The method extends multivariate and functional dynamic principal components to periodically correlated multivariate time series. This package allows you to compute true dynamic principal components in the presence of periodicity. We follow implementation guidelines as described in Kidzinski, Kokoszka and Jouzdani (2017), Principal component analysis of periodically correlated functional time series <arXiv:1612.00040>.

1267

Functional Data Analysis

rainbow

Bagplots, Boxplots and Rainbow Plots for Functional Data

Visualizing functional data and identifying functional outliers.

1268

Functional Data Analysis

refund (core)

Regression with Functional Data

Methods for regression for functional data, including functiononscalar, scalaronfunction, and functiononfunction regression. Some of the functions are applicable to image data.

1269

Functional Data Analysis

refund.shiny

Interactive Plotting for Functional Data Analyses

Interactive plotting for functional data analyses.

1270

Functional Data Analysis

RFgroove

Importance Measure and Selection for Groups of Variables with Random Forests

Variable selection tools for groups of variables and functional data based on a new grouped variable importance with random forests.

1271

Functional Data Analysis

roahd

Robust Analysis of High Dimensional Data

A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.

1272

Functional Data Analysis

SCBmeanfd

Simultaneous Confidence Bands for the Mean of Functional Data

Statistical methods for estimating and inferring the mean of functional data. The methods include simultaneous confidence bands, local polynomial fitting, bandwidth selection by plug-in and cross-validation, goodness-of-fit tests for parametric models, equality tests for two-sample problems, and plotting functions.

1273

Functional Data Analysis

sparseFLMM

Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data

Estimation of functional linear mixed models for irregularly or sparsely sampled data based on functional principal component analysis.

1274

Functional Data Analysis

splinetree

Longitudinal Regression Trees and Forests

Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.

1275

Functional Data Analysis

switchnpreg

Switching nonparametric regression models for a single curve and functional data

Functions for estimating the parameters from the latent state process and the functions corresponding to the J states as proposed by De Souza and Heckman (2013).

1276

Functional Data Analysis

warpMix

Mixed Effects Modeling with Warping for Functional Data Using B-Splines

Mixed effects modeling with warping for functional data using B-splines. Warping coefficients are considered as random effects, and warping functions are general functions, with parameters representing the projection onto a B-spline basis of a part of the warping functions. Warped data are modelled by a linear mixed effect functional model; the noise is Gaussian and independent of the warping functions.

1277

Statistical Genetics

adegenet

Exploratory Analysis of Genetic and Genomic Data

Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure (‘genind’ class), allele counts by populations (‘genpop’), and genome-wide SNP data (‘genlight’). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.

1278

Statistical Genetics

ape

Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clocklike trees using mean path lengths and penalized likelihood, dating trees with noncontemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, TCoffee, Muscle) whose results are returned into R.
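The tree input/output and distance-based estimation mentioned above can be sketched in a few lines with ape's read.tree() and nj() (a minimal sketch; the data matrix is made up for illustration):

```r
library(ape)

# Read a phylogenetic tree from Newick text
tr <- read.tree(text = "((a:1,b:1):1,c:2);")
Ntip(tr)   # 3 tips

# Neighbour-joining estimation from a distance matrix
set.seed(1)
dat <- matrix(rnorm(20), nrow = 4,
              dimnames = list(letters[1:4], NULL))
nj_tree <- nj(dist(dat))
Ntip(nj_tree)   # 4 tips
```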

1279

Statistical Genetics

Biodem

Biodemography Functions

The Biodem package provides a number of functions for Biodemographic analysis.

1280

Statistical Genetics

bqtl

Bayesian QTL Mapping Toolkit

QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools.

1281

Statistical Genetics

dlmap

Detection Localization Mapping for QTL

QTL mapping in a mixed model framework with separate detection and localization stages. The first stage detects the number of QTL on each chromosome based on the genetic variation due to grouped markers on the chromosome; the second stage uses this information to determine the most likely QTL positions. The mixed model can accommodate general fixed and random effects, including spatial effects in field trials and pedigree effects. Applicable to backcrosses, doubled haploids, recombinant inbred lines, F2 intercrosses, and association mapping populations.

1282

Statistical Genetics

gap (core)

Genetic Analysis Package

It is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both populationbased and familybased designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates.

1283

Statistical Genetics

genetics (core)

Population Genetics

Classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, …
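The genotype class and the Hardy-Weinberg testing mentioned above can be sketched as follows (a minimal sketch with made-up genotype data):

```r
library(genetics)

# Genotypes coded as "allele/allele" strings
g <- genotype(c("A/A", "A/T", "T/T", "A/T", "A/A"))

summary(g)        # allele and genotype frequencies
HWE.chisq(g)      # chi-square test for Hardy-Weinberg equilibrium
```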

1284

Statistical Genetics

hapassoc

Inference of Trait Associations with SNP Haplotypes and Other Attributes using the EM Algorithm

The following R functions are used for inference of trait associations with haplotypes and other covariates in generalized linear models. The functions are developed primarily for data collected in cohort or cross-sectional studies. They can accommodate uncertain haplotype phase and handle missing genotypes at some SNPs.

1285

Statistical Genetics

haplo.stats (core)

Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous

Routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). The main functions are: haplo.em(), haplo.glm(), haplo.score(), and haplo.power(); all of which have detailed examples in the vignette.

1286

Statistical Genetics

HardyWeinberg

Statistical Tests and Graphics for Hardy-Weinberg Equilibrium

Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) <doi:10.1126/science.28.706.49> for bi- and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with biallelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) <doi:10.1038/hdy.2016.20>, including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi- and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of biallelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots.
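Testing a biallelic marker for equilibrium can be sketched as follows (a minimal sketch; the genotype counts are the classic MN blood group example used in the package documentation):

```r
library(HardyWeinberg)

# Genotype counts for a biallelic marker (MN blood group data)
x <- c(MM = 298, MN = 489, NN = 213)

HWChisq(x)    # classical chi-square test (with continuity correction)
HWExact(x)    # exact test
```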

1287

Statistical Genetics

hierfstat

Estimation and Tests of Hierarchical F-Statistics

Allows the estimation of hierarchical F-statistics from haploid or diploid genetic data with any numbers of levels in the hierarchy, following the algorithm of Yang (Evolution, 1998, 52(4):950-956; <doi:10.2307/2411227>). Functions are also given to test via randomisations the significance of each F and variance components, using the likelihood-ratio statistic G.

1288

Statistical Genetics

hwde

Models and Tests for Departure from Hardy-Weinberg Equilibrium and Independence Between Loci

Fits models for genotypic disequilibria, as described in Huttley and Wilson (2000), Weir (1996) and Weir and Wilson (1986). Contrast terms are available that account for first order interactions between loci. Also implements, for a single locus in a single population, a conditional exact test for Hardy-Weinberg equilibrium.

1289

Statistical Genetics

ibdreg

Regression Methods for IBD Linkage With Covariates

A method to test genetic linkage with covariates by regression methods with response IBD sharing for relative pairs. Accounts for correlations of IBD statistics and covariates for relative pairs within the same pedigree.

1290

Statistical Genetics

LDheatmap

Graphical Display of Pairwise Linkage Disequilibria Between SNPs

Produces a graphical display, as a heat map, of measures of pairwise linkage disequilibria between SNPs. Users may optionally include the physical locations or genetic map distances of each SNP on the plot. Users should note that the imported package ‘snpStats’ and the suggested packages ‘rtracklayer’, ‘GenomicRanges’, ‘GenomeInfoDb’ and ‘IRanges’ are all BioConductor packages <https://bioconductor.org>.

1291

Statistical Genetics

ouch

OrnsteinUhlenbeck Models for Phylogenetic Comparative Hypotheses

Fit and compare OrnsteinUhlenbeck models for evolution along a phylogenetic tree.

1292

Statistical Genetics

pbatR

Pedigree/FamilyBased Genetic Association Tests Analysis and Power

This R package provides power calculations via internal simulation methods. The package also provides a frontend to the now abandoned PBAT program (developed by Christoph Lange), and reads in the corresponding output and displays results and figures when appropriate. The license of this R package itself is GPL. However, to have the program interact with the PBAT program for some functionality of the R package, users must additionally obtain the PBAT program from Christoph Lange, and accept his license. Both the data analysis and power calculations have command line and graphical interfaces using tcltk.

1293

Statistical Genetics

phangorn

Phylogenetic Reconstruction and Analysis

Contains methods for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation. Allows comparison of trees and model selection, and offers visualizations for trees and split networks.

1294

Statistical Genetics

qtl

Tools for Analyzing QTL Experiments

Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits.

1295

Statistical Genetics

rmetasim

An Individual-Based Population Genetic Simulation Environment

An interface between R and the metasim simulation engine. The simulation environment is documented in: Strand, A. (2002) Metasim 1.0: an individual-based environment for simulating population genetics of complex population dynamics. Mol. Ecol. Notes <doi:10.1046/j.1471-8286.2002.00208.x>. Please see the vignettes CreatingLandscapes and Simulating to get some ideas on how to use the packages. See the rmetasim vignette to get an overview and to see important changes to the code in the most recent version.

1296

Statistical Genetics

seqinr

Biological Sequences Retrieval and Analysis

Exploratory data analysis and data visualization for biological sequence (DNA and protein) data. Seqinr includes utilities for sequence data management under the ACNUC system described in Gouy, M. et al. (1984) Nucleic Acids Res. 12:121-127 <doi:10.1093/nar/12.1Part1.121>.

1297

Statistical Genetics

snp.plotter

snp.plotter

Creates plots of p-values using single SNP and/or haplotype data. Main features of the package include options to display a linkage disequilibrium (LD) plot and the ability to plot multiple datasets simultaneously. Plots can be created using global and/or individual haplotype p-values along with single SNP p-values. Images are created as either PDF or EPS files.

1298

Statistical Genetics

SNPmaxsel

Maximally selected statistics for SNP data

This package implements asymptotic methods related to maximally selected statistics, with applications to SNP data.

1299

Statistical Genetics

stepwise

Stepwise detection of recombination breakpoints

A stepwise approach to identifying recombination breakpoints in a sequence alignment.

1300

Statistical Genetics

tdthap

TDT Tests for Extended Haplotypes

Functions and examples are provided for Transmission/disequilibrium tests for extended marker haplotypes, as in Clayton, D. and Jones, H. (1999) “Transmission/disequilibrium tests for extended marker haplotypes”. Amer. J. Hum. Genet., 65:1161-1169, <doi:10.1086/302566>.

1301

Statistical Genetics

untb

Ecological Drift under the UNTB

Hubbell’s Unified Neutral Theory of Biodiversity.

1302

Statistical Genetics

wgaim

Whole Genome Average Interval Mapping for QTL Detection using Mixed Models

Integrates sophisticated mixed modelling methods with a whole genome approach to detecting significant QTL in linkage maps.

1303

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

ade4

Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of onetable (e.g., principal component analysis, correspondence analysis), twotable (e.g., coinertia analysis, redundancy analysis), threetable (e.g., RLQ analysis) and Ktable (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.

1304

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

animation

A Gallery of Animations in Statistics and Utilities to Create Animations

Provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, nonparametric statistics, sampling survey, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. Also provided in this package are a series of functions to save animations to various formats, e.g. Flash, ‘GIF’, HTML pages, ‘PDF’ and videos. ‘PDF’ animations can be inserted into ‘Sweave’ / ‘knitr’ easily.

1305

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

ape

Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clocklike trees using mean path lengths and penalized likelihood, dating trees with noncontemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, TCoffee, Muscle) whose results are returned into R.

1306

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

aplpack

Another Plot Package: ‘Bagplots’, ‘Iconplots’, ‘Summaryplots’, Slider Functions and Others

Some functions for drawing some special plots: The function ‘bagplot’ plots a bagplot, ‘faces’ plots Chernoff faces, ‘iconplot’ plots a representation of a frequency table or a data matrix, ‘plothulls’ plots hulls of a bivariate data set, ‘plotsummary’ plots a graphical summary of a data set, ‘puticon’ adds icons to a plot, ‘skyline.hist’ combines several histograms of a one-dimensional data set in one plot, ‘slider’ functions support some interactive graphics, ‘spin3R’ helps with inspection of a 3-dim point cloud, ‘stem.leaf’ plots a stem-and-leaf plot, and ‘stem.leaf.backback’ plots back-to-back versions of stem-and-leaf plots.

1307

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

ash

David Scott’s ASH Routines

David Scott’s ASH routines ported from SPLUS to R.

1308

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

biclust

BiCluster Algorithms

The main function biclust() provides several algorithms to find biclusters in two-dimensional data: Cheng and Church (2000, ISBN:1577351150), spectral (2003) <doi:10.1101/gr.648603>, plaid model (2005) <doi:10.1016/j.csda.2004.02.003>, xmotifs (2003) <doi:10.1142/9789812776303_0008> and bimax (2006) <doi:10.1093/bioinformatics/btl060>. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.

1309

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

Cairo

R Graphics Device using Cairo Graphics Library for Creating High-Quality Bitmap (PNG, JPEG, TIFF), Vector (PDF, SVG, PostScript) and Display (X11 and Win32) Output

R graphics device using the cairographics library that can be used to create high-quality vector (PDF, PostScript and SVG) and bitmap output (PNG, JPEG, TIFF), and high-quality rendering in displays (X11 and Win32). Since it uses the same back-end for all output, copying across formats is WYSIWYG. Files are created without the dependence on X11 or other external programs. This device supports an alpha channel (semi-transparent drawing) and resulting images can contain transparent and semi-transparent regions. It is ideal for use in server environments (file output) and as a replacement for other devices that don’t have Cairo’s capabilities such as alpha support or anti-aliasing. Backends are modular such that any subset of backends is supported.

1310

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

cairoDevice

Embeddable Cairo Graphics Device Driver

This device uses Cairo and GTK to draw to the screen, file (png, svg, pdf, and ps) or memory (arbitrary GdkDrawable or Cairo context). The screen device may be embedded into RGtk2 interfaces and supports all interactive features of other graphics devices, including getGraphicsEvent().

1311

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

cba

Clustering for Business Analytics

Implements clustering techniques such as Proximus and Rock, along with utility functions for the efficient computation of cross distances and for data manipulation.

1312

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

colorspace

A Toolbox for Manipulating and Assessing Colors and Palettes

Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB. Qualitative, sequential, and diverging color palettes based on HCL colors are provided along with corresponding ggplot2 color scales. Color palette choice is aided by an interactive app (with either a Tcl/Tk or a shiny GUI) and shiny apps with an HCL color picker and a color vision deficiency emulator. Plotting functions for displaying and assessing palettes include color swatches, visualizations of the HCL space, and trajectories in HCL and/or RGB spectrum. Color manipulation functions include: desaturation, lightening/darkening, mixing, and simulation of color vision deficiencies (deutanomaly, protanomaly, tritanomaly).
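As an illustrative sketch (assuming the package is installed; “Blues 3” is one of the built-in HCL palette names), a sequential palette can be generated and then desaturated to check how it survives grey-scale reproduction:

```r
library(colorspace)

# Five colours along a sequential HCL trajectory.
pal <- sequential_hcl(5, palette = "Blues 3")

# Desaturated copy: simulates grey-scale printing of the same palette.
grey <- desaturate(pal)

# Compare the two palettes side by side as colour swatches.
swatchplot(pal, grey)
```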

1313

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

diagram

Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams

Visualises simple graphs (networks) based on a transition matrix, with utilities to plot flow diagrams, webs, electrical networks, etc. Supports the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter M.J. Herman (2009), Springer, and the book “Solving Differential Equations in R” by Karline Soetaert, Jeff Cash and Francesca Mazzia (2012), Springer. Includes demo(flowchart), demo(plotmat), demo(plotweb).

1314

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

dichromat

Color Schemes for Dichromats

Collapse redgreen or greenblue distinctions to simulate the effects of different types of colorblindness.

1315

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

gclus

Clustering Graphics

Orders panels in scatterplot matrices and parallel coordinate displays by some merit index. Package contains various indices of merit, ordering functions, and enhanced versions of pairs and parcoord which color panels according to their merit level.

1316

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

ggplot2 (core)

Create Elegant Data Visualisations Using the Grammar of Graphics

A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
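A minimal sketch of that declarative style (assuming ggplot2 is installed), using the built-in mtcars data:

```r
library(ggplot2)

# Declare the mappings: weight to x, fuel economy to y, cylinders to colour.
# ggplot2 takes care of scales, legends, and rendering details.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
print(p)
```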

1317

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

gplots

Various R Programming Tools for Plotting Data

Various R programming tools for plotting data, including: calculating and plotting locally smoothed summary functions (‘bandplot’, ‘wapply’); enhanced versions of standard plots (‘barplot2’, ‘boxplot2’, ‘heatmap.2’, ‘smartlegend’); manipulating colors (‘col2hex’, ‘colorpanel’, ‘redgreen’, ‘greenred’, ‘bluered’, ‘redblue’, ‘rich.colors’); calculating and plotting two-dimensional data summaries (‘ci2d’, ‘hist2d’); enhanced regression diagnostic plots (‘lmplot2’, ‘residplot’); a formula-enabled interface to the ‘stats::lowess’ function (‘lowess’); displaying textual data in plots (‘textplot’, ‘sinkplot’); plotting a matrix where each cell contains a dot whose size reflects the relative magnitude of the elements (‘balloonplot’); plotting “Venn” diagrams (‘venn’); displaying OpenOffice-style plots (‘ooplot’); plotting multiple data on the same region, with separate axes (‘overplot’); plotting means and confidence intervals (‘plotCI’, ‘plotmeans’); and spacing points in an xy plot so they don’t overlap (‘space’).

1318

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

gridBase

Integration of base and grid graphics

Integration of base and grid graphics

1319

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

hexbin

Hexagonal Binning Routines

Binning and plotting functions for hexagonal bins.
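For instance (a sketch assuming the package is installed), a large point cloud can be summarised into hexagonal cells instead of overplotted points:

```r
library(hexbin)

set.seed(1)
x <- rnorm(10000)
y <- rnorm(10000)

# Aggregate the 10,000 points into hexagonal cells; plotting cell counts
# is far more legible than a raw scatterplot at this density.
bin <- hexbin(x, y, xbins = 30)
plot(bin)
```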

1320

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

IDPmisc

‘Utilities of Institute of Data Analyses and Process Design (www.zhaw.ch/idp)’

Different high-level graphics functions for displaying large datasets, displaying circular data in a very flexible way, finding local maxima, brewing color ramps, drawing nice arrows, zooming 2D plots, and creating figures with differently colored margins and plot regions. In addition, the package contains auxiliary functions for data manipulation, like omitting observations with irregular values or selecting data by logical vectors that include NAs. Other functions are especially useful in spectroscopy and analyses of environmental data: robust baseline fitting, finding peaks in spectra, converting humidity measures.

1321

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

igraph

Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
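A small sketch (assuming igraph is installed) of generating, inspecting, and drawing a graph:

```r
library(igraph)

# A 5-vertex ring graph: every vertex has exactly two neighbours.
g <- make_ring(5)
degree(g)   # all degrees are 2

# Quick layout-based drawing of the graph.
plot(g)
```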

1322

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

iplots

iPlots - interactive graphics for R

Interactive plots for R.

1323

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

JavaGD

Java Graphics Device

Graphics device routing all graphics commands to a Java program. The actual functionality of the JavaGD depends on the Javaside implementation. Simple AWT and Swing implementations are included.

1324

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

klaR

Classification and Visualization

Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to ‘svmlight’ and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.

1325

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

lattice (core)

Trellis Graphics for R

A powerful and elegant highlevel data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
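For example (assuming lattice is installed), a conditioned scatterplot that draws one panel per group with shared axes:

```r
library(lattice)

# One scatterplot panel per cylinder count, sharing common scales.
xyplot(mpg ~ wt | factor(cyl), data = mtcars,
       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```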

1326

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

latticeExtra

Extra Graphical Utilities Based on Lattice

Building on the infrastructure provided by the lattice package, this package provides several new highlevel functions and methods, as well as additional utilities such as panel and axis annotation functions.

1327

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

misc3d

Miscellaneous 3D Plots

A collection of miscellaneous 3d plots, including isosurfaces.

1328

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

onion

Octonions and Quaternions

Quaternions and octonions are four- and eight-dimensional extensions of the complex numbers. They are normed division algebras over the real numbers and find applications in spatial rotations (quaternions) and string theory and relativity (octonions). The quaternions are non-commutative and the octonions non-associative. See RKS Hankin 2006, Rnews Volume 6/2: 49-51, and the package vignette, for more details.

1329

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

plotrix (core)

Various Plotting Functions

Lots of plots, various labeling, axis and color scaling functions.

1330

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RColorBrewer (core)

ColorBrewer Palettes

Provides color schemes for maps (and other graphics) designed by Cynthia Brewer as described at <http://colorbrewer2.org>.
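A short sketch (assuming the package is installed) of pulling a palette and previewing the available schemes:

```r
library(RColorBrewer)

# Five colours from the sequential "Blues" scheme, returned as hex strings.
pal <- brewer.pal(5, "Blues")
pal

# Preview all available ColorBrewer palettes in a single plot.
display.brewer.all()
```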

1331

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

rggobi

Interface Between R and ‘GGobi’

A commandline interface to ‘GGobi’, an interactive and dynamic graphics package. ‘Rggobi’ complements the graphical user interface of ‘GGobi’ providing a way to fluidly transition between analysis and exploration, as well as automating common tasks.

1332

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

rgl (core)

3D Visualization Using OpenGL

Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.

1333

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RGraphics

Data and Functions from the Book R Graphics, Second Edition

Data and Functions from the book R Graphics, Second Edition. There is a function to produce each figure in the book, plus several functions, classes, and methods defined in Chapter 8.

1334

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RGtk2

R Bindings for Gtk 2.8.0 and Above

Facilities in the R language for programming graphical interfaces using Gtk, the Gimp Tool Kit.

1335

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RSvgDevice

An R SVG graphics device

A graphics device for R that uses the w3.org xml standard for Scalable Vector Graphics.

1336

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RSVGTipsDevice

An R SVG Graphics Device with Dynamic Tips and Hyperlinks

A graphics device for R that uses the w3.org xml standard for Scalable Vector Graphics. This version supports tooltips with 1 to 3 lines, hyperlinks, and line styles.

1337

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

scagnostics

Compute scagnostics - scatterplot diagnostics

Calculates graph-theoretic scagnostics. Scagnostics describe various measures of interest for pairs of variables, based on their appearance on a scatterplot. They are a useful tool for discovering interesting or unusual scatterplots from a scatterplot matrix, without having to look at every individual plot.

1338

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

scatterplot3d

3D Scatter Plot

Plots a three dimensional (3D) point cloud.
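A minimal sketch (assuming the package is installed), plotting three mtcars variables as a point cloud:

```r
library(scatterplot3d)

# Weight vs. displacement vs. fuel economy as a 3D point cloud; the return
# value exposes helper functions (e.g. for adding points to the same axes).
s3d <- scatterplot3d(mtcars$wt, mtcars$disp, mtcars$mpg,
                     xlab = "Weight", ylab = "Displacement", zlab = "MPG")
```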

1339

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

seriation

Infrastructure for Ordering Objects Using Seriation

Infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).

1340

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

tkrplot

TK Rplot

Simple mechanism for placing R graphics in a Tk widget.

1341

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

vcd (core)

Visualizing Categorical Data

Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book “Visualizing Categorical Data” by Michael Friendly and is now the main support package for a new book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer (2015).

1342

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

vioplot

Violin Plot

A violin plot is a combination of a box plot and a kernel density plot. This package allows extensive customisation of violin plots.
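A brief sketch (assuming the package is installed) comparing two simulated samples:

```r
library(vioplot)

set.seed(42)
# Each "violin" overlays a kernel density outline on a box plot skeleton,
# showing both summary statistics and the shape of the distribution.
vioplot(rnorm(200), rexp(200), names = c("normal", "exponential"))
```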

1343

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

xgobi

Interface to the XGobi and XGvis programs for graphical data analysis

Interface to the XGobi and XGvis programs for graphical data analysis.

1344

HighPerformance and Parallel Computing with R

aprof

Amdahl’s Profiler, Directed Optimization Made Easy

Assists the evaluation of whether and where to focus code optimization, using Amdahl’s law and visual aids based on line profiling. Amdahl’s profiler organizes profiling output files (including memory profiling) in a visually appealing way. It is meant to help to balance development vs. execution time by helping to identify the most promising sections of code to optimize and projecting potential gains. The package is an addition to R’s standard profiling tools and is not a wrapper for them.

1345

HighPerformance and Parallel Computing with R

batch

Batching Routines in Parallel and Passing CommandLine Arguments to R

Functions to allow you to easily pass commandline arguments into R, and functions to aid in submitting your R code in parallel on a cluster and joining the results afterward (e.g. multiple parameter values for simulations running in parallel, splitting up a permutation test in parallel, etc.). See ‘parseCommandArgs(…)’ for the main example of how to use this package.

1346

HighPerformance and Parallel Computing with R

BatchExperiments

Statistical Experiments on Batch Computing Clusters

Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.

1347

HighPerformance and Parallel Computing with R

BatchJobs

Batch Computing with R

Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.

1348

HighPerformance and Parallel Computing with R

batchtools

Tools for Computation on Batch Systems

As a successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high performance computing systems managed by the schedulers ‘IBM Spectrum LSF’ (<https://www.ibm.com/us-en/marketplace/hpc-workload-management>), ‘OpenLava’ (<http://www.openlava.org/>), ‘Univa Grid Engine’/‘Oracle Grid Engine’ (<http://www.univa.com/>), ‘Slurm’ (<http://slurm.schedmd.com/>), ‘TORQUE/PBS’ (<http://www.adaptivecomputing.com/products/open-source/torque/>), or ‘Docker Swarm’ (<https://docs.docker.com/swarm/>). A multicore and socket mode allow the parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.

1349

HighPerformance and Parallel Computing with R

bcp

Bayesian Analysis of Change Point Problems

Provides an implementation of the Barry and Hartigan (1993) product partition model for the normal errors change point problem using Markov Chain Monte Carlo. It also extends the methodology to regression models on a connected graph (Wang and Emerson, 2015); this allows estimation of change point models with multivariate responses. Parallel MCMC, previously available in bcp v.3.0.0, is currently not implemented.

1350

HighPerformance and Parallel Computing with R

BDgraph

Bayesian Structure Learning in Graphical Models using Birth-Death MCMC

Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements in the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889> and Mohammadi and Wit (2019) <doi:10.18637/jss.v089.i03>.

1351

HighPerformance and Parallel Computing with R

biglars

Scalable Least-Angle Regression and Lasso

Least-angle regression, lasso and stepwise regression for numeric datasets in which the number of observations is greater than the number of predictors. The functions can be used with the ff library to accommodate datasets that are too large to be held in memory.

1352

HighPerformance and Parallel Computing with R

biglm

bounded memory linear and generalized linear models

Regression for data too large to fit in memory

1353

HighPerformance and Parallel Computing with R

bigmemory

Manage Massive Matrices with Shared Memory and Memory-Mapped Files

Create, store, access, and manipulate massive matrices. Matrices are allocated to shared memory and may use memory-mapped files. Packages ‘biganalytics’, ‘bigtabulate’, ‘synchronicity’, and ‘bigalgebra’ provide advanced functionality.
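A minimal sketch (assuming the package is installed) of allocating and indexing a shared-memory matrix:

```r
library(bigmemory)

# Allocate a matrix backed by shared memory; other R processes that attach
# to its descriptor see the same data without copying it.
x <- big.matrix(nrow = 3, ncol = 2, type = "double", init = 0)

# big.matrix objects are indexed like ordinary R matrices.
x[1, 1] <- 42
x[1, 1]
```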

1354

HighPerformance and Parallel Computing with R

bigstatsr

Statistical Tools for File-backed Big Matrices

Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses file-backed big matrices via memory-mapping. It provides for instance matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.

1355

HighPerformance and Parallel Computing with R

bnlearn

Bayesian Network Structure Learning, Parameter Learning and Inference

Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <http://www.bnlearn.com>.

1356

HighPerformance and Parallel Computing with R

caret

Classification and Regression Training

Misc functions for training and plotting classification and regression models.

1357

HighPerformance and Parallel Computing with R

clustermq

Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM, PBS/Torque)

Evaluate arbitrary function calls using workers on HPC schedulers in a single line of code. All processing is done on the network without accessing the file system. Remote schedulers are supported via SSH.

1358

HighPerformance and Parallel Computing with R

data.table

Extension of ‘data.frame’

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
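A tiny sketch of the syntax (assuming the package is installed):

```r
library(data.table)

DT <- data.table(grp = c("a", "a", "b", "b"), x = 1:4)

# The general form is DT[i, j, by]: subset rows with i, compute in j,
# and group with by -- modifying by reference where possible to avoid copies.
res <- DT[, .(total = sum(x)), by = grp]
res
```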

1359

HighPerformance and Parallel Computing with R

dclone

Data Cloning and MCMC Tools for Maximum Likelihood Methods

Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 (R Journal 2(2):29-37). Sequential and parallel MCMC support for ‘JAGS’, ‘WinBUGS’, ‘OpenBUGS’, and ‘Stan’.

1360

HighPerformance and Parallel Computing with R

doFuture

A Universal Foreach Parallel Adapter using the Future API of the ‘future’ Package

Provides a ‘%dopar%’ adapter such that any type of futures can be used as backends for the ‘foreach’ framework.

1361

HighPerformance and Parallel Computing with R

doMC

Foreach Parallel Adaptor for ‘parallel’

Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package.

1362

HighPerformance and Parallel Computing with R

doMPI

Foreach Parallel Adaptor for the Rmpi Package

Provides a parallel backend for the %dopar% function using the Rmpi package.

1363

HighPerformance and Parallel Computing with R

doRedis

Foreach parallel adapter for the rredis package

A Redis parallel backend for the %dopar% function

1364

HighPerformance and Parallel Computing with R

doRNG

Generic Reproducible Parallel Backend for ‘foreach’ Loops

Provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by L’Ecuyer’s combined multiple-recursive generator [L’Ecuyer (1999), <doi:10.1287/opre.47.1.159>]. It makes it easy to convert standard %dopar% loops into fully reproducible loops, independently of the number of workers, the task scheduling strategy, or the chosen parallel environment and associated foreach backend.
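A sketch of the reproducibility guarantee (assuming the package is installed; the sequential backend is registered here purely for illustration, since any foreach backend behaves the same under %dorng%):

```r
library(doRNG)    # also attaches foreach
registerDoSEQ()   # sequential backend; a parallel one would give identical draws

# %dorng% feeds each iteration an independent RNG stream, so the result is
# reproducible regardless of backend, scheduling, or number of workers.
set.seed(123)
a <- foreach(i = 1:3, .combine = c) %dorng% runif(1)
set.seed(123)
b <- foreach(i = 1:3, .combine = c) %dorng% runif(1)
identical(a, b)   # TRUE
```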

1365

HighPerformance and Parallel Computing with R

doSNOW

Foreach Parallel Adaptor for the ‘snow’ Package

Provides a parallel backend for the %dopar% function using the snow package of Tierney, Rossini, Li, and Sevcikova.

1366

HighPerformance and Parallel Computing with R

dqrng

Fast Pseudo Random Number Generators

Several fast random number generators are provided as C++ header-only libraries: the PCG family by O’Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as Xoroshiro128+ and Xoshiro256+ by Blackman and Vigna (2018 <arXiv:1805.01407>). In addition, fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). These functions are exported to R and as a C++ interface and are enabled for use with the default 64 bit generator from the PCG family, Xoroshiro128+ and Xoshiro256+ as well as the 64 bit version of the 20 rounds Threefry engine (Salmon et al., 2011 <doi:10.1145/2063384.2063405>) as provided by the package ‘sitmo’.

1367

HighPerformance and Parallel Computing with R

drake

A Pipeline Toolkit for Reproducible Computation at Scale

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://ropensci.github.io/drake/> and the online manual <https://ropenscilabs.github.io/drake-manual/>.

1368

HighPerformance and Parallel Computing with R

ff

MemoryEfficient Storage of Large Data on Disk and Fast Access Functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory - the effective virtual memory consumption per ff object. ff supports R’s standard atomic data types ‘double’, ‘logical’, ‘raw’ and ‘integer’ and non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned), single (4 byte float with NAs). For example ‘quad’ allows efficient storage of genomic data as an ‘A’,‘T’,‘G’,‘C’ factor. The unsigned types support ‘circular’ arithmetic. There is also support for close-to-atomic types ‘factor’, ‘ordered’, ‘POSIXct’, ‘Date’ and custom close-to-atomic types. ff has native C support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays). There is also a ffdf class not unlike data.frames and import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/decoding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options allows working with ‘permanent’ files as well as creating/removing ‘temporary’ ff files completely transparently to the user. On certain OS/filesystem combinations, creating the ff files works without notable delay thanks to sparse file allocation.
Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, ‘logicals’ and non-standard data types get stored native and compact on binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and ram objects and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package ‘bit’: chunked looping, fast bit operations and coercions between different objects that can store subscript information (‘bit’, ‘bitwhich’, ff ‘boolean’, ri range index, hi hybrid index). This allows working interactively with selections of large datasets and quickly modifying selection criteria. Further high-performance enhancements can be made available upon request.

1369

HighPerformance and Parallel Computing with R

ffbase

Basic Statistical Functions for Package ‘ff’

Extends the out of memory vectors of ‘ff’ with statistical functions and other utilities to ease their usage.

1370

HighPerformance and Parallel Computing with R

flowr

Streamlining Design and Deployment of Complex Workflows

This framework allows you to design and implement complex pipelines, and deploy them on your institution’s computing cluster. This has been built keeping in mind the needs of bioinformatics workflows. However, it is easily extendable to any field where a series of steps (shell commands) are to be executed in a (work)flow.

1371

HighPerformance and Parallel Computing with R

foreach

Provides Foreach Looping Construct

Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn’t require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
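A minimal sketch (assuming the package is installed); the same loop parallelises by registering a backend (doMC, doSNOW, doMPI, ...) and switching %do% to %dopar%:

```r
library(foreach)

# foreach is used for its return value: each iteration's result is collected
# and combined (here concatenated with c) rather than relying on side effects.
squares <- foreach(i = 1:4, .combine = c) %do% i^2
squares   # 1 4 9 16
```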

1372

HighPerformance and Parallel Computing with R

future

Unified Parallel and Distributed Processing in R for Everyone

The purpose of this package is to provide a lightweight and unified Future API for sequential and parallel processing of R expressions via futures. The simplest way to evaluate an expression in parallel is to use ‘x %<-% { expression }’ with ‘plan(multiprocess)’. This package implements sequential, multicore, multisession, and cluster futures. With these, R expressions can be evaluated on the local machine, in parallel on a set of local machines, or distributed on a mix of local and remote machines. Extensions to this package implement additional backends for processing futures via compute cluster schedulers etc. Because of its unified API, there is no need to modify any code in order to switch from sequential processing on the local machine to, say, distributed processing on a remote compute cluster. Another strength of this package is that global variables and functions are automatically identified and exported as needed, making it straightforward to tweak existing code to make use of futures.
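A small sketch of the Future API (assuming the package is installed; plan(multisession) is one of the implemented strategies and uses background R sessions):

```r
library(future)
plan(multisession)  # evaluate futures in parallel background R sessions

# %<-% creates a "future assignment": the block runs asynchronously, and
# reading x later blocks only if the result is not yet available.
x %<-% { sum(1:10) }
x   # 55
```

Switching to, say, a cluster backend requires changing only the plan() call, not the expression itself.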

1373

HighPerformance and Parallel Computing with R

future.BatchJobs

A Future API for Parallel and Distributed Processing using BatchJobs

Implementation of the Future API on top of the ‘BatchJobs’ package. This allows you to process futures, as defined by the ‘future’ package, in parallel out of the box, not only on your local machine or ad hoc cluster of machines, but also via high-performance compute (‘HPC’) job schedulers such as ‘LSF’, ‘OpenLava’, ‘Slurm’, ‘SGE’, and ‘TORQUE’ / ‘PBS’, e.g. ‘y <- future.apply::future_lapply(files, FUN = process)’. NOTE: The ‘BatchJobs’ package is deprecated in favor of the ‘batchtools’ package. Because of this, it is recommended to use the ‘future.batchtools’ package instead of this package.

1374

HighPerformance and Parallel Computing with R

GAMBoost

Generalized linear and additive models by likelihood based boosting

This package provides routines for fitting generalized linear and generalized additive models by likelihood-based boosting, using penalized B-splines.

1375

HighPerformance and Parallel Computing with R

gcbd

‘GPU’/CPU Benchmarking in DebianBased Systems

‘GPU’/CPU benchmarking on Debian-package based systems. This package benchmarks the performance of a few standard linear algebra operations (such as a matrix product and QR, SVD and LU decompositions) across a number of different ‘BLAS’ libraries as well as a ‘GPU’ implementation. To do so, it takes advantage of the ability to ‘plug and play’ different ‘BLAS’ implementations easily on a Debian and/or Ubuntu system. The current version supports: ‘Reference BLAS’ (‘refblas’), which is unaccelerated, as a baseline; Atlas, which is tuned but typically configured single-threaded; Atlas39, which is tuned and configured for multi-threaded mode; ‘Goto Blas’, which is accelerated and multi-threaded; and ‘Intel MKL’, a commercial accelerated and multi-threaded version. As for ‘GPU’ computing, we use the CRAN package ‘gputools’. For ‘Goto Blas’, the ‘gotoblas2-helper’ script from the ISM in Tokyo can be used. For ‘Intel MKL’ we use the Revolution R packages from Ubuntu 9.10.

1376

High-Performance and Parallel Computing with R

gpuR

GPU Functions for R Objects

Provides GPU-enabled functions for R objects in a simple and approachable manner. New gpu* and vcl* classes have been provided to wrap typical R objects (e.g. vector, matrix), in both host and device spaces, to mirror typical R syntax without the need to know OpenCL.

1377

High-Performance and Parallel Computing with R

GUIProfiler

Graphical User Interface for Rprof()

Show graphically the results of profiling R functions by tracking their execution time.

1378

High-Performance and Parallel Computing with R

h2o

R Interface for ‘H2O’

R interface for ‘H2O’, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (AutoML).

1379

High-Performance and Parallel Computing with R

HadoopStreaming

Utilities for using R scripts in Hadoop streaming

Provides a framework for writing map/reduce scripts for use in Hadoop Streaming. Also facilitates operating on data in a streaming fashion, without Hadoop.

1380

High-Performance and Parallel Computing with R

HistogramTools

Utility Functions for R Histograms

Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.

1381

High-Performance and Parallel Computing with R

inline

Functions to Inline C, C++, Fortran Function Calls from R

Functionality to dynamically define R functions and S4 methods with ‘inlined’ C, C++ or Fortran code supporting the .C and .Call calling conventions.

1382

High-Performance and Parallel Computing with R

keras

R Interface to ‘Keras’

Interface to ‘Keras’ <https://keras.io>, a high-level neural networks ‘API’. ‘Keras’ was developed with a focus on enabling fast experimentation, supports both convolution-based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both ‘CPU’ and ‘GPU’ devices.

1383

High-Performance and Parallel Computing with R

LaF

Fast Access to Large ASCII Files

Methods for fast access to large ASCII files. Currently the following file formats are supported: comma separated format (CSV) and fixed width format. It is assumed that the files are too large to fit into memory, although the package can also be used to efficiently access files that do fit into memory. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed as one would an ordinary data.frame. The LaF vignette gives an overview of the functionality provided.

1384

High-Performance and Parallel Computing with R

latentnet

Latent Position and Cluster Models for Statistical Networks

Fit and simulate latent position and cluster models for statistical networks.

1385

High-Performance and Parallel Computing with R

lga

Tools for linear grouping analysis (LGA)

Tools for linear grouping analysis. Three user-level functions: gap, rlga and lga.

1386

High-Performance and Parallel Computing with R

Matching

Multivariate and Propensity Score Matching with Balance Optimization

Provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance has been obtained are also provided.

1387

High-Performance and Parallel Computing with R

MonetDB.R

Connect MonetDB to R

Allows pulling data from MonetDB into R. Includes a DBI implementation and a dplyr backend.

1388

High-Performance and Parallel Computing with R

mvnfast

Fast Multivariate Normal and Student’s t Methods

Provides computationally efficient tools related to the multivariate normal and Student’s t distributions. The main functionalities are: simulating multivariate random vectors, evaluating multivariate normal or Student’s t densities and Mahalanobis distances. These tools are very efficient thanks to the use of C++ code and of the OpenMP API.

1389

High-Performance and Parallel Computing with R

nws

R functions for NetWorkSpaces and Sleigh

Provides coordination and parallel execution facilities, as well as limited cross-language data exchange, using the NetWorkSpaces server developed by REvolution Computing.

1390

High-Performance and Parallel Computing with R

OpenCL

Interface Allowing R to Use OpenCL

Provides an interface to OpenCL, allowing R to leverage computing power of GPUs and other HPC accelerator devices.

1391

High-Performance and Parallel Computing with R

orloca

Operations Research LOCational Analysis Models

Objects and methods to handle and solve the minsum location problem, also known as the Fermat-Weber problem. The minsum location problem searches for a point such that the weighted sum of the distances to the demand points is minimized. See “The Fermat-Weber location problem revisited” by Brimberg, Mathematical Programming, 1, pp. 71-76, 1995 <doi:10.1007/BF01592245>. General global optimization algorithms are used to solve the problem, along with the ad hoc Weiszfeld method; see “Sur le point pour lequel la Somme des distances de n points donnes est minimum”, by Weiszfeld, Tohoku Mathematical Journal, First Series, 43, pp. 355-386, 1937, or “On the point for which the sum of the distances to n given points is minimum”, by E. Weiszfeld and F. Plastria, Annals of Operations Research, 167, pp. 7-41, 2009 <doi:10.1007/s10479-008-0352-z>.

1392

High-Performance and Parallel Computing with R

parSim

Parallel Simulation Studies

Perform flexible simulation studies using one or multiple computer cores. The package is set up to be usable on high-performance clusters in addition to being run locally, see examples on <https://github.com/SachaEpskamp/parSim>.

1393

High-Performance and Parallel Computing with R

partDSA

Partitioning Using Deletion, Substitution, and Addition Moves

A novel tool for generating a piecewise constant estimation list of increasingly complex predictors based on an intensive and comprehensive search over the entire covariate space.

1394

High-Performance and Parallel Computing with R

pbapply

Adding Progress Bar to ’*apply’ Functions

A lightweight package that adds a progress bar to vectorized R functions (’*apply’). The implementation can easily be added to functions where showing the progress is useful (e.g. bootstrap). The type and style of the progress bar (with percentages or remaining time) can be set through options. Supports several parallel processing backends.
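A minimal sketch of the drop-in-replacement style this describes (assuming ‘pbapply’ is installed):

```r
library(pbapply)

# pblapply() mirrors base lapply() but draws a progress bar while it runs;
# its 'cl' argument (not used here) would dispatch the work to a parallel backend.
res <- pblapply(1:200, function(i) {
  median(sample(rnorm(500), replace = TRUE))  # a toy bootstrap replicate
})
length(res)
```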

1395

High-Performance and Parallel Computing with R

pbdBASE

Programming with Big Data Base Wrappers for Distributed Matrices

An interface to and extensions for the ‘PBLAS’ and ‘ScaLAPACK’ numerical libraries. This enables R to utilize distributed linear algebra for codes written in the ‘SPMD’ fashion. This interface is deliberately low-level and mimics the style of the native libraries it wraps. For a much higher level way of managing distributed matrices, see the ‘pbdDMAT’ package.

1396

High-Performance and Parallel Computing with R

pbdDEMO

Programming with Big Data Demonstrations and Examples Using ‘pbdR’ Packages

A set of demos of ‘pbdR’ packages, together with a useful, unifying vignette.

1397

High-Performance and Parallel Computing with R

pbdDMAT

‘pbdR’ Distributed Matrix Methods

A set of classes for managing distributed matrices, and a collection of methods for computing linear algebra and statistics. Computation is handled mostly by routines from the ‘pbdBASE’ package, which itself relies on the ‘ScaLAPACK’ and ‘PBLAS’ numerical libraries for distributed computing.

1398

High-Performance and Parallel Computing with R

pbdMPI

Programming with Big Data Interface to MPI

An efficient interface to MPI by utilizing S4 classes and methods with a focus on Single Program/Multiple Data (‘SPMD’) parallel programming style, which is intended for batch parallel execution.

1399

High-Performance and Parallel Computing with R

pbdNCDF4

Programming with Big Data Interface to Parallel Unidata NetCDF4 Format Data Files

This package adds collective parallel read and write capability to the R package ncdf4 version 1.8. Typical use is as a parallel NetCDF4 file reader in SPMD-style programming. Each R process reads and writes its own data in a synchronized collective mode, resulting in faster parallel performance. Performance improvement is conditional on a parallel file system.

1400

High-Performance and Parallel Computing with R

pbdPROF

Programming with Big Data ― MPI Profiling Tools

MPI profiling tools.

1401

High-Performance and Parallel Computing with R

pbdSLAP

Programming with Big Data Scalable Linear Algebra Packages

Utilizing scalable linear algebra packages mainly including ‘BLACS’, ‘PBLAS’, and ‘ScaLAPACK’ in double precision via ‘pbdMPI’ based on ‘ScaLAPACK’ version 2.0.2.

1402

High-Performance and Parallel Computing with R

peperr

Parallelised Estimation of Prediction Error

Designed for prediction error estimation through resampling techniques, possibly accelerated by parallel execution on a compute cluster. Newly developed model fitting routines can be easily incorporated.

1403

High-Performance and Parallel Computing with R

permGPU

Using GPUs in Statistical Genomics

Can be used to carry out permutation resampling inference in the context of RNA microarray studies.

1404

High-Performance and Parallel Computing with R

pls

Partial Least Squares and Principal Component Regression

Multivariate regression methods Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).

1405

High-Performance and Parallel Computing with R

pmclust

Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model

Aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The code employs ‘pbdMPI’ to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. The implementation defaults to the single-program-multiple-data programming model. The code can be executed through ‘pbdMPI’ and MPI implementations such as ‘OpenMPI’ and ‘MPICH’. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.

1406

High-Performance and Parallel Computing with R

profr

An Alternative Display for Profiling Information

An alternative data structure and visual rendering for the profiling information generated by Rprof.

1407

High-Performance and Parallel Computing with R

proftools

Profile Output Processing Tools for R

Tools for examining Rprof profile output.

1408

High-Performance and Parallel Computing with R

profvis

Interactive Visualizations for Profiling R Code

Interactive visualizations for profiling R code.
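The profiling front-ends listed here (GUIProfiler, profr, proftools, profvis) all consume the output of base R’s sampling profiler, Rprof(); a minimal sketch of that underlying workflow, using only base R:

```r
# Profile a deliberately non-trivial computation with base R's sampling profiler.
out <- tempfile(fileext = ".out")
Rprof(out, interval = 0.01)            # start sampling the call stack
x <- replicate(200, sort(runif(1e5)))  # the work being profiled
Rprof(NULL)                            # stop profiling
prof <- summaryRprof(out)              # tabulate time spent by function
head(prof$by.total)
```

The resulting file (`out`) is exactly what these packages parse and visualize.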

1409

High-Performance and Parallel Computing with R

pvclust

Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling

An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides the AU (approximately unbiased) p-value as well as the BP (bootstrap probability) value for each cluster in a dendrogram.

1410

High-Performance and Parallel Computing with R

qsub

Running Commands Remotely on ‘Gridengine’ Clusters

Run lapply() calls in parallel by submitting them to ‘gridengine’ clusters using the ‘qsub’ command.

1411

High-Performance and Parallel Computing with R

randomForestSRC

Fast Unified Random Forests for Survival, Regression, and Classification (RFSRC)

Fast OpenMP parallel computing of Breiman’s random forests for survival, competing risks, regression and classification based on Ishwaran and Kogalur’s popular random survival forests (RSF) package. Handles missing data and now includes multivariate, unsupervised forests, quantile regression and solutions for class-imbalanced data. New fast interface using subsampling and confidence regions for variable importance.

1412

High-Performance and Parallel Computing with R

Rborist

Extensible, Parallelizable Implementation of the Random Forest Algorithm

Scalable implementation of classification and regression forests, as described by Breiman (2001), <doi:10.1023/A:1010933404324>.

1413

High-Performance and Parallel Computing with R

Rcpp

Seamless R and C++ Integration

The ‘Rcpp’ package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents which facilitates both writing of new code as well as easier integration of thirdparty libraries. Documentation about ‘Rcpp’ is provided by several vignettes included in this package, via the ‘Rcpp Gallery’ site at <http://gallery.rcpp.org>, the paper by Eddelbuettel and Francois (2011, <doi:10.18637/jss.v040.i08>), the book by Eddelbuettel (2013, <doi:10.1007/9781461468684>) and the paper by Eddelbuettel and Balamuta (2018, <doi:10.1080/00031305.2017.1375990>); see ‘citation(“Rcpp”)’ for details.
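A minimal sketch of the inline compilation style (assuming ‘Rcpp’ and a working C++ toolchain are installed):

```r
library(Rcpp)

# Compile a small C++ function and bind it into the R session in one call.
cppFunction("
double sumC(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) total += x[i];
  return total;
}")

sumC(c(1, 2, 3))  # behaves like base sum()
```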

1414

High-Performance and Parallel Computing with R

RcppParallel

Parallel Programming Tools for ‘Rcpp’

High level functions for parallel programming with ‘Rcpp’. For example, the ‘parallelFor()’ function can be used to convert the work of a standard serial “for” loop into a parallel one and the ‘parallelReduce()’ function can be used for accumulating aggregate or other values.

1415

High-Performance and Parallel Computing with R

Rdsm

Threads Environment for R

Provides a threads-type programming environment for R. The package gives the R programmer the clearer, more concise shared-memory world view, and in some cases gives superior performance as well. In addition, it enables parallel processing on very large, out-of-core matrices.

1416

High-Performance and Parallel Computing with R

reticulate

Interface to ‘Python’

Interface to ‘Python’ modules, classes, and functions. When calling into ‘Python’, R data types are automatically converted to their equivalent ‘Python’ types. When values are returned from ‘Python’ to R they are converted back to R types. Compatible with all versions of ‘Python’ >= 2.7.
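A minimal sketch of the conversion round trip (assuming ‘reticulate’ and a Python installation with NumPy are available):

```r
library(reticulate)

# Import NumPy with automatic conversion disabled, so Python methods
# can be chained on the Python-side object before converting.
np <- import("numpy", convert = FALSE)
m  <- np$arange(6L)$reshape(c(2L, 3L))  # built and reshaped in Python
py_to_r(m)                              # converted back to an R matrix
```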

1417

High-Performance and Parallel Computing with R

rgenoud

R Version of GENetic Optimization Using Derivatives

A genetic algorithm plus derivative optimizer.

1418

High-Performance and Parallel Computing with R

Rhpc

Permits *apply() Style Dispatch for ‘HPC’

Provides *apply()-style functions using ‘MPI’ for a better ‘HPC’ environment in R. The package supports long vectors and can deal with moderately large data.

1419

High-Performance and Parallel Computing with R

RhpcBLASctl

Control the Number of Threads on ‘BLAS’

Control the number of threads on ‘BLAS’ (a.k.a. ‘GotoBLAS’, ‘OpenBLAS’, ‘ACML’, ‘BLIS’ and ‘MKL’). It is also possible to control the number of threads in ‘OpenMP’, and to get the number of logical and physical cores where feasible.

1420

High-Performance and Parallel Computing with R

RInside

C++ Classes to Embed R in C++ Applications

C++ classes to embed R in C++ applications. A C++ class providing the R interpreter is offered by this package, making it easier to have “R inside” your C++ application. As R itself is embedded into your application, a shared library build of R is required. This works on Linux, OS X and even on Windows provided you use the same tools used to build R itself. Numerous examples are provided in the subdirectories of the examples/ directory of the installed package: standard, ‘mpi’ (for parallel computing), ‘qt’ (showing how to embed ‘RInside’ inside a Qt GUI application), ‘wt’ (showing how to build a “web-application” using the Wt toolkit), ‘armadillo’ (for ‘RInside’ use with ‘RcppArmadillo’) and ‘eigen’ (for ‘RInside’ use with ‘RcppEigen’). The examples use ‘GNUmakefile(s)’ with GNU extensions, so a GNU make is required (and will use the ‘GNUmakefile’ automatically). ‘Doxygen’-generated documentation of the C++ classes is available at the ‘RInside’ website as well.

1421

High-Performance and Parallel Computing with R

rJava

LowLevel R to Java Interface

Low-level interface to Java VM very much like .C/.Call and friends. Allows creation of objects, calling methods and accessing fields.

1422

High-Performance and Parallel Computing with R

rlecuyer

R Interface to RNG with Multiple Streams

Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L’Ecuyer et al. (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.

1423

High-Performance and Parallel Computing with R

Rmpi (core)

Interface (Wrapper) to MPI (MessagePassing Interface)

An interface (wrapper) to MPI. It also provides an interactive R manager and worker environment.

1424

High-Performance and Parallel Computing with R

RProtoBuf

R Interface to the ‘Protocol Buffers’ ‘API’ (Version 2 or 3)

Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal ‘RPC’ protocols and file formats. Additional documentation is available in two included vignettes, one of which corresponds to our ‘JSS’ paper (2016, <doi:10.18637/jss.v071.i02>). Either version 2 or 3 of the ‘Protocol Buffers’ ‘API’ is supported.

1425

High-Performance and Parallel Computing with R

rredis

“Redis” Key/Value Database Client

R client interface to the “Redis” key-value database.

1426

High-Performance and Parallel Computing with R

rslurm

Submit R Calculations to a Slurm Cluster

Functions that simplify submitting R scripts to a Slurm workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.
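A minimal sketch of the submission pattern (assuming ‘rslurm’ is installed and a Slurm cluster is reachable; the job name and sizing values are illustrative):

```r
library(rslurm)

# One row of 'pars' per task; the function is applied to each row's columns.
pars <- data.frame(mu = 1:4, sd = c(1, 1, 2, 2))
sjob <- slurm_apply(function(mu, sd) mean(rnorm(1000, mu, sd)),
                    pars, jobname = "demo",
                    nodes = 2, cpus_per_node = 2)

res <- get_slurm_out(sjob, outtype = "raw")  # collect results once jobs finish
cleanup_files(sjob)                          # remove temporary job files
```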

1427

High-Performance and Parallel Computing with R

rstream

Streams of Random Numbers

Unified object-oriented interface for multiple independent streams of random numbers from different sources.

1428

High-Performance and Parallel Computing with R

Sim.DiffProc

Simulation of Diffusion Processes

Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both the Ito and Stratonovich forms. Supports statistical analysis of SDEs with parallel Monte Carlo and moment equations methods. These tools have enabled many researchers in different domains to use such equations to model practical problems in financial and actuarial modelling and other areas of application, e.g., modelling and simulation of the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1562523422.

1429

High-Performance and Parallel Computing with R

sitmo

Parallel Pseudo Random Number Generator (PPRNG) ‘sitmo’ Header Files

Provided within are two high quality and fast PPRNGs that may be used in an ‘OpenMP’ parallel environment. In addition, there is a generator for a one-dimensional low-discrepancy sequence. The objective of this library is to consolidate the distribution of the ‘sitmo’ (C++98 & C++11), ‘threefry’ and ‘vandercorput’ (C++11-only) engines on CRAN by enabling others to link to the header files inside of ‘sitmo’ instead of including a copy of each engine within their individual package. Lastly, the package contains example implementations using the ‘sitmo’ package and three accompanying vignettes that provide additional information.

1430

High-Performance and Parallel Computing with R

snow (core)

Simple Network of Workstations

Support for simple parallel computing in R.
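snow’s master/worker interface carried over into base R’s ‘parallel’ package, so the basic pattern can be sketched with base R alone:

```r
library(parallel)

cl <- makeCluster(2)                        # two local worker processes
res <- parLapply(cl, 1:4, function(i) i^2)  # snow-style parallel lapply()
stopCluster(cl)                             # always shut workers down
unlist(res)
```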

1431

High-Performance and Parallel Computing with R

snowfall

Easier cluster computing (based on snow)

Usability wrapper around snow for easier development of parallel R programs. This package offers e.g. extended error checks and additional functions. All functions work in sequential mode, too, if no cluster is present or wished. The package is also designed as a connector to the cluster management tool sfCluster, but can also be used without it.

1432

High-Performance and Parallel Computing with R

snowFT

Fault Tolerant Simple Network of Workstations

Extension of the snow package supporting fault tolerant and reproducible applications, as well as supporting easy-to-use parallel programming: only one function is needed. Dynamic cluster size is also available.

1433

High-Performance and Parallel Computing with R

speedglm

Fitting Linear and Generalized Linear Models to Large Data Sets

Fitting linear models and generalized linear models to large data sets by updating algorithms.

1434

High-Performance and Parallel Computing with R

sqldf

Manipulate R Data Frames Using SQL

The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. ‘RSQLite’, ‘RH2’, ‘RMySQL’ and ‘RPostgreSQL’ backends are supported.
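A minimal sketch of the pattern described (assuming ‘sqldf’ and its default ‘RSQLite’ backend are installed):

```r
library(sqldf)

# The data frame name is used directly as a table name; sqldf() loads it into
# a temporary SQLite database, runs the query, and returns a data frame.
DF <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 10))
sqldf("select g, sum(x) as total from DF group by g")
```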

1435

High-Performance and Parallel Computing with R

ssgraph

Bayesian Graphical Estimation using Spike-and-Slab Priors

Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data. To speed up the computations, the computationally intensive tasks of the package are implemented in C++ in parallel using OpenMP.

1436

High-Performance and Parallel Computing with R

STAR

Spike Train Analysis with R

Functions to analyze neuronal spike trains from a single neuron or from several neurons recorded simultaneously.

1437

High-Performance and Parallel Computing with R

tensorflow

R Interface to ‘TensorFlow’

Interface to ‘TensorFlow’ <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

1438

High-Performance and Parallel Computing with R

tfestimators

Interface to ‘TensorFlow’ Estimators

Interface to ‘TensorFlow’ Estimators <https://www.tensorflow.org/programmers_guide/estimators>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.

1439

High-Performance and Parallel Computing with R

tm

Text Mining Package

A framework for text mining applications within R.

1440

High-Performance and Parallel Computing with R

varSelRF

Variable Selection using Random Forests

Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).

1441

High-Performance and Parallel Computing with R

xgboost

Extreme Gradient Boosting

Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine which could be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.

1442

Hydrological Data and Modeling

airGR

Suite of GR Hydrological Models for Precipitation-Runoff Modelling

Hydrological modelling tools developed at Irstea-Antony (HYCAR Research Unit, France). The package includes several conceptual rainfall-runoff models (GR4H, GR4J, GR5J, GR6J, GR2M, GR1A), a snow accumulation and melt model (CemaNeige) and the associated functions for their calibration and evaluation. Use help(airGR) for package description and references.

1443

Hydrological Data and Modeling

airGRteaching

Teaching Hydrological Modelling with the GR Rainfall-Runoff Models (‘Shiny’ Interface Included)

Add-on package to the ‘airGR’ package that simplifies its use and is aimed at being used for teaching hydrology. The package provides: 1) three functions that allow a hydrological modelling exercise to be completed very simply; 2) plotting functions to help students explore observed data and interpret the results of calibration and simulation of the GR (‘Genie rural’) models; 3) a ‘Shiny’ graphical interface that displays the impact of model parameters on hydrographs and the models’ internal variables.

1444

Hydrological Data and Modeling

berryFunctions

Function Collection Related to Plotting and Hydrology

Draw horizontal histograms, color scattered points by 3rd dimension, enhance date and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.

1445

Hydrological Data and Modeling

bigleaf

Physical and Physiological Ecosystem Properties from Eddy Covariance Data

Calculation of physical (e.g. aerodynamic conductance, surface temperature) and physiological (e.g. canopy conductance, water-use efficiency) ecosystem properties from eddy covariance data and accompanying meteorological measurements. Calculations assume the land surface to behave like a ‘big leaf’ and return bulk ecosystem/canopy variables.

1446

Hydrological Data and Modeling

biotic

Calculation of Freshwater Biotic Indices

Calculates a range of UK freshwater invertebrate biotic indices including BMWP, Whalley, WHPT, Habitat-specific BMWP, AWIC, LIFE and PSI.

1447

Hydrological Data and Modeling

bomrang

Australian Government Bureau of Meteorology (‘BOM’) Data Client

Provides functions to interface with Australian Government Bureau of Meteorology (‘BOM’) data, fetching data and returning a tidy data frame of precis forecasts, historical and current weather data from stations, agriculture bulletin data, ‘BOM’ 0900 or 1500 weather bulletins and downloading and importing radar and satellite imagery files. Data (c) Australian Government Bureau of Meteorology Creative Commons (CC) Attribution 3.0 licence or Public Access Licence (PAL) as appropriate. See <http://www.bom.gov.au/other/copyright.shtml> for further details.

1448

Hydrological Data and Modeling

boussinesq

Analytic Solutions for (groundwater) Boussinesq Equation

This package is a collection of R functions implemented from published and available analytic solutions for the one-dimensional Boussinesq equation (groundwater). In particular, the function “beq.lin” is the analytic solution of the linearized form of the Boussinesq equation between two different head-based boundary (Dirichlet) conditions; “beq.song” is the nonlinear power-series analytic solution of the motion of a wetting front over a dry bedrock (Song et al., 2007; see complete reference in the function documentation). Bugs/comments/questions/collaboration of any kind are warmly welcomed.

1449

Hydrological Data and Modeling

CityWaterBalance

Track Flows of Water Through an Urban System

Retrieves data and estimates unmeasured flows of water through the urban network. Any city may be modeled with preassembled data, but data for US cities can be gathered via web services using this package and dependencies ‘geoknife’ and ‘dataRetrieval’.

1450

Hydrological Data and Modeling

clifro

Easily Download and Visualise Climate Data from CliFlo

CliFlo is a web portal to the New Zealand National Climate Database and provides public access (via subscription) to around 6,500 various climate stations (see <https://cliflo.niwa.co.nz/> for more information). Collating and manipulating data from CliFlo (hence clifro) and importing into R for further analysis, exploration and visualisation is now straightforward and coherent. The user is required to have an internet connection, and a current CliFlo subscription (free) if data from stations, other than the public Reefton electronic weather station, is sought.

1451

Hydrological Data and Modeling

climatol

Climate Tools (Series Homogenization and Derived Products)

Functions for the quality control, homogenization and missing data infilling of climatological series and to obtain climatological summaries and grids from the results. Also functions to draw wind roses and Walter & Lieth climate diagrams.

1452

Hydrological Data and Modeling

climdex.pcic

PCIC Implementation of Climdex Routines

PCIC’s implementation of Climdex routines for computation of extreme climate indices.

1453

Hydrological Data and Modeling

CoSMoS

Complete Stochastic Modelling Solution

A single framework, unifying, extending, and improving a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific ‘parent’ Gaussian process; Papalexiou (2018) <doi:10.1016/j.advwatres.2018.02.013>.

1454

Hydrological Data and Modeling

countyweather

Compiles Meteorological Data for U.S. Counties

Interacts with NOAA data sources (including the NCDC API at <http://www.ncdc.noaa.gov/cdo-web/webservices/v2> and ISD data) using functions from the ‘rnoaa’ package to obtain and compile weather time series for U.S. counties. This work was supported in part by grants from the National Institute of Environmental Health Sciences (R00ES022631) and the Colorado State University Water Center.

1455

Hydrological Data and Modeling

dataRetrieval

Retrieval Functions for USGS and EPA Hydrologic and Water Quality Data

Collection of functions to help retrieve U.S. Geological Survey (USGS) and U.S. Environmental Protection Agency (EPA) water quality and hydrology data from web services. USGS web services are discovered from National Water Information System (NWIS) <https://waterservices.usgs.gov/> and <https://waterdata.usgs.gov/nwis>. Both EPA and USGS water quality data are obtained from the Water Quality Portal <https://www.waterqualitydata.us/>.
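A minimal sketch of an NWIS query (assuming ‘dataRetrieval’ is installed and a network connection is available; the site number, parameter code, and returned column name are illustrative):

```r
library(dataRetrieval)

# Daily mean discharge (USGS parameter code 00060) for one gauge and one year.
q <- readNWISdv(siteNumbers = "01491000",
                parameterCd = "00060",
                startDate   = "2020-01-01",
                endDate     = "2020-12-31")
head(q[, c("Date", "X_00060_00003")])  # date and daily-mean discharge columns
```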

1456

Hydrological Data and Modeling

dbhydroR

‘DBHYDRO’ Hydrologic and Water Quality Data

Client for programmatic access to the South Florida Water Management District’s ‘DBHYDRO’ database at <https://www.sfwmd.gov/science-data/dbhydro>, with functions for accessing hydrologic and water quality data.

1457

Hydrological Data and Modeling

driftR

Drift Correcting Water Quality Data

A tidy implementation of equations that correct for instrumental drift in continuous water quality monitoring data. There are many sources of water quality data including private (ex: YSI instruments) and open source (ex: USGS and NDBC), each of which are susceptible to errors/inaccuracies due to drift. This package allows the user to correct their data using one or two standard reference values in a uniform, reproducible way. The equations implemented are from Hasenmueller (2011) <doi:10.7936/K7N014KS>.

1458

Hydrological Data and Modeling

dynatopmodel

Implementation of the Dynamic TOPMODEL Hydrological Model

A native R implementation and enhancement of the Dynamic TOPMODEL semi-distributed hydrological model. Includes preprocessing and utility routines, as well as routines for displaying outputs.

1459

Hydrological Data and Modeling

Ecohydmod

Ecohydrological Modelling

Simulates the soil water balance (soil moisture, evapotranspiration, leakage and runoff), rainfall series by using the marked Poisson process and the vegetation growth through the normalized difference vegetation index (NDVI). Please see Souza et al. (2016) <doi:10.1002/hyp.10953>.

1460

Hydrological Data and Modeling

EcoHydRology (core)

A Community Modeling Foundation for EcoHydrology

Provides a flexible foundation on which scientists, engineers, and policy makers can base teaching exercises, as well as supporting more applied modelling of complex ecohydrological interactions.

1461

Hydrological Data and Modeling

ecoval

Procedures for Ecological Assessment of Surface Waters

Functions for evaluating and visualizing ecological assessment procedures for surface waters containing physical, chemical and biological assessments in the form of value functions.

1462

Hydrological Data and Modeling

EGRET

Exploration and Graphics for RivEr Trends

Statistics and graphics for streamflow history, water quality trends, and the statistical modeling algorithm: Weighted Regressions on Time, Discharge, and Season (WRTDS). The modeling method is introduced and discussed in Hirsch et al. (2010) <doi:10.1111/j.1752-1688.2010.00482.x>, and expanded in Hirsch and De Cicco (2015) <doi:10.3133/tm4A10>.

1463

Hydrological Data and Modeling

EGRETci

Exploration and Graphics for RivEr Trends Confidence Intervals

Collection of functions to evaluate uncertainty of results from water quality analysis using the Weighted Regressions on Time, Discharge, and Season (WRTDS) method. This package is an add-on to the EGRET package that performs the WRTDS analysis. The WRTDS modeling method was initially introduced and discussed in Hirsch et al. (2010) <doi:10.1111/j.1752-1688.2010.00482.x>, and expanded in Hirsch and De Cicco (2015) <doi:10.3133/tm4A10>. The paper describing the uncertainty and confidence interval calculations is Hirsch et al. (2015) <doi:10.1016/j.envsoft.2015.07.017>.

1464

Hydrological Data and Modeling

Evapotranspiration

Modelling Actual, Potential and Reference Crop Evapotranspiration

Uses data and constants to calculate potential evapotranspiration (PET) and actual evapotranspiration (AET) from 21 different formulations including the Penman, Penman-Monteith FAO 56, Priestley-Taylor and Morton formulations.
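
As an illustration of one of the simpler formulations, the Priestley-Taylor estimate can be computed from temperature and net radiation alone (a minimal sketch using the well-known FAO-56 constants, not this package's R interface; the function name and default parameter values are assumptions):

```python
import math

def priestley_taylor_pet(t_c, rn, g=0.0, alpha=1.26, gamma=0.067, lam=2.45):
    """Priestley-Taylor potential evapotranspiration (mm/day).

    t_c: air temperature (deg C); rn, g: net radiation and soil heat flux
    (MJ m-2 day-1); gamma: psychrometric constant (kPa/deg C);
    lam: latent heat of vaporisation (MJ/kg).
    """
    # Slope of the saturation vapour pressure curve (FAO-56, kPa/deg C)
    es = 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))
    delta = 4098.0 * es / (t_c + 237.3) ** 2
    return alpha * delta / (delta + gamma) * (rn - g) / lam
```

At 20 deg C and 15 MJ m-2 day-1 of net radiation this gives roughly 5 mm/day, and the estimate rises with temperature as the energy-partitioning term Delta/(Delta+gamma) grows.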

1465

Hydrological Data and Modeling

FAdist

Distributions that are Sometimes Used in Hydrology

Probability distributions that are sometimes useful in hydrology.

1466

Hydrological Data and Modeling

FlowScreen

Daily Streamflow Trend and Change Point Screening

Screens daily streamflow time series for temporal trends and change points. This package has been primarily developed for assessing the quality of daily streamflow time series. It also contains tools for plotting and calculating many different streamflow metrics. The package can be used to produce summary screening plots showing change points and significant temporal trends for high flow, low flow, and/or baseflow statistics, or it can be used to perform more detailed hydrological time series analyses. The package was designed for screening daily streamflow time series from the Water Survey of Canada and the United States Geological Survey but will also work with streamflow time series from many other agencies.

1467

Hydrological Data and Modeling

geoknife

Web-Processing of Large Gridded Datasets

Processes gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers.

1468

Hydrological Data and Modeling

geotopbricks

An R Plugin for the Distributed Hydrological Model GEOtop

Analyzes raster maps and other information as input/output files of the distributed hydrological model GEOtop. Contains functions and methods to import maps and other keywords from the geotop.inpts file. Some examples with simulation cases of GEOtop 2.x/3.x are presented in the package. Information about the GEOtop distributed hydrological model source code is available at www.geotop.org. Technical details about the model are available in Endrizzi et al. (2014) (<http://www.geosci-model-dev.net/7/2831/2014/gmd-7-2831-2014.html>).

1469

Hydrological Data and Modeling

getMet

Get Meteorological Data for Hydrologic Models

Hydrologic models often require users to collect and format input meteorological data. This package contains functions for sourcing, formatting, and editing meteorological data for hydrologic models.

1470

Hydrological Data and Modeling

GSODR

Global Surface Summary of the Day (‘GSOD’) Weather Data Client

Provides automated downloading, parsing, cleaning, unit conversion and formatting of Global Surface Summary of the Day (‘GSOD’) weather data from the USA National Centers for Environmental Information (‘NCEI’) for use in R. Units are converted from United States Customary System (‘USCS’) units to International System of Units (‘SI’). Stations may be individually checked for a user-defined number of missing days, and stations with too many missing observations are omitted. Only stations with valid reported latitude and longitude values are permitted in the final data. Additional useful elements, namely saturation vapour pressure (‘es’), actual vapour pressure (‘ea’) and relative humidity, are calculated from the original data and included in the final data set. The resulting data include station identification information, state, country, latitude, longitude, elevation, weather observations and associated flags. Additional data are included with this R package: a list of elevation values for stations between -60 and 60 degrees latitude derived from the Shuttle Radar Topography Mission (‘SRTM’). For information on the ‘GSOD’ data from ‘NCEI’, please see the ‘GSOD’ ‘readme.txt’ file available from <http://www1.ncdc.noaa.gov/pub/data/gsod/readme.txt>.

1471

Hydrological Data and Modeling

GWSDAT

GroundWater Spatiotemporal Data Analysis Tool (GWSDAT)

Shiny application for the analysis of groundwater monitoring data. Designed to work with simple time-series data for solute concentration and groundwater elevation, but can also plot non-aqueous phase liquid (NAPL) thickness if required. Also supports importing a site basemap in GIS shapefile format.

1472

Hydrological Data and Modeling

hddtools

Hydrological Data Discovery Tools

Facilitates discovery and handling of hydrological data, access to catalogues and databases.

1473

Hydrological Data and Modeling

humidity

Calculate Water Vapor Measures from Temperature and Dew Point

Vapor pressure, relative humidity, absolute humidity, specific humidity, and mixing ratio are commonly used water vapor measures in meteorology. This R package provides functions for calculating saturation vapor pressure (hPa), partial water vapor pressure (Pa), relative humidity (%), absolute humidity (kg/m^3), specific humidity (kg/kg), and mixing ratio (kg/kg) from temperature (K) and dew point (K). Conversion functions between humidity measures are also provided.
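
The core of such conversions is a saturation vapour pressure formula; a minimal sketch using the Magnus approximation with Alduchov-Eskridge coefficients follows (function names are hypothetical and this is not the package's R API, which works in Kelvin):

```python
import math

def svp_hpa(t_celsius):
    """Saturation vapour pressure (hPa) via the Magnus formula
    (Alduchov & Eskridge 1996 coefficients)."""
    return 6.1094 * math.exp(17.625 * t_celsius / (t_celsius + 243.04))

def relative_humidity(t_celsius, dewpoint_celsius):
    """Relative humidity (%) as the ratio of actual vapour pressure
    (saturation pressure at the dew point) to saturation pressure at
    the air temperature."""
    return 100.0 * svp_hpa(dewpoint_celsius) / svp_hpa(t_celsius)
```

When the dew point equals the air temperature the air is saturated, so the relative humidity is exactly 100%.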

1474

Hydrological Data and Modeling

hydroApps

Tools and Models for Hydrological Applications

Package providing tools for hydrological applications and models developed for regional analysis in Northwestern Italy.

1475

Hydrological Data and Modeling

hydrogeo

Groundwater Data Presentation and Interpretation

Contains one function for drawing Piper diagrams (also called Piper-Hill diagrams) of water analyses for major ions.

1476

Hydrological Data and Modeling

hydroGOF (core)

Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series

S3 functions implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented towards use during the calibration, validation, and application of hydrological models. Missing values in observed and/or simulated values can be removed before computations. Comments, questions, and collaboration of any kind are very welcome.

1477

Hydrological Data and Modeling

hydrolinks

Hydrologic Network Linking Data and Tools

Tools to link geographic data with hydrologic networks, including lakes, streams and rivers. Includes automated download of the U.S. National Hydrography Network and other hydrolayers.

1478

Hydrological Data and Modeling

HydroMe

R Codes for Estimating Water Retention and Infiltration Model Parameters Using Experimental Data

Version 2 of the HydroMe package. Estimates the parameters of infiltration and water retention models by curve fitting. The models considered are those commonly used in soil science. This version adds new models for the water retention characteristic curve and fixes errors present in HydroMe v.1.

1479

Hydrological Data and Modeling

hydroPSO

Particle Swarm Optimisation, with Focus on Environmental Models

State-of-the-art version of the Particle Swarm Optimisation (PSO) algorithm (SPSO-2011 and SPSO-2007 capable). hydroPSO can be used as a replacement for the ‘optim’ R function for (global) optimization of non-smooth and non-linear functions. However, the main focus of hydroPSO is the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the calibration engine (PSO). hydroPSO communicates with the model through the model’s own input and output files, without requiring access to the model’s source code. Several PSO variants and controlling options are included to fine-tune the performance of the calibration engine for different calibration problems. An advanced sensitivity analysis function together with user-friendly plotting summaries facilitate the interpretation and assessment of the calibration results. hydroPSO is parallel-capable, to alleviate the computational burden of complex models with long execution times. Bug reports, comments, and questions are very welcome (in English, Spanish or Italian). See Zambrano-Bigiarini and Rojas (2013) <doi:10.1016/j.envsoft.2013.01.004> for more details.

1480

Hydrological Data and Modeling

hydroscoper

Interface to the Greek National Data Bank for Hydrometeorological Information

R interface to the Greek National Data Bank for Hydrological and Meteorological Information <http://www.hydroscope.gr/>. It covers Hydroscope’s data sources and provides functions to transliterate, translate, and download them into tidy data frames.

1481

Hydrological Data and Modeling

hydrostats

Hydrologic Indices for Daily Time Series Data

Calculates a suite of hydrologic indices for daily time series data that are widely used in hydrology and stream ecology.
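
Many such indices are percentiles of the flow-duration curve; a minimal sketch of computing an exceedance flow such as the Q95 low-flow index follows (a generic illustration with a hypothetical function name, not this package's R interface):

```python
def exceedance_flow(flows, p):
    """Flow exceeded p percent of the time (e.g. p=95 gives the Q95
    low-flow index), read from the empirical flow-duration curve of a
    daily flow series."""
    s = sorted(flows, reverse=True)           # flow-duration curve
    idx = int(round(p / 100.0 * (len(s) - 1)))  # rank at exceedance prob p
    return s[idx]
```

For a series of the integers 1..100, the flow exceeded 95% of the time is near the low end of the distribution, as expected for a low-flow index.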

1482

Hydrological Data and Modeling

hydroTSM (core)

Time Series Management, Analysis and Interpolation for Hydrological Modelling

S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented towards hydrological modelling tasks. The focus of this package is on providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs, comments, questions, and collaboration of any kind are very welcome, as are, in particular, datasets that can be included in this package for academic purposes.

1483

Hydrological Data and Modeling

hyfo

Hydrology and Climate Forecasting

Focuses on data processing and visualization in hydrology and climate forecasting. Main functions include data extraction, data downscaling, data resampling, gap filling of precipitation, bias correction of forecasting data, flexible time series plots, and spatial map generation. It is a good pre-processing and post-processing tool for hydrological and hydraulic modellers.

1484

Hydrological Data and Modeling

IDF

Estimation and Plotting of IDF Curves

Intensity-duration-frequency (IDF) curves are a widely used analysis tool in hydrology to assess extreme values of precipitation [e.g. Mailhot et al., 2007, <doi:10.1016/j.jhydrol.2007.09.019>]. The package ‘IDF’ provides a function to read precipitation data from German weather service (DWD) ‘webwerdis’ <http://www.dwd.de/EN/ourservices/webwerdis/webwerdis.html> files and Berlin station data from ‘Stadtmessnetz’ <http://www.geo.fu-berlin.de/en/met/service/stadtmessnetz/index.html> files; additionally, IDF parameters can be estimated from a given data.frame containing a precipitation time series. The data are aggregated to given durations, and yearly intensity maxima are calculated either for the whole year or for given months. From these intensity maxima, IDF parameters are estimated on the basis of a duration-dependent generalised extreme value distribution [Koutsoyiannis et al., 1998, <doi:10.1016/S0022-1694(98)00097-3>]. IDF curves based on these estimated parameters can be plotted.

1485

Hydrological Data and Modeling

kitagawa

Spectral Response of Water Wells to Harmonic Strain and Pressure Signals

Provides tools to calculate the theoretical hydrodynamic response of an aquifer undergoing harmonic straining or pressurization, or analyze measured responses. There are two classes of models here: (1) for sealed wells, based on the model of Kitagawa et al (2011, <doi:10.1029/2010JB007794>), and (2) for open wells, based on the models of Cooper et al (1965, <doi:10.1029/JZ070i016p03915>), Hsieh et al (1987, <doi:10.1029/WR023i010p01824>), Rojstaczer (1988, <doi:10.1029/JB093iB11p13619>), and Liu et al (1989, <doi:10.1029/JB094iB07p09453>). These models treat strain (or aquifer head) as an input to the physical system, and fluid pressure (or water height) as the output. The applicable frequency band of these models is characteristic of seismic waves, atmospheric pressure fluctuations, and solid earth tides.

1486

Hydrological Data and Modeling

kiwisR

A Wrapper for Querying KISTERS ‘WISKI’ Databases via the ‘KiWIS’ API

A wrapper for querying ‘WISKI’ databases via the ‘KiWIS’ ‘REST’ API. ‘WISKI’ is an ‘SQL’ relational database used for the collection and storage of water data developed by KISTERS and ‘KiWIS’ is a ‘REST’ service that provides access to ‘WISKI’ databases via HTTP requests (<https://water.kisters.de/en/technology-trends/kisters-and-open-data/>). Contains a list of default databases (called ‘hubs’) and also allows users to provide their own ‘KiWIS’ URL. Supports the entire query process from metadata to specific time series values. All data is returned as tidy tibbles.

1487

Hydrological Data and Modeling

kwb.hantush

Calculation of Groundwater Mounding Beneath an Infiltration Basin

Calculates groundwater mounding beneath an infiltration basin based on the Hantush (1967) equation (<http://doi.org/10.1029/WR003i001p00227>). The correct implementation is shown with a verification example based on a USGS report (page 25, <http://pubs.usgs.gov/sir/2010/5102/support/sir2010-5102.pdf>).

1488

Hydrological Data and Modeling

lakemorpho

Lake Morphometry Metrics

Lake morphometry metrics are used by limnologists to understand, among other things, the ecological processes in a lake. Traditionally, these metrics are calculated by hand, with planimeters, and increasingly with commercial GIS products. All of these methods work; however, they are either outdated, difficult to reproduce, or require expensive licenses to use. The ‘lakemorpho’ package provides the tools to calculate a typical suite of these metrics from an input elevation model and lake polygon. The metrics currently supported are: fetch, major axis, minor axis, major/minor axis ratio, maximum length, maximum width, mean width, maximum depth, mean depth, shoreline development, shoreline length, surface area, and volume.
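
One of the listed metrics, shoreline development, is simple enough to state directly: it compares the shoreline length to the circumference of a circle of equal area. A minimal sketch (hypothetical function name; not this package's R interface, which derives both inputs from a lake polygon):

```python
import math

def shoreline_development(shoreline_length, surface_area):
    """Shoreline development index: ratio of shoreline length to the
    circumference of a circle with the same surface area. A value of
    1.0 means a perfectly circular lake; larger values indicate a more
    convoluted shoreline."""
    return shoreline_length / (2.0 * math.sqrt(math.pi * surface_area))
```

A circular lake of radius 1 (shoreline 2*pi, area pi) gives exactly 1.0.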

1489

Hydrological Data and Modeling

lfstat

Calculation of Low Flow Statistics for Daily Stream Flow Data

The “Manual on Low-flow Estimation and Prediction”, published by the World Meteorological Organisation (WMO), gives a comprehensive summary of how to analyse stream flow data focusing on low flows. This package provides functions to compute the described statistics and produces plots similar to the ones in the manual.

1490

Hydrological Data and Modeling

LPM

Linear Parametric Models Applied to Hydrological Series

Applies univariate long-memory models and multivariate short-memory models to hydrological datasets, and estimates intensity-duration-frequency curves from rainfall series.

1491

Hydrological Data and Modeling

lulcc

Land Use Change Modelling in R

Classes and methods for spatially explicit land use change modelling in R.

1492

Hydrological Data and Modeling

MBC

Multivariate Bias Correction of Climate Model Outputs

Calibrate and apply multivariate bias correction algorithms for climate model simulations of multiple climate variables. Three methods described by Cannon (2016) <doi:10.1175/JCLI-D-15-0679.1> and Cannon (2018) <doi:10.1007/s00382-017-3580-6> are implemented: (i) MBC Pearson correlation (MBCp), (ii) MBC rank correlation (MBCr), and (iii) MBC N-dimensional PDF transform (MBCn).

1493

Hydrological Data and Modeling

meteo

Spatio-Temporal Analysis and Mapping of Meteorological Observations

Spatio-temporal geostatistical mapping of meteorological data. Global spatio-temporal models calculated using publicly available data are stored in the package.

1494

Hydrological Data and Modeling

meteoland

Landscape Meteorology Tools

Functions to estimate weather variables at any position of a landscape [De Caceres et al. (2018) <doi:10.1016/j.envsoft.2018.08.003>].

1495

Hydrological Data and Modeling

MODISTools

Interface to the ‘MODIS Land Products Subsets’ Web Services

Programmatic interface to the ‘MODIS Land Products Subsets’ web services (<https://modis.ornl.gov/data/modis_webservice.html>). Allows for easy downloads of ‘MODIS’ time series directly to your R workspace or your computer.

1496

Hydrological Data and Modeling

MODIStsp

A Tool for Automating Download and Preprocessing of MODIS Land Products Data

Allows automating the creation of raster time series derived from MODIS Satellite Land Products data. It performs several typical preprocessing steps such as download, mosaicking, reprojection and resizing of data acquired over a specified time period. All processing parameters can be set using a user-friendly GUI. Users can select which layers of the original MODIS HDF files they want to process, which additional Quality Indicators should be extracted from aggregated MODIS Quality Assurance layers and, in the case of Surface Reflectance products, which Spectral Indexes should be computed from the original reflectance bands. For each output layer, outputs are saved as single-band raster files corresponding to each available acquisition date. Virtual files allowing access to the entire time series as a single file are also created. Command-line execution exploiting a previously saved processing options file is also possible, allowing time series related to a MODIS product to be updated automatically whenever a new image is available.

1497

Hydrological Data and Modeling

musica

Multiscale Climate Model Assessment

Provides functions allowing for (1) easy aggregation of multivariate time series into custom time scales, (2) comparison of statistical summaries between different data sets at multiple time scales (e.g. observed and biascorrected data), (3) comparison of relations between variables and/or different data sets at multiple time scales (e.g. correlation of precipitation and temperature in control and scenario simulation) and (4) transformation of time series at custom time scales.

1498

Hydrological Data and Modeling

nhdR

Tools for Working with the National Hydrography Dataset

Tools for working with the National Hydrography Dataset, with functions for querying, downloading, and networking both the NHD <https://www.usgs.gov/core-science-systems/ngp/national-hydrography> and NHDPlus <http://www.horizon-systems.com/nhdplus> datasets.

1499

Hydrological Data and Modeling

nsRFA

Non-Supervised Regional Frequency Analysis

A collection of statistical tools for objective (non-supervised) applications of Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available.

1500

Hydrological Data and Modeling

qmap

Statistical Transformations for Post-Processing Climate Model Output

Empirical adjustment of the distribution of variables originating from (regional) climate model simulations using quantile mapping.
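
The idea of empirical quantile mapping can be sketched in pure Python (a hedged, language-agnostic illustration of the method, not this package's R API; function name and the equal-length-sample assumption are mine):

```python
def quantile_map(model_hist, obs_hist, value):
    """Empirical quantile mapping: locate `value` within the sorted
    historical model sample, then return the observed value at the same
    empirical quantile (linear interpolation between sorted sample
    points; both samples assumed to have equal length)."""
    m = sorted(model_hist)
    o = sorted(obs_hist)
    if value <= m[0]:
        return o[0]
    if value >= m[-1]:
        return o[-1]
    for i in range(len(m) - 1):
        if m[i] <= value <= m[i + 1]:
            # fractional position of value between adjacent model points
            w = (value - m[i]) / (m[i + 1] - m[i]) if m[i + 1] > m[i] else 0.0
            return o[i] + w * (o[i + 1] - o[i])
```

For a model with a constant +2 bias relative to observations, the mapping removes exactly that bias from any new model value.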

1501

Hydrological Data and Modeling

rdwd

Select and Download Climate Data from ‘DWD’ (German Weather Service)

Handle climate data from the ‘DWD’ (‘Deutscher Wetterdienst’, see <https://www.dwd.de/EN/climate_environment/cdc/cdc.html> for more information). Choose files with ‘selectDWD()’, download and process data sets with ‘dataDWD()’ and ‘readDWD()’.

1502

Hydrological Data and Modeling

reservoir

Tools for Analysis, Design, and Operation of Water Supply Storages

Measure single-storage water supply system performance using resilience, reliability, and vulnerability metrics; assess storage-yield-reliability relationships; determine no-fail storage with sequent peak analysis; optimize release decisions for water supply, hydropower, and multi-objective reservoirs using deterministic and stochastic dynamic programming; generate inflow replicates using parametric and non-parametric models; evaluate inflow persistence using the Hurst coefficient.
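
Sequent peak analysis, one of the techniques listed, is short enough to sketch (a generic textbook formulation with a hypothetical function name, not this package's R interface):

```python
def sequent_peak_storage(inflows, demand):
    """Required no-fail storage via sequent peak analysis: track the
    running cumulative deficit K_t = max(0, K_{t-1} + demand - inflow_t);
    the largest deficit reached is the storage needed to meet `demand`
    in every period of the record."""
    k, k_max = 0.0, 0.0
    for q in inflows:
        k = max(0.0, k + demand - q)
        k_max = max(k_max, k)
    return k_max
```

With inflows [10, 2, 2, 10] and a constant demand of 5, the deficit peaks at 6 during the two dry periods, so a no-fail reservoir needs 6 units of storage.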

1503

Hydrological Data and Modeling

RHMS

Hydrologic Modelling System for R Users

Hydrologic modelling system is an object-oriented tool which enables R users to simulate and analyze hydrologic events. The package provides functions and methods for the construction, simulation, visualization, and calibration of hydrologic systems.

1504

Hydrological Data and Modeling

RMAWGEN

Multi-Site Auto-Regressive Weather GENerator

S3 and S4 functions are implemented for spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector Auto-Regressive (VAR) models. The weather generator model is saved as an object and calibrated against daily instrumental “Gaussianized” time series using the ‘vars’ package tools. Once obtained, this model can be used for weather generation and adapted to work with several monthly climatic time series.

1505

Hydrological Data and Modeling

RNCEP

Obtain, Organize, and Visualize NCEP Weather Data

Contains functions to retrieve, organize, and visualize weather data from the NCEP/NCAR Reanalysis (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and NCEP/DOE Reanalysis II (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.html) datasets. Data are queried via the Internet and may be obtained for a specified spatial and temporal extent or interpolated to a point in space and time. We also provide functions to visualize these weather data on a map. There are also functions to simulate flight trajectories according to specified behavior using either NCEP wind data or data specified by the user.

1506

Hydrological Data and Modeling

rnoaa

‘NOAA’ Weather Data from R

Client for many ‘NOAA’ data sources including the ‘NCDC’ climate ‘API’ at <https://www.ncdc.noaa.gov/cdo-web/webservices/v2>, with functions for each of the ‘API’ ‘endpoints’: data, data categories, data sets, data types, locations, location categories, and stations. In addition, we have an interface for ‘NOAA’ sea ice data, the ‘NOAA’ severe weather inventory, ‘NOAA’ Historical Observing ‘Metadata’ Repository (‘HOMR’) data, ‘NOAA’ storm data via ‘IBTrACS’, tornado data via the ‘NOAA’ storm prediction center, and more.

1507

Hydrological Data and Modeling

rnrfa

UK National River Flow Archive Data from R

Utility functions to retrieve data from the UK National River Flow Archive (<http://nrfa.ceh.ac.uk/>). The package contains R wrappers for the UK NRFA’s temporary API. There are functions to retrieve stations falling within a bounding box, to generate a map, and to extract time series and general information.

1508

Hydrological Data and Modeling

rpdo

Pacific Decadal Oscillation Index Data

Monthly Pacific Decadal Oscillation (PDO) index values from January 1900 to present.

1509

Hydrological Data and Modeling

RSAlgaeR

Builds Empirical Remote Sensing Models of Water Quality Variables and Analyzes Long-Term Trends

Assists in processing reflectance data, developing empirical models using stepwise regression and a generalized linear modelling approach, cross-validation, and analysis of trends in water quality conditions (specifically chlorophyll-a) and climate conditions using the Theil-Sen estimator.

1510

Hydrological Data and Modeling

rsoi

Import Various Northern and Southern Hemisphere Climate Indices

Downloads Southern Oscillation Index, Oceanic Nino Index, North Pacific Gyre Oscillation data, North Atlantic Oscillation and Arctic Oscillation. Data sources are described in the README file.

1511

Hydrological Data and Modeling

rtop

Interpolation of Data with Variable Spatial Support

Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.

1512

Hydrological Data and Modeling

rwunderground

R Interface to Weather Underground API

Tools for getting historical weather information and forecasts from wunderground.com. Historical weather and forecast data includes, but is not limited to, temperature, humidity, windchill, wind speed, dew point, heat index. Additionally, the weather underground weather API also includes information on sunrise/sunset, tidal conditions, satellite/webcam imagery, weather alerts, hurricane alerts and historical high/low temperatures.

1513

Hydrological Data and Modeling

SCI

Standardized Climate Indices Such as SPI, SRI or SPEI

Functions for generating Standardized Climate Indices (SCI). SCI is a transformation of (smoothed) climate (or environmental) time series that removes seasonality and forces the data to take values of the standard normal distribution. SCI was originally developed for precipitation. In this case it is known as the Standardized Precipitation Index (SPI).
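
The transformation at the heart of such indices can be illustrated with a rank-based (nonparametric) variant: map each value through the empirical CDF and then the standard normal quantile function. This is a simplified sketch only; the package fits parametric distributions (e.g. a gamma distribution for the SPI), and the function name here is hypothetical:

```python
from statistics import NormalDist

def standardized_index(values):
    """Rank-based standardized climate index: map each value through the
    empirical CDF (Weibull plotting position r/(n+1)) and then through
    the standard normal quantile function, yielding approximately
    N(0,1) scores with the same ordering as the input."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r  # rank 1 = smallest value
    nd = NormalDist()
    return [nd.inv_cdf(ranks[i] / (n + 1)) for i in range(n)]
```

The median observation maps to 0, and values symmetric in rank map to scores symmetric about 0.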

1514

Hydrological Data and Modeling

smapr

Acquisition and Processing of NASA Soil Moisture ActivePassive (SMAP) Data

Facilitates programmatic access to NASA Soil Moisture Active Passive (SMAP) data with R. It includes functions to search for, acquire, and extract SMAP data.

1515

Hydrological Data and Modeling

soilwater

Implementation of Parametric Formulas for Soil Water Retention or Conductivity Curve

Implements parametric formulas for the soil water retention or conductivity curve. At the moment, only the Van Genuchten (for the soil water retention curve) and Mualem (for hydraulic conductivity) formulas have been implemented. See reference (<http://en.wikipedia.org/wiki/Water_retention_curve>).

1516

Hydrological Data and Modeling

somspace

Spatial Analysis with SelfOrganizing Maps

Application of the SelfOrganizing Maps technique for spatial classification of time series. The package uses spatial data, point or gridded, to create clusters with similar characteristics. The clusters can be further refined to a smaller number of regions by hierarchical clustering and their spatial dependencies can be presented as complex networks. Thus, meaningful maps can be created, representing the regional heterogeneity of a single variable. More information and an example of implementation can be found in Markonis and Strnad (2019).

1517

Hydrological Data and Modeling

SPEI

Calculation of the Standardised PrecipitationEvapotranspiration Index

A set of functions for computing potential evapotranspiration and several widely used drought indices including the Standardized PrecipitationEvapotranspiration Index (SPEI).

1518

Hydrological Data and Modeling

streamDepletr

Estimate Streamflow Depletion Due to Groundwater Pumping

Implementation of analytical models for estimating streamflow depletion due to groundwater pumping, and other related tools. Functions are broadly split into two groups: (1) analytical streamflow depletion models, which estimate streamflow depletion for a single stream reach resulting from groundwater pumping; and (2) depletion apportionment equations, which distribute estimated streamflow depletion among multiple stream reaches within a stream network. See Zipper et al. (2018) <doi:10.1029/2018WR022707> for more information on depletion apportionment equations and Zipper et al. (2019) <doi:10.31223/osf.io/uqbd7> for more information on analytical depletion functions, which combine analytical models and depletion apportionment equations.
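
The simplest of these analytical models, the Glover and Balmer (1954) solution, can be sketched directly (a generic formulation under its usual idealized assumptions of a fully penetrating stream and homogeneous aquifer; the function name is hypothetical and this is not the package's R interface):

```python
import math

def glover_depletion_fraction(d, S, T, t):
    """Glover & Balmer (1954) solution: fraction of the pumping rate
    supplied by stream depletion after pumping for time t, for a well a
    distance d from a fully penetrating stream, with aquifer storativity
    S and transmissivity T (any consistent unit system)."""
    return math.erfc(math.sqrt(d * d * S / (4.0 * T * t)))
```

The fraction starts near zero, grows monotonically with pumping time, and approaches 1 as the stream eventually supplies nearly all of the pumped water.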

1519

Hydrological Data and Modeling

SWATmodel

A Multi-OS Implementation of the TAMU SWAT Model

The Soil and Water Assessment Tool is a river basin or watershed scale model developed by Dr. Jeff Arnold for the USDA-ARS.

1520

Hydrological Data and Modeling

swmmr

R Interface for US EPA’s SWMM

Functions to connect the widely used Storm Water Management Model (SWMM) of the United States Environmental Protection Agency (US EPA) <https://www.epa.gov/water-research/storm-water-management-model-swmm> to R, with currently two main goals: (1) run a SWMM simulation from R and (2) provide fast access to simulation results, i.e. SWMM’s binary ‘.out’ files. High performance is achieved with the help of Rcpp. Additionally, reading SWMM’s ‘.inp’ and ‘.rpt’ files is supported, to inspect model structures and to get direct access to simulation summaries.

1521

Hydrological Data and Modeling

tidyhydat

Extract and Tidy Canadian ‘Hydrometric’ Data

Provides functions to access historical and real-time national ‘hydrometric’ data from Water Survey of Canada data sources (<http://dd.weather.gc.ca/hydrometric/csv/> and <http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/>) and then applies tidy data principles.

1522

Hydrological Data and Modeling

topmodel

Implementation of the Hydrological Model TOPMODEL in R

Set of hydrological functions including an R implementation of the hydrological model TOPMODEL, which is based on the 1995 FORTRAN version by Keith Beven. From version 0.7.0, the package has been put into maintenance mode.

1523

Hydrological Data and Modeling

TUWmodel

Lumped/Semi-Distributed Hydrological Model for Education Purposes

The model, developed at the Vienna University of Technology, is a lumped conceptual rainfall-runoff model, following the structure of the HBV model. The model can also be run in a semi-distributed fashion. The model runs on a daily or shorter time step and consists of a snow routine, a soil moisture routine and a flow routing routine. See Parajka, J., R. Merz, G. Bloeschl (2007) <doi:10.1002/hyp.6253> Uncertainty and multiple objective calibration in regional water balance modelling: case study in 320 Austrian catchments, Hydrological Processes, 21, 435-446.
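
A common building block of lumped conceptual models of this kind is the linear reservoir, where outflow is proportional to storage. The sketch below illustrates only that generic building block (hypothetical function name; it is not TUWmodel's actual routing routine):

```python
def linear_reservoir(recharge, k, s0=0.0):
    """Route a recharge series through one linear reservoir: at each
    step, add recharge to storage and release Q = S / k, a basic
    component of lumped conceptual rainfall-runoff models."""
    s, out = s0, []
    for r in recharge:
        s += r          # recharge enters storage
        q = s / k       # outflow proportional to storage
        s -= q
        out.append(q)
    return out
```

A single pulse of recharge produces the characteristic exponential recession: with k=2, an input of 10 yields outflows 5, 2.5, 1.25, and so on.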

1524

Hydrological Data and Modeling

washdata

Urban Water and Sanitation Survey Dataset

Urban water and sanitation survey dataset collected by Water and Sanitation for the Urban Poor (WSUP) with technical support from Valid International. These citywide surveys collect data that allow water and sanitation service levels across the entire city to be characterised, while also allowing more detailed data to be collected in areas of the city of particular interest. These surveys are intended to generate useful information for others working in the water and sanitation sector. The current release includes datasets collected from a survey conducted in Dhaka, Bangladesh in March 2017. This survey in Dhaka is one of a series of surveys to be conducted by WSUP in various cities in which they operate, including Accra, Ghana; Nakuru, Kenya; Antananarivo, Madagascar; Maputo, Mozambique; and Lusaka, Zambia. This package will be updated once the surveys in other cities are completed and the datasets have been made available.

1525

Hydrological Data and Modeling

wasim

Visualisation and analysis of output files of the hydrological model WASIM

Helpful tools for data processing and visualisation of results of the hydrological model WaSiM-ETH.

1526

Hydrological Data and Modeling

water

Actual Evapotranspiration with Energy Balance Models

Tools and functions to calculate actual evapotranspiration using surface energy balance models.

1527

Hydrological Data and Modeling

waterData

Retrieval, Analysis, and Anomaly Calculation of Daily Hydrologic Time Series Data

Imports U.S. Geological Survey (USGS) daily hydrologic data from USGS web services (see <https://waterservices.usgs.gov/> for more information), plots the data, addresses some common data problems, and calculates and plots anomalies.

1528

Hydrological Data and Modeling

WaterML

Fetch and Analyze Data from ‘WaterML’ and ‘WaterOneFlow’ Web Services

Lets you connect to any of the Consortium of Universities for the Advancement of Hydrologic Sciences, Inc. (‘CUAHSI’) Water Data Center ‘WaterOneFlow’ web services and read any ‘WaterML’ hydrological time series data file. To see the list of available web services, see <http://hiscentral.cuahsi.org>. All versions of ‘WaterML’ (1.0, 1.1 and 2.0) and both types of the web service protocol (‘SOAP’ and ‘REST’) are supported. The package has seven data download functions: GetServices(): show all public web services from the HIS Central Catalog. HISCentral_GetSites() and HISCentral_GetSeriesCatalog(): search for sites or time series from the HIS Central catalog based on geographic bounding box, server, or keyword. GetVariables(): Show a data.frame with all variables on the server. GetSites(): Show a data.frame with all sites on the server. GetSiteInfo(): Show what variables, methods and quality control levels are available at the specific site. GetValues(): Given a site code, variable code, start time and end time, fetch a data.frame of all the observation time series data values. The GetValues() function can also parse ‘WaterML’ data from a custom URL or from a local file. The package also has five data upload functions: AddSites(), AddVariables(), AddMethods(), AddSources(), and AddValues(). These functions can be used for uploading data to a ‘HydroServer Lite’ Observations Data Model (‘ODM’) database via the ‘JSON’ data upload web service interface.

1529

Hydrological Data and Modeling

Watersheds

Spatial Watershed Aggregation and Spatial Drainage Network Analysis

Methods for watershed aggregation and spatial drainage network analysis.

1530

Hydrological Data and Modeling

weathercan

Download Weather Data from the Environment and Climate Change Canada Website

Provides means for downloading historical weather data from the Environment and Climate Change Canada website (<http://climate.weather.gc.ca/historical_data/search_historic_data_e.html>). Data can be downloaded from multiple stations and over large date ranges and automatically processed into a single dataset. Tools are also provided to identify stations either by name or proximity to a location.

1531

Hydrological Data and Modeling

worldmet

Import Surface Meteorological Data from NOAA Integrated Surface Database (ISD)

Functions to import data from more than 30,000 surface meteorological sites around the world managed by the National Oceanic and Atmospheric Administration (NOAA) Integrated Surface Database (ISD, see <https://www.ncdc.noaa.gov/isd>).

1532

Hydrological Data and Modeling

wql

Exploring Water Quality Monitoring Data

Functions to assist in the processing and exploration of data from environmental monitoring programs. The package name stands for “water quality” and reflects the original focus on time series data for physical and chemical properties of water, as well as the biota. Intended for programs that sample approximately monthly, quarterly or annually at discrete stations, a feature of many legacy data sets. Most of the functions should be useful for analysis of similar-frequency time series regardless of the subject matter.

1533

Hydrological Data and Modeling

WRSS

Water Resources System Simulator

The Water Resources System Simulator is a tool for simulation and analysis of large-scale water resources systems. ‘WRSS’ provides functions and methods for construction, simulation and analysis of primary storage and hydropower water resources features (e.g., reservoirs and aquifers) based on Standard Operating Policy (SOP).

1534

Hydrological Data and Modeling

WRTDStidal

Weighted Regression for Water Quality Evaluation in Tidal Waters

An adaptation for estuaries (tidal waters) of weighted regression on time, discharge, and season to evaluate trends in water quality time series.

1535

Machine Learning & Statistical Learning

ahaz

Regularization for semiparametric additive hazards regression

Computationally efficient procedures for regularized estimation with the semiparametric additive hazards regression model.

1536

Machine Learning & Statistical Learning

arules

Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat.

1537

Machine Learning & Statistical Learning

BART

Bayesian Additive Regression Trees

Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285> and Sparapani, Logan, McCulloch and Laud (2016) <doi:10.1002/sim.6893>.

1538

Machine Learning & Statistical Learning

bartMachine

Bayesian Additive Regression Trees

An advanced implementation of Bayesian Additive Regression Trees with expanded features for data analysis and visualization.

1539

Machine Learning & Statistical Learning

BayesTree

Bayesian Additive Regression Trees

This is an implementation of BART: Bayesian Additive Regression Trees, by Chipman, George, McCulloch (2010).

1540

Machine Learning & Statistical Learning

BDgraph

Bayesian Structure Learning in Graphical Models using Birth-Death MCMC

Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements in the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889> and Mohammadi and Wit (2019) <doi:10.18637/jss.v089.i03>.

1541

Machine Learning & Statistical Learning

biglasso

Extending Lasso Model Fitting to Big Data

Extend lasso and elastic-net model fitting for ultra high-dimensional, multi-gigabyte data sets that cannot be loaded into memory. It is much more memory- and computation-efficient compared to existing lasso-fitting packages like ‘glmnet’ and ‘ncvreg’, thus allowing for very powerful big data analysis even with an ordinary laptop.

1542

Machine Learning & Statistical Learning

bmrm

Bundle Methods for Regularized Risk Minimization Package

Bundle methods for minimization of convex and nonconvex risk under L1 or L2 regularization. Implements the algorithm proposed by Teo et al. (JMLR 2010) as well as the extension proposed by Do and Artieres (JMLR 2012). The package comes with a large set of loss functions for machine learning, making it powerful for big data analysis. Applications include structured prediction, linear SVM, multiclass SVM, F-beta optimization, ROC optimization, ordinal regression, quantile regression, epsilon-insensitive regression, least mean squares, logistic regression, and least absolute deviation regression (see package examples), all with L1 and L2 regularization.

1543

Machine Learning & Statistical Learning

Boruta

Wrapper Algorithm for All Relevant Feature Selection

An all relevant feature selection wrapper algorithm. It finds relevant features by comparing original attributes’ importance with importance achievable at random, estimated using their permuted copies (shadows).
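The shadow-feature comparison at the heart of Boruta can be sketched compactly. The following Python sketch is a hypothetical illustration only: it substitutes a simple correlation-based importance measure for the random forest importance the package actually uses, but the logic — compare each real attribute against the best importance achieved by its permuted copies — is the same.

```python
import numpy as np

def boruta_step(X, y, rng):
    """One Boruta-style iteration: compare each real feature's importance
    against the best importance achieved by permuted 'shadow' copies.
    Importance here is |correlation with y| -- a stand-in for the random
    forest importance the real Boruta package uses."""
    shadows = rng.permuted(X, axis=0)           # permute each column independently
    Z = np.hstack([X, shadows])
    imp = np.abs(np.corrcoef(Z, y, rowvar=False)[-1, :-1])
    n = X.shape[1]
    real_imp, shadow_imp = imp[:n], imp[n:]
    threshold = shadow_imp.max()                # best shadow sets the bar
    return real_imp > threshold                 # 'hit' mask for this iteration

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=500)   # only feature 0 is relevant
hits = sum(boruta_step(X, y, rng) for _ in range(20))
# feature 0 should be hit in (nearly) every iteration, noise features rarely
```

Boruta then applies a binomial test to the accumulated hit counts to decide confirmed/rejected/tentative status for each attribute.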

1544

Machine Learning & Statistical Learning

bst

Gradient Boosting

Functional gradient descent algorithm for a variety of convex and nonconvex loss functions, for both classical and robust regression and classification problems. See Wang (2011) <doi:10.2202/1557-4679.1304>, Wang (2012) <doi:10.3414/ME11-02-0020>, Wang (2018) <doi:10.1080/10618600.2018.1424635>, Wang (2018) <doi:10.1214/18-EJS1404>.

1545

Machine Learning & Statistical Learning

C50

C5.0 Decision Trees and Rule-Based Models

C5.0 decision trees and rule-based models for pattern recognition that extend the work of Quinlan (1993, ISBN:1558602380).

1546

Machine Learning & Statistical Learning

caret

Classification and Regression Training

Misc functions for training and plotting classification and regression models.

1547

Machine Learning & Statistical Learning

CORElearn

Classification, Regression and Feature Evaluation

A suite of machine learning algorithms written in C++ with an R interface; it contains several learning techniques for classification and regression. Predictive models include, e.g., classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. All predictions obtained with these models can be explained and visualized with the ‘ExplainPrediction’ package. This package is especially strong in feature evaluation, where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used for feature selection or discretization of numeric attributes. The OrdEval algorithm and its visualization are used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.

1548

Machine Learning & Statistical Learning

CoxBoost

Cox models by likelihood-based boosting for a single survival endpoint or competing risks

Provides routines for fitting Cox models by likelihood-based boosting for a single endpoint or in the presence of competing risks.

1549

Machine Learning & Statistical Learning

Cubist

Rule And InstanceBased Regression Modeling

Regression modeling using rules with added instancebased corrections.

1550

Machine Learning & Statistical Learning

deepnet

deep learning toolkit in R

Implements some deep learning architectures and neural network algorithms, including back-propagation (BP), restricted Boltzmann machines (RBM), deep belief networks (DBN), deep autoencoders, and so on.

1551

Machine Learning & Statistical Learning

e1071 (core)

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …

1552

Machine Learning & Statistical Learning

earth

Multivariate Adaptive Regression Splines

Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines” <doi:10.1214/aos/1176347963>. (The term “MARS” is trademarked and thus not used in the name of the package.)

1553

Machine Learning & Statistical Learning

effects

Effect Displays for Linear, Generalized Linear, and Other Models

Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.

1554

Machine Learning & Statistical Learning

elasticnet

Elastic-Net for Sparse Estimation and Sparse PCA

Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.

1555

Machine Learning & Statistical Learning

ElemStatLearn

Data Sets, Functions and Examples from the Book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman

Useful when reading the above-mentioned book, referred to in the documentation as ‘the book’.

1556

Machine Learning & Statistical Learning

evclass

Evidential DistanceBased Classification

Different evidential distance-based classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule and the evidential neural network.

1557

Machine Learning & Statistical Learning

evtree

Evolutionary Learning of Globally Optimal Trees

Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. The ‘evtree’ package implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. CPU- and memory-intensive tasks are fully computed in C++ while the ‘partykit’ package is leveraged to represent the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions.

1558

Machine Learning & Statistical Learning

frbs

Fuzzy RuleBased Systems for Classification and Regression Tasks

An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows constructing an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, least squares methods, etc. Furthermore, in this version we provide a universal framework named ‘frbsPMML’, which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language that provides a standard for describing models produced by data mining and machine learning algorithms. Therefore, we are allowed to export and import an FRBS model to/from ‘frbsPMML’. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.

1559

Machine Learning & Statistical Learning

GAMBoost

Generalized linear and additive models by likelihood-based boosting

Provides routines for fitting generalized linear and generalized additive models by likelihood-based boosting, using penalized B-splines.

1560

Machine Learning & Statistical Learning

gamboostLSS

Boosting Methods for ‘GAMLSS’

Boosting models for fitting generalized additive models for location, shape and scale (‘GAMLSS’) to potentially high dimensional data.

1561

Machine Learning & Statistical Learning

gbm (core)

Generalized Boosted Regression Models

An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). Originally developed by Greg Ridgeway.
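The core loop of Friedman’s gradient boosting machine — repeatedly fitting a weak learner to the current residuals (the negative gradient of squared loss) and adding it with shrinkage — can be sketched as follows. This Python sketch, using single-feature regression stumps, is an illustration of the technique, not the package’s own code.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single split on one feature, minimizing squared error."""
    best = None
    for t in np.unique(x)[:-1]:
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]                              # (threshold, left value, right value)

def boost(x, y, n_trees=100, nu=0.1):
    """Least-squares gradient boosting: each stump is fit to the current
    residuals (the negative gradient of squared loss) and added to the
    ensemble with shrinkage nu."""
    f = np.full_like(y, y.mean(), dtype=float)   # start from the constant model
    stumps = []
    for _ in range(n_trees):
        t, lv, rv = fit_stump(x, y - f)          # fit weak learner to residuals
        f += nu * np.where(x <= t, lv, rv)       # shrunken additive update
        stumps.append((t, lv, rv))
    return f, stumps
```

Other losses in the package (absolute, quantile, Cox partial likelihood, …) change only the gradient the weak learner is fit to.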

1562

Machine Learning & Statistical Learning

ggRandomForests

Visually Exploring Random Forests

Graphic elements for exploring Random Forests using the ‘randomForest’ or ‘randomForestSRC’ package for survival, regression and classification forests and ‘ggplot2’ package plotting.

1563

Machine Learning & Statistical Learning

glmnet

Lasso and Elastic-Net Regularized Generalized Linear Models

Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model. Two recent additions are the multiple-response Gaussian, and the grouped multinomial regression. The algorithm uses cyclical coordinate descent in a pathwise fashion, as described in the paper linked to via the URL below.
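The cyclical coordinate descent the description mentions can be illustrated for the Gaussian (lasso) case. The soft-thresholding update below is a minimal Python sketch of the technique, not glmnet’s actual implementation:

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclical coordinate descent minimizing
    (1/2n)||y - Xb||^2 + lam * ||b||_1  (Gaussian lasso objective)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                       # current residual
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):              # cycle through the coordinates
            r += X[:, j] * b[j]         # remove coordinate j's contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]         # add back updated contribution
    return b
```

glmnet additionally computes the whole path over a decreasing sequence of lambda values, warm-starting each fit from the previous one.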

1564

Machine Learning & Statistical Learning

glmpath

L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model

A pathfollowing algorithm for L1 regularized generalized linear models and Cox proportional hazards model.

1565

Machine Learning & Statistical Learning

GMMBoost

Likelihood-based Boosting for Generalized mixed models

Likelihood-based boosting for generalized mixed models.

1566

Machine Learning & Statistical Learning

gradDescent

Gradient Descent for Regression Tasks

An implementation of various learning algorithms based on gradient descent for dealing with regression tasks. The variants of the gradient descent algorithm are: Mini-Batch Gradient Descent (MBGD), which uses the training data partially to reduce the computation load; Stochastic Gradient Descent (SGD), which uses random samples in learning to reduce the computation load drastically; Stochastic Average Gradient (SAG), an SGD-based algorithm that averages the stochastic steps; Momentum Gradient Descent (MGD), an optimization to speed up gradient descent learning; Accelerated Gradient Descent (AGD), an optimization to accelerate gradient descent learning; Adagrad, a gradient-descent-based algorithm that accumulates previous costs to do adaptive learning; Adadelta, a gradient-descent-based algorithm that uses a Hessian approximation to do adaptive learning; RMSprop, a gradient-descent-based algorithm that combines the adaptive learning abilities of Adagrad and Adadelta; Adam, a gradient-descent-based algorithm that uses mean and variance moments to do adaptive learning; Stochastic Variance Reduced Gradient (SVRG), an SGD-based optimization that accelerates convergence by reducing the gradient variance; Semi Stochastic Gradient Descent (SSGD), an SGD-based algorithm that combines GD and SGD to accelerate convergence by choosing one of the gradients at a time; Stochastic Recursive Gradient Algorithm (SARAH), an optimization algorithm that, similarly to SVRG, accelerates convergence via accumulated stochastic information; and Stochastic Recursive Gradient Algorithm+ (SARAH-Plus), a practical SARAH variant that accelerates convergence and provides the possibility of earlier termination.
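As a concrete example of two of the listed variants combined, mini-batch gradient descent with momentum (MBGD plus MGD in the package’s naming) for least-squares regression can be sketched as follows. This Python sketch is illustrative and does not mirror the package’s code:

```python
import numpy as np

def sgd_momentum(X, y, lr=0.01, gamma=0.9, epochs=50, batch=16, seed=0):
    """Mini-batch stochastic gradient descent with momentum for
    least-squares linear regression. The per-batch gradient is that of
    (1/2m)||X_B b - y_B||^2 on each mini-batch of size m."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = np.zeros(p)
    v = np.zeros(p)                       # velocity (momentum accumulator)
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each epoch
        for start in range(0, n, batch):
            sl = idx[start:start + batch]
            grad = X[sl].T @ (X[sl] @ b - y[sl]) / len(sl)
            v = gamma * v + lr * grad     # momentum update
            b = b - v
    return b
```

Dropping the momentum term (gamma = 0) recovers plain MBGD; batch = 1 recovers SGD.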

1567

Machine Learning & Statistical Learning

grf

Generalized Random Forests (Beta)

A pluggable package for forestbased statistical estimation and inference. GRF currently provides methods for nonparametric leastsquares regression, quantile regression, and treatment effect estimation (optionally using instrumental variables). This package is currently in beta, and we expect to make continual improvements to its performance and usability.

1568

Machine Learning & Statistical Learning

grplasso

Fitting UserSpecified Models with Group Lasso Penalty

Fits userspecified (GLM) models with group lasso penalty.

1569

Machine Learning & Statistical Learning

grpreg

Regularization Paths for Regression Models with Grouped Covariates

Efficient algorithms for fitting the regularization path of linear regression, GLM, and Cox regression models with grouped penalties. This includes group selection methods such as group lasso, group MCP, and group SCAD as well as bilevel selection methods such as the group exponential lasso, the composite MCP, and the group bridge.

1570

Machine Learning & Statistical Learning

h2o

R Interface for ‘H2O’

R interface for ‘H2O’, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (AutoML).

1571

Machine Learning & Statistical Learning

hda

Heteroscedastic Discriminant Analysis

Functions to perform dimensionality reduction for classification if the covariance matrices of the classes are unequal.

1572

Machine Learning & Statistical Learning

hdi

High-Dimensional Inference

Implementation of multiple approaches to perform inference in high-dimensional models.

1573

Machine Learning & Statistical Learning

hdm

High-Dimensional Metrics

Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/structural parameters are provided which appear in high-dimensional approximately sparse models. Includes functions for fitting heteroscedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty. Chernozhukov, Hansen, Spindler (2016) <arXiv:1603.01700>.

1574

Machine Learning & Statistical Learning

ICEbox

Individual Conditional Expectation Plot Toolbox

Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. ICE plots refine Friedman’s partial dependence plot by graphing the functional relationship between the predicted response and a covariate of interest for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate of interest, suggesting where and to what extent such heterogeneity may exist.
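The construction of ICE curves is simple enough to sketch directly. The Python sketch below is an illustration, not the package’s implementation; the toy `Interact` model is hypothetical and chosen so that averaging the curves (the partial dependence plot) masks the per-observation heterogeneity ICE plots reveal.

```python
import numpy as np

def ice_curves(model, X, feature, grid):
    """Individual Conditional Expectation curves: for each observation,
    predict over a grid of values of one feature, holding the other
    features fixed. Averaging the curves column-wise recovers Friedman's
    partial dependence plot. `model` needs only a predict(X) method."""
    curves = np.empty((X.shape[0], len(grid)))
    for k, v in enumerate(grid):
        Xv = X.copy()
        Xv[:, feature] = v               # set the feature of interest to v
        curves[:, k] = model.predict(Xv)
    return curves                        # one row per observation

class Interact:
    """Toy model f(x) = x0 * sign(x1): ICE slopes differ across rows."""
    def predict(self, X):
        return X[:, 0] * np.sign(X[:, 1])
```

For the two observations [0, 1] and [0, -1], the ICE curves over feature 0 have slopes +1 and -1, while their average — the partial dependence — is flat at zero.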

1575

Machine Learning & Statistical Learning

ipred

Improved Predictors

Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling-based estimators of prediction error.

1576

Machine Learning & Statistical Learning

kernlab (core)

Kernel-Based Machine Learning Lab

Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.

1577

Machine Learning & Statistical Learning

klaR

Classification and Visualization

Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to ‘svmlight’ and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.

1578

Machine Learning & Statistical Learning

lars

Least Angle Regression, Lasso and Forward Stagewise

Efficient procedures for fitting an entire lasso sequence with the cost of a single least squares fit. Least angle regression and infinitesimal forward stagewise regression are related to the lasso, as described in the paper below.

1579

Machine Learning & Statistical Learning

lasso2

L1 Constrained Estimation aka ‘lasso’

Routines and documentation for solving regression problems while imposing an L1 constraint on the estimates, based on the algorithm of Osborne et al. (1998).

1580

Machine Learning & Statistical Learning

LiblineaR

Linear Predictive Models Based on the ‘LIBLINEAR’ C/C++ Library

A wrapper around the ‘LIBLINEAR’ C/C++ library for machine learning (available at <http://www.csie.ntu.edu.tw/~cjlin/liblinear>). ‘LIBLINEAR’ is a simple library for solving large-scale regularized linear classification and regression. It currently supports L2-regularized classification (such as logistic regression, L2-loss linear SVM and L1-loss linear SVM) as well as L1-regularized classification (such as L2-loss linear SVM and logistic regression) and L2-regularized support vector regression (with L1- or L2-loss). The main features of LiblineaR include multi-class classification (one-vs-the-rest, and Crammer & Singer method), cross-validation for model selection, probability estimates (logistic regression only) or weights for unbalanced data. The estimation of the models is particularly fast as compared to other libraries.

1581

Machine Learning & Statistical Learning

LogicReg

Logic Regression

Routines for fitting Logic Regression models. Logic Regression is described in Ruczinski, Kooperberg, and LeBlanc (2003) <doi:10.1198/1061860032238>. Monte Carlo Logic Regression is described in Kooperberg and Ruczinski (2005) <doi:10.1002/gepi.20042>.

1582

Machine Learning & Statistical Learning

LTRCtrees

Survival Trees to Fit Left-Truncated and Right-Censored and Interval-Censored Survival Data

Recursive partition algorithms designed for fitting survival trees with left-truncated and right-censored (LTRC) data, as well as interval-censored data. The LTRC trees can also be used to fit survival trees with time-varying covariates.

1583

Machine Learning & Statistical Learning

maptree

Mapping, pruning, and graphing tree models

Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees.

1584

Machine Learning & Statistical Learning

mboost (core)

ModelBased Boosting

Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing componentwise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.

1585

Machine Learning & Statistical Learning

mlr

Machine Learning in R

Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions. There is also an experimental extension for survival analysis, clustering and general, example-specific cost-sensitive learning. Generic resampling, including cross-validation, bootstrapping and subsampling. Hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems. Filter and wrapper methods for feature selection. Extension of basic learners with additional operations common in machine learning, also allowing for easy nested resampling. Most operations can be parallelized.

1586

Machine Learning & Statistical Learning

model4you

Stratified and Personalised Models Based on Model-Based Trees and Forests

Model-based trees for subgroup analyses in clinical trials and model-based forests for the estimation and prediction of personalised treatment effects (personalised models). Currently partitioning of linear models, lm(), generalised linear models, glm(), and Weibull models, survreg(), is supported. Advanced plotting functionality is supported for the trees and a test for parameter heterogeneity is provided for the personalised models. For details on model-based trees for subgroup analyses see Seibold, Zeileis and Hothorn (2016) <doi:10.1515/ijb-2015-0032>; for details on model-based forests for estimation of individual treatment effects see Seibold, Zeileis and Hothorn (2017) <doi:10.1177/0962280217693034>.

1587

Machine Learning & Statistical Learning

MXM

Feature Selection (Including Multiple Solutions) and Bayesian Networks

Many feature selection methods for a wide range of response variables, including minimal, statistically-equivalent and equally-predictive feature subsets. Bayesian network algorithms and related functions are also included. The package name ‘MXM’ stands for “Mens eX Machina”, meaning “Mind from the Machine” in Latin. References: a) Lagani, V. and Athineou, G. and Farcomeni, A. and Tsagris, M. and Tsamardinos, I. (2017). Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets. Journal of Statistical Software, 80(7). <doi:10.18637/jss.v080.i07>. b) Tsagris, M., Lagani, V. and Tsamardinos, I. (2018). Feature selection for high-dimensional temporal data. BMC Bioinformatics, 19:17. <doi:10.1186/s12859-018-2023-7>. c) Tsagris, M., Borboudakis, G., Lagani, V. and Tsamardinos, I. (2018). Constraint-based causal discovery with mixed data. International Journal of Data Science and Analytics, 6(1): 19-30. <doi:10.1007/s41060-018-0097-y>. d) Tsagris, M., Papadovasilakis, Z., Lakiotaki, K. and Tsamardinos, I. (2018). Efficient feature selection on gene expression data: Which algorithm to use? BioRxiv. <doi:10.1101/431734>. e) Tsagris, M. (2019). Bayesian Network Learning with the PC Algorithm: An Improved and Correct Variation. Applied Artificial Intelligence, 33(2): 101-123. <doi:10.1080/08839514.2018.1526760>. f) Borboudakis, G. and Tsamardinos, I. (2019). Forward-Backward Selection with Early Dropping. Journal of Machine Learning Research, 20: 1-39.

1588

Machine Learning & Statistical Learning

naivebayes

High Performance Implementation of the Naive Bayes Algorithm

In this implementation of the Naive Bayes classifier, the following class-conditional distributions are available: Bernoulli, Categorical, Gaussian, Poisson, and a non-parametric representation of the class-conditional density estimated via Kernel Density Estimation.
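The Gaussian class-conditional case can be sketched compactly. This Python sketch is a minimal illustration of the algorithm, not the package’s optimized implementation:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Fit class priors and per-class feature means/variances --
    the Gaussian case of the naive Bayes class-conditional models."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict_gaussian_nb(params, X):
    """Pick the class maximizing log prior + sum of Gaussian log-densities
    (features assumed conditionally independent given the class)."""
    scores = []
    for c, (prior, mu, var) in params.items():
        ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
        scores.append(np.log(prior) + ll)
    classes = list(params)
    return np.array(classes)[np.argmax(scores, axis=0)]
```

Swapping the Gaussian log-density for a Poisson or Bernoulli log-likelihood, or a kernel density estimate, gives the other class-conditional models the description lists.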

1589

Machine Learning & Statistical Learning

ncvreg

Regularization Paths for SCAD and MCP Penalized Regression Models

Fits regularization paths for linear regression, GLM, and Cox regression models using lasso or nonconvex penalties, in particular the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty, with options for additional L2 penalties (the “elastic net” idea). Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, inference, and prediction are also provided.

1590

Machine Learning & Statistical Learning

nnet (core)

Feed-Forward Neural Networks and Multinomial Log-Linear Models

Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.

1591

Machine Learning & Statistical Learning

oem

Orthogonalizing EM: Penalized Regression for Big Tall Data

Solves penalized least squares problems for big tall data using the orthogonalizing EM algorithm of Xiong et al. (2016) <doi:10.1080/00401706.2015.1054436>. The main fitting function is oem() and the functions cv.oem() and xval.oem() are for cross validation, the latter being an accelerated cross validation function for linear models. The big.oem() function allows for out of memory fitting.

1592

Machine Learning & Statistical Learning

OneR

One Rule Machine Learning Classification Algorithm with Enhancements

Implements the One Rule (OneR) Machine Learning classification algorithm (Holte, R.C. (1993) <doi:10.1023/A:1022631118932>) with enhancements for sophisticated handling of numeric data and missing values together with extensive diagnostic functions. It is useful as a baseline for machine learning models and the rules are often helpful heuristics.
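The One Rule idea is simple enough to sketch directly: for every feature, map each feature value to its majority class, then keep the single feature whose rule makes the fewest training errors. A minimal illustrative version in Python (not the package's R API; names are hypothetical):

```python
from collections import Counter, defaultdict

def one_r(X, y):
    """Return (best_feature_index, rule) where rule maps value -> majority class."""
    best = None
    for j in range(len(X[0])):
        # tally class counts for each value of feature j
        buckets = defaultdict(Counter)
        for row, label in zip(X, y):
            buckets[row[j]][label] += 1
        # majority class per value; errors = everything not in the majority
        rule = {v: cnt.most_common(1)[0][0] for v, cnt in buckets.items()}
        errors = sum(sum(cnt.values()) - max(cnt.values())
                     for cnt in buckets.values())
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    return best[1], best[2]
```

The package's enhancements (numeric binning, missing-value handling) sit on top of exactly this baseline rule.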

1593

Machine Learning & Statistical Learning

opusminer

OPUS Miner Algorithm for Filtered Top-k Association Discovery

Provides a simple R interface to the OPUS Miner algorithm (implemented in C++) for finding the top-k productive, non-redundant itemsets from transaction data. The OPUS Miner algorithm uses the OPUS search algorithm to efficiently discover the key associations in transaction data, in the form of self-sufficient itemsets, using either leverage or lift. See <http://i.giwebb.com/index.php/research/association-discovery/> for more information in relation to the OPUS Miner algorithm.

1594

Machine Learning & Statistical Learning

pamr

Pam: Prediction Analysis for Microarrays

Some functions for sample classification in microarrays.

1595

Machine Learning & Statistical Learning

party

A Laboratory for Recursive Partytioning

A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.

1596

Machine Learning & Statistical Learning

partykit

A Toolkit for Recursive Partytioning

A toolkit with infrastructure for representing, summarizing, and visualizing tree-structured regression and classification models. This unified infrastructure can be used for reading/coercing tree models from different sources (‘rpart’, ‘RWeka’, ‘PMML’) yielding objects that share functionality for print()/plot()/predict() methods. Furthermore, new and improved reimplementations of conditional inference trees (ctree()) and model-based recursive partitioning (mob()) from the ‘party’ package are provided based on the new infrastructure. A description of this package was published by Hothorn and Zeileis (2015) <http://jmlr.org/papers/v16/hothorn15a.html>.

1597

Machine Learning & Statistical Learning

pdp

Partial Dependence Plots

A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.

1598

Machine Learning & Statistical Learning

penalized

L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model

Fitting possibly high-dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox Proportional Hazards model. Cross-validation routines allow optimization of the tuning parameters.

1599

Machine Learning & Statistical Learning

penalizedLDA

Penalized Classification using Fisher’s Linear Discriminant

Implements the penalized LDA proposal of “Witten and Tibshirani (2011), Penalized classification using Fisher’s linear discriminant, to appear in Journal of the Royal Statistical Society, Series B”.

1600

Machine Learning & Statistical Learning

picasso

Pathwise Calibrated Sparse Shooting Algorithm

Computationally efficient tools for fitting generalized linear models with convex or non-convex penalties. Users can enjoy the superior statistical properties of non-convex penalties such as SCAD and MCP, which have significantly less estimation error and overfitting compared to convex penalties such as lasso and ridge. Computation is handled by multi-stage convex relaxation and the PathwIse CAlibrated Sparse Shooting algOrithm (PICASSO), which exploits warm-start initialization, active-set updating, and the strong rule for coordinate preselection to boost computation, and attains linear convergence to a unique sparse local optimum with optimal statistical properties. The computation is memory-optimized using sparse matrix output.

1601

Machine Learning & Statistical Learning

plotmo

Plot a Model’s Residuals, Response, and Partial Dependence Plots

Plot model surfaces for a wide variety of models using partial dependence plots and other techniques. Also plot model residuals and other information on the model.

1602

Machine Learning & Statistical Learning

quantregForest

Quantile Regression Forests

Quantile Regression Forests is a tree-based ensemble method for estimation of conditional quantiles. It is particularly well suited for high-dimensional data. Predictor variables of mixed classes can be handled. The package depends on the package ‘randomForest’, written by Andy Liaw.
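The key step that distinguishes a quantile regression forest from a plain random forest is the aggregation: instead of averaging leaf means, each training point gets a weight based on how often it shares a leaf with the query, and quantiles are taken from that weighted distribution. A minimal illustrative sketch in Python (hypothetical names, assuming precomputed leaf assignments rather than a full forest):

```python
def qrf_weights(train_leaves, query_leaves):
    """Meinshausen-style weights. train_leaves[t][i] is the leaf that training
    point i falls into in tree t; query_leaves[t] is the query's leaf in tree t."""
    n = len(train_leaves[0])
    w = [0.0] * n
    for tree, qleaf in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(tree) if leaf == qleaf]
        for i in members:
            # each co-occupant shares the tree's unit weight equally
            w[i] += 1.0 / (len(members) * len(train_leaves))
    return w

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches q."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    cum = 0.0
    for i in order:
        cum += weights[i]
        if cum >= q:
            return values[i]
    return values[order[-1]]
```

Calling weighted_quantile with q = 0.05 and q = 0.95 on the same weights yields a prediction interval, which is the package's main use case.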

1603

Machine Learning & Statistical Learning

randomForest (core)

Breiman and Cutler’s Random Forests for Classification and Regression

Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <doi:10.1023/A:1010933404324>.

1604

Machine Learning & Statistical Learning

randomForestSRC

Fast Unified Random Forests for Survival, Regression, and Classification (RFSRC)

Fast OpenMP parallel computing of Breiman’s random forests for survival, competing risks, regression and classification based on Ishwaran and Kogalur’s popular random survival forests (RSF) package. Handles missing data and now includes multivariate, unsupervised forests, quantile regression and solutions for class-imbalanced data. New fast interface using subsampling and confidence regions for variable importance.

1605

Machine Learning & Statistical Learning

ranger

A Fast Implementation of Random Forests

A fast implementation of Random Forests, particularly suited for high-dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class ‘gwaa.data’ (R package ‘GenABEL’) and ‘dgCMatrix’ (R package ‘Matrix’) can be directly analyzed.

1606

Machine Learning & Statistical Learning

rattle

Graphical User Interface for Data Science in R

The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. A Gnome (RGtk2) based graphical interface is included with the aim to provide a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (Predictive Model Markup Language) or as scores. A key aspect of the GUI is that all R commands are logged and commented through the log tab. This can be saved as a standalone R script file and serves as an aid for the user to learn R or to copy-and-paste directly into R itself.

1607

Machine Learning & Statistical Learning

Rborist

Extensible, Parallelizable Implementation of the Random Forest Algorithm

Scalable implementation of classification and regression forests, as described by Breiman (2001), <doi:10.1023/A:1010933404324>.

1608

Machine Learning & Statistical Learning

RcppDL

Deep Learning Methods via Rcpp

This package is based on the C++ code from Yusuke Sugomori, which implements basic machine learning methods with many layers (deep learning), including dA (Denoising Autoencoder), SdA (Stacked Denoising Autoencoder), RBM (Restricted Boltzmann machine) and DBN (Deep Belief Nets).

1609

Machine Learning & Statistical Learning

rdetools

Relevant Dimension Estimation (RDE) in Feature Spaces

The package provides functions for estimating the relevant dimension of a data set in feature spaces, applications to model selection, graphical illustrations and prediction.

1610

Machine Learning & Statistical Learning

REEMtree

Regression Trees with Random Effects for Longitudinal (Panel) Data

This package estimates regression trees with random effects as a way to use data mining techniques to describe longitudinal or panel data.

1611

Machine Learning & Statistical Learning

relaxo

Relaxed Lasso

Relaxed Lasso is a generalisation of the Lasso shrinkage technique for linear regression. Both variable selection and parameter estimation are achieved by regular Lasso, yet the two steps do not necessarily use the same penalty parameter. The results include all standard Lasso solutions but often allow for sparser models with similar or even slightly better predictive performance when many predictor variables are present. The package depends on the LARS package.

1612

Machine Learning & Statistical Learning

rgenoud

R Version of GENetic Optimization Using Derivatives

A genetic algorithm plus derivative optimizer.

1613

Machine Learning & Statistical Learning

RGF

Regularized Greedy Forest

A wrapper of the ‘Regularized Greedy Forest’ <https://github.com/RGF-team/rgf/tree/master/python-package> ‘python’ package, which also includes a multi-core implementation (FastRGF) <https://github.com/RGF-team/rgf/tree/master/FastRGF>.

1614

Machine Learning & Statistical Learning

RLT

Reinforcement Learning Trees

Random forest with a variety of additional features for regression, classification and survival analysis. The features include: parallel computing with OpenMP, embedded model for selecting the splitting variable (based on Zhu, Zeng & Kosorok, 2015), subject weight, variable weight, tracking subjects used in each tree, etc.

1615

Machine Learning & Statistical Learning

Rmalschains

Continuous Optimization using Memetic Algorithms with Local Search Chains (MALSChains) in R

An implementation of an algorithm family for continuous optimization called memetic algorithms with local search chains (MALSChains). Memetic algorithms are hybridizations of genetic algorithms with local search methods. They are especially suited for continuous optimization.

1616

Machine Learning & Statistical Learning

rminer

Data Mining Classification and Regression Methods

Facilitates the use of data mining algorithms in classification and regression (including time series forecasting) tasks by presenting a short and coherent set of functions. Versions: 1.4.2 - new NMAE metric, “xgboost” and “cv.glmnet” models (16 classification and 18 regression models); 1.4.1 - new tutorial and more robust version; 1.4 - new classification and regression models/algorithms, with a total of 14 classification and 15 regression methods, including: Decision Trees, Neural Networks, Support Vector Machines, Random Forests, Bagging and Boosting; 1.3 and 1.3.1 - new classification and regression metrics (improved mmetric function); 1.2 - new input importance methods (improved Importance function); 1.0 - first version.

1617

Machine Learning & Statistical Learning

ROCR

Visualizing the Performance of Scoring Classifiers

ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
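The core computation behind any such cutoff-parameterized curve is the same: sweep a cutoff over the sorted scores and record one (measure-x, measure-y) point per cutoff. A minimal illustrative sketch in Python for the ROC case (hypothetical names; ties in scores are not handled specially):

```python
def roc_points(scores, labels):
    """Return [(fpr, tpr)] swept over all score cutoffs, highest first."""
    pairs = sorted(zip(scores, labels), reverse=True)
    P = sum(labels)
    N = len(labels) - P
    tp = fp = 0
    pts = [(0.0, 0.0)]  # cutoff above the highest score
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        pts.append((fp / N, tp / P))
    return pts

def auc(points):
    """Trapezoidal area under a curve given as sorted (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

Swapping in a different pair of per-cutoff measures (precision/recall, lift, and so on) changes only what is appended to `pts`, which is essentially how the package's combinable-measure design works.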

1618

Machine Learning & Statistical Learning

RoughSets

Data Analysis Using Rough Set and Fuzzy Rough Set Theories

Implementations of algorithms for data analysis based on the rough set theory (RST) and the fuzzy rough set theory (FRST). We not only provide implementations for the basic concepts of RST and FRST but also popular algorithms that derive from those theories. The methods included in the package can be divided into several categories based on their functionality: discretization, feature selection, instance selection, rule induction and classification based on nearest neighbors. RST was introduced by Zdzisław Pawlak in 1982 as a sophisticated mathematical tool to model and process imprecise or incomplete information. By using the indiscernibility relation for objects/instances, RST does not require additional parameters to analyze the data. FRST is an extension of RST. The FRST combines concepts of vagueness and indiscernibility that are expressed with fuzzy sets (as proposed by Zadeh, in 1965) and RST.

1619

Machine Learning & Statistical Learning

rpart (core)

Recursive Partitioning and Regression Trees

Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.

1620

Machine Learning & Statistical Learning

RPMM

Recursively Partitioned Mixture Model

Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a modelbased clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.

1621

Machine Learning & Statistical Learning

RSNNS

Neural Networks using the Stuttgart Neural Network Simulator (SNNS)

The Stuttgart Neural Network Simulator (SNNS) is a library containing many standard implementations of neural networks. This package wraps the SNNS functionality to make it available from within R. Using the ‘RSNNS’ low-level interface, all of the algorithmic functionality and flexibility of SNNS can be accessed. Furthermore, the package contains a convenient high-level interface, so that the most common neural network topologies and learning algorithms integrate seamlessly into R.

1622

Machine Learning & Statistical Learning

RWeka

R/Weka Interface

An R interface to Weka (Version 3.9.3). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data preprocessing, classification, regression, clustering, association rules, and visualization. Package ‘RWeka’ contains the interface code, the Weka jar is in a separate package ‘RWekajars’. For more information on Weka see <http://www.cs.waikato.ac.nz/ml/weka/>.

1623

Machine Learning & Statistical Learning

RXshrink

Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression

Identify and display TRACEs for a specified shrinkage path and determine the extent of shrinkage most likely, under normal distribution theory, to produce an optimal reduction in MSE Risk in estimates of regression (beta) coefficients. Alternative estimates are also provided when ill-conditioned (nearly multicollinear) models yield OLS estimates with “wrong” numerical signs.

1624

Machine Learning & Statistical Learning

sda

Shrinkage Discriminant Analysis and CAT Score Variable Selection

Provides an efficient framework for high-dimensional linear and diagonal discriminant analysis with variable selection. The classifier is trained using James-Stein-type shrinkage estimators and predictor variables are ranked using correlation-adjusted t-scores (CAT scores). Variable selection error is controlled using false non-discovery rates or higher criticism.

1625

Machine Learning & Statistical Learning

SIS

Sure Independence Screening

Variable selection techniques are essential tools for model selection and estimation in high-dimensional statistical models. Through this publicly available package, we provide a unified environment to carry out variable selection using iterative sure independence screening (SIS) and all of its variants in generalized linear models and the Cox proportional hazards model.
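The screening step at the heart of SIS is easy to state: rank every predictor by the absolute value of its marginal correlation with the response and keep only the top d before running a refined selection method. A minimal illustrative sketch in Python (hypothetical names; the package's iterative variants repeat this on residuals):

```python
import math

def sis_screen(X, y, d):
    """Rank columns of X by |marginal Pearson correlation with y|; keep top d."""
    n = len(y)
    ybar = sum(y) / n
    sy = math.sqrt(sum((v - ybar) ** 2 for v in y))
    scored = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        xbar = sum(col) / n
        sx = math.sqrt(sum((v - xbar) ** 2 for v in col))
        cov = sum((a - xbar) * (b - ybar) for a, b in zip(col, y))
        r = cov / (sx * sy) if sx * sy > 0 else 0.0  # constant columns score 0
        scored.append((abs(r), j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:d]]
```

Reducing thousands of predictors to a screened set of size d is what makes the subsequent penalized fit computationally feasible in ultra-high dimensions.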

1626

Machine Learning & Statistical Learning

ssgraph

Bayesian Graphical Estimation using Spike-and-Slab Priors

Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data. To speed up the computations, the computationally intensive tasks of the package are implemented in C++ in parallel using OpenMP.

1627

Machine Learning & Statistical Learning

stabs

Stability Selection with Error Control

Resampling procedures to assess the stability of selected variables with additional finite-sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user-specified variable selection approaches.
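Standard stability selection is itself a small wrapper around any base selector: run the selector on many random subsamples of half the data, record how often each variable is chosen, and keep those whose selection frequency clears a threshold. A minimal illustrative sketch in Python (hypothetical names; the package additionally translates the threshold into a bound on the expected number of false selections):

```python
import random

def stability_selection(X, y, base_selector, p,
                        n_subsamples=100, threshold=0.6, seed=0):
    """Selection frequencies over subsamples of size n/2; keep stable variables.

    base_selector(Xs, ys) must return a list of selected column indices;
    p is the total number of columns."""
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        for j in base_selector(Xs, ys):
            counts[j] += 1
    freq = [c / n_subsamples for c in counts]
    selected = [j for j, f in enumerate(freq) if f >= threshold]
    return selected, freq
```

Because the base selector is just a callable, this structure mirrors how the package can wrap Lasso, boosting, or any user-supplied selection routine.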

1628

Machine Learning & Statistical Learning

SuperLearner

Super Learner Prediction

Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.

1629

Machine Learning & Statistical Learning

svmpath

The SVM Path Algorithm

Computes the entire regularization path for the two-class SVM classifier with essentially the same cost as a single SVM fit.

1630

Machine Learning & Statistical Learning

tensorflow

R Interface to ‘TensorFlow’

Interface to ‘TensorFlow’ <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

1631

Machine Learning & Statistical Learning

tgp

Bayesian Treed Gaussian Process Models

Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1d and 2d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.

1632

Machine Learning & Statistical Learning

tree

Classification and Regression Trees

Classification and regression trees.

1633

Machine Learning & Statistical Learning

trtf

Transformation Trees and Forests

Recursive partytioning of transformation models with corresponding random forest for conditional transformation models as described in ‘Transformation Forests’ (Hothorn and Zeileis, 2017, <arXiv:1701.02110>) and ‘Top-Down Transformation Choice’ (Hothorn, 2018, <doi:10.1177/1471082X17748081>).

1634

Machine Learning & Statistical Learning

varSelRF

Variable Selection using Random Forests

Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).

1635

Machine Learning & Statistical Learning

vcrpart

Tree-Based Varying Coefficient Regression for Generalized Linear and Ordinal Mixed Models

Recursive partitioning for varying coefficient generalized linear models and ordinal linear mixed models. Special features are coefficient-wise partitioning, non-varying coefficients and partitioning of time-varying variables in longitudinal regression.

1636

Machine Learning & Statistical Learning

wsrf

Weighted Subspace Random Forest for Classification

A parallel implementation of Weighted Subspace Random Forest. The Weighted Subspace Random Forest algorithm was proposed in the International Journal of Data Warehousing and Mining by Baoxun Xu, Joshua Zhexue Huang, Graham Williams, Qiang Wang, and Yunming Ye (2012) <doi:10.4018/jdwm.2012040103>. The algorithm can classify very high-dimensional data with random forests built using small subspaces. A novel variable weighting method is used for variable subspace selection in place of the traditional random variable sampling. This new approach is particularly useful in building models from high-dimensional data.

1637

Machine Learning & Statistical Learning

xgboost

Extreme Gradient Boosting

Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine, which can be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.
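Stripped of its engineering (regularization, sparsity handling, parallel split finding), the gradient boosting framework the blurb refers to is an additive model fit stage-wise to residuals. A minimal illustrative sketch in Python for squared-error loss with depth-one trees (hypothetical names; this is the textbook idea, not xgboost's actual implementation):

```python
def fit_stump(x, r):
    """Best single-split regression stump on 1-D feature x for targets r."""
    best = None
    order = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(order, order[1:])] or [order[0]]
    for s in candidates:
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        sse = (sum((ri - lm) ** 2 for xi, ri in zip(x, r) if xi <= s)
               + sum((ri - rm) ** 2 for xi, ri in zip(x, r) if xi > s))
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda v: lm if v <= s else rm

def boost(x, y, n_rounds=50, lr=0.1):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    f0 = sum(y) / len(y)
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: f0 + lr * sum(st(v) for st in stumps)
```

For squared error the residuals are exactly the negative gradient of the loss, which is why swapping in another differentiable objective (classification, ranking) only changes how `resid` is computed.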

1638

Medical Image Analysis

adaptsmoFMRI

Adaptive Smoothing of FMRI Data

This package contains R functions for estimating the blood oxygenation level dependent (BOLD) effect by using functional Magnetic Resonance Imaging (fMRI) data, based on adaptive Gauss Markov random fields, for real as well as simulated data. The implemented simulations make use of efficient Markov Chain Monte Carlo methods.

1639

Medical Image Analysis

adimpro (core)

Adaptive Smoothing of Digital Images

Implements tools for manipulation of digital images and the Propagation Separation approach by Polzehl and Spokoiny (2006) < 