1

Bayesian Inference

abc

Tools for Approximate Bayesian Computation (ABC)

Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates and for calculating the misclassification probabilities of different models.
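
As a concrete illustration of the rejection-ABC idea such packages implement, here is a minimal Python sketch (the data, prior, summary statistic and tolerance are all invented for the example; the package itself is an R library with a different interface):

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(3.0, 1.0, size=100)       # "observed" data, true mu = 3
s_obs = obs.mean()                          # summary statistic

n_draws, tol = 20000, 0.01                  # tol = accepted fraction of draws
mu_prior = rng.uniform(-10, 10, n_draws)    # flat prior over a wide range
s_sim = np.array([rng.normal(m, 1.0, 100).mean() for m in mu_prior])

dist = np.abs(s_sim - s_obs)
keep = dist <= np.quantile(dist, tol)       # accept the closest draws
posterior = mu_prior[keep]                  # approximate posterior sample for mu
```

Accepting a fixed fraction of the closest simulations (rather than a fixed absolute tolerance) mirrors the common practice of choosing the tolerance as a quantile of the simulated distances.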

2

Bayesian Inference

abn

Modelling Multivariate Data with Additive Bayesian Networks

Bayesian network analysis is a form of probabilistic graphical modelling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. An additive Bayesian network model consists of a DAG in which each node comprises a generalized linear model (GLM). Additive Bayesian network models are equivalent to Bayesian multivariate regression using graphical modelling; they generalise the usual multivariable regression (GLM) to multiple dependent variables. ‘abn’ provides routines to help determine optimal Bayesian network models for a given data set, where these models are used to identify statistical dependencies in messy, complex data. The additive formulation of these models is equivalent to multivariate generalised linear modelling (including mixed models with iid random effects). The usual term for this model selection process is structure discovery. The core functionality is concerned with model selection: determining the most robust empirical model of data comprising interdependent variables. Laplace approximations are used to estimate goodness-of-fit metrics and model parameters, and wrappers are also included for the INLA package, which can be obtained from http://www.r-inla.org. Using the testing version is recommended; it can be downloaded by running: source(“http://www.math.ntnu.no/inla/givemeINLA-testing.R”). A comprehensive set of documented case studies, numerical accuracy/quality assurance exercises, and additional documentation are available from the ‘abn’ website.

3

Bayesian Inference

AdMit

Adaptive Mixture of Student-t Distributions

Provides functions to fit an adaptive mixture of Student-t distributions to a target density through its kernel function, as described in Ardia et al. (2009) doi:10.18637/jss.v029.i03. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.
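
The role the fitted Student-t mixture plays can be sketched with plain self-normalized importance sampling using a single Student-t candidate (a simplified stand-in for the adapted mixture; the target kernel, candidate parameters and sample size are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def log_kernel(x):                  # unnormalized target: a N(2, 0.5^2) kernel
    return -0.5 * ((x - 2.0) / 0.5) ** 2

cand = stats.t(df=3, loc=0.0, scale=2.0)    # heavy-tailed candidate density
x = cand.rvs(size=50000, random_state=rng)
logw = log_kernel(x) - cand.logpdf(x)       # log importance weights
w = np.exp(logw - logw.max())
w /= w.sum()                                # self-normalized weights
est_mean = np.sum(w * x)                    # estimate of E[x] under the target
```

The heavy t tails are the point: they keep the weights bounded even when the candidate is poorly centred, which is exactly what an adapted mixture of t components is designed to guarantee more systematically.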

4

Bayesian Inference

arm (core)

Data Analysis Using Regression and Multilevel/Hierarchical Models

Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.

5

Bayesian Inference

AtelieR

A GTK GUI for teaching basic concepts in statistical inference, and doing elementary Bayesian tests

A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (Central Limit Theorem and the Normal Distribution, Distribution of a sample mean, Distribution of a sample variance, Probability calculator for common distributions), and II. Elementary Bayesian Statistics (Bayesian inference on proportions, contingency tables, means and variances, with informative and noninformative priors).

6

Bayesian Inference

BaBooN

Bayesian Bootstrap Predictive Mean Matching - Multiple and Single Imputation for Discrete Data

Included are two variants of Bayesian Bootstrap Predictive Mean Matching to multiply impute missing data. The first variant is a variable-by-variable imputation combining sequential regression and Predictive Mean Matching (PMM) that has been extended for unordered categorical data. The Bayesian Bootstrap allows for generating approximately proper multiple imputations. The second variant is also based on PMM, but the focus is on imputing several variables at the same time. This variant is suggested when the missing-data pattern resembles a data fusion situation, or any other missing-by-design pattern where several variables have identical missing-data patterns. Both variants can be run as ‘single imputation’ versions, in case the analysis objective is of a purely descriptive nature.
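
The Bayesian bootstrap ingredient can be sketched in a few lines: instead of resampling cases with replacement, each replicate draws flat Dirichlet(1, …, 1) weights over the observed cases (a minimal Python illustration of that one step only; the predictive-mean-matching machinery is not shown, and the donor values are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
donors = np.array([4.1, 5.0, 3.8, 6.2, 5.5, 4.9])   # observed donor values

def bayes_boot_means(values, n_rep, rng):
    # each replicate reweights the cases with flat-Dirichlet posterior weights
    w = rng.dirichlet(np.ones(len(values)), size=n_rep)
    return w @ values                                # weighted mean per replicate

means = bayes_boot_means(donors, 4000, rng)          # posterior draws of the mean
```

Because the Dirichlet weights are a posterior draw rather than a multinomial resample, the resulting imputations reflect parameter uncertainty, which is what makes the multiple imputations "approximately proper".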

7

Bayesian Inference

BACCO (core)

Bayesian Analysis of Computer Code Output (BACCO)

The BACCO bundle of packages is replaced by the BACCO package, which provides a vignette that illustrates the constituent packages (emulator, approximator, calibrator) in use.

8

Bayesian Inference

BaM

Functions and Datasets for Books by Jeff Gill

Functions and datasets for Jeff Gill: “Bayesian Methods: A Social and Behavioral Sciences Approach”. First, Second, and Third Edition. Published by Chapman and Hall/CRC (2002, 2007, 2014).

9

Bayesian Inference

BAS

Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling

Package for Bayesian Variable Selection and Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors or the mixture of g-priors from Liang et al (2008) doi:10.1198/016214507000001337 for linear models, or mixtures of g-priors in GLMs of Li and Clyde (2015) <arXiv:1503.06913>. Other model selection criteria include AIC, BIC and Empirical Bayes estimates of g. Sampling probabilities may be updated based on the sampled models using sampling without replacement, or an efficient MCMC algorithm samples models using the BAS tree structure as a hash table. Uniform priors over all models or beta-binomial prior distributions on model size are allowed, and for large p truncated priors on the model space may be used. The user may force variables to always be included. Details behind the sampling algorithm are provided in Clyde, Ghosh and Littman (2010) doi:10.1198/jcgs.2010.09049. This material is based upon work supported by the National Science Foundation under Grant DMS-1106891. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
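
For small p, the enumeration such packages perform can be sketched with the closed-form Bayes factor under Zellner's g-prior, BF(M : null) = (1+g)^((n-p_M-1)/2) (1+g(1-R²_M))^(-(n-1)/2). The Python sketch below fixes g = n and uses a uniform model prior (both simplifying assumptions; the package's samplers and prior options go well beyond this):

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)    # only predictor 0 matters

def r2(cols):
    if not cols:
        return 0.0
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

g = float(n)                                     # unit-information choice g = n
logbf = {}
for k in range(4):
    for cols in itertools.combinations(range(3), k):
        p = len(cols)
        logbf[cols] = (0.5 * (n - p - 1) * np.log1p(g)
                       - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2(cols))))
shift = max(logbf.values())
bf = {c: np.exp(v - shift) for c, v in logbf.items()}
post = {c: v / sum(bf.values()) for c, v in bf.items()}   # uniform model prior
best = max(post, key=post.get)                   # highest posterior prob. model
```

Once all 2^p marginal likelihoods are in hand, posterior model probabilities, inclusion probabilities and model-averaged predictions are just weighted sums over this table.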

10

Bayesian Inference

BayesDA

Functions and Datasets for the book “Bayesian Data Analysis”

Functions for Bayesian Data Analysis, with datasets from the book “Bayesian Data Analysis (second edition)” by Gelman, Carlin, Stern and Rubin. Not all datasets are included yet; hopefully the set will be completed soon.

11

Bayesian Inference

bayesGARCH

Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations

Provides the bayesGARCH() function, which performs Bayesian estimation of the GARCH(1,1) model with Student-t innovations as described in Ardia (2008) doi:10.1007/978-3-540-78657-3.
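
At the heart of any such sampler is the GARCH(1,1) conditional-variance recursion and the Student-t likelihood it induces. A minimal Python sketch of that likelihood evaluation (not the package's internals; the parameter values and stand-in return series are invented):

```python
import numpy as np
from scipy import stats

def garch11_loglik(y, omega, alpha, beta, nu):
    """log p(y | params) for GARCH(1,1) with Student-t innovations (nu > 2)."""
    h = np.empty_like(y)
    h[0] = y.var()                        # initialize with the sample variance
    for t in range(1, len(y)):
        h[t] = omega + alpha * y[t - 1] ** 2 + beta * h[t - 1]
    scale = np.sqrt(h * (nu - 2.0) / nu)  # so that Var(y_t | h_t) = h_t
    return stats.t.logpdf(y, df=nu, scale=scale).sum()

rng = np.random.default_rng(4)
y = 0.1 * rng.standard_t(df=8, size=500)  # stand-in return series
ll = garch11_loglik(y, omega=0.01, alpha=0.05, beta=0.90, nu=8.0)
```

An MCMC sampler for this model repeatedly evaluates exactly this log-likelihood at proposed values of (omega, alpha, beta, nu), combined with the prior, inside its accept/reject step.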

12

Bayesian Inference

bayesImageS

Bayesian Methods for Image Segmentation using a Potts Model

Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, approximate Bayesian computation (ABC-MCMC and ABC-SMC), and Bayesian indirect likelihood (BIL).

13

Bayesian Inference

bayesm (core)

Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al., JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005), and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

14

Bayesian Inference

bayesmeta

Bayesian Random-Effects Meta-Analysis

A collection of functions for deriving the posterior distribution of the two parameters in a random-effects meta-analysis, and for evaluating joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc.
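
The underlying normal-normal model can be solved almost by hand: condition on the heterogeneity tau, then average over a grid of tau values weighted by its marginal likelihood. A Python sketch with invented toy data and a flat prior on the effect (the package itself uses exact semi-analytic integration and offers many more prior choices):

```python
import numpy as np

y = np.array([0.10, 0.35, -0.05, 0.28, 0.18])   # study estimates (toy data)
s = np.array([0.12, 0.20, 0.15, 0.25, 0.10])    # their standard errors

taus = np.linspace(0.0, 1.5, 301)                # grid for the heterogeneity
log_marg = np.empty_like(taus)
mu_hat = np.empty_like(taus)
for i, tau in enumerate(taus):
    v = s ** 2 + tau ** 2
    w = 1.0 / v
    mu_hat[i] = np.sum(w * y) / np.sum(w)        # conditional estimate of mu
    # log marginal likelihood of tau, with mu integrated out under a flat prior
    log_marg[i] = (-0.5 * np.sum(np.log(v)) - 0.5 * np.log(np.sum(w))
                   - 0.5 * np.sum(w * (y - mu_hat[i]) ** 2))
post = np.exp(log_marg - log_marg.max())
post /= post.sum()                               # posterior over the tau grid
mu_post_mean = np.sum(post * mu_hat)             # model-averaged effect estimate
```

Averaging the conditional estimates over the tau posterior is what distinguishes the fully Bayesian analysis from plugging in a single heterogeneity estimate.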

15

Bayesian Inference

bayesmix

Bayesian Mixture Models with JAGS

The fitting of finite mixture models of univariate Gaussian distributions using JAGS within a Bayesian framework is provided.

16

Bayesian Inference

bayesQR

Bayesian Quantile Regression

Bayesian quantile regression using the asymmetric Laplace distribution; both continuous and binary dependent variables are supported. The package consists of implementations of the methods of Yu & Moyeed (2001) doi:10.1016/S0167-7152(01)00124-9, Benoit & Van den Poel (2012) doi:10.1002/jae.1216 and Al-Hamzawi, Yu & Benoit (2012) doi:10.1177/1471082X1101200304. To speed up the calculations, the Markov chain Monte Carlo core of all algorithms is programmed in Fortran and called from R.

17

Bayesian Inference

BayesSummaryStatLM

MCMC Sampling of Bayesian Linear Models via Summary Statistics

Methods for generating Markov Chain Monte Carlo (MCMC) posterior samples of Bayesian linear regression model parameters that require only summary statistics of data as input. Summary statistics are useful for systems with very limited amounts of physical memory. The package provides two functions: one function that computes summary statistics of data and one function that carries out the MCMC posterior sampling for Bayesian linear regression models where summary statistics are used as input. The function read.regress.data.ff utilizes the R package ‘ff’ to handle data sets that are too large to fit into a user’s physical memory, by reading in data in chunks.
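
Why summary statistics suffice: with a normal prior and known noise variance, the posterior for the coefficients depends on the data only through X'X and X'y, which can be accumulated chunk by chunk. A Python sketch of the idea (not the package's API; the prior and noise settings are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# accumulate the sufficient statistics chunk by chunk, as if streaming from disk
XtX, Xty = np.zeros((3, 3)), np.zeros(3)
for lo in range(0, n, 100):
    Xc, yc = X[lo:lo + 100], y[lo:lo + 100]
    XtX += Xc.T @ Xc
    Xty += Xc.T @ yc

tau2, sigma2 = 100.0, 1.0                     # N(0, tau2*I) prior, known noise
prec = XtX / sigma2 + np.eye(3) / tau2        # posterior precision matrix
post_mean = np.linalg.solve(prec, Xty / sigma2)
```

The chunked statistics reproduce the full-data posterior exactly, so only two small matrices, not the raw data, need to be kept in memory.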

18

Bayesian Inference

bayesSurv (core)

Bayesian Survival Regression with Flexible Error and Random Effects Distributions

Contains Bayesian implementations of Mixed-Effects Accelerated Failure Time (MEAFT) models for censored data. These can be not only right-censored but also interval-censored, doubly-interval-censored or misclassified interval-censored.

19

Bayesian Inference

Bayesthresh

Bayesian threshold mixed-effects models for categorical data

This package fits a linear mixed model for ordinal categorical responses using Bayesian inference via Markov chain Monte Carlo. The default is the Nandram & Chen algorithm using a Gaussian link function and saving just the summaries of the chains. Among the options, the package allows for two other algorithms, for using a Student-t link function, and for saving the full chains.

20

Bayesian Inference

BayesTree

Bayesian Additive Regression Trees

This is an implementation of BART: Bayesian Additive Regression Trees, by Chipman, George and McCulloch (2010).

21

Bayesian Inference

BayesValidate

BayesValidate Package

BayesValidate implements the software validation method described in the paper “Validation of Software for Bayesian Models using Posterior Quantiles” (Cook, Gelman, and Rubin, 2005). It inputs a function to perform Bayesian inference as well as functions to generate data from the Bayesian model being fit, and repeatedly generates and analyzes data to check that the Bayesian inference program works properly.

22

Bayesian Inference

BayesVarSel

Bayes Factors, Model Choice and Variable Selection in Linear Models

Conceived to calculate Bayes factors in linear models and then to provide a formal Bayesian answer to testing and variable selection problems. From a theoretical side, the emphasis in this package is placed on the prior distributions, and it allows a wide range of them: Jeffreys (1961); Zellner and Siow (1980) doi:10.1007/bf02888369; Zellner and Siow (1984); Zellner (1986) doi:10.2307/2233941; Fernandez et al. (2001) doi:10.1016/S0304-4076(00)00076-2; Liang et al. (2008) doi:10.1198/016214507000001337 and Bayarri et al. (2012) doi:10.1214/12-aos1013. The interaction with the package is through a friendly interface that syntactically mimics the well-known lm() command of R. The resulting objects can be easily explored, providing the user with valuable information (like marginal, joint and conditional inclusion probabilities of potential variables; the highest posterior probability model, HPM; the median probability model, MPM) about the structure of the true data generating model. Additionally, this package incorporates abilities to handle problems with a large number of potential explanatory variables through parallel and heuristic versions of the main commands, Garcia-Donato and Martinez-Beneito (2013) doi:10.1080/01621459.2012.742443.

23

Bayesian Inference

BayesX

R Utilities Accompanying the Software Package BayesX

This package provides functionality for exploring and visualising estimation results obtained with the software package BayesX for structured additive regression. It also provides functions to read, write and manipulate map objects that are required in spatial analyses performed with BayesX, free software for estimating structured additive regression models (http://www.bayesx.org).

24

Bayesian Inference

BayHaz

R Functions for Bayesian Hazard Rate Estimation

A suite of R functions for Bayesian estimation of smooth hazard rates via Compound Poisson Process (CPP) and Bayesian Penalized Spline (BPS) priors.

25

Bayesian Inference

BAYSTAR

On Bayesian analysis of Threshold autoregressive model (BAYSTAR)

Provides functionality for Bayesian estimation of threshold autoregressive models.

26

Bayesian Inference

bbemkr

Bayesian bandwidth estimation for multivariate kernel regression with Gaussian error

Bayesian bandwidth estimation for Nadaraya-Watson type multivariate kernel regression with a Gaussian error density.

27

Bayesian Inference

BCBCSF

Bias-Corrected Bayesian Classification with Selected Features

Fully Bayesian classification with a subset of high-dimensional features, such as expression levels of genes. The data are modeled with hierarchical Bayesian models using heavy-tailed t distributions as priors. When a large number of features are available, one may wish to select only a subset of features to use, typically those features strongly correlated with the response in training cases. Such a feature selection procedure is, however, invalid, since the relationship between the response and the features is exaggerated by feature selection. This package provides a way to avoid this bias and yield better-calibrated predictions for future cases when the F-statistic is used to select features.

28

Bayesian Inference

BCE

Bayesian composition estimator: estimating sample (taxonomic) composition from biomarker data

Functions to estimate taxonomic compositions from biomarker data, using a Bayesian approach.

29

Bayesian Inference

bclust

Bayesian Hierarchical Clustering Using Spike and Slab Models

Builds a dendrogram using the log posterior as a natural distance defined by the model, while weighting the clustering variables. It is also capable of computing equivalent Bayesian discrimination probabilities. The adopted method suits the small-sample, large-dimension setting. Model parameter estimation may be difficult, depending on the data structure and the chosen distribution family.

30

Bayesian Inference

bcp

Bayesian Analysis of Change Point Problems

Provides an implementation of the Barry and Hartigan (1993) product partition model for the normal errors change point problem using Markov Chain Monte Carlo. It also extends the methodology to regression models on a connected graph (Wang and Emerson, 2015); this allows estimation of change point models with multivariate responses. Parallel MCMC, previously available in bcp v.3.0.0, is currently not implemented.

31

Bayesian Inference

bisoreg

Bayesian Isotonic Regression with Bernstein Polynomials

Provides functions for fitting Bayesian monotonic regression models to data.

32

Bayesian Inference

BLR

Bayesian Linear Regression

Bayesian Linear Regression

33

Bayesian Inference

BMA

Bayesian Model Averaging

Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).

34

Bayesian Inference

Bmix

Bayesian Sampling for Stick-Breaking Mixtures

This is a bare-bones implementation of sampling algorithms for a variety of Bayesian stick-breaking (marginally DP) mixture models, including particle learning and Gibbs sampling for static DP mixtures, particle learning for dynamic BAR stick-breaking, and DP mixture regression. The software is designed to be easy to customize to suit different situations and for experimentation with stick-breaking models. Since particles are repeatedly copied, it is not an especially efficient implementation.
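
The common ingredient of these models is the stick-breaking construction of the DP weights, w_k = v_k ∏_{j<k}(1 - v_j) with v_k ~ Beta(1, alpha). A short Python sketch (the truncation level and base measure are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

def stick_breaking(alpha, K, rng):
    # v_k ~ Beta(1, alpha); w_k = v_k * prod_{j<k} (1 - v_j)
    v = rng.beta(1.0, alpha, size=K)
    return v * np.cumprod(np.concatenate([[1.0], 1.0 - v[:-1]]))

K = 200                                      # truncation level for the sketch
w = stick_breaking(alpha=2.0, K=K, rng=rng)  # mixture weights
atoms = rng.normal(size=K)                   # locations from a N(0,1) base measure
```

Pairing the weights with the atoms gives one (truncated) draw of a random discrete distribution from the DP; smaller alpha concentrates the mass on fewer sticks.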

35

Bayesian Inference

BMS

Bayesian Model Averaging Library

Bayesian model averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, flexible and hyper-g priors) and five kinds of model priors; model sampling is by enumeration or various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, and coefficient and predictive densities. Plotting functions are available for posterior model size, MCMC convergence, predictive and coefficient densities, best-models representation, and BMA comparison.

36

Bayesian Inference

bnlearn

Bayesian Network Structure Learning, Parameter Learning and Inference

Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC and RSMAX2) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries and cross-validation. Development snapshots with the latest bugfixes are available from http://www.bnlearn.com.

37

Bayesian Inference

boa (core)

Bayesian Output Analysis Program (BOA) for MCMC

A menu-driven program and library of functions for carrying out convergence diagnostics and statistical and graphical analysis of Markov chain Monte Carlo sampling output.

38

Bayesian Inference

Bolstad

Functions for Elementary Bayesian Inference

A set of R functions and data sets for the book Introduction to Bayesian Statistics, Bolstad, W.M. (2017), John Wiley & Sons, ISBN 978-1-118-09156-2.

39

Bayesian Inference

Boom

Bayesian Object Oriented Modeling

A C++ library for Bayesian modeling, with an emphasis on Markov chain Monte Carlo. Although boom contains a few R utilities (mainly plotting functions), its primary purpose is to install the BOOM C++ library on your system so that other packages can link against it.

40

Bayesian Inference

BoomSpikeSlab

MCMC for Spike and Slab Regression

Spike and slab regression a la McCulloch and George (1997).

41

Bayesian Inference

bqtl

Bayesian QTL Mapping Toolkit

QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools.

42

Bayesian Inference

bridgesampling

Bridge Sampling for Marginal Likelihoods and Bayes Factors

Provides functions for estimating marginal likelihoods, Bayes factors, posterior model probabilities, and normalizing constants in general, via different versions of bridge sampling (Meng & Wong, 1996, <http://www3.stat.sinica.edu.tw/statistica/j6n4/j6n43/j6n43.htm>).
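
The Meng & Wong fixed-point scheme is short enough to sketch in one dimension: estimate the normalizing constant of an unnormalized Gaussian kernel (true value sqrt(2*pi)) from draws of the target and of a normalized proposal. A Python illustration of the estimator, not the package's interface (the proposal and sample sizes are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def log_q1(x):                               # unnormalized target kernel
    return -0.5 * x ** 2                     # true normalizer: sqrt(2*pi)

prop = stats.norm(0.0, 1.5)                  # normalized proposal density

n1 = n2 = 20000
x1 = rng.normal(0.0, 1.0, n1)                # draws from the (known) target
x2 = prop.rvs(size=n2, random_state=rng)     # draws from the proposal
l1 = np.exp(log_q1(x1) - prop.logpdf(x1))    # q1/p2 at the target draws
l2 = np.exp(log_q1(x2) - prop.logpdf(x2))    # q1/p2 at the proposal draws

s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
r = 1.0                                      # running estimate of the constant
for _ in range(100):                         # Meng & Wong fixed-point updates
    num = np.mean(l2 / (s1 * l2 + s2 * r))
    den = np.mean(1.0 / (s1 * l1 + s2 * r))
    r = num / den
```

In real use the target draws come from an MCMC run rather than exact sampling, which is exactly the setting the package is built for.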

43

Bayesian Inference

bspec

Bayesian Spectral Inference

Bayesian inference on the (discrete) power spectrum of time series.

44

Bayesian Inference

bspmma

bspmma: Bayesian Semiparametric Models for Meta-Analysis

Some functions for nonparametric and semiparametric Bayesian models for random effects meta-analysis.

45

Bayesian Inference

BSquare

Bayesian Simultaneous Quantile Regression

This package models the quantile process as a function of predictors.

46

Bayesian Inference

bsts

Bayesian Structural Time Series

Time series regression using dynamic linear models fit using MCMC. See Scott and Varian (2014) doi:10.1504/IJMMNO.2014.059942, among many other sources.

47

Bayesian Inference

BVS

Bayesian Variant Selection: Bayesian Model Uncertainty Techniques for Genetic Association Studies

The functions in this package focus on analyzing case-control association studies involving a group of genetic variants. In particular, we are interested in modeling the outcome variable as a function of a multivariate genetic profile using Bayesian model uncertainty and variable selection techniques. The package incorporates functions to analyze data sets involving common variants as well as extensions to model rare variants via the Bayesian Risk Index (BRI) as well as haplotypes. Finally, the package also allows the incorporation of external biological information to inform the marginal inclusion probabilities via the iBMU.

48

Bayesian Inference

catnet

Categorical Bayesian Network Inference

Structure learning and parameter estimation of discrete Bayesian networks using likelihoodbased criteria. Exhaustive search for fixed node orders and stochastic search of optimal orders via simulated annealing algorithm are implemented.

49

Bayesian Inference

coalescentMCMC

MCMC Algorithms for the Coalescent

Flexible framework for coalescent analyses in R. It includes a main function running the MCMC algorithm, auxiliary functions for tree rearrangement, and some functions to compute population genetic parameters.

50

Bayesian Inference

coda (core)

Output Analysis and Diagnostics for MCMC

Provides functions for summarizing and plotting the output from Markov Chain Monte Carlo (MCMC) simulations, as well as diagnostic tests of convergence to the equilibrium distribution of the Markov chain.
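
One representative diagnostic, effective sample size computed from the chain's autocorrelations, can be sketched in Python (coda itself provides this and many more, e.g. Gelman-Rubin and Geweke diagnostics; the AR(1) test chain and the cutoff rule here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)

# AR(1) chain mimicking correlated MCMC output (autocorrelation rho = 0.9)
rho, n = 0.9, 50000
chain = np.empty(n)
chain[0] = 0.0
eps = rng.normal(size=n)
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + eps[t]

def ess(x, max_lag=200):
    x = x - x.mean()
    acf = np.array([np.dot(x[:-k], x[k:]) / np.dot(x, x)
                    for k in range(1, max_lag)])
    pos = acf[acf > 0.05]                 # crude cutoff for the noisy tail
    return len(x) / (1.0 + 2.0 * pos.sum())

n_eff = ess(chain)   # theory for AR(1): n * (1 - rho) / (1 + rho), about 2630
```

The point of the diagnostic: 50,000 correlated draws carry the information of only a few thousand independent ones, and Monte Carlo standard errors must be scaled accordingly.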

51

Bayesian Inference

cudaBayesreg

CUDA Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on NVIDIA GPUs. This package provides a CUDA implementation of a Bayesian multilevel model for the analysis of brain fMRI data. An fMRI data set consists of time series of volume data in 4D space. Typically, volumes are collected as slices of 64 x 64 voxels. Analysis of fMRI data often relies on fitting linear regression models at each voxel of the brain. The volume of the data to be processed, and the type of statistical analysis to perform in fMRI analysis, call for high-performance computing strategies. In this package, the CUDA programming model uses a separate thread for fitting a linear regression model at each voxel in parallel. The global statistical model implements a Gibbs Sampler for hierarchical linear models with a normal prior. This model has been proposed by Rossi, Allenby and McCulloch in ‘Bayesian Statistics and Marketing’, Chapter 3, and is referred to as ‘rhierLinearModel’ in the R package bayesm. A notebook equipped with an NVIDIA ‘GeForce 8400M GS’ card having Compute Capability 1.1 has been used in the tests. The data sets used in the package’s examples are available in the separate package cudaBayesregData.

52

Bayesian Inference

dclone

Data Cloning and MCMC Tools for Maximum Likelihood Methods

Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods. Sequential and parallel MCMC support for JAGS, WinBUGS and OpenBUGS.

53

Bayesian Inference

deal

Learning Bayesian Networks with Mixed Variables

Bayesian networks with continuous and/or discrete variables can be learned and compared from data.

54

Bayesian Inference

deBInfer

Bayesian Inference for Differential Equations

A Bayesian framework for parameter inference in differential equations. This approach offers a rigorous methodology for parameter inference as well as modeling the link between unobservable model states and parameters, and observable quantities. Provides templates for the DE model, the observation model and data likelihood, and the model parameters and their prior distributions. A Markov chain Monte Carlo (MCMC) procedure processes these inputs to estimate the posterior distributions of the parameters and any derived quantities, including the model trajectories. Further functionality is provided to facilitate MCMC diagnostics and the visualisation of the posterior distributions of model parameters and trajectories.

55

Bayesian Inference

dlm

Bayesian and Likelihood Analysis of Dynamic Linear Models

Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
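
The simplest member of this model class is the local level model, y_t = mu_t + v_t with mu_t = mu_{t-1} + w_t, and its Kalman filter fits in a few lines. A scalar Python sketch of the recursion (not dlm's general matrix implementation; the variances are invented):

```python
import numpy as np

rng = np.random.default_rng(9)
V, W, n = 0.5, 0.1, 300                        # observation / state variances
mu = np.cumsum(rng.normal(0.0, np.sqrt(W), n)) # latent level: a random walk
y = mu + rng.normal(0.0, np.sqrt(V), n)        # noisy observations

m, C = 0.0, 1e6                                # diffuse prior on the level
filt = np.empty(n)
for t in range(n):
    R = C + W                                  # predict: variance grows by W
    K = R / (R + V)                            # Kalman gain
    m = m + K * (y[t] - m)                     # update with observation t
    C = (1.0 - K) * R
    filt[t] = m                                # filtered mean E[mu_t | y_1..t]
```

The same predict/update pattern, written with matrices, covers every Normal linear state space model; smoothing then runs a backward pass over these filtered quantities.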

56

Bayesian Inference

DPpackage (core)

Bayesian Nonparametric Modeling in R

Functions to perform inference via simulation from the posterior distributions for Bayesian nonparametric and semiparametric models. Although the name of the package was motivated by the Dirichlet Process prior, the package considers and will consider other priors on functional spaces. So far, DPpackage includes models considering Dirichlet Processes, Dependent Dirichlet Processes, Dependent Poisson Dirichlet Processes, Hierarchical Dirichlet Processes, Polya Trees, Linear Dependent Tailfree Processes, Mixtures of Triangular distributions, Random Bernstein polynomial priors and Dependent Bernstein Polynomials. The package also includes models considering Penalized B-Splines. Includes semiparametric models for marginal and conditional density estimation, ROC curve analysis, interval-censored data, binary regression models, generalized linear mixed models, IRT-type models, and generalized additive models. Also contains functions to compute Pseudo-Bayes factors for model comparison, and to elicit the precision parameter of the Dirichlet Process. To maximize computational efficiency, the actual sampling for each model is done in compiled FORTRAN. The functions return objects which can be subsequently analyzed with functions provided in the ‘coda’ package.

57

Bayesian Inference

EbayesThresh

Empirical Bayes Thresholding and Related Methods

Empirical Bayes thresholding using the methods developed by I. M. Johnstone and B. W. Silverman. The basic problem is to estimate a mean vector given a vector of observations of the mean vector plus white noise, taking advantage of possible sparsity in the mean vector. Within a Bayesian formulation, the elements of the mean vector are modelled as having, independently, a distribution that is a mixture of an atom of probability at zero and a suitable heavy-tailed distribution. The mixing parameter can be estimated by a marginal maximum likelihood approach. This leads to an adaptive thresholding approach on the original data. Extensions of the basic method, in particular to wavelet thresholding, are also implemented within the package.
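
A stripped-down version of this recipe can be sketched by swapping the heavy-tailed alternative for a normal one and reporting the posterior mean instead of the posterior median (both are simplifications relative to the package, which uses a Laplace alternative and posterior-median thresholding; all numbers below are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n, tau2 = 1000, 9.0
theta = np.where(rng.random(n) < 0.1, rng.normal(0.0, 3.0, n), 0.0)  # sparse means
x = theta + rng.normal(size=n)                  # observations x_i ~ N(theta_i, 1)

sd1 = np.sqrt(1.0 + tau2)                       # marginal sd of a nonzero component

def neg_marg_loglik(w):                         # mixture marginal likelihood of x
    m = (1.0 - w) * stats.norm.pdf(x) + w * stats.norm.pdf(x, scale=sd1)
    return -np.sum(np.log(m))

grid = np.linspace(0.005, 0.995, 199)
w_hat = grid[np.argmin([neg_marg_loglik(w) for w in grid])]  # marginal MLE of w

num = w_hat * stats.norm.pdf(x, scale=sd1)
p_nonzero = num / (num + (1.0 - w_hat) * stats.norm.pdf(x))
theta_hat = p_nonzero * (tau2 / (1.0 + tau2)) * x   # shrinks small x toward zero
```

Estimating the mixing weight w from the data is what makes the threshold adaptive: sparser signals yield a smaller w_hat and hence more aggressive shrinkage.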

58

Bayesian Inference

ebdbNet

Empirical Bayes Estimation of Dynamic Bayesian Networks

Infer the adjacency matrix of a network from time course data using an empirical Bayes estimation procedure based on Dynamic Bayesian Networks.

59

Bayesian Inference

eco

Ecological Inference in 2x2 Tables

Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 doi:10.1093/pan/mpm017) and (2011 doi:10.18637/jss.v042.i05) for ecological inference in 2 by 2 tables as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides a functionality which allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.

60

Bayesian Inference

eigenmodel

Semiparametric factor and regression models for symmetric relational data

This package estimates the parameters of a model for symmetric relational data (e.g., the above-diagonal part of a square matrix), using a model-based eigenvalue decomposition and regression. Missing data is accommodated, and a posterior mean for missing data is calculated under the assumption that the data are missing at random. The marginal distribution of the relational data can be arbitrary, and is fit with an ordered probit specification.

61

Bayesian Inference

ensembleBMA

Probabilistic Forecasting using Ensembles and Bayesian Model Averaging

Bayesian Model Averaging to create probabilistic forecasts from ensemble forecasts and weather observations.

62

Bayesian Inference

evdbayes

Bayesian Analysis in Extreme Value Theory

Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.

63

Bayesian Inference

exactLoglinTest

Monte Carlo Exact Tests for Loglinear models

Monte Carlo and MCMC goodness-of-fit tests for loglinear models.

64

Bayesian Inference

factorQR

Bayesian quantile regression factor models

Package to fit Bayesian quantile regression models that assume a factor structure for at least part of the design matrix.

65

Bayesian Inference

FME

A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis

Provides functions to help in fitting models to data and to perform Monte Carlo, sensitivity and identifiability analysis. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package ‘deSolve’ or by a steady-state solver from package ‘rootSolve’. However, the methods can also be used with other types of functions.

66

Bayesian Inference

geoR

Analysis of Geostatistical Data

Geostatistical analysis including traditional, likelihoodbased and Bayesian methods.

67

Bayesian Inference

geoRglm

A Package for Generalised Linear Spatial Models

Functions for inference in generalised linear spatial models. The posterior and predictive inference is based on Markov chain Monte Carlo methods. Package geoRglm is an extension to the package geoR, which must be installed first.

68

Bayesian Inference

ggmcmc

Tools for Analyzing MCMC Simulations from Bayesian Inference

Tools for assessing and diagnosing convergence of Markov chain Monte Carlo simulations, as well as for graphically displaying results from a full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables.

69

Bayesian Inference

glmmBUGS

Generalised Linear Mixed Models with BUGS and JAGS

Automates running Generalized Linear Mixed Models, including spatial models, with WinBUGS, OpenBUGS and JAGS. Models are specified with formulas, with the package writing model files, arranging unbalanced data in ragged arrays, and creating starting values. The model is reparameterized, and functions are provided for converting model outputs to the original parameterization.

70

Bayesian Inference

gRain

Graphical Independence Networks

Probability propagation in graphical independence networks, also known as Bayesian networks or probabilistic expert systems.

71

Bayesian Inference

growcurves

Bayesian Semi and Nonparametric Growth Curve Models that Additionally Include Multiple Membership Random Effects

Employs a nonparametric formulation for by-subject random effect parameters to borrow strength over a constrained number of repeated measurement waves in a fashion that permits multiple effects per subject. One class of models employs a Dirichlet process (DP) prior for the subject random effects and includes an additional set of random effects that utilize a different grouping factor and are mapped back to clients through a multiple membership weight matrix; e.g. treatment(s) exposure or dosage. A second class of models employs a dependent DP (DDP) prior for the subject random effects that directly incorporates the multiple membership pattern.

72

Bayesian Inference

hbsae

Hierarchical Bayesian Small Area Estimation

Functions to compute small area estimates based on a basic area or unit-level model. The model is fit using restricted maximum likelihood, or in a hierarchical Bayesian way. In the latter case numerical integration is used to average over the posterior density for the between-area variance. The output includes the model fit, small area estimates and corresponding MSEs, as well as some model selection measures. Additional functions provide means to compute aggregate estimates and MSEs, to minimally adjust the small area estimates to benchmarks at a higher aggregation level, and to graphically compare different sets of small area estimates.

73

Bayesian Inference

HI

Simulation from distributions supported by nested hyperplanes

Simulation from distributions supported by nested hyperplanes, using the algorithm described in Petris & Tardella, “A geometric approach to trans-dimensional Markov chain Monte Carlo”, Canadian Journal of Statistics, v.31, n.4, (2003). Also random direction multivariate Adaptive Rejection Metropolis Sampling.

74

Bayesian Inference

Hmisc

Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.

75

Bayesian Inference

iterLap

Approximate Probability Densities by Iterated Laplace Approximations

The iterLap (iterated Laplace approximation) algorithm approximates a general (possibly non-normalized) probability density on R^p, by repeated Laplace approximations to the difference between current approximation and true density (on log scale). The final approximation is a mixture of multivariate normal distributions and might be used for example as a proposal distribution for importance sampling (e.g. in Bayesian applications). The algorithm can be seen as a computational generalization of the Laplace approximation suitable for skew or multimodal densities.

76

Bayesian Inference

LaplacesDemon

Complete Environment for Bayesian Inference

Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview). The README describes the history of the package development process.

77

Bayesian Inference

LearnBayes

Functions for Learning Bayesian Inference

LearnBayes contains a collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one- and two-parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.
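As an illustration of the kind of one-parameter posterior summary the package teaches, the conjugate beta-binomial update can be worked entirely in base R; the prior and data values below are invented for the example:

```r
# Conjugate beta-binomial update: prior Beta(a, b), data y successes
# out of n trials, posterior Beta(a + y, b + n - y).
a <- 1; b <- 1   # uniform Beta(1, 1) prior
y <- 7; n <- 10  # observed successes out of n trials

post_a <- a + y
post_b <- b + n - y

post_mean <- post_a / (post_a + post_b)              # posterior mean
cred_int  <- qbeta(c(0.025, 0.975), post_a, post_b)  # central 95% credible interval

cat("posterior mean:", round(post_mean, 3), "\n")
cat("95% credible interval:", round(cred_int, 3), "\n")
```

The same summary could then be compared against an MCMC approximation, which is the pedagogical pattern the package's functions follow.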

78

Bayesian Inference

lme4

Linear Mixed-Effects Models using ‘Eigen’ and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.

79

Bayesian Inference

lmm

Linear Mixed Models

Some improved procedures for linear mixed models.

80

Bayesian Inference

MasterBayes

ML and MCMC Methods for Pedigree Reconstruction and Analysis

The primary aim of MasterBayes is to use MCMC techniques to integrate over uncertainty in pedigree configurations estimated from molecular markers and phenotypic data. Emphasis is put on the marginal distribution of parameters that relate the phenotypic data to the pedigree. All simulation is done in compiled C++ for efficiency.

81

Bayesian Inference

matchingMarkets

Analysis of Stable Matchings

Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.

82

Bayesian Inference

mcmc (core)

Markov Chain Monte Carlo

Simulates continuous distributions of random vectors using Markov chain Monte Carlo (MCMC). Users specify the distribution by an R function that evaluates the log unnormalized density. Algorithms are random walk Metropolis algorithm (function metrop), simulated tempering (function temper), and morphometric random walk Metropolis (Johnson and Geyer, 2012, https://doi.org/10.1214/12-AOS1048, function morph.metrop), which achieves geometric ergodicity by change of variable.
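A minimal sketch of the metrop() interface just described, assuming the mcmc package is installed; the target density here is an arbitrary example, not taken from the package:

```r
library(mcmc)  # assumes the CRAN package 'mcmc' is installed

# Log unnormalized density of a standard bivariate normal target.
lupost <- function(x) -0.5 * sum(x^2)

set.seed(42)
out <- metrop(lupost, initial = c(0, 0), nbatch = 1000)

out$accept           # acceptance rate; adjust via the 'scale' argument
colMeans(out$batch)  # sample means of the two coordinates
```

The returned object can be passed back to metrop() to continue the chain, which is the package's idiom for tuning the proposal scale.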

83

Bayesian Inference

MCMCglmm

MCMC Generalised Linear Mixed Models

MCMC Generalised Linear Mixed Models.

84

Bayesian Inference

MCMCpack (core)

Markov Chain Monte Carlo (MCMC) Package

Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return coda mcmc objects that can then be summarized using the coda package. Some useful utility functions such as density functions, pseudorandom number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
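A hedged sketch of the typical workflow, assuming MCMCpack (and its dependency coda) is installed; the regression on the built-in mtcars data is invented for illustration:

```r
library(MCMCpack)  # assumes the CRAN package 'MCMCpack' is installed

# Bayesian linear regression with the package's default vague priors;
# the result is a coda mcmc object of posterior draws.
posterior <- MCMCregress(mpg ~ wt + hp, data = mtcars,
                         burnin = 1000, mcmc = 10000, seed = 1)

summary(posterior)  # coda summary: posterior means, SDs, quantiles
plot(posterior)     # coda trace and density plots
```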

85

Bayesian Inference

mgcv

Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar. Includes a gam() function, a wide variety of smoothers, JAGS support and distributions beyond the exponential family.

86

Bayesian Inference

mlogitBMA

Bayesian Model Averaging for Multinomial Logit Models

Provides a modified function bic.glm of the BMA package that can be applied to multinomial logit (MNL) data. The data is converted to binary logit using the Begg & Gray approximation. The package also contains functions for maximum likelihood estimation of MNL.

87

Bayesian Inference

MNP

R Package for Fitting the Multinomial Probit Model

Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using the Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. doi:10.1016/j.jeconom.2004.02.002. Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. doi:10.18637/jss.v014.i03.

88

Bayesian Inference

mombf

Moment and Inverse Moment Bayes Factors

Model selection and parameter estimation based on nonlocal and Zellner priors. Bayes factors, marginal densities and variable selection in regression setups. Routines to sample, evaluate prior densities, distribution functions and quantiles are included.

89

Bayesian Inference

monomvn

Estimation for Multivariate Normal and Student-t Data with Monotone Missingness

Estimation of multivariate normal and Student-t data of arbitrary dimension where the pattern of missing data is monotone. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and Student-t errors (from Geweke) is also provided.

90

Bayesian Inference

MSBVAR

Markov-Switching, Bayesian, Vector Autoregression Models

Provides methods for estimating frequentist and Bayesian Vector Autoregression (VAR) models and Markov-switching Bayesian VAR (MSBVAR). Functions for reduced form and structural VAR models are also available. Includes methods for generating posterior inferences for these models, forecasts, impulse responses (using likelihood-based error bands), and forecast error decompositions. Also includes utility functions for plotting forecasts and impulse responses, and for generating draws from Wishart and singular multivariate normal densities. The current version includes functionality to build and evaluate models with Markov switching.

91

Bayesian Inference

NetworkChange

Bayesian Package for Network Changepoint Analysis

Network changepoint analysis for undirected network data. The package implements a hidden Markov multilinear tensor regression model (Park and Sohn, 2017, http://jhp.snu.ac.kr/NetworkChange.pdf). Functions for break number detection using the approximate marginal likelihood and WAIC are also provided.

92

Bayesian Inference

nimble

MCMC, Particle Filtering, and Programmable Hierarchical Modeling

A system for writing hierarchical statistical models largely compatible with ‘BUGS’ and ‘JAGS’, writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. ‘NIMBLE’ includes default methods for MCMC, particle filtering, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers ‘NIMBLE’ provides. ‘NIMBLE’ extends the ‘BUGS’/‘JAGS’ language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the ‘BUGS’/‘JAGS’ language for writing models, one can use ‘NIMBLE’ for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at http://r-nimble.org.

93

Bayesian Inference

openEBGM

EBGM Scores for Mining Large Contingency Tables

An implementation of DuMouchel’s (1999) doi:10.1080/00031305.1999.10474456 Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and quantile scores from the posterior distribution using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the ‘PhViD’ package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency.

94

Bayesian Inference

pacbpred

PAC-Bayesian Estimation and Prediction in Sparse Additive Models

This package is intended to perform estimation and prediction in high-dimensional additive models, using a sparse PAC-Bayesian point of view and an MCMC algorithm. The method is fully described in Guedj and Alquier (2013), ‘PAC-Bayesian Estimation and Prediction in Sparse Additive Models’, Electronic Journal of Statistics, 7, 264-291.

95

Bayesian Inference

PAWL

Implementation of the PAWL algorithm

Implementation of the Parallel Adaptive Wang-Landau algorithm. Also implemented for comparison: parallel adaptive Metropolis-Hastings and an SMC sampler.

96

Bayesian Inference

PottsUtils

Utility Functions of the Potts Models

A package including several functions related to the Potts models.

97

Bayesian Inference

predmixcor

Classification rule based on Bayesian mixture models with feature selection bias corrected

“train_predict_mix” predicts the binary response with binary features.

98

Bayesian Inference

PReMiuM

Dirichlet Process Bayesian Clustering, Profile Regression

Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical response, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can also be included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection.

99

Bayesian Inference

prevalence

Tools for Prevalence Assessment Studies

The prevalence package provides Frequentist and Bayesian methods for prevalence assessment studies. IMPORTANT: the truePrev functions in the prevalence package call on JAGS (Just Another Gibbs Sampler), which therefore has to be available on the user’s system. JAGS can be downloaded from http://mcmc-jags.sourceforge.net/.

100

Bayesian Inference

profdpm

Profile Dirichlet Process Mixtures

This package facilitates profile inference (inference at the posterior mode) for a class of product partition models (PPM). The Dirichlet process mixture is currently the only available member of this class. These methods search for the maximum a posteriori (MAP) estimate for the data partition in a PPM.

101

Bayesian Inference

pscl

Political Science Computational Laboratory

Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.

102

Bayesian Inference

R2jags

Using R to Run ‘JAGS’

Provides wrapper functions to implement Bayesian analysis in JAGS. Major features include monitoring convergence of an MCMC model using the Rubin and Gelman Rhat statistic, automatically running an MCMC model until it converges, and implementing parallel processing of an MCMC model for multiple chains.
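A sketch of the wrapper interface, assuming R2jags and a system JAGS installation; the normal-mean model and simulated data below are invented for illustration:

```r
library(R2jags)  # assumes the CRAN package 'R2jags' and JAGS are installed

# A minimal normal-mean model written to a temporary file.
model_file <- tempfile(fileext = ".txt")
writeLines("
model {
  for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
  mu  ~ dnorm(0, 0.0001)
  tau ~ dgamma(0.001, 0.001)
}", model_file)

set.seed(1)
dat <- list(y = rnorm(50, mean = 2), N = 50)

fit <- jags(data = dat, parameters.to.save = c("mu", "tau"),
            model.file = model_file, n.chains = 3, n.iter = 2000)
print(fit)  # posterior summaries with the Rhat diagnostic per parameter
```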

103

Bayesian Inference

R2WinBUGS

Running ‘WinBUGS’ and ‘OpenBUGS’ from ‘R’ / ‘S-PLUS’

Invokes a ‘BUGS’ model in ‘OpenBUGS’ or ‘WinBUGS’, and provides a class “bugs” for ‘BUGS’ results and functions to work with that class. Function write.model() allows a ‘BUGS’ model file to be written. The class and auxiliary functions could be used with other MCMC programs, including ‘JAGS’.

104

Bayesian Inference

ramps

Bayesian Geostatistical Modeling with RAMPS

Bayesian geostatistical modeling of Gaussian processes using a reparameterized and marginalized posterior sampling (RAMPS) algorithm designed to lower autocorrelation in MCMC samples. Package performance is tuned for large spatial datasets.

105

Bayesian Inference

rbugs

Fusing R and OpenBugs and Beyond

Functions to prepare files needed for running BUGS in batch-mode, and running BUGS from R. Support for Linux and Windows systems with OpenBugs is emphasized.

106

Bayesian Inference

revdbayes

Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis

Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package https://cran.r-project.org/package=rust is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package https://cran.r-project.org/package=evdbayes, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) doi:10.1214/09-AOAS292. See the ‘revdbayes’ website for more information, documentation and examples.

107

Bayesian Inference

RJaCGH

Reversible Jump MCMC for the Analysis of CGH Arrays

Bayesian analysis of CGH microarrays fitting Hidden Markov Chain models. The selection of the number of states is made via their posterior probability computed by Reversible Jump Markov Chain Monte Carlo Methods. Also returns probabilistic common regions for gains/losses.

108

Bayesian Inference

rjags

Bayesian Graphical Models using MCMC

Interface to the JAGS MCMC library.

109

Bayesian Inference

RSGHB

Functions for Hierarchical Bayesian Estimation: A Flexible Approach

Functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well as ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train’s Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train’s original Gauss and Matlab code can be found here: http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html See Train’s chapter on HB in Discrete Choice with Simulation here: http://elsa.berkeley.edu/books/choice2.html; and his paper on using HB with non-normal distributions here: http://eml.berkeley.edu//~train/trainsonnier.pdf.

111

Bayesian Inference

rstan

R Interface to Stan

User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the ‘StanHeaders’ package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via ‘variational’ approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.
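A sketch of the interface, assuming rstan and a working C++ toolchain are installed; the Bernoulli model and data are invented for illustration:

```r
library(rstan)  # assumes the CRAN package 'rstan' is installed

# A minimal Bernoulli model in the Stan modeling language.
model_code <- "
data { int<lower=0> N; int<lower=0, upper=1> y[N]; }
parameters { real<lower=0, upper=1> theta; }
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"

fit <- stan(model_code = model_code,
            data = list(N = 10, y = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 1)),
            chains = 2, iter = 1000)
print(fit)  # posterior summary for theta, with Rhat and n_eff
```

The first call compiles the model to C++, so it is slow; subsequent sampling from the same model object reuses the compiled code.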

112

Bayesian Inference

rstiefel

Random orthonormal matrix generation on the Stiefel manifold

This package simulates random orthonormal matrices from linear and quadratic exponential family distributions on the Stiefel manifold. The most general type of distribution covered is the matrix-variate Bingham-von Mises-Fisher distribution. Most of the simulation methods are presented in Hoff (2009) “Simulation of the Matrix Bingham-von Mises-Fisher Distribution, With Applications to Multivariate and Relational Data.”

113

Bayesian Inference

runjags

Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS

User-friendly interface utilities for MCMC models via Just Another Gibbs Sampler (JAGS), facilitating the use of parallel (or distributed) processors for multiple chains, automated control of convergence and sample length diagnostics, and evaluation of the performance of a model using drop-k validation or against simulated data. Template model specifications can be generated using a standard lme4-style formula interface to assist users less familiar with the BUGS syntax. A JAGS extension module provides additional distributions including the Pareto family of distributions, the DuMouchel prior and the half-Cauchy prior.

114

Bayesian Inference

Runuran

R Interface to the UNU.RAN Random Variate Generators

Interface to the UNU.RAN library for Universal Non-Uniform RANdom variate generators. Thus it allows one to build non-uniform random number generators from quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles for a couple of distributions.

115

Bayesian Inference

RxCEcolInf

R x C Ecological Inference With Optional Incorporation of Survey Information

Fits the R x C inference model described in Greiner and Quinn (2009). Allows incorporation of survey results.

116

Bayesian Inference

SamplerCompare

A Framework for Comparing the Performance of MCMC Samplers

A framework for running sets of MCMC samplers on sets of distributions with a variety of tuning parameters, along with plotting functions to visualize the results of those simulations. See scintro.pdf for an introduction.

117

Bayesian Inference

SampleSizeMeans

Sample size calculations for normal means

A set of R functions for calculating sample size requirements using three different Bayesian criteria in the context of designing an experiment to estimate a normal mean or the difference between two normal means. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of normal means are provided. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided.

118

Bayesian Inference

SampleSizeProportions

Calculating sample size requirements when estimating the difference between two binomial proportions

A set of R functions for calculating sample size requirements using three different Bayesian criteria in the context of designing an experiment to estimate the difference between two binomial proportions. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of binomial observations are provided. In all cases, estimation of the difference between two binomial proportions is considered. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided.

119

Bayesian Inference

sbgcop

Semiparametric Bayesian Gaussian copula estimation and imputation

This package estimates parameters of a Gaussian copula, treating the univariate marginal distributions as nuisance parameters as described in Hoff (2007). It also provides a semiparametric imputation procedure for missing multivariate data.

120

Bayesian Inference

SimpleTable

Bayesian Inference and Sensitivity Analysis for Causal Effects from 2 x 2 and 2 x 2 x K Tables in the Presence of Unmeasured Confounding

SimpleTable provides a series of methods to conduct Bayesian inference and sensitivity analysis for causal effects from 2 x 2 and 2 x 2 x K tables when unmeasured confounding is present or suspected.

121

Bayesian Inference

sna

Tools for Social Network Analysis

A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.

122

Bayesian Inference

spBayes

Univariate and Multivariate Spatial-Temporal Modeling

Fits univariate and multivariate spatio-temporal random effects models for point-referenced data using Markov chain Monte Carlo (MCMC). Details are given in Finley, Banerjee, and Gelfand (2015) doi:10.18637/jss.v063.i13 and Finley, Banerjee, and Cook (2014) doi:10.1111/2041-210X.12189.

123

Bayesian Inference

spikeslab

Prediction and variable selection using spike and slab regression

Spike and slab for prediction and variable selection in linear regression models. Uses a generalized elastic net for variable selection.

124

Bayesian Inference

spikeSlabGAM

Bayesian Variable Selection and Model Choice for Generalized Additive Mixed Models

Bayesian variable selection, model choice, and regularized estimation for (spatial) generalized additive mixed regression models via stochastic search variable selection with spike-and-slab priors.

125

Bayesian Inference

spTimer

Spatio-Temporal Bayesian Modelling

Fits, spatially predicts and temporally forecasts large amounts of space-time data using [1] Bayesian Gaussian Process (GP) Models, [2] Bayesian Auto-Regressive (AR) Models, and [3] Bayesian Gaussian Predictive Processes (GPP) based AR Models for spatio-temporal big-n problems. Bakar and Sahu (2015) doi:10.18637/jss.v063.i15.

126

Bayesian Inference

stochvol

Efficient Bayesian Inference for Stochastic Volatility (SV) Models

Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods.

127

Bayesian Inference

tgp

Bayesian Treed Gaussian Process Models

Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1d and 2d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multiresolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.

128

Bayesian Inference

tRophicPosition

Bayesian Trophic Position Calculation with Stable Isotopes

Estimates the trophic position of a consumer relative to a baseline species. It implements a Bayesian approach which combines an interface to the ‘JAGS’ MCMC library of ‘rjags’ and stable isotopes. Users are encouraged to test the package and send bugs and/or errors to trophicpositionsupport@googlegroups.com.

129

Bayesian Inference

zic

Bayesian Inference for Zero-Inflated Count Models

Provides MCMC algorithms for the analysis of zero-inflated count models. The case of stochastic search variable selection (SVS) is also considered. All MCMC samplers are coded in C++ for improved efficiency. A data set considering the demand for health care is provided.

130

Chemometrics and Computational Physics

ALS (core)

Multivariate Curve Resolution Alternating Least Squares (MCR-ALS)

Alternating least squares is often used to resolve components contributing to data with a bilinear structure; the basic technique may be extended to alternating constrained least squares. Commonly applied constraints include unimodality, non-negativity, and normalization of components. Several data matrices may be decomposed simultaneously by assuming that one of the two matrices in the bilinear decomposition is shared between datasets.

131

Chemometrics and Computational Physics

AnalyzeFMRI

Functions for analysis of fMRI datasets stored in the ANALYZE or NIFTI format

Functions for I/O, visualisation and analysis of functional Magnetic Resonance Imaging (fMRI) datasets stored in the ANALYZE or NIFTI format.

132

Chemometrics and Computational Physics

AquaEnv

Integrated Development Toolbox for Aquatic Chemical Model Generation

Toolbox for the experimental aquatic chemist, focused on acidification and CO2 air-water exchange. It contains all elements to model the pH, the related CO2 air-water exchange, and aquatic acid-base chemistry for an arbitrary marine, estuarine or freshwater system. It contains a suite of tools for sensitivity analysis, visualisation, modelling of chemical batches, and can be used to build dynamic models of aquatic systems. As from version 1.0-4, it also contains functions to calculate the buffer factors.

133

Chemometrics and Computational Physics

astro

Astronomy Functions, Tools and Routines

The astro package provides a series of functions, tools and routines in everyday use within astronomy. Broadly speaking, one may group these functions into 7 main areas, namely: cosmology, FITS file manipulation, the Sersic function, plotting, data manipulation, statistics and general convenience functions and scripting tools.

134

Chemometrics and Computational Physics

astrodatR

Astronomical Data

A collection of 19 datasets from contemporary astronomical research. They are described in the textbook ‘Modern Statistical Methods for Astronomy with R Applications’ by Eric D. Feigelson and G. Jogesh Babu (Cambridge University Press, 2012, Appendix C) or on the website of Penn State’s Center for Astrostatistics (http://astrostatistics.psu.edu/datasets). These datasets can be used to exercise methodology involving: density estimation; heteroscedastic measurement errors; contingency tables; two-sample hypothesis tests; spatial point processes; nonlinear regression; mixture models; censoring and truncation; multivariate analysis; classification and clustering; inhomogeneous Poisson processes; periodic and stochastic time series analysis.

135

Chemometrics and Computational Physics

astroFns

Astronomy: time and position functions, misc. utilities

Miscellaneous astronomy functions, utilities, and data.

136

Chemometrics and Computational Physics

astrolibR

Astronomy Users Library

Several dozen low-level utilities and codes from the Interactive Data Language (IDL) Astronomy Users Library (http://idlastro.gsfc.nasa.gov) are implemented in R. They treat: time, coordinate and proper motion transformations; terrestrial precession and nutation, atmospheric refraction and aberration, barycentric corrections, and related effects; utilities for astrometry, photometry, and spectroscopy; and utilities for planetary, stellar, Galactic, and extragalactic science.

137

Chemometrics and Computational Physics

Bchron

Radiocarbon Dating, Age-Depth Modelling, Relative Sea Level Rate Estimation, and Non-Parametric Phase Modelling

Enables quick calibration of radiocarbon dates under various calibration curves (including user-generated ones); age-depth modelling as per the algorithm of Haslett and Parnell (2008) doi:10.1111/j.1467-9876.2008.00623.x; relative sea level rate estimation incorporating time uncertainty in polynomial regression models; and non-parametric phase modelling via Gaussian mixtures as a means to determine the activity of a site (and as an alternative to the Oxcal function SUM). The package includes a vignette which explains most of the basic functionality.

138

Chemometrics and Computational Physics

BioMark

Find Biomarkers in Two-Class Discrimination Problems

Variable selection methods are provided for several classification methods: the lasso/elastic net, PC-LDA, PLS-DA, and several t-tests. Two approaches for selecting cutoffs can be used, one based on the stability of model coefficients under perturbation, and the other on higher criticism.

139

Chemometrics and Computational Physics

bvls

The Stark-Parker algorithm for bounded-variable least squares

An R interface to the Stark-Parker implementation of an algorithm for bounded-variable least squares.
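For readers outside R, the same bounded-variable least squares problem can be posed with SciPy's `lsq_linear` (illustrative only; the bvls package wraps the Stark-Parker code directly):

```python
import numpy as np
from scipy.optimize import lsq_linear

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 1.0, 1.0])
# minimise ||Ax - b|| subject to 0 <= x <= 1 elementwise;
# the unconstrained optimum here is [-1, 1], so the lower bound binds
res = lsq_linear(A, b, bounds=(0.0, 1.0))
print(res.x)   # roughly [0, 0.214]
```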

140

Chemometrics and Computational Physics

cda

Coupled-Dipole Approximation for Electromagnetic Scattering by Three-Dimensional Clusters of Sub-Wavelength Particles

Coupled-dipole simulations for electromagnetic scattering of light by sub-wavelength particles in arbitrary three-dimensional configurations. Scattering and absorption spectra are simulated by inversion of the interaction matrix, or by an order-of-scattering approximation scheme. High-level functions are provided to simulate spectra with varying angles of incidence, as well as with full angular averaging.

141

Chemometrics and Computational Physics

celestial

Collection of Common Astronomical Conversion Routines and Functions

Contains a number of common astronomy conversion routines, particularly the HMS and degrees schemes, which can be fiddly to convert between en masse due to the textural nature of the former. It allows users to coordinate match datasets quickly. It also contains functions for various cosmological calculations.

142

Chemometrics and Computational Physics

CellularAutomaton

One-Dimensional Cellular Automata

This package is an object-oriented implementation of one-dimensional cellular automata. It supports many of the features offered by Mathematica, including elementary rules, user-defined rules, radii, user-defined seeding, and plotting.
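An elementary cellular automaton of the kind this package implements fits in a few lines; this NumPy sketch (not the package's R API) encodes a Wolfram rule number as a lookup table over the eight possible 3-cell neighbourhoods:

```python
import numpy as np

def step(cells, rule):
    """One update of an elementary CA with wrap-around boundaries."""
    # neighbourhood value 0..7 for each cell: left*4 + centre*2 + right
    idx = (np.roll(cells, 1) << 2) | (cells << 1) | np.roll(cells, -1)
    table = (rule >> np.arange(8)) & 1   # bit k of `rule` = output for neighbourhood k
    return table[idx]

cells = np.zeros(11, dtype=int)
cells[5] = 1                             # single seed cell
for _ in range(3):
    cells = step(cells, 30)              # Wolfram's rule 30
print(cells)                             # [0 0 1 1 0 1 1 1 1 0 0]
```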

143

Chemometrics and Computational Physics

chemCal (core)

Calibration Functions for Analytical Chemistry

Simple functions for plotting linear calibration functions and estimating standard errors for measurements according to the Handbook of Chemometrics and Qualimetrics: Part A by Massart et al. There are also functions for estimating the limit of detection (LOD) and limit of quantification (LOQ). The functions work on model objects from (optionally weighted) linear regression (lm) or robust linear regression (‘rlm’ from the ‘MASS’ package).

144

Chemometrics and Computational Physics

chemometrics

Multivariate Statistical Analysis in Chemometrics

R companion to the book “Introduction to Multivariate Statistical Analysis in Chemometrics” written by K. Varmuza and P. Filzmoser (2009).

145

Chemometrics and Computational Physics

ChemometricsWithR

Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences

Functions and scripts used in the book “Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences” by Ron Wehrens, Springer (2011). Data used in the package are available from GitHub.

146

Chemometrics and Computational Physics

ChemoSpec

Exploratory Chemometrics for Spectroscopy

A collection of functions for top-down exploratory data analysis of spectral data obtained via nuclear magnetic resonance (NMR), infrared (IR) or Raman spectroscopy. Includes functions for plotting and inspecting spectra, peak alignment, hierarchical cluster analysis (HCA), principal components analysis (PCA) and model-based clustering. Robust methods appropriate for this type of high-dimensional data are available. ChemoSpec is designed with metabolomics data sets in mind, where the samples fall into groups such as treatment and control. Graphical output is formatted consistently for publication-quality plots. ChemoSpec is intended to be very user-friendly and to help you get usable results quickly. A vignette covering typical operations is available.

147

Chemometrics and Computational Physics

CHNOSZ

Thermodynamic Calculations for Geobiochemistry

An integrated set of tools for thermodynamic calculations in geochemistry and compositional biology. Thermodynamic properties are taken from a database for minerals and inorganic and organic aqueous species including biomolecules, or from amino acid group additivity for proteins. High-temperature properties are calculated using the revised Helgeson-Kirkham-Flowers equations of state for aqueous species, and activity coefficients can be calculated for specified ionic strength. Functions are provided to define a system using basis species, automatically balance reactions, calculate the chemical affinities of formation reactions for selected species, calculate equilibrium activities, and plot the results on chemical activity diagrams.

148

Chemometrics and Computational Physics

clustvarsel

Variable Selection for Gaussian Model-Based Clustering

Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology allows finding the (locally) optimal subset of variables in a data set that have group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without subsampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.

149

Chemometrics and Computational Physics

compositions

Compositional Data Analysis

The package provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by Aitchison and Pawlowsky-Glahn.
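The centred log-ratio (clr) transform central to the Aitchison approach is easy to state; a minimal NumPy sketch (illustrative, not this package's interface):

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a single composition."""
    logx = np.log(x)
    return logx - logx.mean()

z = clr(np.array([0.2, 0.3, 0.5]))
print(z.sum())   # clr coordinates sum to zero by construction
```

Working in clr coordinates lets standard multivariate tools respect the unit-sum constraint of compositions.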

150

Chemometrics and Computational Physics

cosmoFns

Functions for cosmological distances, times, luminosities, etc

Package encapsulates standard expressions for distances, times, luminosities, and other quantities useful in observational cosmology, including molecular line observations. Currently coded for a flat universe only.

151

Chemometrics and Computational Physics

CosmoPhotoz

Photometric redshift estimation using generalized linear models

User-friendly interfaces to perform fast and reliable photometric redshift estimation. The code makes use of generalized linear models and can adopt gamma or inverse Gaussian families, from either a frequentist or a Bayesian perspective. The code additionally includes a Shiny application with a simple user interface.

152

Chemometrics and Computational Physics

CRAC

Cosmology R Analysis Code

R functions for cosmological research. The main functions are similar to those in the Python library ‘cosmolopy’.

153

Chemometrics and Computational Physics

dielectric

Defines some physical constants and dielectric functions commonly used in optics, plasmonics

Physical constants. Gold, silver and glass permittivities, together with spline interpolation functions.

154

Chemometrics and Computational Physics

diffractometry

Baseline identification and peak decomposition for X-ray diffractograms

Residual-based baseline identification and peak decomposition for X-ray diffractograms as introduced in Davies/Gather/Mergel/Meise/Mildenberger (2008).

155

Chemometrics and Computational Physics

drc

Analysis of Dose-Response Curves

Analysis of dose-response data is made available through a suite of flexible and versatile model fitting and after-fitting functions.
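A common dose-response model of the kind such packages fit is the four-parameter log-logistic curve; the following SciPy sketch (illustrative only, not drc's R interface; the EC50 is fitted on the log scale so it stays positive) recovers known parameters from synthetic data:

```python
import numpy as np
from scipy.optimize import curve_fit

def ll4(dose, lower, upper, log_ec50, slope):
    """Four-parameter log-logistic curve, EC50 parameterised on the log scale."""
    return lower + (upper - lower) / (1 + (dose / np.exp(log_ec50)) ** slope)

dose = np.array([0.05, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0])
resp = ll4(dose, 5.0, 100.0, np.log(2.0), 1.5)   # noise-free synthetic responses
popt, _ = curve_fit(ll4, dose, resp, p0=[0.0, 90.0, 0.0, 1.0])
print(np.exp(popt[2]))   # fitted EC50, close to the true value 2
```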

156

Chemometrics and Computational Physics

drm

Regression and association models for repeated categorical data

Likelihood-based marginal regression and association modelling for repeated, or otherwise clustered, categorical responses using the dependence ratio as a measure of the association.

157

Chemometrics and Computational Physics

EEM

Read and Preprocess Fluorescence ExcitationEmission Matrix (EEM) Data

Reads raw EEM data and prepares them for further analysis.

158

Chemometrics and Computational Physics

elasticnet

Elastic-Net for Sparse Estimation and Sparse PCA

This package provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.

159

Chemometrics and Computational Physics

enpls

Ensemble Partial Least Squares Regression

An algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.

160

Chemometrics and Computational Physics

fastICA

FastICA Algorithms to Perform ICA and Projection Pursuit

Implementation of the FastICA algorithm to perform Independent Component Analysis (ICA) and Projection Pursuit.
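The same FastICA algorithm is also implemented in scikit-learn, which makes for a compact illustration of what ICA does — unmixing linearly mixed independent sources (this is not the R package's interface):

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # two independent sources
X = S @ np.array([[1.0, 0.5], [0.4, 1.0]]).T        # two observed mixtures
S_est = FastICA(n_components=2, random_state=0).fit_transform(X)
# each recovered column matches one source up to sign and scale
```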

161

Chemometrics and Computational Physics

FITSio

FITS (Flexible Image Transport System) Utilities

Utilities to read and write files in the FITS (Flexible Image Transport System) format, a standard format in astronomy (see e.g. https://en.wikipedia.org/wiki/FITS for more information). Present low-level routines allow: reading, parsing, and modifying FITS headers; reading FITS images (multidimensional arrays); reading FITS binary and ASCII tables; and writing FITS images (multidimensional arrays). Higher-level functions allow: reading files composed of one or more headers and a single (perhaps multidimensional) image or single table; reading tables into data frames; generating vectors for image array axes; scaling and writing images as 16-bit integers. Known incompletenesses are reading random group extensions, as well as bit, complex, and array descriptor data types in binary tables.

162

Chemometrics and Computational Physics

fmri

Analysis of fMRI Experiments

Contains R functions to perform an fMRI analysis as described in Tabelow et al. (2006) doi:10.1016/j.neuroimage.2006.06.029, Polzehl et al. (2010) doi:10.1016/j.neuroimage.2010.04.241, Tabelow and Polzehl (2011) doi:10.18637/jss.v044.i11.

163

Chemometrics and Computational Physics

fpca

Restricted MLE for Functional Principal Components Analysis

A geometric approach to MLE for functional principal components.

164

Chemometrics and Computational Physics

FTICRMS

Programs for Analyzing Fourier Transform-Ion Cyclotron Resonance Mass Spectrometry Data

This package was developed partially with funding from the NIH Training Program in Biomolecular Technology (2T32GM08799).

165

Chemometrics and Computational Physics

homals

Gifi Methods for Optimal Scaling

Performs a homogeneity analysis (multiple correspondence analysis) and various extensions. Rank restrictions on the category quantifications can be imposed (nonlinear PCA). The categories are transformed by means of optimal scaling with options for nominal, ordinal, and numerical scale levels (for rank-1 restrictions). Variables can be grouped into sets, in order to emulate regression analysis and canonical correlation analysis.

166

Chemometrics and Computational Physics

hyperSpec

Work with Hyperspectral Data, i.e. Spectra + Meta Information (Spatial, Time, Concentration, …)

Comfortable ways to work with hyperspectral data sets, i.e. spatially or time-resolved spectra, or spectra with any other kind of information associated with each spectrum. The spectra can be data as obtained in XRF, UV/VIS, Fluorescence, AES, NIR, IR, Raman, NMR, MS, etc. More generally, any data recorded over a discretized variable, e.g. absorbance = f(wavelength), stored as a vector of absorbance values for discrete wavelengths, is suitable.

167

Chemometrics and Computational Physics

investr

Inverse Estimation/Calibration Functions

Functions to facilitate inverse estimation (e.g., calibration) in linear, generalized linear, nonlinear, and (linear) mixed-effects models. A generic function is also provided for plotting fitted regression models with or without confidence/prediction bands that may be of use to the general user.

168

Chemometrics and Computational Physics

Iso (core)

Functions to Perform Isotonic Regression

Linear order and unimodal order (univariate) isotonic regression; bivariate isotonic regression with linear order on both variables.
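Linear-order isotonic regression is classically solved by the pool-adjacent-violators algorithm (PAVA); a compact Python sketch of that idea (illustrative, not the Iso package's API):

```python
import numpy as np

def pava(y):
    """Non-decreasing isotonic fit by pooling adjacent violating blocks."""
    vals, wts = [], []
    for v in map(float, y):
        vals.append(v)
        wts.append(1.0)
        # merge blocks while the monotone order is violated
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            merged = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            vals[-2:] = [merged]
            wts[-2:] = [w]
    return np.repeat(vals, np.array(wts, dtype=int))

print(pava([1.0, 3.0, 2.0, 4.0]))   # [1.  2.5 2.5 4. ]
```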

169

Chemometrics and Computational Physics

kohonen (core)

Supervised and Unsupervised SelfOrganising Maps

Functions to train self-organising maps (SOMs). Interrogation of the maps and prediction using trained maps are also supported. The name of the package refers to Teuvo Kohonen, the inventor of the SOM.

170

Chemometrics and Computational Physics

leaps

Regression Subset Selection

Regression subset selection, including exhaustive search.
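Exhaustive best-subset search is simple to state, if exponential in cost; a toy Python sketch of the idea (illustrative — leaps itself uses a far more efficient branch-and-bound search):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3 * X[:, 1] - 2 * X[:, 3] + 0.1 * rng.normal(size=60)

def rss(cols):
    """Residual sum of squares of the least-squares fit on the given columns."""
    beta = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
    return np.sum((y - X[:, cols] @ beta) ** 2)

# best subset of size 2: enumerate all column pairs, keep the lowest RSS
best2 = min(itertools.combinations(range(4), 2), key=rss)
print(best2)   # (1, 3): the two truly active predictors
```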

171

Chemometrics and Computational Physics

lspls

LS-PLS Models

Implements the LS-PLS (least squares - partial least squares) method described in, for instance, Jorgensen, K., Segtnan, V. H., Thyholt, K., Naes, T. (2004) A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables. Journal of Chemometrics, 18(10), 451-464.

172

Chemometrics and Computational Physics

MALDIquant

Quantitative Analysis of Mass Spectrometry Data

A complete analysis pipeline for matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) and other two-dimensional mass spectrometry data. In addition to commonly used plotting and processing methods it includes distinctive features, namely baseline subtraction methods such as morphological filters (TopHat) or the statistics-sensitive non-linear iterative peak-clipping algorithm (SNIP), peak alignment using warping functions, handling of replicated measurements, as well as support for spectra with different resolutions.

173

Chemometrics and Computational Physics

minpack.lm

R Interface to the Levenberg-Marquardt Nonlinear Least-Squares Algorithm Found in MINPACK, Plus Support for Bounds

The nls.lm function provides an R interface to lmder and lmdif from the MINPACK library, for solving nonlinear least-squares problems by a modification of the Levenberg-Marquardt algorithm, with support for lower and upper parameter bounds. The implementation can be used via nls-like calls using the nlsLM function.
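SciPy wraps the same MINPACK routines behind `scipy.optimize.least_squares(method="lm")`, which makes for a compact illustration of the algorithm itself (not of the nlsLM interface):

```python
import numpy as np
from scipy.optimize import least_squares

x = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * x)                 # noise-free exponential decay

def residuals(p):
    # residual vector for the model p[0] * exp(-p[1] * x)
    return p[0] * np.exp(-p[1] * x) - y

fit = least_squares(residuals, x0=[1.0, 1.0], method="lm")
print(fit.x)   # converges to (2.5, 1.3)
```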

174

Chemometrics and Computational Physics

moonsun

Basic astronomical calculations with R

A collection of basic astronomical routines for R based on “Practical astronomy with your calculator” by Peter Duffett-Smith.

175

Chemometrics and Computational Physics

nlme

Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixedeffects models.

176

Chemometrics and Computational Physics

nlreg

Higher Order Inference for Nonlinear Heteroscedastic Models

Likelihood inference based on higher order approximations for nonlinear models with possibly non-constant variance.

177

Chemometrics and Computational Physics

nnls (core)

The Lawson-Hanson algorithm for non-negative least squares (NNLS)

An R interface to the Lawson-Hanson implementation of an algorithm for non-negative least squares (NNLS). Also allows the combination of non-negative and non-positive constraints.
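The Lawson-Hanson algorithm is also exposed in SciPy as `scipy.optimize.nnls`, which is convenient for seeing the non-negativity constraint in action (illustrative; not this R package's interface):

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
b = np.array([2.0, 1.0, -1.0])
# minimise ||Ax - b|| subject to x >= 0; the unconstrained optimum is [2, -1],
# so NNLS clamps the second coordinate to zero and re-solves
x, rnorm = nnls(A, b)
print(x)   # [1.5 0. ]
```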

178

Chemometrics and Computational Physics

OrgMassSpecR

Organic Mass Spectrometry

Organic/biological mass spectrometry data analysis.

179

Chemometrics and Computational Physics

pcaPP

Robust PCA by Projection Pursuit

Provides functions for robust PCA by projection pursuit. The methods are described in Croux et al. (2006) doi:10.2139/ssrn.968376, Croux et al. (2013) doi:10.1080/00401706.2012.727746, Todorov and Filzmoser (2013) doi:10.1007/978-3-642-33042-1_31.

180

Chemometrics and Computational Physics

Peaks

Peaks

Spectrum manipulation: background estimation, Markov smoothing, deconvolution and peak search functions. Ported from the ROOT/TSpectrum class.
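As a rough illustration of the peak-search task (not of the TSpectrum port itself), SciPy's `find_peaks` locates local maxima above a height threshold:

```python
import numpy as np
from scipy.signal import find_peaks

x = np.linspace(0, 10, 500)
# synthetic spectrum: two Gaussian peaks centred at 3 and 7
spectrum = np.exp(-(x - 3) ** 2 / 0.1) + 0.6 * np.exp(-(x - 7) ** 2 / 0.2)
peaks, _ = find_peaks(spectrum, height=0.3)
print(x[peaks])   # approximately [3, 7]
```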

181

Chemometrics and Computational Physics

PET

Simulation and Reconstruction of PET Images

This package implements different analytic/direct and iterative reconstruction methods of Peter Toft. It also offers the possibility to simulate PET data.

182

Chemometrics and Computational Physics

planar

Multilayer Optics

Solves the electromagnetic problem of reflection and transmission at a planar multilayer interface. Also computed are the decay rates and emission profile for a dipolar emitter.

183

Chemometrics and Computational Physics

pls (core)

Partial Least Squares and Principal Component Regression

Multivariate regression methods Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
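Principal component regression, one of the methods above, reduces to regressing the response on a few principal component scores; a NumPy sketch of that idea (illustrative, not the pls package's API):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(100, 2))                         # two latent factors
X = F @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(100, 6))
y = F[:, 0] - 2.0 * F[:, 1]                           # response driven by the factors

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt[:2].T                                     # scores on the first 2 components
coef = np.linalg.lstsq(T, y - y.mean(), rcond=None)[0]
y_hat = y.mean() + T @ coef
print(np.corrcoef(y, y_hat)[0, 1])                    # near 1: two components suffice
```

PLSR differs in choosing components to maximise covariance with the response rather than variance of X alone.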

184

Chemometrics and Computational Physics

plspm

Tools for Partial Least Squares Path Modeling (PLSPM)

Partial Least Squares Path Modeling (PLSPM) analysis for both metric and nonmetric data, as well as REBUS analysis.

185

Chemometrics and Computational Physics

ppls

Penalized Partial Least Squares

This package contains linear and nonlinear regression methods based on partial least squares and penalization techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be conducted via jackknifing.

186

Chemometrics and Computational Physics

prospectr

Miscellaneous functions for processing and sample selection of vis-NIR diffuse reflectance data

The package provides functions for pretreatment and sample selection of visible and near-infrared diffuse reflectance spectra.

187

Chemometrics and Computational Physics

psy

Various procedures used in psychometry

Kappa, ICC, Cronbach alpha, screeplot, mtmm

188

Chemometrics and Computational Physics

PTAk (core)

Principal Tensor Analysis on k Modes

A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD, also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package also includes some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions.

189

Chemometrics and Computational Physics

quantchem

Quantitative chemical analysis: calibration and evaluation of results

Statistical evaluation of calibration curves by different regression techniques: ordinary, weighted, robust (up to 4th-order polynomial). Log-log and Box-Cox transforms, estimation of the optimal power and weighting scheme. Tests for heteroscedasticity and normality of residuals. Different kinds of plots commonly used in illustrating calibrations. Easy “inverse prediction” of concentration from given responses and statistical evaluation of results (comparison of precision and accuracy by common tests).

190

Chemometrics and Computational Physics

rcdk

Interface to the ‘CDK’ Libraries

Allows the user to access functionality in the ‘CDK’, a Java framework for chemoinformatics. This allows the user to load molecules, evaluate fingerprints, calculate molecular descriptors and so on. In addition, the ‘CDK’ API allows the user to view structures in 2D.

191

Chemometrics and Computational Physics

rcdklibs

The CDK Libraries Packaged for R

An R interface to the Chemistry Development Kit, a Java library for chemoinformatics. Given the size of the library itself, this package is not expected to change very frequently. To make use of the CDK within R, it is suggested that you use the ‘rcdk’ package. Note that it is possible to directly interact with the CDK using ‘rJava’. However, ‘rcdk’ exposes functionality in a more idiomatic way. The CDK library itself is released as LGPL and the sources can be obtained from https://github.com/cdk/cdk.

192

Chemometrics and Computational Physics

represent

Determine the representativity of two multidimensional data sets

Contains workhorse function jrparams(), as well as two helper functions Mboxtest() and JRsMahaldist(), and four example data sets.

193

Chemometrics and Computational Physics

resemble

Regression and Similarity Evaluation for Memory-Based Learning in Spectral Chemometrics

Implementation of functions for spectral similarity/dissimilarity analysis and memory-based learning (MBL) for non-linear modeling in complex spectral datasets. In chemometrics MBL is also known as local modeling.

194

Chemometrics and Computational Physics

RobPer

Robust Periodogram and Periodicity Detection Methods

Calculates periodograms based on (robustly) fitting periodic functions to light curves (irregularly observed time series, possibly with measurement accuracies, occurring in astroparticle physics). Three main functions are included: RobPer() calculates the periodogram. Outlying periodogram bars (indicating a period) can be detected with betaCvMfit(). Artificial light curves can be generated using the function tsgen(). For more details see the corresponding article: Thieler, Fried and Rathjens (2016), Journal of Statistical Software 69(9), 1-36, doi:10.18637/jss.v069.i09.
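The classical starting point that robust periodogram methods generalise is the Lomb-Scargle periodogram for irregularly sampled series; a SciPy sketch (illustrative only, not this package's R interface):

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 50.0, 300))        # irregular observation times
y = np.sin(2 * np.pi * t / 5.0)                 # a period-5 signal
periods = np.linspace(2.0, 20.0, 500)
power = lombscargle(t, y, 2 * np.pi / periods)  # scan angular frequencies
print(periods[np.argmax(power)])                # close to the true period 5
```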

195

Chemometrics and Computational Physics

rpubchem

An Interface to the PubChem Collection

Access PubChem data (compounds, substance, assays) using R. Structural information is provided in the form of SMILES strings. It currently only provides access to a subset of the precalculated data stored by PubChem. Bioassay data can be accessed to obtain descriptions as well as the actual data. It is also possible to search for assay IDs by keyword.

196

Chemometrics and Computational Physics

sapa

Spectral Analysis for Physical Applications

Software for the book Spectral Analysis for Physical Applications, Donald B. Percival and Andrew T. Walden, Cambridge University Press, 1993.

197

Chemometrics and Computational Physics

SCEPtER

Stellar CharactEristics Pisa Estimation gRid

SCEPtER pipeline for estimating the stellar age, mass, and radius given observational effective temperature, [Fe/H], and asteroseismic parameters. The results are obtained by adopting a maximum likelihood technique over a grid of precomputed stellar models.

198

Chemometrics and Computational Physics

simecol

Simulation of Ecological (and Other) Dynamic Systems

An object-oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. The package helps to organize scenarios (to avoid copy and paste) and aims to improve readability and usability of code.

199

Chemometrics and Computational Physics

snapshot

Gadget N-body cosmological simulation code snapshot I/O utilities

Functions for reading and writing Gadget N-body snapshots. The Gadget code is popular in astronomy for running N-body / hydrodynamical cosmological and merger simulations. To find out more about Gadget see the main distribution page at www.mpa-garching.mpg.de/gadget/

200

Chemometrics and Computational Physics

solaR

Radiation and Photovoltaic Systems

Calculation methods of solar radiation and performance of photovoltaic systems from daily and intradaily irradiation data sources.

201

Chemometrics and Computational Physics

som

Self-Organizing Map

Self-Organizing Map (with application in gene clustering).

202

Chemometrics and Computational Physics

speaq

Tools for Nuclear Magnetic Resonance (NMR) Spectra Alignment, Peak Based Processing, Quantitative Analysis and Visualizations

Makes Nuclear Magnetic Resonance spectroscopy (NMR spectroscopy) data analysis as easy as possible by requiring only a small set of functions to perform an entire analysis. ‘speaq’ offers the possibility of raw spectra alignment and quantitation, but also an analysis based on features, whereby the spectra are converted to peaks which are then grouped and turned into features. These features can be processed with any number of statistical tools, either included in ‘speaq’ or available elsewhere on CRAN. More detail can be found in doi:10.1186/1471-2105-12-405 and doi:10.1101/138503.

203

Chemometrics and Computational Physics

spls

Sparse Partial Least Squares (SPLS) Regression and Classification

This package provides functions for fitting sparse partial least squares (SPLS) regression and classification models.

204

Chemometrics and Computational Physics

stellaR

stellar evolution tracks and isochrones

A package to manage and display stellar tracks and isochrones from the Pisa low-mass database. Includes tools for isochrone construction and track interpolation.

205

Chemometrics and Computational Physics

stepPlr

L2 penalized logistic regression with a stepwise variable selection

L2 penalized logistic regression for both continuous and discrete predictors, with forward stagewise/forward stepwise variable selection procedure.

206

Chemometrics and Computational Physics

subselect

Selecting Variable Subsets

A collection of functions which (i) assess the quality of variable subsets as surrogates for a full data set, in either an exploratory data analysis or in the context of a multivariate linear model, and (ii) search for subsets which are optimal under various criteria.

207

Chemometrics and Computational Physics

TIMP

Fitting Separable Nonlinear Models in Spectroscopy and Microscopy

A problem-solving environment (PSE) for fitting separable nonlinear models to measurements arising in physics and chemistry experiments; has been extensively applied to time-resolved spectroscopy and FLIM-FRET data.

208

Chemometrics and Computational Physics

titan

Titration analysis for mass spectrometry data

GUI to analyze mass spectrometric data on the relative abundance of two substances from a titration series.

209

Chemometrics and Computational Physics

titrationCurves

Acid/Base, Complexation, Redox, and Precipitation Titration Curves

A collection of functions to plot acid/base titration curves (pH vs. volume of titrant), complexation titration curves (pMetal vs. volume of EDTA), redox titration curves (potential vs. volume of titrant), and precipitation titration curves (either pAnalyte or pTitrant vs. volume of titrant). Options include the titration of mixtures, the ability to overlay two or more titration curves, and the ability to show equivalence points.
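The simplest case underlying such curves, a strong acid titrated with a strong base, follows directly from the charge balance; an illustrative Python sketch (not this package's R code) that ignores water autoionisation away from the equivalence point:

```python
import math

def ph_strong(ca, va, cb, vb):
    """pH when vb mL of strong base (cb M) is added to va mL of strong acid (ca M)."""
    h_excess = (ca * va - cb * vb) / (va + vb)   # net strong acid, mol/L
    if h_excess > 0:
        return -math.log10(h_excess)             # excess strong acid
    if h_excess < 0:
        return 14 + math.log10(-h_excess)        # excess strong base (25 C, pKw = 14)
    return 7.0                                   # equivalence point

print(ph_strong(0.1, 50, 0.1, 25))   # halfway to equivalence: pH about 1.48
```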

210

Chemometrics and Computational Physics

UPMASK

Unsupervised Photometric Membership Assignment in Stellar Clusters

An implementation of the UPMASK method for performing membership assignment in stellar clusters in R. It is prepared to use photometry and spatial positions, but it can take into account other types of data. The method is able to take into account arbitrary error models, and it is unsupervised, data-driven, physical-model-free and relies on as few assumptions as possible. The approach followed for membership assessment is based on an iterative process, principal component analysis, a clustering algorithm and a kernel density estimation.

211

Chemometrics and Computational Physics

varSelRF

Variable Selection using Random Forests

Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly-correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).

212

Chemometrics and Computational Physics

webchem

Chemical Information from the Web

Chemical information from around the web. This package interacts with a suite of web APIs for chemical information.

213

Chemometrics and Computational Physics

WilcoxCV

Wilcoxon-based variable selection in cross-validation

This package provides functions to perform fast variable selection based on the Wilcoxon rank sum test in the cross-validation or Monte Carlo cross-validation settings, for use in microarray-based binary classification.

214

Clinical Trial Design, Monitoring, and Analysis

adaptTest (core)

Adaptive two-stage tests

The functions defined in this program serve for implementing adaptive two-stage tests. Currently, four tests are included: Bauer and Koehne (1994), Lehmacher and Wassmer (1999), Vandemeulebroecke (2006), and the horizontal conditional error function. User-defined tests can also be implemented. Reference: Vandemeulebroecke, An investigation of two-stage tests, Statistica Sinica 2006.
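The combination-test idea behind such two-stage designs can be illustrated with Fisher's product test, which merges the stage-wise p-values p1 and p2 (a simplified sketch without early-stopping boundaries; not this package's interface):

```python
import math

def fisher_combination(p1, p2):
    """Fisher's product test for two p-values: -2*ln(p1*p2) ~ chi-square(4 df)."""
    stat = -2.0 * (math.log(p1) + math.log(p2))
    # chi-square(4 df) survival function in closed form: exp(-x/2) * (1 + x/2),
    # which simplifies here to p1*p2 * (1 - ln(p1*p2))
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)

print(round(fisher_combination(0.04, 0.20), 4))   # 0.0466: significant at the 5% level
```

Adaptive designs such as Bauer-Koehne add stopping boundaries at the interim look on top of this combination rule.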

215

Clinical Trial Design, Monitoring, and Analysis

AGSDest

Estimation in Adaptive Group Sequential Trials

Calculation of repeated confidence intervals as well as confidence intervals based on the stagewise ordering in group sequential designs and adaptive group sequential designs. For adaptive group sequential designs the confidence intervals are based on the conditional rejection probability principle. Currently the procedures do not support the use of futility boundaries or more than one adaptive interim analysis.

216

Clinical Trial Design, Monitoring, and Analysis

asd (core)

Simulations for Adaptive Seamless Designs

Package runs simulations for adaptive seamless designs with and without early outcomes for treatment selection and subpopulation type designs.

217

Clinical Trial Design, Monitoring, and Analysis

asypow

Calculate Power Utilizing Asymptotic Likelihood Ratio Methods

A set of routines written in the S language that calculate power and related quantities utilizing asymptotic likelihood ratio methods.

218

Clinical Trial Design, Monitoring, and Analysis

bcrm (core)

Bayesian Continual Reassessment Method for Phase I DoseEscalation Trials

Implements a wide variety of one and twoparameter Bayesian CRM designs. The program can run interactively, allowing the user to enter outcomes after each cohort has been recruited, or via simulation to assess operating characteristics.

219

Clinical Trial Design, Monitoring, and Analysis

bifactorial (core)

Inferences for bi- and trifactorial trial designs

This package makes global and multiple inferences for given bi- and trifactorial clinical trial designs using bootstrap methods and a classical approach.

220

Clinical Trial Design, Monitoring, and Analysis

binomSamSize

Confidence Intervals and Sample Size Determination for a Binomial Proportion under Simple Random Sampling and Pooled Sampling

A suite of functions to compute confidence intervals and necessary sample sizes for the parameter p of the Bernoulli B(p) distribution under simple random sampling or under pooled sampling. Such computations are of interest, e.g., when investigating the incidence or prevalence in populations. The package contains functions to compute coverage probabilities and coverage coefficients of the provided confidence interval procedures. Sample size calculations are based on expected length.

221

Clinical Trial Design, Monitoring, and Analysis

blockrand (core)

Randomization for block random clinical trials

Create randomizations for block random clinical trials. Can also produce a PDF file of randomization cards.

222

Clinical Trial Design, Monitoring, and Analysis

clinfun (core)

Clinical Trial Design and Data Analysis Functions

Utilities to make your clinical collaborations easier, if not fun. Contains functions for designing studies, such as Simon 2-stage and group sequential designs, and for data analysis, such as the Jonckheere-Terpstra test and estimation of survival quantiles.

223

Clinical Trial Design, Monitoring, and Analysis

clinsig

Clinical Significance Functions

Functions for calculating clinical significance.

224

Clinical Trial Design, Monitoring, and Analysis

clusterPower

Power Calculations for Cluster-Randomized and Cluster-Randomized Crossover Trials

Calculate power for cluster randomized trials (CRTs) that compare two means, two proportions, or two counts using closed-form solutions. In addition, calculate power for cluster randomized crossover trials using Monte Carlo methods. For more information, see Reich et al. (2012) doi:10.1371/journal.pone.0035564.

225

Clinical Trial Design, Monitoring, and Analysis

coin

Conditional Inference Procedures in a Permutation Test Framework

Conditional inference procedures for the general independence problem, including two-sample, K-sample (nonparametric ANOVA), correlation, censored, ordered, and multivariate problems.

226

Clinical Trial Design, Monitoring, and Analysis

conf.design

Construction of factorial designs

This small library contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs.

227

Clinical Trial Design, Monitoring, and Analysis

CRM

Continual Reassessment Method (CRM) for Phase I Clinical Trials

CRM simulator for Phase I Clinical Trials

228

Clinical Trial Design, Monitoring, and Analysis

crmPack

Object-Oriented Implementation of CRM Designs

Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison, or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation, or stopping rules.

229

Clinical Trial Design, Monitoring, and Analysis

CRTSize (core)

Sample Size Estimation Functions for Cluster Randomized Trials

Sample size estimation in cluster (group) randomized trials. Contains traditional power-based methods, empirical smoothing (Rotondi and Donner, 2009), and updated meta-analysis techniques (Rotondi and Donner, 2012).

230

Clinical Trial Design, Monitoring, and Analysis

dfcrm (core)

Dose-finding by the continual reassessment method

This package provides functions to run the CRM and TITE-CRM in phase I trials, and calibration tools for trial planning purposes.

231

Clinical Trial Design, Monitoring, and Analysis

dfped

Extrapolation and Bridging of Adult Information in Early Phase Dose-Finding Paediatrics Studies

The dfped package proposes a unified method for designing and analysing dose-finding trials in paediatrics while bridging information from adults. The dose range can be calculated under three extrapolation methods: linear, allometry, and maturation adjustment, using pharmacokinetic (PK) data; to do this, it is assumed that target exposures are the same in both populations. The working model and prior distribution parameters of the dose-toxicity and dose-efficacy relationships can be obtained from early phase adult toxicity and efficacy data at several dose levels. Priors are incorporated into the dose-finding process through Bayesian model selection or adaptive priors, which facilitates adjusting the amount of prior information to differences between adults and children; this calibrates the model against misspecification when the adult and paediatric data are very different. Users can supply their own Bayesian model written in Stan code; a template of such a model is proposed in the examples of the corresponding R functions. Finally, the package includes a simulation function for one trial or for several trials.

232

Clinical Trial Design, Monitoring, and Analysis

dfpk

Bayesian Dose-Finding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials

Provides statistical methods involving PK measures for the dose-allocation process during Phase I clinical trials. These methods incorporate pharmacokinetics (PK) into dose-finding designs in different ways, including covariate models, dependent-variable models, and hierarchical models. The package provides functions to generate data from several scenarios and to run simulations whose objective is to determine the maximum tolerated dose (MTD).

233

Clinical Trial Design, Monitoring, and Analysis

DoseFinding

Planning and Analyzing Dose Finding Experiments

The DoseFinding package provides functions for the design and analysis of dose-finding experiments (with a focus on pharmaceutical Phase II clinical trials). It provides functions for: multiple contrast tests, fitting nonlinear dose-response models (using Bayesian and non-Bayesian estimation), calculating optimal designs, and an implementation of the MCP-Mod methodology.

234

Clinical Trial Design, Monitoring, and Analysis

epibasix

Elementary Epidemiological Functions for Epidemiology and Biostatistics

This package contains elementary tools for the analysis of common epidemiological problems, ranging from sample size estimation, through 2x2 contingency table analysis, to basic measures of agreement (kappa, sensitivity/specificity). Appropriate print and summary statements are also provided to facilitate interpretation wherever possible. Source code is commented throughout to facilitate modification. The target audience includes advanced undergraduate and graduate students in epidemiology or biostatistics courses, and clinical researchers.

235

Clinical Trial Design, Monitoring, and Analysis

ewoc

Escalation with Overdose Control

An implementation of a variety of escalation with overdose control designs introduced by Babb, Rogatko and Zacks (1998) doi:10.1002/(SICI)1097-0258(19980530)17:10<1103::AID-SIM793>3.0.CO;2-9. It calculates the next dose as a clinical trial proceeds, as well as performing simulations to obtain operating characteristics.

236

Clinical Trial Design, Monitoring, and Analysis

experiment (core)

experiment: R package for designing and analyzing randomized experiments

The package provides various statistical methods for designing and analyzing randomized experiments. One main functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides tools to analyze various randomized experiments, including cluster randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.

237

Clinical Trial Design, Monitoring, and Analysis

FrF2

Fractional Factorial Designs with 2-Level Factors

Regular and non-regular fractional factorial 2-level designs can be created. Furthermore, analysis tools for fractional factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).

238

Clinical Trial Design, Monitoring, and Analysis

GroupSeq (core)

A GUI-Based Program to Compute Probabilities Regarding Group Sequential Designs

A graphical user interface to compute group sequential designs based on normally distributed test statistics, particularly critical boundaries, power, drift, and confidence intervals of such designs. All computations are based on the alpha spending approach of Lan-DeMets, with various alpha spending functions available to choose from.

239

Clinical Trial Design, Monitoring, and Analysis

gsbDesign

Group Sequential Bayes Design

Group sequential operating characteristics for clinical, Bayesian two-arm trials with known sigma and normal endpoints.

240

Clinical Trial Design, Monitoring, and Analysis

gsDesign (core)

Group Sequential Design

Derives group sequential designs and describes their properties.
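As a minimal sketch of typical usage (assuming the package is installed from CRAN), a symmetric three-look group sequential design can be derived as follows:

```r
library(gsDesign)

# Two-sided symmetric three-look group sequential design with
# one-sided alpha = 0.025 and 90% power (beta = 0.1)
x <- gsDesign(k = 3, test.type = 2, alpha = 0.025, beta = 0.1)

# Efficacy boundaries (z-scale) at each analysis; boundaries are
# highest at the first interim look and decrease toward the final look
x$upper$bound
```

Printing `x` describes the design's properties, including spending, boundary crossing probabilities, and sample size inflation relative to a fixed design.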

241

Clinical Trial Design, Monitoring, and Analysis

HH

Statistical Analysis and Data Display: Heiberger and Holland

Support software for Statistical Analysis and Data Display (Second Edition, Springer, ISBN 9781493921218, 2015) and (First Edition, Springer, ISBN 0387402705, 2004) by Richard M. Heiberger and Burt Holland. This contemporary presentation of statistical methods features extensive use of graphical displays for exploring data and for displaying the analysis. The second edition includes redesigned graphics and additional chapters. The authors emphasize how to construct and interpret graphs, discuss principles of graphical design, and show how accompanying traditional tabular results are used to confirm the visual impressions derived directly from the graphs. Many of the graphical formats are novel and appear here for the first time in print. All chapters have exercises. All functions introduced in the book are in the package. R code for all examples, both graphs and tables, in the book is included in the scripts directory of the package.

242

Clinical Trial Design, Monitoring, and Analysis

Hmisc (core)

Harrell Miscellaneous

Contains many functions useful for data analysis, highlevel graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.

243

Clinical Trial Design, Monitoring, and Analysis

InformativeCensoring

Multiple Imputation for Informative Censoring

Multiple imputation for informative censoring. This package implements two methods: Gamma Imputation from Jackson et al. (2014) doi:10.1002/sim.6274 and Risk Score Imputation from Hsu et al. (2009) doi:10.1002/sim.3480.

244

Clinical Trial Design, Monitoring, and Analysis

ldbounds (core)

Lan-DeMets Method for Group Sequential Boundaries

Computations related to group sequential boundaries. Includes calculation of bounds using the Lan-DeMets alpha spending function approach.

245

Clinical Trial Design, Monitoring, and Analysis

longpower

Sample Size Calculations for Longitudinal Data

The longpower package contains functions for computing power and sample size for linear models of longitudinal data, based on the formulas of Liu and Liang (1997) and Diggle et al. (2002). Both formulas are expressed in terms of marginal model or generalized estimating equation (GEE) parameters. The package also contains functions that translate pilot mixed effects model parameters (e.g. random intercept and/or slope) into marginal model parameters, so that the formulas of Diggle et al. or Liu and Liang can be applied to produce sample size calculations for two-sample longitudinal designs assuming known variance.

246

Clinical Trial Design, Monitoring, and Analysis

MChtest (core)

Monte Carlo hypothesis tests with Sequential Stopping

The package performs Monte Carlo hypothesis tests. It allows a couple of different sequential stopping boundaries (a truncated sequential probability ratio test boundary and a boundary proposed by Besag and Clifford, 1991). Gives valid p-values and confidence intervals on p-values.

247

Clinical Trial Design, Monitoring, and Analysis

MCPMod

Design and Analysis of DoseFinding Studies

Implements a methodology for the design and analysis of dose-response studies that combines aspects of multiple comparison procedures and modeling approaches (Bretz, Pinheiro and Branson, 2005, Biometrics 61, 738-748, doi:10.1111/j.1541-0420.2005.00344.x). The package provides tools for the analysis of dose-finding trials as well as a variety of tools necessary to plan a trial to be conducted with the MCP-Mod methodology. Please note: the 'MCPMod' package will not be developed further; all future development of the MCP-Mod methodology will be done in the 'DoseFinding' R package.

248

Clinical Trial Design, Monitoring, and Analysis

Mediana

Clinical Trial Simulations

Provides a general framework for clinical trial simulations based on the Clinical Scenario Evaluation (CSE) approach. The package supports a broad class of data models (including clinical trials with continuous, binary, survival-type and count-type endpoints as well as multivariate outcomes that are based on combinations of different endpoints), analysis strategies, and commonly used evaluation criteria.

249

Clinical Trial Design, Monitoring, and Analysis

meta

General Package for Meta-Analysis

User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rucker doi:10.1007/978-3-319-21416-0, "Meta-Analysis with R" (2015): fixed effect and random effects meta-analysis; several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); statistical tests and trim-and-fill method to evaluate bias in meta-analysis; import of data from 'RevMan 5'; prediction interval, Hartung-Knapp and Paule-Mandel methods for the random effects model; cumulative meta-analysis and leave-one-out meta-analysis; meta-regression (if R package 'metafor' is installed); generalised linear mixed models (if R packages 'metafor', 'lme4', 'numDeriv', and 'BiasedUrn' are installed).

250

Clinical Trial Design, Monitoring, and Analysis

metafor

Meta-Analysis Package for R

A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L'Abbe, Baujat, GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto's method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted.
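A minimal sketch of a random-effects meta-analysis using the BCG vaccine dataset bundled with the package (assuming metafor is installed):

```r
library(metafor)

# Compute log risk ratios and sampling variances from the 2x2 tables
# of the 13 BCG vaccine trials shipped with the package
dat <- escalc(measure = "RR", ai = tpos, bi = tneg,
              ci = cpos, di = cneg, data = dat.bcg)

# Fit a random-effects model (REML estimation of tau^2)
res <- rma(yi, vi, data = dat, method = "REML")
summary(res)

# Forest plot of the study-level and pooled estimates
forest(res)
```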

251

Clinical Trial Design, Monitoring, and Analysis

metaLik

Likelihood Inference in Meta-Analysis and Meta-Regression Models

First- and higher-order likelihood inference in meta-analysis and meta-regression models.

252

Clinical Trial Design, Monitoring, and Analysis

metasens

Advanced Statistical Methods to Model and Adjust for Bias in Meta-Analysis

The following methods are implemented to evaluate how sensitive the results of a meta-analysis are to potential bias, and to support Schwarzer et al. (2015) doi:10.1007/978-3-319-21416-0, Chapter 5 "Small-Study Effects in Meta-Analysis": the Copas selection model described in Copas & Shi (2001) doi:10.1177/096228020101000402; limit meta-analysis by Rucker et al. (2011) doi:10.1093/biostatistics/kxq046; upper bound for outcome reporting bias by Copas & Jackson (2004) doi:10.1111/j.0006-341X.2004.00161.x.

253

Clinical Trial Design, Monitoring, and Analysis

multcomp

Simultaneous Inference in General Parametric Models

Simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects, and survival models. The package includes demos reproducing analyses presented in the book "Multiple Comparisons Using R" (Bretz, Hothorn, Westfall, 2010, CRC Press).
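A minimal sketch of simultaneous Tukey-type comparisons with glht(), along the lines of the package's standard examples (assuming multcomp is installed):

```r
library(multcomp)

# One-way ANOVA on the built-in warpbreaks data
amod <- aov(breaks ~ tension, data = warpbreaks)

# All pairwise (Tukey) comparisons of the three tension levels,
# with familywise error rate control
tuk <- glht(amod, linfct = mcp(tension = "Tukey"))

summary(tuk)   # adjusted p-values
confint(tuk)   # simultaneous confidence intervals
```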

254

Clinical Trial Design, Monitoring, and Analysis

nppbib

Nonparametric Partially-Balanced Incomplete Block Design Analysis

Implements a nonparametric statistical test for rank or score data from partially-balanced incomplete block design experiments.

255

Clinical Trial Design, Monitoring, and Analysis

PIPS (core)

Predicted Interval Plots

Generate Predicted Interval Plots. Simulate and plot confidence intervals of an effect estimate given observed data and a hypothesis about the distribution of future data.

256

Clinical Trial Design, Monitoring, and Analysis

PowerTOST (core)

Power and Sample Size Based on Two One-Sided t-Tests (TOST) for (Bio)Equivalence Studies

Contains functions to calculate power and sample size for various study designs used for bioequivalence studies; see the function known.designs() for the study designs covered. Moreover, the package contains functions for power and sample size based on 'expected' power in the case of uncertain (estimated) variability and/or uncertain theta0. Added are functions for the power and sample size for the ratio of two means with normally distributed data on the original scale (based on Fieller's confidence ('fiducial') interval). Contains further functions for power and sample size calculations based on a non-inferiority t-test; this is not a TOST procedure, but may be useful if the question of 'non-superiority' must be evaluated. These non-inferiority calculations may also be performed via 'expected' power in the case of uncertain (estimated) variability and/or uncertain theta0. Contains functions power.scABEL() and sampleN.scABEL() to calculate power and sample size for the BE decision via scaled (widened) BE acceptance limits (EMA recommended), based on simulations, as well as functions scABEL.ad() and sampleN.scABEL.ad() to iteratively adjust alpha in order to maintain the overall consumer risk in ABEL studies and adapt the sample size for the loss in power. Contains further functions power.RSABE() and sampleN.RSABE() to calculate power and sample size for the BE decision via the reference-scaled ABE criterion according to the FDA procedure, based on simulations, and functions power.NTIDFDA() and sampleN.NTIDFDA() for the BE decision via the FDA procedure for NTIDs, also based on simulations. Contains further functions power.HVNTID() and sampleN.HVNTID() to calculate power and sample size for the BE decision via the FDA procedure for highly variable NTIDs (see the FDA dabigatran / rivaroxaban guidances). Contains functions for power analysis of a sample size plan for ABE (pa.ABE()), scaled ABE (pa.scABE()), and scaled ABE for NTIDs (pa.NTIDFDA()), analysing power when deviating from the assumptions of the plan. Contains further functions for power calculations / sample size estimation for dose proportionality studies using the power model.

257

Clinical Trial Design, Monitoring, and Analysis

pwr (core)

Basic Functions for Power Analysis

Power analysis functions along the lines of Cohen (1988).
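For example (assuming the package is installed), the sample size per group for detecting a medium effect (Cohen's d = 0.5) with 80% power in a two-sample t-test:

```r
library(pwr)

# Solve for n given effect size, significance level, and power;
# leaving n unspecified tells pwr.t.test to compute it
res <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
                  type = "two.sample", alternative = "two.sided")

res$n  # approximately 64 subjects per group
```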

258

Clinical Trial Design, Monitoring, and Analysis

PwrGSD (core)

Power in a Group Sequential Design

Tools for the evaluation of interim analysis plans for sequentially monitored trials with a survival endpoint; tools to construct efficacy and futility boundaries and to derive the power of a sequential design at a specified alternative; and a template for evaluating the performance of candidate plans at a set of time-varying alternatives.

259

Clinical Trial Design, Monitoring, and Analysis

qtlDesign (core)

Design of QTL experiments

Tools for the design of QTL experiments

260

Clinical Trial Design, Monitoring, and Analysis

rmeta

Meta-analysis

Functions for simple fixed and random effects meta-analysis for two-sample comparisons and cumulative meta-analyses. Draws standard summary plots and funnel plots, and computes summaries and tests for association and heterogeneity.

261

Clinical Trial Design, Monitoring, and Analysis

samplesize

Sample Size Calculation for Various t-Tests and Wilcoxon-Test

Computes sample size for Student's t-test and for the Wilcoxon-Mann-Whitney test for categorical data. The t-test function allows paired and unpaired (balanced / unbalanced) designs as well as homogeneous and heterogeneous variances. The Wilcoxon function allows for ties.

262

Clinical Trial Design, Monitoring, and Analysis

seqmon (core)

Group Sequential Design Class for Clinical Trials

An S4 class object for creating and managing group sequential designs. It calculates the efficacy and futility boundaries at each look, and allows modifying the design and tracking the design update history.

263

Clinical Trial Design, Monitoring, and Analysis

speff2trial (core)

Semiparametric efficient estimation for a two-sample treatment effect

The package performs estimation and testing of the treatment effect in a two-group randomized clinical trial with a quantitative, dichotomous, or right-censored time-to-event endpoint. The method improves efficiency by leveraging baseline predictors of the endpoint. The inverse probability weighting technique of Robins, Rotnitzky, and Zhao (JASA, 1994) is used to provide unbiased estimation when the endpoint is missing at random.

264

Clinical Trial Design, Monitoring, and Analysis

ssanv

Sample Size Adjusted for Nonadherence or Variability of Input Parameters

A set of functions to calculate sample size for two-sample difference-in-means tests. Adjusts for either nonadherence or the variability that comes from using data to estimate parameters.

265

Clinical Trial Design, Monitoring, and Analysis

survival (core)

Survival Analysis

Contains the core survival analysis routines, including the definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
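A minimal sketch of the core workflow on the lung cancer data bundled with the package (survival ships with R, so no extra installation is needed):

```r
library(survival)

# Kaplan-Meier survival curves stratified by sex
fit <- survfit(Surv(time, status) ~ sex, data = lung)
summary(fit, times = c(180, 360))  # survival at roughly 6 and 12 months

# Cox proportional hazards model with age and sex as covariates
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox)  # hazard ratios via exp(coef)
```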

266

Clinical Trial Design, Monitoring, and Analysis

TEQR (core)

Target Equivalence Range Design

The TEQR package contains software to calculate the operating characteristics for the TEQR and ACT designs. The TEQR (toxicity equivalence range) design is a toxicity-based cumulative cohort design with added safety rules. The ACT (activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules; its unique feature is that dose is escalated based on lack of activity rather than lack of toxicity, and is de-escalated only if an unacceptable level of toxicity is experienced.

267

Clinical Trial Design, Monitoring, and Analysis

ThreeArmedTrials

Design and Analysis of Clinical Non-Inferiority or Superiority Trials with Active and Placebo Control

Design and analyze three-arm non-inferiority or superiority trials which follow a gold-standard design, i.e. trials with an experimental treatment, an active control, and a placebo control.

268

Clinical Trial Design, Monitoring, and Analysis

ThreeGroups

ML Estimator for Baseline-Placebo-Treatment (Three-Group) Experiments

Implements the Maximum Likelihood estimator for baseline, placebo, and treatment group (three-group) experiments with noncompliance proposed by Gerber, Green, Kaplan, and Kern (2010).

269

Clinical Trial Design, Monitoring, and Analysis

TrialSize (core)

R Functions in Chapters 3, 4, 6, 7, 9, 10, 11, 12, 14, and 15

Functions and examples from Sample Size Calculation in Clinical Research.

270

Cluster Analysis & Finite Mixture Models

AdMit

Adaptive Mixture of Student-t Distributions

Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function, as described in Ardia et al. (2009) doi:10.18637/jss.v029.i03. The mixture approximation can then be used as the importance density in importance sampling, or as the candidate density in the Metropolis-Hastings algorithm, to obtain quantities of interest for the target density itself.

271

Cluster Analysis & Finite Mixture Models

ADPclust

Fast Clustering Using Adaptive Density Peak Detection

An implementation of the ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work builds and improves upon the idea of Rodriguez and Laio (2014) doi:10.1126/science.1242072. ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid selection and parameter optimization algorithm, which finds the number of clusters and the cluster centroids by comparing average silhouettes over a grid of test clustering results. It also includes an interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional density-distance plot. The research article associated with this package is Wang, Xiao-Feng, and Yifan Xu (2015) doi:10.1177/0962280215609948, "Fast clustering using adaptive density peak detection", Statistical Methods in Medical Research. url: http://smm.sagepub.com/content/early/2015/10/15/0962280215609948.abstract.

272

Cluster Analysis & Finite Mixture Models

amap

Another Multidimensional Analysis Package

Tools for clustering and principal component analysis (with robust methods and parallelized functions).

273

Cluster Analysis & Finite Mixture Models

apcluster

Affinity Propagation Clustering

Implements Affinity Propagation clustering introduced by Frey and Dueck (2007) doi:10.1126/science.1136800. The algorithms are largely analogous to the ‘Matlab’ code published by Frey and Dueck. The package further provides leveraged affinity propagation and an algorithm for exemplarbased agglomerative clustering that can also be used to join clusters obtained from affinity propagation. Various plotting functions are available for analyzing clustering results.

274

Cluster Analysis & Finite Mixture Models

BayesLCA

Bayesian Latent Class Analysis

Bayesian Latent Class Analysis using several different methods.

275

Cluster Analysis & Finite Mixture Models

bayesm

Bayesian Inference for Marketing/MicroEconometrics

Covers many important models used in marketing and microeconometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choicebased conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009) For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non and SemiParametric Methods and Applications (Princeton U Press 2014).

276

Cluster Analysis & Finite Mixture Models

bayesMCClust

Mixtures-of-Experts Markov Chain Clustering and Dirichlet Multinomial Clustering

This package provides various Markov chain Monte Carlo (MCMC) samplers for model-based clustering of discrete-valued time series obtained by observing a categorical variable with several states, within a Bayesian approach. In order to analyze group membership, an extension of these approaches is also provided, formulating a probabilistic model for the latent group indicators within the Bayesian classification rule using a multinomial logit model.

277

Cluster Analysis & Finite Mixture Models

bayesmix

Bayesian Mixture Models with JAGS

The fitting of finite mixture models of univariate Gaussian distributions using JAGS within a Bayesian framework is provided.

278

Cluster Analysis & Finite Mixture Models

bclust

Bayesian Hierarchical Clustering Using Spike and Slab Models

Builds a dendrogram using the log posterior as a natural, model-defined distance while weighting the clustering variables. It can also compute equivalent Bayesian discrimination probabilities. The adopted method suits the small-sample, large-dimension setting. Model parameter estimation may be difficult, depending on the data structure and the chosen distribution family.

279

Cluster Analysis & Finite Mixture Models

bgmm

Gaussian Mixture Modeling Algorithms and the BeliefBased Mixture Modeling

Two partially supervised mixture modeling methods, soft-label and belief-based modeling, are implemented. For completeness, the package is also equipped with the functionality of unsupervised, semi-supervised, and fully supervised mixture modeling. The package can also be applied to select the best-fitting model from a set of models with different component numbers or constraints on their structures. For a detailed introduction see: Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software doi:10.18637/jss.v047.i03.

280

Cluster Analysis & Finite Mixture Models

biclust

BiCluster Algorithms

The main function biclust provides several algorithms to find biclusters in two-dimensional data: Cheng and Church, Spectral, Plaid Model, Xmotifs, and Bimax. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.

281

Cluster Analysis & Finite Mixture Models

Bmix

Bayesian Sampling for Stick-Breaking Mixtures

This is a bare-bones implementation of sampling algorithms for a variety of Bayesian stick-breaking (marginally DP) mixture models, including particle learning and Gibbs sampling for static DP mixtures, particle learning for dynamic BAR stick-breaking, and DP mixture regression. The software is designed to be easy to customize to suit different situations and for experimentation with stick-breaking models. Since particles are repeatedly copied, it is not an especially efficient implementation.

282

Cluster Analysis & Finite Mixture Models

bmixture

Bayesian Estimation for Finite Mixture of Distributions

Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements from the Bayesian literature on finite mixtures of distributions, including Mohammadi et al. (2013) doi:10.1007/s00180-012-0323-3 and Mohammadi and Salehi-Rad (2012) doi:10.1080/03610918.2011.588358.

283

Cluster Analysis & Finite Mixture Models

cba

Clustering for Business Analytics

Implements clustering techniques such as Proximus and Rock, utility functions for efficient computation of cross distances and data manipulation.

284

Cluster Analysis & Finite Mixture Models

cclust

Convex Clustering Methods and Clustering Indexes

Convex Clustering methods, including the K-means algorithm, Online Update algorithm (Hard Competitive Learning) and Neural Gas algorithm (Soft Competitive Learning), and calculation of several indexes for finding the number of clusters in a data set.

285

Cluster Analysis & Finite Mixture Models

CEC

Cross-Entropy Clustering

Cross-Entropy Clustering (CEC) divides the data into Gaussian-type clusters. It performs automatic reduction of unnecessary clusters, while at the same time allowing the simultaneous use of various types of Gaussian mixture models.

286

Cluster Analysis & Finite Mixture Models

CHsharp

Choi and Hall Style Data Sharpening

Functions for use in perturbing data prior to use of nonparametric smoothers and clustering.

287

Cluster Analysis & Finite Mixture Models

clue

Cluster Ensembles

CLUster Ensembles.

288

Cluster Analysis & Finite Mixture Models

cluster (core)

“Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al.

Methods for cluster analysis. A much extended version of the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.
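A minimal sketch of the package's workhorse functions (assuming 'cluster' is installed; it ships with base R distributions, and the iris data set is used purely for illustration):

```r
library(cluster)

x <- iris[, 1:4]                 # four numeric measurements
fit <- pam(x, k = 3)             # partitioning around medoids (k-medoids)
table(fit$clustering, iris$Species)   # compare clusters with known species
plot(silhouette(fit))            # silhouette widths per cluster
```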

289

Cluster Analysis & Finite Mixture Models

clusterfly

Explore clustering interactively using R and GGobi

Visualise clustering algorithms with GGobi. Contains both general code for visualising clustering results and specific visualisations for model-based, hierarchical and SOM clustering.

290

Cluster Analysis & Finite Mixture Models

clusterGeneration

Random Cluster Generation (with Specified Degree of Separation)

We developed the clusterGeneration package to provide functions for generating random clusters, generating random covariance/correlation matrices, calculating a separation index (data and population version) for pairs of clusters or cluster distributions, and 1D and 2D projection plots to visualize clusters. The package also contains a function to generate random clusters based on factorial designs with factors such as degree of separation, number of clusters, number of variables, number of noisy variables.

291

Cluster Analysis & Finite Mixture Models

clusterRepro

Reproducibility of gene expression clusters

A function for validating microarray clusters via reproducibility.

292

Cluster Analysis & Finite Mixture Models

clusterSim

Searching for Optimal Clustering Procedure for a Data Set

Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas, data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) doi:10.1007/BF02294245, HUBERT, L., ARABIE, P. (1985) doi:10.1007/BF01908075, RAND, W.M. (1971) doi:10.1080/01621459.1971.10482356, JAJUGA, K., WALESIAK, M. (2000) doi:10.1007/978-3-642-57280-7_11, MILLIGAN, G.W., COOPER, M.C. (1988) doi:10.1007/BF01897163, CORMACK, R.M. (1971) doi:10.2307/2344237, JAJUGA, K., WALESIAK, M., BAK, A. (2003) doi:10.1007/978-3-642-55721-7_12, CARMONE, F.J., KARA, A., MAXWELL, S. (1999) doi:10.2307/3152003, DAVIES, D.L., BOULDIN, D.W. (1979) doi:10.1109/TPAMI.1979.4766909, CALINSKI, T., HARABASZ, J. (1974) doi:10.1080/03610927408827101, HUBERT, L. (1974) doi:10.1080/01621459.1974.10480191, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) doi:10.1111/1467-9868.00293, KRZANOWSKI, W.J., LAI, Y.T. (1988) doi:10.2307/2531893, BRECKENRIDGE, J.N. (2000) doi:10.1207/S15327906MBR3502_5, WALESIAK, M., DUDEK, A. (2008) doi:10.1007/978-3-540-78246-9_11).

293

Cluster Analysis & Finite Mixture Models

clustMixType

k-Prototypes Clustering for Mixed Variable-Type Data

Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z. Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304, doi:10.1023/A:1009769707641.

294

Cluster Analysis & Finite Mixture Models

clustvarsel

Variable Selection for Gaussian ModelBased Clustering

Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology finds the (locally) optimal subset of variables in a data set that carry group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without subsampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.

295

Cluster Analysis & Finite Mixture Models

clv

Cluster Validation Techniques

Contains most of the popular internal and external cluster validation methods, ready to use with most of the outputs produced by functions from the “cluster” package. Also contains functions and usage examples for a cluster stability approach that can be applied to algorithms implemented in the “cluster” package as well as to user-defined clustering algorithms.

296

Cluster Analysis & Finite Mixture Models

clValid

Validation of Clustering Results

Statistical and biological validation of clustering results.

297

Cluster Analysis & Finite Mixture Models

CoClust

Copula Based Cluster Analysis

Copula Based Cluster Analysis.

298

Cluster Analysis & Finite Mixture Models

compHclust

Complementary Hierarchical Clustering

Performs the complementary hierarchical clustering procedure and returns X’ (the expected residual matrix) and a vector of the relative gene importances.

299

Cluster Analysis & Finite Mixture Models

dbscan

Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms

A fast reimplementation of several density-based algorithms of the DBSCAN family for spatial data. Includes the DBSCAN (density-based spatial clustering of applications with noise) and OPTICS (ordering points to identify the clustering structure) clustering algorithms, HDBSCAN (hierarchical DBSCAN) and the LOF (local outlier factor) algorithm. The implementation uses the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.
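A minimal sketch of the workflow described above (assuming the 'dbscan' package is installed; the eps value below is illustrative and would normally be read off the kNN-distance plot):

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])
kNNdistplot(x, k = 5)               # choose eps near the "knee" of this curve
db <- dbscan(x, eps = 0.5, minPts = 5)   # eps = 0.5 is an illustrative choice
db$cluster                           # cluster labels; 0 marks noise points
```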

300

Cluster Analysis & Finite Mixture Models

dendextend

Extending ‘Dendrogram’ Functionality in R

Offers a set of functions for extending ‘dendrogram’ objects in R, letting you visualize and compare trees of ‘hierarchical clusterings’. You can (1) adjust a tree’s graphical parameters (the color, size, type, etc. of its branches, nodes and labels) and (2) visually and statistically compare different ‘dendrograms’ to one another.
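A minimal sketch of adjusting a dendrogram's graphical parameters, point (1) above (assuming 'dendextend' is installed; the USArrests data and k = 4 cut are illustrative):

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(USArrests), method = "average"))
dend <- color_branches(dend, k = 4)   # color branches by a 4-cluster cut
plot(dend)
```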

301

Cluster Analysis & Finite Mixture Models

depmix

Dependent Mixture Models

Fits (multi-group) mixtures of latent or hidden Markov models on mixed categorical and continuous (time-series) data. The Rdonlp2 package can optionally be used for optimization of the log-likelihood and is available from R-Forge.

302

Cluster Analysis & Finite Mixture Models

depmixS4

Dependent Mixture Models - Hidden Markov Models of GLMs and Other Distributions in S4

Fits latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models.

303

Cluster Analysis & Finite Mixture Models

dpmixsim

Dirichlet Process Mixture model simulation for clustering and image segmentation

The package implements a Dirichlet Process Mixture (DPM) model for clustering and image segmentation. The DPM model is a Bayesian nonparametric methodology that relies on MCMC simulations for exploring mixture models with an unknown number of components. The code implements conjugate models with normal structure (conjugate normal-normal DP mixture model). The package’s applications are oriented towards the classification of magnetic resonance images according to tissue type or region of interest.

304

Cluster Analysis & Finite Mixture Models

dynamicTreeCut

Methods for Detection of Clusters in Hierarchical Clustering Dendrograms

Contains methods for detection of clusters in hierarchical clustering dendrograms.

305

Cluster Analysis & Finite Mixture Models

e1071

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
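Among the clustering tools listed, fuzzy c-means is a common entry point. A minimal sketch (assuming 'e1071' is installed; iris and the fuzzifier m = 2 are illustrative choices):

```r
library(e1071)

x <- as.matrix(iris[, 1:4])
fc <- cmeans(x, centers = 3, m = 2)   # fuzzy c-means with fuzzifier m = 2
head(fc$membership)                    # soft (fuzzy) cluster memberships
fc$cluster                             # hard assignment by largest membership
```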

306

Cluster Analysis & Finite Mixture Models

edci

Edge Detection and Clustering in Images

Detection of edge points in images based on the difference of two asymmetric M-kernel estimators. Linear and circular regression clustering based on redescending M-estimators. Detection of linear edges in images.

307

Cluster Analysis & Finite Mixture Models

EMCluster

EM Algorithm for ModelBased Clustering of Finite Mixture Gaussian Distribution

EM algorithms and several efficient initialization methods for model-based clustering of finite mixture Gaussian distributions with unstructured dispersion, in both unsupervised and semi-supervised learning.

308

Cluster Analysis & Finite Mixture Models

evclust

Evidential Clustering

Various clustering algorithms that produce a credal partition, i.e., a set of Dempster-Shafer mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership uncertainty of the objects. The algorithms are: Evidential c-Means (ECM), Relational Evidential c-Means (RECM), Constrained Evidential c-Means (CECM), EVCLUS and EK-NNclus.

309

Cluster Analysis & Finite Mixture Models

FactoClass

Combination of Factorial Methods and Cluster Analysis

Some functions of ‘ade4’ and ‘stats’ are combined in order to obtain a partition of the rows of a data table, with columns representing variables of quantitative, qualitative or frequency scales. First, a principal axes method is performed and then a combination of Ward agglomerative hierarchical classification and K-means is applied, using some of the first coordinates obtained from the principal axes method. The function ‘kmeansW’, a modification of ‘kmeans’ programmed in C++, is included in order to permit different weights for the elements to be clustered. Some complementary functions and datasets are included. See, for example: Lebart, L., Piron, M. and Morineau, A. (2006). Statistique exploratoire multidimensionnelle, Dunod, Paris.

310

Cluster Analysis & Finite Mixture Models

fastcluster

Fast Hierarchical Clustering Routines for R and Python

This is a two-in-one package which provides interfaces to both R and Python. It implements fast hierarchical, agglomerative clustering routines. Part of the functionality is designed as a drop-in replacement for existing routines: linkage() in the SciPy package ‘scipy.cluster.hierarchy’, hclust() in R’s ‘stats’ package, and the ‘flashClust’ package. It provides the same functionality with the benefit of a much faster implementation. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide. For information on how to install the Python files, see the file INSTALL in the source distribution.
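Because it is a drop-in replacement, using it from R is just a matter of loading the package before calling hclust(). A minimal sketch (assuming 'fastcluster' is installed; USArrests and average linkage are illustrative):

```r
library(fastcluster)   # masks stats::hclust with the faster implementation

hc <- hclust(dist(USArrests), method = "average")
groups <- cutree(hc, k = 4)   # cutree from 'stats' works on the result as usual
table(groups)
```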

311

Cluster Analysis & Finite Mixture Models

fclust

Fuzzy Clustering

Algorithms for fuzzy clustering, cluster validity indices and plots for cluster validity and visualizing fuzzy clustering results.

312

Cluster Analysis & Finite Mixture Models

FisherEM

The FisherEM algorithm

The FisherEM package provides an efficient algorithm for the unsupervised classification of high-dimensional data. The FisherEM algorithm models and clusters the data in a discriminative and low-dimensional latent subspace. It also provides a low-dimensional representation of the clustered data. A sparse version of the FisherEM algorithm is also provided.

313

Cluster Analysis & Finite Mixture Models

flashClust

Implementation of optimal hierarchical clustering

Fast implementation of hierarchical clustering.

314

Cluster Analysis & Finite Mixture Models

flexclust (core)

Flexible Cluster Algorithms

The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, bar charts of centroids, …), and bootstrap methods for the analysis of cluster stability.
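A minimal sketch of the kcca framework described above (assuming 'flexclust' is installed; iris and k = 3 are illustrative):

```r
library(flexclust)

x <- as.matrix(iris[, 1:4])
cl <- kcca(x, k = 3, family = kccaFamily("kmeans"))  # k-centroids, k-means family
barchart(cl)    # bar chart of centroids, one of the built-in visualizations
```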

315

Cluster Analysis & Finite Mixture Models

flexCWM

Flexible Cluster-Weighted Modeling

Allows for maximum likelihood fitting of cluster-weighted models, a class of mixtures of regression models with random covariates.

316

Cluster Analysis & Finite Mixture Models

flexmix (core)

Flexible Mixture Modeling

A general framework for finite mixtures of regression models using the EM algorithm is implemented. The package provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.

317

Cluster Analysis & Finite Mixture Models

fpc

Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance-based clustering including the corrected Rand index. Clusterwise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther’s prediction strength, Fang and Wang’s bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

318

Cluster Analysis & Finite Mixture Models

FunCluster

Functional Profiling of Microarray Expression Data

FunCluster performs a functional analysis of microarray expression data based on Gene Ontology & KEGG functional annotations. From expression data and functional annotations FunCluster builds classes of putatively co-regulated biological processes through a specially designed clustering procedure.

319

Cluster Analysis & Finite Mixture Models

funFEM

Clustering in the Discriminative Functional Subspace

The funFEM algorithm (Bouveyron et al., 2014) allows clustering of functional data by modeling the curves within a common and discriminative functional subspace.

320

Cluster Analysis & Finite Mixture Models

funHDDC

Model-based clustering in group-specific functional subspaces

The package provides the funHDDC algorithm (Bouveyron & Jacques, 2011), which allows clustering of functional data by modeling each group within a specific functional subspace.

321

Cluster Analysis & Finite Mixture Models

gamlss.mx

Fitting Mixture Distributions with GAMLSS

The main purpose of this package is to allow fitting of mixture distributions with GAMLSS models.

322

Cluster Analysis & Finite Mixture Models

genie

A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm

A new hierarchical clustering linkage criterion: the Genie algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini index) of the cluster sizes does not increase drastically above a given threshold. Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed, see (Gagolewski et al. 2016a doi:10.1016/j.ins.2016.05.003, 2016b doi:10.1007/978-3-319-45656-0_16) for more details.

323

Cluster Analysis & Finite Mixture Models

GLDEX

Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods

The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method and L-moment matching are also provided. Diagnostics on goodness of fit can be done via Q-Q plots, KS-resample tests and by comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution.

324

Cluster Analysis & Finite Mixture Models

GSM

Gamma Shape Mixture

Implementation of a Bayesian approach for estimating a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions; it is computationally efficient and only requires specifying a prior distribution for a single parameter.

325

Cluster Analysis & Finite Mixture Models

HDclassif

High Dimensional Supervised Classification and Clustering

Discriminant analysis and data clustering methods for high-dimensional data, based on the assumption that high-dimensional data live in different subspaces with low dimensionality, proposing a new parametrization of the Gaussian mixture model which combines the ideas of dimension reduction and constraints on the model.

326

Cluster Analysis & Finite Mixture Models

hybridHclust

Hybrid Hierarchical Clustering

Hybrid hierarchical clustering via mutual clusters. A mutual cluster is a set of points closer to each other than to all other points. Mutual clusters are used to enrich top-down hierarchical clustering.

327

Cluster Analysis & Finite Mixture Models

idendr0

Interactive Dendrograms

Interactive dendrogram that enables the user to select and color clusters, to zoom and pan the dendrogram, and to visualize the clustered data not only in a built-in heat map, but also in ‘GGobi’ interactive plots and user-supplied plots. This is a backport of Qt-based ‘idendro’ (https://github.com/tsieger/idendro) to base R graphics and Tcl/Tk GUI.

328

Cluster Analysis & Finite Mixture Models

isopam

Isopam (Clustering)

Isopam clustering algorithm and utilities. Isopam optimizes clusters and optionally cluster numbers in a brute force style and aims at an optimum separation by all or some descriptors (typically species).

329

Cluster Analysis & Finite Mixture Models

kernlab

Kernel-Based Machine Learning Lab

Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
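A minimal sketch of the spectral clustering function (assuming 'kernlab' is installed; the spirals example data, which k-means cannot separate, ships with the package):

```r
library(kernlab)

data(spirals)                       # two intertwined spirals
sc <- specc(spirals, centers = 2)   # spectral clustering with an RBF kernel
plot(spirals, col = sc)             # color points by cluster assignment
```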

330

Cluster Analysis & Finite Mixture Models

kml

K-Means for Longitudinal Data

An implementation of k-means specifically designed to cluster longitudinal data. It provides facilities to deal with missing values, computes several quality criteria (Calinski and Harabasz, Ray and Turi, Davies and Bouldin, BIC, …) and offers a graphical interface for choosing the ‘best’ number of clusters.

331

Cluster Analysis & Finite Mixture Models

largeVis

High-Quality Visualizations of Large, High-Dimensional Datasets

Implements the largeVis algorithm (see Tang, et al. (2016) doi:10.1145/2872427.2883041) for visualizing very large high-dimensional datasets. Also very fast search for approximate nearest neighbors; outlier detection; and optimized implementations of the HDBSCAN*, DBSCAN and OPTICS clustering algorithms; plotting functions for visualizing the above.

332

Cluster Analysis & Finite Mixture Models

latentnet

Latent Position and Cluster Models for Statistical Networks

Fit and simulate latent position and cluster models for statistical networks.

333

Cluster Analysis & Finite Mixture Models

lcmm

Extended Mixed Models Using Latent Classes and Latent Processes

Estimation of various extensions of mixed models, including latent class mixed models, joint latent class mixed models and mixed models for curvilinear univariate or multivariate longitudinal outcomes, using a maximum likelihood estimation method.

334

Cluster Analysis & Finite Mixture Models

longclust

ModelBased Clustering and Classification for Longitudinal Data

Clustering or classification of longitudinal data based on a mixture of multivariate t or Gaussian distributions with a Cholesky-decomposed covariance structure.

335

Cluster Analysis & Finite Mixture Models

mcclust

Process an MCMC Sample of Clusterings

Implements methods for processing a sample of (hard) clusterings, e.g. the MCMC output of a Bayesian clustering model. Among them are methods that find a single best clustering to represent the sample, which are based on the posterior similarity matrix or a relabelling algorithm.

336

Cluster Analysis & Finite Mixture Models

mclust (core)

Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via the EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
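A minimal sketch of model-based clustering with BIC-driven model selection (assuming 'mclust' is installed; iris is used purely for illustration):

```r
library(mclust)

fit <- Mclust(iris[, 1:4])   # covariance model and G chosen by BIC
summary(fit)                  # selected model, G, log-likelihood, BIC
table(fit$classification, iris$Species)   # compare with known species
```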

337

Cluster Analysis & Finite Mixture Models

MetabolAnalyze

Probabilistic latent variable models for metabolomic data

Fits probabilistic principal components analysis, probabilistic principal components and covariates analysis and mixtures of probabilistic principal components models to metabolomic spectral data.

338

Cluster Analysis & Finite Mixture Models

mixAK

Multivariate Normal Mixture Models and Mixtures of Generalized Linear Mixed Models Including Model-Based Clustering

Contains a mixture of statistical methods, including MCMC methods to analyze normal mixtures. Additionally, model-based clustering methods are implemented to perform classification based on (multivariate) longitudinal (or otherwise correlated) data. The basis for such clustering is a mixture of multivariate generalized linear mixed models.

339

Cluster Analysis & Finite Mixture Models

mixdist

Finite Mixture Distribution Models

This package contains functions for fitting finite mixture distribution models to grouped data and conditional data by the method of maximum likelihood using a combination of a Newton-type algorithm and the EM algorithm.

340

Cluster Analysis & Finite Mixture Models

mixer

Random graph clustering

Routines for the analysis (unsupervised clustering) of networks using mixtures of Erdos-Renyi random graphs.

341

Cluster Analysis & Finite Mixture Models

mixPHM

Mixtures of Proportional Hazard Models

Fits multiple variable mixtures of various parametric proportional hazard models using the EM algorithm. Proportionality restrictions can be imposed on the latent groups and/or on the variables. Several survival distributions can be specified. Missing values and censored values are allowed. Independence is assumed over the single variables.

342

Cluster Analysis & Finite Mixture Models

mixRasch

Mixture Rasch Models with JMLE

Estimates Rasch models and mixture Rasch models, including the dichotomous Rasch model, the rating scale model, and the partial credit model.

343

Cluster Analysis & Finite Mixture Models

mixreg

Functions to fit mixtures of regressions

Fits mixtures of (possibly multivariate) regressions (which has been described as doing ANCOVA when you don’t know the levels).

344

Cluster Analysis & Finite Mixture Models

MixSim

Simulating Data to Study Performance of Clustering Algorithms

The utility of this package is in simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of ‘MixSim’, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.

345

Cluster Analysis & Finite Mixture Models

mixsmsn

Fitting Finite Mixture of Scale Mixture of Skew-Normal Distributions

Functions to fit finite mixtures of scale mixtures of skew-normal (FM-SMSN) distributions.

346

Cluster Analysis & Finite Mixture Models

mixtools

Tools for Analyzing Finite Mixture Models

Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
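A minimal sketch of fitting a univariate normal mixture with EM (assuming 'mixtools' is installed; the faithful eruptions data, which is clearly bimodal, ships with base R):

```r
library(mixtools)

data(faithful)
em <- normalmixEM(faithful$waiting, k = 2)  # two-component normal mixture
em$lambda   # estimated mixing weights
em$mu       # estimated component means
em$sigma    # estimated component standard deviations
```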

347

Cluster Analysis & Finite Mixture Models

mixture

Mixture Models for Clustering and Classification

An implementation of all 14 Gaussian parsimonious clustering models (GPCMs) for model-based clustering and model-based classification.

348

Cluster Analysis & Finite Mixture Models

MOCCA

Multi-objective optimization for collecting cluster alternatives

This package provides methods to analyze cluster alternatives based on multi-objective optimization of cluster validation indices.

349

Cluster Analysis & Finite Mixture Models

movMF

Mixtures of von Mises-Fisher Distributions

Fit and simulate mixtures of von Mises-Fisher distributions.

350

Cluster Analysis & Finite Mixture Models

mritc

MRI Tissue Classification

Various methods for MRI tissue classification.

351

Cluster Analysis & Finite Mixture Models

NbClust

Determining the Best Number of Clusters in a Data Set

Provides 30 indices for determining the optimal number of clusters in a data set and proposes the best clustering scheme to the user from the different results.
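A minimal sketch of running the indices and reading off the recommended cluster count (assuming 'NbClust' is installed; scaled iris data and the search range 2-8 are illustrative):

```r
library(NbClust)

x <- scale(iris[, 1:4])
nb <- NbClust(x, distance = "euclidean", min.nc = 2, max.nc = 8,
              method = "kmeans", index = "all")
nb$Best.nc   # best number of clusters suggested by each index
```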

352

Cluster Analysis & Finite Mixture Models

nor1mix

Normal (1d) Mixture Models (S3 Classes and Methods)

One-dimensional Normal mixture model classes for, e.g., density estimation or clustering algorithm research and teaching; providing the widely used Marron-Wand densities. Efficient random number generation and graphics; now fitting to data by ML (Maximum Likelihood) or EM estimation.

353

Cluster Analysis & Finite Mixture Models

optpart

Optimal Partitioning of Similarity Relations

Contains a set of algorithms for creating partitions and coverings of objects largely based on operations on (dis)similarity relations (or matrices). There are several iterative reassignment algorithms optimizing different goodness-of-clustering criteria. In addition, there are covering algorithms ‘clique’ which derives maximal cliques, and ‘maxpact’ which creates a covering of maximally compact sets. Graphical analyses and conversion routines are also included.

354

Cluster Analysis & Finite Mixture Models

ORIClust

Order-restricted Information Criterion-based Clustering Algorithm

ORIClust is a user-friendly R-based software package for gene clustering. Clusters are given by genes matched to prespecified profiles across various ordered treatment groups. It is particularly useful for analyzing data obtained from short time-course or dose-response microarray experiments.

355

Cluster Analysis & Finite Mixture Models

pdfCluster

Cluster analysis via nonparametric density estimation

The package performs cluster analysis via nonparametric density estimation. Operationally, the kernel method is used throughout to estimate the density. Diagnostic methods for evaluating the quality of the clustering are available. The package also includes a routine to estimate the probability density function obtained by the kernel method, given a set of data with arbitrary dimensions.

356

Cluster Analysis & Finite Mixture Models

pendensity

Density Estimation with a Penalized Mixture Approach

Estimation of univariate (conditional) densities using penalized B-splines with automatic selection of the optimal smoothing parameter.

357

Cluster Analysis & Finite Mixture Models

pgmm

Parsimonious Gaussian Mixture Models

Carries out model-based clustering or classification using parsimonious Gaussian mixture models.

358

Cluster Analysis & Finite Mixture Models

pmclust

Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model

Aims to utilize model-based clustering (unsupervised) for high-dimensional and ultra-large data, especially in a distributed manner. The code employs pbdMPI to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. The implementation defaults to the single-program multiple-data (SPMD) programming model. The code can be executed through pbdMPI and is independent of most MPI applications. See the High Performance Statistical Computing website for more information, documents and examples.

359

Cluster Analysis & Finite Mixture Models

poLCA

Polytomous variable Latent Class Analysis

Latent class analysis and latent class regression models for polytomous outcome variables. Also known as latent structure analysis.

360

Cluster Analysis & Finite Mixture Models

prabclus

Functions for Clustering of Presence-Absence, Abundance and Multilocus Genetic Data

Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures; clustering of presence-absence, abundance and multilocus genetic data for species delimitation; nearest-neighbor based noise detection. Try package?prabclus for an overview.

361

Cluster Analysis & Finite Mixture Models

prcr

Person-Centered Analysis

Provides an easy-to-use yet adaptable set of tools to conduct person-centered analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) doi:10.1002/(SICI)1521-4036(199910)41:6%3C753::AID-BIMJ753%3E3.0.CO;2-K, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure.

362

Cluster Analysis & Finite Mixture Models

PReMiuM

Dirichlet Process Bayesian Clustering, Profile Regression

Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical responses, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can also be included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for postprocessing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection.

363

Cluster Analysis & Finite Mixture Models

profdpm

Profile Dirichlet Process Mixtures

This package facilitates profile inference (inference at the posterior mode) for a class of product partition models (PPM). The Dirichlet process mixture is currently the only available member of this class. These methods search for the maximum a posteriori (MAP) estimate for the data partition in a PPM.

364

Cluster Analysis & Finite Mixture Models

protoclust

Hierarchical Clustering with Prototypes

Performs minimax linkage hierarchical clustering. Every cluster has an associated prototype element that represents that cluster as described in Bien, J., and Tibshirani, R. (2011), “Hierarchical Clustering with Prototypes via Minimax Linkage,” accepted for publication in The Journal of the American Statistical Association, DOI: 10.1198/jasa.2011.tm10183.

365

Cluster Analysis & Finite Mixture Models

psychomix

Psychometric Mixture Models

Psychometric mixture models based on ‘flexmix’ infrastructure. At the moment Rasch mixture models with different parameterizations of the score distribution (saturated vs. mean/variance specification), Bradley-Terry mixture models, and MPT mixture models are implemented. These mixture models can be estimated with or without concomitant variables. See vignette(‘raschmix’, package = ‘psychomix’) for details on the Rasch mixture models.

366

Cluster Analysis & Finite Mixture Models

pvclust

Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling

An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides the AU (approximately unbiased) p-value as well as the BP (bootstrap probability) value for each cluster in a dendrogram.

367

Cluster Analysis & Finite Mixture Models

randomLCA

Random Effects Latent Class Analysis

Fits standard and random effects latent class models. The single-level random effects model is described in Qu et al. doi:10.2307/2533043 and the two-level random effects model in Beath and Heller doi:10.1177/1471082X0800900302. Examples are given for their use in diagnostic testing.

368

Cluster Analysis & Finite Mixture Models

rjags

Bayesian Graphical Models using MCMC

Interface to the JAGS MCMC library.

369

Cluster Analysis & Finite Mixture Models

Rmixmod (core)

Supervised, Unsupervised, Semi-Supervised Classification with MIXture MODelling (Interface of MIXMOD Software)

Interface of MIXMOD software for supervised, unsupervised and semi-supervised classification with MIXture MODelling.

370

Cluster Analysis & Finite Mixture Models

RPMM

Recursively Partitioned Mixture Model

Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.

371

Cluster Analysis & Finite Mixture Models

seriation

Infrastructure for Ordering Objects Using Seriation

Infrastructure for seriation with an implementation of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).

372

Cluster Analysis & Finite Mixture Models

sigclust

Statistical Significance of Clustering

SigClust is a statistical method for testing the significance of clustering results. SigClust can be applied to assess the statistical significance of splitting a data set into two clusters. For more than two clusters, SigClust can be used iteratively.

373

Cluster Analysis & Finite Mixture Models

skmeans

Spherical k-Means Clustering

Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a fixed-point algorithm and an interface to the CLUTO vcluster program.
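The idea behind spherical k-means is simple: project all observations onto the unit sphere and cluster by cosine similarity rather than Euclidean distance. A minimal Python sketch under stated assumptions (toy data and deterministic initialization, both hypothetical; the package's genetic and fixed-point algorithms are more sophisticated):

```python
import numpy as np

def spherical_kmeans(X, k, iters=50):
    """Cluster rows of X by cosine similarity: normalize rows to the
    unit sphere, then alternate assignment and re-normalized mean updates."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto unit sphere
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = (X @ centers.T).argmax(axis=1)       # nearest center by cosine
        for j in range(k):
            if (labels == j).any():
                m = X[labels == j].mean(axis=0)
                centers[j] = m / np.linalg.norm(m)    # re-project the mean
    return labels, centers

# Hypothetical toy data: two tight bundles of directions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([5, 0], 0.1, (20, 2)),
               rng.normal([0, 5], 0.1, (20, 2))])
labels, _ = spherical_kmeans(X, 2)
```

Note that vector length is deliberately ignored: only the direction of each row matters, which is why text-mining applications (tf-idf vectors) favor this variant.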

374

Cluster Analysis & Finite Mixture Models

som

Self-Organizing Map

Self-Organizing Map (with application in gene clustering).

375

Cluster Analysis & Finite Mixture Models

sparcl

Perform sparse hierarchical clustering and sparse k-means clustering

Implements the sparse clustering methods of Witten and Tibshirani (2010): “A framework for feature selection in clustering”; published in Journal of the American Statistical Association 105(490): 713-726.

376

Cluster Analysis & Finite Mixture Models

tclust

Robust Trimmed Clustering

Provides functions for robust trimmed clustering. The methods are described in Garcia-Escudero et al. (2008) doi:10.1214/07-AOS515, Fritz et al. (2012) doi:10.18637/jss.v047.i12 and others.

377

Cluster Analysis & Finite Mixture Models

teigen

ModelBased Clustering and Classification with the Multivariate t Distribution

Fits mixtures of multivariate t-distributions (with eigen-decomposed covariance structure) via the expectation conditional-maximization algorithm under a clustering or classification paradigm.

378

Cluster Analysis & Finite Mixture Models

treeClust

Cluster Distances Through Trees

Create a measure of interpoint dissimilarity useful for clustering mixed data, and, optionally, perform the clustering.

379

Cluster Analysis & Finite Mixture Models

trimcluster

Cluster analysis with trimming

Trimmed k-means clustering.

380

Cluster Analysis & Finite Mixture Models

wle

Weighted Likelihood Estimation

An approach to robustness via weighted likelihood.

381

Differential Equations

adaptivetau

Tau-Leaping Stochastic Simulation

Implements adaptive tau-leaping to approximate the trajectory of a continuous-time stochastic process as described by Cao et al. (2007) The Journal of Chemical Physics doi:10.1063/1.2745299. This package is based upon work supported by NSF DBI-0906041 and NIH K99-GM104158 to Philip Johnson and NIH R01-AI049334 to Rustom Antia.
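Tau-leaping trades the one-reaction-at-a-time exactness of Gillespie's method for speed: over a step of length tau, the number of firings of each reaction is approximated as a Poisson draw in its propensity. A hedged Python sketch for a single decay reaction A -> 0 with a fixed step (the package adapts tau on the fly and handles general reaction networks; this function is illustrative only):

```python
import numpy as np

def tau_leap_decay(n0, c, t_end, tau=0.01, seed=0):
    """Fixed-step tau-leaping for the decay reaction A -> 0: in each
    step of length tau, the number of decays is drawn as Poisson(c*n*tau)."""
    rng = np.random.default_rng(seed)
    n, t = n0, 0.0
    while t < t_end and n > 0:
        n = max(0, n - rng.poisson(c * n * tau))  # leap over many firings at once
        t += tau
    return n

# Starting from 1000 molecules with rate 1, about 1000*exp(-1) ~ 368 remain at t = 1.
final = tau_leap_decay(n0=1000, c=1.0, t_end=1.0)
```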

382

Differential Equations

bvpSolve (core)

Solvers for Boundary Value Problems of Differential Equations

Functions that solve boundary value problems (‘BVP’) of systems of ordinary differential equations (‘ODE’) and differential algebraic equations (‘DAE’). The functions provide an interface to the FORTRAN functions ‘twpbvpC’ and ‘colnew/colsys’, and an R implementation of the shooting method.

383

Differential Equations

cOde

Automated C Code Generation for ‘deSolve’, ‘bvpSolve’ and ‘Sundials’

Generates all necessary C functions allowing the user to work with the compiled-code interface of ode() and bvptwp(). The implementation supports “forcings” and “events”. Also provides functions to symbolically compute Jacobians, sensitivity equations and adjoint sensitivities, which form the basis for sensitivity analysis. As an alternative to ‘deSolve’, the Sundials ‘CVODES’ solver is implemented for computation of model sensitivities.

384

Differential Equations

CollocInfer

Collocation Inference for Dynamic Systems

These functions implement collocation inference for continuous-time and discrete-time stochastic processes. They provide model-based smoothing, gradient-matching, generalized profiling and forward prediction error methods.

385

Differential Equations

deSolve (core)

Solvers for Initial Value Problems of Differential Equations (‘ODE’, ‘DAE’, ‘DDE’)

Functions that solve initial value problems of a system of first-order ordinary differential equations (‘ODE’), of partial differential equations (‘PDE’), of differential algebraic equations (‘DAE’), and of delay differential equations. The functions provide an interface to the FORTRAN functions ‘lsoda’, ‘lsodar’, ‘lsode’, ‘lsodes’ of the ‘ODEPACK’ collection, to the FORTRAN functions ‘dvode’, ‘zvode’ and ‘daspk’, and a C implementation of solvers of the ‘Runge-Kutta’ family with fixed or variable time steps. The package contains routines designed for solving ‘ODEs’ resulting from 1D, 2D and 3D partial differential equations (‘PDE’) that have been converted to ‘ODEs’ by numerical differencing.
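As a toy illustration of what an initial-value ODE solver does, here is a fixed-step classical Runge-Kutta (RK4) integrator in Python; this is a sketch only, while the FORTRAN solvers listed above add adaptive step-size control, stiffness handling and much more:

```python
def rk4(f, y0, t0, t1, n):
    """Integrate dy/dt = f(t, y) from t0 to t1 with n fixed RK4 steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)  # weighted slope average
        t += h
    return y

# Exponential decay dy/dt = -y from y(0) = 1 has exact solution y(1) = exp(-1).
approx = rk4(lambda t, y: -y, 1.0, 0.0, 1.0, 100)
```

With 100 steps the RK4 global error here is far below 1e-8, which is why fourth-order methods remain the workhorse for non-stiff problems.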

386

Differential Equations

deTestSet

Test Set for Differential Equations

Solvers and test set for stiff and nonstiff differential equations, and differential algebraic equations.

387

Differential Equations

dMod

Dynamic Modeling and Parameter Estimation in ODE Models

The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives.

388

Differential Equations

ecolMod

“A practical guide to ecological modelling - using R as a simulation platform”

Figures, data sets and examples from the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter MJ Herman (2009), Springer. All figures from chapter x can be generated by “demo(chapx)”, where x = 1 to 11. The R scripts of the model examples discussed in the book are in subdirectory “examples”, ordered per chapter. Solutions to model projects are in the same subdirectories.

389

Differential Equations

FME

A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis

Provides functions to help in fitting models to data and to perform Monte Carlo, sensitivity and identifiability analysis. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package ‘deSolve’ or a steady-state solver from package ‘rootSolve’. However, the methods can also be used with other types of functions.

390

Differential Equations

GillespieSSA

Gillespie’s Stochastic Simulation Algorithm (SSA)

GillespieSSA provides a simple-to-use, intuitive, and extensible interface to several stochastic simulation algorithms for generating simulated trajectories of finite-population continuous-time models. Currently it implements Gillespie’s exact stochastic simulation algorithm (Direct method) and several approximate methods (Explicit tau-leap, Binomial tau-leap, and Optimized tau-leap). The package also contains a library of template models that can be run as demo models and can easily be customized and extended. Currently the following models are included: decaying-dimerization reaction set, linear chain system, logistic growth model, Lotka predator-prey model, Rosenzweig-MacArthur predator-prey model, Kermack-McKendrick SIR model, and a metapopulation SIRS model.
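The Direct method mentioned above is short to state: sample an exponential waiting time from the total propensity, then choose which reaction fires with probability proportional to its propensity. A minimal Python sketch for the single-reaction decay A -> 0 (with one reaction the selection step is trivial; this is an illustration of the algorithm, not the package's implementation):

```python
import random

def ssa_decay(n0, c, t_end, seed=42):
    """Gillespie direct method for A -> 0 with rate constant c;
    returns the population remaining at time t_end."""
    random.seed(seed)
    t, n = 0.0, n0
    while n > 0:
        a = c * n                    # total propensity (only one reaction)
        t += random.expovariate(a)   # exponential waiting time to next event
        if t > t_end:
            break
        n -= 1                       # fire the decay reaction
    return n

# Starting from 1000 molecules, roughly 1000*exp(-1) ~ 368 remain at t = 1.
final = ssa_decay(n0=1000, c=1.0, t_end=1.0)
```

Unlike tau-leaping, every single reaction event is simulated, which is exact but slow for large populations.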

391

Differential Equations

mkin

Kinetic Evaluation of Chemical Degradation Data

Calculation routines based on the FOCUS Kinetics Report (2006, 2014). Includes a function for conveniently defining differential equation models, model solution based on eigenvalues if possible or using numerical solvers, and a choice of the optimisation methods made available by the ‘FME’ package. If a C compiler (on Windows: ‘Rtools’) is installed, differential equation models are solved using compiled C functions. Please note that no warranty is implied for correctness of results or fitness for a particular purpose.

392

Differential Equations

nlmeODE

Nonlinear mixed-effects modelling in nlme using differential equations

This package combines the odesolve and nlme packages for mixed-effects modelling using differential equations.

393

Differential Equations

odeintr

C++ ODE Solvers Compiled on-Demand

Wraps the Boost odeint library for integration of differential equations.

394

Differential Equations

PBSddesolve

Solver for Delay Differential Equations

Routines for solving systems of delay differential equations by interfacing numerical routines written by Simon N. Wood, with contributions by Benjamin J. Cairns. These numerical routines first appeared in Simon Wood’s ‘solv95’ program. This package includes a vignette and a complete user’s guide. ‘PBSddesolve’ originally appeared on CRAN under the name ‘ddesolve’. That version is no longer supported. The current name emphasizes a close association with other PBS packages, particularly ‘PBSmodelling’.

395

Differential Equations

PBSmodelling

GUI Tools Made Easy: Interact with Models and Explore Data

Provides software to facilitate the design, testing, and operation of computer models. It focuses particularly on tools that make it easy to construct and edit a customized graphical user interface (GUI). Although our simplified GUI language depends heavily on the R interface to the Tcl/Tk package, a user does not need to know Tcl/Tk. Examples illustrate models built with other R packages, including PBSmapping, PBSddesolve, and BRugs. A complete user’s guide ‘PBSmodellingUG.pdf’ shows how to use this package effectively.

396

Differential Equations

phaseR

Phase Plane Analysis of One- and Two-Dimensional Autonomous ODE Systems

phaseR is an R package for the qualitative analysis of one- and two-dimensional autonomous ODE systems, using phase plane methods. Programs are available to identify and classify equilibrium points, plot the direction field, and plot trajectories for multiple initial conditions. In the one-dimensional case, a program is also available to plot the phase portrait, whilst in the two-dimensional case a program is additionally available to plot nullclines. Many example systems are provided for the user.

397

Differential Equations

pomp

Statistical Inference for Partially Observed Markov Processes

Tools for working with partially observed Markov process (POMP) models (also known as stochastic dynamical systems, hidden Markov models, and nonlinear, non-Gaussian, state-space models). The package provides facilities for implementing POMP models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. It is also a versatile platform for implementation of inference methods for general POMP models.

398

Differential Equations

pracma

Practical Numerical Math Functions

Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses ‘MATLAB’ function names where appropriate to simplify porting.

399

Differential Equations

primer

Functions and data for A Primer of Ecology with R

Functions are primarily for systems of ordinary differential equations, difference equations, and eigenanalysis and projection of demographic matrices; data are for examples.

400

Differential Equations

QPot

Quasi-Potential Analysis for Stochastic Differential Equations

Tools to 1) simulate and visualize stochastic differential equations and 2) determine stability of equilibria using the ordered-upwind method to compute the quasi-potential.

401

Differential Equations

ReacTran

Reactive Transport Modelling in 1d, 2d and 3d

Routines for developing models that describe reaction and advective-diffusive transport in one, two or three dimensions. Includes transport routines in porous media, in estuaries, and in bodies with variable shape.

402

Differential Equations

rODE

Ordinary Differential Equation (ODE) Solvers Written in R Using S4 Classes

Show physics, math and engineering students how an ODE solver is made and how effective R classes can be for the construction of the equations that describe natural phenomena. Inspiration for this work comes from the book on “Computer Simulations in Physics” by Harvey Gould, Jan Tobochnik, and Wolfgang Christian. Book link: http://www.compadre.org/osp/items/detail.cfm?ID=7375.

403

Differential Equations

rodeo

A Code Generator for ODEBased Models

Provides an R6 class and several utility methods to facilitate the implementation of models based on ordinary differential equations. The heart of the package is a code generator that creates compiled ‘Fortran’ (or ‘R’) code which can be passed to a numerical solver. There is direct support for solvers contained in packages ‘deSolve’ and ‘rootSolve’.

404

Differential Equations

rootSolve (core)

Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations

Routines to find the root of nonlinear functions, and to perform steady-state and equilibrium analysis of ordinary differential equations (ODE). Includes routines that: (1) generate gradient and Jacobian matrices (full and banded), (2) find roots of nonlinear equations by the ‘Newton-Raphson’ method, (3) estimate steady-state conditions of a system of (differential) equations in full, banded or sparse form, using the ‘Newton-Raphson’ method or by dynamically running, (4) solve the steady-state conditions for uni- and multi-component 1D, 2D, and 3D partial differential equations that have been converted to ordinary differential equations by numerical differencing (using the method-of-lines approach). Includes Fortran code.
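The Newton-Raphson iteration at the heart of such root finders is just x <- x - f(x)/f'(x), repeated until the step becomes small. A scalar Python sketch (the package works on full nonlinear systems with banded and sparse Jacobians; this minimal version assumes a supplied derivative):

```python
def newton_raphson(f, fprime, x0, tol=1e-12, max_iter=50):
    """Find a root of f near x0 by Newton-Raphson iteration."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:          # converged: last step was tiny
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Root of f(x) = x^2 - 2 starting from x0 = 1 converges to sqrt(2).
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
```

Convergence is quadratic near a simple root, which is why a handful of iterations typically suffices.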

405

Differential Equations

rpgm

Fast Simulation of Normal/Exponential Random Variables and Stochastic Differential Equations / Poisson Processes

Faster simulation of some random variables than the usual native functions, including rnorm() and rexp(), using the Ziggurat method; reference: Marsaglia, G., Tsang, W. W., et al. (2000) doi:10.18637/jss.v005.i08. Also fast simulation of stochastic differential equations / Poisson processes.

406

Differential Equations

scaRabee

Optimization Toolkit for Pharmacokinetic-Pharmacodynamic Models

scaRabee is a port of the Scarabee toolkit originally written as a Matlab-based application. It provides a framework for simulation and optimization of pharmacokinetic-pharmacodynamic models at the individual and population level. It is built on top of the neldermead package, which provides the direct search algorithm proposed by Nelder and Mead for model optimization.

407

Differential Equations

sde (core)

Simulation and Inference for Stochastic Differential Equations

Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 9780387758381, Springer, NY.

408

Differential Equations

Sim.DiffProc

Simulation of Diffusion Processes

A package for symbolic and numerical computations on scalar and multivariate systems of stochastic differential equations, in both the Ito and Stratonovich forms. It provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of these systems, including statistical analysis of SDEs with parallel Monte Carlo and moment-equation methods. It has enabled many researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulation of the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996).
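The simplest scheme for simulating such systems is Euler-Maruyama: discretize dX = mu(X) dt + sigma(X) dW as X += mu(X) dt + sigma(X) sqrt(dt) Z with Z standard normal. A hedged Python sketch for geometric Brownian motion (illustrative only; the package offers many schemes and both the Ito and Stratonovich calculi):

```python
import math
import random

def euler_maruyama_gbm(x0, mu, sigma, t_end, n, seed=0):
    """Simulate dX = mu*X dt + sigma*X dW with n Euler-Maruyama steps."""
    random.seed(seed)
    dt = t_end / n
    x = x0
    for _ in range(n):
        z = random.gauss(0.0, 1.0)                        # standard normal shock
        x += mu * x * dt + sigma * x * math.sqrt(dt) * z  # drift + diffusion increment
    return x

x_end = euler_maruyama_gbm(x0=1.0, mu=0.05, sigma=0.2, t_end=1.0, n=1000)
```

Euler-Maruyama has strong order 1/2; higher-order schemes (e.g. Milstein) refine the diffusion term.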

409

Differential Equations

simecol

Simulation of Ecological (and Other) Dynamic Systems

An object-oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. The package helps to organize scenarios (to avoid copy and paste) and aims to improve readability and usability of code.

410

Probability Distributions

actuar (core)

Actuarial Functions and Heavy Tailed Distributions

Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss amounts and loss frequency: 19 continuous heavy-tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities.

411

Probability Distributions

AdMit

Adaptive Mixture of Student-t Distributions

Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) doi:10.18637/jss.v029.i03. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.

412

Probability Distributions

agricolae

Statistical Procedures for Agricultural Research

The original idea was presented in the thesis “A statistical analysis tool for agricultural research” to obtain the degree of Master of Science, National Engineering University (UNI), Lima-Peru. Some experimental data for the examples come from the CIP and other research. Agricolae offers extensive functionality on experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Square, augmented block, factorial, split and strip plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures, several non-parametric test comparisons, biodiversity indexes and consensus clustering.

413

Probability Distributions

ald

The Asymmetric Laplace Distribution

Provides the density, distribution function, quantile function, random number generator, likelihood function, moments and maximum likelihood estimators for a given sample, all for the three-parameter asymmetric Laplace distribution defined in Koenker and Machado (1999). This is a special case of the skewed family of distributions available in Galarza (2016) http://www.ime.unicamp.br/sites/default/files/rp0716.pdf, useful for quantile regression.

414

Probability Distributions

AtelieR

A GTK GUI for teaching basic concepts in statistical inference, and doing elementary Bayesian tests

A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (Central Limit Theorem and the Normal Distribution, Distribution of a sample mean, Distribution of a sample variance, Probability calculator for common distributions), and II. Elementary Bayesian Statistics (Bayesian inference on proportions, contingency tables, means and variances, with informative and non-informative priors).

415

Probability Distributions

bayesm

Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al., JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

416

Probability Distributions

benchden

28 benchmark densities from Berlinet/Devroye (1994)

Full implementation of the 28 distributions introduced as benchmarks for nonparametric density estimation by Berlinet and Devroye (1994). Includes densities, cdfs, quantile functions and generators for samples as well as additional information on features of the densities. Also contains the 4 histogram densities used in Rozenholc/Mildenberger/Gather (2010).

417

Probability Distributions

BiasedUrn

Biased Urn Model Distributions

Statistical models of biased sampling in the form of univariate and multivariate noncentral hypergeometric distributions, including Wallenius’ noncentral hypergeometric distribution and Fisher’s noncentral hypergeometric distribution (also called extended hypergeometric distribution). See vignette(“UrnTheory”) for explanation of these distributions.

418

Probability Distributions

BivarP

Estimating the Parameters of Some Bivariate Distributions

Parameter estimation for bivariate distribution functions modeled as an Archimedean copula function. The input data may contain right-censored values. The marginal distributions used are two-parameter. Methods for density, distribution, survival and random sample generation.

419

Probability Distributions

bmixture

Bayesian Estimation for Finite Mixture of Distributions

Provides statistical tools for Bayesian estimation for finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements in the Bayesian literature for the finite mixture of distributions, including Mohammadi et al. (2013) doi:10.1007/s00180-012-0323-3 and Mohammadi and Salehi-Rad (2012) doi:10.1080/03610918.2011.588358.

420

Probability Distributions

bridgedist

An Implementation of the Bridge Distribution with Logit-Link as in Wang and Louis (2003)

An implementation of the bridge distribution with logit-link in R. In Wang and Louis (2003) doi:10.1093/biomet/90.4.765, such a univariate bridge distribution was derived as the distribution of the random intercept that ‘bridged’ a marginal logistic regression and a conditional logistic regression. The conditional and marginal regression coefficients are a scalar multiple of each other. Such is not the case if the random intercept distribution was Gaussian.

421

Probability Distributions

CDVine

Statistical Inference of C- and D-Vine Copulas

Functions for statistical inference of canonical vine (C-vine) and D-vine copulas. Tools for bivariate exploratory data analysis and for bivariate as well as vine copula selection are provided. Models can be estimated either sequentially or by joint maximum likelihood estimation. Sampling algorithms and plotting methods are also included. Data is assumed to lie in the unit hypercube (so-called copula data).

422

Probability Distributions

cmvnorm

The Complex Multivariate Gaussian Distribution

Various utilities for the complex multivariate Gaussian distribution.

423

Probability Distributions

coga

Convolution of Gamma Distributions

Convolution of gamma distributions in R. The convolution of gamma distributions is the distribution of the sum of a series of independent gamma random variables, where the gamma distributions can all have different parameters. This package can calculate the density and distribution function and perform simulations.
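The distribution function of such a convolution can always be sanity-checked by Monte Carlo: simulate each gamma term, add them, and count. A Python sketch (the package evaluates the density and distribution function exactly; the function name here is hypothetical):

```python
import random

def sum_gamma_cdf_mc(shapes, rates, q, n=200_000, seed=0):
    """Monte Carlo estimate of P(X1 + ... + Xm <= q) where
    Xi ~ Gamma(shape=shapes[i], rate=rates[i]) independently."""
    random.seed(seed)
    hits = 0
    for _ in range(n):
        s = sum(random.gammavariate(a, 1.0 / r)  # gammavariate takes scale = 1/rate
                for a, r in zip(shapes, rates))
        hits += s <= q
    return hits / n

# Gamma(1, rate 1) + Gamma(1, rate 1) = Gamma(2, rate 1), whose cdf is 1 - (1+q)e^(-q).
est = sum_gamma_cdf_mc([1.0, 1.0], [1.0, 1.0], 2.0)
```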

424

Probability Distributions

CompGLM

Conway-Maxwell-Poisson GLM and distribution functions

The package contains a function (which uses a similar interface to the ‘glm’ function) for the fitting of a Conway-Maxwell-Poisson GLM. There are also various methods for analysis of the model fit. The package also contains functions for the Conway-Maxwell-Poisson distribution in a similar interface to the functions ‘dpois’, ‘ppois’ and ‘rpois’. The functions are generally quick, since the workhorse functions are written in C++ (thanks to the Rcpp package).

425

Probability Distributions

CompLognormal

Functions for actuarial scientists

Computes the probability density function, cumulative distribution function, quantile function and random numbers of any composite model based on the lognormal distribution.

426

Probability Distributions

compoisson

Conway-Maxwell-Poisson Distribution

Provides routines for density and moments of the Conway-Maxwell-Poisson distribution as well as functions for fitting the COM-Poisson model for over/under-dispersed count data.

427

Probability Distributions

Compositional

Compositional Data Analysis

Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. The standard textbook for such data is John Aitchison (1986), “The Statistical Analysis of Compositional Data”, Chapman & Hall.

428

Probability Distributions

Compounding

Computing Continuous Distributions

Computes continuous distributions obtained by compounding a continuous and a discrete distribution.

429

Probability Distributions

CompQuadForm

Distribution Function of Quadratic Forms in Normal Variables

Computes the distribution function of quadratic forms in normal variables using Imhof’s method, Davies’s algorithm, Farebrother’s algorithm or Liu et al.’s algorithm.

430

Probability Distributions

condMVNorm

Conditional Multivariate Normal Distribution

Computes conditional multivariate normal probabilities, random deviates and densities.
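The closed-form result being computed: if (X1, X2) is jointly normal, then X1 given X2 = a is normal with mean mu1 + S12 S22^(-1) (a - mu2) and covariance S11 - S12 S22^(-1) S21. A NumPy sketch of the formula (an illustration only, not the package's interface):

```python
import numpy as np

def conditional_mvn(mu, sigma, idx1, idx2, a):
    """Mean and covariance of X[idx1] | X[idx2] = a for X ~ N(mu, sigma)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    s11 = sigma[np.ix_(idx1, idx1)]
    s12 = sigma[np.ix_(idx1, idx2)]
    s22 = sigma[np.ix_(idx2, idx2)]
    w = s12 @ np.linalg.inv(s22)           # regression weights S12 S22^-1
    cond_mean = mu1 + w @ (a - mu2)
    cond_cov = s11 - w @ s12.T             # Schur complement
    return cond_mean, cond_cov

# Bivariate normal with correlation 0.5: X1 | X2 = 1 is N(0.5, 0.75).
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
m, c = conditional_mvn(mu, sigma, [0], [1], np.array([1.0]))
```

Note that the conditional covariance does not depend on the observed value a, only on the partition of sigma.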

431

Probability Distributions

copBasic

General Bivariate Copula Theory and Many Utility Functions

Extensive functions for bivariate copula (bicopula) computations and related operations concerning oft-cited bicopula theory described by Nelsen (2006), Joe (2014), and other selected works. The lower, upper, product, and select other bicopula are implemented. Arbitrary bicopula expressions include the diagonal, survival copula, the dual of a copula, co-copula, numerical bicopula density, and maximum likelihood estimation. Level curves (sets), horizontal and vertical sections are also supported. Numerical derivatives and inverses of a bicopula are provided; simulation by the conditional distribution method thus is supported. Bicopula composition, convex combination, and products are provided. Support extends to the Kendall Function as well as the L-moments thereof, Kendall Tau, Spearman Rho and Footrule, Gini Gamma, Blomqvist Beta, Hoeffding Phi, Schweizer-Wolff Sigma, tail dependency (including pseudo-polar representation) and tail order, skewness, and bivariate L-moments. Evaluators of positively/negatively quadrant dependency, left increasing and right decreasing are available. Kullback-Leibler divergence, Vuong’s procedure, Spectral Measure, and L-comoments for copula inference are available. Quantile and median regressions for V with respect to U and U with respect to V are available. Empirical copulas (EC) are supported.

432

Probability Distributions

copula (core)

Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and nonparametric estimators of the Pickands dependence function.

433

Probability Distributions

csn

Closed SkewNormal Distribution

Provides functions for computing the density and the log-likelihood function of closed skew-normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42.

434

Probability Distributions

Davies

The Davies Quantile Function

Various utilities for the Davies distribution.

435

Probability Distributions

degreenet

Models for Skewed Count Distributions Relevant to Networks

Likelihood-based inference for skewed count distributions used in network modeling. “degreenet” is a part of the “statnet” suite of packages for network analysis.

436

Probability Distributions

Delaporte

Statistical Functions for the Delaporte Distribution

Provides probability mass, distribution, quantile, random-variate generation, and method-of-moments parameter-estimation functions for the Delaporte distribution. The Delaporte is a discrete probability distribution which can be considered the convolution of a negative binomial distribution with a Poisson distribution. Alternatively, it can be considered a counting distribution with both Poisson and negative binomial components. It has been studied in actuarial science as a frequency distribution which has more variability than the Poisson, but less than the negative binomial.
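The convolution definition above is easy to check numerically. The following sketch (Python, illustrative only — not part of the ‘Delaporte’ package; the function name is hypothetical) builds the pmf by convolving a negative binomial with a Poisson, using the shape/scale parameterization in which the negative binomial mean is alpha·beta:

```python
import numpy as np
from scipy import stats

def delaporte_pmf(k, alpha, beta, lam):
    """P(X = k) for Delaporte(alpha, beta, lambda): the convolution of
    NB(alpha, p = 1/(1+beta)) with Poisson(lambda)."""
    j = np.arange(k + 1)
    nb = stats.nbinom.pmf(j, alpha, 1.0 / (1.0 + beta))
    po = stats.poisson.pmf(k - j, lam)
    return float(np.sum(nb * po))

alpha, beta, lam = 3.0, 2.0, 1.5
ks = np.arange(200)
pmf = np.array([delaporte_pmf(k, alpha, beta, lam) for k in ks])
total = pmf.sum()         # ~1 (support truncated at 199; the tail is negligible)
mean = (ks * pmf).sum()   # ~ alpha*beta + lambda = 7.5, the sum of component means
```

The mean of the convolution is the sum of the two component means, consistent with the "Poisson plus negative binomial components" description.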

437

Probability Distributions

denstrip

Density strips and other methods for compactly illustrating distributions

Graphical methods for compactly illustrating probability distributions, including density strips, density regions, sectioned density plots and varying width strips.

438

Probability Distributions

dirmult

Estimation in Dirichlet-Multinomial distribution

Estimate parameters in the Dirichlet-Multinomial distribution and compute profile log-likelihoods.

439

Probability Distributions

disclap

Discrete Laplace Exponential Family

Discrete Laplace exponential family for models such as a generalized linear model.

440

Probability Distributions

DiscreteInverseWeibull

Discrete Inverse Weibull Distribution

Probability mass function, distribution function, quantile function, random generation and parameter estimation for the discrete inverse Weibull distribution.

441

Probability Distributions

DiscreteLaplace

Discrete Laplace Distributions

Probability mass function, distribution function, quantile function, random generation and estimation for the skew discrete Laplace distributions.

442

Probability Distributions

DiscreteWeibull

Discrete Weibull Distributions (Type 1 and 3)

Probability mass function, distribution function, quantile function, random generation and parameter estimation for the type I and III discrete Weibull distributions.

443

Probability Distributions

distr (core)

Object Oriented Implementation of Distributions

S4 classes and methods for distributions.

444

Probability Distributions

distrDoc

Documentation for ‘distr’ Family of R Packages

Provides documentation in the form of a common vignette to packages ‘distr’, ‘distrEx’, ‘distrMod’, ‘distrSim’, ‘distrTEst’, ‘distrTeach’, and ‘distrEllipse’.

445

Probability Distributions

distrEllipse

S4 Classes for Elliptically Contoured Distributions

Distribution (S4) classes for elliptically contoured distributions (based on package ‘distr’).

446

Probability Distributions

distrEx

Extensions of Package ‘distr’

Extends package ‘distr’ by functionals, distances, and conditional distributions.

447

Probability Distributions

DistributionUtils

Distribution Utilities

This package contains utilities which are of use in the packages I have developed for dealing with distributions. Currently these packages are GeneralizedHyperbolic, VarianceGamma, SkewHyperbolic, and NormalLaplace. Each of these packages requires DistributionUtils. Functionality includes sample skewness and kurtosis, log-histogram, tail plots, moments by integration, changing the point about which a moment is calculated, functions for testing distributions using inversion tests and the Massart inequality. Also includes an implementation of the incomplete Bessel K function.

448

Probability Distributions

distrMod

Object Oriented Implementation of Probability Models

Implements S4 classes for probability models based on packages ‘distr’ and ‘distrEx’.

449

Probability Distributions

distrSim

Simulation Classes Based on Package ‘distr’

S4 classes for setting up a coherent framework for simulation within the distr family of packages.

450

Probability Distributions

distrTeach

Extensions of Package ‘distr’ for Teaching Stochastics/Statistics in Secondary School

Provides flexible examples of LLN and CLT for teaching purposes in secondary school.

451

Probability Distributions

distrTEst

Estimation and Testing Classes Based on Package ‘distr’

Evaluation (S4) classes based on package ‘distr’ for evaluating procedures (estimators/tests) at data/simulation in a unified way.

452

Probability Distributions

e1071

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …

453

Probability Distributions

emdbook

Support Functions and Data for “Ecological Models and Data”

Auxiliary functions and data sets for “Ecological Models and Data”, a book presenting maximum likelihood estimation and related topics for ecologists (ISBN 9780691125220).

454

Probability Distributions

emg

Exponentially Modified Gaussian (EMG) Distribution

Provides basic distribution functions for the exponentially modified Gaussian (EMG) distribution, the convolution (sum) of a Gaussian and an exponential random variable.
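The EMG construction can be checked numerically; in the sketch below (Python, illustrative only — not code from the ‘emg’ package), SciPy's exponnorm plays the role of the EMG law, with shape K = 1/(sigma·lambda):

```python
import numpy as np
from scipy import stats

# EMG(mu, sigma, lambda) is the law of N(mu, sigma^2) + Exp(rate = lambda).
# scipy's exponnorm parameterizes the same family with K = 1/(sigma*lambda).
mu, sigma, lam = 1.0, 0.5, 2.0
K = 1.0 / (sigma * lam)

# simulate the sum directly and compare with the distribution's mean mu + 1/lambda
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, 100000) + rng.exponential(1.0 / lam, 100000)

emg_mean = stats.exponnorm.mean(K, loc=mu, scale=sigma)  # mu + 1/lambda
```

The simulated mean of the sum agrees with the EMG mean, confirming the convolution interpretation.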

455

Probability Distributions

EnvStats

Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with a focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book “EnvStats: An R Package for Environmental Statistics” (Millard, 2013, Springer, ISBN 9781461484554, http://www.springer.com/book/9781461484554).

456

Probability Distributions

evd

Functions for Extreme Value Distributions

Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.

457

Probability Distributions

evdbayes

Bayesian Analysis in Extreme Value Theory

Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.

458

Probability Distributions

evir

Extreme Values in R

Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, GEV/GPD distributions.

459

Probability Distributions

ExtDist

Extending the Range of Functions for Probability Distributions

A consistent, unified and extensible framework for estimation of parameters for probability distributions, including parameter estimation procedures that allow for weighted samples. The current set of distributions included are: the standard beta, the four-parameter beta, Burr, gamma, Gumbel, Johnson SB and SU, Laplace, logistic, normal, symmetric truncated normal, truncated normal, symmetric-reflected truncated beta, standard symmetric-reflected truncated beta, triangular, uniform, and Weibull distributions; decision criteria and selections based on these decision criteria.

460

Probability Distributions

extraDistr

Additional Univariate and Multivariate Distributions

Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard t, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.

461

Probability Distributions

extremefit

Estimation of Extreme Conditional Quantiles and Probabilities

Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.

462

Probability Distributions

FAdist

Distributions that are Sometimes Used in Hydrology

Probability distributions that are sometimes useful in hydrology.

463

Probability Distributions

FatTailsR

Kiener Distributions and Fat Tails in Finance

Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in market finance, neuroscience and other disciplines. Two algorithms to estimate distribution parameters, quantiles, value-at-risk and expected shortfall with high accuracy. Includes power hyperbolas and power hyperbolic functions.

464

Probability Distributions

fBasics

Rmetrics - Markets and Basic Statistics

Provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. In addition, there are several utility functions for data handling and management.

465

Probability Distributions

fCopulae (core)

Rmetrics - Bivariate Dependence Structures with Copulae

Provides a collection of functions to manage, to investigate and to analyze bivariate financial returns by Copulae. Included are the families of Archimedean, Elliptical, Extreme Value, and Empirical Copulae.

466

Probability Distributions

fExtremes

Rmetrics - Modelling Extreme Events in Finance

Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data preprocessing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.

467

Probability Distributions

fgac

Generalized Archimedean Copula

Bivariate data fitting involves two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm was implemented considering seven families of copulas (Generalized Archimedean Copulas); the best fit can be obtained by examining all copula options (totally positive of order 2 and stochastically increasing models).

468

Probability Distributions

fitdistrplus

Help to Fit of a Parametric Distribution to Non-Censored or Censored Data

Extends the fitdistr() function (of the MASS package) with several functions to help the fit of a parametric distribution to non-censored or censored data. Censored data may contain left-censored, right-censored and interval-censored values, with several lower and upper bounds. In addition to maximum likelihood estimation (MLE), the package provides moment matching (MME), quantile matching (QME) and maximum goodness-of-fit estimation (MGE) methods (available only for non-censored data). Weighted versions of MLE, MME and QME are available.

469

Probability Distributions

flexsurv

Flexible Parametric Survival and MultiState Models

Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models.

470

Probability Distributions

FMStable

Finite Moment Stable Distributions

This package implements some basic procedures for dealing with log maximally skew stable distributions, which are also called finite moment log stable distributions.

471

Probability Distributions

fpow

Computing the noncentrality parameter of the noncentral F distribution

Returns the noncentrality parameter of the noncentral F distribution given the probabilities of type I and type II error and the degrees of freedom of the numerator and the denominator. It may be useful for computing minimal detectable differences for general ANOVA models. This program is documented in the paper by A. Baharev and S. Kemeny, On the computation of the noncentral F and noncentral beta distribution; Statistics and Computing, 2008, 18 (3), 333-340.
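The quantity described can be sketched numerically (Python, illustrative only — not the package's algorithm; the function name is hypothetical): solve for the noncentrality parameter at which the tail of the noncentral F beyond the central critical value equals the target power.

```python
from scipy import stats
from scipy.optimize import brentq

def ncp_for_power(alpha, beta, dfn, dfd):
    """Noncentrality parameter of the noncentral F at which a level-alpha
    F test has power 1 - beta."""
    fcrit = stats.f.ppf(1.0 - alpha, dfn, dfd)            # central F critical value
    gap = lambda ncp: stats.ncf.sf(fcrit, dfn, dfd, ncp) - (1.0 - beta)
    return brentq(gap, 1e-9, 1000.0)                      # power is increasing in ncp

ncp = ncp_for_power(alpha=0.05, beta=0.10, dfn=2, dfd=20)
achieved = stats.ncf.sf(stats.f.ppf(0.95, 2, 20), 2, 20, ncp)   # back-check: ~0.90
```

Root-finding works because power is monotone in the noncentrality parameter; the paper cited above addresses the numerical accuracy of the underlying noncentral distribution functions.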

472

Probability Distributions

frmqa

The Generalized Hyperbolic Distribution, Related Distributions and Their Applications in Finance

A collection of R and C++ functions to work with the generalized hyperbolic distribution, related distributions and their applications in financial risk management and quantitative analysis.

473

Probability Distributions

gambin

Fit the Gambin Model to Species Abundance Distributions

Fits unimodal and multimodal gambin distributions to species-abundance distributions from ecological data. ‘gambin’ is short for ‘gamma-binomial’. The main function is fit_abundances(), which estimates the ‘alpha’ parameter(s) of the gambin distribution using maximum likelihood. Functions are also provided to generate the gambin distribution and for calculating likelihood statistics.

474

Probability Distributions

gamlss.dist (core)

Distributions for Generalized Additive Models for Location Scale and Shape

A set of distributions which can be used for modelling the response variables in Generalized Additive Models for Location Scale and Shape, Rigby and Stasinopoulos (2005), doi:10.1111/j.1467-9876.2005.00510.x. The distributions can be continuous, discrete or mixed distributions. Extra distributions can be created by transforming any continuous distribution defined on the real line to a distribution defined on the range 0 to infinity or 0 to 1, by using a “log” or a “logit” transformation respectively.

475

Probability Distributions

gamlss.mx

Fitting Mixture Distributions with GAMLSS

The main purpose of this package is to allow fitting of mixture distributions with GAMLSS models.

476

Probability Distributions

gaussDiff

Difference measures for multivariate Gaussian probability density functions

A collection of difference measures for multivariate Gaussian probability density functions, such as the Euclidean mean, the Mahalanobis distance, the Kullback-Leibler divergence, the J-Coefficient, the Minkowski L2-distance, the Chi-square divergence and the Hellinger Coefficient.

477

Probability Distributions

gb

Generalized Lambda Distribution and Generalized Bootstrapping

This package collects algorithms and functions for fitting data to a generalized lambda distribution via moment matching methods, and for generalized bootstrapping.

478

Probability Distributions

GB2

Generalized Beta Distribution of the Second Kind: Properties, Likelihood, Estimation

Package GB2 explores the Generalized Beta distribution of the second kind. Density, cumulative distribution function, quantiles and moments of the distributions are given. Functions for the full log-likelihood, the profile log-likelihood and the scores are provided. Formulas for various indicators of inequality and poverty under the GB2 are implemented. The GB2 is fitted by the methods of maximum pseudo-likelihood estimation using the full and profile log-likelihood, and non-linear least squares estimation of the model parameters. Various plots for the visualization and analysis of the results are provided. Variance estimation of the parameters is provided for the method of maximum pseudo-likelihood estimation. A mixture distribution based on the compounding property of the GB2 is presented (denoted as “compound” in the documentation). This mixture distribution is based on the discretization of the distribution of the underlying random scale parameter. The discretization can be left or right tail. Density, cumulative distribution function, moments and quantiles for the mixture distribution are provided. The compound mixture distribution is fitted using the method of maximum pseudo-likelihood estimation. The fit can also incorporate the use of auxiliary information. In this new version of the package, the mixture case is complemented with new functions for variance estimation by linearization and comparative density plots.

479

Probability Distributions

GenBinomApps

Clopper-Pearson Confidence Interval and Generalized Binomial Distribution

Density, distribution function, quantile function and random generation for the Generalized Binomial Distribution. Functions to compute the Clopper-Pearson Confidence Interval and the required sample size. Enhanced model for burn-in studies, where failures are tackled by countermeasures.
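The Clopper-Pearson interval itself is defined via beta quantiles; a minimal sketch (Python, illustrative only — not code from the package; the function name is hypothetical):

```python
from scipy import stats

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) two-sided confidence interval for a binomial
    proportion with x successes in n trials, via beta quantiles."""
    a = 1.0 - conf
    lo = 0.0 if x == 0 else stats.beta.ppf(a / 2.0, x, n - x + 1)
    hi = 1.0 if x == n else stats.beta.ppf(1.0 - a / 2.0, x + 1, n - x)
    return lo, hi

lo, hi = clopper_pearson(4, 20)   # interval bracketing the point estimate 0.2
```

The edge cases x = 0 and x = n pin the corresponding endpoint to 0 or 1, as the beta quantile is undefined there.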

480

Probability Distributions

GeneralizedHyperbolic

The Generalized Hyperbolic Distribution

This package provides functions for the hyperbolic and related distributions. Density, distribution and quantile functions and random number generation are provided for the hyperbolic distribution, the generalized hyperbolic distribution, the generalized inverse Gaussian distribution and the skew-Laplace distribution. Additional functionality is provided for the hyperbolic distribution, normal inverse Gaussian distribution and generalized inverse Gaussian distribution, including fitting of these distributions to data. Linear models with hyperbolic errors may be fitted using hyperblmFit.

481

Probability Distributions

GenOrd

Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions

A Gaussian copula-based procedure for generating samples from discrete random variables with prescribed correlation matrix and marginal distributions.
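The idea behind such a procedure can be sketched as follows (Python, illustrative only; unlike ‘GenOrd’, this naive version does not adjust the latent correlation for the attenuation caused by discretization, so the achieved correlation is lower than the latent one):

```python
import numpy as np
from scipy import stats

def discrete_gaussian_copula(cdfs, corr, n, seed=0):
    """Correlated discrete variables: correlated normals -> uniforms ->
    inverse of each discrete marginal cdf (given as a vector of cumulative
    probabilities ending in 1)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(cdfs)), corr, size=n)
    u = stats.norm.cdf(z)
    # value k is the first index whose cumulative probability covers u
    return np.column_stack([np.searchsorted(cdf, u[:, i])
                            for i, cdf in enumerate(cdfs)])

cdfs = [np.array([0.3, 0.7, 1.0]), np.array([0.5, 1.0])]   # supports {0,1,2}, {0,1}
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
x = discrete_gaussian_copula(cdfs, corr, 10000)
```

The marginals are reproduced exactly in distribution; correcting the latent correlation so the discrete-scale correlation matches a target is the harder part that the package addresses.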

482

Probability Distributions

geoR

Analysis of Geostatistical Data

Geostatistical analysis including traditional, likelihood-based and Bayesian methods.

483

Probability Distributions

ghyp

A Package on Generalized Hyperbolic Distribution and Its Special Cases

Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). In particular, it contains fitting procedures, an AIC-based model selection routine, and functions for the computation of density, quantile, probability, random variates, expected shortfall and some portfolio optimization and plotting routines, as well as the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution.

484

Probability Distributions

GIGrvg

Random Variate Generator for the GIG Distribution

Generator and density function for the Generalized Inverse Gaussian (GIG) distribution.

485

Probability Distributions

gld

Estimation and Use of the Generalised (Tukey) Lambda Distribution

The generalised lambda distribution, or Tukey lambda distribution, provides a wide variety of shapes with one functional form. This package provides random numbers, quantiles, probabilities, densities and density quantiles for four different parameterisations of the distribution. It provides the density function, distribution function, and Quantile-Quantile plots. It implements a variety of estimation methods for the distribution, including diagnostic plots. Estimation methods include the starship (all 4 parameterisations) and a number of methods for only the FKML parameterisation. These include maximum likelihood, maximum product of spacings, Titterington’s method, Moments, L-Moments, Trimmed L-Moments and Distributional Least Absolutes.

486

Probability Distributions

GLDEX

Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods

The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method, and L-moment matching are also provided. Diagnostics on goodness of fit can be done via QQ plots, KS-resample tests and comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution.

487

Probability Distributions

glogis

Fitting and Testing Generalized Logistic Distributions

Tools for the generalized logistic distribution (Type I, also known as skew-logistic distribution), encompassing basic distribution functions (p, q, d, r, score), maximum likelihood estimation, and structural change methods.

488

Probability Distributions

GMD

Generalized Minimum Distance of distributions

GMD is a package for nonparametric distance measurement between two discrete frequency distributions.

489

Probability Distributions

GSM

Gamma Shape Mixture

Implementation of a Bayesian approach for estimating a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions; it is computationally efficient, and it only requires specifying a prior distribution for a single parameter.

490

Probability Distributions

gumbel

The GumbelHougaard Copula

Provides probability functions (cumulative distribution and density functions), simulation function (Gumbel copula multivariate simulation) and estimation functions (Maximum Likelihood Estimation, Inference For Margins, Moment Based Estimation and Canonical Maximum Likelihood).

491

Probability Distributions

HAC

Estimation, Simulation and Visualization of Hierarchical Archimedean Copulae (HAC)

Provides estimation of the structure and the parameters, sampling methods, and structural plots of Hierarchical Archimedean Copulae (HAC).

492

Probability Distributions

hermite

Generalized Hermite Distribution

Probability functions and other utilities for the generalized Hermite distribution.

493

Probability Distributions

HI

Simulation from distributions supported by nested hyperplanes

Simulation from distributions supported by nested hyperplanes, using the algorithm described in Petris & Tardella, “A geometric approach to transdimensional Markov chain Monte Carlo”, Canadian Journal of Statistics, v.31, n.4, (2003). Also random direction multivariate Adaptive Rejection Metropolis Sampling.

494

Probability Distributions

HistogramTools

Utility Functions for R Histograms

Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.

495

Probability Distributions

hyper2

The Hyperdirichlet Distribution, Mark 2

A suite of routines for the hyperdirichlet distribution; supersedes the hyperdirichlet package for most purposes.

496

Probability Distributions

HyperbolicDist

The hyperbolic distribution

This package provides functions for the hyperbolic and related distributions. Density, distribution and quantile functions and random number generation are provided for the hyperbolic distribution, the generalized hyperbolic distribution, the generalized inverse Gaussian distribution and the skew-Laplace distribution. Additional functionality is provided for the hyperbolic distribution, including fitting of the hyperbolic to data.

497

Probability Distributions

ihs

Inverse Hyperbolic Sine Distribution

Density, distribution function, quantile function and random generation for the inverse hyperbolic sine distribution. This package also provides a function that can fit data to the inverse hyperbolic sine distribution using maximum likelihood estimation.

498

Probability Distributions

kernelboot

Smoothed Bootstrap and Random Generation from Kernel Densities

Smoothed bootstrap and functions for random generation from univariate and multivariate kernel densities. It does not estimate kernel densities.

499

Probability Distributions

kolmim

An Improved Evaluation of Kolmogorov’s Distribution

Provides an alternative, more efficient evaluation of extreme probabilities of Kolmogorov’s goodness-of-fit measure, Dn, when compared to the original implementation of Wang, Marsaglia, and Tsang. These probabilities are used in Kolmogorov-Smirnov tests when comparing two samples.
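Kolmogorov's limiting tail probability has a simple alternating series form; the sketch below (Python, illustrative only — not the package's algorithm) evaluates it and compares against SciPy's implementation:

```python
import numpy as np
from scipy.special import kolmogorov

def kolmogorov_sf(x, terms=100):
    """Tail Q(x) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2) of Kolmogorov's
    limiting distribution of sqrt(n) * Dn."""
    k = np.arange(1, terms + 1)
    return 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * x**2))

q = kolmogorov_sf(1.358)     # ~0.05: 1.358 is the classical 5% critical value
q_scipy = kolmogorov(1.358)  # scipy's evaluation of the same tail
```

The improved evaluations discussed in the entry concern exactly such extreme tail probabilities, where naive finite-n formulas lose accuracy.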

500

Probability Distributions

KScorrect

Lilliefors-Corrected Kolmogorov-Smirnov Goodness-of-Fit Tests

Implements the Lilliefors-corrected Kolmogorov-Smirnov test for use in goodness-of-fit tests, suitable when population parameters are unknown and must be estimated by sample statistics. P-values are estimated by simulation. Can be used with a variety of continuous distributions, including normal, lognormal, univariate mixtures of normals, uniform, log-uniform, exponential, gamma, and Weibull distributions. Functions to generate random numbers and calculate density, distribution, and quantile functions are provided for use with the log-uniform and mixture distributions.
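The Lilliefors correction amounts to simulating the null distribution of the KS statistic with parameters re-estimated on each simulated sample; a minimal sketch for the normal case (Python, illustrative only — not the package's code; the function name is hypothetical):

```python
import numpy as np
from scipy import stats

def lilliefors_normal(x, nsim=2000, seed=0):
    """KS test for normality with estimated mean/sd; because the parameters
    are estimated from the data, the p-value is obtained by simulation."""
    rng = np.random.default_rng(seed)
    n = len(x)
    d_obs = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).statistic
    d_null = np.empty(nsim)
    for i in range(nsim):
        y = rng.standard_normal(n)                    # draw under the null
        d_null[i] = stats.kstest(y, "norm",
                                 args=(y.mean(), y.std(ddof=1))).statistic
    return d_obs, (d_null >= d_obs).mean()            # simulated p-value

rng = np.random.default_rng(42)
d, p = lilliefors_normal(rng.normal(10.0, 2.0, 100))
```

Using the standard KS tables here would be anti-conservative, which is precisely what the Lilliefors correction fixes.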

501

Probability Distributions

LambertW

Probabilistic Models to Analyze and Gaussianize Heavy-Tailed, Skewed Data

Lambert W x F distributions are a generalized framework to analyze skewed, heavy-tailed data. It is based on an input/output system, where the output random variable (RV) Y is a non-linearly transformed version of an input RV X ~ F with similar properties as X, but slightly skewed (heavy-tailed). The transformed RV Y has a Lambert W x F distribution. This package contains functions to model and analyze skewed, heavy-tailed data the Lambert Way: simulate random samples, estimate parameters, compute quantiles, and plot/print results nicely. Probably the most important function is ‘Gaussianize’, which works similarly to ‘scale’, but actually makes the data Gaussian. A do-it-yourself toolkit allows users to define their own Lambert W x ‘MyFavoriteDistribution’ and use it in their analysis right away.

502

Probability Distributions

LearnBayes

Functions for Learning Bayesian Inference

LearnBayes contains a collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one- and two-parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.

503

Probability Distributions

lhs

Latin Hypercube Samples

Provides a number of methods for creating and augmenting Latin Hypercube Samples.

504

Probability Distributions

LIHNPSD

Poisson Subordinated Distribution

A Poisson Subordinated Distribution to capture major leptokurtic features in log-return time series of financial data.

505

Probability Distributions

lmom

L-Moments

Functions related to L-moments: computation of L-moments and trimmed L-moments of distributions and data samples; parameter estimation; L-moment ratio diagram; plot vs. quantiles of an extreme-value distribution.
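Sample L-moments are linear combinations of probability-weighted moments; a minimal sketch (Python, illustrative only, following Hosking's unbiased estimators — the function name is hypothetical):

```python
import numpy as np

def sample_lmoments(x):
    """First four sample L-moments, as linear combinations of the unbiased
    probability-weighted moments b0..b3."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)                 # ranks of the order statistics
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4

l1, l2, l3, l4 = sample_lmoments([1, 2, 3, 4, 5])  # symmetric sample: l3 = l4 = 0
```

L-moment ratios such as tau3 = l3/l2 (L-skewness) and tau4 = l4/l2 (L-kurtosis) are what the ratio diagrams mentioned above plot.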

506

Probability Distributions

lmomco (core)

L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions

Extensive functions for L-moments (LMs) and probability-weighted moments (PWMs), parameter estimation for distributions, LM computation for distributions, and L-moment ratio diagrams. Maximum likelihood and maximum product of spacings estimation are also available. LMs for right-tail and left-tail censoring by known or unknown threshold and by indicator variable are available. Asymmetric (asy) trimmed LMs (TL-moments, TLMs) are supported. LMs of residual (resid) and reversed (rev) resid life are implemented along with 13 quantile function operators for reliability and survival analyses. Exact analytical bootstrap estimates of order statistics, LMs, and variances-covariances of LMs are provided. The Harri-Coble Tau34-squared Normality Test is available. Distribution support with “L” (LMs), “TL” (TLMs) and added (+) support for right-tail censoring (RC) encompasses: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L, TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Resid. Quantile Function [L], Normal [L], 3p log-Normal [L], Pearson Type III [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice/Rician [L], Slash [TL], 3p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L]. Multivariate sample L-comoments (LCMs) are implemented to measure asymmetric associations.

507

Probability Distributions

Lmoments

L-Moments and Quantile Mixtures

Contains functions to estimate L-moments and trimmed L-moments from the data. Also contains functions to estimate the parameters of the normal polynomial quantile mixture and the Cauchy polynomial quantile mixture from L-moments and trimmed L-moments.

508

Probability Distributions

logitnorm

Functions for the Logitnormal Distribution

Density, distribution, quantile and random generation functions for the logitnormal distribution. Estimation of the mode and the first two moments. Estimation of distribution parameters.

509

Probability Distributions

loglognorm

Double log normal distribution functions

r, d, p, q functions for the double log normal distribution.

510

Probability Distributions

marg

Approximate marginal inference for regression-scale models

Likelihood inference based on higher-order approximations for linear non-normal regression models.

511

Probability Distributions

MASS

Support Functions and Datasets for Venables and Ripley’s MASS

Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).

512

Probability Distributions

mbbefd

Maxwell Boltzmann Bose Einstein Fermi Dirac Distribution and Destruction Rate Modelling

Distributions that are typically used for exposure rating in general insurance, in particular to price reinsurance contracts. The vignettes show code snippets to fit the distribution to empirical data.

513

Probability Distributions

mc2d

Tools for Two-Dimensional Monte-Carlo Simulations

A complete framework to build and study Two-Dimensional Monte-Carlo simulations, aka Second-Order Monte-Carlo simulations. Also includes various distributions (pert, triangular, Bernoulli, empirical discrete and continuous).

514

Probability Distributions

mclust

Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via the EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.

515

Probability Distributions

MCMCpack

Markov Chain Monte Carlo (MCMC) Package

Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return coda mcmc objects that can then be summarized using the coda package. Some useful utility functions such as density functions, pseudorandom number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.

516

Probability Distributions

mgpd

mgpd: Functions for multivariate generalized Pareto distribution (MGPD of Type II)

Extends distribution and density functions to parametric multivariate generalized Pareto distributions (MGPD of Type II), and provides fitting functions which calculate maximum likelihood estimates for bivariate and trivariate models. (Help is in progress.)

517

Probability Distributions

minimax

Minimax distribution family

The minimax family of distributions is a two-parameter family like the beta family, but computationally a lot more tractable.

518

Probability Distributions

MitISEM

Mixture of Student t Distributions using Importance Sampling and Expectation Maximization

Flexible multivariate function approximation using an adapted mixture of Student t distributions. The mixture of t distributions is obtained using an importance-sampling-weighted expectation-maximization algorithm.

519

Probability Distributions

MittagLeffleR

The Mittag-Leffler Distribution

Calculates Mittag-Leffler probabilities and the Mittag-Leffler function, generates Mittag-Leffler random variables, and fits the Mittag-Leffler distribution to data. Based on the algorithm by Garrappa, R. (2015) doi:10.1137/140971191.

520

Probability Distributions

MixedTS

Mixed Tempered Stable Distribution

Provides functions for the univariate Mixed Tempered Stable distribution.

521

Probability Distributions

mixtools

Tools for Analyzing Finite Mixture Models

Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.

522

Probability Distributions

MM

The multiplicative multinomial distribution

Various utilities for the Multiplicative Multinomial distribution.

523

Probability Distributions

mnormpow

Multivariate Normal Distributions with Power Integrand

Computes the integral of f(x)*x_i^k on a product of intervals, where f is the density of a Gaussian law. This is a small alteration of the mnormt code from A. Genz and A. Azzalini.

524

Probability Distributions

mnormt (core)

The Multivariate Normal and t Distributions

Functions are provided for computing the density and the distribution function of multivariate normal and “t” random variables, and for generating random vectors sampled from these distributions. Probabilities are computed via non-Monte Carlo methods; different routines are used in the cases d=1, d=2, d>2, where d denotes the number of dimensions.

525

Probability Distributions

modeest

Mode Estimation

This package provides estimators of the mode of univariate unimodal data or univariate unimodal distributions

526

Probability Distributions

moments

Moments, cumulants, skewness, kurtosis and related tests

Functions to calculate: moments, Pearson’s kurtosis, Geary’s kurtosis and skewness; tests related to them (Anscombe-Glynn, D’Agostino, Bonett-Seier).
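
The coefficients such a package computes reduce to ratios of central sample moments. A minimal Python sketch of the classical definitions (illustrative, not the package's R API):

```python
def moment(x, k):
    """k-th central sample moment."""
    m = sum(x) / len(x)
    return sum((v - m) ** k for v in x) / len(x)

def skewness(x):
    """Moment-based skewness coefficient g1 = m3 / m2^(3/2)."""
    return moment(x, 3) / moment(x, 2) ** 1.5

def kurtosis(x):
    """Pearson's kurtosis b2 = m4 / m2^2 (3 for the normal distribution)."""
    return moment(x, 4) / moment(x, 2) ** 2
```

For the symmetric sample [1, 2, 3, 4, 5] the skewness is exactly 0 and the kurtosis is m4/m2^2 = 6.8/4 = 1.7.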

527

Probability Distributions

movMF

Mixtures of von Mises-Fisher Distributions

Fit and simulate mixtures of von Mises-Fisher distributions.

528

Probability Distributions

msm

Multi-State Markov and Hidden Markov Models in Continuous Time

Functions for fitting continuous-time Markov and hidden Markov multi-state models to longitudinal data. Designed for processes observed at arbitrary times in continuous time (panel data) but some other observation schemes are supported. Both Markov transition rates and the hidden Markov output process can be modelled in terms of covariates, which may be constant or piecewise-constant in time.

529

Probability Distributions

mvprpb

Orthant Probability of the Multivariate Normal Distribution

Computes orthant probabilities of the multivariate normal distribution.

530

Probability Distributions

mvrtn

Mean and Variance of Truncated Normal Distribution

Mean, variance, and random variates for left/right truncated normal distributions.

531

Probability Distributions

mvtnorm (core)

Multivariate Normal and t Distributions

Computes multivariate normal and t probabilities, quantiles, random deviates and densities.

532

Probability Distributions

nCDunnett

Noncentral Dunnett’s Test Distribution

Computes the noncentral Dunnett’s test distribution (pdf, cdf and quantile) and generates random numbers.

533

Probability Distributions

Newdistns

Computes Pdf, Cdf, Quantile and Random Numbers, Measures of Inference for 19 General Families of Distributions

Computes the probability density function, cumulative distribution function, quantile function, random numbers and measures of inference for the following general families of distributions (each family defined in terms of an arbitrary cdf G): Marshall Olkin G distributions, exponentiated G distributions, beta G distributions, gamma G distributions, Kumaraswamy G distributions, generalized beta G distributions, beta extended G distributions, gamma G distributions, gamma uniform G distributions, beta exponential G distributions, Weibull G distributions, log gamma G I distributions, log gamma G II distributions, exponentiated generalized G distributions, exponentiated Kumaraswamy G distributions, geometric exponential Poisson G distributions, truncated-exponential skew-symmetric G distributions, modified beta G distributions, and exponentiated exponential Poisson G distributions.

534

Probability Distributions

nor1mix

Normal (1d) Mixture Models (S3 Classes and Methods)

One-dimensional Normal Mixture Models Classes, for, e.g., density estimation or clustering algorithms research and teaching; providing the widely used Marron-Wand densities. Efficient random number generation and graphics; now fitting to data by ML (Maximum Likelihood) or EM estimation.

535

Probability Distributions

NormalGamma

Normal-gamma convolution model

The functions proposed in this package compute the density of the sum of a Gaussian and a gamma random variable, estimate the parameters, and correct the noise effect in a gamma-signal and Gaussian-noise model. This package has been used to implement the background correction method for Illumina microarray data presented in Plancade S., Rozenholc Y. and Lund E. “Generalization of the normal-exponential model: exploration of a more accurate parameterization for the signal distribution on Illumina BeadArrays”, BMC Bioinformatics 2012, 13(329).

536

Probability Distributions

NormalLaplace

The Normal Laplace Distribution

This package provides functions for the normal Laplace distribution. It is currently under development and provides only limited functionality. Density, distribution and quantile functions, random number generation, and moments are provided.

537

Probability Distributions

normalp

Routines for Exponential Power Distribution

Collection of utilities for the Exponential Power distribution, also known as the General Error Distribution (see Mineo, A.M. and Ruggieri, M. (2005), A Software Tool for the Exponential Power Distribution: The normalp Package, Journal of Statistical Software, Vol. 12, Issue 4).

538

Probability Distributions

npde

Normalised prediction distribution errors for nonlinear mixed-effect models

Routines to compute normalised prediction distribution errors, a metric designed to evaluate nonlinear mixed-effect models such as those used in pharmacokinetics and pharmacodynamics.

539

Probability Distributions

ORDER2PARENT

Estimate parent distributions with data of several order statistics

This package uses B-spline based nonparametric smooth estimators to estimate parent distributions given observations on multiple order statistics.

540

Probability Distributions

OrdNor

Concurrent Generation of Ordinal and Normal Data with Given Correlation Matrix and Marginal Distributions

Implementation of a procedure for generating samples from a mixed distribution of ordinal and normal random variables with prespecified correlation matrix and marginal distributions.

541

Probability Distributions

ParetoPosStable

Computing, Fitting and Validating the PPS Distribution

Statistical functions to describe a Pareto Positive Stable (PPS) distribution and fit it to real data. Graphical and statistical tools to validate the fits are included.

542

Probability Distributions

PDQutils

PDQ Functions via Gram Charlier, Edgeworth, and Cornish Fisher Approximations

A collection of tools for approximating the ‘PDQ’ functions (respectively, the cumulative distribution, density, and quantile) of probability distributions via classical expansions involving moments and cumulants.

543

Probability Distributions

PearsonDS (core)

Pearson Distribution System

Implementation of the Pearson distribution system, including full support for the (d,p,q,r)-family of functions for probability distributions and fitting via method of moments and maximum likelihood method.

544

Probability Distributions

PhaseType

Inference for Phase-type Distributions

Functions to perform Bayesian inference on absorption time data for Phase-type distributions. Plans to expand this to include frequentist inference and simulation tools.

545

Probability Distributions

poibin

The Poisson Binomial Distribution

This package implements both the exact and approximation methods for computing the cdf of the Poisson binomial distribution. It also provides the pmf, quantile function, and random number generation for the Poisson binomial distribution.
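
The exact pmf computation is a small dynamic program over the trials: each trial either adds a success or not. A Python sketch of the idea (illustrative, not the package's API):

```python
def poisson_binomial_pmf(probs):
    """Exact pmf of the Poisson binomial distribution by dynamic programming.

    probs[i] is the success probability of the i-th independent trial;
    returns pmf, where pmf[k] = P(exactly k successes).
    """
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)  # trial fails
            new[k + 1] += mass * p      # trial succeeds
        pmf = new
    return pmf

pmf = poisson_binomial_pmf([0.2, 0.5, 0.8])  # mean number of successes = 1.5
```

When all probabilities are equal this reduces to the ordinary binomial pmf; e.g. probs = [0.5, 0.5] gives [0.25, 0.5, 0.25].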

546

Probability Distributions

poilog

Poisson lognormal and bivariate Poisson lognormal distribution

Functions for obtaining the density, random deviates and maximum likelihood estimates of the Poisson lognormal distribution and the bivariate Poisson lognormal distribution.

547

Probability Distributions

poistweedie

Poisson-Tweedie exponential family models

Simulation of Poisson-Tweedie models.

548

Probability Distributions

polyaAeppli

Implementation of the Polya-Aeppli distribution

Functions for evaluating the mass density, cumulative distribution function, quantile function and random variate generation for the Polya-Aeppli distribution, also known as the geometric compound Poisson distribution.

549

Probability Distributions

poweRlaw

Analysis of Heavy Tailed Distributions

An implementation of maximum likelihood estimators for a variety of heavy tailed distributions, including both the discrete and continuous power law distributions. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.
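
For the continuous power law the MLE of the exponent has a closed form, which a few lines of Python can illustrate (a sketch of the standard Hill-type estimator, not the package's API; the simulation parameters are chosen for the demonstration):

```python
import math
import random

def powerlaw_alpha_mle(data, xmin):
    """Continuous power-law exponent MLE:
    alpha_hat = 1 + n / sum(ln(x_i / xmin)), over the tail x_i >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

# Recover a known exponent from simulated data: inverse-transform sampling
# from a pure power law with alpha = 2.5 and xmin = 1 uses x = (1-u)^(-1/(alpha-1)).
random.seed(0)
data = [(1.0 - random.random()) ** (-1.0 / 1.5) for _ in range(20000)]
alpha_hat = powerlaw_alpha_mle(data, 1.0)  # close to 2.5
```

Estimating the lower cutoff xmin itself requires the goodness-of-fit search the description mentions; the closed form above applies only once xmin is fixed.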

550

Probability Distributions

qmap

Statistical Transformations for Post-Processing Climate Model Output

Empirical adjustment of the distribution of variables originating from (regional) climate model simulations using quantile mapping.
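
Empirical quantile mapping itself is a short computation: look up each model value's empirical quantile in a model reference period, then read off the observed value at the same quantile. A Python sketch (illustrative; the function and parameter names are mine, not the package's):

```python
import numpy as np

def quantile_map(values, model_ref, obs_ref):
    """Empirical quantile mapping: replace each model value with the
    observation at the same empirical quantile."""
    model_ref = np.sort(model_ref)
    obs_ref = np.sort(obs_ref)
    # empirical CDF position of each value within the model reference period
    q = np.searchsorted(model_ref, values, side="right") / len(model_ref)
    # map through the observed empirical quantile function (interpolated)
    probs = (np.arange(len(obs_ref)) + 0.5) / len(obs_ref)
    return np.interp(q, probs, obs_ref)

# A model output with a constant +2 bias is pulled back to the observed scale.
obs = np.arange(100) * 0.1        # "observations": 0.0 .. 9.9
model = obs + 2.0                 # biased model output
corrected = quantile_map(np.array([5.0]), model, obs)  # roughly 3.0
```

Parametric variants fit a distribution to each reference sample instead of using the empirical quantiles.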

551

Probability Distributions

QRM

Provides R-Language Code to Examine Quantitative Risk Management Concepts

Accompanying package to the book Quantitative Risk Management: Concepts, Techniques and Tools by Alexander J. McNeil, Rudiger Frey, and Paul Embrechts.

552

Probability Distributions

randaes

Random number generator based on AES cipher

The deterministic part of the Fortuna cryptographic pseudorandom number generator, described in Schneier & Ferguson, “Practical Cryptography”.

553

Probability Distributions

random

True Random Numbers using RANDOM.ORG

The true random number service provided by the RANDOM.ORG website created by Mads Haahr samples atmospheric noise via radio tuned to an unused broadcasting frequency together with a skew correction algorithm due to John von Neumann. More background is available in the included vignette based on an essay by Mads Haahr. In its current form, the package offers functions to retrieve random integers, randomized sequences and random strings.

554

Probability Distributions

randtoolbox

Toolbox for Pseudo and Quasi Random Number Generation and RNG Tests

Provides (1) pseudo random generators: general linear congruential generators, multiple recursive generators and generalized feedback shift register (SF-Mersenne Twister algorithm and WELL generators); (2) quasi random generators: the Torus algorithm, the Sobol sequence, the Halton sequence (including the Van der Corput sequence); and (3) some RNG tests: the gap test, the serial test, the poker test. The package depends on the rngWELL package but it can be provided without this dependency on demand to the maintainer. For true random number generation, use the ‘random’ package; for Latin Hypercube Sampling (a hybrid QMC method), use the ‘lhs’ package. A number of RNGs and tests for RNGs are also provided by ‘RDieHarder’, all available on CRAN. There is also a small standalone package ‘rngwell19937’ for the WELL19937a RNG.

555

Probability Distributions

RDieHarder

R interface to the dieharder RNG test suite

The RDieHarder package provides an R interface to the dieharder suite of random number generators and tests that was developed by Robert G. Brown and David Bauer, extending earlier work by George Marsaglia and others.

556

Probability Distributions

ReIns

Functions from “Reinsurance: Actuarial and Statistical Aspects”

Functions from the book “Reinsurance: Actuarial and Statistical Aspects” (2017) by Hansjoerg Albrecher, Jan Beirlant and Jef Teugels http://wiley.com/WileyCDA/WileyTitle/productCd-0470772689.html.

557

Probability Distributions

reliaR (core)

Package for some probability distributions

A collection of utilities for some reliability models/probability distributions.

558

Probability Distributions

Renext

Renewal Method for Extreme Values Extrapolation

Peaks Over Threshold (POT) or ‘methode du renouvellement’. The distribution for the exceedances can be chosen, and heterogeneous data (including historical data or block data) can be used in a Maximum-Likelihood framework.

559

Probability Distributions

retimes

Reaction Time Analysis

Reaction time analysis by maximum likelihood

560

Probability Distributions

revdbayes

Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis

Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package https://cran.r-project.org/package=rust is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package https://cran.r-project.org/package=evdbayes, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) doi:10.1214/09-AOAS292. See the ‘revdbayes’ website for more information, documentation and examples.

561

Probability Distributions

rlecuyer

R Interface to RNG with Multiple Streams

Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L’Ecuyer et al (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.

562

Probability Distributions

RMKdiscrete

Sundry Discrete Probability Distributions

Sundry discrete probability distributions and helper functions.

563

Probability Distributions

RMTstat

Distributions, Statistics and Tests derived from Random Matrix Theory

Functions for working with the Tracy-Widom laws and other distributions related to the eigenvalues of large Wishart matrices. The tables for computing the Tracy-Widom densities and distribution functions were computed by Momar Dieng’s MATLAB package “RMLab” (formerly available on his homepage at http://math.arizona.edu/~momar/research.htm ). This package is part of a collaboration between Iain Johnstone, Zongming Ma, Patrick Perry, and Morteza Shahram. It will soon be replaced by a package with more accuracy and builtin support for relevant statistical tests.

564

Probability Distributions

rngwell19937

Random number generator WELL19937a with 53 or 32 bit output

Long period linear random number generator WELL19937a by F. Panneton, P. L’Ecuyer and M. Matsumoto. The initialization algorithm allows the generator to be seeded with a numeric vector of arbitrary length and uses MRG32k5a by P. L’Ecuyer to achieve good quality of the initialization. The output function may be set to provide numbers from the interval (0,1) with 53 (the default) or 32 random bits. WELL19937a is of a similar type to Mersenne Twister and has the same period. WELL19937a is slightly slower than Mersenne Twister, but has better equidistribution and “bit-mixing” properties and faster recovery from states with prevailing zeros than Mersenne Twister. All WELL generators with orders 512, 1024, 19937 and 44497 can be found in the randtoolbox package.

565

Probability Distributions

rstream

Streams of Random Numbers

Unified object oriented interface for multiple independent streams of random numbers from different sources.

566

Probability Distributions

RTDE

Robust Tail Dependence Estimation

Robust tail dependence estimation for bivariate models. This package is based on two papers by the authors: ‘Robust and bias-corrected estimation of the coefficient of tail dependence’ and ‘Robust and bias-corrected estimation of probabilities of extreme failure sets’. This work was supported by a research grant (VKR023480) from VILLUM FONDEN and an international project for scientific cooperation (PICS-6416).

567

Probability Distributions

rtdists

Response Time Distributions

Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model (Ratcliff & McKoon, 2008, doi:10.1162/neco.2008.12-06-420) based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA; Brown & Heathcote, 2008, doi:10.1016/j.cogpsych.2007.12.002) with different distributions underlying the drift rate.

568

Probability Distributions

Runuran

R Interface to the UNU.RAN Random Variate Generators

Interface to the UNU.RAN library for Universal Non-Uniform RANdom variate generators. It allows one to build non-uniform random number generators from quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles for a couple of distributions.

569

Probability Distributions

s20x

Functions for University of Auckland Course STATS 201/208 Data Analysis

A set of functions used in teaching STATS 201/208 Data Analysis at the University of Auckland. The functions are designed to make parts of R more accessible to a large undergraduate population who are mostly not statistics majors.

570

Probability Distributions

sadists

Some Additional Distributions

Provides the density, distribution, quantile and generation functions of some obscure probability distributions, including the doubly non-central t, F, Beta, and Eta distributions; the lambda-prime and K-prime; the upsilon distribution; the (weighted) sum of non-central chi-squares to a power; the (weighted) sum of log non-central chi-squares; the product of non-central chi-squares to powers; the product of doubly non-central F variables; the product of independent normals.

571

Probability Distributions

SCI

Standardized Climate Indices Such as SPI, SRI or SPEI

Functions for generating Standardized Climate Indices (SCI). SCI is a transformation of (smoothed) climate (or environmental) time series that removes seasonality and forces the data to take values of the standard normal distribution. SCI was originally developed for precipitation. In this case it is known as the Standardized Precipitation Index (SPI).

572

Probability Distributions

setRNG

Set (Normal) Random Number Generator and Seed

SetRNG provides utilities to help set and record the setting of the seed and the uniform and normal generators used when a random experiment is run. The utilities can be used in other functions that do random experiments to simplify recording and/or setting all the necessary information for reproducibility. See the vignette and reference manual for examples.

573

Probability Distributions

sfsmisc

Utilities from ‘Seminar fuer Statistik’ ETH Zurich

Useful utilities [‘goodies’] from Seminar fuer Statistik ETH Zurich, quite a few related to graphics; some were ported from S-plus.

574

Probability Distributions

sgt

Skewed Generalized T Distribution Tree

Density, distribution function, quantile function and random generation for the skewed generalized t distribution. This package also provides a function that can fit data to the skewed generalized t distribution using maximum likelihood estimation.

575

Probability Distributions

skellam

Densities and Sampling for the Skellam Distribution

Functions for the Skellam distribution, including: density (pmf), cdf, quantiles and regression.
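
The Skellam pmf is the law of a difference of two independent Poisson counts, and can be evaluated by truncating the defining convolution sum. A Python sketch (illustrative, not the package's API):

```python
import math

def skellam_pmf(k, mu1, mu2, terms=60):
    """pmf of K = N1 - N2 for independent N1 ~ Poisson(mu1), N2 ~ Poisson(mu2),
    by truncating the convolution sum P(K = k) = sum_n P(N1 = n + k) P(N2 = n)."""
    total = 0.0
    for n in range(terms):
        if n + k < 0:
            continue  # N1 cannot be negative
        total += (math.exp(-mu1) * mu1 ** (n + k) / math.factorial(n + k)
                  * math.exp(-mu2) * mu2 ** n / math.factorial(n))
    return total

# Sanity checks for mu1 = 3, mu2 = 2: total mass ~ 1, mean ~ mu1 - mu2 = 1.
total_mass = sum(skellam_pmf(k, 3.0, 2.0) for k in range(-30, 31))
mean = sum(k * skellam_pmf(k, 3.0, 2.0) for k in range(-30, 31))
```

Production implementations use the equivalent modified-Bessel-function form instead of the raw sum.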

576

Probability Distributions

SkewHyperbolic

The Skew Hyperbolic Student t-Distribution

Functions are provided for the density function, distribution function, quantiles and random number generation for the skew hyperbolic t-distribution. There are also functions that fit the distribution to data. There are functions for the mean, variance, skewness, kurtosis and mode of a given distribution and to calculate moments of any order about any centre. To assess goodness of fit, there are functions to generate a QQ plot, a PP plot and a tail plot.

577

Probability Distributions

skewt

The Skewed Student-t Distribution

Density, distribution function, quantile function and random generation for the skewed t distribution of Fernandez and Steel.

578

Probability Distributions

sld

Estimation and Use of the QuantileBased Skew Logistic Distribution

The skew logistic distribution is a quantile-defined generalisation of the logistic distribution (van Staden and King 2015). Provides random numbers, quantiles, probabilities, densities and density quantiles for the distribution. It provides Quantile-Quantile plots and method of L-Moments estimation (including asymptotic standard errors) for the distribution.

579

Probability Distributions

smoothmest

Smoothed M-estimators for 1-dimensional location

Some M-estimators for 1-dimensional location (Bisquare, ML for the Cauchy distribution, and the estimators from application of the smoothing principle introduced in Hampel, Hennig and Ronchetti (2011) to the above, the Huber M-estimator, and the median; main function is smoothm), and the Pitman estimator.

580

Probability Distributions

SMR

Externally Studentized Midrange Distribution

Computes the studentized midrange distribution (pdf, cdf and quantile) and generates random numbers

581

Probability Distributions

sn

The Skew-Normal and Related Distributions Such as the Skew-t

Build and manipulate probability distributions of the skew-normal family and some related ones, notably the skew-t family, and provide related statistical methods for data fitting and model diagnostics, in the univariate and the multivariate case.

582

Probability Distributions

sparseMVN

Multivariate Normal Functions for Sparse Covariance and Precision Matrices

Computes multivariate normal (MVN) densities, and samples from MVN distributions, when the covariance or precision matrix is sparse.

583

Probability Distributions

spd

Semi Parametric Distribution

The Semi Parametric Piecewise Distribution blends the Generalized Pareto Distribution for the tails with a kernel based interior.

584

Probability Distributions

stabledist

Stable Distribution Functions

Density, Probability and Quantile functions, and random number generation for (skew) stable distributions, using the parametrizations of Nolan.

585

Probability Distributions

STAR

Spike Train Analysis with R

Functions to analyze neuronal spike trains from a single neuron or from several neurons recorded simultaneously.

586

Probability Distributions

statmod

Statistical Modeling

A collection of algorithms and functions to aid statistical modeling. Includes growth curve comparisons, limiting dilution analysis (aka ELDA), mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. Includes advanced generalized linear model functions that implement secure convergence, dispersion modeling and Tweedie power-law families.

587

Probability Distributions

SuppDists

Supplementary Distributions

Ten distributions supplementing those built into R: inverse Gauss, Kruskal-Wallis, Kendall’s Tau, Friedman’s chi-squared, Spearman’s rho, maximum F-ratio, the Pearson product moment correlation coefficient, Johnson distributions, normal scores and generalized hypergeometric distributions. In addition, two random number generators of George Marsaglia are included.

588

Probability Distributions

symmoments

Symbolic central and non-central moments of the multivariate normal distribution

Symbolic central and non-central moments of the multivariate normal distribution. Computes a standard representation, LaTeX code, and values at specified mean and covariance matrices.

589

Probability Distributions

tmvtnorm

Truncated Multivariate Normal and Student t Distribution

Random number generation for the truncated multivariate normal and Student t distribution. Computes probabilities, quantiles and densities, including one-dimensional and bivariate marginal densities. Computes first and second moments (i.e. mean and covariance matrix) for the double-truncated multinormal case.

590

Probability Distributions

tolerance

Statistical Tolerance Intervals and Regions

Statistical tolerance limits provide the limits between which we can expect to find a specified proportion of a sampled population with a given level of confidence. This package provides functions for estimating tolerance limits (intervals) for various univariate distributions (binomial, Cauchy, discrete Pareto, exponential, two-parameter exponential, extreme value, hypergeometric, Laplace, logistic, negative binomial, negative hypergeometric, normal, Pareto, Poisson-Lindley, Poisson, uniform, and Zipf-Mandelbrot), Bayesian normal tolerance limits, multivariate normal tolerance regions, nonparametric tolerance intervals, tolerance bands for regression settings (linear regression, nonlinear regression, nonparametric regression, and multivariate regression), and analysis of variance tolerance intervals. Visualizations are also available for most of these settings.

591

Probability Distributions

trapezoid

The Trapezoidal Distribution

The trapezoid package provides dtrapezoid, ptrapezoid, qtrapezoid, and rtrapezoid functions for the trapezoidal distribution.

592

Probability Distributions

triangle

Provides the Standard Distribution Functions for the Triangle Distribution

Provides the “r, q, p, and d” distribution functions for the triangle distribution.

593

Probability Distributions

truncnorm

Truncated normal distribution

r/d/p/q functions for the truncated normal distribution
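
A truncated normal density is simply the untruncated density renormalized by the probability mass inside the truncation interval. A Python sketch of the d function (illustrative; the argument names are mine, not the package's):

```python
import math

def dtruncnorm(x, lo, hi, mean=0.0, sd=1.0):
    """Density of a normal(mean, sd) truncated to [lo, hi]:
    the untruncated density renormalized by the mass inside the interval."""
    if not lo <= x <= hi:
        return 0.0
    phi = math.exp(-((x - mean) / sd) ** 2 / 2.0) / (sd * math.sqrt(2.0 * math.pi))
    z = lambda t: (t - mean) / (sd * math.sqrt(2.0))
    inside = 0.5 * (math.erf(z(hi)) - math.erf(z(lo)))  # P(lo <= X <= hi)
    return phi / inside

# midpoint-rule check that the truncated density integrates to one
n = 20000
lo, hi = -0.5, 2.0
area = sum(
    dtruncnorm(lo + (hi - lo) * (i + 0.5) / n, lo, hi) for i in range(n)
) * (hi - lo) / n
```

The p, q and r functions follow from the same renormalization of the normal cdf.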

594

Probability Distributions

TSA

Time Series Analysis

Contains R functions and datasets detailed in the book “Time Series Analysis with Applications in R (second edition)” by Jonathan Cryer and Kung-Sik Chan.

595

Probability Distributions

tsallisqexp

Tsallis q-Exp Distribution

Tsallis distribution, also known as the q-exponential family distribution. Provides d, p, q, r distribution functions, fitting and testing functions. Project initiated by Paul Higbie and based on Cosma Shalizi’s code.

596

Probability Distributions

TTmoment

Sampling and Calculating the First and Second Moments for the Doubly Truncated Multivariate t Distribution

Computing the first two moments of the truncated multivariate t (TMVT) distribution under double truncation. Applying the slice sampling algorithm to generate random variates from the TMVT distribution.

597

Probability Distributions

tweedie

Evaluation of Tweedie Exponential Family Models

Maximum likelihood computations for Tweedie families, including the series expansion (Dunn and Smyth, 2005; doi:10.1007/s11222-005-4070-y) and the Fourier inversion (Dunn and Smyth, 2008; doi:10.1007/s11222-007-9039-6), and related methods.

598

Probability Distributions

VarianceGamma

The Variance Gamma Distribution

This package provides functions for the variance gamma distribution: density, distribution and quantile functions, functions for random number generation and fitting of the variance gamma to data, and functions for computing moments of the variance gamma distribution of any order about any location. In addition, there are functions for checking the validity of parameters and to interchange different sets of parameterizations for the variance gamma distribution.

599

Probability Distributions

VGAM (core)

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. At the heart of it are the vector generalized linear and additive model (VGLM/VGAM) classes, and the book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) doi:10.1007/978-1-4939-2818-7 gives details of the statistical framework and VGAM package. Currently only fixed-effects models are implemented, i.e., no random-effects models. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE, using Fisher scoring. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs (i.e., with smoothing). The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

600

Probability Distributions

VineCopula

Statistical Inference of Vine Copulas

Provides tools for the statistical analysis of vine copula models. The package includes tools for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. Tools for estimation, selection and exploratory data analysis of bivariate copula models are also provided.

601

Probability Distributions

vines

Multivariate Dependence Modeling with Vines

Implementation of the vine graphical model for building high-dimensional probability distributions as a factorization of bivariate copulas and marginal density functions. This package provides S4 classes for vines (C-vines and D-vines) and methods for inference, goodness-of-fit tests, density/distribution function evaluation, and simulation.

602

Probability Distributions

zipfR

Statistical Models for Word Frequency Distributions

Statistical models and utilities for the analysis of word frequency distributions. The utilities include functions for loading, manipulating and visualizing word frequency data and vocabulary growth curves. The package also implements several statistical models for the distribution of word frequencies in a population. (The name of this package derives from the most famous word frequency distribution, Zipf’s law.)

603

Econometrics

AER (core)

Applied Econometrics with R

Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. (See the vignette “AER” for a package overview.)

604

Econometrics

aod

Analysis of Overdispersed Data

This package provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).

605

Econometrics

apt

Asymmetric Price Transmission

Asymmetric price transmission between two time series is assessed. Several functions are available for linear and nonlinear threshold cointegration and, furthermore, for symmetric and asymmetric error correction models. A graphical user interface is also included for major functions included in the package, so users can also use these functions in a more intuitive way.

606

Econometrics

bayesm

Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al., JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005), and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

607

Econometrics

betareg

Beta Regression

Beta regression for modeling beta-distributed dependent variables, e.g., rates and proportions. In addition to maximum likelihood regression (for both mean and precision of a beta-distributed response), bias-corrected and bias-reduced estimation as well as finite mixture models and recursive partitioning for beta regressions are provided.

608

Econometrics

BMA

Bayesian Model Averaging

Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).

609

Econometrics

BMS

Bayesian Model Averaging Library

Bayesian model averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, flexible and hyper-g priors) and five kinds of model priors; model sampling is by enumeration or various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, and coefficient and predictive densities. Plotting functions are available for posterior model size, MCMC convergence, predictive and coefficient densities, best-models representation, and BMA comparison.

610

Econometrics

boot

Bootstrap Functions (Originally by Angelo Canty for S)

Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.
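The core resampling idea behind such bootstrap functions (draw samples with replacement, recompute the statistic on each resample, summarize the replicates) can be sketched in a few lines of Python; the function name and defaults below are illustrative, not the package's API:

```python
import random
import statistics

def bootstrap_se(data, stat, reps=2000, seed=1):
    """Nonparametric bootstrap standard error of a statistic.

    Resamples the data with replacement `reps` times, recomputes
    `stat` on each resample, and returns the standard deviation
    of the replicate statistics.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(reps)]
    return statistics.stdev(replicates)
```

For the sample mean, the bootstrap standard error should come close to the analytic value s/sqrt(n), which is a quick sanity check on the resampling loop.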

611

Econometrics

bootstrap

Functions for the Book “An Introduction to the Bootstrap”

Software (bootstrap, cross-validation, jackknife) and data for the book “An Introduction to the Bootstrap” by B. Efron and R. Tibshirani, 1993, Chapman and Hall. This package is primarily provided for projects already based on it, and for support of the book. New projects should preferentially use the recommended package “boot”.

612

Econometrics

brglm

Bias Reduction in Binomial-Response Generalized Linear Models

Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as ‘glm’. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.

613

Econometrics

CADFtest

A Package to Perform Covariate Augmented Dickey-Fuller Unit Root Tests

Hansen’s (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, then an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).

614

Econometrics

car (core)

Companion to Applied Regression

Functions and Datasets to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Second Edition, Sage, 2011.

615

Econometrics

CDNmoney

Components of Canadian Monetary and Credit Aggregates

Components of Canadian Credit Aggregates and Monetary Aggregates with continuity adjustments.

616

Econometrics

censReg

Censored Regression (Tobit) Models

Maximum Likelihood estimation of censored regression (Tobit) models with crosssectional and panel data.

617

Econometrics

clubSandwich

Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections

Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) http://www.statcan.gc.ca/pub/12-001-x/2002002/article/9058-eng.pdf and developed further by Pustejovsky and Tipton (2017) doi:10.1080/07350015.2016.1247004. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddlepoint corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), ivreg (from package ‘AER’), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and rma.mv() (from ‘metafor’).

618

Econometrics

clusterSEs

Calculate Cluster-Robust p-Values and Confidence Intervals

Calculate p-values and confidence intervals using cluster-adjusted t-statistics (based on Ibragimov and Muller (2010) doi:10.1198/jbes.2009.08046), pairs cluster bootstrapped t-statistics, and wild cluster bootstrapped t-statistics (the latter two techniques based on Cameron, Gelbach, and Miller (2008) doi:10.1162/rest.90.3.414). Procedures are included for use with GLM, ivreg, plm (pooling or fixed effects), and mlogit models.

619

Econometrics

crch

Censored Regression with Conditional Heteroscedasticity

Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of models, the intercepts depend on the thresholds that define the intervals.

620

Econometrics

decompr

Global Value Chain Decomposition

Two global value chain decompositions are implemented. Firstly, the Wang-Wei-Zhu (Wang, Wei, and Zhu, 2013) algorithm splits bilateral gross exports into 16 value-added components. Secondly, the Leontief decomposition (default) derives the value-added origin of exports by country and industry, which is also based on Wang, Wei, and Zhu (Wang, Z., S.-J. Wei, and K. Zhu. 2013. “Quantifying International Production Sharing at the Bilateral and Sector Levels.”).

621

Econometrics

dlsem

Distributed-Lag Linear Structural Equation Modelling

Inference functionalities for distributed-lag linear structural equation models. Endpoint-constrained quadratic, quadratic decreasing, and gamma lag shapes are available.

622

Econometrics

dynlm

Dynamic Linear Regression

Dynamic linear models and time series regression.

623

Econometrics

Ecdat

Data Sets for Econometrics

Data sets for econometrics.

624

Econometrics

effects

Effect Displays for Linear, Generalized Linear, and Other Models

Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.

625

Econometrics

erer

Empirical Research in Economics with R

Functions, datasets, and sample code related to the book ‘Empirical Research in Economics: Growing up with R’ by Dr. Changyou Sun are included. Marginal effects for binary or ordered choice models can be calculated. Static and dynamic Almost Ideal Demand System (AIDS) models can be estimated. A typical event analysis in finance can be conducted with several functions included.

626

Econometrics

expsmooth

Data Sets from “Forecasting with Exponential Smoothing”

Data sets from the book “Forecasting with exponential smoothing: the state space approach” by Hyndman, Koehler, Ord and Snyder (Springer, 2008).

627

Econometrics

ExtremeBounds

Extreme Bounds Analysis (EBA)

An implementation of Extreme Bounds Analysis (EBA), a global sensitivity analysis that examines the robustness of determinants in regression models. The package supports both Leamer’s and Sala-i-Martin’s versions of EBA, and allows users to customize all aspects of the analysis.

628

Econometrics

fma

Data Sets from “Forecasting: Methods and Applications” by Makridakis, Wheelwright & Hyndman (1998)

All data sets from “Forecasting: methods and applications” by Makridakis, Wheelwright & Hyndman (Wiley, 3rd ed., 1998).

629

Econometrics

forecast (core)

Forecasting Functions for Time Series and Linear Models

Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.

630

Econometrics

frm

Regression Analysis of Fractional Responses

Estimation and specification analysis of one- and two-part fractional regression models and calculation of partial effects.

631

Econometrics

frontier

Stochastic Frontier Analysis

Maximum Likelihood Estimation of Stochastic Frontier Production and Cost Functions. Two specifications are available: the error components specification with time-varying efficiencies (Battese and Coelli, 1992) and a model specification in which the firm effects are directly influenced by a number of variables (Battese and Coelli, 1995).

632

Econometrics

fxregime

Exchange Rate Regime Analysis

Exchange rate regression and structural change tools for estimating, testing, dating, and monitoring (de facto) exchange rate regimes.

633

Econometrics

gam

Generalized Additive Models

Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).

634

Econometrics

gamlss

Generalised Additive Models for Location Scale and Shape

Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), doi:10.1111/j.1467-9876.2005.00510.x. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.

635

Econometrics

geepack

Generalized Estimating Equation Package

Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses.

636

Econometrics

gets

General-to-Specific (GETS) Modelling and Indicator Saturation Methods

Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean.

637

Econometrics

glmx

Generalized Linear Models Extended

Extended techniques for generalized linear models (GLMs), especially for binary responses, including parametric links and heteroskedastic latent variables.

638

Econometrics

gmm

Generalized Method of Moments and Generalized Empirical Likelihood

A complete suite of tools for estimating models based on moment conditions. It includes the two-step Generalized Method of Moments (Hansen 1982; doi:10.2307/1912775), the iterated GMM and continuous updated estimator (Hansen, Eaton and Yaron 1996; doi:10.2307/1392442) and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; doi:10.1111/j.0013-0133.1997.174.x, Kitamura 1997; doi:10.1214/aos/1069362388, Newey and Smith 2004; doi:10.1111/j.1468-0262.2004.00482.x, and Anatolyev 2005 doi:10.1111/j.1468-0262.2005.00601.x).

639

Econometrics

gmnl

Multinomial Logit Models with Random Parameters

An implementation of the maximum simulated likelihood method for the estimation of multinomial logit models with random coefficients. Specifically, it allows estimating models with continuous heterogeneity such as the mixed multinomial logit and the generalized multinomial logit. It also allows estimating models with discrete heterogeneity such as the latent class and the mixed-mixed multinomial logit model.

640

Econometrics

gvc

Global Value Chains Tools

Several tools for Global Value Chain (‘GVC’) analysis are implemented.

641

Econometrics

Hmisc

Harrell Miscellaneous

Contains many functions useful for data analysis, highlevel graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.

642

Econometrics

ineq

Measuring Inequality, Concentration, and Poverty

Inequality, concentration, and poverty measures. Lorenz curves (empirical and theoretical).
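As a concrete reference point for these measures, the Gini coefficient can be computed directly from sorted data. A minimal Python sketch (illustrative only, not the package's implementation):

```python
def gini(incomes):
    """Gini coefficient of inequality, from 0 (perfect equality)
    up to (n - 1) / n for a sample of size n.

    Uses the standard rank-based formula on sorted values:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with 1-based ranks i over ascending x.
    """
    x = sorted(incomes)
    n = len(x)
    total = sum(x)
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

A uniform sample gives G = 0; concentrating all income in one unit of a sample of four gives G = 0.75, matching the (n - 1)/n upper bound.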

643

Econometrics

intReg

Interval Regression

Estimating interval regression models. Supports both common and observationspecific boundaries.

644

Econometrics

ivbma

Bayesian Instrumental Variable Estimation and Model Determination via Conditional Bayes Factors

This package allows one to incorporate instrument and covariate uncertainty into instrumental variable regression.

645

Econometrics

ivfixed

Instrumental fixed effect panel data model

Fits an instrumental least squares dummy variable model.

646

Econometrics

ivlewbel

Uses heteroscedasticity to estimate mismeasured and endogenous regressor models

GMM estimation of triangular systems using heteroscedasticity-based instrumental variables as in Lewbel (2012).

647

Econometrics

ivpack

Instrumental Variable Estimation

This package contains functions for carrying out instrumental variable estimation of causal effects and power analyses for instrumental variable studies.

648

Econometrics

ivpanel

Instrumental Panel Data Models

Fit the instrumental panel data models: the fixed effects, random effects and between models.

649

Econometrics

ivprobit

Instrumental variables probit model

Fits an instrumental variables probit model using the generalized least squares estimator.

650

Econometrics

LARF

Local Average Response Functions for Instrumental Variable Estimation of Treatment Effects

Provides instrumental variable estimation of treatment effects when both the endogenous treatment and its instrument are binary. Applicable to both binary and continuous outcomes.

651

Econometrics

lavaan

Latent Variable Analysis

Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.

652

Econometrics

lfe

Linear Group Fixed Effects

Transforms away factors with many levels prior to doing an OLS. Useful for estimating linear models with multiple group fixed effects, and for estimating linear models that use factors with many levels as pure control variables. Includes support for instrumental variables, conditional F statistics for weak instruments, robust and multiway clustered standard errors, as well as limited mobility bias correction.

653

Econometrics

LinRegInteractive

Interactive Interpretation of Linear Regression Models

Interactive visualization of effects, response functions and marginal effects for different kinds of regression models. In this version linear regression models, generalized linear models, generalized additive models and linear mixed-effects models are supported. Major features are the interactive approach and the handling of the effects of categorical covariates: if two or more factors are used as covariates every combination of the levels of each factor is treated separately. The automatic calculation of marginal effects and a number of possibilities to customize the graphical output are useful features as well.

654

Econometrics

lme4

Linear Mixed-Effects Models using ‘Eigen’ and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.

655

Econometrics

lmtest (core)

Testing Linear Regression Models

A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.
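One classic member of this family of diagnostics is the Durbin-Watson test for first-order autocorrelation in regression residuals; its statistic is easy to sketch. A minimal Python illustration (not the package's implementation):

```python
def durbin_watson(resid):
    """Durbin-Watson statistic for first-order autocorrelation.

    DW = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2.
    Values near 2 suggest no autocorrelation; values near 0 suggest
    positive autocorrelation; values near 4 suggest negative
    autocorrelation.
    """
    num = sum((b - a) ** 2 for a, b in zip(resid, resid[1:]))
    den = sum(e ** 2 for e in resid)
    return num / den
```

Perfectly alternating residuals push the statistic toward 4, while a constant run of residuals drives it to 0.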

656

Econometrics

margins

Marginal Effects for Model Objects

An R port of Stata’s ‘margins’ command, which can be used to calculate marginal (or partial) effects from model objects.

657

Econometrics

MASS

Support Functions and Datasets for Venables and Ripley’s MASS

Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).

658

Econometrics

matchingMarkets

Analysis of Stable Matchings

Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes onesided matching of agents into groups as well as twosided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.

659

Econometrics

Matrix

Sparse and Dense Matrix Classes and Methods

A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using ‘LAPACK’ and ‘SuiteSparse’ libraries.

660

Econometrics

Mcomp

Data from the M-Competitions

The 1001 time series from the M-competition (Makridakis et al. 1982) doi:10.1002/for.3980010202 and the 3003 time series from the IJF-M3 competition (Makridakis and Hibon, 2000) doi:10.1016/S0169-2070(00)00057-1.

661

Econometrics

meboot

Maximum Entropy Bootstrap for Time Series

Maximum entropy density based dependent data bootstrap. An algorithm is provided to create a population of time series (ensemble) without assuming stationarity. The reference paper (Vinod, H.D., 2004) explains how the algorithm satisfies the ergodic theorem and the central limit theorem.

662

Econometrics

mfx

Marginal Effects, Odds Ratios and Incidence Rate Ratios for GLMs

Estimates probit, logit, Poisson, negative binomial, and beta regression models, returning their marginal effects, odds ratios, or incidence rate ratios as an output.
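For the logit case, the marginal effect of covariate k follows from the chain rule: with P(y=1|x) = 1/(1 + exp(-x'β)), the effect is β_k·p·(1-p) at the chosen covariate point. A minimal Python sketch of that calculation (illustrative only, not the package's API):

```python
import math

def logit_marginal_effect(beta, x):
    """Marginal effects for a logit model at covariate vector x.

    For P(y=1|x) = 1 / (1 + exp(-x'b)), the marginal effect of
    covariate k is b_k * p * (1 - p), evaluated at the point x.
    """
    xb = sum(b * xi for b, xi in zip(beta, x))
    p = 1.0 / (1.0 + math.exp(-xb))
    return [b * p * (1.0 - p) for b in beta]
```

At p = 0.5 the factor p·(1-p) reaches its maximum of 0.25, so a coefficient of 1 translates into a marginal effect of 0.25 there.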

663

Econometrics

mgcv

Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar. Includes a gam() function, a wide variety of smoothers, JAGS support and distributions beyond the exponential family.

664

Econometrics

mhurdle

Multiple Hurdle Tobit Models

Estimation of models with zero left-censored variables. Null values may be caused by a selection process, insufficient resources or infrequency of purchase.

665

Econometrics

micEcon

Microeconomic Analysis and Modelling

Various tools for microeconomic analysis and microeconomic modelling, e.g. estimating quadratic, Cobb-Douglas and Translog functions, calculating partial derivatives and elasticities of these functions, and calculating Hessian matrices, checking curvature and preparing restrictions for imposing monotonicity of Translog functions.

666

Econometrics

micEconAids

Demand Analysis with the Almost Ideal Demand System (AIDS)

Functions and tools for analysing consumer demand with the Almost Ideal Demand System (AIDS) suggested by Deaton and Muellbauer (1980).

667

Econometrics

micEconCES

Analysis with the Constant Elasticity of Substitution (CES) function

Tools for economic analysis and economic modelling with a Constant Elasticity of Substitution (CES) function.

668

Econometrics

micEconSNQP

Symmetric Normalized Quadratic Profit Function

Production analysis with the Symmetric Normalized Quadratic (SNQ) profit function.

669

Econometrics

midasr

Mixed Data Sampling Regression

Methods and tools for mixed frequency time series data analysis. Allows estimation, model selection and forecasting for MIDAS regressions.

670

Econometrics

mlogit

multinomial logit model

Estimation of the multinomial logit model.

671

Econometrics

mnlogit

Multinomial Logit Model

Time- and memory-efficient estimation of multinomial logit models using the maximum likelihood method. Numerical optimization is performed by the Newton-Raphson method using an optimized, parallel C++ library to achieve fast computation of Hessian matrices. Motivated by large-scale multiclass classification problems in econometrics and machine learning.
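The likelihood being maximized is built from softmax choice probabilities. A minimal, numerically stable Python sketch of the probability and log-likelihood computation (illustrative only, not the package's optimized C++ implementation):

```python
import math

def mnl_probs(scores):
    """Choice probabilities for a multinomial logit model.

    `scores` are the linear utilities x'b_j for each alternative j;
    the probabilities are the softmax of the scores. Subtracting the
    maximum score first keeps exp() from overflowing.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mnl_loglik(score_rows, choices):
    """Log-likelihood: sum of log probabilities of the chosen alternatives."""
    return sum(math.log(mnl_probs(row)[j])
               for row, j in zip(score_rows, choices))
```

With two alternatives and equal scores each probability is 0.5, so one observation contributes log(0.5) to the log-likelihood; a Newton-Raphson optimizer would maximize this quantity over the coefficients.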

672

Econometrics

MNP

R Package for Fitting the Multinomial Probit Model

Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using the Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. doi:10.1016/j.jeconom.2004.02.002. Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. doi:10.18637/jss.v014.i03.

673

Econometrics

MSBVAR

Markov-Switching, Bayesian, Vector Autoregression Models

Provides methods for estimating frequentist and Bayesian Vector Autoregression (VAR) models and Markov-switching Bayesian VAR (MSBVAR). Functions for reduced form and structural VAR models are also available. Includes methods for generating posterior inferences for these models, forecasts, impulse responses (using likelihood-based error bands), and forecast error decompositions. Also includes utility functions for plotting forecasts and impulse responses, and generating draws from Wishart and singular multivariate normal densities. The current version includes functionality to build and evaluate models with Markov switching.

674

Econometrics

multiwayvcov

Multi-Way Standard Error Clustering

Exports two functions implementing multi-way clustering using the method suggested by Cameron, Gelbach, & Miller (2011) and cluster (or block) bootstrapping for estimating variance-covariance matrices. Normal one- and two-way clustering matches the results of other common statistical packages. Missing values are handled transparently and rudimentary parallelization support is provided.

675

Econometrics

mvProbit

Multivariate Probit Models

Tools for estimating multivariate probit models, calculating conditional and unconditional expectations, and calculating marginal effects on conditional and unconditional expectations.

676

Econometrics

nlme

Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixedeffects models.

677

Econometrics

nnet

Feed-Forward Neural Networks and Multinomial Log-Linear Models

Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.

678

Econometrics

nonnest2

Tests of Non-Nested Models

Testing non-nested models via theory supplied by Vuong (1989) doi:10.2307/1912557. Includes tests of model distinguishability and of model fit that can be applied to both nested and non-nested models. Also includes functionality to obtain confidence intervals associated with AIC and BIC. This material is based on work supported by the National Science Foundation under Grant Number SES-1061334.

679

Econometrics

np

Nonparametric Kernel Smoothing Methods for Mixed Data Types

Nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor data types. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC:www.nserc.ca), the Social Sciences and Humanities Research Council of Canada (SSHRC:www.sshrc.ca), and the Shared Hierarchical Academic Research Computing Network (SHARCNET:www.sharcnet.ca).

680

Econometrics

ordinal

Regression Models for Ordinal Data

Implementation of cumulative link (mixed) models, also known as ordered regression models, proportional odds models, proportional hazards models for grouped survival times and ordered logit/probit/… models. Estimation is via maximum likelihood and mixed models are fitted with the Laplace approximation and adaptive Gauss-Hermite quadrature. Multiple random effect terms are allowed and they may be nested, crossed or partially nested/crossed. Restrictions of symmetry and equidistance can be imposed on the thresholds (cutpoints/intercepts). Standard model methods are available (summary, anova, drop-methods, step, confint, predict, etc.) in addition to profile methods and slice methods for visualizing the likelihood function and checking convergence.

681

Econometrics

OrthoPanels

Dynamic Panel Models with Orthogonal Reparameterization of Fixed Effects

Implements the orthogonal reparameterization approach recommended by Lancaster (2002) to estimate dynamic panel models with fixed effects (and optionally: panel-specific intercepts). The approach uses a likelihood-based estimator and produces estimates that are asymptotically unbiased as N goes to infinity, with a T as low as 2.

682

Econometrics

pampe

Implementation of the Panel Data Approach Method for Program Evaluation

Implements the Panel Data Approach Method for program evaluation as developed in Hsiao, Ching and Ki Wan (2012). pampe estimates the effect of an intervention by comparing the evolution of the outcome for a unit affected by an intervention or treatment to the evolution of the unit had it not been affected by the intervention.

683

Econometrics

panelAR

Estimation of Linear AR(1) Panel Data Models with Cross-Sectional Heteroskedasticity and/or Correlation

The package estimates linear models on panel data structures in the presence of AR(1)-type autocorrelation as well as panel heteroskedasticity and/or contemporaneous correlation. First, AR(1)-type autocorrelation is addressed via a two-step Prais-Winsten feasible generalized least squares (FGLS) procedure, where the autocorrelation coefficients may be panel-specific. A number of common estimators for the autocorrelation coefficient are supported. In case of panel heteroskedasticity, one can choose to use a sandwich-type robust standard error estimator with OLS or a panel weighted least squares estimator after the two-step Prais-Winsten estimator. Alternatively, if panels are both heteroskedastic and contemporaneously correlated, the package supports panel-corrected standard errors (PCSEs) as well as the Parks-Kmenta FGLS estimator.

684

Econometrics

Paneldata

Linear models for panel data

Linear models for panel data: the fixed effect model and the random effect model.

685

Econometrics

PANICr

PANIC Tests of Nonstationarity

A methodology that makes use of the factor structure of large dimensional panels to understand the nature of nonstationarity inherent in data. This is referred to as PANIC, Panel Analysis of Nonstationarity in Idiosyncratic and Common Components. PANIC (2004) doi:10.1111/j.1468-0262.2004.00528.x includes valid pooling methods that allow panel tests to be constructed. PANIC (2004) can detect whether the nonstationarity in a series is pervasive, or variable specific, or both. PANIC (2010) doi:10.1017/s0266466609990478 includes two new tests on the idiosyncratic component that estimate the pooled autoregressive coefficient and sample moment, respectively. The PANIC model approximates the number of factors based on Bai and Ng (2002) doi:10.1111/1468-0262.00273.

686

Econometrics

pco

Panel Cointegration Tests

Computation of the Pedroni (1999) panel cointegration test statistics. Reported are the empirical and the standardized values.

687

Econometrics

pcse

PanelCorrected Standard Error Estimation in R

This package contains a function to estimate panelcorrected standard errors. Data may contain balanced or unbalanced panels.

688

Econometrics

pder

Panel Data Econometrics with R

Data sets for the Panel Data Econometrics with R book.

689

Econometrics

pdR

Threshold Model and Unit Root Tests in Panel Data

Threshold model, panel version of the Hylleberg et al. (1990) doi:10.1016/0304-4076(90)90080-D seasonal unit root tests, and the panel unit root test of Chang (2002) doi:10.1016/S0304-4076(02)00095-7.

690

Econometrics

pglm

Panel Generalized Linear Models

Estimation of panel models for GLM-like models: this includes binomial models (logit and probit), count models (Poisson and negbin), and ordered models (logit and probit).

691

Econometrics

phtt

Panel Data Analysis with Heterogeneous Time Trends

The package provides estimation procedures for panel data with large dimensions n, T, and general forms of unobservable heterogeneous effects. Particularly, the estimation procedures are those of Bai (2009) and Kneip, Sickles, and Song (2012), which complement one another very well: both models assume the unobservable heterogeneous effects to have a factor structure. The method of Bai (2009) assumes that the factors are stationary, whereas the method of Kneip et al. (2012) allows the factors to be nonstationary. Additionally, the ‘phtt’ package provides a wide range of dimensionality criteria in order to estimate the number of the unobserved factors simultaneously with the remaining model parameters.

692

Econometrics

plm (core)

Linear Models for Panel Data

A set of estimators and tests for panel data econometrics.

693

Econometrics

pscl

Political Science Computational Laboratory

Bayesian analysis of item-response theory (IRT) models and roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.

694

Econometrics

psidR

Build Panel Data Sets from PSID Raw Data

Makes it easy to build panel data in wide format from raw data delivered by the Panel Study of Income Dynamics (PSID). Deals with data downloaded and pre-processed by ‘Stata’ or ‘SAS’, or can optionally download directly from the PSID server using the ‘SAScii’ package. ‘psidR’ takes care of merging data from each wave onto a cross-period index file, so that individuals can be followed over time. The user must specify which years they are interested in, and the PSID variable names (e.g. ER21003) for each year (they differ from year to year). Several panel data designs and sample subsetting criteria are implemented (“SRC”, “SEO”, “immigrant” and “latino” samples).

695

Econometrics

pwt

Penn World Table (Versions 5.6, 6.x, 7.x)

The Penn World Table provides purchasing power parity and national income accounts converted to international prices for 189 countries for some or all of the years 1950-2010.

696

Econometrics

pwt8

Penn World Table (Version 8.x)

The Penn World Table 8.x provides information on relative levels of income, output, inputs, and productivity for 167 countries between 1950 and 2011.

697

Econometrics

pwt9

Penn World Table (Version 9.x)

The Penn World Table 9.x provides information on relative levels of income, output, inputs, and productivity for 182 countries between 1950 and 2014.

698

Econometrics

quantreg

Quantile Regression

Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and nonparametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
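Conditional-quantile estimation of the kind quantreg performs rests on minimizing the asymmetric "check" loss. The quantreg package itself uses specialized linear-programming and interior-point algorithms; the Python sketch below (names hypothetical) only illustrates the objective being minimized:

```python
import numpy as np
from scipy.optimize import minimize

def quantile_fit(X, y, tau=0.5):
    """Fit a linear conditional-quantile model by minimizing the check loss."""
    def check_loss(beta):
        u = y - X @ beta
        # rho_tau(u) = u * (tau - 1{u < 0}), the asymmetric absolute loss
        return np.sum(u * (tau - (u < 0)))
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS starting values
    return minimize(check_loss, beta0, method="Nelder-Mead").x

# Demo: median regression (tau = 0.5) recovers intercept 1 and slope 2.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
beta_med = quantile_fit(X, y, tau=0.5)
```

Setting tau to 0.25 or 0.75 in the same objective gives the corresponding conditional quartiles.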

699

Econometrics

Rchoice

Discrete Choice (Binary, Poisson and Ordered) Models with Random Parameters

An implementation of simulated maximum likelihood method for the estimation of Binary (Probit and Logit), Ordered (Probit and Logit) and Poisson models with random parameters for crosssectional and longitudinal data.

700

Econometrics

rdd

Regression Discontinuity Estimation

Provides the tools to undertake estimation in Regression Discontinuity Designs. Both sharp and fuzzy designs are supported. Estimation is accomplished using local linear regression. A provided function implements the Imbens-Kalyanaraman optimal bandwidth calculation. A function is also included to test the assumption of no sorting effects.

701

Econometrics

rddtools

Toolbox for Regression Discontinuity Design (‘RDD’)

Set of functions for Regression Discontinuity Design (‘RDD’), for data visualisation, estimation and testing.

702

Econometrics

rdlocrand

Local Randomization Methods for RD Designs

The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. Under the local randomization approach, RD designs can be interpreted as randomized experiments inside a window around the cutoff. This package provides tools to perform randomization inference for RD designs under local randomization: rdrandinf() to perform hypothesis testing using randomization inference, rdwinselect() to select a window around the cutoff in which randomization is likely to hold, rdsensitivity() to assess the sensitivity of the results to different window lengths and null hypotheses, and rdrbounds() to construct Rosenbaum bounds for sensitivity to unobserved confounders.
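The randomization inference that rdrandinf() performs can be illustrated with a generic permutation test. The Python sketch below uses a plain difference in means and hypothetical names, not the package's machinery, to show how a p-value arises under the sharp null of no effect inside the window:

```python
import numpy as np

def randomization_pvalue(outcomes, treated, n_perm=2000, seed=0):
    """Permutation p-value for a difference in means under the sharp null."""
    rng = np.random.default_rng(seed)
    obs = outcomes[treated].mean() - outcomes[~treated].mean()
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(treated)  # re-randomize treatment labels
        stat = outcomes[perm].mean() - outcomes[~perm].mean()
        if abs(stat) >= abs(obs):
            hits += 1
    return hits / n_perm

# Demo: a large treatment effect inside the window yields a tiny p-value.
rng = np.random.default_rng(5)
treated = np.arange(40) < 20            # first 20 units "above the cutoff"
outcomes = rng.normal(size=40) + 5.0 * treated
p_val = randomization_pvalue(outcomes, treated)
```

The p-value is simply the share of re-randomizations whose statistic is at least as extreme as the observed one.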

703

Econometrics

rdrobust

Robust Data-Driven Statistical Inference in Regression-Discontinuity Designs

Regression-discontinuity (RD) designs are quasi-experimental research designs popular in social, behavioral and natural sciences. The RD design is usually employed to study the (local) causal effect of a treatment, intervention or policy. This package provides tools for data-driven graphical and analytical statistical inference in RD designs: rdrobust() to construct local-polynomial point estimators and robust confidence intervals for average treatment effects at the cutoff in Sharp, Fuzzy and Kink RD settings, rdbwselect() to perform bandwidth selection for the different procedures implemented, and rdplot() to conduct exploratory data analysis (RD plots).

704

Econometrics

reldist

Relative Distribution Methods

Tools for the comparison of distributions. This includes nonparametric estimation of the relative distribution PDF and CDF and numerical summaries as described in “Relative Distribution Methods in the Social Sciences” by Mark S. Handcock and Martina Morris (Springer-Verlag, 1999, ISBN 0-387-98778-9).

705

Econometrics

REndo

Fitting Linear Models with Endogenous Regressors using Latent Instrumental Variables

Fits linear models with endogenous regressors using latent instrumental variable approaches. The methods included in the package are Lewbel’s (1997) doi:10.2307/2171884 higher-moments approach, Lewbel’s (2012) doi:10.1080/07350015.2012.643126 heteroskedasticity approach, Park and Gupta’s (2012) doi:10.1287/mksc.1120.0718 joint estimation method using a Gaussian copula, and Kim and Frees’s (2007) doi:10.1007/s11336-007-9008-1 multilevel generalized method of moments approach, which deals with endogeneity in a multilevel setting. These are statistical techniques that address the endogeneity problem when no external instrumental variables are available. This version includes an omitted variable test in the multilevel estimation, reported by the summary() method for multilevelIV() fits; resolves the error “Error in listIDs[, 1] : incorrect number of dimensions” when using the multilevelIV() function; and adds a new simulated dataset, dataMultilevelIV, on which to exemplify the multilevelIV() function.

706

Econometrics

rms

Regression Modeling Strategies

Regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. ‘rms’ is a collection of functions that assist with and streamline modeling. It also contains functions for binary and ordinal logistic regression models, ordinal models for continuous Y with a variety of distribution families, and the Buckley-James multiple regression model for right-censored responses, and implements penalized maximum likelihood estimation for logistic and ordinary linear models. ‘rms’ works with almost any regression model, but it was especially written to work with binary or ordinal regression models, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.

707

Econometrics

RSGHB

Functions for Hierarchical Bayesian Estimation: A Flexible Approach

Functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well as ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train’s Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train’s original Gauss and Matlab code can be found here: http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html See Train’s chapter on HB in Discrete Choice Methods with Simulation here: http://elsa.berkeley.edu/books/choice2.html; and his paper on using HB with non-normal distributions here: http://eml.berkeley.edu//~train/trainsonnier.pdf.

708

Econometrics

rUnemploymentData

Data and Functions for USA State and County Unemployment Data

Contains data and visualization functions for USA unemployment data. Data comes from the US Bureau of Labor Statistics (BLS). State data is in ?df_state_unemployment and covers 2000-2013. County data is in ?df_county_unemployment and covers 1990-2013. Choropleth maps of the data can be generated with ?state_unemployment_choropleth() and ?county_unemployment_choropleth() respectively.

709

Econometrics

sampleSelection

Sample Selection Models

Two-step estimation and maximum likelihood estimation of Heckman-type sample selection models: standard sample selection models (Tobit-2) and endogenous switching regression models (Tobit-5).

710

Econometrics

sandwich (core)

Robust Covariance Matrix Estimators

Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
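The idea behind such estimators can be sketched for the simplest case, the HC0 "sandwich" covariance for cross-sectional heteroskedasticity. This Python illustration is not the sandwich package's API; the function name is hypothetical:

```python
import numpy as np

def hc0_cov(X, resid):
    """HC0 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1."""
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)  # X' diag(e^2) X without forming diag
    return bread @ meat @ bread

# Demo: OLS with heteroskedastic errors, then robust standard errors.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
X = np.column_stack([np.ones(300), x])
y = 1.0 + 2.0 * x + rng.normal(size=300) * (1 + np.abs(x))  # variance grows with |x|
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
V = hc0_cov(X, y - X @ beta)
se = np.sqrt(np.diag(V))
```

The "bread" and "meat" terminology mirrors the package's own decomposition of such estimators; HC1-HC3, clustered, and HAC variants differ only in how the meat is weighted.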

711

Econometrics

segmented

Regression Models with Break-Points / Change-Points Estimation

Given a regression model, segmented ‘updates’ the model by adding one or more segmented (i.e., piecewise linear) relationships. Several variables with multiple breakpoints are allowed.

712

Econometrics

sem

Structural Equation Models

Functions for fitting general linear structural equation models (with observed and latent variables) using the RAM approach, and for fitting structural equations in observed-variable models by two-stage least squares.

713

Econometrics

SemiParSampleSel

Semi-Parametric Sample Selection Modelling with Continuous or Discrete Response

Routine for fitting continuous or discrete response copula sample selection models with semiparametric predictors, including linear and nonlinear effects.

714

Econometrics

semsfa

Semiparametric Estimation of Stochastic Frontier Models

Semiparametric estimation of stochastic frontier models following a two-step procedure: in the first step, semiparametric or nonparametric regression techniques are used to relax parametric restrictions of the functional form representing technology, and in the second step, variance parameters are obtained by pseudo-likelihood estimators or by the method of moments.

715

Econometrics

sfa

Stochastic Frontier Analysis

Stochastic frontier analysis as introduced by Aigner, Lovell and Schmidt (1977) and Battese and Coelli (1992, 1995).

716

Econometrics

simpleboot

Simple Bootstrap Routines

Simple bootstrap routines.

717

Econometrics

SparseM

Sparse Linear Algebra

Some basic linear algebra functionality for sparse matrices is provided, including Cholesky decomposition and backsolving, as well as standard R subsetting and Kronecker products.
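The Cholesky-plus-backsolve pattern that SparseM exposes for sparse systems looks, in its dense analogue, like the following. This is a Python/SciPy illustration of the linear-algebra pattern only, not SparseM's interface:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Solve A x = b for a symmetric positive-definite A via Cholesky + backsolve.
# Factoring once and reusing the factor is what makes repeated solves cheap.
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])
factor = cho_factor(A)        # A = L L' (stored compactly)
x = cho_solve(factor, b)      # two triangular backsolves
```

For genuinely sparse matrices the factorization additionally exploits the sparsity pattern, which is the package's main point.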

718

Econometrics

spatialprobit

Spatial Probit Models

Bayesian Estimation of Spatial Probit and Tobit Models.

719

Econometrics

spdep

Spatial Dependence: Weighting Schemes, Statistics and Models

A collection of functions to create spatial weights matrix objects from polygon ‘contiguities’, from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial ‘autocorrelation’, including global ‘Moran’s I’, ‘APLE’, ‘Geary’s C’, ‘Hubert/Mantel’ general cross-product statistic, Empirical Bayes estimates and ‘Assuncao/Reis’ Index, ‘Getis/Ord’ G and multicoloured join count statistics, local ‘Moran’s I’ and ‘Getis/Ord’ G, ‘saddlepoint’ approximations and exact tests for global and local ‘Moran’s I’; and functions for estimating spatial simultaneous ‘autoregressive’ (‘SAR’) lag and error models, impact measures for lag models, weighted and ‘unweighted’ ‘SAR’ and ‘CAR’ spatial regression models, semiparametric and Moran ‘eigenvector’ spatial filtering, ‘GM SAR’ error models, and generalized spatial two-stage least squares models.

720

Econometrics

spfrontier

Spatial Stochastic Frontier Models

A set of tools for estimation of various spatial specifications of stochastic frontier models.

721

Econometrics

sphet

Estimation of Spatial Autoregressive Models with and without Heteroskedastic Innovations

Generalized method of moments estimation of Cliff-Ord-type spatial autoregressive models with and without heteroskedastic innovations.

722

Econometrics

splm

Econometric Models for Spatial Panel Data

ML and GM estimation and diagnostic testing of econometric models for spatial panel data.

723

Econometrics

ssfa

Spatial Stochastic Frontier Analysis

Spatial Stochastic Frontier Analysis (SSFA) is an original method for controlling for spatial heterogeneity in Stochastic Frontier Analysis (SFA) models with cross-sectional data, by splitting the inefficiency term into three terms: the first related to the spatial peculiarities of the territory in which each unit operates, the second related to the specific production features, and the third representing the error term.

724

Econometrics

strucchange

Testing, Monitoring, and Dating Structural Changes

Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
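The F-test (Chow) side of this framework reduces, for a single known break date, to comparing pooled and split residual sums of squares. A minimal Python sketch of that statistic (names hypothetical, not strucchange's API, which also handles unknown break dates and fluctuation processes):

```python
import numpy as np

def chow_test(y, X, split):
    """Chow F statistic for a single known break at index `split`."""
    def rss(yy, XX):
        b, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        r = yy - XX @ b
        return r @ r
    k = X.shape[1]
    n = len(y)
    rss_pooled = rss(y, X)                                 # one regime
    rss_split = rss(y[:split], X[:split]) + rss(y[split:], X[split:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

# Demo: a slope change at t = 100 gives a large F; stable data does not.
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
slope = np.where(np.arange(n) < 100, 1.0, 3.0)
y_break = slope * x + rng.normal(scale=0.5, size=n)
y_stable = 1.0 * x + rng.normal(scale=0.5, size=n)
F_break = chow_test(y_break, X, 100)
F_stable = chow_test(y_stable, X, 100)
```

Under the null of no break the statistic is F(k, n - 2k) distributed, which is how the test is calibrated.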

725

Econometrics

survival

Survival Analysis

Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.

726

Econometrics

systemfit

Estimating Systems of Simultaneous Equations

Fitting simultaneous systems of linear and nonlinear equations using Ordinary Least Squares (OLS), Weighted Least Squares (WLS), Seemingly Unrelated Regressions (SUR), Two-Stage Least Squares (2SLS), Weighted Two-Stage Least Squares (W2SLS), and Three-Stage Least Squares (3SLS).
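As an illustration of the 2SLS step for a single equation (a Python sketch, not systemfit's interface): regress the endogenous regressors on the instruments, then regress the outcome on the fitted values.

```python
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares for one equation."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first stage: project X on Z
    beta, *_ = np.linalg.lstsq(Xhat, y, rcond=None)  # second stage: y on fitted X
    return beta

# Demo: x is endogenous (correlated with the error u), z is a valid instrument.
rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u                        # true slope is 2
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
beta_iv = tsls(y, X, Z)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # biased upward here
```

3SLS extends this by additionally using the cross-equation error covariance, in the spirit of SUR.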

727

Econometrics

truncreg

Truncated Gaussian Regression Models

Estimation of models for truncated Gaussian variables by maximum likelihood.

728

Econometrics

tsDyn

Nonlinear Time Series Models with Regime Switching

Implements nonlinear autoregressive (AR) time series models. For univariate series, a nonparametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).

729

Econometrics

tseries (core)

Time Series Analysis and Computational Finance

Time series analysis and computational finance.

730

Econometrics

tsfa

Time Series Factor Analysis

Extraction of Factors from Multivariate Time Series. See ?00tsfaIntro for more details.

731

Econometrics

urca (core)

Unit Root and Cointegration Tests for Time Series Data

Unit root and cointegration tests encountered in applied econometric analysis are implemented.

732

Econometrics

vars

VAR Modelling

Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.

733

Econometrics

VGAM

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. At the heart of it are the vector generalized linear and additive model (VGLM/VGAM) classes, and the book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) doi:10.1007/978-1-4939-2818-7 gives details of the statistical framework and the VGAM package. Currently only fixed-effects models are implemented, i.e., no random-effects models. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE, using Fisher scoring. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs (i.e., with smoothing). The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for the latest changes.

734

Econometrics

wahc

Autocorrelation and Heteroskedasticity Correction in Fixed Effect Panel Data Model

Fit the fixed effect panel data model with heteroskedasticity and autocorrelation correction.

735

Econometrics

wbstats

Programmatic Access to Data and Statistics from the World Bank API

Tools for searching and downloading data and statistics from the World Bank Data API (http://data.worldbank.org/developers/api-overview) and the World Bank Data Catalog API (http://data.worldbank.org/developers/data-catalog-api).

736

Econometrics

wooldridge

105 Data Sets from “Introductory Econometrics: A Modern Approach” by Jeffrey M. Wooldridge

Those new to econometrics and R may find themselves challenged by data management tasks inherent to both. The wooldridge data package aims to lighten the task by efficiently loading any data set from the text with a single command. Collectively, all data sets have been compressed to 62.73% of their original size. Most sets have robust documentation including page numbers on which they are used, original data sources, original year of publication, and notes which chronicle their history while offering ideas for further exploration and research. To resurrect a data set, one can pass its name to the ‘data()’ function or just define it as the ‘data =’ argument of the model function. The data will lazily load and, provided the syntax is correct, model estimates shall spring forth from the otherwise lifeless abyss of your R console! If the syntax is an issue, the wooldridge vignette displays solutions to examples from each chapter of the text, providing a relevant introduction to econometric modeling with R. The vignette closes with an Appendix of recommended sources for R and econometrics. Note: Data sets are from the 5th edition (Wooldridge 2013, ISBN-13: 9781111531041), and are compatible with all others.

737

Econometrics

xts

eXtensible Time Series

Provide for uniform handling of R’s different timebased data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying crossclass interoperability.

738

Econometrics

Zelig

Everyone’s Statistical Software

A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical workflow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and reweighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.

739

Econometrics

zoo (core)

S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)

An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.

740

Econometrics

zTree

Functions to Import Data from ‘zTree’ into R

Read ‘.xls’ and ‘.sbj’ files which are written by the Microsoft Windows program ‘zTree’. The latter is a software for developing and carrying out economic experiments (see http://www.ztree.uzh.ch/ for more information).

741

Analysis of Ecological and Environmental Data

ade4 (core)

Analysis of Ecological Data : Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis) data sets. The philosophy of the package is described in Dray and Dufour (2007) doi:10.18637/jss.v022.i04.

742

Analysis of Ecological and Environmental Data

adehabitat

Analysis of Habitat Selection by Animals

A collection of tools for the analysis of habitat selection by animals.

743

Analysis of Ecological and Environmental Data

amap

Another Multidimensional Analysis Package

Tools for Clustering and Principal Component Analysis (With robust methods, and parallelized functions).

744

Analysis of Ecological and Environmental Data

analogue

Analogue and Weighted Averaging Methods for Palaeoecology

Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.

745

Analysis of Ecological and Environmental Data

aod

Analysis of Overdispersed Data

This package provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).

746

Analysis of Ecological and Environmental Data

ape

Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

747

Analysis of Ecological and Environmental Data

aqp

Algorithms for Quantitative Pedology

A collection of algorithms related to modeling of soil resources, soil classification, soil profile aggregation, and visualization.

748

Analysis of Ecological and Environmental Data

BiodiversityR

Package for Community Ecology and Suitability Analysis

Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.

749

Analysis of Ecological and Environmental Data

boussinesq

Analytic Solutions for (groundwater) Boussinesq Equation

This package is a collection of R functions implementing published analytic solutions for the one-dimensional Boussinesq equation (groundwater). In particular, the function “beq.lin” is the analytic solution of the linearized form of the Boussinesq equation between two different head-based boundary (Dirichlet) conditions; “beq.song” is the nonlinear power-series analytic solution of the motion of a wetting front over a dry bedrock (Song et al., 2007; see the complete reference in the function documentation). Bugs/comments/questions/collaboration of any kind are warmly welcomed.

750

Analysis of Ecological and Environmental Data

bReeze

Functions for Wind Resource Assessment

A collection of functions to analyse, visualize and interpret wind data and to calculate the potential energy production of wind turbines.

751

Analysis of Ecological and Environmental Data

CircStats

Circular Statistics, from “Topics in Circular Statistics” (2001)

Circular statistics, from “Topics in Circular Statistics” (2001), S. Rao Jammalamadaka and A. SenGupta, World Scientific.

752

Analysis of Ecological and Environmental Data

circular

Circular Statistics

Circular statistics, from “Topics in Circular Statistics” (2001), S. Rao Jammalamadaka and A. SenGupta, World Scientific.

753

Analysis of Ecological and Environmental Data

cluster (core)

“Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al.

Methods for cluster analysis. Much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.

754

Analysis of Ecological and Environmental Data

cocorresp

Co-Correspondence Analysis Methods

Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximises the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities.

755

Analysis of Ecological and Environmental Data

Distance

Distance Sampling Detection Function and Abundance Estimation

A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided.

756

Analysis of Ecological and Environmental Data

diveMove

Dive Analysis and Calibration

Utilities to represent, visualize, filter, analyse, and summarize time-depth recorder (TDR) data. Miscellaneous functions for handling location data are also provided.

757

Analysis of Ecological and Environmental Data

dse

Dynamic Systems Estimation (Time Series Package)

Tools for multivariate, linear, time-invariant, time series models. This includes ARMA and state-space representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state-space model, and state-space model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.

758

Analysis of Ecological and Environmental Data

dsm

Density Surface Modelling of Distance Sampling Data

Density surface modelling of line transect data. A Generalized Additive Model-based approach is used to calculate spatially explicit estimates of animal abundance from distance sampling (also presence/absence and strip transect) data. Several utility functions are provided for model checking, plotting and variance estimation.

759

Analysis of Ecological and Environmental Data

DSpat

Spatial Modelling for Distance Sampling Data

Fits inhomogeneous Poisson process spatial models to line transect sampling data and provides estimates of abundance within a region.

760

Analysis of Ecological and Environmental Data

dyn

Time Series Regression

Time series regression. The dyn class interfaces the ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series, including specifications that may contain lags, diffs and missing values.

761

Analysis of Ecological and Environmental Data

dynatopmodel

Implementation of the Dynamic TOPMODEL Hydrological Model

A native R implementation and enhancement of the Dynamic TOPMODEL semi-distributed hydrological model. Includes some pre-processing and output routines.

762

Analysis of Ecological and Environmental Data

dynlm

Dynamic Linear Regression

Dynamic linear models and time series regression.

763

Analysis of Ecological and Environmental Data

e1071

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short-time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …

764

Analysis of Ecological and Environmental Data

earth

Multivariate Adaptive Regression Splines

Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines”. (The term “MARS” is trademarked and thus not used in the name of the package.)

765

Analysis of Ecological and Environmental Data

eco

Ecological Inference in 2x2 Tables

Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008, doi:10.1093/pan/mpm017) and (2011, doi:10.18637/jss.v042.i05) for ecological inference in 2-by-2 tables, as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either Expectation-Maximization algorithms (for likelihood models) or Markov chain Monte Carlo algorithms (for Bayesian models). For all models, individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides functionality for quantifying the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.

766

Analysis of Ecological and Environmental Data

ecodist

Dissimilarity-Based Functions for Ecological Analysis

Dissimilarity-based analysis functions, including ordination and Mantel test functions, intended for use with spatial and community data.

767

Analysis of Ecological and Environmental Data

EcoHydRology

A community modeling foundation for EcoHydrology

Provides a flexible foundation for scientists, engineers, and policy makers to base teaching exercises on, as well as for more applied use in modelling complex ecohydrological interactions.

768

Analysis of Ecological and Environmental Data

EnvStats

Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with a focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and the environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 9781461484554, http://www.springer.com/book/9781461484554).

769

Analysis of Ecological and Environmental Data

equivalence

Provides Tests and Graphics for Assessing Tests of Equivalence

Provides statistical tests and graphics for assessing tests of equivalence. Such tests have similarity as the alternative hypothesis instead of the null. Sample data sets are included.

770

Analysis of Ecological and Environmental Data

evd

Functions for Extreme Value Distributions

Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.

771

Analysis of Ecological and Environmental Data

evdbayes

Bayesian Analysis in Extreme Value Theory

Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.

772

Analysis of Ecological and Environmental Data

evir

Extreme Values in R

Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, and GEV/GPD distributions.

773

Analysis of Ecological and Environmental Data

extRemes

Extreme Value Analysis

Functions for performing extreme value analysis.

774

Analysis of Ecological and Environmental Data

fast

Implementation of the Fourier Amplitude Sensitivity Test (FAST)

The Fourier Amplitude Sensitivity Test (FAST) is a method to determine global sensitivities of a model on parameter changes with relatively few model runs. This package implements this sensitivity analysis method.

775

Analysis of Ecological and Environmental Data

FD

Measuring functional diversity (FD) from multiple traits, and other tools for functional ecology

FD is a package to compute different multidimensional FD indices. It implements a distance-based framework to measure FD that allows any number and type of functional traits, and can also consider species relative abundances. It also contains other useful tools for functional ecology.

776

Analysis of Ecological and Environmental Data

flexmix

Flexible Mixture Modeling

A general framework for finite mixtures of regression models using the EM algorithm is implemented. The package provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.

777

Analysis of Ecological and Environmental Data

forecast

Forecasting Functions for Time Series and Linear Models

Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
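A quick sketch of the automatic modelling workflow (assuming 'forecast' is installed; the built-in AirPassengers series is used):

```r
library(forecast)

# Automatic ARIMA order selection, then a 12-month-ahead forecast
fit <- auto.arima(AirPassengers)
fc  <- forecast(fit, h = 12)
print(fc)
```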

778

Analysis of Ecological and Environmental Data

fso

Fuzzy Set Ordination

Fuzzy set ordination is a multivariate analysis used in ecology to relate the composition of samples to possible explanatory variables. While differing in theory and method, in practice its use is similar to 'constrained ordination'. The package contains plotting and summary functions as well as the analyses.

779

Analysis of Ecological and Environmental Data

gam

Generalized Additive Models

Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).

780

Analysis of Ecological and Environmental Data

gamair

Data for “GAMs: An Introduction with R”

Data sets and scripts used in the book “Generalized Additive Models: An Introduction with R”, Wood (2006) CRC.

781

Analysis of Ecological and Environmental Data

hydroGOF

Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series

S3 functions implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented to use during the calibration, validation, and application of hydrological models. Missing values in observed and/or simulated values can be removed before computations. Comments / questions / collaboration of any kind are very welcome.

782

Analysis of Ecological and Environmental Data

HydroMe

R codes for estimating water retention and infiltration model parameters using experimental data

This package is version 2 of the HydroMe v.1 package. It estimates the parameters in infiltration and water retention models by a curve-fitting method. The models considered are those commonly used in soil science. It adds new models for the water retention characteristic curve and fixes errors in HydroMe v.1.

783

Analysis of Ecological and Environmental Data

hydroPSO

Particle Swarm Optimisation, with focus on Environmental Models

This package implements a state-of-the-art version of the Particle Swarm Optimisation (PSO) algorithm (SPSO-2011 and SPSO-2007 capable). hydroPSO can be used as a replacement for the 'optim' R function for (global) optimization of non-smooth and non-linear functions. However, the main focus of hydroPSO is the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the calibration engine (PSO). hydroPSO communicates with the model through the model's own input and output files, without requiring access to the model's source code. Several PSO variants and controlling options are included to fine-tune the performance of the calibration engine for different calibration problems. An advanced sensitivity analysis function together with user-friendly plotting summaries facilitate the interpretation and assessment of the calibration results. hydroPSO is parallel-capable, to alleviate the computational burden of complex models with long execution times. Bug reports/comments/questions are very welcome (in English, Spanish or Italian).

784

Analysis of Ecological and Environmental Data

hydroTSM

Time Series Management, Analysis and Interpolation for Hydrological Modelling

S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented towards hydrological modelling tasks. The focus of this package is on providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs / comments / questions / collaboration of any kind are very welcome, and in particular, datasets that can be included in this package for academic purposes.

785

Analysis of Ecological and Environmental Data

Interpol.T

Hourly interpolation of multiple temperature daily series

Hourly interpolation of daily minimum and maximum temperature series. Carries out interpolation on multiple series at once. Requires some hourly series for calibration (alternatively, a default calibration table can be used).

786

Analysis of Ecological and Environmental Data

ipred

Improved Predictors

Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.

787

Analysis of Ecological and Environmental Data

ismev

An Introduction to Statistical Modeling of Extreme Values

Functions to support the computations carried out in ‘An Introduction to Statistical Modeling of Extreme Values’ by Stuart Coles. The functions may be divided into the following groups; maxima/minima, order statistics, peaks over thresholds and point processes.

788

Analysis of Ecological and Environmental Data

labdsv (core)

Ordination and Multivariate Analysis for Ecology

A variety of ordination and community analyses useful in analysis of data sets in community ecology. Includes many of the common ordination methods, with graphical routines to facilitate their interpretation, as well as several novel analyses.

789

Analysis of Ecological and Environmental Data

latticeDensity

Density estimation and nonparametric regression on irregular regions

This package contains functions that compute the lattice-based density estimator of Barry and McIntyre, which accounts for point processes in two-dimensional regions with irregular boundaries and holes. The package also implements two-dimensional nonparametric regression for similar regions.

790

Analysis of Ecological and Environmental Data

lme4

Linear MixedEffects Models using ‘Eigen’ and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".
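A minimal sketch of the formula interface (assuming 'lme4' is installed; the sleepstudy data ships with the package):

```r
library(lme4)

# Random intercept and slope for each subject in the sleepstudy data
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fm)
```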

791

Analysis of Ecological and Environmental Data

maptree

Mapping, pruning, and graphing tree models

Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees.

792

Analysis of Ecological and Environmental Data

marked

Mark-Recapture Analysis for Survival and Abundance Estimation

Functions for fitting various models to capture-recapture data, including fixed- and mixed-effects Cormack-Jolly-Seber (CJS) models for survival estimation and POPAN-structured Jolly-Seber models for abundance estimation. Includes a CJS model that concurrently estimates and corrects for tag loss. Hidden Markov model (HMM) implementations of CJS and multistate models with and without state uncertainty.

793

Analysis of Ecological and Environmental Data

MASS (core)

Support Functions and Datasets for Venables and Ripley’s MASS

Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).

794

Analysis of Ecological and Environmental Data

mclust

Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via the EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
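A quick illustration (assuming 'mclust' is installed; the built-in iris measurements are used):

```r
library(mclust)

# Model-based clustering of the four iris measurements;
# the number of components and covariance model are chosen by BIC
fit <- Mclust(iris[, 1:4])
summary(fit)
```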

795

Analysis of Ecological and Environmental Data

mda

Mixture and Flexible Discriminant Analysis

Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, …

796

Analysis of Ecological and Environmental Data

mefa

Multivariate Data Handling in Ecology and Biogeography

A framework package aimed at providing a standardized computational environment for specialist work via object classes that represent data coded by samples, taxa and segments (i.e. subpopulations, repeated measures). It supports easy processing of the data along with cross-tabulation and relational data tables for samples and taxa. An object of class 'mefa' is a project-specific compendium of the data and can be easily used in further analyses. Methods are provided for extraction, aggregation, conversion, plotting, summary and reporting of 'mefa' objects. Reports can be generated in plain text or LaTeX format. The vignette contains worked examples.

797

Analysis of Ecological and Environmental Data

metacom

Analysis of the ‘Elements of Metacommunity Structure’

Functions to analyze coherence, boundary clumping, and turnover following the pattern-based metacommunity analysis of Leibold and Mikkelson (2002, doi:10.1034/j.1600-0706.2002.970210.x). The package also includes functions to visualize ecological networks, and to calculate modularity as a replacement for boundary clumping.

798

Analysis of Ecological and Environmental Data

mgcv (core)

Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar. Includes a gam() function, a wide variety of smoothers, JAGS support and distributions beyond the exponential family.
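A minimal sketch of the gam() interface with an s() smooth term ('mgcv' ships with standard R distributions; the data here are simulated):

```r
library(mgcv)

# Fit a smooth of x to noisy sine data; the amount of smoothing
# is estimated automatically (REML/GCV)
set.seed(1)
dat <- data.frame(x = runif(200))
dat$y <- sin(2 * pi * dat$x) + rnorm(200, sd = 0.3)
fit <- gam(y ~ s(x), data = dat)
summary(fit)
```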

799

Analysis of Ecological and Environmental Data

mrds

Mark-Recapture Distance Sampling

Animal abundance estimation via conventional, multiple covariate and mark-recapture distance sampling (CDS/MCDS/MRDS). Detection function fitting is performed via maximum likelihood. Also included are diagnostics and plotting for fitted detection functions. Abundance estimation is via a Horvitz-Thompson-like estimator.

800

Analysis of Ecological and Environmental Data

nlme

Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixedeffects models.
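A brief sketch of the lme() interface ('nlme' ships with standard R distributions; Orthodont is one of its example data sets):

```r
library(nlme)

# Random intercept per subject for the Orthodont dental growth data
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
summary(fm)
```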

801

Analysis of Ecological and Environmental Data

nsRFA

Non-Supervised Regional Frequency Analysis

A collection of statistical tools for objective (non-supervised) applications of Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves.

802

Analysis of Ecological and Environmental Data

oce

Analysis of Oceanographic Data

Supports the analysis of oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the oceanographic literature.

803

Analysis of Ecological and Environmental Data

openair

Tools for the Analysis of Air Pollution Data

Tools to analyse, interpret and understand air pollution data. Data are typically hourly time series and both monitoring data and dispersion model output can be analysed. Many functions can also be applied to other data, including meteorological and traffic data.

804

Analysis of Ecological and Environmental Data

ouch

OrnsteinUhlenbeck Models for Phylogenetic Comparative Hypotheses

Fit and compare OrnsteinUhlenbeck models for evolution along a phylogenetic tree.

805

Analysis of Ecological and Environmental Data

party

A Laboratory for Recursive Partytioning

A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well-defined theory of conditional inference procedures. This nonparametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman's random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression), employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006, doi:10.1198/106186006X133933), Zeileis et al. (2008, doi:10.1198/106186008X319331) and Strobl et al. (2007, doi:10.1186/1471-2105-8-25).

806

Analysis of Ecological and Environmental Data

pastecs

Package for Analysis of Space-Time Ecological Series

Regulation, decomposition and analysis of space-time series. The pastecs library is a PNEC-Art4 and IFREMER (Benoit Beliaeff, Benoit.Beliaeff@ifremer.fr) initiative to bring PASSTEC 2000 (http://www.obs-vlfr.fr/~enseigne/anado/passtec/passtec.htm) functionalities to R.

807

Analysis of Ecological and Environmental Data

pgirmess

Data Analysis in Ecology

Miscellaneous functions for data analysis in ecology, with special emphasis on spatial data.

808

Analysis of Ecological and Environmental Data

popbio

Construction and Analysis of Matrix Population Models

Construct and analyze projection matrix models from a demography study of marked individuals classified by age or stage. The package covers methods described in Matrix Population Models by Caswell (2001) and Quantitative Conservation Biology by Morris and Doak (2002).

809

Analysis of Ecological and Environmental Data

prabclus

Functions for Clustering of Presence-Absence, Abundance and Multilocus Genetic Data

Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures, clustering of presence-absence, abundance and multilocus genetic data for species delimitation, and nearest-neighbor-based noise detection. Try package?prabclus for an overview.

810

Analysis of Ecological and Environmental Data

primer

Functions and data for A Primer of Ecology with R

Functions are primarily for systems of ordinary differential equations, difference equations, and eigenanalysis and projection of demographic matrices; data sets are provided for examples.

811

Analysis of Ecological and Environmental Data

pscl

Political Science Computational Laboratory

Bayesian analysis of item-response theory (IRT) models and roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.

812

Analysis of Ecological and Environmental Data

pvclust

Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling

An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides an AU (approximately unbiased) p-value as well as a BP (bootstrap probability) value for each cluster in a dendrogram.

813

Analysis of Ecological and Environmental Data

qualV

Qualitative Validation Methods

Qualitative methods for the validation of dynamic models. It contains (i) an orthogonal set of deviance measures for absolute, relative and ordinal scales and (ii) approaches accounting for time shifts. The first approach transforms time to take time delays and speed differences into account. The second divides the time series into interval units according to their main features and finds the longest common subsequence (LCS) using a dynamic programming algorithm.

814

Analysis of Ecological and Environmental Data

quantreg

Quantile Regression

Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and nonparametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
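A quick sketch of the rq() interface (assuming 'quantreg' is installed; the engel food-expenditure data ships with the package):

```r
library(quantreg)

# Quartile and median regressions of food expenditure on income
data(engel)
fits <- rq(foodexp ~ income, tau = c(0.25, 0.5, 0.75), data = engel)
summary(fits)
```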

815

Analysis of Ecological and Environmental Data

quantregGrowth

Growth Charts via Regression Quantiles

Fits non-crossing regression quantiles as a function of linear covariates and smooth terms via B-splines with difference penalties.

816

Analysis of Ecological and Environmental Data

randomForest

Breiman and Cutler’s Random Forests for Classification and Regression

Classification and regression based on a forest of trees using random inputs.
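A minimal sketch (assuming 'randomForest' is installed; the built-in iris data is used):

```r
library(randomForest)

# A 500-tree forest for the iris classification problem
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)
print(rf)  # shows the OOB error estimate and confusion matrix
```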

817

Analysis of Ecological and Environmental Data

Rcapture

Loglinear Models for Capture-Recapture Experiments

Estimation of abundance and other demographic parameters for closed populations, open populations and the robust design in capture-recapture experiments using loglinear models.

818

Analysis of Ecological and Environmental Data

rioja

Analysis of Quaternary Science Data

Functions for the analysis of Quaternary science data, including constrained clustering, WA, WA-PLS, IKFA, MLRC and MAT transfer functions, and stratigraphic diagrams.

819

Analysis of Ecological and Environmental Data

RMark

R Code for Mark Analysis

An interface to the software package MARK that constructs input files for MARK and extracts the output. MARK was developed by Gary White and is freely available at http://www.phidot.org/software/mark/downloads/ but is not open source.

820

Analysis of Ecological and Environmental Data

RMAWGEN

Multi-Site Auto-Regressive Weather GENerator

S3 and S4 functions are implemented for spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector Auto-Regressive (VAR) models. The weather generator model is saved as an object and is calibrated on daily instrumental "Gaussianized" time series through the 'vars' package tools. Once obtained, this model can be used for weather generation and adapted to work with several monthly climatic time series.

821

Analysis of Ecological and Environmental Data

rpart

Recursive Partitioning and Regression Trees

Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
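A brief sketch ('rpart' ships with standard R distributions; the kyphosis data set comes with the package):

```r
library(rpart)

# Classification tree for post-operative kyphosis
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
printcp(fit)  # complexity-parameter table used for pruning
```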

822

Analysis of Ecological and Environmental Data

rtop

Interpolation of Data with Variable Spatial Support

Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.

823

Analysis of Ecological and Environmental Data

seacarb

Seawater Carbonate Chemistry

Calculates parameters of the seawater carbonate system and assists the design of ocean acidification perturbation experiments.

824

Analysis of Ecological and Environmental Data

seas

Seasonal analysis and graphics, especially for climatology

Capable of deriving seasonal statistics, such as "normals", and analysis of seasonal data, such as departures. This package also has graphics capabilities for representing seasonal data, including boxplots for seasonal parameters and bars for summed normals. There are many specific functions related to climatology, including precipitation normals, temperature normals, cumulative precipitation departures and precipitation inter-arrivals. However, this package is designed to represent any time-varying parameter with a discernible seasonal signal, such as found in hydrology and ecology.

825

Analysis of Ecological and Environmental Data

secr

Spatially Explicit CaptureRecapture

Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distancedependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.

826

Analysis of Ecological and Environmental Data

segmented

Regression Models with Break-Points / Change-Points Estimation

Given a regression model, segmented ‘updates’ the model by adding one or more segmented (i.e., piecewise linear) relationships. Several variables with multiple breakpoints are allowed.

827

Analysis of Ecological and Environmental Data

sensitivity

Global Sensitivity Analysis of Model Outputs

A collection of functions for factor screening, global sensitivity analysis and reliability sensitivity analysis. Most of the functions must be applied to a model with scalar output, but several functions support multidimensional outputs.

828

Analysis of Ecological and Environmental Data

simba

A Collection of functions for similarity analysis of vegetation data

Besides functions for the calculation of similarity and multiple-plot similarity measures with binary data (for instance presence/absence species data), the package contains some simple wrapper functions for reshaping species lists into matrices and vice versa, some other functions for further processing of similarity data (Mantel-like permutation procedures), and other useful tools for vegetation analysis.

829

Analysis of Ecological and Environmental Data

simecol

Simulation of Ecological (and Other) Dynamic Systems

An object-oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. The package helps to organize scenarios (to avoid copy and paste) and aims to improve readability and usability of code.

830

Analysis of Ecological and Environmental Data

siplab

Spatial IndividualPlant Modelling

A platform for experimenting with spatially explicit individualbased vegetation models.

831

Analysis of Ecological and Environmental Data

soiltexture

Functions for Soil Texture Plot, Classification and Transformation

“The Soil Texture Wizard” is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams or texture ternary plots) and to classify and transform soil texture data. These functions allow plotting of virtually any soil texture triangle (classification) in any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions should be useful to people using soil texture data from different soil texture classifications or different particle size systems. Many (> 15) texture triangles from around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().

832

Analysis of Ecological and Environmental Data

SPACECAP

A Program to Estimate Animal Abundance and Density using Bayesian Spatially-Explicit Capture-Recapture Models

SPACECAP is a user-friendly software package for estimating animal densities using closed-model capture-recapture sampling based on photographic captures, using Bayesian spatially-explicit capture-recapture models. This approach offers advantages such as substantially dealing with the problems posed by individual heterogeneity in capture probabilities in conventional capture-recapture analyses. It also offers non-asymptotic inferences, which are more appropriate for the small samples of capture data typical of photo-capture studies.

833

Analysis of Ecological and Environmental Data

SpatialExtremes

Modelling Spatial Extremes

Tools for the statistical modelling of spatial extremes using max-stable processes, copulas or Bayesian hierarchical models. More precisely, this package allows (conditional) simulation from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with extreme value theory) are available, such as the use of (spatial) copulas and Bayesian hierarchical models assuming the so-called conditional independence assumption. The latter approaches are handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012, doi:10.1214/11-STS376), Padoan et al. (2010, doi:10.1198/jasa.2009.tm08577), Dombry et al. (2013, doi:10.1093/biomet/ass067).

834

Analysis of Ecological and Environmental Data

StreamMetabolism

Calculate Single Station Metabolism from Diurnal Oxygen Curves

Provides functions to calculate Gross Primary Productivity, Net Ecosystem Production, and Ecosystem Respiration from single-station diurnal oxygen curves.

835

Analysis of Ecological and Environmental Data

strucchange

Testing, Monitoring, and Dating Structural Changes

Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
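A quick sketch of breakpoint dating (assuming 'strucchange' is installed; the built-in Nile series is a standard example with a known level shift around 1898):

```r
library(strucchange)

# Date structural breaks in the mean level of the Nile flow series
bp <- breakpoints(Nile ~ 1)
summary(bp)
```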

836

Analysis of Ecological and Environmental Data

surveillance

Temporal and SpatioTemporal Modeling and Monitoring of Epidemic Phenomena

Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or the social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hohle and Paul (2008, doi:10.1016/j.csda.2008.02.015). A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016, doi:10.18637/jss.v070.i10). For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. 'hhh4' estimates models for (multivariate) count time series following Paul and Held (2011, doi:10.1002/sim.4177) and Meyer and Held (2014, doi:10.1214/14-AOAS743). 'twinSIR' models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hohle (2009, doi:10.1002/bimj.200900050). 'twinstim' estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012, doi:10.1111/j.1541-0420.2011.01684.x). A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017, doi:10.18637/jss.v077.i11).

837

Analysis of Ecological and Environmental Data

tiger

TIme series of Grouped ERrors

Temporally resolved groups of typical differences (errors) between two time series are determined and visualized.

838

Analysis of Ecological and Environmental Data

topmodel

Implementation of the hydrological model TOPMODEL in R

Set of hydrological functions including an R implementation of the hydrological model TOPMODEL, which is based on the 1995 FORTRAN version by Keith Beven. From version 0.7.0, the package is put into maintenance mode. New functions for hydrological analysis are now developed as part of the RHydro package. RHydro can be found on R-Forge and is built on a set of dedicated S4 classes.

839

Analysis of Ecological and Environmental Data

tseries

Time Series Analysis and Computational Finance

Time series analysis and computational finance.

840

Analysis of Ecological and Environmental Data

unmarked

Models for Data from Unmarked Animals

Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates.

841

Analysis of Ecological and Environmental Data

untb

ecological drift under the UNTB

A collection of utilities for biodiversity data. Includes the simulation of ecological drift under Hubbell’s Unified Neutral Theory of Biodiversity, and the calculation of various diagnostics such as Preston curves. Now includes functionality provided by Francois Munoz and Andrea Manica.

842

Analysis of Ecological and Environmental Data

vegan (core)

Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
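As a minimal sketch (assuming vegan is installed from CRAN), diversity indices and an ordination can be computed on the package's bundled ‘dune’ data set:

```r
library(vegan)

data(dune)  # dune meadow vegetation: sites x species, shipped with vegan
H <- diversity(dune, index = "shannon")   # Shannon diversity per site
ord <- metaMDS(dune, trace = FALSE)       # non-metric multidimensional scaling
head(sort(H, decreasing = TRUE))          # most diverse sites first
```

The resulting ordination object has plot methods for site and species scores.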

843

Analysis of Ecological and Environmental Data

vegetarian

Jost Diversity Measures for Community Data

This package computes diversity for community data sets using the methods outlined by Jost (2006, 2007). While there are differing opinions on the ideal way to calculate diversity (e.g. Magurran 2004), this method offers the advantage of providing diversity in terms of numbers equivalents, independent alpha and beta diversities, and the ability to incorporate ‘order’ (q) as a continuous measure of the importance of rare species in the metrics. The functions provided in this package largely correspond with the equations offered by Jost in the cited papers. The package computes alpha diversities, beta diversities, gamma diversities, and similarity indices. Confidence intervals for diversity measures are calculated using a bootstrap method described by Chao et al. (2008). For datasets with many samples (sites, plots), sim.table creates tables of all possible pairwise comparisons, and for grouped samples sim.groups calculates pairwise combinations of within- and between-group comparisons.

844

Analysis of Ecological and Environmental Data

VGAM

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. At the heart of it are the vector generalized linear and additive model (VGLM/VGAM) classes, and the book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) doi:10.1007/978-1-4939-2818-7 gives details of the statistical framework and the VGAM package. Currently only fixed-effects models are implemented, i.e., no random-effects models. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE, using Fisher scoring. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs (i.e., with smoothing). The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes fit constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

845

Analysis of Ecological and Environmental Data

wasim

Visualisation and analysis of output files of the hydrological model WASIM

Helpful tools for data processing and visualisation of results of the hydrological model WASIM-ETH.

846

Analysis of Ecological and Environmental Data

zoo

S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)

An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
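A brief sketch of the index-driven design (assuming zoo is installed from CRAN) — standard generics such as window and merge dispatch on the ordered index:

```r
library(zoo)

# Irregular series: numeric observations indexed by Date
z <- zoo(c(1.2, 0.8, 1.5),
         order.by = as.Date(c("2020-01-01", "2020-01-03", "2020-01-07")))

window(z, start = as.Date("2020-01-02"))  # index-based subsetting
merge(z, lag(z, k = -1))                  # align series on the union of indexes
```

The same code works unchanged with other index classes (e.g. POSIXct or plain integers), which is the independence-of-index design goal mentioned above.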

847

Design of Experiments (DoE) & Analysis of Experimental Data

acebayes

Optimal Bayesian Experimental Design using the ACE Algorithm

Optimal Bayesian experimental design using the approximate coordinate exchange (ACE) algorithm.

848

Design of Experiments (DoE) & Analysis of Experimental Data

agricolae (core)

Statistical Procedures for Agricultural Research

The original idea was presented in the thesis “A statistical analysis tool for agricultural research”, submitted for the degree of Master of Science at the National Engineering University (UNI), Lima, Peru. Some experimental data for the examples come from the CIP and other research. agricolae offers extensive functionality on experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Square, augmented block, factorial, and split- and strip-plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures and several non-parametric comparison tests, biodiversity indices and consensus clustering.

849

Design of Experiments (DoE) & Analysis of Experimental Data

agridat

Agricultural Datasets

Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.

850

Design of Experiments (DoE) & Analysis of Experimental Data

AlgDesign (core)

Algorithmic Experimental Design

Algorithmic experimental designs. Calculates exact and approximate theory experimental designs for D, A, and I criteria. Very large designs may be created. Experimental designs may be blocked, or blocked designs created from a candidate list, using several criteria. The blocking can be done when whole- and within-plot factors interact.
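As a minimal sketch (assuming AlgDesign is installed from CRAN), an exact D-optimal design can be selected from a candidate grid with optFederov:

```r
library(AlgDesign)

# Candidate set: full 3x3 grid of two coded factors
cand <- expand.grid(x1 = c(-1, 0, 1), x2 = c(-1, 0, 1))

# Exact D-optimal design for the full quadratic model, 9 trials
des <- optFederov(~ quad(x1, x2), data = cand, nTrials = 9, criterion = "D")
des$design  # the selected runs
```

With fewer trials than candidates, the algorithm drops the least informative candidate points for the stated model.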

851

Design of Experiments (DoE) & Analysis of Experimental Data

ALTopt

Optimal Experimental Designs for Accelerated Life Testing

Creates optimal (D, U and I) designs for accelerated life testing with right censoring or interval censoring. It uses a generalized linear model (GLM) approach to derive the asymptotic variance-covariance matrix of the regression coefficients. The failure time distribution is assumed to follow a Weibull distribution with a known shape parameter, and log-linear link functions are used to model the relationship between failure time parameters and stress variables. The acceleration model may have multiple stress factors, although most ALTs involve only two or fewer stress factors. The package also provides several plotting functions, including contour plots, Fraction of Use Space (FUS) plots and Variance Dispersion graphs of Use Space (VDUS) plots.

852

Design of Experiments (DoE) & Analysis of Experimental Data

asd

Simulations for Adaptive Seamless Designs

Runs simulations for adaptive seamless designs with and without early outcomes, for treatment selection and subpopulation-type designs.

853

Design of Experiments (DoE) & Analysis of Experimental Data

BatchExperiments

Statistical Experiments on Batch Computing Clusters

Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.

854

Design of Experiments (DoE) & Analysis of Experimental Data

BayesMAMS

Designing Bayesian Multi-Arm Multi-Stage Studies

Calculating Bayesian sample sizes for multiarm trials where several experimental treatments are compared to a common control, perhaps even at multiple stages.

855

Design of Experiments (DoE) & Analysis of Experimental Data

bcrm

Bayesian Continual Reassessment Method for Phase I Dose-Escalation Trials

Implements a wide variety of one- and two-parameter Bayesian CRM designs. The program can run interactively, allowing the user to enter outcomes after each cohort has been recruited, or via simulation to assess operating characteristics.

856

Design of Experiments (DoE) & Analysis of Experimental Data

BHH2

Useful Functions for Box, Hunter and Hunter II

Functions and data sets reproducing some examples in Box, Hunter and Hunter II. Useful for statistical design of experiments, especially factorial experiments.

857

Design of Experiments (DoE) & Analysis of Experimental Data

binseqtest

Exact Binary Sequential Designs and Analysis

For a series of binary responses, creates stopping boundaries with exact results after stopping, allowing updating for missing assessments.

858

Design of Experiments (DoE) & Analysis of Experimental Data

bioOED

Sensitivity Analysis and Optimum Experiment Design for Microbial Inactivation

Extends the bioinactivation package with functions for Sensitivity Analysis and Optimum Experiment Design.

859

Design of Experiments (DoE) & Analysis of Experimental Data

blocksdesign

Nested and Crossed Block Designs for Factorial, Fractional Factorial and Unstructured Treatment Sets

The ‘blocksdesign’ package constructs nested block and D-optimal factorial designs for any unstructured or factorial treatment model of any size. The nested block designs can have repeated nesting down to any required depth, with either a simple set of nested blocks or a crossed row-and-column block design at each level of nesting. The block design at each level of nesting is optimized for D-efficiency within the blocks of each preceding set of blocks. The block sizes in any particular block classification are always as nearly equal as possible and never differ by more than a single plot. Outputs include a table showing the allocation of treatments to blocks, a plan layout showing the allocation of treatments within blocks (unstructured treatment designs only), the achieved D- and A-efficiency factors for the block and treatment design (factorial treatment designs only) and, where feasible, an A-efficiency upper bound for the block design (unstructured treatment designs only).

860

Design of Experiments (DoE) & Analysis of Experimental Data

blockTools

Block, Assign, and Diagnose Potential Interference in Randomized Experiments

Blocks units into experimental blocks, with one unit per treatment condition, by creating a measure of multivariate distance between all possible pairs of units. Maximum, minimum, or an allowable range of differences between units on one variable can be set. Randomly assign units to treatment conditions. Diagnose potential interference between units assigned to different treatment conditions. Write outputs to .tex and .csv files.

861

Design of Experiments (DoE) & Analysis of Experimental Data

BOIN

Bayesian Optimal INterval (BOIN) Design for Single-Agent and Drug-Combination Phase I Clinical Trials

The Bayesian optimal interval (BOIN) design is a novel phase I clinical trial design for finding the maximum tolerated dose (MTD). It can be used to design both single-agent and drug-combination trials. The BOIN design is motivated by the top priority and concern of clinicians when testing a new drug, which is to effectively treat patients and minimize the chance of exposing them to subtherapeutic or overly toxic doses. The prominent advantage of the BOIN design is that it achieves simplicity and superior performance at the same time. The BOIN design is algorithm-based and can be implemented in a simple way similar to the traditional 3+3 design. The BOIN design yields an average performance that is comparable to that of the continual reassessment method (CRM, one of the best model-based designs) in terms of selecting the MTD, but has a substantially lower risk of assigning patients to subtherapeutic or overly toxic doses.

862

Design of Experiments (DoE) & Analysis of Experimental Data

BsMD

Bayes Screening and Model Discrimination

Bayes screening and model discrimination follow-up designs.

863

Design of Experiments (DoE) & Analysis of Experimental Data

choiceDes

Design Functions for Choice Studies

This package consists of functions to design DCMs and other types of choice studies (including MaxDiff and other trade-offs).

864

Design of Experiments (DoE) & Analysis of Experimental Data

CombinS

Construction Methods of some Series of PBIB Designs

Series of partially balanced incomplete block (PBIB) designs based on the combinatory method (S) introduced in Rezgui et al. (2014) doi:10.3844/jmssp.2014.45.48, together with their associated U-type designs.

865

Design of Experiments (DoE) & Analysis of Experimental Data

conf.design (core)

Construction of factorial designs

This small library contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs.

866

Design of Experiments (DoE) & Analysis of Experimental Data

crmPack

Object-Oriented Implementation of CRM Designs

Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison, or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models and escalation or stopping rules.

867

Design of Experiments (DoE) & Analysis of Experimental Data

crossdes (core)

Construction of Crossover Designs

Contains functions for the construction of carry-over balanced crossover designs. In addition contains functions to check given designs for balance.

868

Design of Experiments (DoE) & Analysis of Experimental Data

Crossover

Analysis and Search of Crossover Designs

Package Crossover provides crossover designs obtained from combinatorial or search algorithms as well as from the literature, and a GUI to access them.

869

Design of Experiments (DoE) & Analysis of Experimental Data

dae

Functions Useful in the Design and ANOVA of Experiments

The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. There is a vignette describing how to use the Design functions for randomizing and assessing designs available in the file ‘daeDesignNotes.pdf’. The ANOVA functions facilitate the extraction of information when the ‘Error’ function has been used in the call to ‘aov’.

870

Design of Experiments (DoE) & Analysis of Experimental Data

daewr

Design and Analysis of Experiments with R

Contains Data frames and functions used in the book “Design and Analysis of Experiments with R”.

871

Design of Experiments (DoE) & Analysis of Experimental Data

designGG

Computational tool for designing genetical genomics experiments

The package provides R scripts for designing genetical genomics experiments.

872

Design of Experiments (DoE) & Analysis of Experimental Data

designGLMM

Finding Optimal Block Designs for a Generalised Linear Mixed Model

Use simulated annealing to find optimal designs for Poisson regression models with blocks.

873

Design of Experiments (DoE) & Analysis of Experimental Data

designmatch

Matched Samples that are Balanced and Representative by Design

Includes functions for the construction of matched samples that are balanced and representative by design. Among others, these functions can be used for matching in observational studies with treated and control units, with cases and controls, in related settings with instrumental variables, and in discontinuity designs. Also, they can be used for the design of randomized experiments, for example, for matching before randomization. By default, ‘designmatch’ uses the ‘GLPK’ optimization solver, but its performance is greatly enhanced by the ‘Gurobi’ optimization solver and its associated R interface. For their installation, please follow the instructions at http://user.gurobi.com/download/gurobi-optimizer and http://www.gurobi.com/documentation/7.0/refman/r_api_overview.html. We have also included directions in the gurobi_installation file in the inst folder.

874

Design of Experiments (DoE) & Analysis of Experimental Data

desirability

Function Optimization and Ranking via Desirability Functions

S3 classes for multivariate optimization using the desirability function by Derringer and Suich (1980).

875

Design of Experiments (DoE) & Analysis of Experimental Data

desplot

Plotting Field Plans for Agricultural Experiments

A function for plotting maps of agricultural field experiments that are laid out in grids.

876

Design of Experiments (DoE) & Analysis of Experimental Data

dfcomb

Phase I/II Adaptive Dose-Finding Design for Combination Studies

Phase I/II adaptive dose-finding design for combination studies. Several methods are proposed depending on the type of combination: (1) the combination of two cytotoxic agents, and (2) the combination of a molecularly targeted agent with a cytotoxic agent.

877

Design of Experiments (DoE) & Analysis of Experimental Data

dfcrm

Dose-finding by the continual reassessment method

This package provides functions to run the CRM and TITE-CRM in phase I trials, and calibration tools for trial planning purposes.

878

Design of Experiments (DoE) & Analysis of Experimental Data

dfmta

Phase I/II Adaptive Dose-Finding Design for MTA

Phase I/II adaptive dose-finding design for a single-agent Molecularly Targeted Agent (MTA), according to the paper “Phase I/II Dose-Finding Design for Molecularly Targeted Agent: Plateau Determination using Adaptive Randomization”.

879

Design of Experiments (DoE) & Analysis of Experimental Data

dfpk

Bayesian Dose-Finding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials

Statistical methods involving PK measures are provided for the dose-allocation process during Phase I clinical trials. These methods incorporate pharmacokinetics (PK) into dose-finding designs in different ways, including covariate models, dependent-variable models and hierarchical models. This package provides functions to generate data from several scenarios, and functions to run simulations whose objective is to determine the maximum tolerated dose (MTD).

880

Design of Experiments (DoE) & Analysis of Experimental Data

DiceDesign

Designs of Computer Experiments

Space-filling designs and uniformity criteria.

881

Design of Experiments (DoE) & Analysis of Experimental Data

DiceEval

Construction and Evaluation of Metamodels

Estimation, validation and prediction of models of different types: linear models, additive models, MARS, PolyMARS and kriging.

882

Design of Experiments (DoE) & Analysis of Experimental Data

DiceKriging

Kriging Methods for Computer Experiments

Estimation, validation and prediction of kriging models. Important functions: km, print.km, plot.km, predict.km.
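A minimal sketch of the km/predict.km workflow (assuming DiceKriging is installed from CRAN), on a toy deterministic 1-D computer experiment:

```r
library(DiceKriging)

# Deterministic 1-D computer experiment: 7 design points of sin(4*pi*x)
X <- data.frame(x = seq(0, 1, length.out = 7))
y <- sin(4 * pi * X$x)

fit  <- km(design = X, response = y, covtype = "matern5_2")
pred <- predict(fit, newdata = data.frame(x = c(0.25, 0.75)), type = "UK")
pred$mean  # kriging mean at the new inputs
```

The prediction also returns standard deviations and confidence bounds, which is what makes kriging attractive as a metamodel for expensive simulators.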

883

Design of Experiments (DoE) & Analysis of Experimental Data

DiceView

Plot methods for computer experiments design and surrogate

View 2D/3D sections or contours of computer experiments designs, surrogates or test functions.

884

Design of Experiments (DoE) & Analysis of Experimental Data

docopulae

Optimal Designs for Copula Models

A direct approach to optimal designs for copula models based on the Fisher information. Provides flexible functions for building joint PDFs, evaluating the Fisher information and finding optimal designs. It includes an extensible solution to summation and integration called ‘nint’, functions for transforming, plotting and comparing designs, as well as a set of tools for common low-level tasks.

885

Design of Experiments (DoE) & Analysis of Experimental Data

DoE.base (core)

Full Factorials, Orthogonal Arrays and Base Utilities for DoE Packages

Package DoE.base creates full factorial experimental designs and designs based on orthogonal arrays for (industrial) experiments. Additionally, it provides utility functions for the class design, which is also used by other packages for designed experiments.

886

Design of Experiments (DoE) & Analysis of Experimental Data

DoE.MIParray

Creation of Arrays by Mixed Integer Programming

‘CRAN’ package ‘DoE.base’ and non-‘CRAN’ packages ‘gurobi’ and ‘Rmosek’ (a newer version than that on ‘CRAN’) are enhanced with functionality for the creation of optimized arrays for experimentation, where optimization is in terms of generalized minimum aberration. It is also possible to optimally extend existing arrays to a larger run size. Optimization requires the availability of at least one of the commercial products ‘Gurobi’ or ‘Mosek’ (free academic licenses are available for both). For installing ‘Gurobi’ and its R package ‘gurobi’, follow the instructions at http://www.gurobi.com/downloads/gurobi-optimizer and http://www.gurobi.com/documentation/7.5/refman/r_api_overview.html. For installing ‘Mosek’ and its R package ‘Rmosek’, follow the instructions at https://www.mosek.com/downloads/ and http://docs.mosek.com/8.1/rmosek/install-interface.html.

887

Design of Experiments (DoE) & Analysis of Experimental Data

DoE.wrapper (core)

Wrapper Package for Design of Experiments Functionality

Various kinds of designs for (industrial) experiments can be created. The package uses, and sometimes enhances, design generation routines from other packages. So far, response surface designs from package rsm, Latin hypercube samples from packages lhs and DiceDesign, and D-optimal designs from package AlgDesign have been implemented.

888

Design of Experiments (DoE) & Analysis of Experimental Data

DoseFinding

Planning and Analyzing Dose Finding Experiments

The DoseFinding package provides functions for the design and analysis of dose-finding experiments (with a focus on pharmaceutical Phase II clinical trials). It provides functions for: multiple contrast tests, fitting nonlinear dose-response models (using Bayesian and non-Bayesian estimation), calculating optimal designs, and an implementation of the MCP-Mod methodology.

889

Design of Experiments (DoE) & Analysis of Experimental Data

dynaTree

Dynamic Trees for Learning and Design

Inference by sequential Monte Carlo for dynamic tree regression and classification models with hooks provided for sequential design and optimization, fully online learning with drift, variable selection, and sensitivity analysis of inputs. Illustrative examples from the original dynamic trees paper are facilitated by demos in the package; see demo(package=“dynaTree”).

890

Design of Experiments (DoE) & Analysis of Experimental Data

easypower

Sample Size Estimation for Experimental Designs

Power analysis is used in the estimation of sample sizes for experimental designs. Most programs and R packages will only output the highest recommended sample size to the user. Often the user input can be complicated and computing multiple power analyses for different treatment comparisons can be time consuming. This package simplifies the user input and allows the user to view all of the sample size recommendations or just the ones they want to see. The calculations used to calculate the recommended sample sizes are from the ‘pwr’ package.

891

Design of Experiments (DoE) & Analysis of Experimental Data

edesign

Maximum Entropy Sampling

An implementation of maximum entropy sampling for spatial data is provided. An exact branchandbound algorithm as well as greedy and dual greedy heuristics are included.

892

Design of Experiments (DoE) & Analysis of Experimental Data

EngrExpt

Data sets from “Introductory Statistics for Engineering Experimentation”

Datasets from Nelson, Coffin and Copeland “Introductory Statistics for Engineering Experimentation” (Elsevier, 2003) with sample code.

893

Design of Experiments (DoE) & Analysis of Experimental Data

experiment

experiment: R package for designing and analyzing randomized experiments

The package provides various statistical methods for designing and analyzing randomized experiments. One main functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, randomized experiments with non-compliance, and randomized experiments with missing data.

894

Design of Experiments (DoE) & Analysis of Experimental Data

ez

Easy Analysis and Visualization of Factorial Experiments

Facilitates easy analysis of factorial experiments, including purely within-Ss designs (a.k.a. “repeated measures”), purely between-Ss designs, and mixed within-and-between-Ss designs. The functions in this package aim to provide simple, intuitive and consistent specification of data analysis and visualization. Visualization functions also include design visualization for pre-analysis data auditing, and correlation matrix visualization. Finally, this package includes functions for non-parametric analysis, including permutation tests and bootstrap resampling. The bootstrap function obtains predictions either by cell means or by more advanced/powerful mixed effects models, yielding predictions and confidence intervals that may be easily visualized at any level of the experiment’s design.
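A minimal sketch of a within-Ss analysis with ezANOVA (assuming ez is installed from CRAN), on hypothetical toy data:

```r
library(ez)

# Toy long-format data: 10 subjects, 3 within-Ss conditions
set.seed(1)
dat <- data.frame(
  id   = factor(rep(1:10, each = 3)),
  cond = factor(rep(c("a", "b", "c"), times = 10)),
  y    = rnorm(30)
)

res <- ezANOVA(data = dat, dv = y, wid = id, within = cond)
res$ANOVA  # F table, including generalized eta-squared
```

Between-Ss factors go in a `between` argument with the same specification style.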

895

Design of Experiments (DoE) & Analysis of Experimental Data

FMC

Factorial Experiments with Minimum Level Changes

Generate cost-effective, minimally changed run sequences for symmetrical as well as asymmetrical factorial designs.

896

Design of Experiments (DoE) & Analysis of Experimental Data

FrF2 (core)

Fractional Factorial Designs with 2-Level Factors

Regular and non-regular fractional factorial 2-level designs can be created. Furthermore, analysis tools for fractional factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).
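As a minimal sketch (assuming FrF2 is installed from CRAN), a regular half-fraction for four 2-level factors can be generated like this:

```r
library(FrF2)

# Regular 2^(4-1) fraction: 4 two-level factors in 8 runs (resolution IV)
plan <- FrF2(nruns = 8, nfactors = 4, randomize = FALSE)
summary(plan)  # includes the alias structure in readable form
```

`randomize = FALSE` keeps the standard run order for inspection; in practice the default randomized order should be used.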

897

Design of Experiments (DoE) & Analysis of Experimental Data

FrF2.catlg128

Catalogues of resolution IV 128-run 2-level fractional factorials up to 33 factors that do have 5-letter words

This package provides catalogues of resolution IV regular fractional factorial designs in 128 runs for up to 33 2-level factors. The catalogues are complete, excluding resolution IV designs without 5-letter words, because these do not add value for a search for clear designs. The previous package version 1.0 with complete catalogues up to 24 runs (24 runs and a namespace added later) can be downloaded from the author’s website.

898

Design of Experiments (DoE) & Analysis of Experimental Data

GAD

GAD: Analysis of variance from general principles

This package analyses complex ANOVA models with any combination of orthogonal/nested and fixed/random factors, as described by Underwood (1997). There are two restrictions: (i) data must be balanced; (ii) fixed nested factors are not allowed. Homogeneity of variances is checked using Cochran’s C test, and ‘a posteriori’ comparisons of means are done using the Student-Newman-Keuls (SNK) procedure.

899

Design of Experiments (DoE) & Analysis of Experimental Data

geospt

Geostatistical Analysis and Design of Optimal Spatial Sampling Networks

Estimation of the variogram through trimmed mean, radial basis functions (optimization, prediction and cross-validation), summary statistics from cross-validation, pocket plot, and design of optimal sampling networks through sequential and simultaneous points methods.

900

Design of Experiments (DoE) & Analysis of Experimental Data

granova

Graphical Analysis of Variance

This small collection of functions provides what we call elemental graphics for the display of anova results. The term elemental derives from the fact that each function is aimed at construction of graphical displays that afford direct visualizations of data with respect to the fundamental questions that drive the particular anova methods. The two main functions are granova.1w (a graphic for one-way anova) and granova.2w (a corresponding graphic for two-way anova). These functions were written to display data for any number of groups, regardless of their sizes (however, very large data sets or numbers of groups can be problematic). For these two functions a specialized approach is used to construct data-based contrast vectors for which anova data are displayed. The result is that the graphics use straight lines, and when appropriate flat surfaces, to facilitate clear interpretations while being faithful to the standard effect tests in anova. The graphic results are complementary to standard summary tables for these two basic kinds of analysis of variance; numerical summary results of analyses are also provided as side effects. Two additional functions are granova.ds (for comparing two dependent samples), and granova.contr (which provides graphic displays for a priori contrasts). All functions provide relevant numerical results to supplement the graphic displays of anova data. The graphics based on these functions should be especially helpful for learning how the methods have been applied to answer the question(s) posed. This means they can be particularly helpful for students and non-statistician analysts. But these methods should be quite generally helpful for workaday applications of all kinds, as they can help to identify outliers, clusters or patterns, as well as highlight the role of nonlinear transformations of data. In the case of granova.1w and granova.ds especially, several arguments are provided to facilitate flexibility in the construction of graphics that accommodate diverse features of data, according to their corresponding display requirements. See the help files for individual functions.

901

Design of Experiments (DoE) & Analysis of Experimental Data

GroupSeq

A GUI-Based Program to Compute Probabilities Regarding Group Sequential Designs

A graphical user interface to compute group sequential designs based on normally distributed test statistics, particularly critical boundaries, power, drift, and confidence intervals of such designs. All computations are based on the alpha spending approach by Lan-DeMets, with various alpha spending functions available to choose from.

902

Design of Experiments (DoE) & Analysis of Experimental Data

gsbDesign

Group Sequential Bayes Design

Group sequential operating characteristics for clinical, Bayesian two-arm trials with known sigma and normal endpoints.

903

Design of Experiments (DoE) & Analysis of Experimental Data

gsDesign

Group Sequential Design

Derives group sequential designs and describes their properties.
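A minimal sketch (assuming gsDesign is installed from CRAN) of deriving a standard group sequential design:

```r
library(gsDesign)

# Three-analysis, two-sided symmetric design; one-sided alpha = 0.025,
# 90% power, Lan-DeMets O'Brien-Fleming-like spending
x <- gsDesign(k = 3, test.type = 2, alpha = 0.025, beta = 0.1, sfu = sfLDOF)
x$upper$bound  # efficacy z-boundaries at each interim/final analysis
```

Printing or plotting the returned object summarizes boundaries, spending, and the sample-size inflation relative to a fixed design.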

904

Design of Experiments (DoE) & Analysis of Experimental Data

gset

Group Sequential Design in Equivalence Studies

Calculates equivalence and futility boundaries based on the exact bivariate t test statistics for group sequential designs in studies with equivalence hypotheses.

905

Design of Experiments (DoE) & Analysis of Experimental Data

hiPOD

hierarchical Pooled Optimal Design

Based on hierarchical modeling, this package provides a few practical functions to find and present the optimal designs for a pooled NGS design.

906

Design of Experiments (DoE) & Analysis of Experimental Data

ibd

Incomplete Block Designs

This package contains several utility functions related to incomplete block designs. It contains a function to generate efficient incomplete block designs with given numbers of treatments, blocks, and block size, and a function to generate an incomplete block design with a specified concurrence matrix. There are functions to generate balanced treatment incomplete block designs and incomplete block designs for test-versus-control treatment comparisons with a specified concurrence matrix. The package also allows performing analysis of variance of data and computing least square means of factors from experiments using a connected incomplete block design. Tests of hypotheses of treatment contrasts in an incomplete block design setup are supported.

907

Design of Experiments (DoE) & Analysis of Experimental Data

ICAOD

Imperialist Competitive Algorithm for Optimal Designs

Finding locally D-optimal, minimax D-optimal, standardized maximin D-optimal, optimum-on-the-average and multiple-objective optimal designs for nonlinear models. Different Fisher information matrices can also be set by the user. There are also useful functions for verifying the optimality of designs with respect to different criteria via the equivalence theorem. ICA is a metaheuristic evolutionary algorithm inspired by the socio-political process of humans. See Masoudi et al. (2016) doi:10.1016/j.csda.2016.06.014.

908

Design of Experiments (DoE) & Analysis of Experimental Data

idefix

Efficient Designs for Discrete Choice Experiments

Generates efficient designs for discrete choice experiments based on the multinomial logit model, and individually adapted designs for the mixed multinomial logit model. Crabbe M, Akinc D and Vandebroek M (2014) doi:10.1016/j.trb.2013.11.008.

909

Design of Experiments (DoE) & Analysis of Experimental Data

JMdesign

Joint Modeling of Longitudinal and Survival Data - Power Calculation

Performs power calculations for joint modeling of longitudinal and survival data with k-th order trajectories when the variance-covariance matrix, Sigma_theta, is unknown.

910

Design of Experiments (DoE) & Analysis of Experimental Data

LDOD

Finding Locally D-optimal Designs for Some Nonlinear and Generalized Linear Models

This package provides functions for finding locally D-optimal designs for logistic, negative binomial, Poisson, Michaelis-Menten, exponential, log-linear, Emax, Richards, Weibull and inverse quadratic regression models, as well as functions for automatically constructing the Fisher information matrix and Frechet derivative from input variables, without user interference.

911

Design of Experiments (DoE) & Analysis of Experimental Data

lhs

Latin Hypercube Samples

Provides a number of methods for creating and augmenting Latin Hypercube Samples.
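A minimal sketch of the two core operations, creation and augmentation (randomLHS(n, k) and augmentLHS(lhs, m) are the package's documented functions; the sizes chosen here are arbitrary):

```r
# Sketch: generate and then augment a Latin hypercube sample.
library(lhs)

set.seed(42)
X  <- randomLHS(n = 10, k = 3)  # 10 runs in 3 factors, values in (0, 1)
X2 <- augmentLHS(X, m = 5)      # add 5 points while preserving the LHS property
```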

912

Design of Experiments (DoE) & Analysis of Experimental Data

MAMS

Designing Multi-Arm Multi-Stage Studies

Designing multi-arm multi-stage studies with (asymptotically) normal endpoints and known variance.

913

Design of Experiments (DoE) & Analysis of Experimental Data

MaxPro

Maximum Projection Designs

Generate a maximum projection (MaxPro) design, a MaxPro Latin hypercube design or improve an initial design based on the MaxPro criterion. Details of the MaxPro criterion can be found in: Joseph, V. R., Gul, E., and Ba, S. (2015) “Maximum Projection Designs for Computer Experiments”, Biometrika.

914

Design of Experiments (DoE) & Analysis of Experimental Data

MBHdesign

Spatial Designs for Ecological and Environmental Surveys

Provides spatially balanced designs from a set of (contiguous) potential sampling locations in a study region. Accommodates, without detrimental effects on spatial balance, sites that the researcher wishes to include in the survey for reasons other than the current randomisation (legacy sites).

915

Design of Experiments (DoE) & Analysis of Experimental Data

minimalRSD

Minimally Changed CCD and BBD

Generate central composite designs (CCD) with full as well as fractional factorial points (half replicate) and Box-Behnken designs (BBD) with minimally changed run sequence.

916

Design of Experiments (DoE) & Analysis of Experimental Data

minimaxdesign

Minimax and Minimax Projection Designs

Provides two main functions: mMcPSO() and miniMaxPro(), which generate minimax designs and minimax projection designs using a hybrid clustering and particle swarm optimization (PSO) algorithm. These designs can be used in a variety of settings, e.g., as space-filling designs for computer experiments or sensor allocation designs. A detailed description of the two designs and the employed algorithms can be found in Mak and Joseph (2017) doi:10.1080/10618600.2017.1302881.

917

Design of Experiments (DoE) & Analysis of Experimental Data

mixexp

Design and Analysis of Mixture Experiments

Functions for creating designs for mixture experiments, making ternary contour plots, and making mixture effect plots.

918

Design of Experiments (DoE) & Analysis of Experimental Data

mkssd

Efficient multi-level k-circulant supersaturated designs

mkssd is a package that generates efficient balanced non-aliased multi-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package tries to generate a supersaturated design that has chi-square efficiency greater than a user-specified efficiency level (mef). It also displays the progress of generation of an efficient multi-level k-circulant design through a progress bar; progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.

919

Design of Experiments (DoE) & Analysis of Experimental Data

mxkssd

Efficient mixed-level k-circulant supersaturated designs

mxkssd is a package that generates efficient balanced mixed-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package tries to generate a supersaturated design that has EfNOD efficiency greater than a user-specified efficiency level (mef). It also displays the progress of generation of an efficient mixed-level k-circulant design through a progress bar; progress of 100 per cent means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.

920

Design of Experiments (DoE) & Analysis of Experimental Data

OBsMD

Objective Bayesian Model Discrimination in Follow-Up Designs

Implements the objective Bayesian methodology proposed in Consonni and Deldossi in order to choose the optimal experiment that best discriminates between competing models. G. Consonni, L. Deldossi (2014) Objective Bayesian Model Discrimination in Follow-up Experimental Designs, Test. doi:10.1007/s11749-015-0461-3.

921

Design of Experiments (DoE) & Analysis of Experimental Data

odr

Optimal Design and Statistical Power of Cost-Efficient Multilevel Randomized Trials

Calculates the optimal sample allocation that minimizes the variance of the treatment effect in a multilevel randomized trial under a fixed budget and cost structure, and performs power analyses with and without accommodating costs and budget. The reference for the proposed methods is: Shen, Z., & Kelcey, B. (under review). Optimal design of cluster randomized trials under condition- and unit-specific cost structures. 2018 American Educational Research Association (AERA) annual conference.

922

Design of Experiments (DoE) & Analysis of Experimental Data

OPDOE

OPtimal Design Of Experiments

Experimental Design

923

Design of Experiments (DoE) & Analysis of Experimental Data

optbdmaeAT

Optimal Block Designs for Two-Colour cDNA Microarray Experiments

Computes A-, MV-, D- and E-optimal or near-optimal block designs for two-colour cDNA microarray experiments using linear fixed effects and mixed effects models where the interest is in a comparison of all possible elementary treatment contrasts. The algorithms used in this package are based on the treatment exchange and array exchange algorithms of Debusho, Gemechu and Haines (2016, unpublished). The package also provides an optional method of using the graphical user interface (GUI) R package tcltk to ensure that it is user friendly.

924

Design of Experiments (DoE) & Analysis of Experimental Data

optDesignSlopeInt

Optimal Designs for Estimating the Slope Divided by the Intercept

Compute optimal experimental designs that measure the slope divided by the intercept.

925

Design of Experiments (DoE) & Analysis of Experimental Data

OptGS

Near-Optimal and Balanced Group-Sequential Designs for Clinical Trials with Continuous Outcomes

Functions to find near-optimal multi-stage designs for continuous outcomes.

926

Design of Experiments (DoE) & Analysis of Experimental Data

OptimalDesign

Algorithms for D-, A-, and IV-Optimal Designs

Algorithms for D-, A- and IV-optimal designs of experiments. Some of the functions in this package require the ‘gurobi’ software and its accompanying R package. For their installation, please follow the instructions at <www.gurobi.com> and the file gurobi_inst.txt, respectively.

927

Design of Experiments (DoE) & Analysis of Experimental Data

OptimaRegion

Confidence Regions for Optima

Computes confidence regions on the location of response surface optima.

928

Design of Experiments (DoE) & Analysis of Experimental Data

OptInterim

Optimal Two and Three Stage Designs for Single-Arm and Two-Arm Randomized Controlled Trials with a Long-Term Binary Endpoint

Optimal two and three stage designs monitoring time-to-event endpoints at a specified timepoint.

929

Design of Experiments (DoE) & Analysis of Experimental Data

optrcdmaeAT

Optimal Row-Column Designs for Two-Colour cDNA Microarray Experiments

Computes A-, MV-, D- and E-optimal or near-optimal row-column designs for two-colour cDNA microarray experiments using linear fixed effects and mixed effects models where the interest is in a comparison of all pairwise treatment contrasts. The algorithms used in this package are based on the array exchange and treatment exchange algorithms adopted from Debusho, Gemechu and Haines (2016, unpublished) after adjusting for the row-column design setup. The package also provides an optional method of using the graphical user interface (GUI) R package tcltk to ensure that it is user friendly.

930

Design of Experiments (DoE) & Analysis of Experimental Data

osDesign

Design and analysis of observational studies

The osDesign package serves for planning an observational study. Currently, functionality is focused on the two-phase and case-control designs. Functions in this package provide Monte Carlo based evaluation of operating characteristics, such as power, for estimators of the components of a logistic regression model.

931

Design of Experiments (DoE) & Analysis of Experimental Data

PBIBD

Partially Balanced Incomplete Block Designs

It constructs four series of PBIB designs and also assists in calculating the efficiencies of PBIB designs with any number of associate classes. This will help researchers adopt a PBIB design and calculate the efficiencies of any PBIB design quickly and efficiently.

932

Design of Experiments (DoE) & Analysis of Experimental Data

PGM2

Nested Resolvable Designs and their Associated Uniform Designs

Construction method of nested resolvable designs from a projective geometry defined on a Galois field of order 2. The obtained resolvable designs are used to build uniform designs. The presented results are based on https://eudml.org/doc/219563 and A. Boudraa et al. (see references).

933

Design of Experiments (DoE) & Analysis of Experimental Data

ph2bayes

Bayesian SingleArm Phase II Designs

An implementation of Bayesian singlearm phase II design methods for binary outcome based on posterior probability and predictive probability.

934

Design of Experiments (DoE) & Analysis of Experimental Data

ph2bye

Phase II Clinical Trial Design Using Bayesian Methods

Calculate the Bayesian posterior/predictive probability and determine the sample size and stopping boundaries for singlearm Phase II design.

935

Design of Experiments (DoE) & Analysis of Experimental Data

pid

Process Improvement using Data

A collection of scripts and data files for the statistics text: “Process Improvement using Data”. The package contains code for designed experiments, data sets and other convenience functions used in the book.

936

Design of Experiments (DoE) & Analysis of Experimental Data

pipe.design

Dual-Agent Dose Escalation for Phase I Trials using the PIPE Design

Implements the Product of Independent beta Probabilities dose Escalation (PIPE) design for dualagent Phase I trials as described in Mander AP, Sweeting MJ (2015) doi:10.1002/sim.6434.

937

Design of Experiments (DoE) & Analysis of Experimental Data

planor (core)

Generation of Regular Factorial Designs

Automatic generation of regular factorial designs, including fractional designs, orthogonal block designs, rowcolumn designs and splitplots.

938

Design of Experiments (DoE) & Analysis of Experimental Data

plgp

Particle Learning of Gaussian Processes

Sequential Monte Carlo inference for fully Bayesian Gaussian process (GP) regression and classification models by particle learning (PL). The sequential nature of inference and the active learning (AL) hooks provided facilitate thrifty sequential design (by entropy) and optimization (by improvement) for classification and regression models, respectively. This package essentially provides a generic PL interface, and functions (arguments to the interface) which implement the GP models and AL heuristics. Functions for a special, linked, regression/classification GP model and an integrated expected conditional improvement (IECI) statistic are provided for optimization in the presence of unknown constraints. Separable and isotropic Gaussian, and single-index correlation functions are supported. See the examples section of ?plgp and demo(package=“plgp”) for an index of demos.

939

Design of Experiments (DoE) & Analysis of Experimental Data

PopED

Population (and Individual) Optimal Experimental Design

Optimal experimental designs for both population and individual studies based on nonlinear mixed-effect models. Often this is based on a computation of the Fisher Information Matrix. This package was developed for pharmacometric problems, and examples and predefined models are available for these types of systems.

940

Design of Experiments (DoE) & Analysis of Experimental Data

powerAnalysis

Power Analysis in Experimental Design

Basic functions for power analysis and effect size calculation.

941

Design of Experiments (DoE) & Analysis of Experimental Data

powerbydesign

Power Estimates for ANOVA Designs

Functions for bootstrapping the power of ANOVA designs based on estimated means and standard deviations of the conditions. Please refer to the documentation of the boot.power.anova() function for further details.

942

Design of Experiments (DoE) & Analysis of Experimental Data

powerGWASinteraction

Power Calculations for GxE and GxG Interactions for GWAS

Analytical power calculations for GxE and GxG interactions for case-control studies of candidate genes and genome-wide association studies (GWAS). This includes power calculations for four two-step screening and testing procedures. It can also calculate power for GxE and GxG without any screening.

943

Design of Experiments (DoE) & Analysis of Experimental Data

PwrGSD

Power in a Group Sequential Design

Tools for the evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint; tools to construct efficacy and futility boundaries and to derive the power of a sequential design at a specified alternative; and a template for evaluating the performance of candidate plans at a set of time-varying alternatives.

944

Design of Experiments (DoE) & Analysis of Experimental Data

qtlDesign

Design of QTL experiments

Tools for the design of QTL experiments

945

Design of Experiments (DoE) & Analysis of Experimental Data

qualityTools

Statistical Methods for Quality Science

Contains methods associated with the Define, Measure, Analyze, Improve and Control (i.e. DMAIC) cycle of the Six Sigma quality management methodology. It covers distribution fitting, normal and non-normal process capability indices, techniques for measurement systems analysis, especially gage capability indices and Gage Repeatability and Reproducibility (i.e. Gage R&R) studies, factorial and fractional factorial designs, as well as response surface methods including the use of desirability functions. Improvement via Six Sigma is a project-based strategy that covers 5 phases: Define - Pareto chart; Measure - probability and quantile-quantile plots, process capability indices for various distributions, and Gage R&R; Analyze - Pareto chart, multi-vari chart, dot plot; Improve - full and fractional factorial, response surface and mixture designs, as well as the desirability approach for simultaneous optimization of more than one response variable, plus normal, Pareto and Lenth plots of effects as well as interaction plots; Control - quality control charts can be found in the ‘qcc’ package. The focus is on teaching the statistical methodology used in the quality sciences.

946

Design of Experiments (DoE) & Analysis of Experimental Data

RcmdrPlugin.DoE

R Commander Plugin for (industrial) Design of Experiments

The package provides a platform-independent GUI for design of experiments. It is implemented as a plugin to the R Commander, which is a more general graphical user interface for statistics in R based on tcl/tk. DoE functionality can be accessed through the menu Design that is added to the R Commander menus.

947

Design of Experiments (DoE) & Analysis of Experimental Data

rodd

Optimal Discriminating Designs

A collection of functions for the numerical construction of optimal discriminating designs. Currently, T-optimal designs (which maximize the lower bound for the power of the F-test for regression model discrimination), KL-optimal designs (for lognormal errors) and their robust analogues can be calculated with the package.

948

Design of Experiments (DoE) & Analysis of Experimental Data

RPPairwiseDesign

Resolvable partially pairwise balanced design and Space-filling design via association scheme

Using some association schemes to obtain a new series of resolvable partially pairwise balanced designs (RPPBD) and space-filling designs.

949

Design of Experiments (DoE) & Analysis of Experimental Data

rsm (core)

ResponseSurface Analysis

Provides functions to generate responsesurface designs, fit first and secondorder responsesurface models, make surface plots, obtain the path of steepest ascent, and do canonical analysis. A good reference on these methods is Chapter 10 of Wu, CF J and Hamada, M (2009) “Experiments: Planning, Analysis, and Parameter Design Optimization” ISBN 9780471699460.
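A minimal sketch of the workflow just described (ccd() and rsm() are the package's documented functions; the two-factor setting and the commented model fit are illustrative assumptions):

```r
# Sketch: a central composite design in two coded factors.
library(rsm)

des <- ccd(2)                 # CCD with 2 factors: factorial, axial and centre points
head(as.data.frame(des))      # coded runs in x1, x2

# Once a response y has been observed and added to 'des', a second-order
# response-surface model could be fit and analyzed canonically, e.g.:
#   fit <- rsm(y ~ SO(x1, x2), data = des)
#   canonical(fit)            # stationary point and canonical analysis
```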

950

Design of Experiments (DoE) & Analysis of Experimental Data

rsurface

Design of Rotatable Central Composite Experiments and Response Surface Analysis

Produces tables with the level of replication (number of replicates) and the experimental uncoded values of the quantitative factors to be used for rotatable Central Composite Design (CCD) experimentation, and a 2D contour plot of the corresponding variance of the predicted response according to Mead et al. (2012) doi:10.1017/CBO9781139020879, design_ccd(); it also analyzes CCD data with response surface methodology, ccd_analysis(). A rotatable CCD provides values of the variance of the predicted response that are concentrically distributed around the average treatment combination used in the experimentation, which, with uniform precision (implied by the use of several replicates at the average treatment combination), greatly improves the search for an optimum response. These properties of a rotatable CCD represent undeniable advantages over the classical factorial design, as discussed by Panneton et al. (1999) doi:10.13031/2013.13267 and Mead et al. (2012) doi:10.1017/CBO9781139020879.018, among others.

951

Design of Experiments (DoE) & Analysis of Experimental Data

SensoMineR

Sensory data analysis with R

An R package for analysing sensory data.

952

Design of Experiments (DoE) & Analysis of Experimental Data

seqDesign

Simulation and Group Sequential Monitoring of Randomized Two-Stage Treatment Efficacy Trials with Time-to-Event Endpoints

A modification of the preventive vaccine efficacy trial design of Gilbert, Grove et al. (2011, Statistical Communications in Infectious Diseases) is implemented, with application generally to individually randomized clinical trials with multiple active treatment groups and a shared control group, and a study endpoint that is a time-to-event endpoint subject to right-censoring. The design accounts for the issues that the efficacy of the treatment/vaccine groups may take time to accrue while the multiple treatment administrations/vaccinations are given; that there is interest in assessing the durability of treatment efficacy over time; and that group sequential monitoring of each treatment group for potential harm, non-efficacy/efficacy futility, and high efficacy is warranted. The design divides the trial into two stages of time periods, where each treatment is first evaluated for efficacy in the first stage of follow-up, and, if and only if it shows significant treatment efficacy in stage one, it is evaluated for longer-term durability of efficacy in stage two. The package produces plots and tables describing operating characteristics of a specified design, including unconditional power for intention-to-treat and per-protocol/as-treated analyses; trial duration; probabilities of the different possible trial monitoring outcomes (e.g., stopping early for non-efficacy); unconditional power for comparing treatment efficacies; and distributions of numbers of endpoint events occurring after the treatments/vaccinations are given, useful as input parameters for the design of studies of the association of biomarkers with a clinical outcome (the surrogate endpoint problem). The code can be used for a single active treatment versus control design and for a single-stage design.

953

Design of Experiments (DoE) & Analysis of Experimental Data

sFFLHD

Sequential Full Factorial-Based Latin Hypercube Design

Gives design points from a sequential full factorial-based Latin hypercube design, as described in Duan, Ankenman, Sanchez, and Sanchez (2015, Technometrics, doi:10.1080/00401706.2015.1108233).

954

Design of Experiments (DoE) & Analysis of Experimental Data

simrel

Linear Model Data Simulation and Design of Computer Experiments

Facilitates data simulation from a random regression model where the data properties can be controlled by a few input parameters. The data simulation is based on the concept of relevant latent components and relevant predictors, and was developed for the purpose of testing methods for variable selection for prediction. Included are also functions for designing computer experiments in order to investigate the effects of the data properties on the performance of the tested methods. The design is constructed using the Multilevel Binary Replacement (MBR) design approach which makes it possible to set up fractional designs for multifactor problems with potentially many levels for each factor.

955

Design of Experiments (DoE) & Analysis of Experimental Data

skpr (core)

Design of Experiments Suite: Generate and Evaluate Optimal Designs

Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of split/split-split/…/N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and make analyses more reproducible.

956

Design of Experiments (DoE) & Analysis of Experimental Data

SLHD

Maximin-Distance (Sliced) Latin Hypercube Designs

Generates the optimal Latin Hypercube Designs (LHDs) for computer experiments with quantitative factors and the optimal Sliced Latin Hypercube Designs (SLHDs) for computer experiments with both quantitative and qualitative factors. Details of the algorithm can be found in Ba, S., Brenneman, W. A. and Myers, W. R. (2015), “Optimal Sliced Latin Hypercube Designs,” Technometrics. The most important function in this package is maximinSLHD().

957

Design of Experiments (DoE) & Analysis of Experimental Data

soptdmaeA

Sequential Optimal Designs for Two-Colour cDNA Microarray Experiments

Computes sequential A-, MV-, D- and E-optimal or near-optimal block and row-column designs for two-colour cDNA microarray experiments using linear fixed effects and mixed effects models where the interest is in a comparison of all possible elementary treatment contrasts. The package also provides an optional method of using the graphical user interface (GUI) R package ‘tcltk’ to ensure that it is user friendly.

958

Design of Experiments (DoE) & Analysis of Experimental Data

sp23design

Design and Simulation of Seamless Phase II-III Clinical Trials

Provides methods for generating, exploring and executing seamless Phase II-III designs of Lai, Lavori and Shih using generalized likelihood ratio statistics. Includes pdf and source files that describe the entire R implementation with the relevant mathematical details.

959

Design of Experiments (DoE) & Analysis of Experimental Data

ssize.fdr

Sample Size Calculations for Microarray Experiments

This package contains a set of functions that calculate appropriate sample sizes for one-sample t-tests, two-sample t-tests, and F-tests for microarray experiments, based on desired power while controlling for false discovery rates. For all tests, the standard deviations (variances) among genes can be assumed fixed or random. This is also true for effect sizes among genes in one-sample and two-sample experiments. Functions also output a chart of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes.

960

Design of Experiments (DoE) & Analysis of Experimental Data

ssizeRNA

Sample Size Calculation for RNA-Seq Experimental Design

We propose a procedure for sample size calculation while controlling the false discovery rate for RNA-seq experimental design. Our procedure depends on the Voom method proposed for RNA-seq data analysis by Law et al. (2014) doi:10.1186/gb-2014-15-2-r29 and the sample size calculation method proposed for microarray experiments by Liu and Hwang (2007) doi:10.1093/bioinformatics/btl664. We develop a set of functions that calculate appropriate sample sizes for the two-sample t-test for RNA-seq experiments with fixed or varied sets of parameters. The outputs also contain a plot of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes. To install this package, please use ‘source(“http://bioconductor.org/biocLite.R”); biocLite(“ssizeRNA”)’.

961

Design of Experiments (DoE) & Analysis of Experimental Data

support.CEs

Basic Functions for Supporting an Implementation of Choice Experiments

Provides seven basic functions that support an implementation of choice experiments.

962

Design of Experiments (DoE) & Analysis of Experimental Data

TEQR

Target Equivalence Range Design

The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity-based cumulative cohort design with added safety rules. The ACT (Activity Constrained for Toxicity) design is also a cumulative cohort design with additional safety rules. The unique feature of this design is that dose is escalated based on lack of activity rather than on lack of toxicity, and is de-escalated only if an unacceptable level of toxicity is experienced.

963

Design of Experiments (DoE) & Analysis of Experimental Data

tgp

Bayesian Treed Gaussian Process Models

Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1d and 2d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy blackbox functions.

964

Design of Experiments (DoE) & Analysis of Experimental Data

ThreeArmedTrials

Design and Analysis of Clinical Non-Inferiority or Superiority Trials with Active and Placebo Control

Design and analyze three-arm non-inferiority or superiority trials which follow a gold-standard design, i.e. trials with an experimental treatment, an active control, and a placebo control.

965

Design of Experiments (DoE) & Analysis of Experimental Data

toxtestD

Experimental design for binary toxicity tests

Calculates sample size and dose allocation for binary toxicity tests, using the Fish Embryo Toxicity Test as an example. An optimal test design is obtained by running (i) spoD (calculate the number of individuals to test under control conditions), (ii) setD (estimate the minimal sample size per treatment given the user's precision requirements) and (iii) doseD (construct an individual dose scheme).

966

Design of Experiments (DoE) & Analysis of Experimental Data

unrepx

Analysis and Graphics for Unreplicated Experiments

Provides half-normal plots, reference plots, and Pareto plots of effects from an unreplicated experiment, along with various pseudo-standard-error measures, simulated reference distributions, and other tools. Many of these methods are described in Daniel C. (1959) doi:10.1080/00401706.1959.10489866 and/or Lenth R.V. (1989) doi:10.1080/00401706.1989.10488595, but some new approaches are added and integrated in one package.

967

Design of Experiments (DoE) & Analysis of Experimental Data

vdg

Variance Dispersion Graphs and Fraction of Design Space Plots

Facilities for constructing variance dispersion graphs, fraction-of-design-space plots and similar graphics for exploring the properties of experimental designs. The design region is explored via random sampling, which allows for more flexibility than traditional variance dispersion graphs. A formula interface is leveraged to provide access to complex model formulae. Graphics can be constructed simultaneously for multiple experimental designs and/or multiple model formulae. Instead of using pointwise optimization to find the minimum and maximum scaled prediction variance curves, which can be inaccurate and time-consuming, this package uses quantile regression as an alternative.

968

Design of Experiments (DoE) & Analysis of Experimental Data

Vdgraph

Variance dispersion graphs and Fraction of design space plots for response surface designs

Uses a modification of the published FORTRAN code in “A Computer Program for Generating Variance Dispersion Graphs” by G. Vining, Journal of Quality Technology, Vol. 25 No. 1 January 1993, to produce variance dispersion graphs. Also produces fraction of design space plots, and contains data frames for several minimal run response surface designs.

969

Design of Experiments (DoE) & Analysis of Experimental Data

VdgRsm

Plots of Scaled Prediction Variances for Response Surface Designs

Functions for creating variance dispersion graphs, fraction of design space plots, and contour plots of scaled prediction variances for second-order response surface designs in spherical and cuboidal regions. Also, some standard response surface designs can be generated.

970

Design of Experiments (DoE) & Analysis of Experimental Data

VNM

Finding Multiple-Objective Optimal Designs for the 4-Parameter Logistic Model

Provides tools for finding multiple-objective optimal designs for estimating the shape of the dose-response curve, the ED50 (the dose producing an effect midway between the expected responses at the extreme doses) and the MED (the minimum effective dose level) for the 2-, 3- and 4-parameter logistic models, and for evaluating design efficiencies for the three objectives. The acronym VNM stands for the V-algorithm using the Newton-Raphson method to search for multiple-objective optimal designs.
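To make the model behind these designs concrete, here is a small pure-Python sketch of a 4-parameter logistic dose-response curve (one common parameterization; the function name and parameters are illustrative, not the package's code). The ED50 is, by construction, the dose where the response sits midway between the lower and upper asymptotes.

```python
import math

def logistic4(dose, lower, upper, ed50, slope):
    """4-parameter logistic dose-response curve (one common parameterization):
    response rises from `lower` to `upper` with midpoint at the ED50."""
    return lower + (upper - lower) / (
        1.0 + math.exp(-slope * (math.log(dose) - math.log(ed50)))
    )

# At the ED50 the response is exactly midway between the extremes.
mid = logistic4(dose=10.0, lower=0.0, upper=1.0, ed50=10.0, slope=1.5)
print(round(mid, 6))  # 0.5
```

Dropping `lower` (or both `lower` and `slope`) recovers the 3- and 2-parameter special cases mentioned above.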

971

Extreme Value Analysis

copula

Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and nonparametric estimators of the Pickands dependence function.
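The core copula idea, separating uniform margins from the dependence structure, can be illustrated with a hedged pure-Python sketch (not the 'copula' package API): sample from a bivariate Gaussian copula by generating correlated normals via a 2x2 Cholesky step and pushing them through the normal CDF.

```python
import math
import random

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rgauss_copula(n, rho, seed=1):
    """Draw n pairs (u, v) from a bivariate Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho**2) * rng.gauss(0, 1)  # Cholesky step
        out.append((phi(z1), phi(z2)))                           # uniform margins
    return out

uv = rgauss_copula(1000, rho=0.8)
assert all(0 < u < 1 and 0 < v < 1 for u, v in uv)  # margins live on (0, 1)
```

Any marginal distributions can then be imposed by applying their quantile functions to u and v, which is exactly the modularity copula packages exploit.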

972

Extreme Value Analysis

evd (core)

Functions for Extreme Value Distributions

Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.
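As an illustration of the block-maxima workflow these extreme value packages support, here is a pure-Python sketch (not the 'evd' API; the simulation setup is made up): daily data are reduced to per-block maxima and a Gumbel model, the shape-zero member of the GEV family, is fitted by the method of moments.

```python
import math
import random
import statistics

def block_maxima(series, block):
    """Reduce a series to per-block maxima (e.g., annual maxima of daily data)."""
    return [max(series[i:i + block])
            for i in range(0, len(series) - block + 1, block)]

def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit: sd = pi*scale/sqrt(6), mean = loc + gamma*scale."""
    euler_gamma = 0.5772156649
    scale = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    loc = statistics.mean(maxima) - euler_gamma * scale
    return loc, scale

rng = random.Random(42)
daily = [rng.expovariate(1.0) for _ in range(365 * 100)]  # 100 "years" of data
loc, scale = fit_gumbel_moments(block_maxima(daily, 365))
# Maxima of 365 unit-exponential draws are approximately Gumbel(log 365, 1).
rl100 = loc - scale * math.log(-math.log(1 - 1 / 100))    # 100-block return level
```

Maximum likelihood (as in evd's fitting functions) is more efficient than this moment fit, but the two agree closely on well-behaved data.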

973

Extreme Value Analysis

evdbayes

Bayesian Analysis in Extreme Value Theory

Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.

974

Extreme Value Analysis

evir (core)

Extreme Values in R

Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, and GEV/GPD distributions.

975

Extreme Value Analysis

extremefit

Estimation of Extreme Conditional Quantiles and Probabilities

Tools for extreme value theory: nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantiles, adaptive estimation, quantile regression and survival probabilities.

976

Extreme Value Analysis

extRemes

Extreme Value Analysis

Functions for performing extreme value analysis.

977

Extreme Value Analysis

extremeStat

Extreme Value Statistics and Quantile Estimation

Code to fit, plot and compare several (extreme value) distribution functions. Can also compute (truncated) distribution quantile estimates and draw a plot with return periods on a linear scale.

978

Extreme Value Analysis

fExtremes

Rmetrics - Modelling Extreme Events in Finance

Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data preprocessing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.

979

Extreme Value Analysis

ismev

An Introduction to Statistical Modeling of Extreme Values

Functions to support the computations carried out in ‘An Introduction to Statistical Modeling of Extreme Values’ by Stuart Coles. The functions may be divided into the following groups: maxima/minima, order statistics, peaks over thresholds and point processes.

980

Extreme Value Analysis

lmom

L-Moments

Functions related to L-moments: computation of L-moments and trimmed L-moments of distributions and data samples; parameter estimation; L-moment ratio diagram; plot vs. quantiles of an extreme-value distribution.
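The first two sample L-moments are simple enough to compute by hand, which this pure-Python sketch does via the standard unbiased probability-weighted-moment estimators (illustrative only, not the 'lmom' package API): l1 is the mean and l2 is half the Gini mean difference, a robust analogue of scale.

```python
def sample_lmoments(data):
    """First two sample L-moments via the unbiased PWM estimators:
    l1 = b0, l2 = 2*b1 - b0, where b_r are probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((i / (n - 1)) * x[i] for i in range(n)) / n  # i = rank - 1
    return b0, 2 * b1 - b0

l1, l2 = sample_lmoments([1, 2, 3])
print(l1, round(l2, 6))  # 2.0 0.666667
```

For {1, 2, 3} the pairwise absolute differences are 1, 2, 1, so half their mean is 2/3, matching l2; higher-order L-moments (used in L-moment ratio diagrams) extend the same b_r weighting scheme.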

981

Extreme Value Analysis

lmomco

L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions

Extensive functions for L-moments (LMs) and probability-weighted moments (PWMs), parameter estimation for distributions, LM computation for distributions, and L-moment ratio diagrams. Maximum likelihood and maximum product of spacings estimation are also available. LMs for right-tail and left-tail censoring by known or unknown threshold and by indicator variable are available. Asymmetric (asy) trimmed LMs (TL-moments, TLMs) are supported. LMs of residual (resid) and reversed (rev) resid life are implemented along with 13 quantile function operators for reliability and survival analyses. Exact analytical bootstrap estimates of order statistics, LMs, and variances/covariances of LMs are provided. The Harri-Coble Tau34-squared Normality Test is available. Distribution support with “L” (LMs), “TL” (TLMs) and added (+) support for right-tail censoring (RC) encompasses: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L, TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Resid. Quantile Function [L], Normal [L], 3-p log-Normal [L], Pearson Type III [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice/Rician [L], Slash [TL], 3-p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L]. Multivariate sample L-comoments (LCMs) are implemented to measure asymmetric associations.

982

Extreme Value Analysis

lmomRFA

Regional Frequency Analysis using L-Moments

Functions for regional frequency analysis using the methods of J. R. M. Hosking and J. R. Wallis (1997), “Regional frequency analysis: an approach based on L-moments”.

983

Extreme Value Analysis

mev

Multivariate Extreme Value Distributions

Exact simulation from max-stable processes and multivariate extreme value distributions for various parametric models. Threshold selection methods.

984

Extreme Value Analysis

POT

Generalized Pareto Distribution and Peaks Over Threshold

Functions to perform a peaks-over-threshold analysis in univariate and bivariate cases. A user’s guide is available.
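The peaks-over-threshold idea can be sketched in a few lines of pure Python (illustrative only, not the 'POT' package API). For brevity the excesses are modelled as exponential, i.e. the shape-zero special case of the generalized Pareto distribution; real analyses would fit the full GPD.

```python
import random
import statistics

def pot_exponential(series, q=0.9):
    """Peaks-over-threshold with an exponential model for the excesses
    (the xi = 0 special case of the generalized Pareto distribution)."""
    u = sorted(series)[int(q * len(series))]   # threshold at the q-quantile
    excesses = [x - u for x in series if x > u]
    sigma = statistics.mean(excesses)          # MLE of the exponential scale
    return u, sigma, len(excesses)

rng = random.Random(7)
data = [rng.expovariate(0.5) for _ in range(5000)]  # mean-2 exponential data
u, sigma, k = pot_exponential(data)
# By memorylessness, the excesses are again exponential with mean 2,
# so sigma should be close to 2 regardless of the threshold chosen.
```

Threshold choice (here a fixed 90% quantile) is the delicate step in practice; diagnostic tools such as mean-excess plots are what the dedicated packages provide.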

985

Extreme Value Analysis

QRM

Provides R-Language Code to Examine Quantitative Risk Management Concepts

Accompanying package to the book Quantitative Risk Management: Concepts, Techniques and Tools by Alexander J. McNeil, Rudiger Frey, and Paul Embrechts.

986

Extreme Value Analysis

ReIns

Functions from “Reinsurance: Actuarial and Statistical Aspects”

Functions from the book “Reinsurance: Actuarial and Statistical Aspects” (2017) by Hansjoerg Albrecher, Jan Beirlant and Jef Teugels http://wiley.com/WileyCDA/WileyTitle/productCd0470772689.html.

987

Extreme Value Analysis

Renext

Renewal Method for Extreme Values Extrapolation

Peaks Over Threshold (POT) or ‘méthode du renouvellement’. The distribution for the exceedances can be chosen, and heterogeneous data (including historical data or block data) can be used in a maximum-likelihood framework.

988

Extreme Value Analysis

revdbayes

Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis

Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package https://cran.r-project.org/package=rust is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package https://cran.r-project.org/package=evdbayes, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) doi:10.1214/09-AOAS292. See the ‘revdbayes’ website for more information, documentation and examples.

989

Extreme Value Analysis

RTDE

Robust Tail Dependence Estimation

Robust tail dependence estimation for bivariate models. This package is based on two papers by the authors: ‘Robust and bias-corrected estimation of the coefficient of tail dependence’ and ‘Robust and bias-corrected estimation of probabilities of extreme failure sets’. This work was supported by a research grant (VKR023480) from VILLUM FONDEN and an international project for scientific cooperation (PICS-6416).

990

Extreme Value Analysis

SpatialExtremes

Modelling Spatial Extremes

Tools for the statistical modelling of spatial extremes using max-stable processes, copula or Bayesian hierarchical models. More precisely, this package allows (conditional) simulations from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with extreme value theory) are also available, such as the use of (spatial) copulas and Bayesian hierarchical models assuming the so-called conditional independence assumption. These latter approaches are handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012) doi:10.1214/11-STS376, Padoan et al. (2010) doi:10.1198/jasa.2009.tm08577, Dombry et al. (2013) doi:10.1093/biomet/ass067.

991

Extreme Value Analysis

texmex

Statistical Modelling of Extreme Values

Statistical extreme value modelling of threshold excesses, maxima and multivariate extremes. Univariate models for threshold excesses and maxima are the Generalised Pareto and Generalised Extreme Value models respectively. These models may be fitted using maximum (optionally penalised) likelihood or Bayesian estimation, and both classes of models may be fitted with covariates in any/all model parameters. Model diagnostics support the fitting process. Graphical output for visualising fitted models and return level estimates is provided. For serially dependent sequences, the intervals declustering algorithm of Ferro and Segers (2003) doi:10.1111/1467-9868.00401 is provided, with diagnostic support to aid selection of threshold and declustering horizon. Multivariate modelling is performed via the conditional approach of Heffernan and Tawn (2004) doi:10.1111/j.1467-9868.2004.02050.x, with graphical tools for threshold selection and to diagnose estimation convergence.

992

Extreme Value Analysis

VGAM

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. At the heart of it are the vector generalized linear and additive model (VGLM/VGAM) classes, and the book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) doi:10.1007/978-1-4939-2818-7 gives details of the statistical framework and the VGAM package. Currently only fixed-effects models are implemented, i.e., no random-effects models. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE, using Fisher scoring. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs (i.e., with smoothing). The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

993

Empirical Finance

actuar

Actuarial Functions and Heavy Tailed Distributions

Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss amounts and loss frequency: 19 continuous heavy tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities.

994

Empirical Finance

AmericanCallOpt

This package includes pricing functions for selected American call options with underlying assets that generate payouts

This package includes a set of pricing functions for American call options. The following cases are covered: Pricing of an American call using the standard binomial approximation; Hedge parameters for an American call with a standard binomial tree; Binomial pricing of an American call with continuous payout from the underlying asset; Binomial pricing of an American call with an underlying stock that pays proportional dividends in discrete time; Pricing of an American call on futures using a binomial approximation; Pricing of a currency futures American call using a binomial approximation; Pricing of a perpetual American call. The user should kindly notice that this material is for educational purposes only. The codes are not optimized for computational efficiency as they are meant to represent standard cases of analytical and numerical solution.
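The standard binomial approximation described above can be sketched in pure Python (illustrative, not the package's code): a Cox-Ross-Rubinstein tree is rolled back with an early-exercise check at every node, with an optional continuous dividend yield.

```python
import math

def american_call_crr(S, K, r, sigma, T, steps=200, div_yield=0.0):
    """Cox-Ross-Rubinstein binomial tree for an American call,
    checking early exercise at every node."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1 / u
    disc = math.exp(-r * dt)
    p = (math.exp((r - div_yield) * dt) - d) / (u - d)  # risk-neutral up probability
    # terminal payoffs at step `steps`
    values = [max(S * u**j * d**(steps - j) - K, 0.0) for j in range(steps + 1)]
    # backward induction with an early-exercise comparison
    for i in range(steps - 1, -1, -1):
        values = [
            max(disc * (p * values[j + 1] + (1 - p) * values[j]),  # continuation
                S * u**j * d**(i - j) - K)                         # immediate exercise
            for j in range(i + 1)
        ]
    return values[0]

price = american_call_crr(S=100, K=100, r=0.05, sigma=0.2, T=1.0)
```

With no payouts the American call is never exercised early, so the tree converges to the Black-Scholes European value (about 10.45 for these inputs); the early-exercise check only bites when `div_yield` is positive.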

995

Empirical Finance

backtest

Exploring Portfolio-Based Conjectures About Financial Instruments

The backtest package provides facilities for exploring portfolio-based conjectures about financial instruments (stocks, bonds, swaps, options, et cetera).

996

Empirical Finance

bayesGARCH

Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations

Provides the bayesGARCH() function which performs the Bayesian estimation of the GARCH(1,1) model with Student’s t innovations as described in Ardia (2008) doi:10.1007/978-3-540-78657-3.
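For intuition about the model being estimated, here is a pure-Python simulator for a GARCH(1,1) with standardized Student-t innovations (a sketch of the data-generating process only, not the package's Bayesian estimator; parameter values are made up).

```python
import math
import random

def simulate_garch11_t(n, omega, alpha, beta, nu, seed=0):
    """Simulate a GARCH(1,1) with standardized Student-t innovations:
    sigma2_t = omega + alpha * y_{t-1}^2 + beta * sigma2_{t-1}."""
    rng = random.Random(seed)
    sigma2 = omega / (1 - alpha - beta)   # start at the unconditional variance
    y = []
    for _ in range(n):
        # t_nu draw: normal over sqrt(chi2_nu / nu); chi2_nu = Gamma(nu/2, scale 2)
        z = rng.gauss(0, 1) / math.sqrt(rng.gammavariate(nu / 2, 2) / nu)
        z *= math.sqrt((nu - 2) / nu)     # rescale to unit variance
        y_t = math.sqrt(sigma2) * z
        y.append(y_t)
        sigma2 = omega + alpha * y_t**2 + beta * sigma2
    return y

# Persistence alpha + beta = 0.9; unconditional variance omega/(1-alpha-beta) = 1.
y = simulate_garch11_t(2000, omega=0.1, alpha=0.1, beta=0.8, nu=8)
```

The t innovations give the fat-tailed returns that make the Bayesian treatment (with priors on nu) attractive in practice.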

997

Empirical Finance

BCC1997

Calculation of Option Prices Based on a Universal Solution

Calculates the prices of European options based on the universal solution provided by Bakshi, Cao and Chen (1997) doi:10.1111/j.1540-6261.1997.tb02749.x. This solution considers stochastic volatility, stochastic interest and random jumps. Please cite their work if this package is used.

998

Empirical Finance

BenfordTests

Statistical Tests for Evaluating Conformity to Benford’s Law

Several specialized statistical tests and support functions for determining if numerical data could conform to Benford’s law.
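The simplest such test is a Pearson chi-square comparison of observed leading-digit frequencies against Benford's law, P(d) = log10(1 + 1/d). A pure-Python sketch (illustrative only, not the package's API):

```python
import math

def benford_chisq(numbers):
    """Pearson chi-square statistic comparing leading-digit frequencies
    against Benford's law P(d) = log10(1 + 1/d)."""
    digits = [int(str(abs(x)).lstrip("0.")[0]) for x in numbers]
    n = len(digits)
    chisq = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        observed = digits.count(d)
        chisq += (observed - expected) ** 2 / expected
    return chisq

# Powers of 2 are a classic Benford-conforming sequence, so the
# statistic stays well below the 5% critical value of 15.51 (8 df).
stat = benford_chisq([2 ** k for k in range(1, 200)])
```

The dedicated package adds the many refinements this sketch omits: exact small-sample distributions, alternative statistics, and second-digit tests.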

999

Empirical Finance

betategarch

Simulation, Estimation and Forecasting of BetaSkewtEGARCH Models

Simulation, estimation and forecasting of firstorder BetaSkewtEGARCH models with leverage (onecomponent, twocomponent, skewed versions).

1000

Empirical Finance

bizdays

Business Days Calculations and Utilities

Business days calculations based on a list of holidays and non-working weekdays. Quite useful for fixed income and derivatives pricing.
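The underlying calculation is a calendar walk that skips weekends and a supplied holiday list; a minimal pure-Python sketch (not the package's API, and a naive O(days) loop rather than the vectorized arithmetic such packages use):

```python
import datetime as dt

def business_days(start, end, holidays=()):
    """Count business days in [start, end): weekdays that are not holidays."""
    holidays = set(holidays)
    day, count = start, 0
    while day < end:
        if day.weekday() < 5 and day not in holidays:
            count += 1
        day += dt.timedelta(days=1)
    return count

# 2024-01-01 was a Monday; with it as a holiday, Tue-Thu remain in [Jan 1, Jan 5).
n = business_days(dt.date(2024, 1, 1), dt.date(2024, 1, 5),
                  holidays=[dt.date(2024, 1, 1)])
print(n)  # 3
```

Fixed-income day counts build directly on this primitive, e.g. accrual fractions of business days between coupon dates.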

1001

Empirical Finance

BLModel

Black-Litterman Posterior Distribution

Posterior distribution in the Black-Litterman model is computed from a prior distribution given in the form of a time series of asset returns and a continuous distribution of views provided by the user as an external function.

1002

Empirical Finance

BurStFin

Burns Statistics Financial

A suite of functions for finance, including the estimation of variance matrices via a statistical factor model or Ledoit-Wolf shrinkage.

1003

Empirical Finance

BurStMisc

Burns Statistics Miscellaneous

Script search, corner, genetic optimization, permutation tests, write expect test.

1004

Empirical Finance

CADFtest

A Package to Perform Covariate Augmented Dickey-Fuller Unit Root Tests

Hansen’s (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, then an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).

1005

Empirical Finance

car

Companion to Applied Regression

Functions and Datasets to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Second Edition, Sage, 2011.

1006

Empirical Finance

ccgarch

Conditional Correlation GARCH models

Functions for estimating and simulating the family of CC-GARCH models.

1007

Empirical Finance

ChainLadder

Statistical Methods and Models for Claims Reserving in General Insurance

Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.

1008

Empirical Finance

copula

Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and nonparametric estimators of the Pickands dependence function.

1009

Empirical Finance

covmat

Covariance Matrix Estimation

We implement a collection of techniques for estimating covariance matrices. Covariance matrices can be built using missing data. Stambaugh Estimation and FMMC methods can be used to construct such matrices. Covariance matrices can be built by denoising or shrinking the eigenvalues of a sample covariance matrix. Such techniques work by exploiting the tools in Random Matrix Theory to analyse the distribution of eigenvalues. Covariance matrices can also be built assuming that data has many underlying regimes. Each regime is allowed to follow a Dynamic Conditional Correlation model. Robust covariance matrices can be constructed by multivariate cleaning and smoothing of noisy data.

1010

Empirical Finance

CreditMetrics

Functions for calculating the CreditMetrics risk model

A set of functions for computing the CreditMetrics risk model.

1011

Empirical Finance

credule

Credit Default Swap Functions

Provides functions to bootstrap credit curves from market quotes (Credit Default Swap, CDS, spreads) and to price Credit Default Swaps (CDS).

1012

Empirical Finance

crp.CSFP

CreditRisk+ Portfolio Model

Modelling credit risks based on the concept of “CreditRisk+”, First Boston Financial Products, 1997 and “CreditRisk+ in the Banking Industry”, Gundlach & Lehrbass, Springer, 2003.

1013

Empirical Finance

data.table

Extension of ‘data.frame’

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, a fast friendly file reader and parallel file writer. Offers a natural and flexible syntax, for faster development.

1014

Empirical Finance

derivmkts

Functions and R Code to Accompany Derivatives Markets

A set of pricing and expository functions that should be useful in teaching a course on financial derivatives.

1015

Empirical Finance

dlm

Bayesian and Likelihood Analysis of Dynamic Linear Models

Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear state space models, also known as Dynamic Linear Models.

1016

Empirical Finance

Dowd

Functions Ported from ‘MMR2’ Toolbox Offered in Kevin Dowd’s Book Measuring Market Risk

‘Kevin Dowd’s’ book Measuring Market Risk is widely read in the area of risk measurement by students and practitioners alike. As he notes, ‘MATLAB’ might have been the most suitable language when he originally wrote the functions, but with the growing popularity of R that is no longer entirely the case. As ‘Dowd’s’ code was not intended to be error free and was mainly for reference, some functions in this package have inherited those errors. An attempt will be made in future releases to identify and correct them. ‘Dowd’s’ original code can be downloaded from www.kevindowd.org/measuring-market-risk/. It should be noted that ‘Dowd’ offers both ‘MMR2’ and ‘MMR1’ toolboxes; only ‘MMR2’ was ported to R. ‘MMR2’ is the more recent version of the ‘MMR1’ toolbox and the two have mostly similar functions. The toolbox mainly contains parametric and nonparametric methods for the measurement of market risk, as well as backtesting of risk measurement methods.

1017

Empirical Finance

dse

Dynamic Systems Estimation (Time Series Package)

Tools for multivariate, linear, timeinvariant, time series models. This includes ARMA and statespace representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state space model, and statespace model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.

1018

Empirical Finance

dyn

Time Series Regression

Time series regression. The dyn class interfaces ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series, including specifications that may contain lags, diffs and missing values.

1019

Empirical Finance

dynlm

Dynamic Linear Regression

Dynamic linear models and time series regression.

1020

Empirical Finance

ESG

ESG - A package for asset projection

The package presents a “Scenarios” class containing general parameters, risk parameters and projection results. Risk parameters are gathered together into a ParamsScenarios subobject. The general process for using this package is to set all needed parameters in a Scenarios object, use the customPathsGeneration method to proceed to the projection, then use xxx_PriceDistribution() methods to get asset prices.

1021

Empirical Finance

factorstochvol

Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models

Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix.

1022

Empirical Finance

fame

Interface for FAME Time Series Database

Read and write FAME databases.

1023

Empirical Finance

fAssets (core)

Rmetrics - Analysing and Modelling Financial Assets

Provides a collection of functions to manage, to investigate and to analyze data sets of financial assets from different points of view.

1024

Empirical Finance

FatTailsR

Kiener Distributions and Fat Tails in Finance

Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in market finance, neuroscience and other disciplines. Two algorithms to estimate distribution parameters, quantiles, value-at-risk and expected shortfall with high accuracy. Includes power hyperbolas and power hyperbolic functions.

1025

Empirical Finance

fBasics (core)

Rmetrics - Markets and Basic Statistics

Provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. Moreover, there are several utility functions for data handling and management.

1026

Empirical Finance

fBonds (core)

Rmetrics - Pricing and Evaluating Bonds

Implements the Nelson-Siegel and Nelson-Siegel-Svensson term structures.
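The Nelson-Siegel yield curve is a short closed-form expression, sketched here in pure Python (one common parameterization; conventions for the decay parameter vary across implementations, so this is illustrative rather than the package's exact formula):

```python
import math

def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Nelson-Siegel yield at maturity tau:
    level beta0, slope beta1, curvature beta2, decay lam."""
    x = tau / lam
    loading = (1 - math.exp(-x)) / x
    return beta0 + beta1 * loading + beta2 * (loading - math.exp(-x))

# Long yields converge to the level beta0; short yields approach beta0 + beta1.
curve = [nelson_siegel(t, beta0=0.05, beta1=-0.02, beta2=0.01, lam=2.0)
         for t in (0.25, 1, 2, 5, 10, 30)]
```

The Svensson extension adds a second curvature term with its own decay parameter, giving the curve an extra hump.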

1027

Empirical Finance

fCopulae (core)

Rmetrics - Bivariate Dependence Structures with Copulae

Provides a collection of functions to manage, to investigate and to analyze bivariate financial returns by Copulae. Included are the families of Archimedean, Elliptical, Extreme Value, and Empirical Copulae.

1028

Empirical Finance

fExoticOptions (core)

Rmetrics - Pricing and Evaluating Exotic Option

Provides a collection of functions to evaluate barrier options, Asian options, binary options, currency translated options, lookback options, multiple asset options and multiple exercise options.

1029

Empirical Finance

fExtremes (core)

Rmetrics - Modelling Extreme Events in Finance

Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data preprocessing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.

1030

Empirical Finance

fgac

Generalized Archimedean Copula

Bivariate data fitting involves two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm was implemented considering seven families of copulas (Generalized Archimedean Copulas); the best fit can be obtained by examining all copula options (totally positive of order 2 and stochastically increasing models).

1031

Empirical Finance

fGarch (core)

Rmetrics - Autoregressive Conditional Heteroskedastic Modelling

Provides a collection of functions to analyze and model heteroskedastic behavior in financial time series models.

1032

Empirical Finance

fImport (core)

Rmetrics - Importing Economic and Financial Data

Provides a collection of utility functions to download and manage data sets from the Internet or from other sources.

1033

Empirical Finance

financial

Solving financial problems in R

Time value of money, cash flows and other financial functions.

1034

Empirical Finance

FinancialMath

Financial Mathematics for Actuaries

Contains financial math functions and introductory derivative functions included in the Society of Actuaries and Casualty Actuarial Society ‘Financial Mathematics’ exam, and some topics in the ‘Models for Financial Economics’ exam.

1035

Empirical Finance

FinAsym

Classifies implicit trading activity from market quotes and computes the probability of informed trading

This package accomplishes two tasks: (a) it classifies implicit trading activity from quotes in OTC markets using the algorithm of Lee and Ready (1991); (b) based on information about trade initiation, it computes the probability of informed trading of Easley and O’Hara (1987).
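One building block of Lee-Ready classification, the tick test, is simple enough to sketch in pure Python (illustrative only; the full Lee-Ready algorithm also compares trade prices to quote midpoints, which this sketch omits):

```python
def tick_test(prices):
    """Classify trades with the tick test: +1 (buyer-initiated) on an uptick,
    -1 (seller-initiated) on a downtick, and the previous classification
    on a zero tick. The first trade has no prior price and is skipped."""
    signs, last = [], 0
    for prev, cur in zip(prices, prices[1:]):
        if cur > prev:
            last = 1
        elif cur < prev:
            last = -1
        signs.append(last)  # a zero tick keeps the prior sign
    return signs

print(tick_test([10.00, 10.01, 10.01, 10.00]))  # [1, 1, -1]
```

The resulting trade-direction indicators are exactly the trade-initiation input that PIN-style models such as Easley and O'Hara's consume.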

1036

Empirical Finance

finreportr

Financial Data from U.S. Securities and Exchange Commission

Download and display company financial data from the U.S. Securities and Exchange Commission’s EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See https://www.sec.gov/edgar/searchedgar/companysearch.html for more information.

1037

Empirical Finance

fmdates

Financial Market Date Calculations

Implements common date calculations relevant for specifying the economic nature of financial market contracts that are typically defined by International Swap Dealer Association (ISDA, http://www2.isda.org) legal documentation. This includes methods to check whether dates are business days in certain locales, functions to adjust and shift dates and time length (or day counter) calculations.

1038

Empirical Finance

fMultivar (core)

Rmetrics - Analysing and Modeling Multivariate Financial Return Distributions

Provides a collection of functions to manage, to investigate and to analyze bivariate and multivariate data sets of financial returns.

1039

Empirical Finance

fNonlinear (core)

Rmetrics - Nonlinear and Chaotic Time Series Modelling

Provides a collection of functions for testing various aspects of univariate time series including independence and neglected nonlinearities. Further provides functions to investigate the chaotic behavior of time series processes and to simulate different types of chaotic time series maps.

1040

Empirical Finance

fOptions (core)

Rmetrics - Pricing and Evaluating Basic Options

Provides a collection of functions to value basic options. This includes the generalized Black-Scholes option, options on futures and options on commodity futures.
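The Black-Scholes formula at the core of such valuation functions fits in a few lines of pure Python (a standard textbook sketch, not the package's code), using the error function for the normal CDF:

```python
import math

def black_scholes_call(S, K, r, sigma, T):
    """Black-Scholes price of a European call, using erf for the normal CDF."""
    N = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

price = black_scholes_call(S=100, K=100, r=0.0, sigma=0.2, T=1.0)
print(round(price, 2))  # 7.97
```

The "generalized" version mentioned above adds a cost-of-carry parameter, which covers options on futures and commodities as special cases.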

1041

Empirical Finance

forecast

Forecasting Functions for Time Series and Linear Models

Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.

1042

Empirical Finance

fPortfolio (core)

Rmetrics  Portfolio Selection and Optimization

Provides a collection of functions to optimize portfolios and to analyze them from different points of view.

1043

Empirical Finance

fracdiff

Fractionally differenced ARIMA aka ARFIMA(p,d,q) models

Maximum likelihood estimation of the parameters of a fractionally differenced ARIMA(p,d,q) model (Haslett and Raftery, Appl. Statistics, 1989).

1044

Empirical Finance

fractal

Fractal Time Series Modeling and Analysis

Stochastic fractal and deterministic chaotic time series analysis.

1045

Empirical Finance

FRAPO

Financial Risk Modelling and Portfolio Optimisation with R

Accompanying package of the book ‘Financial Risk Modelling and Portfolio Optimisation with R’, second edition. The data sets used in the book are contained in this package.

1046

Empirical Finance

fRegression (core)

Rmetrics - Regression Based Decision and Prediction

A collection of functions for linear and nonlinear regression modelling. It implements a wrapper for several regression models available in the base and contributed packages of R.

1047

Empirical Finance

frmqa

The Generalized Hyperbolic Distribution, Related Distributions and Their Applications in Finance

A collection of R and C++ functions to work with the generalized hyperbolic distribution, related distributions and their applications in financial risk management and quantitative analysis.

1048

Empirical Finance

fTrading (core)

Rmetrics - Trading and Rebalancing Financial Instruments

A collection of functions for trading and rebalancing financial instruments. It implements various technical indicators to analyse time series such as moving averages or stochastic oscillators.

1049

Empirical Finance

GCPM

Generalized Credit Portfolio Model

Analyze the default risk of credit portfolios. Commonly known models, like CreditRisk+ or the CreditMetrics model, are implemented in their very basic settings. The portfolio loss distribution can be obtained either by simulation or analytically in the case of the classic CreditRisk+ model. The models only capture losses caused by defaults, i.e. migration risk is not included. The package structure is kept flexible, especially with respect to distributional assumptions, in order to quantify the sensitivity of risk figures to those assumptions. The package can therefore be used both to determine the credit risk of a given portfolio and to quantify model sensitivities.
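A minimal default-mode loss simulation conveys the idea (a pure-Python sketch with independent Bernoulli defaults and made-up portfolio numbers; real models such as CreditRisk+ add default correlation via sector factors):

```python
import random
import statistics

def simulate_loss_dist(pd, ead, lgd, n_sims=20000, seed=3):
    """Monte Carlo loss distribution for independent Bernoulli defaults
    (default risk only: migration risk is ignored, as in a basic
    default-mode model). pd/ead/lgd are per-obligor lists."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        loss = sum(e * l for p, e, l in zip(pd, ead, lgd) if rng.random() < p)
        losses.append(loss)
    return losses

pd  = [0.01, 0.02, 0.05]     # default probabilities (illustrative)
ead = [100.0, 50.0, 25.0]    # exposures at default
lgd = [0.6, 0.6, 0.4]        # losses given default
losses = simulate_loss_dist(pd, ead, lgd)
expected = sum(p * e * l for p, e, l in zip(pd, ead, lgd))  # analytic EL = 1.7
```

Risk figures such as value-at-risk then fall out as quantiles of the simulated loss distribution, and the sensitivity analyses the package emphasizes amount to rerunning this with perturbed distributional assumptions.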

1050

Empirical Finance

GetHFData

Download and Aggregate High Frequency Trading Data from Bovespa

Downloads and aggregates high frequency trading data for Brazilian instruments directly from the Bovespa ftp site ftp://ftp.bmf.com.br/MarketData/.

1051

Empirical Finance

gets

General-to-Specific (GETS) Modelling and Indicator Saturation Methods

Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean.

1052

Empirical Finance

GetTDData

Get Data for Brazilian Bonds (Tesouro Direto)

Downloads and aggregates data for Brazilian government-issued bonds directly from the website of Tesouro Direto http://www.tesouro.fazenda.gov.br/tesouro-direto-balanco-e-estatisticas.

1053

Empirical Finance

GEVStableGarch

ARMA-GARCH/APARCH Models with GEV and Stable Distributions

Package for simulation and estimation of ARMA-GARCH/APARCH models with GEV and stable distributions.

1054

Empirical Finance

ghyp

A Package on Generalized Hyperbolic Distribution and Its Special Cases

Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). In particular, it contains fitting procedures, an AIC-based model selection routine, functions for computing the density, quantiles, probabilities, random variates and expected shortfall, as well as portfolio optimization and plotting routines and a likelihood-ratio test. In addition, it contains the Generalized Inverse Gaussian distribution.

1055

Empirical Finance

gmm

Generalized Method of Moments and Generalized Empirical Likelihood

A complete suite to estimate models based on moment conditions. It includes the two-step Generalized Method of Moments (Hansen 1982; doi:10.2307/1912775), the iterated GMM and continuously updated estimator (Hansen, Heaton and Yaron 1996; doi:10.2307/1392442), and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; doi:10.1111/j.0013-0133.1997.174.x, Kitamura 1997; doi:10.1214/aos/1069362388, Newey and Smith 2004; doi:10.1111/j.1468-0262.2004.00482.x, and Anatolyev 2005; doi:10.1111/j.1468-0262.2005.00601.x).

1056

Empirical Finance

gogarch

Generalized Orthogonal GARCH (GO-GARCH) models

Implementation of the GO-GARCH model class.

1057

Empirical Finance

GUIDE

GUI for DErivatives in R

A nice GUI for financial DErivatives in R.

1058

Empirical Finance

highfrequency

Tools for High-frequency Data Analysis

Provides functionality to manage, clean and match high-frequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, and investigate microstructure noise and intraday periodicity.

1059

Empirical Finance

IBrokers

R API to Interactive Brokers Trader Workstation

Provides native R access to the Interactive Brokers Trader Workstation API.

1060

Empirical Finance

InfoTrad

Calculates the Probability of Informed Trading (PIN)

Estimates the probability of informed trading (PIN) initially introduced by Easley et al. (1996) doi:10.1111/j.1540-6261.1996.tb04074.x. The contribution of the package is that it uses the likelihood factorizations of Easley et al. (2010) doi:10.1017/S0022109010000074 (EHO factorization) and Lin and Ke (2011) doi:10.1016/j.finmar.2011.03.001 (LK factorization). Moreover, the package uses different estimation algorithms: specifically, the grid-search algorithm proposed by Yan and Zhang (2012) doi:10.1016/j.jbankfin.2011.08.003, and the hierarchical agglomerative clustering approach proposed by Gan et al. (2015) doi:10.1080/14697688.2015.1023336 and later extended by Ersan and Alici (2016) doi:10.1016/j.intfin.2016.04.001.

1061

Empirical Finance

lgarch

Simulation and Estimation of Log-GARCH Models

Simulation and estimation of univariate and multivariate log-GARCH models. The main functions of the package are: lgarchSim(), mlgarchSim(), lgarch() and mlgarch(). The first two functions simulate from a univariate and a multivariate log-GARCH model, respectively, whereas the latter two estimate a univariate and a multivariate log-GARCH model, respectively.

1062

Empirical Finance

lifecontingencies

Financial and Actuarial Mathematics for Life Contingencies

Classes and methods that allow the user to manage life tables and actuarial tables (including multiple-decrement tables). Moreover, functions to easily perform demographic, financial and actuarial mathematics calculations on life contingency insurances are included.

1063

Empirical Finance

lmtest

Testing Linear Regression Models

A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.

1064

Empirical Finance

longmemo

Statistics for Long-Memory Processes (Jan Beran) - Data and Functions

Datasets and functionality from the textbook: Jan Beran (1994), Statistics for Long-Memory Processes, Chapman & Hall.

1065

Empirical Finance

LSMonteCarlo

American options pricing with Least Squares Monte Carlo method

Provides functions for calculating prices of American put options with the Least Squares Monte Carlo method. The option types are plain vanilla American put, Asian American put, and Quanto American put. The pricing algorithms include variance reduction techniques such as Antithetic Variates and Control Variates. Additional functions are given to derive “price surfaces” at different volatilities and strikes, create 3D plots, quickly generate Geometric Brownian motion, and calculate prices of European options with the Black & Scholes analytical solution.

1066

Empirical Finance

maRketSim

Market simulator for R

maRketSim is a market simulator for R. It was initially designed around the bond market, with plans to expand to stocks. maRketSim is built around the idea of portfolios of fundamental objects. Therefore it is slow in its current incarnation, but allows you the flexibility of seeing exactly what is in your final results, since the objects are retained.

1067

Empirical Finance

markovchain

Easy Handling of Discrete-Time Markov Chains

Functions and S4 methods to create and manage discrete-time Markov chains more easily. In addition, functions to perform statistical (fitting and drawing random variates) and probabilistic (analysis of structural properties) analysis are provided.

1068

Empirical Finance

MarkowitzR

Statistical Significance of the Markowitz Portfolio

A collection of tools for analyzing significance of Markowitz portfolios.

1069

Empirical Finance

matchingMarkets

Analysis of Stable Matchings

Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.

1070

Empirical Finance

MSBVAR

Markov-Switching, Bayesian, Vector Autoregression Models

Provides methods for estimating frequentist and Bayesian Vector Autoregression (VAR) models and Markov-switching Bayesian VARs (MSBVAR). Functions for reduced-form and structural VAR models are also available. Includes methods for generating posterior inferences for these models, forecasts, impulse responses (using likelihood-based error bands), and forecast error decompositions. Also includes utility functions for plotting forecasts and impulse responses, and for generating draws from Wishart and singular multivariate normal densities. The current version includes functionality to build and evaluate models with Markov switching.

1071

Empirical Finance

MSGARCH

Markov-Switching GARCH Models

Fit (by Maximum Likelihood or MCMC/Bayesian), simulate, and forecast various Markov-Switching GARCH models as described in Ardia et al. (2017) https://ssrn.com/abstract=2845809.

1072

Empirical Finance

mvtnorm

Multivariate Normal and t Distributions

Computes multivariate normal and t probabilities, quantiles, random deviates and densities.
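As a minimal sketch of the documented 'mvtnorm' API (the 0.5 correlation is illustrative): for a bivariate normal with correlation 0.5, pmvnorm() gives the orthant probability P(X1 < 0, X2 < 0), which equals 1/4 + arcsin(0.5)/(2*pi) = 1/3 in closed form.

```r
library(mvtnorm)

# Covariance (here correlation) matrix with rho = 0.5.
sigma <- matrix(c(1, 0.5,
                  0.5, 1), nrow = 2)

# P(X1 < 0, X2 < 0); approximately 1/3 for rho = 0.5.
p <- pmvnorm(lower = c(-Inf, -Inf), upper = c(0, 0), sigma = sigma)

# Random deviates and a density evaluation use the same sigma.
x <- rmvnorm(1000, mean = c(0, 0), sigma = sigma)
d <- dmvnorm(c(0, 0), sigma = sigma)
```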

1073

Empirical Finance

NetworkRiskMeasures

Risk Measures for (Financial) Networks

Implements some risk measures for (financial) networks, such as DebtRank, Impact Susceptibility, Impact Diffusion and Impact Fluidity.

1074

Empirical Finance

nlme

Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixedeffects models.

1075

Empirical Finance

NMOF

Numerical Methods and Optimization in Finance

Functions, examples and data from the book “Numerical Methods and Optimization in Finance” by M. ‘Gilli’, D. ‘Maringer’ and E. Schumann (2011), ISBN 978-0123756626. The package provides implementations of several optimisation heuristics, such as Differential Evolution, Genetic Algorithms and Threshold Accepting. There are also functions for the valuation of financial instruments, such as bonds and options, and functions that help with stochastic simulations.

1076

Empirical Finance

obAnalytics

Limit Order Book Analytics

Data processing, visualisation and analysis of Limit Order Book event data.

1077

Empirical Finance

opefimor

Option Pricing and Estimation of Financial Models in R

Companion package to the book Option Pricing and Estimation of Financial Models in R, Wiley, Chichester. ISBN: 978-0470745847.

1078

Empirical Finance

OptHedging

Estimation of value and hedging strategy of call and put options

Estimation of value and hedging strategy of call and put options, based on optimal hedging and the Monte Carlo method, from Chapter 3 of ‘Statistical Methods for Financial Engineering’, by Bruno Remillard, CRC Press (2013).

1079

Empirical Finance

OptionPricing

Option Pricing with Efficient Simulation Algorithms

Efficient Monte Carlo Algorithms for the price and the sensitivities of Asian and European Options under Geometric Brownian Motion.

1080

Empirical Finance

pa

Performance Attribution for Equity Portfolios

A package that provides tools for conducting performance attribution for equity portfolios. The package uses two methods: the Brinson method and a regression-based analysis.

1081

Empirical Finance

parma

Portfolio Allocation and Risk Management Applications

Provision of a set of models and methods for use in the allocation and management of capital in financial portfolios.

1082

Empirical Finance

pbo

Probability of Backtest Overfitting

Following the method of Bailey et al., computes for a collection of candidate models the probability of backtest overfitting, the performance degradation and probability of loss, and the stochastic dominance.

1083

Empirical Finance

PerformanceAnalytics (core)

Econometric tools for performance and risk analysis

Collection of econometric functions for performance and risk analysis. This package aims to aid practitioners and researchers in utilizing the latest research in analysis of non-normal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.

1084

Empirical Finance

pinbasic

Fast and Stable Estimation of the Probability of Informed Trading (PIN)

Utilities for fast and stable estimation of the probability of informed trading (PIN) in the model introduced by Easley et al. (2002) doi:10.1111/1540-6261.00493 are implemented. Since the basic model developed by Easley et al. (1996) doi:10.1111/j.1540-6261.1996.tb04074.x is nested in the former due to equating the intensity of uninformed buys and sells, the functions can also be applied to this simpler model structure, if needed. State-of-the-art factorization of the model likelihood function as well as the most recent algorithms for generating initial values for optimization routines are implemented. In total, two likelihood factorizations and three methodologies for starting values are included. Furthermore, functions for simulating datasets of daily aggregated buys and sells, calculating confidence intervals for the probability of informed trading, and computing posterior probabilities of trading days’ conditions are available.

1085

Empirical Finance

portfolio

Analysing equity portfolios

Classes for analysing and implementing equity portfolios.

1086

Empirical Finance

PortfolioEffectHFT

High Frequency Portfolio Analytics by PortfolioEffect

R interface to the PortfolioEffect cloud service for backtesting high frequency trading (HFT) strategies, intraday portfolio analysis and optimization. Includes an auto-calibrating model pipeline for market microstructure noise, risk factors, price jumps/outliers, tail risk (high-order moments) and price fractality (long memory). Constructed portfolios can use client-side market data or access HF intraday price history for all major US Equities. See https://www.portfolioeffect.com/ for more information on the PortfolioEffect high frequency portfolio analytics platform.

1087

Empirical Finance

PortfolioOptim

Small/Large Sample Portfolio Optimization

Two functions for financial portfolio optimization by linear programming are provided. One function implements the Benders decomposition algorithm and can be used for very large data sets. The other, applicable to moderate sample sizes, finds the optimal portfolio with the smallest distance to a given benchmark portfolio.

1088

Empirical Finance

portfolioSim

Framework for simulating equity portfolio strategies

Classes that serve as a framework for designing equity portfolio simulations.

1089

Empirical Finance

PortRisk

Portfolio Risk Analysis

Risk Attribution of a portfolio with Volatility Risk Analysis.

1090

Empirical Finance

quantmod

Quantitative Financial Modelling Framework

Specify, build, trade, and analyse quantitative financial trading strategies.
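A typical quantmod session, sketched with an illustrative ticker and data source (getSymbols() requires internet access, so results depend on the provider being reachable):

```r
library(quantmod)

# Download daily OHLC data; symbol and source are illustrative.
getSymbols("AAPL", src = "yahoo")

# Chart the series with a 20-day simple moving average overlay.
chartSeries(AAPL, TA = "addSMA(20)")

# Daily returns computed from the adjusted close column.
ret <- dailyReturn(Ad(AAPL))
```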

1091

Empirical Finance

QuantTools

Enhanced Quantitative Trading Modelling

Download and organize historical market data from multiple sources like Yahoo (https://finance.yahoo.com), Google (https://www.google.com/finance), Finam (https://www.finam.ru/profile/moex-akcii/sberbank/export/), MOEX (https://www.moex.com/en/derivatives/contracts.aspx) and IQFeed (https://www.iqfeed.net/symbolguide/index.cfm?symbolguide=lookup). Code your trading algorithms in modern C++11 with a powerful event-driven tick processing API, including trading costs and exchange communication latency, and transform detailed data seamlessly into R. In just a few lines of code you will be able to visualize every step of your trading model, from tick data to multi-dimensional heat maps.

1092

Empirical Finance

ragtop

Pricing Equity Derivatives with Extensions of Black-Scholes

Algorithms to price American and European equity options, convertible bonds and a variety of other financial derivatives. It uses an extension of the usual Black-Scholes model in which jump to default may occur at a probability specified by a power-law link between stock price and hazard rate as found in the paper by Takahashi, Kobayashi, and Nakagawa (2001) doi:10.3905/jfi.2001.319302. We use ideas and techniques from Andersen and Buffum (2002) doi:10.2139/ssrn.355308 and Linetsky (2006) doi:10.1111/j.1467-9965.2006.00271.x.

1093

Empirical Finance

Rbitcoin

R & bitcoin integration

Utilities related to Bitcoin. Unified markets API interface (bitstamp, kraken, btce, bitmarket). Both public and private API calls. Integration of data structures for all markets. Supports SSL. Read the Rbitcoin documentation (command: ?btc) for more information.

1094

Empirical Finance

Rblpapi

R Interface to ‘Bloomberg’

An R Interface to ‘Bloomberg’ is provided via the ‘Blp API’.

1095

Empirical Finance

Rcmdr

R Commander

A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.

1096

Empirical Finance

RcppQuantuccia

R Bindings to the ‘Quantuccia’ Header-Only Essentials of ‘QuantLib’

‘QuantLib’ bindings are provided for R using ‘Rcpp’ and the header-only ‘Quantuccia’ variant (put together by Peter Caspers) offering an essential subset of ‘QuantLib’. See the included file ‘AUTHORS’ for a full list of contributors to both ‘QuantLib’ and ‘Quantuccia’.

1097

Empirical Finance

restimizeapi

Functions for Working with the ‘www.estimize.com’ Web Services

Provides the user with functions to develop their trading strategy, uncover actionable trading ideas, and monitor consensus shifts with crowdsourced earnings and economic estimate data directly from <www.estimize.com>. Further information regarding the web services this package invokes can be found at <www.estimize.com/api>.

1098

Empirical Finance

riskSimul

Risk Quantification for Stock Portfolios under the t-Copula Model

Implements efficient simulation procedures to estimate tail loss probabilities and conditional excess for a stock portfolio. The log-returns are assumed to follow a t-copula model with generalized hyperbolic or t marginals.

1099

Empirical Finance

rmgarch

Multivariate GARCH Models

Feasible multivariate GARCH models including DCC, GO-GARCH and Copula-GARCH.

1100

Empirical Finance

RND

Risk Neutral Density Extraction Package

Extract the implied risk neutral density from options using various methods.

1101

Empirical Finance

rpatrec

Recognising Visual Charting Patterns in Time Series Data

Generating visual charting patterns and noise, smoothing to find a signal in noisy time series, and enabling users to apply their findings to real-life data.

1102

Empirical Finance

rpgm

Fast Simulation of Normal/Exponential Random Variables and Stochastic Differential Equations / Poisson Processes

Faster simulation of some random variables than the usual native functions, including rnorm() and rexp(), using the Ziggurat method (Marsaglia and Tsang, 2000; doi:10.18637/jss.v005.i08), and fast simulation of stochastic differential equations and Poisson processes.

1103

Empirical Finance

RQuantLib

R Interface to the ‘QuantLib’ Library

The ‘RQuantLib’ package makes parts of ‘QuantLib’ accessible from R. The ‘QuantLib’ project aims to provide a comprehensive software framework for quantitative finance. The goal is to provide a standard open source library for quantitative analysis, modeling, trading, and risk management of financial assets.

1104

Empirical Finance

rugarch (core)

Univariate GARCH Models

ARFIMA, in-mean, external regressors and various GARCH flavors, with methods for fit, forecast, simulation, inference and plotting.
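A minimal rugarch workflow, sketched with placeholder data (the simulated 'returns' vector stands in for real return series; specification choices are illustrative):

```r
library(rugarch)

# ARMA(1,1) mean equation with a standard GARCH(1,1) variance equation.
spec <- ugarchspec(variance.model = list(model = "sGARCH",
                                         garchOrder = c(1, 1)),
                   mean.model = list(armaOrder = c(1, 1)))

returns <- rnorm(500) * 0.01  # placeholder return data for illustration
fit <- ugarchfit(spec, data = returns)

# Ten-step-ahead volatility forecast.
fc <- ugarchforecast(fit, n.ahead = 10)
sigma(fc)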

1105

Empirical Finance

rwt

Rice Wavelet Toolbox wrapper

Provides a set of functions for performing digital signal processing.

1106

Empirical Finance

sandwich

Robust Covariance Matrix Estimators

Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
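A short sketch of the usual pattern, combining sandwich with lmtest (also in this task view) on the built-in 'cars' data; the HC type chosen is illustrative:

```r
library(sandwich)
library(lmtest)  # supplies coeftest()

# OLS fit, then inference with heteroskedasticity-consistent (HC)
# standard errors rather than the classical OLS covariance.
m <- lm(dist ~ speed, data = cars)
coeftest(m, vcov = vcovHC(m, type = "HC1"))

# Newey-West (HAC) covariance, appropriate for time-series regressions.
coeftest(m, vcov = NeweyWest(m))
```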

1107

Empirical Finance

sde

Simulation and Inference for Stochastic Differential Equations

Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 978-0387758381, Springer, NY.

1108

Empirical Finance

SharpeR

Statistical Significance of the Sharpe Ratio

A collection of tools for analyzing significance of trading strategies, based on the Sharpe ratio and overfit of the same.

1109

Empirical Finance

sharpeRratio

Moment-Free Estimation of Sharpe Ratios

An efficient moment-free estimator of the Sharpe ratio, or signal-to-noise ratio, for heavy-tailed data (see https://arxiv.org/abs/1505.01333).

1110

Empirical Finance

Sim.DiffProc

Simulation of Diffusion Processes

A package for symbolic and numerical computations on scalar and multivariate systems of stochastic differential equations. It provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of these systems in both the Itô and Stratonovich forms. Statistical analysis of SDEs is supported via parallel Monte Carlo and moment equations methods. The package has enabled researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulation of the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996).

1111

Empirical Finance

SmithWilsonYieldCurve

Smith-Wilson Yield Curve Construction

Constructs a yield curve by the Smith-Wilson method from a table of LIBOR and swap rates.

1112

Empirical Finance

stochvol

Efficient Bayesian Inference for Stochastic Volatility (SV) Models

Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods.

1113

Empirical Finance

strucchange

Testing, Monitoring, and Dating Structural Changes

Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.

1114

Empirical Finance

TAQMNGR

Manage Tick-by-Tick Transaction Data

Manager of tick-by-tick transaction data that performs ‘cleaning’, ‘aggregation’ and ‘import’ in an efficient and fast way. The package engine, written in C++, exploits the ‘zlib’ and ‘gzstream’ libraries to handle gzipped data without the need to uncompress them. ‘Cleaning’ and ‘aggregation’ are performed according to Brownlees and Gallo (2006) doi:10.1016/j.csda.2006.09.030. Currently, TAQMNGR processes raw data from WRDS (Wharton Research Data Service, https://wrds-web.wharton.upenn.edu/wrds/).

1115

Empirical Finance

tawny

Clean Covariance Matrices Using Random Matrix Theory and Shrinkage Estimators for Portfolio Optimization

Portfolio optimization typically requires an estimate of a covariance matrix of asset returns. There are many approaches for constructing such a covariance matrix, some using the sample covariance matrix as a starting point. This package provides implementations for two such methods: random matrix theory and shrinkage estimation. Each method attempts to clean or remove noise related to the sampling process from the sample covariance matrix.

1116

Empirical Finance

termstrc

Zero-coupon Yield Curve Estimation

The package offers a wide range of functions for term structure estimation based on static and dynamic coupon bond and yield data sets. The implementation focuses on the cubic splines approach of McCulloch (1971, 1975) and the Nelson and Siegel (1987) method with extensions by Svensson (1994), Diebold and Li (2006) and De Pooter (2007). We propose a weighted constrained optimization procedure with analytical gradients and a globally optimal start parameter search algorithm. Extensive summary statistics and plots are provided to compare the results of the different estimation methods. Several demos are available using data from European government bonds and yields.

1117

Empirical Finance

TFX

R API to TrueFX(tm)

Connects R to TrueFX(tm) for free streaming real-time and historical tick-by-tick market data for dealable interbank foreign exchange rates with millisecond detail.

1118

Empirical Finance

tidyquant

Tidy Quantitative Financial Analysis

Bringing financial analysis to the ‘tidyverse’. The ‘tidyquant’ package provides a convenient wrapper to various ‘xts’, ‘zoo’, ‘quantmod’, ‘TTR’ and ‘PerformanceAnalytics’ package functions and returns the objects in the tidy ‘tibble’ format. The main advantage is being able to use quantitative functions with the ‘tidyverse’ functions including ‘purrr’, ‘dplyr’, ‘tidyr’, ‘ggplot2’, ‘lubridate’, etc. See the ‘tidyquant’ website for more information, documentation and examples.

1119

Empirical Finance

timeDate (core)

Rmetrics - Chronological and Calendar Objects

The ‘timeDate’ class fulfils the conventions of the ISO 8601 standard as well as of the ANSI C and POSIX standards. Beyond these standards it provides the “Financial Center” concept, which allows one to handle data records collected in different time zones and to combine them so that the time stamps are always correct with respect to your personal financial center, or alternatively to the GMT reference time. It can thus also handle time stamps from historical data records from the same time zone, even if the financial centers changed daylight saving time at different calendar dates.

1120

Empirical Finance

timeSeries (core)

Rmetrics - Financial Time Series Objects

Provides a class and various tools for financial time series. This includes basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.

1121

Empirical Finance

timsac

Time Series Analysis and Control Package

Functions for statistical analysis, prediction and control of time series.

1122

Empirical Finance

tis

Time Indexes and Time Indexed Series

Functions and S3 classes for time indexes and time indexed series, which are compatible with FAME frequencies.

1123

Empirical Finance

TSdbi

Time Series Database Interface

Provides a common interface to time series databases. The objective is to define a standard interface so users can retrieve time series data from various sources with a simple, common, set of commands, and so programs can be written to be portable with respect to the data source. The SQL implementations also provide a database table design, so users needing to set up a time series database have a reasonably complete way to do this easily. The interface provides for a variety of options with respect to the representation of time series in R. The interface, and the SQL implementations, also handle vintages of time series data (sometimes called editions or real-time data). There is also a (not yet well tested) mechanism to handle multilingual data documentation. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.

1124

Empirical Finance

tsDyn

Nonlinear Time Series Models with Regime Switching

Implements nonlinear autoregressive (AR) time series models. For univariate series, a nonparametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).

1125

Empirical Finance

tseries (core)

Time Series Analysis and Computational Finance

Time series analysis and computational finance.

1126

Empirical Finance

tseriesChaos

Analysis of nonlinear time series

Routines for the analysis of nonlinear time series. This work is largely inspired by the TISEAN project, by Rainer Hegger, Holger Kantz and Thomas Schreiber: http://www.mpipks-dresden.mpg.de/~tisean/

1127

Empirical Finance

tsfa

Time Series Factor Analysis

Extraction of Factors from Multivariate Time Series. See ?00tsfaIntro for more details.

1128

Empirical Finance

TTR

Technical Trading Rules

Functions and data to construct technical trading rules with R.

1129

Empirical Finance

tvm

Time Value of Money Functions

Functions for managing cashflows and interest rate curves.

1130

Empirical Finance

urca (core)

Unit Root and Cointegration Tests for Time Series Data

Unit root and cointegration tests encountered in applied econometric analysis are implemented.

1131

Empirical Finance

vars

VAR Modelling

Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.
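The vars workflow in miniature, using the Canada macro dataset shipped with the package (variable names "e" and "prod" are columns of that dataset; lag choices are illustrative):

```r
library(vars)
data(Canada)  # example macroeconomic data shipped with the package

# Choose the lag order by information criteria, then fit the VAR.
sel <- VARselect(Canada, lag.max = 4, type = "const")
fit <- VAR(Canada, p = sel$selection["AIC(n)"], type = "const")

# Impulse response of productivity to an employment shock,
# plus the forecast error variance decomposition.
ir <- irf(fit, impulse = "e", response = "prod", n.ahead = 10)
fevd(fit, n.ahead = 10)
```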

1132

Empirical Finance

VarSwapPrice

Pricing a variance swap on an equity index

Computes a portfolio of European options that replicates the cost of capturing the realised variance of an equity index.

1133

Empirical Finance

vrtest

Variance Ratio tests and other tests for the Martingale Difference Hypothesis

A collection of statistical tests for the martingale difference hypothesis.

1134

Empirical Finance

wavelets

A package of functions for computing wavelet filters, wavelet transforms and multiresolution analyses

This package contains functions for computing and plotting discrete wavelet transforms (DWT) and maximal overlap discrete wavelet transforms (MODWT), as well as their inverses. Additionally, it contains functionality for computing and plotting wavelet transform filters that are used in the above decompositions as well as multiresolution analyses.

1135

Empirical Finance

waveslim

Basic wavelet routines for one-, two- and three-dimensional signal processing

Basic wavelet routines for time series (1D), image (2D) and array (3D) analysis. The code provided here is based on wavelet methodology developed in Percival and Walden (2000); Gencay, Selcuk and Whitcher (2001); the dual-tree complex wavelet transform (DTCWT) from Kingsbury (1999, 2001) as implemented by Selesnick; and Hilbert wavelet pairs (Selesnick 2001, 2002). All figures in chapters 4-7 of GSW (2001) are reproducible using this package and R code available at the book website(s) below.

1136

Empirical Finance

wavethresh

Wavelets Statistics and Transforms

Performs 1, 2 and 3D real and complex-valued wavelet transforms, non-decimated transforms, wavelet packet transforms, non-decimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, non-stationary multiscale transfer function modeling, and density estimation.

1137

Empirical Finance

XBRL

Extraction of Business Financial Information from ‘XBRL’ Documents

Functions to extract business financial information from an Extensible Business Reporting Language (‘XBRL’) instance file and the associated collection of files that defines its ‘Discoverable’ Taxonomy Set (‘DTS’).

1138

Empirical Finance

xts (core)

eXtensible Time Series

Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user-level customization and extension, while simplifying cross-class interoperability.

1139

Empirical Finance

ycinterextra

Yield curve or zero-coupon prices interpolation and extrapolation

Yield curve or zero-coupon prices interpolation and extrapolation using the Nelson-Siegel, Svensson, and Smith-Wilson models, and Hermite cubic splines.

1140

Empirical Finance

YieldCurve

Modelling and estimation of the yield curve

Modelling the yield curve with some parametric models. The models implemented are: Nelson-Siegel, Diebold-Li and Svensson. The package also includes term structure of interest rates data from the Federal Reserve Bank and the European Central Bank.

1141

Empirical Finance

Zelig

Everyone’s Statistical Software

A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical workflow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and reweighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.

1142

Empirical Finance

zoo (core)

S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)

An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
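A brief sketch of the class design (illustrative values): the index can be any totally ordered class, and methods such as merge() and window() extend the corresponding base R generics.

```r
library(zoo)

# Irregular series: observations carry their own ordered index
z1 <- zoo(c(1.1, 2.5, 3.0),
          as.Date(c("2024-01-01", "2024-01-04", "2024-01-07")))
z2 <- zoo(c(10, 20),
          as.Date(c("2024-01-04", "2024-01-07")))

merge(z1, z2)                              # union of indexes, NA where missing
window(z1, start = as.Date("2024-01-04"))  # subset by index range
```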

1143

Functional Data Analysis

classiFunc

Classification of Functional Data

Efficient implementation of a k-nearest neighbor estimator and a kernel estimator for functional data classification.

1144

Functional Data Analysis

covsep

Tests for Determining if the Covariance Structure of 2-Dimensional Data is Separable

Functions for testing if the covariance structure of 2-dimensional data (e.g. samples of surfaces X_i = X_i(s,t)) is separable, i.e. if covariance(X) = C_1 x C_2. A complete description of the implemented tests can be found in the paper arXiv:1505.02023.

1145

Functional Data Analysis

dbstats

Distance-Based Statistics

Prediction methods where explanatory information is coded as a matrix of distances between individuals. Distances can either be directly input as a distance matrix, a squared distance matrix, an inner-products matrix, or computed from observed predictors.

1146

Functional Data Analysis

denseFLMM

Functional Linear Mixed Models for Densely Sampled Data

Estimation of functional linear mixed models for densely sampled data based on functional principal component analysis.

1147

Functional Data Analysis

fda (core)

Functional Data Analysis

These functions were developed to support functional data analysis as described in Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer. They were ported from earlier versions in Matlab and S-PLUS. An introduction appears in Ramsay, J. O., Hooker, Giles, and Graves, Spencer (2009) Functional Data Analysis with R and Matlab (Springer). The package includes data sets and script files working many examples including all but one of the 76 figures in this latter book. Matlab versions of the code and sample analyses are no longer distributed through CRAN, as they were when the book was published. For those, see http://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/, where you will find a set of .zip files containing the functions and sample analyses, as well as two .txt files giving instructions for installation and some additional information. The changes from Version 2.4.1 are fixes of bugs in density.fd and removal of functions create.polynomial.basis, polynompen, and polynomial. These were deleted because the monomial basis does the same thing and because there were errors in the code.

1148

Functional Data Analysis

fda.usc (core)

Functional Data Analysis and Utilities for Statistical Computing

Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.

1149

Functional Data Analysis

fdakma

Functional Data Analysis: K-Mean Alignment

Performs simultaneous clustering and alignment of a unidimensional or multidimensional functional dataset by means of k-mean alignment.

1150

Functional Data Analysis

fdapace (core)

Functional Data Analysis and Empirical Dynamics

Provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm or numerical integration. PACE is useful for the analysis of data that have been generated by a sample of underlying (but usually not fully observed) random trajectories. It does not rely on pre-smoothing of trajectories, which is problematic if functional data are sparsely sampled. PACE provides options for functional regression and correlation, for Longitudinal Data Analysis, the analysis of stochastic processes from samples of realized trajectories, and for the analysis of underlying dynamics. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.

1151

Functional Data Analysis

fdaPDE

Functional Data Analysis and Partial Differential Equations; Statistical Analysis of Functional and Spatial Data, Based on Regression with Partial Differential Regularizations

An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization.

1152

Functional Data Analysis

fdasrvf (core)

Elastic Functional Data Analysis

Performs alignment, PCA, and modeling of multidimensional and unidimensional functions using the square-root velocity framework (Srivastava et al., 2011 <arXiv:1103.3817> and Tucker et al., 2014 doi:10.1016/j.csda.2012.12.001). This framework allows for elastic analysis of functional data through phase and amplitude separation.

1153

Functional Data Analysis

fdatest

Interval Testing Procedure for Functional Data

Implementation of the Interval Testing Procedure for functional data in different frameworks (i.e., one- or two-population frameworks, functional linear models) by means of different basis expansions (i.e., B-spline, Fourier, and phase-amplitude Fourier). The current version of the package requires functional data evaluated on a uniform grid; it automatically projects each function on a chosen functional basis; it performs the entire family of multivariate tests; and, finally, it provides the matrix of the p-values of the previous tests and the vector of the corrected p-values. The functional basis, the coupled or uncoupled scenario, and the kind of test can be chosen by the user. The package also provides a plotting function creating a graphical output of the procedure: the p-value heatmap, the plot of the corrected p-values, and the plot of the functional data.

1154

Functional Data Analysis

FDboost (core)

Boosting Functional Regression Models

Regression models for functional data, i.e., scalar-on-function, function-on-scalar and function-on-function regression models, are fitted by a component-wise gradient boosting algorithm.

1155

Functional Data Analysis

fdcov

Analysis of Covariance Operators

Provides a variety of tools for the analysis of covariance operators.

1156

Functional Data Analysis

fds

Functional data sets

Functional data sets

1157

Functional Data Analysis

flars

Functional LARS

Variable selection algorithm for functional linear regression with scalar response variable and mixed scalar/functional predictors.

1158

Functional Data Analysis

fpca

Restricted MLE for Functional Principal Components Analysis

A geometric approach to MLE for functional principal components.

1159

Functional Data Analysis

freqdom

Frequency Domain Based Analysis: Dynamic PCA

Implementation of dynamic principal component analysis (DPCA), simulation of VAR and VMA processes, and frequency domain tools. These frequency domain methods for dimensionality reduction of multivariate time series were introduced by David Brillinger in his book Time Series (1974). We follow implementation guidelines as described in Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Components, doi:10.1111/rssb.12076.

1160

Functional Data Analysis

freqdom.fda

Functional Time Series: Dynamic Functional Principal Components

Implementations of functional dynamic principal components analysis, with related graphical tools and frequency domain methods. These methods directly use the multivariate dynamic principal components implementation, following the guidelines from Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Components, doi:10.1111/rssb.12076.

1161

Functional Data Analysis

ftsa (core)

Functional Time Series Analysis

Functions for visualizing, modeling, forecasting and hypothesis testing of functional time series.

1162

Functional Data Analysis

ftsspec

Spectral Density Estimation and Comparison for Functional Time Series

Functions for estimating spectral density operator of functional time series (FTS) and comparing the spectral density operator of two functional time series, in a way that allows detection of differences of the spectral density operator in frequencies and along the curve length.

1163

Functional Data Analysis

Funclustering

A package for functional data clustering

This package proposes a model-based clustering algorithm for multivariate functional data. The parametric mixture model, based on the assumption of normality of the principal components resulting from a multivariate functional PCA, is estimated by an EM-like algorithm. The main advantage of the proposed algorithm is its ability to take into account the dependence among curves.

1164

Functional Data Analysis

funcy (core)

Functional Clustering Algorithms

Unified framework to cluster functional data according to one of seven models. All models are based on the projection of the curves onto a basis. The main function funcit() calls wrapper functions for the existing algorithms, so that input parameters are the same. A list is returned with each entry representing the same or extended output for the corresponding method. Method specific as well as general visualization tools are available.

1165

Functional Data Analysis

funData

An S4 Class for Functional Data

S4 classes for univariate and multivariate functional data with utility functions.

1166

Functional Data Analysis

funFEM

Clustering in the Discriminative Functional Subspace

The funFEM algorithm (Bouveyron et al., 2014) allows clustering of functional data by modeling the curves within a common and discriminative functional subspace.

1167

Functional Data Analysis

funHDDC

Model-based clustering in group-specific functional subspaces

The package provides the funHDDC algorithm (Bouveyron & Jacques, 2011), which allows clustering of functional data by modeling each group within a specific functional subspace.

1168

Functional Data Analysis

geofd

Spatial Prediction for Function Value Data

Kriging based methods are used for predicting functional data (curves) with spatial dependence.

1169

Functional Data Analysis

GPFDA

Apply Gaussian Processes in Functional Data Analysis

Use functional regression as the mean structure and Gaussian Process as the covariance structure.

1170

Functional Data Analysis

growfunctions

Bayesian Non-Parametric Dependent Models for Time-Indexed Functional Data

Estimates a collection of time-indexed functions under either Gaussian process (GP) or intrinsic Gaussian Markov random field (iGMRF) prior formulations, where a Dirichlet process mixture allows sub-groupings of the functions to share the same covariance or precision parameters. The GP and iGMRF formulations both support any number of additive covariance or precision terms, respectively, expressing either or both of multiple trend and seasonality.

1171

Functional Data Analysis

MFPCA

Multivariate Functional Principal Component Analysis for Data Observed on Different Dimensional Domains

Calculate a multivariate functional principal component analysis for data observed on different dimensional domains. The estimation algorithm relies on univariate basis expansions for each element of the multivariate functional data. Multivariate and univariate functional data objects are represented by S4 classes for this type of data implemented in the package ‘funData’.

1172

Functional Data Analysis

pcdpca

Dynamic Principal Components for Periodically Correlated Functional Time Series

Extends multivariate and functional dynamic principal components to periodically correlated multivariate time series, allowing computation of true dynamic principal components in the presence of periodicity. We follow implementation guidelines as described in Kidzinski, Kokoszka and Jouzdani (2017), Principal component analysis of periodically correlated functional time series <arXiv:1612.00040>.

1173

Functional Data Analysis

rainbow

Rainbow Plots, Bagplots and Boxplots for Functional Data

Functions and data sets for functional data display and outlier detection.

1174

Functional Data Analysis

refund (core)

Regression with Functional Data

Methods for regression with functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.

1175

Functional Data Analysis

refund.shiny

Interactive Plotting for Functional Data Analyses

Interactive plotting for functional data analyses.

1176

Functional Data Analysis

refund.wave

Wavelet-Domain Regression with Functional Data

Methods for regressing scalar responses on functional or image predictors, via transformation to the wavelet domain and back.

1177

Functional Data Analysis

RFgroove

Importance Measure and Selection for Groups of Variables with Random Forests

Variable selection tools for groups of variables and functional data based on a new grouped variable importance with random forests.

1178

Functional Data Analysis

roahd

Robust Analysis of High Dimensional Data

A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.

1179

Functional Data Analysis

sparseFLMM

Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data

Estimation of functional linear mixed models for irregularly or sparsely sampled data based on functional principal component analysis.

1180

Functional Data Analysis

switchnpreg

Switching nonparametric regression models for a single curve and functional data

Functions for estimating the parameters from the latent state process and the functions corresponding to the J states as proposed by De Souza and Heckman (2013).

1181

Functional Data Analysis

warpMix

Mixed Effects Modeling with Warping for Functional Data Using B-Splines

Mixed effects modeling with warping for functional data using B-splines. Warping coefficients are treated as random effects, and the warping functions are general functions whose parameters represent the projection of part of the warping function onto a B-spline basis. Warped data are modelled by a linear mixed-effects functional model; the noise is Gaussian and independent of the warping functions.

1182

Statistical Genetics

adegenet

Exploratory Analysis of Genetic and Genomic Data

Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure (‘genind’ class), allele counts by populations (‘genpop’), and genome-wide SNP data (‘genlight’). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.

1183

Statistical Genetics

ape

Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

1184

Statistical Genetics

Biodem

Biodemography Functions

The Biodem package provides a number of functions for Biodemographic analysis.

1185

Statistical Genetics

bqtl

Bayesian QTL Mapping Toolkit

QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools.

1186

Statistical Genetics

dlmap

Detection Localization Mapping for QTL

QTL mapping in a mixed model framework with separate detection and localization stages. The first stage detects the number of QTL on each chromosome based on the genetic variation due to grouped markers on the chromosome; the second stage uses this information to determine the most likely QTL positions. The mixed model can accommodate general fixed and random effects, including spatial effects in field trials and pedigree effects. Applicable to backcrosses, doubled haploids, recombinant inbred lines, F2 intercrosses, and association mapping populations.

1187

Statistical Genetics

gap (core)

Genetic Analysis Package

It is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates.

1188

Statistical Genetics

GenABEL

Genome-wide SNP association analysis

A package for genome-wide association analysis between quantitative or binary traits and single-nucleotide polymorphisms (SNPs).

1189

Statistical Genetics

genetics (core)

Population Genetics

Classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, …
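A short sketch of the genotype class in use (illustrative data; assumes the genetics package is installed):

```r
library(genetics)

# Encode genotypes at a single diallelic marker as "allele/allele" strings
g <- genotype(c("A/A", "A/B", "A/B", "B/B", "A/A", "A/B"))

summary(g)     # allele and genotype frequencies
HWE.chisq(g)   # chi-squared test for Hardy-Weinberg equilibrium
```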

1190

Statistical Genetics

hapassoc

Inference of Trait Associations with SNP Haplotypes and Other Attributes using the EM Algorithm

The following R functions are used for inference of trait associations with haplotypes and other covariates in generalized linear models. The functions are developed primarily for data collected in cohort or cross-sectional studies. They can accommodate uncertain haplotype phase and handle missing genotypes at some SNPs.

1191

Statistical Genetics

haplo.ccs

Estimate Haplotype Relative Risks in Case-Control Data

‘haplo.ccs’ estimates haplotype and covariate relative risks in case-control data by weighted logistic regression. Diplotype probabilities, which are estimated by EM computation with progressive insertion of loci, are utilized as weights.

1192

Statistical Genetics

haplo.stats (core)

Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous

Routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). The main functions are: haplo.em(), haplo.glm(), haplo.score(), and haplo.power(); all of which have detailed examples in the vignette.

1193

Statistical Genetics

HardyWeinberg

Statistical Tests and Graphics for Hardy-Weinberg Equilibrium

Contains tools for exploring Hardy-Weinberg equilibrium for diallelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) for Hardy-Weinberg equilibrium are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of diallelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots.

1194

Statistical Genetics

hierfstat

Estimation and Tests of Hierarchical F-Statistics

Allows the estimation of hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm of Yang (Evolution, 1998, 52(4):950-956; doi:10.2307/2411227). Functions are also given to test via randomisations the significance of each F and variance components, using the likelihood-ratio statistic G.

1195

Statistical Genetics

hwde

Models and Tests for Departure from Hardy-Weinberg Equilibrium and Independence Between Loci

Fits models for genotypic disequilibria, as described in Huttley and Wilson (2000), Weir (1996) and Weir and Wilson (1986). Contrast terms are available that account for first-order interactions between loci. Also implements, for a single locus in a single population, a conditional exact test for Hardy-Weinberg equilibrium.

1196

Statistical Genetics

ibdreg

Regression Methods for IBD Linkage With Covariates

A method to test genetic linkage with covariates by regression methods with response IBD sharing for relative pairs. Accounts for correlations of IBD statistics and covariates for relative pairs within the same pedigree.

1197

Statistical Genetics

LDheatmap

Graphical Display of Pairwise Linkage Disequilibria Between SNPs

Produces a graphical display, as a heat map, of measures of pairwise linkage disequilibria between SNPs. Users may optionally include the physical locations or genetic map distances of each SNP on the plot.

1198

Statistical Genetics

luca

Likelihood inference from case-control data Under Covariate Assumptions (LUCA)

Likelihood inference in case-control studies of a rare disease under independence or simple dependence of genetic and non-genetic covariates.

1199

Statistical Genetics

ouch

OrnsteinUhlenbeck Models for Phylogenetic Comparative Hypotheses

Fit and compare OrnsteinUhlenbeck models for evolution along a phylogenetic tree.

1200

Statistical Genetics

pbatR

P2BAT

This package provides data analysis via the pbat program, and an alternative internal implementation of the power calculations via simulation only. For analysis, this package provides a frontend to the PBAT software, automatically reading in the output from the pbat program and displaying the corresponding figure when appropriate (i.e. PBAT-logrank). It includes support for multiple processes and clusters. For analysis, users must download PBAT (developed by Christoph Lange) and accept its license, available on the PBAT webpage. Both the data analysis and power calculations have command line and graphical interfaces using tcltk.

1201

Statistical Genetics

phangorn

Phylogenetic Reconstruction and Analysis

Package contains methods for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation. Allows comparison of trees and model selection, and offers visualizations for trees and split networks.

1202

Statistical Genetics

qtl

Tools for Analyzing QTL Experiments

Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits.

1203

Statistical Genetics

rmetasim

An IndividualBased Population Genetic Simulation Environment

An interface between R and the metasim simulation engine. The simulation environment is documented in Strand, A. (2002) “Metasim 1.0: an individual-based environment for simulating population genetics of complex population dynamics”, Mol. Ecol. Notes, doi:10.1046/j.1471-8286.2002.00208.x. Please see the vignettes CreatingLandscapes and Simulating to get some ideas on how to use the package. See the rmetasim vignette for an overview and for important changes to the code in the most recent version.

1204

Statistical Genetics

seqinr

Biological Sequences Retrieval and Analysis

Exploratory data analysis and data visualization for biological sequence (DNA and protein) data. Seqinr includes utilities for sequence data management under the ACNUC system described in Gouy, M. et al. (1984) Nucleic Acids Res. 12:121-127, doi:10.1093/nar/12.1Part1.121.

1205

Statistical Genetics

snp.plotter

snp.plotter

Creates plots of p-values using single SNP and/or haplotype data. Main features of the package include options to display a linkage disequilibrium (LD) plot and the ability to plot multiple datasets simultaneously. Plots can be created using global and/or individual haplotype p-values along with single SNP p-values. Images are created as either PDF or EPS files.

1206

Statistical Genetics

SNPmaxsel

Maximally selected statistics for SNP data

This package implements asymptotic methods related to maximally selected statistics, with applications to SNP data.

1207

Statistical Genetics

stepwise

Stepwise detection of recombination breakpoints

A stepwise approach to identifying recombination breakpoints in a sequence alignment.

1208

Statistical Genetics

tdthap

TDT tests for extended haplotypes

Transmission/disequilibrium tests for extended marker haplotypes

1209

Statistical Genetics

untb

ecological drift under the UNTB

A collection of utilities for biodiversity data. Includes the simulation of ecological drift under Hubbell’s Unified Neutral Theory of Biodiversity, and the calculation of various diagnostics such as Preston curves. Now includes functionality provided by Francois Munoz and Andrea Manica.

1210

Statistical Genetics

wgaim

Whole Genome Average Interval Mapping for QTL Detection using Mixed Models

Integrates sophisticated mixed modelling methods with a whole genome approach to detecting significant QTL in linkage maps.

1211

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

ade4

Analysis of Ecological Data : Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis) data. The philosophy of the package is described in Dray and Dufour (2007) doi:10.18637/jss.v022.i04.

1212

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

animation

A Gallery of Animations in Statistics and Utilities to Create Animations

Provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, nonparametric statistics, sampling survey, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. Also provided in this package are a series of functions to save animations to various formats, e.g. Flash, ‘GIF’, HTML pages, ‘PDF’ and videos. ‘PDF’ animations can be inserted into ‘Sweave’ / ‘knitr’ easily.

1213

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

ape

Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

1214

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

aplpack

Another Plot PACKage: stem.leaf, bagplot, faces, spin3R, plotsummary, plothulls, and some slider functions

Set of functions for drawing some special plots: stem.leaf plots a stem-and-leaf plot, stem.leaf.backback plots back-to-back versions of stem-and-leaf plots, bagplot plots a bagplot, skyline.hist plots several histograms in one plot of a one-dimensional data set, plotsummary plots a graphical summary of a data set with one or more variables, plothulls plots sequentially hulls of a bivariate data set, faces plots Chernoff faces, spin3R allows inspection of a 3-dimensional point cloud, and slider functions support interactive graphics.

1215

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

ash

David Scott’s ASH Routines

David Scott’s ASH routines ported from S-PLUS to R.

1216

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

biclust

BiCluster Algorithms

The main function biclust provides several algorithms to find biclusters in two-dimensional data: Cheng and Church, Spectral, Plaid Model, Xmotifs and Bimax. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.

1217

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

Cairo

R graphics device using cairo graphics library for creating high-quality bitmap (PNG, JPEG, TIFF), vector (PDF, SVG, PostScript) and display (X11 and Win32) output

Cairo graphics device that can be used to create high-quality vector (PDF, PostScript and SVG) and bitmap output (PNG, JPEG, TIFF), and high-quality rendering in displays (X11 and Win32). Since it uses the same back-end for all output, copying across formats is WYSIWYG. Files are created without the dependence on X11 or other external programs. This device supports the alpha channel (semi-transparent drawing) and resulting images can contain transparent and semi-transparent regions. It is ideal for use in server environments (file output) and as a replacement for other devices that don’t have Cairo’s capabilities such as alpha support or anti-aliasing. Backends are modular such that any subset of backends is supported.

1218

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

cairoDevice

Embeddable Cairo Graphics Device Driver

This device uses Cairo and GTK to draw to the screen, file (png, svg, pdf, and ps) or memory (arbitrary GdkDrawable or Cairo context). The screen device may be embedded into RGtk2 interfaces and supports all interactive features of other graphics devices, including getGraphicsEvent().

1219

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

cba

Clustering for Business Analytics

Implements clustering techniques such as Proximus and Rock, utility functions for efficient computation of cross distances and data manipulation.

1220

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

colorspace

Color Space Manipulation

Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB. Qualitative, sequential, and diverging color palettes based on HCL colors are provided along with an interactive palette picker (with either a Tcl/Tk or a shiny GUI).
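
A short sketch of palette construction and color-space conversion, assuming the ‘colorspace’ package is installed (the chosen coordinates are illustrative):

```r
# Sketch: HCL-based palettes and conversion between color spaces.
library(colorspace)

pal <- sequential_hcl(5)            # five colors along a sequential HCL path
print(pal)                          # hex color codes

# Convert an sRGB color to polar CIELUV (i.e. HCL) coordinates
col <- sRGB(0.2, 0.4, 0.8)
print(coords(as(col, "polarLUV")))  # L (luminance), C (chroma), H (hue)
```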

1221

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

diagram

Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams

Visualises simple graphs (networks) based on a transition matrix, with utilities to plot flow diagrams, webs, electrical networks, etc. Supports the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter M.J. Herman (2009), Springer, and the book “Solving Differential Equations in R” by Karline Soetaert, Jeff Cash and Francesca Mazzia (2012), Springer. Includes demo(flowchart), demo(plotmat), demo(plotweb).

1222

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

dichromat

Color Schemes for Dichromats

Collapse red-green or green-blue distinctions to simulate the effects of different types of color-blindness.

1223

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

gclus

Clustering Graphics

Orders panels in scatterplot matrices and parallel coordinate displays by some merit index. Package contains various indices of merit, ordering functions, and enhanced versions of pairs and parcoord which color panels according to their merit level.

1224

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

ggplot2 (core)

Create Elegant Data Visualisations Using the Grammar of Graphics

A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
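
A minimal sketch of the grammar in use, assuming ‘ggplot2’ is installed (the mtcars mapping is illustrative):

```r
# Sketch: map variables to aesthetics, then add a geometric layer.
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
print(p)  # a ggplot object renders when printed
```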

1225

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

gplots

Various R Programming Tools for Plotting Data

Various R programming tools for plotting data, including: calculating and plotting locally smoothed summary functions (‘bandplot’, ‘wapply’); enhanced versions of standard plots (‘barplot2’, ‘boxplot2’, ‘heatmap.2’, ‘smartlegend’); manipulating colors (‘col2hex’, ‘colorpanel’, ‘redgreen’, ‘greenred’, ‘bluered’, ‘redblue’, ‘rich.colors’); calculating and plotting two-dimensional data summaries (‘ci2d’, ‘hist2d’); enhanced regression diagnostic plots (‘lmplot2’, ‘residplot’); a formula-enabled interface to the ‘stats::lowess’ function (‘lowess’); displaying textual data in plots (‘textplot’, ‘sinkplot’); plotting a matrix where each cell contains a dot whose size reflects the relative magnitude of the elements (‘balloonplot’); plotting “Venn” diagrams (‘venn’); displaying OpenOffice-style plots (‘ooplot’); plotting multiple data on the same region, with separate axes (‘overplot’); plotting means and confidence intervals (‘plotCI’, ‘plotmeans’); and spacing points in an xy plot so they don’t overlap (‘space’).

1226

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

gridBase

Integration of base and grid graphics

Integration of base and grid graphics

1227

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

hexbin

Hexagonal Binning Routines

Binning and plotting functions for hexagonal bins. Now uses and relies on grid graphics and formal (S4) classes and methods.

1228

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

IDPmisc

Utilities of Institute of Data Analyses and Process Design (www.idp.zhaw.ch)

The IDPmisc package contains different high-level graphics functions for displaying large data sets, displaying circular data in a very flexible way, finding local maxima, brewing color ramps, drawing nice arrows, zooming 2D plots, and creating figures with differently colored margin and plot regions. In addition, the package contains auxiliary functions for data manipulation, such as omitting observations with irregular values or selecting data by logical vectors that include NAs. Other functions are especially useful in spectroscopy and analyses of environmental data: robust baseline fitting and finding peaks in spectra.

1229

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

igraph

Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
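
A small sketch of graph generation and a centrality computation, assuming ‘igraph’ is installed:

```r
# Sketch: generate a random graph and compute vertex degrees.
library(igraph)

set.seed(42)
g <- sample_gnp(50, p = 0.1)   # Erdos-Renyi random graph on 50 vertices
deg <- degree(g)               # degree centrality for every vertex
print(summary(deg))
```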

1230

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

iplots

iPlots - interactive graphics for R

Interactive plots for R

1231

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

JavaGD

Java Graphics Device

Graphics device routing all graphics commands to a Java program. The actual functionality of the JavaGD depends on the Java-side implementation. Simple AWT and Swing implementations are included.

1232

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

klaR

Classification and visualization

Miscellaneous functions for classification and visualization developed at the Fakultaet Statistik, Technische Universitaet Dortmund

1233

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

lattice (core)

Trellis Graphics for R

A powerful and elegant highlevel data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
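
A minimal Trellis sketch, assuming ‘lattice’ is installed (the conditioning on mtcars cylinders is illustrative):

```r
# Sketch: one panel per level of a conditioning factor.
library(lattice)

p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p)  # trellis objects render when printed
```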

1234

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

latticeExtra

Extra Graphical Utilities Based on Lattice

Building on the infrastructure provided by the lattice package, this package provides several new highlevel functions and methods, as well as additional utilities such as panel and axis annotation functions.

1235

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

misc3d

Miscellaneous 3D Plots

A collection of miscellaneous 3d plots, including isosurfaces.

1236

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

onion

Octonions and Quaternions

Quaternions and octonions are four- and eight-dimensional extensions of the complex numbers. They are normed division algebras over the real numbers and find applications in spatial rotations (quaternions) and string theory and relativity (octonions). The quaternions are noncommutative and the octonions nonassociative. See RKS Hankin 2006, R News Volume 6/2: 49-51, and the package vignette, for more details.

1237

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

playwith

A GUI for interactive plots using GTK+

A GTK+ graphical user interface for editing and interacting with R plots.

1238

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

plotrix (core)

Various Plotting Functions

Lots of plots, various labeling, axis and color scaling functions.

1239

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RColorBrewer (core)

ColorBrewer Palettes

Provides color schemes for maps (and other graphics) designed by Cynthia Brewer as described at http://colorbrewer2.org

1240

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

rggobi

Interface Between R and ‘GGobi’

A commandline interface to ‘GGobi’, an interactive and dynamic graphics package. ‘Rggobi’ complements the graphical user interface of ‘GGobi’ providing a way to fluidly transition between analysis and exploration, as well as automating common tasks.

1241

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

rgl (core)

3D Visualization Using OpenGL

Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.

1242

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RGraphics

Data and Functions from the Book R Graphics, Second Edition

Data and Functions from the book R Graphics, Second Edition. There is a function to produce each figure in the book, plus several functions, classes, and methods defined in Chapter 8.

1243

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RGtk2

R Bindings for Gtk 2.8.0 and Above

Facilities in the R language for programming graphical interfaces using Gtk, the Gimp Tool Kit.

1244

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RSvgDevice

An R SVG graphics device

A graphics device for R that uses the w3.org xml standard for Scalable Vector Graphics.

1245

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

RSVGTipsDevice

An R SVG Graphics Device with Dynamic Tips and Hyperlinks

A graphics device for R that uses the w3.org xml standard for Scalable Vector Graphics. This version supports tooltips with 1 to 3 lines, hyperlinks, and line styles.

1246

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

scagnostics

Compute scagnostics - scatterplot diagnostics

Calculates graph-theoretic scagnostics. Scagnostics describe various measures of interest for pairs of variables, based on their appearance in a scatterplot. They are a useful tool for discovering interesting or unusual scatterplots in a scatterplot matrix, without having to look at every individual plot.

1247

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

scatterplot3d

3D Scatter Plot

Plots a three dimensional (3D) point cloud.

1248

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

seriation

Infrastructure for Ordering Objects Using Seriation

Infrastructure for seriation with an implementation of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).

1249

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

tkrplot

TK Rplot

A simple mechanism for placing R graphics in a Tk widget.

1250

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

vcd (core)

Visualizing Categorical Data

Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book “Visualizing Categorical Data” by Michael Friendly and is now the main support package for a new book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer (2015).

1251

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

vioplot

Violin plot

A violin plot is a combination of a box plot and a kernel density plot.

1252

Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

xgobi

Interface to the XGobi and XGvis programs for graphical data analysis

Interface to the XGobi and XGvis programs for graphical data analysis.

1253

High-Performance and Parallel Computing with R

aprof

Amdahl’s Profiler, Directed Optimization Made Easy

Assists the evaluation of whether and where to focus code optimization, using Amdahl’s law and visual aids based on line profiling. Amdahl’s profiler organises profiling output files (including memory profiling) in a visually appealing way. It is meant to help to balance development vs. execution time by helping to identify the most promising sections of code to optimize and projecting potential gains. The package is an addition to R’s standard profiling tools and is not a wrapper for them.

1254

High-Performance and Parallel Computing with R

batch

Batching Routines in Parallel and Passing Command-Line Arguments to R

Functions to allow you to easily pass command-line arguments into R, and functions to aid in submitting your R code in parallel on a cluster and joining the results afterward (e.g. multiple parameter values for simulations running in parallel, splitting up a permutation test in parallel, etc.). See ‘parseCommandArgs(…)’ for the main example of how to use this package.

1255

High-Performance and Parallel Computing with R

BatchExperiments

Statistical Experiments on Batch Computing Clusters

Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.

1256

High-Performance and Parallel Computing with R

BatchJobs

Batch Computing with R

Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.

1257

High-Performance and Parallel Computing with R

batchtools

Tools for Computation on Batch Systems

As the successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high-performance computing systems managed by the schedulers ‘IBM Spectrum LSF’ (http://www-03.ibm.com/systems/spectrum-computing/products/lsf/), ‘OpenLava’ (http://www.openlava.org/), ‘Univa Grid Engine’/‘Oracle Grid Engine’ (http://www.univa.com/), ‘Slurm’ (http://slurm.schedmd.com/), ‘TORQUE/PBS’ (http://www.adaptivecomputing.com/products/open-source/torque/), or ‘Docker Swarm’ (https://docs.docker.com/swarm/). A multicore and socket mode allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.
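
A hedged sketch of the Map-style workflow on a temporary local registry, assuming ‘batchtools’ is installed; on a real cluster the registry would instead be configured with the cluster functions for your scheduler:

```r
# Sketch: one job per input element, run with the default (local) backend.
library(batchtools)

reg <- makeRegistry(file.dir = NA)   # NA = throwaway registry in tempdir
batchMap(function(x) x^2, x = 1:4)   # define four jobs
submitJobs()                         # execute (interactively here)
waitForJobs()
res <- unlist(reduceResultsList())   # collect results in job order
print(res)
```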

1258

High-Performance and Parallel Computing with R

bayesm

Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005), and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

1259

High-Performance and Parallel Computing with R

bcp

Bayesian Analysis of Change Point Problems

Provides an implementation of the Barry and Hartigan (1993) product partition model for the normal errors change point problem using Markov Chain Monte Carlo. It also extends the methodology to regression models on a connected graph (Wang and Emerson, 2015); this allows estimation of change point models with multivariate responses. Parallel MCMC, previously available in bcp v.3.0.0, is currently not implemented.

1260

High-Performance and Parallel Computing with R

biglars

Scalable Least-Angle Regression and Lasso

Least-angle regression, lasso and stepwise regression for numeric data sets in which the number of observations is greater than the number of predictors. The functions can be used with the ff library to accommodate data sets that are too large to be held in memory.

1261

High-Performance and Parallel Computing with R

biglm

bounded memory linear and generalized linear models

Regression for data too large to fit in memory

1262

High-Performance and Parallel Computing with R

bigmemory

Manage Massive Matrices with Shared Memory and Memory-Mapped Files

Create, store, access, and manipulate massive matrices. Matrices are allocated to shared memory and may use memory-mapped files. Packages ‘biganalytics’, ‘bigtabulate’, ‘synchronicity’, and ‘bigalgebra’ provide advanced functionality.
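
A brief sketch, assuming ‘bigmemory’ is installed (the dimensions are illustrative):

```r
# Sketch: a matrix backed by shared memory, usable with standard subscripting.
library(bigmemory)

x <- big.matrix(nrow = 1000, ncol = 3, type = "double", init = 0)
x[, 1] <- rnorm(1000)   # assign into the first column as usual
print(dim(x))
print(is.big.matrix(x))
```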

1263

High-Performance and Parallel Computing with R

bnlearn

Bayesian Network Structure Learning, Parameter Learning and Inference

Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC and RSMAX2) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries and cross-validation. Development snapshots with the latest bugfixes are available from http://www.bnlearn.com.
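
A short sketch of score-based structure learning on the package’s bundled learning.test data set, assuming ‘bnlearn’ is installed:

```r
# Sketch: hill-climbing structure search, then parameter estimation.
library(bnlearn)

data(learning.test)
dag <- hc(learning.test)            # score-based structure learning
fit <- bn.fit(dag, learning.test)   # maximum-likelihood parameter estimates
print(narcs(dag))                   # number of arcs in the learned DAG
```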

1264

High-Performance and Parallel Computing with R

caret

Classification and Regression Training

Misc functions for training and plotting classification and regression models.

1265

High-Performance and Parallel Computing with R

cudaBayesreg

CUDA Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on NVIDIA GPUs. This package provides a CUDA implementation of a Bayesian multilevel model for the analysis of brain fMRI data. An fMRI data set consists of time series of volume data in 4D space. Typically, volumes are collected as slices of 64 x 64 voxels. Analysis of fMRI data often relies on fitting linear regression models at each voxel of the brain. The volume of the data to be processed, and the type of statistical analysis to perform in fMRI analysis, call for high-performance computing strategies. In this package, the CUDA programming model uses a separate thread for fitting a linear regression model at each voxel in parallel. The global statistical model implements a Gibbs Sampler for hierarchical linear models with a normal prior. This model has been proposed by Rossi, Allenby and McCulloch in ‘Bayesian Statistics and Marketing’, Chapter 3, and is referred to as ‘rhierLinearModel’ in the R package bayesm. A notebook equipped with an NVIDIA ‘GeForce 8400M GS’ card having Compute Capability 1.1 has been used in the tests. The data sets used in the package’s examples are available in the separate package cudaBayesregData.

1266

High-Performance and Parallel Computing with R

data.table

Extension of ‘data.frame’

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, a fast friendly file reader and parallel file writer. Offers a natural and flexible syntax, for faster development.
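
A minimal sketch of grouped aggregation and by-reference update, assuming ‘data.table’ is installed:

```r
# Sketch: aggregation by group, then adding a column without copying.
library(data.table)

DT <- data.table(grp = rep(c("a", "b"), each = 3), v = 1:6)
means <- DT[, .(mean_v = mean(v)), by = grp]  # grouped aggregation
print(means)

DT[, v2 := v * 10]  # := modifies DT by reference (no copy of DT is made)
print(DT)
```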

1267

High-Performance and Parallel Computing with R

dclone

Data Cloning and MCMC Tools for Maximum Likelihood Methods

Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods. Sequential and parallel MCMC support for JAGS, WinBUGS and OpenBUGS.

1268

High-Performance and Parallel Computing with R

doFuture

A Universal Foreach Parallel Adaptor using the Future API of the ‘future’ Package

Provides a ‘%dopar%’ adaptor such that any type of futures can be used as backends for the ‘foreach’ framework.

1269

High-Performance and Parallel Computing with R

doMC

Foreach Parallel Adaptor for ‘parallel’

Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package.

1270

High-Performance and Parallel Computing with R

doMPI

Foreach Parallel Adaptor for the Rmpi Package

Provides a parallel backend for the %dopar% function using the Rmpi package.

1271

High-Performance and Parallel Computing with R

doRedis

Foreach parallel adapter for the rredis package

A Redis parallel backend for the %dopar% function

1272

High-Performance and Parallel Computing with R

doRNG

Generic Reproducible Parallel Backend for ‘foreach’ Loops

Provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by L’Ecuyer’s combined multiple-recursive generator [L’Ecuyer (1999), doi:10.1287/opre.47.1.159]. It enables standard %dopar% loops to be easily converted into fully reproducible loops, independently of the number of workers, the task scheduling strategy, or the chosen parallel environment and associated foreach backend.

1273

High-Performance and Parallel Computing with R

doSNOW

Foreach Parallel Adaptor for the ‘snow’ Package

Provides a parallel backend for the %dopar% function using the snow package of Tierney, Rossini, Li, and Sevcikova.

1274

High-Performance and Parallel Computing with R

drake

Data Frames in R for Make

A solution for reproducible code and high-performance computing.

1275

High-Performance and Parallel Computing with R

ff

Memory-efficient storage of large data on disk and fast access functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory - the effective virtual memory consumption per ff object. ff supports R’s standard atomic data types ‘double’, ‘logical’, ‘raw’ and ‘integer’ and non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned), single (4 byte float with NAs). For example ‘quad’ allows efficient storage of genomic data as an ‘A’,‘T’,‘G’,‘C’ factor. The unsigned types support ‘circular’ arithmetic. There is also support for close-to-atomic types ‘factor’, ‘ordered’, ‘POSIXct’, ‘Date’ and custom close-to-atomic types. ff has native C support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays). There is also an ffdf class not unlike data.frames, with import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/decoding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options allows working with ‘permanent’ files as well as creating/removing ‘temporary’ ff files completely transparently to the user. On certain OS/filesystem combinations, creating the ff files works without notable delay thanks to sparse file allocation.
Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large data sets, for example virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, ‘logicals’ and non-standard data types get stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and ram objects and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package ‘bit’: chunked looping, fast bit operations and coercions between different objects that can store subscript information (‘bit’, ‘bitwhich’, ff ‘boolean’, ri range index, hi hybrid index). This makes it possible to work interactively with selections of large data sets and quickly modify selection criteria. Further high-performance enhancements can be made available upon request.

1276

High-Performance and Parallel Computing with R

ffbase

Basic Statistical Functions for Package ‘ff’

Extends the out-of-memory vectors of ‘ff’ with statistical functions and other utilities to ease their usage.

1277

High-Performance and Parallel Computing with R

flowr

Streamlining Design and Deployment of Complex Workflows

This framework allows you to design and implement complex pipelines, and deploy them on your institution’s computing cluster. This has been built keeping in mind the needs of bioinformatics workflows. However, it is easily extendable to any field where a series of steps (shell commands) are to be executed in a (work)flow.

1278

High-Performance and Parallel Computing with R

foreach

Provides Foreach Looping Construct for R

Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn’t require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
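
A minimal sketch of foreach used for its return value, assuming the ‘foreach’ package is installed; swapping %do% for %dopar% (with a registered backend) parallelizes the same loop unchanged:

```r
# Sketch: iterate over a collection and combine the results.
library(foreach)

squares <- foreach(i = 1:5, .combine = c) %do% i^2
print(squares)
```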

1279

High-Performance and Parallel Computing with R

future

Unified Parallel and Distributed Processing in R for Everyone

The purpose of this package is to provide a lightweight and unified Future API for sequential and parallel processing of R expressions via futures. The simplest way to evaluate an expression in parallel is to use ‘x %<-% { expression }’ with ‘plan(multiprocess)’. This package implements sequential, multicore, multisession, and cluster futures. With these, R expressions can be evaluated on the local machine, in parallel on a set of local machines, or distributed on a mix of local and remote machines. Extensions to this package implement additional backends for processing futures via compute cluster schedulers etc. Because of its unified API, there is no need to modify code in order to switch from sequential processing on the local machine to, say, distributed processing on a remote compute cluster. Another strength of this package is that global variables and functions are automatically identified and exported as needed, making it straightforward to tweak existing code to make use of futures.
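
A hedged sketch of future assignments, assuming the ‘future’ package is installed (the worker count is illustrative):

```r
# Sketch: the same code runs sequentially or in parallel, chosen by plan().
library(future)

plan(multisession, workers = 2)  # evaluate futures in background R sessions

x %<-% { sum(1:10) }   # future assignment: evaluated asynchronously
y %<-% { sum(11:20) }
print(x + y)           # touching x and y blocks until both resolve
plan(sequential)       # restore ordinary evaluation
```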

1280

High-Performance and Parallel Computing with R

future.BatchJobs

A Future API for Parallel and Distributed Processing using BatchJobs

Implementation of the Future API on top of the ‘BatchJobs’ package. This allows you to process futures, as defined by the ‘future’ package, in parallel out of the box, not only on your local machine or ad-hoc cluster of machines, but also via high-performance compute (‘HPC’) job schedulers such as ‘LSF’, ‘OpenLava’, ‘Slurm’, ‘SGE’, and ‘TORQUE’ / ‘PBS’, e.g. ‘y <- future_lapply(files, FUN = process)’.

1281

High-Performance and Parallel Computing with R

GAMBoost

Generalized linear and additive models by likelihood based boosting

This package provides routines for fitting generalized linear and generalized additive models by likelihood-based boosting, using penalized B-splines.

1282

High-Performance and Parallel Computing with R

gcbd

‘GPU’/CPU Benchmarking in Debian-Based Systems

‘GPU’/CPU benchmarking on Debian-package based systems. This package benchmarks the performance of a few standard linear algebra operations (such as a matrix product and QR, SVD and LU decompositions) across a number of different ‘BLAS’ libraries as well as a ‘GPU’ implementation. To do so, it takes advantage of the ability to ‘plug and play’ different ‘BLAS’ implementations easily on a Debian and/or Ubuntu system. The current version supports: ‘Reference BLAS’ (‘refblas’), which is unaccelerated, as a baseline; Atlas, which is tuned but typically configured single-threaded; Atlas39, which is tuned and configured for multi-threaded mode; ‘Goto Blas’, which is accelerated and multi-threaded; and ‘Intel MKL’, which is a commercial accelerated and multi-threaded version. As for ‘GPU’ computing, we use the CRAN package ‘gputools’. For ‘Goto Blas’, the ‘gotoblas2-helper’ script from the ISM in Tokyo can be used. For ‘Intel MKL’ we use the Revolution R packages from Ubuntu 9.10.

1283

High-Performance and Parallel Computing with R

gmatrix

GPU Computing in R

A general framework for utilizing R to harness the power of NVIDIA GPUs. The “gmatrix” and “gvector” classes allow for easy management of the separate device and host memory spaces. Numerous numerical operations are implemented for these objects on the GPU. These operations include matrix multiplication, addition, subtraction, the Kronecker product, the outer product, comparison operators, logical operators, trigonometric functions, indexing, sorting, random number generation and many more.

1284

High-Performance and Parallel Computing with R

gpuR

GPU Functions for R Objects

Provides GPU enabled functions for R objects in a simple and approachable manner. New gpu* and vcl* classes have been provided to wrap typical R objects (e.g. vector, matrix), in both host and device spaces, to mirror typical R syntax without the need to know OpenCL.

1285

High-Performance and Parallel Computing with R

gputools

A Few GPU Enabled Functions

Provides R interfaces to a handful of common functions implemented using the Nvidia CUDA toolkit. Some of the functions require at least GPU Compute Capability 1.3. Thanks to Craig Stark at UC Irvine for donating time on his lab’s Mac.

1286

High-Performance and Parallel Computing with R

GUIProfiler

Graphical User Interface for Rprof()

Show graphically the results of profiling R functions by tracking their execution time.

1287

High-Performance and Parallel Computing with R

h2o

R Interface for H2O

R scripting functionality for H2O, the open source math engine for big data that computes parallel distributed machine learning algorithms such as generalized linear models, gradient boosting machines, random forests, and neural networks (deep learning) within various cluster environments.

1288

High-Performance and Parallel Computing with R

HadoopStreaming

Utilities for using R scripts in Hadoop streaming

Provides a framework for writing map/reduce scripts for use in Hadoop Streaming. Also facilitates operating on data in a streaming fashion, without Hadoop.

1289

High-Performance and Parallel Computing with R

harvestr

A Parallel Simulation Framework

Functions for easy and reproducible simulation.

1290

High-Performance and Parallel Computing with R

HistogramTools

Utility Functions for R Histograms

Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.

1291

High-Performance and Parallel Computing with R

inline

Functions to Inline C, C++, Fortran Function Calls from R

Functionality to dynamically define R functions and S4 methods with inlined C, C++ or Fortran code supporting .C and .Call calling conventions.
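As a sketch of the .Call convention the description mentions (assuming a working compiler toolchain is available), an inlined C function might look like this:

```r
library(inline)

# Hypothetical sketch: compile a small C function inline and call it from R.
# The .Call convention works on SEXP objects; here we square each element.
square_body <- "
  int n = LENGTH(x);
  SEXP out = PROTECT(allocVector(REALSXP, n));
  for (int i = 0; i < n; i++)
    REAL(out)[i] = REAL(x)[i] * REAL(x)[i];
  UNPROTECT(1);
  return out;
"
square <- cfunction(signature(x = "numeric"), square_body,
                    language = "C", convention = ".Call")
square(c(1, 2, 3))  # 1 4 9
```

The same pattern, with `convention = ".C"`, covers the other calling convention the package supports.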

1292

High-Performance and Parallel Computing with R

LaF

Fast Access to Large ASCII Files

Methods for fast access to large ASCII files. Currently the following file formats are supported: comma-separated format (CSV) and fixed-width format. It is assumed that the files are too large to fit into memory, although the package can also be used to efficiently access files that do fit into memory. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed as one would an ordinary data.frame. The LaF vignette gives an overview of the functionality provided.

1293

High-Performance and Parallel Computing with R

latentnet

Latent Position and Cluster Models for Statistical Networks

Fit and simulate latent position and cluster models for statistical networks.

1294

High-Performance and Parallel Computing with R

lga

Tools for linear grouping analysis (LGA)

Tools for linear grouping analysis. Three user-level functions: gap, rlga and lga.

1295

High-Performance and Parallel Computing with R

Matching

Multivariate and Propensity Score Matching with Balance Optimization

Provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance has been obtained are also provided.

1296

High-Performance and Parallel Computing with R

MonetDB.R

Connect MonetDB to R

Allows pulling data from MonetDB into R. Includes a DBI implementation and a dplyr backend.

1297

High-Performance and Parallel Computing with R

nws

R functions for NetWorkSpaces and Sleigh

Provides coordination and parallel execution facilities, as well as limited cross-language data exchange, using the NetWorkSpaces server developed by REvolution Computing.

1298

High-Performance and Parallel Computing with R

OpenCL

Interface allowing R to use OpenCL

This package provides an interface to OpenCL, allowing R to leverage computing power of GPUs and other HPC accelerator devices.

1299

High-Performance and Parallel Computing with R

orloca

Operations Research LOCational Analysis Models

Objects and methods to handle and solve the minsum location problem, also known as the Fermat-Weber problem. The minsum location problem searches for a point such that the weighted sum of the distances to the demand points is minimized. See “The Fermat-Weber location problem revisited” by Brimberg, Mathematical Programming, 1, pp. 71-76, 1995. doi:10.1007/BF01592245. General global optimization algorithms are used to solve the problem, along with the ad hoc Weiszfeld method; see “Sur le point pour lequel la somme des distances de n points donnés est minimum”, by Weiszfeld, Tohoku Mathematical Journal, First Series, 43, pp. 355-386, 1937.

1300

High-Performance and Parallel Computing with R

parSim

Parallel Simulation Studies

Perform flexible simulation studies using one or multiple computer cores. The package is set up to be usable on high-performance clusters in addition to being run locally, see examples on https://github.com/SachaEpskamp/parSim.

1301

High-Performance and Parallel Computing with R

partDSA

Partitioning Using Deletion, Substitution, and Addition Moves

A novel tool for generating a piecewise constant estimation list of increasingly complex predictors based on an intensive and comprehensive search over the entire covariate space.

1302

High-Performance and Parallel Computing with R

pbapply

Adding Progress Bar to ’*apply’ Functions

A lightweight package that adds a progress bar to vectorized R functions (’*apply’). The implementation can easily be added to functions where showing progress is useful (e.g. bootstrap). The type and style of the progress bar (with percentages or remaining time) can be set through options. Supports several parallel processing backends.
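A minimal sketch of the drop-in usage, assuming the package is installed: pbsapply() mirrors sapply() while displaying progress.

```r
library(pbapply)

# pbsapply() is a drop-in replacement for sapply() that shows a
# progress bar while the computation runs.
res <- pbsapply(1:100, function(i) {
  Sys.sleep(0.01)        # stand-in for real work, e.g. one bootstrap replicate
  mean(rnorm(100))
})
length(res)  # 100
```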

1303

High-Performance and Parallel Computing with R

pbdBASE

Programming with Big Data ― Base Wrappers for Distributed Matrices

An interface to and extensions for the ‘PBLAS’ and ‘ScaLAPACK’ numerical libraries. This enables R to utilize distributed linear algebra for codes written in the ‘SPMD’ fashion. This interface is deliberately low-level and mimics the style of the native libraries it wraps. For a much higher-level way of managing distributed matrices, see the ‘pbdDMAT’ package.

1304

High-Performance and Parallel Computing with R

pbdDEMO

Programming with Big Data ― Demonstrations and Examples Using ‘pbdR’ Packages

A set of demos of ‘pbdR’ packages, together with a useful, unifying vignette.

1305

High-Performance and Parallel Computing with R

pbdDMAT

‘pbdR’ Distributed Matrix Methods

A set of classes for managing distributed matrices, and a collection of methods for computing linear algebra and statistics. Computation is handled mostly by routines from the ‘pbdBASE’ package, which itself relies on the ‘ScaLAPACK’ and ‘PBLAS’ numerical libraries for distributed computing.

1306

High-Performance and Parallel Computing with R

pbdMPI

Programming with Big Data ― Interface to MPI

An efficient interface to MPI by utilizing S4 classes and methods with a focus on Single Program/Multiple Data (‘SPMD’) parallel programming style, which is intended for batch parallel execution.
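A sketch of the SPMD style the description refers to (assuming an MPI installation): every rank runs the same script, and collective operations combine the partial results.

```r
# SPMD sketch: launched identically on every rank, e.g. with
#   mpiexec -np 4 Rscript this_script.R
library(pbdMPI)
init()

# Each rank computes its own partial sum; allreduce() combines the
# partial sums so every rank holds the global total.
my.part <- sum((comm.rank() * 10 + 1):(comm.rank() * 10 + 10))
total <- allreduce(my.part, op = "sum")
comm.print(total)   # printed once, by rank 0

finalize()
```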

1307

High-Performance and Parallel Computing with R

pbdNCDF4

Programming with Big Data ― Interface to Parallel Unidata NetCDF4 Format Data Files

This package adds collective parallel read and write capability to the R package ncdf4 version 1.8. Typical use is as a parallel NetCDF4 file reader in SPMD style programming. Each R process reads and writes its own data in a synchronized collective mode, resulting in faster parallel performance. Performance improvement is conditional on a parallel file system.

1308

High-Performance and Parallel Computing with R

pbdPROF

Programming with Big Data ― MPI Profiling Tools

MPI profiling tools.

1309

High-Performance and Parallel Computing with R

pbdSLAP

Programming with Big Data ― Scalable Linear Algebra Packages

Utilizes scalable linear algebra packages, mainly including BLACS, PBLAS, and ScaLAPACK, in double precision via pbdMPI, based on ScaLAPACK version 2.0.2.

1310

High-Performance and Parallel Computing with R

peperr

Parallelised Estimation of Prediction Error

Package peperr is designed for prediction error estimation through resampling techniques, possibly accelerated by parallel execution on a compute cluster. Newly developed model fitting routines can be easily incorporated.

1311

High-Performance and Parallel Computing with R

permGPU

Using GPUs in Statistical Genomics

Can be used to carry out permutation resampling inference in the context of RNA microarray studies.

1312

High-Performance and Parallel Computing with R

PGICA

Parallel Group ICA Algorithm

A Group ICA algorithm that can run in parallel on an SGE platform or on multi-core PCs.

1313

High-Performance and Parallel Computing with R

pls

Partial Least Squares and Principal Component Regression

Multivariate regression methods Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).

1314

High-Performance and Parallel Computing with R

pmclust

Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model

Aims to utilize model-based clustering (unsupervised) for high-dimensional and ultra-large data, especially in a distributed manner. The code employs pbdMPI to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. The implementation defaults to the single program multiple data programming model. The code can be executed through pbdMPI and independently of most MPI applications. See the High Performance Statistical Computing website for more information, documents and examples.

1315

High-Performance and Parallel Computing with R

profr

An alternative display for profiling information

profr provides an alternative data structure and visual rendering for the profiling information generated by Rprof.

1316

High-Performance and Parallel Computing with R

proftools

Profile Output Processing Tools for R

Tools for examining Rprof profile output.

1317

High-Performance and Parallel Computing with R

pvclust

Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling

An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides AU (approximately unbiased) p-values as well as BP (bootstrap probability) values for each cluster in a dendrogram.

1318

High-Performance and Parallel Computing with R

randomForestSRC

Random Forests for Survival, Regression, and Classification (RFSRC)

A unified treatment of Breiman’s random forests for survival, regression and classification problems based on Ishwaran and Kogalur’s random survival forests (RSF) package. Now extended to include multivariate and unsupervised forests. Also includes quantile regression forests for univariate and multivariate training/testing settings. The package runs in both serial and parallel (OpenMP) modes.

1319

High-Performance and Parallel Computing with R

Rborist

Extensible, Parallelizable Implementation of the Random Forest Algorithm

Scalable decision tree training and prediction.

1320

High-Performance and Parallel Computing with R

Rcpp

Seamless R and C++ Integration

The ‘Rcpp’ package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents, which facilitates both writing of new code as well as easier integration of third-party libraries. Documentation about ‘Rcpp’ is provided by several vignettes included in this package, via the ‘Rcpp Gallery’ site at http://gallery.rcpp.org, the paper by Eddelbuettel and Francois (2011, doi:10.18637/jss.v040.i08), the book by Eddelbuettel (2013, doi:10.1007/978-1-4614-6868-4) and the paper by Eddelbuettel and Balamuta (2017, doi:10.7287/peerj.preprints.3188v1); see ‘citation(“Rcpp”)’ for details.
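A minimal sketch of the integration described above (assuming a working C++ toolchain): cppFunction() compiles a C++ function and exposes it to R in one call.

```r
library(Rcpp)

# Compile a small C++ function inline; NumericVector maps an R numeric
# vector to a C++ equivalent, as the description explains.
cppFunction("
  double sumSquares(NumericVector x) {
    double s = 0.0;
    for (int i = 0; i < x.size(); i++) s += x[i] * x[i];
    return s;
  }
")
sumSquares(c(1, 2, 3))  # 14
```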

1321

High-Performance and Parallel Computing with R

RcppParallel

Parallel Programming Tools for ‘Rcpp’

High level functions for parallel programming with ‘Rcpp’. For example, the ‘parallelFor()’ function can be used to convert the work of a standard serial “for” loop into a parallel one and the ‘parallelReduce()’ function can be used for accumulating aggregate or other values.

1322

High-Performance and Parallel Computing with R

Rdsm

Threads Environment for R

Provides a threads-type programming environment for R. The package gives the R programmer a clearer, more concise shared-memory world view, and in some cases gives superior performance as well. In addition, it enables parallel processing on very large, out-of-core matrices.

1323

High-Performance and Parallel Computing with R

rgenoud

R Version of GENetic Optimization Using Derivatives

A genetic algorithm plus derivative optimizer.

1324

High-Performance and Parallel Computing with R

Rhpc

Permits *apply() Style Dispatch for ‘HPC’

Provides *apply()-style functions using ‘MPI’ for a better ‘HPC’ environment in R. The package supports long vectors and can deal with moderately large data.

1325

High-Performance and Parallel Computing with R

RhpcBLASctl

Control the Number of Threads on ‘BLAS’

Controls the number of threads used by ‘BLAS’ (a.k.a. ‘GotoBLAS’, ‘ACML’ and ‘MKL’), and, where possible, the number of threads used by ‘OpenMP’. Can also report the number of logical and physical cores, if feasible.
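A short sketch of typical usage, assuming the package is installed: query core counts and pin BLAS to one thread, a common precaution before starting parallel R workers.

```r
library(RhpcBLASctl)

# Query available parallelism.
get_num_procs()            # number of logical cores
get_num_cores()            # number of physical cores

# Pin multithreaded libraries to one thread each, so that explicit
# R-level parallelism does not oversubscribe the machine.
blas_set_num_threads(1)
omp_set_num_threads(1)     # only effective where OpenMP is in use
```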

1326

High-Performance and Parallel Computing with R

RInside

C++ Classes to Embed R in C++ Applications

C++ classes to embed R in C++ applications. The ‘RInside’ package makes it easier to have “R inside” your C++ application by providing a C++ wrapper class around the R interpreter. As R itself is embedded into your application, a shared library build of R is required. This works on Linux, OS X and even on Windows provided you use the same tools used to build R itself. Numerous examples are provided in the eight subdirectories of the examples/ directory of the installed package: standard, mpi (for parallel computing), qt (showing how to embed ‘RInside’ inside a Qt GUI application), wt (showing how to build a “web application” using the Wt toolkit), armadillo (for ‘RInside’ use with ‘RcppArmadillo’) and eigen (for ‘RInside’ use with ‘RcppEigen’). The examples use GNUmakefile(s) with GNU extensions, so a GNU make is required (and will use the GNUmakefile automatically). Doxygen-generated documentation of the C++ classes is available at the ‘RInside’ website as well.

1327

High-Performance and Parallel Computing with R

rJava

Low-Level R to Java Interface

Low-level interface to the Java VM, very much like .C/.Call and friends. Allows creation of objects, calling methods and accessing fields.

1328

High-Performance and Parallel Computing with R

rlecuyer

R Interface to RNG with Multiple Streams

Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L’Ecuyer et al. (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.

1329

High-Performance and Parallel Computing with R

Rmpi (core)

Interface (Wrapper) to MPI (Message-Passing Interface)

An interface (wrapper) to MPI APIs. It also provides an interactive R manager and worker environment.

1330

High-Performance and Parallel Computing with R

RProtoBuf

R Interface to the ‘Protocol Buffers’ ‘API’ (Version 2 or 3)

Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal ‘RPC’ protocols and file formats. Additional documentation is available in two included vignettes, one of which corresponds to our ‘JSS’ paper (2016, doi:10.18637/jss.v071.i02). Either version 2 or 3 of the ‘Protocol Buffers’ ‘API’ is supported.

1331

High-Performance and Parallel Computing with R

rredis

“Redis” Key/Value Database Client

R client interface to the “Redis” key-value database.

1332

High-Performance and Parallel Computing with R

rslurm

Submit R Calculations to a Slurm Cluster

Functions that simplify submitting R scripts to a Slurm workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.
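A sketch of the workflow just described, to be run on a cluster login node where Slurm is available (function names follow the rslurm API; the job name and sizes are illustrative):

```r
library(rslurm)

# One row per task: slurm_apply() splits the parameter grid across
# nodes and submits the whole set as a Slurm job array.
params <- data.frame(mu = 1:10, sd = 0.5)
sjob <- slurm_apply(function(mu, sd) mean(rnorm(1000, mu, sd)),
                    params, jobname = "demo",
                    nodes = 2, cpus_per_node = 2)

# Collect the results once the job array has finished.
res <- get_slurm_out(sjob, outtype = "raw")
```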

1333

High-Performance and Parallel Computing with R

Sim.DiffProc

Simulation of Diffusion Processes

A package for symbolic and numerical computations on scalar and multivariate systems of stochastic differential equations. It provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of these systems in both Itô and Stratonovich forms. Statistical analysis of SDEs is supported with parallel Monte Carlo and moment-equation methods. The package has enabled researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulation of the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996).

1334

High-Performance and Parallel Computing with R

snow (core)

Simple Network of Workstations

Support for simple parallel computing in R.
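A minimal sketch of the snow workflow, assuming the package is installed: start a socket cluster, distribute a computation, and shut the workers down.

```r
library(snow)

# Start a two-worker socket cluster on the local machine.
cl <- makeCluster(2, type = "SOCK")

# Apply a function over 1:4, with iterations distributed to the workers.
parSapply(cl, 1:4, function(i) i^2)  # 1 4 9 16

# Always release the worker processes when done.
stopCluster(cl)
```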

1335

High-Performance and Parallel Computing with R

snowfall

Easier cluster computing (based on snow)

Usability wrapper around snow for easier development of parallel R programs. This package offers, e.g., extended error checks and additional functions. All functions work in sequential mode, too, if no cluster is present or wished. The package is also designed as a connector to the cluster management tool sfCluster, but can also be used without it.

1336

High-Performance and Parallel Computing with R

snowFT

Fault Tolerant Simple Network of Workstations

Extension of the snow package supporting fault-tolerant and reproducible applications, as well as easy-to-use parallel programming: only one function is needed. Dynamic cluster size is also available.

1337

High-Performance and Parallel Computing with R

speedglm

Fitting Linear and Generalized Linear Models to Large Data Sets

Fitting linear models and generalized linear models to large data sets by updating algorithms.

1338

High-Performance and Parallel Computing with R

sprint

Simple Parallel R INTerface

SPRINT (Simple Parallel R INTerface) is a parallel framework for R. It provides a High Performance Computing (HPC) harness which allows R scripts to run on HPC clusters. SPRINT contains a library of selected R functions that have been parallelized. Functions are named after the original R function with the added prefix ‘p’, i.e. the parallel version of cor() in SPRINT is called pcor(). Calls to the parallel R functions are included directly in standard R scripts. SPRINT contains functions for correlation (pcor), partitioning around medoids (ppam), apply (papply), permutation testing (pmaxT), bootstrapping (pboot), random forest (prandomForest), rank product (pRP) and hamming distance (pstringdistmatrix).
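A sketch of the ‘p’-prefix convention described above, assuming SPRINT and its MPI dependencies are installed (the cleanup call is part of the SPRINT workflow as we understand it):

```r
library(sprint)

# pcor() is the parallel drop-in for cor(): same call shape,
# but the work is distributed across the MPI processes.
m <- matrix(rnorm(200), nrow = 20)
cm <- pcor(m)

# Shut down the SPRINT/MPI processes when finished.
pterminate()
```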

1339

High-Performance and Parallel Computing with R

sqldf

Manipulate R Data Frames Using SQL

The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. ‘RSQLite’, ‘RH2’, ‘RMySQL’ and ‘RPostgreSQL’ backends are supported.
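A minimal sketch of the call pattern described above, assuming sqldf is installed; note that sqldf maps the dots in R column names to underscores, so `Sepal.Length` is queried as `Sepal_Length`:

```r
library(sqldf)

# The "table" in the SQL statement is just an R data frame (iris);
# sqldf() loads it into a temporary database, runs the query, and
# returns an ordinary data frame.
sqldf("select Species, avg(Sepal_Length) as mean_sl
       from iris
       group by Species")
```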

1340

High-Performance and Parallel Computing with R

STAR

Spike Train Analysis with R

Functions to analyze neuronal spike trains from a single neuron or from several neurons recorded simultaneously.

1341

High-Performance and Parallel Computing with R

tm

Text Mining Package

A framework for text mining applications within R.

1342

High-Performance and Parallel Computing with R

toaster

Big Data in-Database Analytics that Scales with Teradata Aster Distributed Platform

A consistent set of tools to perform in-database analytics on the Teradata Aster Big Data Discovery Platform. toaster (a.k.a. ‘to Aster’) embraces a simple two-step approach: compute in Aster, then visualize and analyze in R. Its ‘compute’ functions use a combination of parallel SQL, SQL-MR and SQL-GR executing in the Aster database, a highly scalable parallel and distributed analytical platform. Then ‘create’ functions visualize results with boxplots, scatterplots, histograms, heatmaps, word clouds, maps, networks, or slope graphs. Advanced options such as faceting, coloring, labeling, and others are supported with most visualizations.

1343

High-Performance and Parallel Computing with R

varSelRF

Variable Selection using Random Forests

Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large sets of potentially highly correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).

1344

Machine Learning & Statistical Learning

ahaz

Regularization for semiparametric additive hazards regression

Computationally efficient procedures for regularized estimation with the semiparametric additive hazards regression model.

1345

Machine Learning & Statistical Learning

arules

Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.

1346

Machine Learning & Statistical Learning

BayesTree

Bayesian Additive Regression Trees

This is an implementation of BART: Bayesian Additive Regression Trees, by Chipman, George and McCulloch (2010).

1347

Machine Learning & Statistical Learning

biglasso

Extending Lasso Model Fitting to Big Data

Extends lasso and elastic-net model fitting to ultrahigh-dimensional, multi-gigabyte data sets that cannot be loaded into memory. It is much more memory- and computation-efficient than existing lasso-fitting packages like ‘glmnet’ and ‘ncvreg’, thus allowing for very powerful big data analysis even on an ordinary laptop.

1348

Machine Learning & Statistical Learning

bigRR

Generalized Ridge Regression (with special advantage for p >> n cases)

The package fits large-scale (generalized) ridge regression for various distributions of the response. The shrinkage parameters (lambdas) can be pre-specified or estimated using an internal update routine (fitting a heteroscedastic effects model, or HEM). It allows shrinking any subset of parameters in the model, and has a special computational advantage when the number of shrinkage parameters exceeds the number of observations. For example, the package is very useful for fitting large-scale omics data, such as high-throughput genotype data (genomics), gene expression data (transcriptomics), metabolomics data, etc.

1349

Machine Learning & Statistical Learning

bmrm

Bundle Methods for Regularized Risk Minimization Package

Bundle methods for minimization of convex and non-convex risk under L1 or L2 regularization. Implements the algorithm proposed by Teo et al. (JMLR 2010) as well as the extension proposed by Do and Artieres (JMLR 2012). The package comes with a large set of loss functions for machine learning, which makes it powerful for big data analysis. Applications include structured prediction, linear SVM, multiclass SVM, F-beta optimization, ROC optimization, ordinal regression, quantile regression, epsilon-insensitive regression, least mean squares, logistic regression, least absolute deviation regression (see package examples), etc., all with L1 and L2 regularization.

1350

Machine Learning & Statistical Learning

Boruta

Wrapper Algorithm for All Relevant Feature Selection

An all relevant feature selection wrapper algorithm. It finds relevant features by comparing original attributes’ importance with importance achievable at random, estimated using their permuted copies.

1351

Machine Learning & Statistical Learning

bst

Gradient Boosting

Functional gradient descent algorithm for a variety of convex and nonconvex loss functions, for both classical and robust regression and classification problems.

1352

Machine Learning & Statistical Learning

C50

C5.0 Decision Trees and Rule-Based Models

C5.0 decision trees and rule-based models for pattern recognition.

1353

Machine Learning & Statistical Learning

caret

Classification and Regression Training

Misc functions for training and plotting classification and regression models.
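A brief sketch of caret's uniform modeling interface, assuming the package (and the underlying model package) is installed: train() wraps many model types behind one call, here a k-nearest-neighbour classifier tuned with 5-fold cross-validation.

```r
library(caret)

# train() handles resampling and tuning uniformly across model types;
# the "knn" method and 5-fold CV here are illustrative choices.
fit <- train(Species ~ ., data = iris, method = "knn",
             trControl = trainControl(method = "cv", number = 5))

predict(fit, head(iris))  # class predictions for the first rows
```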

1354

Machine Learning & Statistical Learning

CORElearn

Classification, Regression and Feature Evaluation

A suite of machine learning algorithms written in C++ with an R interface; it contains several learning techniques for classification and regression. Predictive models include, e.g., classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. All predictions obtained with these models can be explained and visualized with the ‘ExplainPrediction’ package. This package is especially strong in feature evaluation, where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used for feature selection or discretization of numeric attributes. The OrdEval algorithm and its visualization are used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.

1355

Machine Learning & Statistical Learning

CoxBoost

Cox models by likelihood-based boosting for a single survival endpoint or competing risks

This package provides routines for fitting Cox models by likelihood-based boosting for a single endpoint or in the presence of competing risks.

1356

Machine Learning & Statistical Learning

Cubist

Rule And InstanceBased Regression Modeling

Regression modeling using rules with added instancebased corrections.

1357

Machine Learning & Statistical Learning

darch

Package for Deep Architectures and Restricted Boltzmann Machines

The darch package is built on the basis of the code from G. E. Hinton and R. R. Salakhutdinov (available under Matlab Code for deep belief nets). This package is for generating neural networks with many layers (deep architectures) and training them with the method introduced by the publications “A fast learning algorithm for deep belief nets” (G. E. Hinton, S. Osindero, Y. W. Teh (2006) doi:10.1162/neco.2006.18.7.1527) and “Reducing the dimensionality of data with neural networks” (G. E. Hinton, R. R. Salakhutdinov (2006) doi:10.1126/science.1127647). This method includes a pre-training with the contrastive divergence method published by G. E. Hinton (2002) doi:10.1162/089976602760128018 and a fine-tuning with commonly known training algorithms like backpropagation or conjugate gradients. Additionally, supervised fine-tuning can be enhanced with maxout and dropout, two recently developed techniques to improve fine-tuning for deep learning.

1358

Machine Learning & Statistical Learning

deepnet

deep learning toolkit in R

Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoders and so on.

1359

Machine Learning & Statistical Learning

e1071 (core)

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …

1360

Machine Learning & Statistical Learning

earth

Multivariate Adaptive Regression Splines

Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines”. (The term “MARS” is trademarked and thus not used in the name of the package.)

1361

Machine Learning & Statistical Learning

effects

Effect Displays for Linear, Generalized Linear, and Other Models

Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.

1362

Machine Learning & Statistical Learning

elasticnet

Elastic-Net for Sparse Estimation and Sparse PCA

This package provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.

1363

Machine Learning & Statistical Learning

ElemStatLearn

Data Sets, Functions and Examples from the Book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman

Useful when reading the above-mentioned book, in the documentation referred to as ‘the book’.

1364

Machine Learning & Statistical Learning

evclass

Evidential Distance-Based Classification

Different evidential distance-based classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule and the evidential neural network.

1365

Machine Learning & Statistical Learning

evtree

Evolutionary Learning of Globally Optimal Trees

Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. The ‘evtree’ package implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. CPU- and memory-intensive tasks are fully computed in C++ while the ‘partykit’ package is leveraged to represent the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions.

1366

Machine Learning & Statistical Learning

FCNN4R

Fast Compressed Neural Networks for R

Provides an interface to kernel routines from the FCNN C++ library. FCNN is based on a completely new Artificial Neural Network representation that offers unmatched efficiency, modularity, and extensibility. FCNN4R provides standard teaching (backpropagation, Rprop, simulated annealing, stochastic gradient) and pruning algorithms (minimum magnitude, Optimal Brain Surgeon), but it is first and foremost an efficient computational engine. Users can easily implement their algorithms by taking advantage of fast gradient computing routines, as well as network reconstruction functionality (removing weights and redundant neurons, reordering inputs, merging networks). Networks can be exported to C functions in order to integrate them into virtually any software solution.

1367

Machine Learning & Statistical Learning

frbs

Fuzzy Rule-Based Systems for Classification and Regression Tasks

An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows constructing an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, squares methods, etc. Furthermore, in this version we provide a universal framework named ‘frbsPMML’, which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language to provide a standard for describing models produced by data mining and machine learning algorithms. Therefore, we are allowed to export and import an FRBS model to/from ‘frbsPMML’. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.

1368

Machine Learning & Statistical Learning

GAMBoost

Generalized linear and additive models by likelihood-based boosting

This package provides routines for fitting generalized linear and generalized additive models by likelihood-based boosting, using penalized B-splines.

1369

Machine Learning & Statistical Learning

gamboostLSS

Boosting Methods for ‘GAMLSS’

Boosting models for fitting generalized additive models for location, scale and shape (‘GAMLSS’) to potentially high-dimensional data.

1370

Machine Learning & Statistical Learning

gbm (core)

Generalized Boosted Regression Models

An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart).
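As a minimal usage sketch (the formula, data set, and tuning values below are illustrative choices, not from the package documentation):

```r
library(gbm)

# Boosted logistic (bernoulli) model on a built-in data set;
# mtcars$vs is already coded 0/1.
set.seed(1)
fit <- gbm(vs ~ mpg + wt + hp, data = mtcars,
           distribution = "bernoulli",
           n.trees = 500, interaction.depth = 2, shrinkage = 0.01)

# Relative influence of each predictor.
summary(fit)
```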

1371

Machine Learning & Statistical Learning

ggRandomForests

Visually Exploring Random Forests

Graphic elements for exploring random forests built with the ‘randomForest’ or ‘randomForestSRC’ package, for survival, regression and classification forests, with plotting via the ‘ggplot2’ package.

1372

Machine Learning & Statistical Learning

glmnet

Lasso and Elastic-Net Regularized Generalized Linear Models

Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model. Two recent additions are the multiple-response Gaussian, and the grouped multinomial regression. The algorithm uses cyclical coordinate descent in a pathwise fashion, as described in the paper linked to via the URL below.
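A minimal sketch of fitting a lasso path and cross-validating the penalty (simulated data; all values are illustrative):

```r
library(glmnet)

# Simulated design matrix and response.
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100)
y <- rnorm(100)

fit <- glmnet(x, y, alpha = 1)   # alpha = 1: lasso; alpha = 0: ridge
plot(fit)                        # coefficient paths over the lambda sequence

cvfit <- cv.glmnet(x, y)         # cross-validate the penalty parameter
coef(cvfit, s = "lambda.min")    # coefficients at the best lambda
```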

1373

Machine Learning & Statistical Learning

glmpath

L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model

A path-following algorithm for L1-regularized generalized linear models and the Cox proportional hazards model.

1374

Machine Learning & Statistical Learning

GMMBoost

Likelihood-Based Boosting for Generalized Mixed Models

Likelihood-based boosting for generalized mixed models.

1375

Machine Learning & Statistical Learning

gmum.r

GMUM Machine Learning Group Package

Direct R interface to Support Vector Machine libraries (‘LIBSVM’ and ‘SVMLight’) and efficient C++ implementations of Growing Neural Gas and models developed by ‘GMUM’ group (Cross Entropy Clustering and 2eSVM).

1376

Machine Learning & Statistical Learning

gradDescent

Gradient Descent for Regression Tasks

An implementation of various learning algorithms based on gradient descent for dealing with regression tasks. The variants of the gradient descent algorithm are: Mini-Batch Gradient Descent (MBGD), which is an optimization that uses the training data partially to reduce the computation load; Stochastic Gradient Descent (SGD), which is an optimization that uses a single random data point per step to reduce the computation load drastically; Stochastic Average Gradient (SAG), which is an SGD-based algorithm that averages the stochastic steps; Momentum Gradient Descent (MGD), which is an optimization to speed up gradient descent learning; Accelerated Gradient Descent (AGD), which is an optimization to accelerate gradient descent learning; Adagrad, which is a gradient-descent-based algorithm that accumulates previous costs to do adaptive learning; Adadelta, which is a gradient-descent-based algorithm that uses a Hessian approximation to do adaptive learning; RMSprop, which is a gradient-descent-based algorithm that combines the adaptive learning abilities of Adagrad and Adadelta; and Adam, which is a gradient-descent-based algorithm that uses mean and variance moment estimates to do adaptive learning.

1377

Machine Learning & Statistical Learning

grplasso

Fitting user-specified models with Group Lasso penalty

Fits user-specified (GLM) models with Group Lasso penalty.

1378

Machine Learning & Statistical Learning

grpreg

Regularization Paths for Regression Models with Grouped Covariates

Efficient algorithms for fitting the regularization path of linear or logistic regression models with grouped penalties. This includes group selection methods such as group lasso, group MCP, and group SCAD as well as bilevel selection methods such as the group exponential lasso, the composite MCP, and the group bridge.

1379

Machine Learning & Statistical Learning

h2o

R Interface for H2O

R scripting functionality for H2O, the open source math engine for big data that computes parallel distributed machine learning algorithms such as generalized linear models, gradient boosting machines, random forests, and neural networks (deep learning) within various cluster environments.

1380

Machine Learning & Statistical Learning

hda

Heteroscedastic Discriminant Analysis

Functions to perform dimensionality reduction for classification if the covariance matrices of the classes are unequal.

1381

Machine Learning & Statistical Learning

hdi

High-Dimensional Inference

Implementation of multiple approaches to perform inference in high-dimensional models.

1382

Machine Learning & Statistical Learning

hdm

High-Dimensional Metrics

Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals are provided for various low-dimensional causal/structural parameters appearing in high-dimensional approximately sparse models. Functions are included for fitting heteroscedasticity-robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty.

1383

Machine Learning & Statistical Learning

ICEbox

Individual Conditional Expectation Plot Toolbox

Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. ICE plots refine Friedman’s partial dependence plot by graphing the functional relationship between the predicted response and a covariate of interest for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate of interest, suggesting where and to what extent heterogeneities may exist.

1384

Machine Learning & Statistical Learning

ipred

Improved Predictors

Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling-based estimators of prediction error.

1385

Machine Learning & Statistical Learning

kernlab (core)

KernelBased Machine Learning Lab

Kernelbased machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
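A minimal sketch of training a kernel SVM with ksvm() (the kernel choice, cost, and data set here are illustrative):

```r
library(kernlab)

# RBF-kernel SVM with 5-fold cross-validation.
fit <- ksvm(Species ~ ., data = iris,
            kernel = "rbfdot", C = 1, cross = 5)

cross(fit)                 # cross-validation error estimate
predict(fit, iris[1:5, ])  # predicted classes for the first rows
```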

1386

Machine Learning & Statistical Learning

klaR

Classification and visualization

Miscellaneous functions for classification and visualization developed at the Fakultaet Statistik, Technische Universitaet Dortmund.

1387

Machine Learning & Statistical Learning

lars

Least Angle Regression, Lasso and Forward Stagewise

Efficient procedures for fitting an entire lasso sequence with the cost of a single least squares fit. Least angle regression and infinitesimal forward stagewise regression are related to the lasso, as described in the paper below.

1388

Machine Learning & Statistical Learning

lasso2

L1 constrained estimation aka ‘lasso’

Routines and documentation for solving regression problems while imposing an L1 constraint on the estimates, based on the algorithm of Osborne et al. (1998).

1389

Machine Learning & Statistical Learning

LiblineaR

Linear Predictive Models Based on the ‘LIBLINEAR’ C/C++ Library

A wrapper around the ‘LIBLINEAR’ C/C++ library for machine learning (available at http://www.csie.ntu.edu.tw/~cjlin/liblinear). ‘LIBLINEAR’ is a simple library for solving large-scale regularized linear classification and regression. It currently supports L2-regularized classification (such as logistic regression, L2-loss linear SVM and L1-loss linear SVM) as well as L1-regularized classification (such as L2-loss linear SVM and logistic regression) and L2-regularized support vector regression (with L1- or L2-loss). The main features of LiblineaR include multi-class classification (one-vs-the-rest, and the Crammer & Singer method), cross-validation for model selection, probability estimates (logistic regression only) and weights for unbalanced data. The estimation of the models is particularly fast compared to other libraries.

1390

Machine Learning & Statistical Learning

LogicForest

Logic Forest

Two classification ensemble methods based on logic regression models. LogForest uses a bagging approach to construct an ensemble of logic regression models. LBoost uses a combination of boosting and cross-validation to construct an ensemble of logic regression models. Both methods are used for classification of binary responses based on binary predictors and for identification of important variables and variable interactions predictive of a binary outcome.

1391

Machine Learning & Statistical Learning

LogicReg

Logic Regression

Routines for fitting Logic Regression models.

1392

Machine Learning & Statistical Learning

LTRCtrees

Survival Trees to Fit Left-Truncated and Right-Censored and Interval-Censored Survival Data

Recursive partition algorithms designed for fitting survival trees with left-truncated and right-censored (LTRC) data, as well as interval-censored data. The LTRC trees can also be used to fit survival trees with time-varying covariates.

1393

Machine Learning & Statistical Learning

maptree

Mapping, pruning, and graphing tree models

Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees.

1394

Machine Learning & Statistical Learning

mboost (core)

ModelBased Boosting

Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.

1395

Machine Learning & Statistical Learning

mlr

Machine Learning in R

Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions. There is also an experimental extension for survival analysis, clustering and general, example-specific cost-sensitive learning. Generic resampling, including cross-validation, bootstrapping and subsampling. Hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems. Filter and wrapper methods for feature selection. Extension of basic learners with additional operations common in machine learning, also allowing for easy nested resampling. Most operations can be parallelized.

1396

Machine Learning & Statistical Learning

MXM

Feature Selection (Including Multiple Solutions) and Bayesian Networks

Feature selection methods for identifying minimal, statistically equivalent and equally predictive feature subsets. Bayesian network algorithms and related functions are also included. The package name ‘MXM’ stands for “Mens eX Machina”, meaning “Mind from the Machine” in Latin. Reference: Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets, Lagani, V. and Athineou, G. and Farcomeni, A. and Tsagris, M. and Tsamardinos, I. (2017). Journal of Statistical Software, 80(7). doi:10.18637/jss.v080.i07.

1397

Machine Learning & Statistical Learning

ncvreg

Regularization Paths for SCAD and MCP Penalized Regression Models

Efficient algorithms for fitting regularization paths for linear or logistic regression models penalized by MCP or SCAD, with optional additional L2 penalty.

1398

Machine Learning & Statistical Learning

nnet (core)

Feed-Forward Neural Networks and Multinomial Log-Linear Models

Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.
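A minimal sketch of fitting such a single-hidden-layer network (hidden-layer size, weight decay, and iteration count are illustrative):

```r
library(nnet)

# Single-hidden-layer network for a 3-class problem.
set.seed(1)
fit <- nnet(Species ~ ., data = iris,
            size = 3, decay = 1e-3, maxit = 200, trace = FALSE)

# Confusion matrix on the training data.
table(predicted = predict(fit, iris, type = "class"),
      actual = iris$Species)
```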

1399

Machine Learning & Statistical Learning

OneR

One Rule Machine Learning Classification Algorithm with Enhancements

Implements the One Rule (OneR) Machine Learning classification algorithm (Holte, R.C. (1993) doi:10.1023/A:1022631118932) with enhancements for sophisticated handling of numeric data and missing values together with extensive diagnostic functions. It is useful as a baseline for machine learning models and the rules are often helpful heuristics.

1400

Machine Learning & Statistical Learning

opusminer

OPUS Miner Algorithm for Filtered Top-k Association Discovery

Provides a simple R interface to the OPUS Miner algorithm (implemented in C++) for finding the top-k productive, non-redundant itemsets from transaction data. The OPUS Miner algorithm uses the OPUS search algorithm to efficiently discover the key associations in transaction data, in the form of self-sufficient itemsets, using either leverage or lift. See http://i.giwebb.com/index.php/research/associationdiscovery/ for more information in relation to the OPUS Miner algorithm.

1401

Machine Learning & Statistical Learning

pamr

Pam: prediction analysis for microarrays

Some functions for sample classification in microarrays.

1402

Machine Learning & Statistical Learning

party

A Laboratory for Recursive Partytioning

A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well-defined theory of conditional inference procedures. This nonparametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) doi:10.1198/106186006X133933, Zeileis et al. (2008) doi:10.1198/106186008X319331 and Strobl et al. (2007) doi:10.1186/1471-2105-8-25.
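A minimal sketch of the ctree() workflow described above (the data set is an illustrative built-in example):

```r
library(party)

# Conditional inference tree: splits chosen by permutation tests.
fit <- ctree(Species ~ ., data = iris)
print(fit)                            # splits with associated test statistics
predict(fit, newdata = iris[1:5, ])   # predicted classes
```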

1403

Machine Learning & Statistical Learning

partykit

A Toolkit for Recursive Partytioning

A toolkit with infrastructure for representing, summarizing, and visualizing tree-structured regression and classification models. This unified infrastructure can be used for reading/coercing tree models from different sources (‘rpart’, ‘RWeka’, ‘PMML’) yielding objects that share functionality for print()/plot()/predict() methods. Furthermore, new and improved reimplementations of conditional inference trees (ctree()) and model-based recursive partitioning (mob()) from the ‘party’ package are provided based on the new infrastructure.

1404

Machine Learning & Statistical Learning

pdp

Partial Dependence Plots

A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.

1405

Machine Learning & Statistical Learning

penalized

L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model

Fitting possibly high-dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox Proportional Hazards model. Cross-validation routines allow optimization of the tuning parameters.

1406

Machine Learning & Statistical Learning

penalizedLDA

Penalized Classification using Fisher’s Linear Discriminant

Implements the penalized LDA proposal of “Witten and Tibshirani (2011), Penalized classification using Fisher’s linear discriminant, to appear in Journal of the Royal Statistical Society, Series B”.

1407

Machine Learning & Statistical Learning

penalizedSVM

Feature Selection SVM using penalty functions

This package provides feature selection for SVMs using penalty functions. The smoothly clipped absolute deviation (SCAD), ‘L1-norm’, ‘Elastic Net’ (‘L1-norm’ and ‘L2-norm’) and ‘Elastic SCAD’ (SCAD and ‘L2-norm’) penalties are available. The tuning parameters can be found using either a fixed grid or an interval search.

1408

Machine Learning & Statistical Learning

plotmo

Plot a Model’s Response and Residuals

Plot model surfaces for a wide variety of models using partial dependence plots and other techniques. Also plot model residuals and other information on the model.

1409

Machine Learning & Statistical Learning

quantregForest

Quantile Regression Forests

Quantile Regression Forests is a tree-based ensemble method for estimation of conditional quantiles. It is particularly well suited for high-dimensional data. Predictor variables of mixed classes can be handled. The package is dependent on the package ‘randomForest’, written by Andy Liaw.

1410

Machine Learning & Statistical Learning

randomForest (core)

Breiman and Cutler’s Random Forests for Classification and Regression

Classification and regression based on a forest of trees using random inputs.
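A minimal usage sketch (data set and settings are illustrative):

```r
library(randomForest)

set.seed(42)
fit <- randomForest(Species ~ ., data = iris,
                    ntree = 500, importance = TRUE)

print(fit)        # OOB error estimate and confusion matrix
importance(fit)   # permutation and Gini importance measures
```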

1411

Machine Learning & Statistical Learning

randomForestSRC

Random Forests for Survival, Regression, and Classification (RFSRC)

A unified treatment of Breiman’s random forests for survival, regression and classification problems based on Ishwaran and Kogalur’s random survival forests (RSF) package. Now extended to include multivariate and unsupervised forests. Also includes quantile regression forests for univariate and multivariate training/testing settings. The package runs in both serial and parallel (OpenMP) modes.

1412

Machine Learning & Statistical Learning

ranger

A Fast Implementation of Random Forests

A fast implementation of Random Forests, particularly suited for high-dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class ‘gwaa.data’ (R package ‘GenABEL’) can be directly analyzed.
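A minimal sketch of the ranger() interface (the importance mode and tree count are illustrative choices):

```r
library(ranger)

fit <- ranger(Species ~ ., data = iris,
              num.trees = 500, importance = "impurity")

fit$prediction.error     # out-of-bag prediction error
fit$variable.importance  # impurity-based importance
```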

1413

Machine Learning & Statistical Learning

rattle

Graphical User Interface for Data Science in R

The R Analytic Tool To Learn Easily (Rattle) provides a Gnome (RGtk2) based interface to R functionality for data science. The aim is to provide a simple and intuitive interface that allows a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (Predictive Model Markup Language) or as scores, all while knowing little about R. All R commands are logged and commented through the log tab, so they are available to the user as a script file or as an aid for the user to learn R or to copy-and-paste directly into R itself. Rattle also exports a number of utility functions, and the graphical user interface, invoked as rattle(), does not need to be run to deploy these.

1414

Machine Learning & Statistical Learning

Rborist

Extensible, Parallelizable Implementation of the Random Forest Algorithm

Scalable decision tree training and prediction.

1415

Machine Learning & Statistical Learning

RcppDL

Deep Learning Methods via Rcpp

This package is based on the C++ code from Yusuke Sugomori, which implements basic machine learning methods with many layers (deep learning), including dA (Denoising Autoencoder), SdA (Stacked Denoising Autoencoder), RBM (Restricted Boltzmann machine) and DBN (Deep Belief Nets).

1416

Machine Learning & Statistical Learning

rda

Shrunken Centroids Regularized Discriminant Analysis

Shrunken Centroids Regularized Discriminant Analysis for classification in high-dimensional data.

1417

Machine Learning & Statistical Learning

rdetools

Relevant Dimension Estimation (RDE) in Feature Spaces

The package provides functions for estimating the relevant dimension of a data set in feature spaces, applications to model selection, graphical illustrations and prediction.

1418

Machine Learning & Statistical Learning

REEMtree

Regression Trees with Random Effects for Longitudinal (Panel) Data

This package estimates regression trees with random effects as a way to use data mining techniques to describe longitudinal or panel data.

1419

Machine Learning & Statistical Learning

relaxo

Relaxed Lasso

Relaxed Lasso is a generalisation of the Lasso shrinkage technique for linear regression. Both variable selection and parameter estimation are achieved by regular Lasso, yet the two steps do not necessarily use the same penalty parameter. The results include all standard Lasso solutions but often allow for sparser models with similar or even slightly better predictive performance when many predictor variables are present. The package depends on the LARS package.

1420

Machine Learning & Statistical Learning

rgenoud

R Version of GENetic Optimization Using Derivatives

A genetic algorithm plus derivative optimizer.

1421

Machine Learning & Statistical Learning

rgp

R genetic programming framework

RGP is a simple modular Genetic Programming (GP) system built in pure R. In addition to general GP tasks, the system supports Symbolic Regression by GP through the familiar R model formula interface. GP individuals are represented as R expressions; an (optional) type system enables domain-specific function sets containing functions of diverse domain and range types. A basic set of genetic operators for variation (mutation and crossover) and selection is provided.

1422

Machine Learning & Statistical Learning

RLT

Reinforcement Learning Trees

Random forest with a variety of additional features for regression, classification and survival analysis. The features include: parallel computing with OpenMP, embedded model for selecting the splitting variable (based on Zhu, Zeng & Kosorok, 2015), subject weight, variable weight, tracking subjects used in each tree, etc.

1423

Machine Learning & Statistical Learning

Rmalschains

Continuous Optimization using Memetic Algorithms with Local Search Chains (MALSChains) in R

An implementation of an algorithm family for continuous optimization called memetic algorithms with local search chains (MALSChains). Memetic algorithms are hybridizations of genetic algorithms with local search methods. They are especially suited for continuous optimization.

1424

Machine Learning & Statistical Learning

rminer

Data Mining Classification and Regression Methods

Facilitates the use of data mining algorithms in classification and regression (including time series forecasting) tasks by presenting a short and coherent set of functions. Versions: 1.4.2: new NMAE metric, “xgboost” and “cv.glmnet” models (16 classification and 18 regression models); 1.4.1: new tutorial and more robust version; 1.4: new classification and regression models/algorithms, with a total of 14 classification and 15 regression methods, including Decision Trees, Neural Networks, Support Vector Machines, Random Forests, Bagging and Boosting; 1.3 and 1.3.1: new classification and regression metrics (improved mmetric function); 1.2: new input importance methods (improved Importance function); 1.0: first version.

1425

Machine Learning & Statistical Learning

rnn

Recurrent Neural Network

Implementation of a Recurrent Neural Network in R.

1426

Machine Learning & Statistical Learning

ROCR

Visualizing the Performance of Scoring Classifiers

ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of tradeoff visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
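The three-command workflow can be sketched as follows (the scores and labels below are toy illustrative data):

```r
library(ROCR)

# Toy classifier scores and true 0/1 labels.
scores <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.1)
labels <- c(1,   1,   0,   1,   0,   0)

pred <- prediction(scores, labels)
perf <- performance(pred, "tpr", "fpr")  # ROC curve coordinates
plot(perf, colorize = TRUE)              # curve colored by cutoff

performance(pred, "auc")@y.values[[1]]   # area under the ROC curve
```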

1427

Machine Learning & Statistical Learning

RoughSets

Data Analysis Using Rough Set and Fuzzy Rough Set Theories

Implementations of algorithms for data analysis based on the rough set theory (RST) and the fuzzy rough set theory (FRST). We not only provide implementations for the basic concepts of RST and FRST but also popular algorithms that derive from those theories. The methods included in the package can be divided into several categories based on their functionality: discretization, feature selection, instance selection, rule induction and classification based on nearest neighbors. RST was introduced by Zdzisław Pawlak in 1982 as a sophisticated mathematical tool to model and process imprecise or incomplete information. By using the indiscernibility relation for objects/instances, RST does not require additional parameters to analyze the data. FRST is an extension of RST that combines concepts of vagueness and indiscernibility expressed with fuzzy sets (as proposed by Zadeh in 1965) and RST.

1428

Machine Learning & Statistical Learning

rpart (core)

Recursive Partitioning and Regression Trees

Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
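A minimal sketch of growing and pruning a classification tree (the data set and complexity parameter are illustrative):

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")
printcp(fit)                    # complexity table used for pruning

pruned <- prune(fit, cp = 0.1)  # prune back at a chosen complexity
predict(pruned, iris[1:3, ], type = "class")
```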

1429

Machine Learning & Statistical Learning

RPMM

Recursively Partitioned Mixture Model

Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a modelbased clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.

1430

Machine Learning & Statistical Learning

RSNNS

Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)

The Stuttgart Neural Network Simulator (SNNS) is a library containing many standard implementations of neural networks. This package wraps the SNNS functionality to make it available from within R. Using the ‘RSNNS’ low-level interface, all of the algorithmic functionality and flexibility of SNNS can be accessed. Furthermore, the package contains a convenient high-level interface, so that the most common neural network topologies and learning algorithms integrate seamlessly into R.

1431

Machine Learning & Statistical Learning

RWeka

R/Weka Interface

An R interface to Weka (Version 3.9.1). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data preprocessing, classification, regression, clustering, association rules, and visualization. Package ‘RWeka’ contains the interface code, the Weka jar is in a separate package ‘RWekajars’. For more information on Weka see http://www.cs.waikato.ac.nz/ml/weka/.

1432

Machine Learning & Statistical Learning

RXshrink

Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression

Identify and display TRACEs for a specified shrinkage path and determine the extent of shrinkage most likely, under normal distribution theory, to produce an optimal reduction in MSE Risk in estimates of regression (beta) coefficients.

1433

Machine Learning & Statistical Learning

sda

Shrinkage Discriminant Analysis and CAT Score Variable Selection

Provides an efficient framework for high-dimensional linear and diagonal discriminant analysis with variable selection. The classifier is trained using James-Stein-type shrinkage estimators and predictor variables are ranked using correlation-adjusted t-scores (CAT scores). Variable selection error is controlled using false non-discovery rates or higher criticism.

1434

Machine Learning & Statistical Learning

SIS

Sure Independence Screening

Variable selection techniques are essential tools for model selection and estimation in high-dimensional statistical models. Through this publicly available package, we provide a unified environment to carry out variable selection using iterative sure independence screening (SIS) and all of its variants in generalized linear models and the Cox proportional hazards model.

1435

Machine Learning & Statistical Learning

spa

Implements The Sequential Predictions Algorithm

Implements the Sequential Predictions Algorithm.

1436

Machine Learning & Statistical Learning

stabs

Stability Selection with Error Control

Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both standard stability selection (Meinshausen & Buhlmann, 2010, doi:10.1111/j.1467-9868.2010.00740.x) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, doi:10.1111/j.1467-9868.2011.01034.x) are implemented. The package can be combined with arbitrary user-specified variable selection approaches.

1437

Machine Learning & Statistical Learning

SuperLearner

Super Learner Prediction

Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.

1438

Machine Learning & Statistical Learning

svmpath

The SVM Path Algorithm

Computes the entire regularization path for the two-class SVM classifier with essentially the same cost as a single SVM fit.

1439

Machine Learning & Statistical Learning

tensorflow

R Interface to ‘TensorFlow’

Interface to ‘TensorFlow’ https://www.tensorflow.org/, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

1440

Machine Learning & Statistical Learning

tgp

Bayesian Treed Gaussian Process Models

Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1d and 2d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multiresolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.

1441

Machine Learning & Statistical Learning

tree

Classification and Regression Trees

Classification and regression trees.

1442

Machine Learning & Statistical Learning

varSelRF

Variable Selection using Random Forests

Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).

1443

Machine Learning & Statistical Learning

vcrpart

Tree-Based Varying Coefficient Regression for Generalized Linear and Ordinal Mixed Models

Recursive partitioning for varying coefficient generalized linear models and ordinal linear mixed models. Special features are coefficient-wise partitioning, non-varying coefficients and partitioning of time-varying variables in longitudinal regression.

1444

Machine Learning & Statistical Learning

wsrf

Weighted Subspace Random Forest for Classification

A parallel implementation of Weighted Subspace Random Forest. The Weighted Subspace Random Forest algorithm was proposed in the International Journal of Data Warehousing and Mining by Baoxun Xu, Joshua Zhexue Huang, Graham Williams, Qiang Wang, and Yunming Ye (2012) doi:10.4018/jdwm.2012040103. The algorithm can classify very high-dimensional data with random forests built using small subspaces. A novel variable weighting method is used for variable subspace selection in place of the traditional random variable sampling. This new approach is particularly useful in building models from high-dimensional data.

1445

Machine Learning & Statistical Learning

xgboost

Extreme Gradient Boosting

Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) doi:10.1145/2939672.2939785. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. It can automatically do parallel computation on a single machine, which can be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users can also easily define their own objectives.
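A short sketch of the high-level interface, following the package's standard mushroom-classification example (binary logistic objective; the specific tuning values are illustrative):

```r
# Train a small boosted-tree classifier with xgboost.
library(xgboost)
data(agaricus.train, package = "xgboost")  # sparse feature matrix + labels

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nrounds = 2,
               objective = "binary:logistic")
pred <- predict(bst, agaricus.train$data)  # predicted probabilities
```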

1446

Medical Image Analysis

adaptsmoFMRI

Adaptive Smoothing of FMRI Data

This package contains R functions for estimating the blood oxygenation level dependent (BOLD) effect by using functional Magnetic Resonance Imaging (fMRI) data, based on adaptive Gauss Markov random fields, for real as well as simulated data. The implemented simulations make use of efficient Markov Chain Monte Carlo methods.

1447

Medical Image Analysis

adimpro (core)

Adaptive Smoothing of Digital Images

Implements tools for manipulation of digital images and the Propagation-Separation approach by Polzehl and Spokoiny (2006) doi:10.1007/s00440-005-0464-1 for smoothing digital images, see Polzehl and Tabelow (2007) doi:10.18637/jss.v019.i01.

1448

Medical Image Analysis

AnalyzeFMRI (core)

Functions for analysis of fMRI datasets stored in the ANALYZE or NIFTI format

Functions for I/O, visualisation and analysis of functional Magnetic Resonance Imaging (fMRI) datasets stored in the ANALYZE or NIFTI format.

1449

Medical Image Analysis

arf3DS4 (core)

Activated Region Fitting, fMRI data analysis (3D)

Activated Region Fitting (ARF) is an analysis method for fMRI data.

1450

Medical Image Analysis

bayesImageS

Bayesian Methods for Image Segmentation using a Potts Model

Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, approximate Bayesian computation (ABC-MCMC and ABC-SMC), and Bayesian indirect likelihood (BIL).

1451

Medical Image Analysis

bayesm

Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

1452

Medical Image Analysis

brainR

Helper Functions to ‘misc3d’ and ‘rgl’ Packages for Brain Imaging

This includes functions for creating 3D and 4D images using ‘WebGL’, ‘rgl’, and ‘JavaScript’ commands. This package relies on the X toolkit (‘XTK’, https://github.com/xtk/X#readme).

1453

Medical Image Analysis

brainwaver

Basic wavelet analysis of multivariate time series with a visualisation and parametrisation using graph theory

This package computes the correlation matrix for each scale of a wavelet decomposition, namely the one performed by the R package waveslim (Whitcher, 2000). A hypothesis test is applied to each entry of one matrix in order to construct an adjacency matrix of a graph. The graph obtained is finally analysed using small-world theory (Watts and Strogatz, 1998) and using the computation of efficiency (Latora, 2001), tested using simulated attacks. The brainwaver project is complementary to the camba project for brain-data preprocessing. A collection of scripts (with a makefile) is available to download along with the brainwaver package; see information on the webpage mentioned below.

1454

Medical Image Analysis

cudaBayesreg

CUDA Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on NVIDIA GPUs. This package provides a CUDA implementation of a Bayesian multilevel model for the analysis of brain fMRI data. An fMRI data set consists of time series of volume data in 4D space. Typically, volumes are collected as slices of 64 x 64 voxels. Analysis of fMRI data often relies on fitting linear regression models at each voxel of the brain. The volume of the data to be processed, and the type of statistical analysis to perform in fMRI analysis, call for high-performance computing strategies. In this package, the CUDA programming model uses a separate thread for fitting a linear regression model at each voxel in parallel. The global statistical model implements a Gibbs Sampler for hierarchical linear models with a normal prior. This model has been proposed by Rossi, Allenby and McCulloch in ‘Bayesian Statistics and Marketing’, Chapter 3, and is referred to as ‘rhierLinearModel’ in the R package bayesm. A notebook equipped with an NVIDIA ‘GeForce 8400M GS’ card having Compute Capability 1.1 has been used in the tests. The data sets used in the package’s examples are available in the separate package cudaBayesregData.

1455

Medical Image Analysis

DATforDCEMRI (core)

Deconvolution Analysis Tool for Dynamic Contrast Enhanced MRI

This package performs voxelwise deconvolution analysis of DCE-MRI contrast agent concentration versus time data and generates the Impulse Response Function, which can be used to approximate commonly utilized kinetic parameters such as Ktrans and ve. An interactive advanced voxel diagnosis tool (AVDT) is also provided to facilitate easy navigation of voxelwise data.

1456

Medical Image Analysis

dcemriS4 (core)

A Package for Image Analysis of DCE-MRI (S4 Implementation)

A collection of routines and documentation that allows one to perform voxelwise quantitative analysis of dynamic contrast-enhanced MRI (DCE-MRI) and diffusion-weighted imaging (DWI) data, with emphasis on oncology applications.

1457

Medical Image Analysis

divest (core)

Get Images Out of DICOM Format Quickly

Provides tools to convert DICOM-format files to NIfTI-1 format.

1458

Medical Image Analysis

dpmixsim (core)

Dirichlet Process Mixture model simulation for clustering and image segmentation

The package implements a Dirichlet Process Mixture (DPM) model for clustering and image segmentation. The DPM model is a Bayesian nonparametric methodology that relies on MCMC simulations for exploring mixture models with an unknown number of components. The code implements conjugate models with normal structure (conjugate normal-normal DP mixture model). The package’s applications are oriented towards the classification of magnetic resonance images according to tissue type or region of interest.

1459

Medical Image Analysis

dti (core)

Analysis of Diffusion Weighted Imaging (DWI) Data

Diffusion Weighted Imaging (DWI) is a Magnetic Resonance Imaging modality that measures diffusion of water in tissues like the human brain. The package contains R functions to process diffusion-weighted data. The functionality includes diffusion tensor imaging (DTI), diffusion kurtosis imaging (DKI), modeling for high angular resolution diffusion weighted imaging (HARDI) using Q-ball reconstruction and tensor mixture models, several methods for structural adaptive smoothing including POAS and msPOAS, and streamline fiber tracking for tensor and tensor mixture models. The package provides functionality to manipulate and visualize results in 2D and 3D.

1460

Medical Image Analysis

edfReader (core)

Reading EDF(+) and BDF(+) Files

Reads European Data Format files EDF and EDF+, see http://www.edfplus.info, BioSemi Data Format files BDF, see http://www.biosemi.com/faq/file_format.htm, and BDF+ files, see http://www.teuniz.net/edfbrowser/bdfplus%20format%20description.html. The files are read in two steps: first the header is read and then the signals (using the header object as a parameter).

1461

Medical Image Analysis

eegkit (core)

Toolkit for Electroencephalography Data

Analysis and visualization tools for electroencephalography (EEG) data. Includes functions for plotting (a) EEG caps, (b) single- and multi-channel EEG time courses, and (c) EEG spatial maps. Also includes smoothing and Independent Component Analysis functions for EEG data analysis, and a function for simulating event-related potential EEG data.

1462

Medical Image Analysis

fmri (core)

Analysis of fMRI Experiments

Contains R functions to perform an fMRI analysis as described in Tabelow et al. (2006) doi:10.1016/j.neuroimage.2006.06.029, Polzehl et al. (2010) doi:10.1016/j.neuroimage.2010.04.241, and Tabelow and Polzehl (2011) doi:10.18637/jss.v044.i11.

1463

Medical Image Analysis

fslr

Wrapper Functions for FSL (‘FMRIB’ Software Library) from Functional MRI of the Brain (‘FMRIB’)

Wrapper functions that interface with ‘FSL’ http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/, a powerful and commonly-used ‘neuroimaging’ software, using system commands. The goal is to be able to interface with ‘FSL’ completely in R, where you pass R objects of class ‘nifti’, implemented by package ‘oro.nifti’, and the function executes an ‘FSL’ command and returns an R object of class ‘nifti’ if desired.

1464

Medical Image Analysis

gdimap (core)

Generalized Diffusion Magnetic Resonance Imaging

Diffusion anisotropy has been used to characterize white matter neuronal pathways in the human brain, and infer global connectivity in the central nervous system. The package implements algorithms to estimate and visualize the orientation of neuronal pathways in model-free methods (q-space imaging methods). For estimating fibre orientations two methods have been implemented. One method implements fibre orientation detection through local maxima extraction. A second more robust method is based on directional statistical clustering of ODF voxel data. Fibre orientations in multiple fibre voxels are estimated using a mixture of von Mises-Fisher (vMF) distributions. This statistical estimation procedure is used to resolve crossing fibre configurations. Reconstruction of orientation distribution function (ODF) profiles may be performed using the standard generalized q-sampling imaging (GQI) approach, Garyfallidis’ GQI (GQI2) approach, or Aganj’s variant of the Q-ball imaging (CSA-QBI) approach. Procedures for the visualization of RGB maps, line maps and glyph maps of real diffusion magnetic resonance imaging (dMRI) datasets are included in the package.

1465

Medical Image Analysis

KATforDCEMRI (core)

Kinetic analysis and visualization of DCE-MRI data

Package for kinetic analysis of longitudinal voxelwise Dynamic Contrast Enhanced MRI data. Includes tools for visualization and exploration of voxelwise parametric maps.

1466

Medical Image Analysis

mmand (core)

Mathematical Morphology in Any Number of Dimensions

Provides tools for performing mathematical morphology operations, such as erosion and dilation, on data of arbitrary dimensionality. Can also be used for finding connected components, resampling, filtering, smoothing and other image processing-style operations.

1467

Medical Image Analysis

Morpho (core)

Calculations and Visualisations Related to Geometric Morphometrics

A toolset for Geometric Morphometrics and mesh processing. This includes (among other stuff) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semilandmarks and semi-automated surface landmark placement.

1468

Medical Image Analysis

mritc (core)

MRI Tissue Classification

Various methods for MRI tissue classification.

1469

Medical Image Analysis

neuroim (core)

Data Structures and Handling for Neuroimaging Data

A collection of data structures that represent volumetric brain imaging data. The focus is on basic data handling for 3D and 4D neuroimaging data. In addition, there are functions to read and write NIFTI files and limited support for reading AFNI files.

1470

Medical Image Analysis

neuRosim (core)

Functions to Generate fMRI Data Including Activated Data, Noise Data and Resting State Data

The package allows users to generate fMRI time series or 4D data. Some highlevel functions are created for fast data generation with only a few arguments and a diversity of functions to define activation and noise. For more advanced users it is possible to use the lowlevel functions and manipulate the arguments.

1471

Medical Image Analysis

occ (core)

Estimates PET neuroreceptor occupancies

This package provides a generic function for estimating positron emission tomography (PET) neuroreceptor occupancies from the total volumes of distribution of a set of regions of interest. Fitting methods include the simple ‘reference region’ and ‘ordinary least squares’ (sometimes known as occupancy plot) methods, as well as the more efficient ‘restricted maximum likelihood estimation’.

1472

Medical Image Analysis

oro.dicom (core)

Rigorous - DICOM Input / Output

Data input/output functions for data that conform to the Digital Imaging and Communications in Medicine (DICOM) standard, part of the Rigorous Analytics bundle.

1473

Medical Image Analysis

oro.nifti (core)

Rigorous - NIfTI + ANALYZE + AFNI : Input / Output

Functions for the input/output and visualization of medical imaging data that follow either the ANALYZE, NIfTI or AFNI formats. This package is part of the Rigorous Analytics bundle.
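A hedged sketch of typical usage, built on the package's documented readNIfTI()/writeNIfTI() and orthographic() functions; "scan.nii.gz" is a placeholder filename, not a file shipped with the package:

```r
# Read a NIfTI volume, view it, and write a copy back out.
library(oro.nifti)

img <- readNIfTI("scan.nii.gz")  # returns an object of S4 class 'nifti'
orthographic(img)                # axial/coronal/sagittal orthographic view
writeNIfTI(img, "scan_copy")     # writes scan_copy.nii.gz
```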

1474

Medical Image Analysis

PET

Simulation and Reconstruction of PET Images

This package implements different analytic/direct and iterative reconstruction methods of Peter Toft. It also offers the possibility to simulate PET data.

1475

Medical Image Analysis

PTAk

Principal Tensor Analysis on k Modes

A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD, also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package also includes some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions.

1476

Medical Image Analysis

RNifti (core)

Fast R and C++ Access to NIfTI Images

Provides very fast access to images stored in the NIfTI-1 file format http://www.nitrc.org/docman/view.php/26/64/nifti1.h, with seamless synchronisation between compiled C and interpreted R code. Not to be confused with ‘RNiftyReg’, which provides tools for image registration.

1477

Medical Image Analysis

RNiftyReg (core)

Image Registration Using the ‘NiftyReg’ Library

Provides an ‘R’ interface to the ‘NiftyReg’ image registration tools http://sourceforge.net/projects/niftyreg/. Linear and nonlinear registration are supported, in two and three dimensions.

1478

Medical Image Analysis

Rvcg (core)

Manipulations of Triangular Meshes Based on the ‘VCGLIB’ API

Operations on triangular meshes based on ‘VCGLIB’. This package integrates nicely with the R package ‘rgl’ to render the meshes processed by ‘Rvcg’. The Visualization and Computer Graphics Library (VCG for short) is an open source portable C++ templated library for manipulation, processing and displaying with OpenGL of triangle and tetrahedral meshes. The library, composed of more than 100k lines of code, is released under the GPL license, and is the base of most of the software tools of the Visual Computing Lab of the Italian National Research Council Institute ISTI http://vcg.isti.cnr.it, like ‘metro’ and ‘MeshLab’. The ‘VCGLIB’ source is pulled from trunk https://github.com/cnr-isti-vclab/vcglib and patched to work with options determined by the configure script as well as to work with the header files included by ‘RcppEigen’.

1479

Medical Image Analysis

tractor.base (core)

Read, Manipulate and Visualise Magnetic Resonance Images

Functions for working with magnetic resonance images. Analyze, NIfTI-1, NIfTI-2 and MGH format images can be read and written; DICOM files can only be read.

1480

Medical Image Analysis

waveslim

Basic wavelet routines for one-, two- and three-dimensional signal processing

Basic wavelet routines for time series (1D), image (2D) and array (3D) analysis. The code provided here is based on wavelet methodology developed in Percival and Walden (2000); Gencay, Selcuk and Whitcher (2001); the dual-tree complex wavelet transform (DTCWT) from Kingsbury (1999, 2001) as implemented by Selesnick; and Hilbert wavelet pairs (Selesnick 2001, 2002). All figures in chapters 4-7 of GSW (2001) are reproducible using this package and R code available at the book website(s) below.
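A minimal sketch of the 1D discrete wavelet transform via the package's dwt()/idwt() pair (the toy signal and parameter choices are illustrative):

```r
# Decompose a noisy sinusoid with the LA(8) wavelet, then invert.
library(waveslim)

set.seed(42)
x  <- sin(2 * pi * (1:512) / 32) + rnorm(512, sd = 0.1)  # toy signal
xd <- dwt(x, wf = "la8", n.levels = 4)  # wavelet + scaling coefficients
xr <- idwt(xd)                          # inverse transform recovers x
```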

1481

Meta-Analysis

altmeta

Alternative Meta-Analysis Methods

Provides alternative statistical methods for meta-analysis, including new heterogeneity tests and measures that are robust to outliers.

1482

Meta-Analysis

bamdit

Bayesian Meta-Analysis of Diagnostic Test Data

Functions for Bayesian meta-analysis of diagnostic test data which are based on a scale mixtures bivariate random-effects model.

1483

Meta-Analysis

bayesmeta

Bayesian Random-Effects Meta-Analysis

A collection of functions allowing one to derive the posterior distribution of the two parameters in a random-effects meta-analysis, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc.
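A hedged sketch of the package's main bayesmeta() function; the effect estimates and standard errors below are made-up toy numbers, not data from a real meta-analysis:

```r
# Bayesian random-effects meta-analysis of three toy study estimates.
library(bayesmeta)

y     <- c(-0.5, -0.2, -0.8)  # study effect estimates (e.g. log odds ratios)
sigma <- c(0.3, 0.25, 0.4)    # their standard errors
ma <- bayesmeta(y = y, sigma = sigma, labels = c("S1", "S2", "S3"))
ma$summary                    # posterior summaries of mu (effect) and tau
forestplot(ma)                # forest plot of study and combined estimates
```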

1484

Meta-Analysis

bmeta

Bayesian Meta-Analysis and Meta-Regression

Provides a collection of functions for conducting meta-analyses in a Bayesian context in R. The package includes functions for computing various effect size or outcome measures (e.g. odds ratios, mean difference and incidence rate ratio) for different types of data based on MCMC simulations. Users are allowed to fit fixed- and random-effects models with different priors to the data. Meta-regression can be carried out if effects of additional covariates are observed. Furthermore, the package provides functions for creating posterior distribution plots and forest plots to display main model output. Traceplots and some other diagnostic plots are also available for assessing model fit and performance.

1485

Meta-Analysis

bspmma

bspmma: Bayesian Semiparametric Models for Meta-Analysis

Some functions for nonparametric and semiparametric Bayesian models for random-effects meta-analysis.

1486

Meta-Analysis

CAMAN

Finite Mixture Models and Meta-Analysis Tools - Based on C.A.MAN

Tools for the analysis of finite semiparametric mixtures. These are useful when data is heterogeneous, e.g. in pharmacokinetics or meta-analysis. The NPMLE and VEM algorithms (flexible support size) and EM algorithms (fixed support size) are provided for univariate and bivariate data.

1487

Meta-Analysis

CIAAWconsensus

Isotope Ratio Meta-Analysis

Calculation of consensus values for atomic weights, isotope amount ratios, and isotopic abundances with the associated uncertainties using a multivariate meta-regression approach for consensus building.

1488

Meta-Analysis

clubSandwich

Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections

Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) http://www.statcan.gc.ca/pub/12-001-x/2002002/article/9058-eng.pdf and developed further by Pustejovsky and Tipton (2017) doi:10.1080/07350015.2016.1247004. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddlepoint corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), ivreg (from package ‘AER’), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and rma.mv() (from ‘metafor’).

1489

Meta-Analysis

compute.es

Compute Effect Sizes

This package contains several functions for calculating the most widely used effect sizes (ES), along with their variances, confidence intervals and p-values. The output includes ES’s of d (mean difference), g (unbiased estimate of d), r (correlation coefficient), z’ (Fisher’s z), and OR (odds ratio and log odds ratio). In addition, NNT (number needed to treat), U3, CLES (Common Language Effect Size) and Cliff’s Delta are computed. This package uses recommended formulas as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).

1490

Meta-Analysis

ConfoundedMeta

Sensitivity Analyses for Unmeasured Confounding in Meta-Analyses

Conducts sensitivity analyses for unmeasured confounding in random-effects meta-analysis per Mathur & VanderWeele (in preparation). Given output from a random-effects meta-analysis with a relative risk outcome, computes point estimates and inference for: (1) the proportion of studies with true causal effect sizes more extreme than a specified threshold of scientific significance; and (2) the minimum bias factor and confounding strength required to reduce to less than a specified threshold the proportion of studies with true effect sizes of scientifically significant size. Creates plots and tables for visualizing these metrics across a range of bias values. Provides tools to easily scrape study-level data from a published forest plot or summary table to obtain the needed estimates when these are not reported.

1491

Meta-Analysis

CopulaREMADA

Copula Mixed Effect Models for Bivariate and Trivariate Meta-Analysis of Diagnostic Test Accuracy Studies

Functions to implement the copula mixed models for bivariate and trivariate meta-analysis of diagnostic test accuracy studies.

1492

Meta-Analysis

CPBayes

Bayesian Meta-Analysis for Studying Cross-Phenotype Genetic Associations

A Bayesian meta-analysis method for studying cross-phenotype genetic associations. It uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. CPBayes is based on a spike-and-slab prior and is implemented via the Markov chain Monte Carlo technique of Gibbs sampling.

1493

Meta-Analysis

CRTSize

Sample Size Estimation Functions for Cluster Randomized Trials

Sample size estimation in cluster (group) randomized trials. Contains traditional power-based methods, empirical smoothing (Rotondi and Donner, 2009), and updated meta-analysis techniques (Rotondi and Donner, 2012).

1494

Meta-Analysis

dosresmeta

Multivariate Dose-Response Meta-Analysis

Estimates dose-response relations from summarized dose-response data and combines them according to principles of (multivariate) random-effects models.

1495

Meta-Analysis

EasyStrata

Evaluation of stratified genome-wide association meta-analysis results

This is a pipelining tool that facilitates evaluation and visualisation of stratified genome-wide association meta-analyses (GWAMAs) results data. It provides (i) statistical methods to test and to account for between-strata difference and to clump genome-wide results into independent loci and (ii) extended graphical features (e.g., Manhattan, Miami and QQ plots) tailored for stratified GWAMA results.

1496

Meta-Analysis

ecoreg

Ecological Regression using Aggregate and Individual Data

Estimating individual-level covariate-outcome associations using aggregate data (“ecological inference”) or a combination of aggregate and individual-level data (“hierarchical related regression”).

1497

Meta-Analysis

effsize

Efficient Effect Size Computation

A collection of functions to compute the standardized effect sizes for experiments (Cohen d, Hedges g, Cliff delta, Vargha-Delaney A). The computation algorithms have been optimized to allow efficient computation even with very large data sets.
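A minimal sketch of the package's documented cohen.d() and cliff.delta() functions on two simulated samples (the data are toy values):

```r
# Standardized effect sizes for two toy groups.
library(effsize)

set.seed(1)
treatment <- rnorm(50, mean = 10)
control   <- rnorm(50, mean = 9)
cohen.d(treatment, control)                            # Cohen's d + CI
cohen.d(treatment, control, hedges.correction = TRUE)  # Hedges' g
cliff.delta(treatment, control)                        # Cliff's delta
```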

1498

Meta-Analysis

epiR

Tools for the Analysis of Epidemiological Data

Tools for the analysis of epidemiological data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, and computing confidence intervals around incidence risk and incidence rate estimates. Miscellaneous functions for use in meta-analysis, diagnostic test interpretation, and sample size calculations.

1499

Meta-Analysis

esc

Effect Size Computation for Meta Analysis

Implementation of the web-based ‘Practical Meta-Analysis Effect Size Calculator’ from David B. Wilson (http://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-Home.php) in R. Based on the input, the effect size can be returned as standardized mean difference, Cohen’s f, Hedges’ g, Pearson’s r or Fisher’s transformation z, odds ratio or log odds, or eta squared effect size.

1500

Meta-Analysis

etma

Epistasis Test in Meta-Analysis

Traditional meta-regression based methods for meta-analysis data face the challenge of inconsistent estimates. This package proposes a new statistical method to detect epistasis using incomplete summary information; it has been shown not only to yield consistent evidence but also to increase power compared with the traditional method (a detailed tutorial is shown on the website).

1501

Meta-Analysis

exactmeta

Exact fixed-effect meta-analysis

Performs exact fixed-effect meta-analysis for rare events data without the need for artificial continuity correction.

1502

Meta-Analysis

extfunnel

Additional Funnel Plot Augmentations

This package contains the function extfunnel(), which produces a funnel plot including additional augmentations such as statistical significance contours and heterogeneity contours.

1503

Meta-Analysis

forestmodel

Forest Plots from Regression Models

Produces forest plots using ‘ggplot2’ from models produced by functions such as stats::lm(), stats::glm() and survival::coxph().

1504

Meta-Analysis

forestplot

Advanced Forest Plot Using ‘grid’ Graphics

A forest plot that allows for multiple confidence intervals per row, custom fonts for each text element, custom confidence intervals, text mixed with expressions, and more. The aim is to extend the use of forest plots beyond meta-analyses. This is a more general version of the original ‘rmeta’ package’s forestplot() function and relies heavily on the ‘grid’ package.
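A hedged sketch of the forestplot() function with three hand-entered rows; the estimates and intervals are illustrative numbers, not results from a real meta-analysis:

```r
# A three-row forest plot: two studies plus a summary diamond.
library(forestplot)

forestplot(labeltext = c("Study A", "Study B", "Summary"),
           mean  = c(0.50, 0.80, 0.65),
           lower = c(0.20, 0.50, 0.45),
           upper = c(0.80, 1.10, 0.85),
           is.summary = c(FALSE, FALSE, TRUE),  # draw last row as summary
           zero = 0)                            # reference (no-effect) line
```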

1505

Meta-Analysis

gap

Genetic Analysis Package

It is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates.

1506

Meta-Analysis

gemtc

Network Meta-Analysis Using Bayesian Methods

Network meta-analyses (mixed treatment comparisons) in the Bayesian framework using JAGS. Includes methods to assess heterogeneity and inconsistency, and a number of standard visualizations.
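A hedged sketch of the package's typical workflow; 'my.data' is a placeholder for an arm-level data frame with columns study, treatment, responders and sampleSize, and the likelihood/link choice is one common option:

```r
# Network meta-analysis of binary outcomes with gemtc (requires JAGS).
library(gemtc)

network <- mtc.network(data.ab = my.data)  # build the evidence network
model   <- mtc.model(network, likelihood = "binom", link = "logit")
results <- mtc.run(model)                  # MCMC sampling via JAGS
summary(results)                           # relative effect estimates
forest(relative.effect(results, t1 = "A")) # forest plot vs. treatment "A"
```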

1507

Meta-Analysis

getmstatistic

Quantifying Systematic Heterogeneity in Meta-Analysis

Quantifying systematic heterogeneity in meta-analysis using R. The M statistic aggregates heterogeneity information across multiple variants to identify systematic heterogeneity patterns and their direction of effect in meta-analysis. Its primary use is to identify outlier studies, which either show “null” effects or consistently show stronger or weaker genetic effects than average across the panel of variants examined in a GWAS meta-analysis. In contrast to conventional heterogeneity metrics (Q-statistic, I-squared and tau-squared), which measure random heterogeneity at individual variants, M measures systematic (non-random) heterogeneity across multiple independently associated variants. Systematic heterogeneity can arise in a meta-analysis due to differences in the study characteristics of participating studies. Some of the differences may include: ancestry, allele frequencies, phenotype definition, age of disease onset, family history, gender, linkage disequilibrium and quality control thresholds. See https://magosil86.github.io/getmstatistic/ for statistical theory, documentation and examples.

1508

Meta-Analysis

gmeta

Meta-Analysis via a Unified Framework of Confidence Distribution

An implementation of an all-in-one function for a wide range of meta-analysis problems. It contains three functions. The gmeta() function unifies all standard meta-analysis methods and also several newly developed ones under a framework of combining confidence distributions (CDs). Specifically, the package can perform classical p-value combination methods (such as methods of Fisher, Stouffer, Tippett, etc.), fit meta-analysis fixed-effect and random-effects models, and synthesize 2x2 tables. Furthermore, it can perform robust meta-analysis, which provides protection against model misspecifications and limits the impact of any unknown outlying studies. In addition, the package implements two exact meta-analysis methods for synthesizing 2x2 tables with rare events (e.g., zero total events). The np.gmeta() function summarizes information obtained from multiple studies and makes inference for study-level parameters with no distributional assumption. Specifically, it can construct confidence intervals for unknown, fixed study-level parameters via confidence distribution. Furthermore, it can perform estimation via asymptotic confidence distribution whether or not tie or near-tie conditions exist. The plot.gmeta() function to visualize individual and combined CDs through extended forest plots is also available. Compared to version 2.2-6, version 2.3-0 contains a new function np.gmeta().

1509

Meta-Analysis

hetmeta

Heterogeneity Measures in Meta-Analysis

Assess the presence of statistical heterogeneity and quantify its impact in the context of meta-analysis. It includes a test for heterogeneity as well as other statistical measures (R_b, I^2, R_I).
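For orientation, the I^2 measure listed above is a simple transform of Cochran's Q. A minimal Python sketch of the standard definition follows (an illustration only, not hetmeta's R code; the function name is invented here):

```python
def i_squared(q, k):
    """I^2: proportion of total variation across studies attributable to
    heterogeneity rather than chance, computed from Cochran's Q with
    k studies (df = k - 1). Truncated at zero by convention."""
    df = k - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0
```

For example, Q = 10 across 5 studies gives I^2 = (10 - 4)/10 = 0.6, i.e. 60% of the observed variation is attributed to between-study heterogeneity.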

1510

Meta-Analysis

HSROC

Meta-Analysis of Diagnostic Test Accuracy when Reference Test is Imperfect

Implements a model for joint meta-analysis of sensitivity and specificity of the diagnostic test under evaluation, while taking into account the possibly imperfect sensitivity and specificity of the reference test. This hierarchical model accounts for both within- and between-study variability. Estimation is carried out using a Bayesian approach, implemented via a Gibbs sampler. The model can be applied in situations where more than one reference test is used in the selected studies.

1511

Meta-Analysis

ipdmeta

Tools for subgroup analyses with multiple trial data using aggregate statistics

This package provides functions to estimate an IPD linear mixed-effects model for a continuous outcome and any categorical covariate from study summary statistics. There are also functions for estimating the power of a treatment-covariate interaction test in an individual patient data meta-analysis from aggregate data.

1512

Meta-Analysis

joint.Cox

Penalized Likelihood Estimation and Dynamic Prediction under the Joint Frailty-Copula Models Between Tumour Progression and Death for Meta-Analysis

Perform the Cox regression and dynamic prediction methods under the joint frailty-copula model between tumour progression and death for meta-analysis. A penalized likelihood is employed for estimating model parameters, where the baseline hazard functions are approximated by smoothing splines. The methods are applicable for meta-analytic data combining several studies. The methods can analyze data having information on both terminal event time (e.g., time-to-death) and non-terminal event time (e.g., time-to-tumour progression). See Emura et al. (2015) doi:10.1177/0962280215604510 and Emura et al. (2017) doi:10.1177/0962280216688032 for details. Survival data from ovarian cancer patients are also available.

1513

Meta-Analysis

MAc

Meta-Analysis with Correlations

This is an integrated meta-analysis package for conducting a correlational research synthesis. One of the unique features of this package is its integration of user-friendly functions to facilitate statistical analyses at each stage in a meta-analysis with correlations. It uses recommended procedures as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).

1514

Meta-Analysis

MAd

Meta-Analysis with Mean Differences

A collection of functions for conducting a meta-analysis with mean differences data. It uses recommended procedures as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).

1515

Meta-Analysis

mada

Meta-Analysis of Diagnostic Accuracy

Provides functions for diagnostic meta-analysis. Next to basic analysis and visualization, the bivariate model of Reitsma et al. (2005), which is equivalent to the HSROC model of Rutter & Gatsonis (2001), can be fitted. A new approach to diagnostic meta-analysis by Holling et al. (2012) is also available. Standard methods like summary, plot and so on are provided.

1516

Meta-Analysis

MAVIS

Meta Analysis via Shiny

Interactive Shiny application for running a meta-analysis. Provides support for both random-effects and fixed-effect models with the ‘metafor’ package. Additional support is included for calculating effect sizes, plus support for single case designs, graphical output, and detecting publication bias.

1517

Meta-Analysis

meta (core)

General Package for Meta-Analysis

User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rucker doi:10.1007/978-3-319-21416-0, “Meta-Analysis with R” (2015): fixed-effect and random-effects meta-analysis; several plots (forest, funnel, Galbraith / radial, L’Abbe, Baujat, bubble); statistical tests and the trim-and-fill method to evaluate bias in meta-analysis; import of data from ‘RevMan 5’; prediction interval, Hartung-Knapp and Paule-Mandel methods for the random-effects model; cumulative meta-analysis and leave-one-out meta-analysis; meta-regression (if R package ‘metafor’ is installed); generalised linear mixed models (if R packages ‘metafor’, ‘lme4’, ‘numDeriv’, and ‘BiasedUrn’ are installed).
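The fixed-effect machinery that general-purpose packages like this implement can be sketched in a few lines: each study estimate is weighted by its inverse variance, and Cochran's Q falls out of the same quantities. The following Python sketch is an illustration of the standard method, not the package's API; the function name and interface are invented here:

```python
import math

def fixed_effect_meta(estimates, ses):
    """Inverse-variance fixed-effect pooling of study estimates.

    Each study i gets weight w_i = 1/se_i^2; the pooled estimate is the
    weighted mean and its variance is 1/sum(w_i)."""
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled)**2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q
```

With two studies (0.5 with SE 0.1, and 0.3 with SE 0.2), the precise study dominates: the pooled estimate is 0.46.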

1518

Meta-Analysis

meta4diag

Meta-Analysis for Diagnostic Test Studies

Bayesian inference analysis for bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximation with INLA. A purpose-built graphical user interface is available. The installation of the R package INLA is compulsory for successful usage. The INLA package can be obtained from http://www.r-inla.org. We recommend the testing version, which can be downloaded by running: install.packages(“INLA”, repos=“http://www.math.ntnu.no/inla/R/testing”).

1519

Meta-Analysis

MetaAnalyser

An Interactive Visualisation of Meta-Analysis as a Physical Weighing Machine

An interactive application to visualise meta-analysis data as a physical weighing machine. The interface is based on the Shiny web application framework, though it can be run locally and with the user’s own data.

1520

Meta-Analysis

MetABEL

Meta-analysis of genome-wide SNP association results

A package for meta-analysis of genome-wide association scans between quantitative or binary traits and SNPs.

1521

Meta-Analysis

metaBMA

Bayesian Model Averaging for Random and Fixed Effects Meta-Analysis

Computes the posterior model probabilities for four meta-analysis models (null model vs. alternative model, assuming either fixed or random effects, respectively). These posterior probabilities are used to estimate the overall mean effect size as the weighted average of the mean effect size estimates of the random- and fixed-effects models, as proposed by Gronau, Van Erp, Heck, Cesario, Jonas, & Wagenmakers (2017, doi:10.1080/23743603.2017.1326760). The user can define a wide range of non-informative or informative priors for the mean effect size and the heterogeneity coefficient. Funding for this research was provided by the Berkeley Initiative for Transparency in the Social Sciences, a program of the Center for Effective Global Action (CEGA), with support from the Laura and John Arnold Foundation.
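The model-averaging step described above — combining per-model effect estimates with posterior model probabilities as weights — reduces to a weighted average. A minimal Python sketch of that idea (an invented helper for illustration, not the package's interface):

```python
def model_averaged_effect(post_probs, estimates):
    """Model-averaged effect size: per-model mean effect estimates
    (e.g. from the fixed- and random-effects models) weighted by their
    posterior model probabilities, which must sum to one."""
    assert abs(sum(post_probs) - 1.0) < 1e-9
    return sum(p, 0.0) if False else sum(
        p * d for p, d in zip(post_probs, estimates))
```

For instance, posterior probabilities (0.3, 0.7) over estimates (0.2, 0.4) give an averaged effect of 0.34.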

1522

Meta-Analysis

metacart

MetaCART: A Flexible Approach to Identify Moderators in Meta-Analysis

Fits metaCART by integrating classification and regression trees (CART) into meta-analysis. MetaCART is a flexible approach to identify interaction effects between moderators in meta-analysis. The methods are described in Dusseldorp et al. (2014) doi:10.1037/hea0000018 and Li et al. (2017) doi:10.1111/bmsp.12088.

1523

Meta-Analysis

metacor

Meta-analysis of correlation coefficients

Implements the DerSimonian-Laird (DSL) and Olkin-Pratt (OP) meta-analytical approaches with correlation coefficients as effect sizes.
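The DSL approach for correlations is conventionally applied on Fisher's z scale, where each transformed correlation has approximate variance 1/(n - 3). A minimal Python sketch of that standard recipe follows (an illustration of the textbook DerSimonian-Laird estimator, not metacor's R code; the function name is invented here):

```python
import math

def dsl_meta_correlation(rs, ns):
    """DerSimonian-Laird random-effects pooling of correlations via
    Fisher's z transform: z = atanh(r), var(z) ~= 1/(n - 3)."""
    k = len(rs)
    zs = [math.atanh(r) for r in rs]
    w = [n - 3 for n in ns]                      # fixed-effect weights 1/var
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    q = sum(wi * (zi - z_fixed)**2 for wi, zi in zip(w, zs))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)           # DSL between-study variance
    w_star = [1.0 / (1.0 / wi + tau2) for wi in w]
    z_re = sum(wi * zi for wi, zi in zip(w_star, zs)) / sum(w_star)
    se_re = math.sqrt(1.0 / sum(w_star))
    return math.tanh(z_re), tau2, se_re          # back-transform to r scale
```

When all studies report the same correlation, Q is zero, tau^2 is truncated to zero, and the pooled correlation equals the common value.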

1524

Meta-Analysis

MetaDE

MetaDE: Microarray meta-analysis for differentially expressed gene detection

The MetaDE package implements 12 major meta-analysis methods for differential expression analysis.

1525

Meta-Analysis

metafor (core)

Meta-Analysis Package for R

A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L’Abbe, Baujat, GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto’s method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted.

1526

Meta-Analysis

metaforest

Exploring Heterogeneity in Meta-Analysis using Random Forests

A requirement of classic meta-analysis is that the studies being aggregated are conceptually similar, and ideally, close replications. However, in many fields there is substantial heterogeneity between studies on the same topic. Similar research questions are studied in different laboratories, using different methods, instruments, and samples. Classic meta-analysis lacks the power to assess more than a handful of univariate moderators, or to investigate interactions between moderators and non-linear effects. MetaForest, by contrast, has substantial power to explore heterogeneity in meta-analysis. It can identify important moderators from a larger set of potential candidates, even with as few as 20 studies (Van Lissa, in preparation). This is an appealing quality, because many meta-analyses have small sample sizes. Moreover, MetaForest yields a measure of variable importance which can be used to identify important moderators, and offers partial prediction plots to explore the shape of the marginal relationship between moderators and effect size.

1527

Meta-Analysis

metafuse

Fused Lasso Approach in Regression Coefficient Clustering

Fused lasso method to cluster and estimate regression coefficients of the same covariate across different data sets when a large number of independent data sets are combined. Package supports Gaussian, binomial, Poisson and Cox PH models.

1528

Meta-Analysis

metagear

Comprehensive Research Synthesis Tools for Systematic Reviews and Meta-Analysis

Functionalities for facilitating systematic reviews, data extractions, and meta-analyses. It includes a GUI (graphical user interface) to help screen the abstracts and titles of bibliographic data; tools to assign screening effort across multiple collaborators/reviewers and to assess inter-reviewer reliability; tools to help automate the download and retrieval of journal PDF articles from online databases; figure and image extractions from PDFs; web scraping of citations; automated and manual data extraction from scatterplot and barplot images; PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagrams; simple imputation tools to fill gaps in incomplete or missing study parameters; generation of random effect sizes for Hedges’ d, log response ratio, odds ratio, and correlation coefficients for Monte Carlo experiments; covariance equations for modelling dependencies among multiple effect sizes (e.g., effect sizes with a common control); and finally summaries that replicate analyses and outputs from widely used but no longer updated meta-analysis software. Funding for this package was supported by National Science Foundation (NSF) grants DBI-1262545 and DEB-1451031.

1529

Meta-Analysis

metagen

Inference in Meta Analysis and Meta Regression

Provides methods for making inference in the random-effects meta-regression model, such as point estimates and confidence intervals for the heterogeneity parameter and the regression coefficient vector. Inference methods are based on different approaches to statistical inference. Methods from three different schools are included: methods based on the method of moments approach, methods based on likelihood, and methods based on generalised inference. The package also includes tools to run extensive simulation studies in parallel on high-performance clusters in a modular way. This allows extensive testing of custom inferential methods against all implemented state-of-the-art methods in a standardised way. Tools for evaluating the performance of both point and interval estimates are provided. Also, a large collection of different predefined plotting functions is implemented in a ready-to-use fashion.

1530

Meta-Analysis

MetaIntegrator

Meta-Analysis of Gene Expression Data

A pipeline for the meta-analysis of gene expression data. We have assembled several analysis and plot functions to perform integrated multi-cohort analysis of gene expression data (meta-analysis). Methodology described in: http://biorxiv.org/content/early/2016/08/25/071514.

1531

Meta-Analysis

metaLik

Likelihood Inference in Meta-Analysis and Meta-Regression Models

First- and higher-order likelihood inference in meta-analysis and meta-regression models.

1532

Meta-Analysis

metaMA

Meta-analysis for MicroArrays

Combines either p-values or modified effect sizes from different studies to find differentially expressed genes.

1533

Meta-Analysis

metamisc

Diagnostic and Prognostic Meta-Analysis

Meta-analysis of diagnostic and prognostic modeling studies. Summarize estimates of diagnostic test accuracy and prediction model performance. Validate, update and combine published prediction models. Develop new prediction models with data from multiple studies.

1534

Meta-Analysis

metansue

Meta-Analysis of Studies with Non Statistically-Significant Unreported Effects

Revisited version of MetaNSUE, a novel meta-analytic method that allows an unbiased inclusion of studies with Non Statistically-Significant Unreported Effects (NSUEs). Briefly, the method first calculates the interval where the unreported effects (e.g. t-values) should be according to the threshold of statistical significance used in each study. Afterwards, maximum likelihood techniques are used to impute the expected effect size of each study with NSUEs, accounting for between-study heterogeneity and potential covariates. Multiple imputations of the NSUEs are then randomly created based on the expected value, variance and statistical significance bounds. Finally, a restricted-maximum-likelihood random-effects meta-analysis is separately conducted for each set of imputations, and estimations from these meta-analyses are pooled. Please read the reference in ‘metansue’ for details of the procedure.

1535

Meta-Analysis

metap

Meta-Analysis of Significance Values

The canonical way to perform meta-analysis involves using effect sizes. When they are not available this package provides a number of methods for meta-analysis of significance values including the methods of Edgington, Fisher, Stouffer, Tippett, and Wilkinson; a number of datasets to replicate published results; and a routine for graphical display.
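Two of the combination methods named in this entry have compact closed forms. A minimal Python sketch of Fisher's and Stouffer's methods follows (textbook formulas for illustration, not metap's R code; function names are invented here):

```python
import math
from statistics import NormalDist

def fisher_method(pvalues):
    """Fisher: -2 * sum(log p) ~ chi-square with 2k df under H0.
    For even df the chi-square survival function has the closed form
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, used here to stay stdlib-only."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    x = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):   # df/2 = k terms, i = 0..k-1
        term *= x / i
        total += term
    return math.exp(-x) * total

def stouffer_method(pvalues):
    """Stouffer: convert each p to a z-score, sum, divide by sqrt(k)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)
```

Combining two p-values of 0.5 gives 0.5 under Stouffer's method (the z-scores are zero) but about 0.597 under Fisher's, which illustrates that the methods weight evidence differently.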

1536

Meta-Analysis

MetaPath

Perform the Meta-Analysis for Pathway Enrichment Analysis (MAPE)

Perform the Meta-analysis for Pathway Enrichment (MAPE) methods introduced by Shen and Tseng (2010). It includes functions to automatically perform MAPE_G (integrating multiple studies at the gene level), MAPE_P (integrating multiple studies at the pathway level) and MAPE_I (a hybrid method integrating the MAPE_G and MAPE_P methods). In the simulation and real data analyses in the paper, MAPE_G and MAPE_P have complementary advantages and detection power depending on the data structure. In general, the integrative form MAPE_I is recommended. In the case that MAPE_G (or MAPE_P) detects almost no pathways, the integrative MAPE_I does not improve performance and MAPE_P (or MAPE_G) should be used. Reference: Shen, Kui, and George C. Tseng. Meta-analysis for pathway enrichment analysis when combining multiple microarray studies. Bioinformatics (Oxford, England) 26, no. 10 (April 2010): 1316-1323. doi:10.1093/bioinformatics/btq148. http://www.ncbi.nlm.nih.gov/pubmed/20410053.

1537

Meta-Analysis

MetaPCA

MetaPCA: Meta-analysis in the Dimension Reduction of Genomic Data

MetaPCA implements simultaneous dimension reduction using PCA when multiple studies are combined. We propose two basic ideas to find a common PC subspace, an eigenvalue maximization approach and an angle minimization approach, and we extend the concept to incorporate Robust PCA and Sparse PCA in the meta-analysis realm.

1538

Meta-Analysis

metaplotr

Creates Cross-Hairs Plots for Meta-Analyses

Creates cross-hairs plots to summarize and analyse meta-analysis results. In due time this package will contain code that will create other kinds of meta-analysis graphs.

1539

Meta-Analysis

metaplus

Robust Meta-Analysis and Meta-Regression

Performs meta-analysis and meta-regression using standard and robust methods with confidence intervals based on the profile likelihood. Robust methods are based on alternative distributions for the random effect, either the t-distribution (Lee and Thompson, 2008 doi:10.1002/sim.2897 or Baker and Jackson, 2008 doi:10.1007/s10729-007-9041-8) or mixtures of normals (Beath, 2014 doi:10.1002/jrsm.1114).

1540

Meta-Analysis

MetaQC

MetaQC: Objective Quality Control and Inclusion/Exclusion Criteria for Genomic Meta-Analysis

MetaQC implements our proposed quantitative quality control measures: (1) internal homogeneity of co-expression structure among studies (internal quality control; IQC); (2) external consistency of co-expression structure correlating with pathway database (external quality control; EQC); (3) accuracy of differentially expressed gene detection (accuracy quality control; AQCg) or pathway identification (AQCp); (4) consistency of differential expression ranking in genes (consistency quality control; CQCg) or pathways (CQCp). (See the reference for a detailed explanation.) For each quality control index, the p-values from statistical hypothesis testing are minus-log-transformed and PCA biplots are applied to assist visualization and decision making. Results generate systematic suggestions to exclude problematic studies in microarray meta-analysis and potentially can be extended to GWAS or other types of genomic meta-analysis. The identified problematic studies can be scrutinized to identify technical and biological causes (e.g. sample size, platform, tissue collection, preprocessing, etc.) of their bad quality or irreproducibility for a final inclusion/exclusion decision.

1541

Meta-Analysis

metaRNASeq

Meta-analysis of RNA-seq data

Implementation of two p-value combination techniques (inverse normal and Fisher methods). A vignette is provided to explain how to perform a meta-analysis from two independent RNA-seq experiments.

1542

Meta-Analysis

metaSEM

Meta-Analysis using Structural Equation Modeling

A collection of functions for conducting meta-analysis using a structural equation modeling (SEM) approach via the ‘OpenMx’ package. It also implements the two-stage SEM approach to conduct meta-analytic structural equation modeling on correlation and covariance matrices.

1543

Meta-Analysis

metasens

Advanced Statistical Methods to Model and Adjust for Bias in Meta-Analysis

The following methods are implemented to evaluate how sensitive the results of a meta-analysis are to potential bias and to support Schwarzer et al. (2015) doi:10.1007/978-3-319-21416-0, Chapter 5 “Small-Study Effects in Meta-Analysis”: Copas selection model described in Copas & Shi (2001) doi:10.1177/096228020101000402; limit meta-analysis by Rucker et al. (2011) doi:10.1093/biostatistics/kxq046; upper bound for outcome reporting bias by Copas & Jackson (2004) doi:10.1111/j.0006-341X.2004.00161.x.

1544

Meta-Analysis

MetaSKAT

Meta Analysis for SNP-Set (Sequence) Kernel Association Test

Functions for meta-analysis burden tests, SKAT and SKAT-O. These methods use summary-level score statistics to carry out gene-based meta-analysis for rare variants.

1545

Meta-Analysis

MetaSubtract

Subtracting Summary Statistics of One or more Cohorts from Meta-GWAS Results

If results from a meta-GWAS are used for validation in one of the cohorts that was included in the meta-analysis, this will yield biased (i.e. too optimistic) results. The validation cohort needs to be independent from the meta-GWAS results. MetaSubtract will subtract the results of the respective cohort from the meta-GWAS results analytically, without having to redo the meta-GWAS analysis, using the leave-one-out methodology. It can handle different meta-analysis methods and takes into account whether single or double genomic control correction was applied to the original meta-analysis. It can be used for a whole GWAS, but also for a limited set of SNPs or other genetic markers.
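For the simplest case — an inverse-variance fixed-effect meta-analysis with no genomic control — the leave-one-out subtraction rests on a one-line identity: a cohort's weighted contribution can be removed from the pooled numerator and denominator. The Python sketch below shows only that core identity, not MetaSubtract's actual implementation (which also handles other meta-analysis methods and genomic control); the function name is invented here:

```python
import math

def subtract_cohort(beta_meta, se_meta, beta_k, se_k):
    """Analytically remove cohort k from an inverse-variance fixed-effect
    meta-analysis result.

    Under fixed-effect pooling, w_i = 1/se_i^2 and
    beta_meta = sum(w_i * b_i) / sum(w_i), so subtracting cohort k's
    contribution from numerator and denominator recovers the
    leave-one-out estimate and its standard error."""
    w_meta = 1.0 / se_meta**2
    w_k = 1.0 / se_k**2
    w_loo = w_meta - w_k
    beta_loo = (w_meta * beta_meta - w_k * beta_k) / w_loo
    return beta_loo, math.sqrt(1.0 / w_loo)
```

As a sanity check: pooling (0.2, SE 0.1) with (0.4, SE 0.2) gives a meta estimate of 0.24; subtracting the second cohort recovers (0.2, SE 0.1) exactly.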

1546

Meta-Analysis

metatest

Fit and test meta-regression models

This package fits meta-regression models and generates a number of statistics: next to t- and z-tests, the likelihood ratio, Bartlett-corrected likelihood ratio and permutation tests are performed on the model coefficients.

1547

Meta-Analysis

Metatron

Meta-analysis for Classification Data and Correction to Imperfect Reference

This package allows meta-analysis of primary studies with classification outcomes in order to evaluate systematically the accuracies of classifiers, namely, the diagnostic tests. It provides functions to fit the bivariate model of Reitsma et al. (2005). Moreover, if the reference employed in the classification process isn’t a gold standard, its deficit can be detected and its influence on the underestimation of the diagnostic test’s accuracy can be corrected, as described in Botella et al. (2013).

1548

Meta-Analysis

metavcov

Variance-Covariance Matrix for Multivariate Meta-Analysis

Computes the variance-covariance matrix for multivariate meta-analysis. Effect sizes include correlation (r), mean difference (MD), standardized mean difference (SMD), log odds ratio (logOR), log risk ratio (logRR), and risk difference (RD).

1549

Meta-Analysis

metaviz

Rainforest Plots and Visual Funnel Plot Inference for Meta-Analysis

Creates rainforest plots (proposed by Schild & Voracek, 2015 doi:10.1002/jrsm.1125), a variant and enhancement of the classic forest plot for meta-analysis. In addition, functionalities for visual funnel plot inference are provided. In the near future, the ‘metaviz’ package will be extended by further, established as well as novel, plotting options for visualizing meta-analytic data.

1550

Meta-Analysis

mmeta

Multivariate Meta-Analysis

A novel multivariate meta-analysis.

1551

Meta-Analysis

MultiMeta

Meta-analysis of Multivariate Genome Wide Association Studies

Allows running a meta-analysis of multivariate Genome Wide Association Studies (GWAS) and easily visualizing results through custom plotting functions. The multivariate setting implies that results for each single nucleotide polymorphism (SNP) include several effect sizes (also known as “beta coefficients”, one for each trait), as well as the related variance values and the covariances between the betas. The main goal of the package is to provide combined beta coefficients across different cohorts, together with the combined variance/covariance matrix. The method is inverse-variance based, thus each beta is weighted by the inverse of its variance-covariance matrix before taking the average across all betas. The default options of the main function will work with files obtained from GEMMA’s multivariate option for GWAS (Zhou & Stephens, 2014). It will work with any other output, as long as columns are formatted to have the corresponding names. The package also provides several plotting functions for QQ-plots, Manhattan plots and custom summary plots.
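The inverse-variance weighting described above generalizes the scalar case to vectors: each cohort's beta vector is weighted by the inverse of its variance-covariance matrix, B = (sum V_i^-1)^-1 (sum V_i^-1 b_i). A minimal Python sketch for the two-trait case (a textbook illustration, not MultiMeta's R code; names are invented here):

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ivw_bivariate(betas, covs):
    """Inverse-variance-weighted combination of per-cohort 2-trait effect
    vectors: B = (sum V_i^-1)^-1 (sum V_i^-1 b_i)."""
    prec_sum = [[0.0, 0.0], [0.0, 0.0]]   # running sum of precision matrices
    wb_sum = [0.0, 0.0]                   # running sum of V_i^-1 b_i
    for b, v in zip(betas, covs):
        p = inv2(v)
        for i in range(2):
            for j in range(2):
                prec_sum[i][j] += p[i][j]
            wb_sum[i] += p[i][0] * b[0] + p[i][1] * b[1]
    v_comb = inv2(prec_sum)               # combined variance/covariance matrix
    b_comb = [v_comb[i][0] * wb_sum[0] + v_comb[i][1] * wb_sum[1]
              for i in range(2)]
    return b_comb, v_comb
```

With two cohorts of equal diagonal covariance, the combined betas reduce to the simple means and the combined variances are halved, as expected.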

1552

Meta-Analysis

mvmeta

Multivariate and Univariate Meta-Analysis and Meta-Regression

Collection of functions to perform fixed- and random-effects multivariate and univariate meta-analysis and meta-regression.

1553

Meta-Analysis

mvtmeta

Multivariate meta-analysis

This package contains functions to run fixed-effects or random-effects multivariate meta-analysis.

1554

Meta-Analysis

netmeta

Network Meta-Analysis using Frequentist Methods

A comprehensive set of functions providing frequentist methods for network meta-analysis and supporting Schwarzer et al. (2015) doi:10.1007/978-3-319-21416-0, Chapter 8 “Network Meta-Analysis”: frequentist network meta-analysis following Rucker (2012) doi:10.1002/jrsm.1058; net heat plot and design-based decomposition of Cochran’s Q according to Krahn et al. (2013) doi:10.1186/1471-2288-13-35; measures characterizing the flow of evidence between two treatments by Konig et al. (2013) doi:10.1002/sim.6001; ranking of treatments (frequentist analogue of SUCRA) according to Rucker & Schwarzer (2015) doi:10.1186/s12874-015-0060-8; partial order of treatment rankings (‘poset’) and Hasse diagram for ‘poset’ (Carlsen & Bruggemann, 2014) doi:10.1002/cem.2569; split of direct and indirect evidence to check consistency (Dias et al., 2010) doi:10.1002/sim.3767; league table with network meta-analysis results; automated drawing of network graphs described in Rucker & Schwarzer (2016) doi:10.1002/jrsm.1143.

1555

Meta-Analysis

nmaINLA

Network Meta-Analysis using Integrated Nested Laplace Approximations

Performs network meta-analysis using integrated nested Laplace approximations (‘INLA’). Includes methods to assess the heterogeneity and inconsistency in the network. Contains more than ten different network meta-analysis datasets. The ‘INLA’ package can be obtained from http://www.r-inla.org. We recommend the testing version.

1556

Meta-Analysis

pcnetmeta

Patient-Centered Network Meta-Analysis

Performs arm-based network meta-analysis for datasets with binary, continuous, and count outcomes using the Bayesian methods of Zhang et al. (2014) doi:10.1177/1740774513498322 and Lin et al. (2017) doi:10.18637/jss.v080.i05.

1557

Meta-Analysis

psychmeta

Psychometric Meta-Analysis Toolkit

Tools for computing bare-bones and psychometric meta-analyses and for generating psychometric data for use in meta-analysis simulations. Supports bare-bones, individual-correction, and artifact-distribution methods for meta-analyzing correlations and d values. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more.

1558

Meta-Analysis

psychometric

Applied Psychometric Theory

Contains functions useful for correlation theory, meta-analysis (validity generalization), reliability, item analysis, inter-rater reliability, and classical utility.

1559

Meta-Analysis

PubBias

Performs simulation study to look for publication bias, using a technique described by Ioannidis and Trikalinos; Clin Trials. 2007;4(3):245-53

Adapts a method designed by Ioannidis and Trikalinos, which compares the observed number of positive studies in a meta-analysis with the expected number if the summary measure of effect, averaged over the individual studies, were assumed true. An excess of observed positive studies over the expected number is taken as evidence of publication bias. The observed number of positive studies, at a given level of statistical significance, is calculated by applying Fisher’s exact test to the reported 2x2 table data of each constituent study, doubling the Fisher one-sided P-value to make a two-sided test. The corresponding expected number of positive studies is obtained by summing the statistical powers of each study. The statistical power depends on a given measure of effect, here the pooled odds ratio of the meta-analysis. By simulating each constituent study with the given odds ratio and the same numbers of treated and non-treated as in the real study, the power of the study is estimated as the proportion of simulated studies that are positive, again by a Fisher’s exact test. The numbers of events in the treated and untreated groups are simulated by binomial sampling: in the untreated group, the binomial proportion is the percentage of actual events reported in the study; in the treated group, it is the untreated percentage multiplied by the risk ratio derived from the assumed common odds ratio. The statistical significance level for judging a positive study may be varied, and a large difference between the expected and observed numbers of positive studies around the 0.05 significance level constitutes evidence of publication bias. The difference between the observed and expected is tested by chi-square. A chi-square test P-value below 0.05 is suggestive of publication bias; however, a less stringent level of 0.1 is often used in studies of publication bias, as the number of published studies is usually small.
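The power-by-simulation step described above can be sketched in Python: simulate each study's 2x2 table by binomial sampling under an assumed common odds ratio, test each simulated table with a doubled one-sided Fisher exact test, and take the significant fraction as the study's power. This is a hedged illustration of the general idea, not the PubBias code; function names and parameters are invented here:

```python
import math
import random

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p (doubled smaller one-sided tail) for the
    2x2 table [[a, b], [c, d]], a = events in treated, c = events in control."""
    n, r1, c1 = a + b + c + d, a + b, a + c
    def hyper(k):  # hypergeometric probability of k treated events
        return (math.comb(c1, k) * math.comb(n - c1, r1 - k)
                / math.comb(n, r1))
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    left = sum(hyper(k) for k in range(lo, a + 1))
    right = sum(hyper(k) for k in range(a, hi + 1))
    return min(1.0, 2.0 * min(left, right))

def simulated_power(n_treat, n_ctrl, p_ctrl, odds_ratio,
                    alpha=0.05, n_sim=1000, seed=1):
    """Fraction of simulated trials significant at alpha, given the control
    event rate and an assumed common odds ratio for the treated arm."""
    odds = odds_ratio * p_ctrl / (1 - p_ctrl)
    p_treat = odds / (1 + odds)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        a = sum(rng.random() < p_treat for _ in range(n_treat))
        c = sum(rng.random() < p_ctrl for _ in range(n_ctrl))
        if fisher_exact_p(a, n_treat - a, c, n_ctrl - c) < alpha:
            hits += 1
    return hits / n_sim
```

Summing `simulated_power` over the constituent studies gives the expected number of positive studies, to be compared with the observed count.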

1560

Meta-Analysis

RandMeta

Efficient Numerical Algorithm for Exact Inference in Meta Analysis

A novel numerical algorithm that provides functionality for estimating the exact 95% confidence interval of the location parameter in the random effects model, and is much faster than the naive method. Works best when the number of studies is between 6 and 20.

1561

Meta-Analysis

ratesci

Confidence Intervals for Comparisons of Binomial or Poisson Rates

Computes confidence intervals for the rate (or risk) difference (“RD”) or rate ratio (or relative risk, “RR”) for binomial proportions or Poisson rates, or for odds ratio (“OR”, binomial only). Also computes confidence intervals for a single binomial or Poisson rate, and intervals for matched pairs. Includes asymptotic score methods including skewness corrections, which have been developed in Laud (2017, in press) from Miettinen & Nurminen (1985) doi:10.1002/sim.4780040211 and Gart & Nam (1988) doi:10.2307/2531848. Also includes MOVER methods (Method Of Variance Estimates Recovery), derived from the Newcombe method but using equal-tailed Jeffreys intervals, and generalised for incorporating prior information. Also methods for stratified calculations (e.g. meta-analysis), either assuming fixed effects or incorporating stratum heterogeneity.

1562

Meta-Analysis

RcmdrPlugin.EZR

R Commander Plug-in for the EZR (Easy R) Package

EZR (Easy R) adds a variety of statistical functions, including survival analyses, ROC analyses, meta-analyses, sample size calculation, and so on, to the R Commander. EZR enables point-and-click easy access to statistical functions, especially for medical statistics. EZR is platform-independent and runs on Windows, Mac OS X, and UNIX. Its complete manual is available only in Japanese (Chugai Igakusha, ISBN: 978-4-498-10901-8 or Nankodo, ISBN: 978-4-524-26158-1), but a report that introduced EZR was published in Bone Marrow Transplantation (Nature Publishing Group) as an open-access article. This report can be used as a simple manual and can be freely downloaded from the journal website. It has been cited in more than 1,000 scientific articles.

1563

Meta-Analysis

RcmdrPlugin.RMTCJags

R MTC Jags ‘Rcmdr’ Plugin

Mixed Treatment Comparison is a methodology to directly and/or indirectly compare health strategies (drugs, treatments, devices). This package provides an ‘Rcmdr’ plugin to perform Mixed Treatment Comparison for binary outcomes using BUGS code from Bristol University (Lu and Ades).

1564

Meta-Analysis

rma.exact

Exact Confidence Intervals for Random Effects Meta-Analyses

Compute an exact CI for the population mean under a random effects model. The routines implement the algorithm described in Michael, Thornton, Xie, and Tian (2017) https://habenmichael.github.io/research/Exact_Inference_Meta.pdf.

1565

Meta-Analysis

rmeta

Meta-analysis

Functions for simple fixed- and random-effects meta-analysis for two-sample comparisons and cumulative meta-analyses. Draws standard summary plots and funnel plots, and computes summaries and tests for association and heterogeneity.

1566

Meta-Analysis

robumeta

Robust Variance Meta-Regression

Functions for conducting robust variance estimation (RVE) meta-regression using both large- and small-sample RVE estimators under various weighting schemes. These methods are distribution-free and provide valid point estimates, standard errors and hypothesis tests even when the degree and structure of dependence between effect sizes is unknown. Also included are functions for conducting sensitivity analyses under correlated effects weighting and producing RVE-based forest plots.

1567

Meta-Analysis

SAMURAI

Sensitivity Analysis of a Meta-analysis with Unpublished but Registered Analytical Investigations

This package contains R functions to gauge the impact of unpublished studies upon the meta-analytic summary effect of a set of published studies. (Credits: The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 282574.)

1568

Meta-Analysis

SCMA

Single-Case Meta-Analysis

Perform meta-analysis of single-case experiments, including calculating various effect size measures (SMD, PND, PEM and NAP) and probability combining (additive and multiplicative method).

1569

Meta-Analysis

selectMeta

Estimation of Weight Functions in Meta Analysis

Publication bias, the fact that studies identified for inclusion in a meta analysis do not represent all studies on the topic of interest, is commonly recognized as a threat to the validity of the results of a meta analysis. One way to explicitly model publication bias is via selection models or weighted probability distributions. In this package we provide implementations of several parametric and nonparametric weight functions. The novelty in Rufibach (2011) is the proposal of a non-increasing variant of the nonparametric weight function of Dear & Begg (1992). The new approach potentially offers more insight into the selection process than other methods, and is more flexible than parametric approaches. To maximize the log-likelihood function proposed by Dear & Begg (1992) under a monotonicity constraint we use a differential evolution algorithm proposed by Ardia et al (2010a, b) and implemented in Mullen et al (2009). In addition, we offer a method to compute a confidence interval for the overall effect size theta, adjusted for selection bias, as well as a function that computes the simulation-based p-value to assess the null hypothesis of no selection as described in Rufibach (2011, Section 6).

1570

Meta-Analysis

seqMeta

Meta-Analysis of Region-Based Tests of Rare DNA Variants

Computes necessary information to meta-analyze region-based tests for rare genetic variants (e.g. SKAT, T1) in individual studies, and performs meta-analysis.

1571

Meta-Analysis

surrosurv

Evaluation of Failure Time Surrogate Endpoints in Individual Patient Data Meta-Analyses

Provides functions for the evaluation of surrogate endpoints when both the surrogate and the true endpoint are failure time variables. The approaches implemented are: (1) the two-step approach (Burzykowski et al, 2001) doi:10.1111/1467-9876.00244, with a copula model (Clayton, Plackett, Hougaard) at the first step and a linear regression of log-hazard ratios at the second step (adjusted or not for measurement error); (2) mixed proportional hazard models estimated via mixed Poisson GLM (Rotolo et al, 2017, doi:10.1177/0962280217718582).

1572

Meta-Analysis

TFisher

Optimal Thresholding Fisher’s P-Value Combination Method

We provide the cumulative distribution function (CDF), quantile, and statistical power calculator for a collection of thresholding Fisher’s p-value combination methods, including Fisher’s p-value combination method, the truncated product method and, in particular, the soft-thresholding Fisher’s p-value combination method, which is proven to be optimal in some contexts of signal detection. The p-value calculator for the omnibus version of these tests is also included. For reference, please see Hong Zhang and Zheyang Wu, “Optimal Thresholding of Fisher’s P-value Combination Tests for Signal Detection”, submitted.

1573

Meta-Analysis

weightr

Estimating Weight-Function Models for Publication Bias

Estimates the Vevea and Hedges (1995) doi:10.1007/BF02294384 weight-function model. By specifying arguments, users can also estimate the modified model described in Vevea and Woods (2005) doi:10.1037/1082-989X.10.4.428, which may be more practical with small datasets. Users can also specify moderators to estimate a linear model. The package functionality allows users to easily extract the results of these analyses as R objects for other uses. In addition, the package includes a function to launch both models as a Shiny application. Although the Shiny application is also available online, this function allows users to launch it locally if they choose.

1574

Meta-Analysis

xmeta

A Toolbox for Multivariate Meta-Analysis

A toolbox for meta-analysis. This package includes a collection of functions for (1) implementing robust multivariate meta-analysis of continuous or binary outcomes; and (2) a bivariate Egger’s test for detecting publication bias.

1575

Multivariate Statistics

abind

Combine Multidimensional Arrays

Combine multidimensional arrays into a single array. This is a generalization of ‘cbind’ and ‘rbind’. Works with vectors, matrices, and higherdimensional arrays. Also provides functions ‘adrop’, ‘asub’, and ‘afill’ for manipulating, extracting and replacing data in arrays.
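A brief sketch of the generalization of ‘cbind’/‘rbind’ to higher dimensions, assuming the package is installed:

```r
library(abind)

m1 <- matrix(1:6,  nrow = 2)   # 2 x 3
m2 <- matrix(7:12, nrow = 2)   # 2 x 3

# Bind along a new third dimension, giving a 2 x 3 x 2 array
a <- abind(m1, m2, along = 3)
dim(a)
```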

1576

Multivariate Statistics

ade4 (core)

Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis) data sets. The philosophy of the package is described in Dray and Dufour (2007) doi:10.18637/jss.v022.i04.

1577

Multivariate Statistics

amap

Another Multidimensional Analysis Package

Tools for clustering and principal component analysis (with robust methods and parallelized functions).

1578

Multivariate Statistics

aplpack

Another Plot PACKage: stem.leaf, bagplot, faces, spin3R, plotsummary, plothulls, and some slider functions

A set of functions for drawing some special plots: stem.leaf plots a stem-and-leaf plot, stem.leaf.backback plots back-to-back versions of stem-and-leaf plots, bagplot plots a bagplot, skyline.hist plots several histograms of a one-dimensional data set in one plot, plotsummary plots a graphical summary of a data set with one or more variables, plothulls plots sequentially the hulls of a bivariate data set, faces plots Chernoff faces, spin3R allows inspection of a 3-dimensional point cloud, and slider functions support interactive graphics.

1579

Multivariate Statistics

ash

David Scott’s ASH Routines

David Scott’s ASH routines ported from S-PLUS to R.

1580

Multivariate Statistics

bayesm

Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005), and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

1581

Multivariate Statistics

ca

Simple, Multiple and Joint Correspondence Analysis

Computation and visualization of simple, multiple and joint correspondence analysis.

1582

Multivariate Statistics

calibrate

Calibration of Scatterplot and Biplot Axes

Package for drawing calibrated scales with tick marks on (non-orthogonal) variable vectors in scatterplots and biplots.

1583

Multivariate Statistics

car

Companion to Applied Regression

Functions and Datasets to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Second Edition, Sage, 2011.

1584

Multivariate Statistics

caret

Classification and Regression Training

Misc functions for training and plotting classification and regression models.

1585

Multivariate Statistics

class

Functions for Classification

Various functions for classification, including k-nearest neighbour, Learning Vector Quantization and Self-Organizing Maps.

1586

Multivariate Statistics

clue

Cluster Ensembles

CLUster Ensembles.

1587

Multivariate Statistics

cluster (core)

“Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al.

Methods for Cluster analysis. Much extended the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.
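As a small sketch of the package in use (PAM, i.e. partitioning around medoids, plus silhouette validation), using the built-in iris data:

```r
library(cluster)

fit <- pam(iris[, 1:4], k = 3)   # partitioning around medoids, 3 clusters
table(fit$clustering)            # cluster sizes

sil <- silhouette(fit)           # silhouette widths for cluster validation
summary(sil)
```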

1588

Multivariate Statistics

clusterGeneration

Random Cluster Generation (with Specified Degree of Separation)

We developed the clusterGeneration package to provide functions for generating random clusters, generating random covariance/correlation matrices, calculating a separation index (data and population version) for pairs of clusters or cluster distributions, and 1D and 2D projection plots to visualize clusters. The package also contains a function to generate random clusters based on factorial designs with factors such as degree of separation, number of clusters, number of variables, number of noisy variables.

1589

Multivariate Statistics

clusterSim

Searching for Optimal Clustering Procedure for a Data Set

Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas, data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) doi:10.1007/BF02294245, HUBERT, L., ARABIE, P. (1985) doi:10.1007/BF01908075, RAND, W.M. (1971) doi:10.1080/01621459.1971.10482356, JAJUGA, K., WALESIAK, M. (2000) doi:10.1007/978-3-642-57280-7_11, MILLIGAN, G.W., COOPER, M.C. (1988) doi:10.1007/BF01897163, CORMACK, R.M. (1971) doi:10.2307/2344237, JAJUGA, K., WALESIAK, M., BAK, A. (2003) doi:10.1007/978-3-642-55721-7_12, CARMONE, F.J., KARA, A., MAXWELL, S. (1999) doi:10.2307/3152003, DAVIES, D.L., BOULDIN, D.W. (1979) doi:10.1109/TPAMI.1979.4766909, CALINSKI, T., HARABASZ, J. (1974) doi:10.1080/03610927408827101, HUBERT, L. (1974) doi:10.1080/01621459.1974.10480191, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) doi:10.1111/1467-9868.00293, KRZANOWSKI, W.J., LAI, Y.T. (1988) doi:10.2307/2531893, BRECKENRIDGE, J.N. (2000) doi:10.1207/S15327906MBR3502_5, WALESIAK, M., DUDEK, A. (2008) doi:10.1007/978-3-540-78246-9_11).

1590

Multivariate Statistics

clustvarsel

Variable Selection for Gaussian Model-Based Clustering

Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology makes it possible to find the (locally) optimal subset of variables in a data set that have group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without subsampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.

1591

Multivariate Statistics

clv

Cluster Validation Techniques

The package contains most of the popular internal and external cluster validation methods, ready to use with most of the outputs produced by functions from the “cluster” package. It also contains functions and usage examples for a cluster stability approach that can be applied to algorithms implemented in the “cluster” package as well as to user-defined clustering algorithms.

1592

Multivariate Statistics

cocorresp

Co-Correspondence Analysis Methods

Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximises the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities.

1593

Multivariate Statistics

concor

Concordance

The four functions svdcp (cp for column partitioned), svdbip or svdbip2 (bip for bipartitioned), and svdbips (s for a simultaneous optimization of one set of r solutions) correspond to a “SVD by blocks” notion, supposing each block depends on relative subspaces, rather than on two whole spaces as the usual SVD does. The other functions, based on this notion, are relative to two column-partitioned data matrices x and y defining two sets of subsets xi and yj of variables, and estimate a link between xi and yj for each pair (xi, yj) relative to the links associated with all the other pairs.

1594

Multivariate Statistics

copula

Multivariate Dependence with Copulas

Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and nonparametric estimators of the Pickands dependence function.
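A minimal sketch of the simulate-then-fit workflow, assuming the package is installed; the Clayton family and parameter value are arbitrary choices for illustration:

```r
library(copula)

cop <- claytonCopula(param = 2, dim = 2)  # an Archimedean (Clayton) copula
set.seed(42)
u <- rCopula(500, cop)                    # sample pseudo-observations in (0,1)^2

# Fit a Clayton copula back to the sample by maximum pseudo-likelihood
fit <- fitCopula(claytonCopula(dim = 2), u, method = "mpl")
coef(fit)
```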

1595

Multivariate Statistics

corpcor

Efficient Estimation of Covariance and (Partial) Correlation

Implements a James-Stein-type shrinkage estimator for the covariance matrix, with separate shrinkage for variances and correlations. The details of the method are explained in Schafer and Strimmer (2005) doi:10.2202/1544-6115.1175 and Opgen-Rhein and Strimmer (2007) doi:10.2202/1544-6115.1252. The approach is both computationally and statistically very efficient, is applicable to “small n, large p” data, and always returns a positive definite and well-conditioned covariance matrix. In addition to inferring the covariance matrix the package also provides shrinkage estimators for partial correlations and partial variances. The inverse of the covariance and correlation matrix can be efficiently computed, as well as any arbitrary power of the shrinkage correlation matrix. Furthermore, functions are available for fast singular value decomposition, for computing the pseudoinverse, and for checking the rank and positive definiteness of a matrix.
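A small sketch of the “small n, large p” case the description mentions, assuming the package is installed:

```r
library(corpcor)

set.seed(1)
X <- matrix(rnorm(10 * 50), nrow = 10)  # 10 observations, 50 variables

S <- cov.shrink(X)            # shrinkage estimate of the covariance matrix
is.positive.definite(S)       # TRUE even though n << p
```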

1596

Multivariate Statistics

covRobust

Robust Covariance Estimation via Nearest Neighbor Cleaning

The cov.nnve() function implements robust covariance estimation by the nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002) doi:10.1198/016214502388618780.

1597

Multivariate Statistics

cramer

Multivariate Nonparametric Cramer Test for the Two-Sample Problem

Provides an R routine for the so-called two-sample Cramer test. This nonparametric two-sample test of equality of the underlying distributions can be applied to multivariate data as well as univariate data. It offers two possibilities to approximate the critical value, both of which are included in this package.

1598

Multivariate Statistics

cwhmisc

Miscellaneous Functions for Math, Plotting, Printing, Statistics, Strings, and Tools

Miscellaneous useful or interesting functions.

1599

Multivariate Statistics

delt

Estimation of Multivariate Densities Using Adaptive Partitions

We implement methods for estimating multivariate densities. We include a discretized kernel estimator, an adaptive histogram (a greedy histogram and a CART-histogram), stagewise minimization, and bootstrap aggregation.

1600

Multivariate Statistics

denpro

Visualization of Multivariate Functions, Sets, and Data

We provide tools to (1) visualize multivariate density functions and density estimates with level set trees, (2) visualize level sets with shape trees, (3) visualize multivariate data with tail trees, (4) visualize scales of multivariate density estimates with mode graphs and branching maps, and (5) visualize anisotropic spread with 2D volume functions and 2D probability content functions. Level set trees visualize mode structure, shape trees visualize shapes of level sets of unimodal densities, and tail trees visualize connected data sets. The kernel estimator is implemented but the package may also be applied for visualizing other density estimates.

1601

Multivariate Statistics

desirability

Function Optimization and Ranking via Desirability Functions

S3 classes for multivariate optimization using the desirability function by Derringer and Suich (1980).

1602

Multivariate Statistics

dr

Methods for Dimension Reduction for Regression

Functions, methods, and datasets for fitting dimension reduction regression, using slicing (methods SAVE and SIR), Principal Hessian Directions (phd, using residuals and the response), and an iterative IRE. Partial methods, which condition on categorical predictors, are also available. A variety of tests and stepwise deletion of predictors are also included, as is code for computing permutation tests of dimension. Adding additional methods of estimating dimension is straightforward. For documentation, see the vignette in the package. With version 3.0.4, the arguments for dr.step have been modified.

1603

Multivariate Statistics

e1071

Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
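For instance, the support vector machine interface can be sketched as follows, using the built-in iris data:

```r
library(e1071)

model <- svm(Species ~ ., data = iris)   # RBF-kernel SVM classifier
pred  <- predict(model, iris)
mean(pred == iris$Species)               # (training) accuracy
```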

1604

Multivariate Statistics

earth

Multivariate Adaptive Regression Splines

Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines”. (The term “MARS” is trademarked and thus not used in the name of the package.)

1605

Multivariate Statistics

ellipse

Functions for drawing ellipses and ellipse-like confidence regions

This package contains various routines for drawing ellipses and ellipse-like confidence regions, implementing the plots described in Murdoch and Chow (1996), A graphical display of large correlation matrices, The American Statistician 50, 178-180. There are also routines implementing the profile plots described in Bates and Watts (1988), Nonlinear Regression Analysis and its Applications.

1606

Multivariate Statistics

energy

E-Statistics: Multivariate Inference via the Energy of Data

E-statistics (energy) tests and statistics for multivariate and univariate inference, including distance correlation, one-sample, two-sample, and multi-sample tests for comparing multivariate distributions, are implemented. Measuring and testing multivariate independence based on distance correlation, partial distance correlation, multivariate goodness-of-fit tests, clustering based on energy distance, testing for multivariate normality, distance components (disco) for nonparametric analysis of structured data, and other energy statistics/methods are implemented.

1607

Multivariate Statistics

eRm

Extended Rasch Modeling

Fits Rasch models (RM), linear logistic test models (LLTM), rating scale model (RSM), linear rating scale models (LRSM), partial credit models (PCM), and linear partial credit models (LPCM). Missing values are allowed in the data matrix. Additional features are the ML estimation of the person parameters, Andersen’s LR test, item-specific Wald test, Martin-Lof test, nonparametric Monte Carlo tests, item-fit and person-fit statistics including infit and outfit measures, various ICC and related plots, automated stepwise item elimination, and a simulation module for various binary data matrices.

1608

Multivariate Statistics

FactoMineR

Multivariate Exploratory Data Analysis and Data Mining

Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017) doi:10.1201/b10345-2.

1609

Multivariate Statistics

FAiR

Factor Analysis in R

This package estimates factor analysis models using a genetic algorithm, which permits a general mechanism for restricted optimization with arbitrary restrictions that are chosen at run time with the help of a GUI. Importantly, inequality restrictions can be imposed on functions of multiple parameters, which provides new avenues for testing and generating theories with factor analysis models. This package also includes an entirely new estimator of the common factor analysis model called semi-exploratory factor analysis, which is a general alternative to exploratory and confirmatory factor analysis. Finally, this package integrates a lot of other packages that estimate sample covariance matrices and thus provides a lot of alternatives to the traditional sample covariance calculation. Note that you need to have the Gtk run-time library installed on your system to use this package; see the URL below for detailed installation instructions. Most users would only need to understand the first twenty-four pages of the PDF manual.

1610

Multivariate Statistics

fastICA

FastICA Algorithms to Perform ICA and Projection Pursuit

Implementation of FastICA algorithm to perform Independent Component Analysis (ICA) and Projection Pursuit.
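A minimal sketch — mix two independent sources and recover them — assuming the package is installed; the sources and mixing matrix are made up for illustration:

```r
library(fastICA)

set.seed(1)
S <- cbind(sin((1:1000) / 20), runif(1000))   # two independent sources
A <- matrix(c(0.3, 0.7, 0.5, 0.2), 2, 2)      # arbitrary mixing matrix
X <- S %*% A                                  # observed mixtures

ica <- fastICA(X, n.comp = 2)                 # estimate 2 independent components
dim(ica$S)                                    # recovered source matrix
```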

1611

Multivariate Statistics

feature

Local Inferential Feature Significance for Multivariate Kernel Density Estimation

Local inferential feature significance for multivariate kernel density estimation.

1612

Multivariate Statistics

fgac

Generalized Archimedean Copula

Bivariate data fitting involves two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm was implemented considering seven families of copulas (Generalized Archimedean Copulas); the best fit can be obtained by examining all copula options (totally positive of order 2 and stochastically increasing models).

1613

Multivariate Statistics

fpc

Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance-based clustering including corrected Rand index. Clusterwise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther’s prediction strength, Fang and Wang’s bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

1614

Multivariate Statistics

fso

Fuzzy Set Ordination

Fuzzy set ordination is a multivariate analysis used in ecology to relate the composition of samples to possible explanatory variables. While differing in theory and method, in practice the use is similar to ‘constrained ordination.’ The package contains plotting and summary functions as well as the analyses.

1615

Multivariate Statistics

gclus

Clustering Graphics

Orders panels in scatterplot matrices and parallel coordinate displays by some merit index. Package contains various indices of merit, ordering functions, and enhanced versions of pairs and parcoord which color panels according to their merit level.

1616

Multivariate Statistics

GenKern

Functions for generating and manipulating binned kernel density estimates

Computes generalised KDEs

1617

Multivariate Statistics

geometry

Mesh Generation and Surface Tessellation

Makes the qhull library (www.qhull.org) available in R, in a similar manner as in Octave and MATLAB. Qhull computes convex hulls, Delaunay triangulations, halfspace intersections about a point, Voronoi diagrams, furthestsite Delaunay triangulations, and furthestsite Voronoi diagrams. It runs in 2d, 3d, 4d, and higher dimensions. It implements the Quickhull algorithm for computing the convex hull. Qhull does not support constrained Delaunay triangulations, or mesh generation of nonconvex objects, but the package does include some R functions that allow for this. Currently the package only gives access to Delaunay triangulation and convex hull computation.

1618

Multivariate Statistics

geozoo

Zoo of Geometric Objects

Geometric objects defined in ‘geozoo’ can be simulated or displayed in the R package ‘tourr’.

1619

Multivariate Statistics

gmodels

Various R Programming Tools for Model Fitting

Various R programming tools for model fitting.

1620

Multivariate Statistics

GPArotation

GPA Factor Rotation

Gradient Projection Algorithm Rotation for Factor Analysis. See ?GPArotation.Intro for more details.

1621

Multivariate Statistics

hddplot

Use Known Groups in High-Dimensional Data to Derive Scores for Plots

Cross-validated linear discriminant calculations determine the optimum number of features. Test and training scores from successive cross-validation steps determine, via a principal components calculation, a low-dimensional global space onto which test scores are projected, in order to plot them. Further functions are included that are intended for didactic use. The package implements, and extends, methods described in J.H. Maindonald and C.J. Burden (2005) https://journal.austms.org.au/V46/CTAC2004/Main/home.html.

1622

Multivariate Statistics

Hmisc

Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.

1623

Multivariate Statistics

homals

Gifi Methods for Optimal Scaling

Performs a homogeneity analysis (multiple correspondence analysis) and various extensions. Rank restrictions on the category quantifications can be imposed (nonlinear PCA). The categories are transformed by means of optimal scaling with options for nominal, ordinal, and numerical scale levels (for rank-1 restrictions). Variables can be grouped into sets, in order to emulate regression analysis and canonical correlation analysis.

1624

Multivariate Statistics

hybridHclust

Hybrid Hierarchical Clustering

Hybrid hierarchical clustering via mutual clusters. A mutual cluster is a set of points closer to each other than to all other points. Mutual clusters are used to enrich top-down hierarchical clustering.

1625

Multivariate Statistics

ICS

Tools for Exploring Multivariate Data via ICS/ICA

Implementation of Tyler, Critchley, Duembgen and Oja’s (JRSS B, 2009, doi:10.1111/j.1467-9868.2009.00706.x) and Oja, Sirkia and Eriksson’s (AJS, 2006, http://www.ajs.or.at/index.php/ajs/article/view/vol35,%20no2%263%20%207) method of two different scatter matrices to obtain an invariant coordinate system or independent components, depending on the underlying assumptions.

1626

Multivariate Statistics

ICSNP

Tools for Multivariate Nonparametrics

Tools for multivariate nonparametrics: location tests based on marginal ranks, spatial median and spatial signs computation, Hotelling’s T-test, and estimates of shape are implemented.

1627

Multivariate Statistics

iplots

iPlots - interactive graphics for R

Interactive plots for R

1628

Multivariate Statistics

JADE

Blind Source Separation Methods Based on Joint Diagonalization and Some BSS Performance Criteria

Cardoso’s JADE algorithm as well as his functions for joint diagonalization are ported to R. Also several other blind source separation (BSS) methods, like AMUSE and SOBI, and some criteria for performance evaluation of BSS algorithms, are given.

1629

Multivariate Statistics

kernlab

Kernel-Based Machine Learning Lab

Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
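As one example among the methods listed, kernel PCA on the iris measurements might be sketched like this (the sigma value is an arbitrary choice):

```r
library(kernlab)

# Kernel PCA with an RBF kernel, keeping two features
kpc <- kpca(~ ., data = iris[, -5], kernel = "rbfdot",
            kpar = list(sigma = 0.2), features = 2)
head(rotated(kpc))   # data projected onto the first two kernel PCs
```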

1630

Multivariate Statistics

KernSmooth

Functions for Kernel Smoothing Supporting Wand & Jones (1995)

Functions for kernel smoothing (and density estimation) corresponding to the book: Wand, M.P. and Jones, M.C. (1995) “Kernel Smoothing”.

1631

Multivariate Statistics

kknn

Weighted k-Nearest Neighbors

Weighted k-Nearest Neighbors for Classification, Regression and Clustering.

1632

Multivariate Statistics

klaR

Classification and visualization

Miscellaneous functions for classification and visualization developed at the Fakultaet Statistik, Technische Universitaet Dortmund.

1633

Multivariate Statistics

knncat

Nearest-neighbor Classification with Categorical Variables

Scale categorical variables in such a way as to make NN classification as accurate as possible. The code also handles continuous variables and prior probabilities, and does intelligent variable selection and estimation of both error rates and the right number of NNs.

1634

Multivariate Statistics

kohonen

Supervised and Unsupervised Self-Organising Maps

Functions to train self-organising maps (SOMs). Also interrogation of the maps and prediction using trained maps are supported. The name of the package refers to Teuvo Kohonen, the inventor of the SOM.

1635

Multivariate Statistics

ks

Kernel Smoothing

Kernel smoothers for univariate and multivariate data, including density functions, density derivatives, cumulative distributions, modal clustering, discriminant analysis, significant modal regions and twosample hypothesis testing.
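A bivariate density estimate with a plug-in bandwidth can be sketched as follows, using two iris measurements as example data:

```r
library(ks)

x <- as.matrix(iris[, 1:2])   # bivariate data
H <- Hpi(x)                   # plug-in bandwidth matrix
fhat <- kde(x = x, H = H)     # bivariate kernel density estimate
plot(fhat)                    # contour plot of the estimate
```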

1636

Multivariate Statistics

lattice

Trellis Graphics for R

A powerful and elegant highlevel data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
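A typical conditioned (multi-panel) plot looks like:

```r
library(lattice)

# One scatterplot panel per species
p <- xyplot(Sepal.Length ~ Petal.Length | Species, data = iris,
            layout = c(3, 1))
print(p)   # lattice objects are drawn when printed
```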

1637

Multivariate Statistics

ltm

Latent Trait Models under IRT

Analysis of multivariate dichotomous and polytomous data using latent trait models under the Item Response Theory approach. It includes the Rasch, the Two-Parameter Logistic, Birnbaum’s Three-Parameter, the Graded Response, and the Generalized Partial Credit Models.

1638

Multivariate Statistics

mAr

Multivariate AutoRegressive analysis

R functions for multivariate autoregressive analysis.

1639

Multivariate Statistics

MASS (core)

Support Functions and Datasets for Venables and Ripley’s MASS

Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
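As one example of the book's multivariate toolkit, lda() fits a linear discriminant analysis in a single call (the iris data and formula below are illustrative, not from the book):

```r
library(MASS)

# Linear discriminant analysis of the iris measurements
fit  <- lda(Species ~ ., data = iris)
pred <- predict(fit)$class
mean(pred == iris$Species)  # resubstitution accuracy
```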

1640

Multivariate Statistics

Matrix

Sparse and Dense Matrix Classes and Methods

A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using ‘LAPACK’ and ‘SuiteSparse’ libraries.
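A small sketch of the class hierarchy in use; the matrix below is an arbitrary illustration, with arithmetic dispatching to the sparse methods:

```r
library(Matrix)

# A 3x3 sparse matrix; %*% and solve() dispatch to sparse methods
A <- Matrix(diag(c(1, 2, 3)), sparse = TRUE)
b <- A %*% c(1, 1, 1)   # sparse matrix-vector product
x <- solve(A, b)        # sparse linear solve, recovers c(1, 1, 1)
```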

1641

Multivariate Statistics

matrixcalc

Collection of functions for matrix calculations

A collection of functions to support matrix calculations for probability, econometric and numerical analysis. There are additional functions that are comparable to APL functions which are useful for actuarial models such as pension mathematics. This package is used for teaching and research purposes at the Department of Finance and Risk Engineering, New York University, Polytechnic Institute, Brooklyn, NY 11201.

1642

Multivariate Statistics

mclust

Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via the EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.

1643

Multivariate Statistics

MCMCpack

Markov Chain Monte Carlo (MCMC) Package

Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return coda mcmc objects that can then be summarized using the coda package. Some useful utility functions such as density functions, pseudorandom number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.

1644

Multivariate Statistics

mda

Mixture and Flexible Discriminant Analysis

Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, …

1645

Multivariate Statistics

mice

Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) doi:10.18637/jss.v045.i03. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
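A minimal sketch using the package's bundled nhanes data; the number of imputations and the seed are arbitrary choices for illustration:

```r
library(mice)

# Impute the bundled nhanes data (which contains NAs) five times,
# then extract the first completed dataset
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
completed <- complete(imp, 1)
anyNA(completed)  # FALSE once every missing cell has been imputed
```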

1646

Multivariate Statistics

misc3d

Miscellaneous 3D Plots

A collection of miscellaneous 3D plots, including isosurfaces.

1647

Multivariate Statistics

mitools

Tools for multiple imputation of missing data

Tools to perform analyses and combine results from multiple-imputation datasets.

1648

Multivariate Statistics

mix

Estimation/Multiple Imputation for Mixed Categorical and Continuous Data

Estimation/multiple imputation programs for mixed categorical and continuous data.

1649

Multivariate Statistics

mnormt

The Multivariate Normal and t Distributions

Functions are provided for computing the density and the distribution function of multivariate normal and "t" random variables, and for generating random vectors sampled from these distributions. Probabilities are computed via non-Monte Carlo methods; different routines are used for the cases d = 1, d = 2 and d > 2, where d denotes the number of dimensions.

1650

Multivariate Statistics

MNP

R Package for Fitting the Multinomial Probit Model

Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005), "A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation," Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334, doi:10.1016/j.jeconom.2004.02.002. Detailed examples are given in Imai and van Dyk (2005), "MNP: R Package for Fitting the Multinomial Probit Model," Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32, doi:10.18637/jss.v014.i03.

1651

Multivariate Statistics

monomvn

Estimation for Multivariate Normal and Student-t Data with Monotone Missingness

Estimation of multivariate normal and Student-t data of arbitrary dimension where the pattern of missing data is monotone. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and Student-t errors (from Geweke) is also provided.

1652

Multivariate Statistics

mvnmle

ML estimation for multivariate normal data with missing values

Finds the maximum likelihood estimate of the mean vector and variance-covariance matrix for multivariate normal data with missing values.

1653

Multivariate Statistics

mvnormtest

Normality test for multivariate variables

Generalization of the Shapiro-Wilk test for multivariate variables.

1654

Multivariate Statistics

mvoutlier

Multivariate Outlier Detection Based on Robust Methods

Various Methods for Multivariate Outlier Detection.

1655

Multivariate Statistics

mvtnorm

Multivariate Normal and t Distributions

Computes multivariate normal and t probabilities, quantiles, random deviates and densities.
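A brief sketch of the interface; the dimension and covariance used below are chosen purely for illustration:

```r
library(mvtnorm)

# Orthant probability P(X1 < 0, X2 < 0) for two independent standard
# normals: exactly 0.5 * 0.5 = 0.25
p <- pmvnorm(upper = c(0, 0), sigma = diag(2))

# Random deviates and densities share the same parameterisation
x <- rmvnorm(5, mean = c(0, 0), sigma = diag(2))
dmvnorm(c(0, 0), sigma = diag(2))
```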

1656

Multivariate Statistics

nFactors

Parallel Analysis and Non-Graphical Solutions to the Cattell Scree Test

Indices, heuristics and strategies to help determine the number of factors/components to retain: 1. Acceleration factor (af with or without Parallel Analysis); 2. Optimal Coordinates (noc with or without Parallel Analysis); 3. Parallel analysis (components, factors and bootstrap); 4. lambda > mean(lambda) (Kaiser, CFA and related); 5. Cattell-Nelson-Gorsuch (CNG); 6. Zoski and Jurs multiple regression (b, t and p); 7. Zoski and Jurs standard error of the regression coefficient (sescree); 8. Nelson R2; 9. Bartlett chi-square; 10. Anderson chi-square; 11. Lawley chi-square and 12. Bentler-Yuan chi-square.

1657

Multivariate Statistics

pan

Multiple Imputation for Multivariate Panel or Clustered Data

Multiple imputation for multivariate panel or clustered data.

1658

Multivariate Statistics

paran

Horn’s Test of Principal Components/Factors

paran is an implementation of Horn's technique for numerically and graphically evaluating the components or factors retained in a principal components analysis (PCA) or common factor analysis (FA). Horn's method contrasts eigenvalues produced through a PCA or FA on a number of random data sets of uncorrelated variables with the same number of variables and observations as the experimental or observational data set to produce eigenvalues for components or factors that are adjusted for the sample error-induced inflation. Components with adjusted eigenvalues greater than one are retained. paran may also be used to conduct parallel analysis following Glorfeld's (1995) suggestions to reduce the likelihood of over-retention.

1659

Multivariate Statistics

party

A Laboratory for Recursive Partytioning

A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well-defined theory of conditional inference procedures. This nonparametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman's random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) doi:10.1198/106186006X133933, Zeileis et al. (2008) doi:10.1198/106186008X319331 and Strobl et al. (2007) doi:10.1186/1471-2105-8-25.

1660

Multivariate Statistics

pcaPP

Robust PCA by Projection Pursuit

Provides functions for robust PCA by projection pursuit. The methods are described in Croux et al. (2006) doi:10.2139/ssrn.968376, Croux et al. (2013) doi:10.1080/00401706.2012.727746, and Todorov and Filzmoser (2013) doi:10.1007/978-3-642-33042-1_31.

1661

Multivariate Statistics

PearsonICA

Independent component analysis using score functions from the Pearson system

The PearsonICA algorithm is a mutual information-based method for blind separation of statistically independent source signals. It has been shown that the minimization of mutual information leads to iterative use of score functions, i.e. derivatives of log densities. The Pearson system allows adaptive modeling of score functions. The flexibility of the Pearson system makes it possible to model a wide range of source distributions including asymmetric distributions. The algorithm is designed especially for problems with asymmetric sources but it works for symmetric sources as well.

1662

Multivariate Statistics

pls

Partial Least Squares and Principal Component Regression

Multivariate regression methods Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).

1663

Multivariate Statistics

plsgenomics

PLS Analyses for Genomics

Routines for PLS-based genomic analyses, implementing PLS methods for classification with microarray data and prediction of transcription factor activities from combined ChIP-chip analysis. The >=1.2-1 versions include two new classification methods for microarray data: GSIM and Ridge PLS. The >=1.3 versions include a new classification method combining variable selection and compression in the logistic regression context: logit-SPLS; and an adaptive version of the sparse PLS.

1664

Multivariate Statistics

poLCA

Polytomous variable Latent Class Analysis

Latent class analysis and latent class regression models for polytomous outcome variables. Also known as latent structure analysis.

1665

Multivariate Statistics

polycor

Polychoric and Polyserial Correlations

Computes polychoric and polyserial correlations by quick "two-step" methods or ML, optionally with standard errors; tetrachoric and biserial correlations are special cases.

1666

Multivariate Statistics

ppls

Penalized Partial Least Squares

This package contains linear and nonlinear regression methods based on Partial Least Squares and penalization techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be conducted via jackknifing.

1667

Multivariate Statistics

prim

Patient Rule Induction Method (PRIM)

Patient Rule Induction Method (PRIM) for bump hunting in high-dimensional data.

1668

Multivariate Statistics

proxy

Distance and Similarity Measures

Provides an extensible framework for the efficient calculation of auto- and cross-proximities, along with implementations of the most popular ones.

1669

Multivariate Statistics

psy

Various procedures used in psychometry

Kappa, ICC, Cronbach alpha, screeplot, mtmm

1670

Multivariate Statistics

PTAk

Principal Tensor Analysis on k Modes

A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package also includes some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions.

1671

Multivariate Statistics

rda

Shrunken Centroids Regularized Discriminant Analysis

Shrunken centroids regularized discriminant analysis for classification in high-dimensional data.

1672

Multivariate Statistics

relaimpo

Relative importance of regressors in linear models

relaimpo provides several metrics for assessing relative importance in linear models. These can be printed, plotted and bootstrapped. The recommended metric is lmg, which provides a decomposition of the model explained variance into nonnegative contributions. There is a version of this package available that additionally provides a new and also recommended metric called pmvd. If you are a non-US user, you can download this extended version from Ulrike Groemping's web site.

1673

Multivariate Statistics

rggobi

Interface Between R and ‘GGobi’

A commandline interface to ‘GGobi’, an interactive and dynamic graphics package. ‘Rggobi’ complements the graphical user interface of ‘GGobi’ providing a way to fluidly transition between analysis and exploration, as well as automating common tasks.

1674

Multivariate Statistics

rgl

3D Visualization Using OpenGL

Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.

1675

Multivariate Statistics

robustbase

Basic Robust Statistics

“Essential” Robust Statistics. Tools for analyzing data with robust methods, including regression methodology (with model selection) and multivariate statistics, where we strive to cover the book “Robust Statistics, Theory and Methods” by ‘Maronna, Martin and Yohai’; Wiley 2006.

1676

Multivariate Statistics

ROCR

Visualizing the Performance of Scoring Classifiers

ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
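A short sketch of the prediction()/performance() workflow on simulated classifier scores (the score distributions below are arbitrary):

```r
library(ROCR)

set.seed(1)
scores <- c(rnorm(50, mean = 1), rnorm(50, mean = -1))  # simulated scores
labels <- rep(c(1, 0), each = 50)                        # true classes

pred <- prediction(scores, labels)
roc  <- performance(pred, "tpr", "fpr")        # ROC curve coordinates
auc  <- performance(pred, "auc")@y.values[[1]] # area under the curve
```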

1677

Multivariate Statistics

rpart

Recursive Partitioning and Regression Trees

Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
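A minimal classification-tree sketch; the iris data and formula are illustrative:

```r
library(rpart)

# Grow a classification tree and predict class labels
fit  <- rpart(Species ~ ., data = iris, method = "class")
pred <- predict(fit, iris, type = "class")
mean(pred == iris$Species)  # resubstitution accuracy
```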

1678

Multivariate Statistics

rrcov

Scalable Robust Estimators with High Breakdown Point

Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point.

1679

Multivariate Statistics

sca

Simple Component Analysis

Simple Component Analysis (SCA) often provides much more interpretable components than Principal Components (PCA) while still representing much of the variability in the data.

1680

Multivariate Statistics

scatterplot3d

3D Scatter Plot

Plots a three-dimensional (3D) point cloud.

1681

Multivariate Statistics

sem

Structural Equation Models

Functions for fitting general linear structural equation models (with observed and latent variables) using the RAM approach, and for fitting structural equations in observed-variable models by two-stage least squares.

1682

Multivariate Statistics

SensoMineR

Sensory data analysis with R

An R package for analysing sensory data.

1683

Multivariate Statistics

seriation

Infrastructure for Ordering Objects Using Seriation

Infrastructure for seriation with an implementation of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).

1684

Multivariate Statistics

simba

A Collection of functions for similarity analysis of vegetation data

Besides functions for the calculation of similarity and multiple-plot similarity measures with binary data (for instance presence/absence species data), the package contains simple wrapper functions for reshaping species lists into matrices and vice versa, further functions for processing similarity data (Mantel-like permutation procedures), and other useful tools for vegetation analysis.

1685

Multivariate Statistics

smatr

(Standardised) Major Axis Estimation and Testing Routines

Provides methods for fitting bivariate lines in allometry using the major axis (MA) or standardised major axis (SMA), and for making inferences about such lines. The available methods of inference include confidence intervals and one-sample tests for slope and elevation, testing for a common slope or elevation amongst several allometric lines, constructing a confidence interval for a common slope or elevation, and testing for no shift along a common axis, amongst several samples.

1686

Multivariate Statistics

sn

The Skew-Normal and Related Distributions Such as the Skew-t

Build and manipulate probability distributions of the skew-normal family and some related ones, notably the skew-t family, and provide related statistical methods for data fitting and model diagnostics, in the univariate and the multivariate case.

1687

Multivariate Statistics

spam

SPArse Matrix

Set of functions for sparse matrix algebra. Differences from other sparse matrix packages are: (1) we only support (essentially) one sparse matrix format, (2) it is based on transparent and simple structure(s), (3) it is tailored for MCMC calculations within G(M)RF, and (4) it is fast and scalable (with the extension package spam64).

1688

Multivariate Statistics

SparseM

Sparse Linear Algebra

Some basic linear algebra functionality for sparse matrices is provided: including Cholesky decomposition and backsolving as well as standard R subsetting and Kronecker products.

1689

Multivariate Statistics

SpatialNP

Multivariate Nonparametric Methods Based on Spatial Signs and Ranks

Tests and estimates of location, tests of independence, tests of sphericity and several estimates of shape, all based on spatial signs, symmetrized signs, ranks and signed ranks. For details, see Oja and Randles (2004) doi:10.1214/088342304000000558 and Oja (2010) doi:10.1007/978-1-4419-0468-3.

1690

Multivariate Statistics

superpc

Supervised principal components

Supervised principal components for regression and survival analysis. Especially useful for high-dimensional data, including microarray data.

1691

Multivariate Statistics

trimcluster

Cluster analysis with trimming

Trimmed k-means clustering.

1692

Multivariate Statistics

tsfa

Time Series Factor Analysis

Extraction of Factors from Multivariate Time Series. See ?00tsfaIntro for more details.

1693

Multivariate Statistics

vcd

Visualizing Categorical Data

Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book "Visualizing Categorical Data" by Michael Friendly and is now the main support package for a new book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer (2015).

1694

Multivariate Statistics

vegan (core)

Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

1695

Multivariate Statistics

VGAM

Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. At the heart of it are the vector generalized linear and additive model (VGLM/VGAM) classes, and the book "Vector Generalized Linear and Additive Models: With an Implementation in R" (Yee, 2015) doi:10.1007/978-1-4939-2818-7 gives details of the statistical framework and VGAM package. Currently only fixed-effects models are implemented, i.e., no random-effects models. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE, using Fisher scoring. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs (i.e., with smoothing). The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

1696

Multivariate Statistics

VIM

Visualization and Imputation of Missing Values

New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data, including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface, available in the separate package VIMGUI, allows easy handling of the implemented plot methods.

1697

Multivariate Statistics

xgobi

Interface to the XGobi and XGvis programs for graphical data analysis

Interface to the XGobi and XGvis programs for graphical data analysis.

1698

Multivariate Statistics

YaleToolkit

Data exploration tools from Yale University

This collection of data exploration tools was developed at Yale University for the graphical exploration of complex multivariate data; barcode and gpairs now have their own packages. The new big.read.table() provided here may be useful for large files when only a subset is needed.

1699

Natural Language Processing

alineR

Alignment of Phonetic Sequences Using the ‘ALINE’ Algorithm

Functions are provided to calculate the 'ALINE' Distance between words as per (Kondrak 2000) and (Downey, Hallmark, Cox, Norquest, & Lansing, 2008, doi:10.1080/09296170802326681). The score is based on phonetic features represented using the Unicode-compliant International Phonetic Alphabet (IPA). Parameterized feature weights are used to determine the optimal alignment and functions are provided to estimate optimum values using a genetic algorithm and supervised learning. See (Downey, Sun, and Norquest 2017, https://journal.r-project.org/archive/2017/RJ-2017-005/index.html).

1700

Natural Language Processing

boilerpipeR

Interface to the Boilerpipe Java Library

Generic Extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe (http://code.google.com/p/boilerpipe/) Java library. The extraction heuristics from boilerpipe show a robust performance for a wide range of web site templates.

1701

Natural Language Processing

corpora

Statistics and data sets for corpus frequency data

Utility functions and data sets for the statistical analysis of corpus frequency data, used in the SIGIL statistics course.

1702

Natural Language Processing

gsubfn

Utilities for strings and function arguments

gsubfn is like gsub but can take a replacement function or certain other objects instead of the replacement string. Matches and back references are input to the replacement function and replaced by the function output. gsubfn can be used to split strings based on content rather than delimiters and for quasi-Perl-style string interpolation. The package also has facilities for translating formulas to functions and allowing such formulas in function calls instead of functions. This can be used with R functions such as apply, sapply, lapply, optim, integrate, xyplot, Filter and any other function that expects another function as an input argument or functions like cat or sql calls that may involve strings where substitution is desirable.
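A small sketch of function-based replacement; the pattern and input string are illustrative:

```r
library(gsubfn)

# Each numeric match is passed to the function, and its result
# (coerced back to character) replaces the match
out <- gsubfn("[0-9]+", function(x) as.numeric(x) * 2, "widths: 3 and 12")
out  # "widths: 6 and 24"
```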

1703

Natural Language Processing

gutenbergr

Download and Process Public Domain Works from Project Gutenberg

Download and process public domain works in the Project Gutenberg collection http://www.gutenberg.org/. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.

1704

Natural Language Processing

hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R

A spell checker and morphological analyzer library designed for languages with rich morphology and complex word compounding or character encoding. The package can check and analyze individual words as well as search for incorrect words within a text, LaTeX, HTML or XML document. Use the 'devtools' package to spell check R documentation with 'hunspell'.
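A brief sketch of the word-level interface; the sentence is illustrative and assumes the package's bundled en_US dictionary:

```r
library(hunspell)

# Find misspelled words in a text, then ask for ranked suggestions
bad <- hunspell("This sentance has one missspelled word")
bad[[1]]                    # the flagged words, e.g. "sentance"
hunspell_suggest(bad[[1]])  # one list of corrections per flagged word
```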

1705

Natural Language Processing

kernlab

Kernel-Based Machine Learning Lab

Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods 'kernlab' includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
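A short SVM sketch; the iris data and RBF kernel choice are illustrative:

```r
library(kernlab)

# Fit an RBF-kernel SVM (sigma estimated automatically) and predict
fit  <- ksvm(Species ~ ., data = iris, kernel = "rbfdot")
pred <- predict(fit, iris)
mean(pred == iris$Species)  # training-set accuracy
```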

1706

Natural Language Processing

KoNLP

Korean NLP Package

POS Tagger and Morphological Analyzer for Korean text based research. It provides tools for corpus linguistics research such as Keystroke converter, Hangul automata, Concordance, and Mutual Information. It also provides a convenient interface for users to apply, edit and add morphological dictionary selectively.

1707

Natural Language Processing

koRpus

An R Package for Text Analysis

A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Support for additional languages can be added on-the-fly or by plugin packages. Note: For full functionality a local installation of TreeTagger is recommended. 'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from https://rkward.kde.org (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, request features, or discuss the development of the package, please subscribe to the koRpus-dev mailing list (http://korpusml.reaktanz.de).

1708

Natural Language Processing

languageR

Data sets and functions with “Analyzing Linguistic Data: A practical introduction to statistics”

Data sets exemplifying statistical methods, and some facilitatory utility functions used in “Analyzing Linguistic Data: A practical introduction to statistics using R”, Cambridge University Press, 2008.

1709

Natural Language Processing

lda

Collapsed Gibbs Sampling Methods for Topic Models

Implements latent Dirichlet allocation (LDA) and related models. This includes (but is not limited to) sLDA, corrLDA, and the mixedmembership stochastic blockmodel. Inference for all of these models is implemented via a fast collapsed Gibbs sampler written in C. Utility functions for reading/writing data typically used in topic models, as well as tools for examining posterior distributions are also included.

1710

Natural Language Processing

lsa

Latent Semantic Analysis

The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (= latent semantic) structure which, however, is obscured by word usage (e.g. through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.

1711

Natural Language Processing

maxent

Low-memory Multinomial Logistic Regression with Support for Text Classification

maxent is an R package with tools for low-memory multinomial logistic regression, also known as maximum entropy. The focus of this maximum entropy classifier is to minimize memory consumption on very large datasets, particularly sparse document-term matrices represented by the tm package. The classifier is based on an efficient C++ implementation written by Dr. Yoshimasa Tsuruoka.

1712

Natural Language Processing

monkeylearn

Accesses the Monkeylearn API for Text Classifiers and Extractors

Allows using some services of Monkeylearn http://monkeylearn.com/ which is a Machine Learning platform on the cloud for text analysis (classification and extraction).

1713

Natural Language Processing

movMF

Mixtures of von MisesFisher Distributions

Fit and simulate mixtures of von MisesFisher distributions.

1714

Natural Language Processing

mscstexta4r

R Client for the Microsoft Cognitive Services Text Analytics REST API

R Client for the Microsoft Cognitive Services Text Analytics REST API, including Sentiment Analysis, Topic Detection, Language Detection, and Key Phrase Extraction. An account MUST be registered at the Microsoft Cognitive Services website https://www.microsoft.com/cognitive-services/ in order to obtain a (free) API key. Without an API key, this package will not work properly.

1715

Natural Language Processing

mscsweblm4r

R Client for the Microsoft Cognitive Services Web Language Model REST API

R Client for the Microsoft Cognitive Services Web Language Model REST API, including Break Into Words, Calculate Conditional Probability, Calculate Joint Probability, Generate Next Words, and List Available Models. A valid account MUST be registered at the Microsoft Cognitive Services website https://www.microsoft.com/cognitive-services/ in order to obtain a (free) API key. Without an API key, this package will not work properly.

1716

Natural Language Processing

openNLP

Apache OpenNLP Tools Interface

An interface to the Apache OpenNLP tools (version 1.5.3). The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text written in Java. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. See http://opennlp.apache.org/ for more information.

1717

Natural Language Processing

ore

An R Interface to the Onigmo Regular Expression Library

Provides an alternative to R’s built-in functionality for handling regular expressions, based on the Onigmo library. Offers first-class compiled regex objects, partial matching and function-based substitutions, amongst other features.

1718

Natural Language Processing

phonics

Phonetic Spelling Algorithms

Provides a collection of phonetic algorithms including Soundex, Metaphone, NYSIIS, Caverphone, and others.
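
A brief illustration of the kind of code these algorithms produce (a sketch, assuming the package is installed; `soundex()` is one of its exported functions):

```r
library(phonics)

# Soundex encodes a name as its first letter plus three digits,
# so that similar-sounding names receive the same code.
soundex("Robert")  # "R163"
soundex("Rupert")  # also "R163": the two names sound alike
```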

1720

Natural Language Processing

qdap

Bridging the Gap Between Qualitative Data and Quantitative Analysis

Automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, syllables, and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text. This affords the user a more efficient and targeted analysis. ‘qdap’ is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.

1721

Natural Language Processing

quanteda

Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
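
A minimal sketch of this workflow, assuming quanteda is installed (`corpus()`, `tokens()`, and `dfm()` are from its documented API):

```r
library(quanteda)

# Build a small corpus, tokenize it, and form a sparse
# document-feature matrix (documents in rows, word types in columns).
corp <- corpus(c(d1 = "Text analysis is fun.",
                 d2 = "Text mining in R is fast."))
toks <- tokens(corp, remove_punct = TRUE)
dfmat <- dfm(toks)
dfmat  # sparse matrix: 2 documents, one column per word type
```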

1722

Natural Language Processing

RcmdrPlugin.temis

Graphical Integrated Text Mining Solution

An ‘R Commander’ plugin providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, vocabulary tables, terms co-occurrences and documents similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, ‘Twitter’ queries, as well as from ‘Dow Jones Factiva’, ‘LexisNexis’, ‘Europresse’ and ‘Alceste’ files.

1723

Natural Language Processing

rel

Reliability Coefficients

Derives point estimates with confidence intervals for Bennett et al.’s S, Cohen’s kappa, Conger’s kappa, Fleiss’ kappa, Gwet’s AC, intraclass correlation coefficients, Krippendorff’s alpha, Scott’s pi, the standard error of measurement, and weighted kappa.

1724

Natural Language Processing

RKEA

R/KEA Interface

An R interface to KEA (Version 5.0). KEA (for Keyphrase Extraction Algorithm) allows for extracting keyphrases from text documents. It can be either used for free indexing or for indexing with a controlled vocabulary. For more information see http://www.nzdl.org/Kea/.

1725

Natural Language Processing

RTextTools

Automatic Text Classification via Supervised Learning

RTextTools is a machine learning package for automatic text classification that makes it simple for novice users to get started with machine learning, while allowing experienced users to easily experiment with different settings and algorithm combinations. The package includes nine algorithms for ensemble classification (svm, slda, boosting, bagging, random forests, glmnet, decision trees, neural networks, maximum entropy), comprehensive analytics, and thorough documentation.

1726

Natural Language Processing

RWeka

R/Weka Interface

An R interface to Weka (Version 3.9.1). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data preprocessing, classification, regression, clustering, association rules, and visualization. Package ‘RWeka’ contains the interface code, the Weka jar is in a separate package ‘RWekajars’. For more information on Weka see http://www.cs.waikato.ac.nz/ml/weka/.

1727

Natural Language Processing

skmeans

Spherical k-Means Clustering

Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a fixed-point algorithm and an interface to the CLUTO vcluster program.

1728

Natural Language Processing

SnowballC

Snowball stemmers based on the C libstemmer UTF-8 library

An R interface to the C libstemmer library that implements Porter’s word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
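
A quick sketch of stemming in action, assuming the package is installed (`wordStem()` is its main exported function):

```r
library(SnowballC)

# Collapse inflected forms to a common stem so that
# related word forms can be compared directly.
wordStem(c("running", "runs"), language = "english")
# both reduce to the stem "run"
```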

1729

Natural Language Processing

stm

Estimation of the Structural Topic Model

The Structural Topic Model (STM) allows researchers to estimate topic models with document-level covariates. The package also includes tools for model selection, visualization, and estimation of topic-covariate regressions.

1730

Natural Language Processing

stringdist

Approximate String Matching and String Distance Functions

Implements an approximate string matching version of R’s native ‘match’ function. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding or between integer vectors representing generic sequences.
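
A short sketch of both facets, assuming the package is installed (`stringdist()` and `amatch()` are its documented entry points):

```r
library(stringdist)

# Classic Levenshtein example: three single-character edits
# turn "kitten" into "sitting".
stringdist("kitten", "sitting", method = "lv")  # 3

# Approximate analogue of match(): find the table entry
# within one edit of the query string.
amatch("leia", c("leela", "leila"), maxDist = 1)  # index 2
```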

1731

Natural Language Processing

stringi

Character String Processing Facilities

Allows for fast, correct, consistent, portable, as well as convenient character string/text processing in every locale and any native encoding. Owing to the use of the ICU library, the package provides R users with platform-independent functions known to Java, Perl, Python, PHP, and Ruby programmers. Available features include: pattern searching (e.g., with ICU Java-like regular expressions or the Unicode Collation Algorithm), random string generation, case mapping, string transliteration, concatenation, Unicode normalization, date-time formatting and parsing, etc.
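
Two of these features sketched briefly, assuming the package is installed (`stri_detect_regex()` and `stri_trans_general()` are exported functions; the "Latin-ASCII" transliterator identifier comes from ICU):

```r
library(stringi)

# Vectorized ICU regex matching.
stri_detect_regex(c("alpha1", "beta"), "\\d")  # TRUE FALSE

# Unicode-aware transliteration via an ICU transform.
stri_trans_general("café", "Latin-ASCII")      # "cafe"
```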

1732

Natural Language Processing

tau

Text Analysis Utilities

Utilities for text analysis.

1733

Natural Language Processing

tesseract

Open Source OCR Engine for R

Bindings to ‘Tesseract’: An OCR engine with Unicode (UTF-8) support that can recognize over 100 languages out of the box.

1734

Natural Language Processing

text2vec

Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multi-core machines.
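
The streaming vectorization pipeline can be sketched as follows, assuming the package is installed (`itoken()`, `create_vocabulary()`, `vocab_vectorizer()`, and `create_dtm()` are from its documented API):

```r
library(text2vec)

# Stream tokens through an iterator, build a vocabulary,
# then assemble a sparse document-term matrix; for corpora
# larger than RAM the iterator would read documents lazily.
docs  <- c("fast text vectorization", "text mining in r")
it    <- itoken(docs, tokenizer = word_tokenizer)
vocab <- create_vocabulary(it)
dtm   <- create_dtm(it, vocab_vectorizer(vocab))
dim(dtm)  # 2 documents by number of vocabulary terms
```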

1735

Natural Language Processing

textcat

N-Gram Based Te